AI Foundations Recap: Reading a Model Spec Sheet

By Carey Whitten— May 5, 2026

AI Foundations Recap: Reading a Model Spec Sheet

You've done the reading. Your head is full. This document is the reference card — the thing you open the morning of a call where a CAIO is expected to drop model names and spec details, and you need to follow without bluffing.

Below: a real spec sheet annotated with vocabulary you've already built, five mental models in canonical form, the collision tables, and the one question every spec sheet leaves open.

The Annotated Spec Sheet

The example below uses OpenAI's publicly available model documentation for GPT-4o and the o-series reasoning models. Specific numbers — context window sizes, pricing — are subject to change; provider documentation updates without notice. The field labels and what they mean do not change.

Spec Field	Typical Value (GPT-4o)	Lesson It Connects To	What to Listen For in a Buyer Conversation
Context window	128,000 tokens*	Tokens	"How much can it hold at once?" — this is the working memory budget
Input pricing	Per 1M input tokens*	Tokens-as-currency	Cost questions about long documents or large-scale processing
Output pricing	Per 1M output tokens*	Tokens-as-currency	Why generating long responses costs more than short ones
Modalities	Text, image, audio	Model taxonomy	"Can it read PDFs / look at screenshots?" — yes, if vision is listed
Reasoning mode	Not available (see o3)	Model taxonomy	If buyer mentions "thinking" or "chain-of-thought," they want an o-series model
Knowledge cutoff	Varies by version*	Hallucination / grounding	"Does it know about [recent event]?" — if after cutoff, no, without grounding
API access	REST, streaming	Not covered above	Surfaces in integration conversations; flag for your SE

*Subject to change. Verify at platform.openai.com/docs/models before any customer conversation.

If you remember nothing else from this section: The context window is your working budget. The knowledge cutoff is your expiration date. Everything else is configuration.

Five Mental Models, Consolidated

The load-bearing concepts from the preceding eight pieces. Definition, sales context, confusion trap — in that order for each.

Next-token prediction — The model generates text by predicting the most statistically probable next word-fragment, one at a time, given everything that came before it. When it comes up: When a buyer asks why the model "made something up" or why it sounds confident when it's wrong. The engine operates on probability, not verification. Don't confuse with: Retrieval. Prediction is what the model does with its training. Retrieval is how you get current or proprietary information in front of it.

Tokens-as-currency — Everything the model processes — input text, output text, images converted to tokens — costs tokens. The context window is the budget. Pricing is denominated in tokens. When it comes up: Any cost-per-use conversation, any question about processing large documents, any discussion of why a long conversation gets expensive. Don't confuse with: OAuth tokens. Different concept entirely. See the collision table below.

Embeddings-as-meaning — Before generating anything, the model converts words and concepts into high-dimensional numerical vectors that encode semantic relationships. Similar meanings cluster together in this space. When it comes up: When buyers ask how semantic search works, how the model "understands" a question, or how retrieval-augmented systems match documents to queries. Don't confuse with: Keyword matching. Embeddings capture meaning; keyword search captures exact strings. A query for "access revocation" can surface documents about "deprovisioning" via embeddings, not via keyword.

Grounding-not-trusting — Because models predict from training data with a fixed knowledge cutoff, they cannot be trusted to know current facts. Grounding means connecting the model to authoritative, current sources at inference time so it generates from verified context rather than from memory. When it comes up: Every time a buyer asks about compliance data, recent policy changes, or real-time information. The answer is always: ground it, don't trust it. Don't confuse with: Fine-tuning. Fine-tuning updates the model's weights with new training data — expensive, slow, and still subject to a new cutoff. Grounding is runtime; fine-tuning is pre-deployment.

Reasoning-as-extra-tokens — Reasoning models (OpenAI's o-series, Anthropic's extended thinking variants) generate internal "thinking" tokens before producing a final answer. This costs more and takes longer, but improves accuracy on multi-step problems. When it comes up: When a buyer asks about complex document analysis, policy interpretation, or multi-step workflows. Also when they ask why one model costs significantly more than another for the same task. Don't confuse with: A smarter model. Reasoning mode is the same underlying architecture spending more tokens to work through a problem. More compute, same engine.

If you remember nothing else from this section: The model is a prediction engine. Everything else — pricing, context, reasoning, grounding — is scaffolding around that engine.

Vocabulary Collision Tables

Three terms from your IDAM world that surface in AI conversations carrying different meanings. The Key Divergence column is where the actual confusion lives.

Table 1: Token

AI Term	What It Means in AI	IDAM Equivalent	Key Divergence
Token	A word-fragment unit of text; the atomic unit of model input/output and pricing	OAuth access token / SAML assertion	IDAM tokens are opaque identifiers that carry no semantic content. LLM tokens are the content — they're the actual pieces of language being processed. Conflating them in a buyer conversation signals fluency failure.

Table 2: Context / Session

AI Term	What It Means in AI	IDAM Equivalent	Key Divergence
Context window	The total token budget available for a single model call — everything the model can "see" at once	Security context / session	An auth session persists across requests and can be revoked. A context window exists only for one inference call. There is no persistence, no revocation, and no server-side state. When the call ends, the context is gone.
Conversation history	Prior turns of a chat, re-injected as tokens at each new call	Session state	Conversation history is not stored by the model — it's reconstructed by the application layer, consuming tokens each time. What looks like memory is actually re-reading.

Table 3: Agent

AI Term	What It Means in AI	IDAM Equivalent	Key Divergence
AI agent	An AI system that takes actions in the world — calling APIs, running code, reading files — based on model outputs, often across multiple steps	IDAM agent (service account / non-human identity)	IDAM agents have stable identities, credentials, and scopes that can be governed. AI agents may have none of these by default. The identity question for AI agents — who are they, what can they access, who authorized them — is largely unsolved and is where your next section begins.

The Question the Spec Sheet Doesn't Answer

Every provider spec sheet tells you what the model can do in isolation: how many tokens it holds, what modalities it accepts, how it's priced, when its knowledge ends.

None of them tell you how to connect your organization's data, tools, and users to it.

That gap — between what the model can do and what it can reach — is where the next section lives. Retrieval-augmented generation, tool calling, API integrations, agent frameworks: these are the mechanisms that close the distance between a capable model and a useful one. And every one of them raises questions that are, at their core, identity questions: what does the model have access to, who authorized that access, and what happens when the answer should be no.

That's where we're going next.

For More Information

Concept	Source Lesson
Next-token prediction	Lesson 1: What Is a Language Model?
Tokens as units; context window as budget	Lesson 2: Tokens — The Unit of Everything
Token pricing and cost modeling	Lesson 2: Tokens — The Unit of Everything
Model taxonomy; modalities; reasoning vs. standard models	Lesson 3: Model Taxonomy — Which Model for Which Job
Embeddings and semantic representation	Lesson 4: Embeddings — How Models Represent Meaning
Hallucination mechanics; knowledge cutoff	Lesson 5: Hallucination — Why Models Confabulate
Grounding and the knowledge cutoff problem	Lesson 5: Hallucination — Why Models Confabulate
Reasoning models; thinking tokens; cost/accuracy tradeoffs	Lesson 6: Reasoning Models — Thinking as a Token Budget
Conversation context; why models don't "remember"	Lesson 7: Context and Memory — What the Model Actually Holds
Provider spec sheet structure; field-by-field annotation	This document