You've done the reading. Your head is full. This document is the reference card — the thing you open the morning of a call where a CAIO is expected to drop model names and spec details, and you need to follow without bluffing.
Below: a real spec sheet annotated with vocabulary you've already built, five mental models in canonical form, the collision tables, and the one question every spec sheet leaves open.
The Annotated Spec Sheet
The example below uses OpenAI's publicly available model documentation for GPT-4o and the o-series reasoning models. Specific numbers — context window sizes, pricing — are subject to change; provider documentation updates without notice. The field labels and what they mean do not change.
| Spec Field | Typical Value (GPT-4o) | Lesson It Connects To | What to Listen For in a Buyer Conversation |
|---|---|---|---|
| Context window | 128,000 tokens* | Tokens | "How much can it hold at once?" — this is the working memory budget |
| Input pricing | Per 1M input tokens* | Tokens-as-currency | Cost questions about long documents or large-scale processing |
| Output pricing | Per 1M output tokens* | Tokens-as-currency | Why generating long responses costs more than short ones |
| Modalities | Text, image, audio | Model taxonomy | "Can it read PDFs / look at screenshots?" The answer is yes if image (vision) input is listed |
| Reasoning mode | Not available (see o3) | Model taxonomy | If a buyer mentions "thinking" or "chain-of-thought," they are likely asking about a reasoning model such as the o-series |
| Knowledge cutoff | Varies by version* | Hallucination / grounding | "Does it know about [recent event]?" If the event postdates the cutoff, not without grounding |
| API access | REST, streaming | Not covered above | Surfaces in integration conversations; flag for your SE |
*Subject to change. Verify at platform.openai.com/docs/models before any customer conversation.
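To make the two pricing rows concrete, here is a minimal sketch of the per-call cost arithmetic. The `Pricing` class, the function name, and the dollar figures are all hypothetical placeholders chosen for easy math; they are not OpenAI's rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in USD. Values are placeholders, not real rates."""
    input_per_million: float
    output_per_million: float


def estimate_call_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Return the estimated USD cost of one model call."""
    input_cost = input_tokens / 1_000_000 * pricing.input_per_million
    output_cost = output_tokens / 1_000_000 * pricing.output_per_million
    return input_cost + output_cost


# Hypothetical prices; check the provider's pricing page for current figures.
example_pricing = Pricing(input_per_million=2.00, output_per_million=8.00)

# A 50,000-token document summarized into 1,000 tokens of output.
cost = estimate_call_cost(50_000, 1_000, example_pricing)
print(f"${cost:.4f}")
```

Input and output are priced separately, so the same call can be cheap to read and expensive to write, or the reverse.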
If you remember nothing else from this section: The context window is your working budget. The knowledge cutoff is your expiration date. Everything else is configuration.
Five Mental Models, Consolidated
The load-bearing concepts from the preceding eight pieces. Definition, sales context, confusion trap — in that order for each.
Next-token prediction — The model generates text by predicting the most statistically probable next word-fragment, one at a time, given everything that came before it. When it comes up: When a buyer asks why the model "made something up" or why it sounds confident when it's wrong. The engine operates on probability, not verification. Don't confuse with: Retrieval. Prediction is what the model does with its training. Retrieval is how you get current or proprietary information in front of it.
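A toy sketch of that prediction step, under heavy simplification. The probabilities below are invented for illustration; a real model scores tens of thousands of candidate tokens, and deployed systems commonly sample from the distribution rather than always taking the top choice.

```python
import random

# Hypothetical probabilities for the token after "The capital of France is".
# These numbers are made up; they do not come from any real model.
next_token_probs = {" Paris": 0.90, " Lyon": 0.05, " a": 0.03, " Berlin": 0.02}


def most_likely_token(probs: dict[str, float]) -> str:
    """Greedy decoding: always pick the highest-probability token."""
    return max(probs, key=probs.get)


def sample_token(probs: dict[str, float], rng: random.Random) -> str:
    """Sampling: pick a token at random, weighted by its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # Fixed seed so repeated runs behave the same way.
print(most_likely_token(next_token_probs))
samples = [sample_token(next_token_probs, rng) for _ in range(1000)]
print(samples.count(" Berlin"), "of 1000 samples were ' Berlin'")
```

Nothing in either function checks whether the chosen token is true. A wrong continuation with a nonzero probability can still be picked, which is the mechanism behind a confident wrong answer.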
Tokens-as-currency — Everything the model processes — input text, output text, images converted to tokens — costs tokens. The context window is the budget. Pricing is denominated in tokens. When it comes up: Any cost-per-use conversation, any question about processing large documents, any discussion of why a long conversation gets expensive. Don't confuse with: OAuth tokens. Different concept entirely. See the collision table below.
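The budget idea can be sketched as a simple fit check. This assumes the common rule of thumb of roughly four characters per token for English text, and that input and output share one window; real tokenizers and provider limits vary, and the function names here are hypothetical.

```python
import math

CHARS_PER_TOKEN = 4  # Rough rule of thumb for English text; real tokenizers vary.


def estimate_tokens(text: str) -> int:
    """Approximate a token count from character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(prompt: str, reserved_output_tokens: int, context_window: int) -> bool:
    """Check whether the prompt plus space reserved for the reply fits in the window."""
    return estimate_tokens(prompt) + reserved_output_tokens <= context_window


document = "x" * 600_000  # Stand-in for a 600,000-character document.
print(estimate_tokens(document))
print(fits_in_context(document, reserved_output_tokens=4_000, context_window=128_000))
```

When the check fails, the document has to be split, summarized, or filtered before the model sees it, and each of those steps spends tokens of its own.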
Embeddings-as-meaning — Before generating anything, the model converts words and concepts into high-dimensional numerical vectors that encode semantic relationships. Similar meanings cluster together in this space. When it comes up: When buyers ask how semantic search works, how the model "understands" a question, or how retrieval-augmented systems match documents to queries. Don't confuse with: Keyword matching. Embeddings capture meaning; keyword search captures exact strings. A query for "access revocation" can surface documents about "deprovisioning" via embeddings, not via keyword.
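A minimal sketch of how "similar meanings cluster together" turns into a number. The three-dimensional vectors below are hand-made for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented toy vectors, not output from any embedding model.
toy_embeddings = {
    "access revocation": [0.9, 0.1, 0.0],
    "deprovisioning": [0.8, 0.2, 0.1],
    "quarterly revenue": [0.0, 0.1, 0.9],
}

query = toy_embeddings["access revocation"]
for phrase, vector in toy_embeddings.items():
    print(f"{phrase}: {cosine_similarity(query, vector):.2f}")
```

In this toy setup "deprovisioning" scores close to the query even though the two phrases share no words, while "quarterly revenue" scores near zero. A keyword search would miss the first match entirely.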
Grounding-not-trusting — Because models predict from training data with a fixed knowledge cutoff, they cannot be trusted to know current facts. Grounding means connecting the model to authoritative, current sources at inference time so it generates from verified context rather than from memory. When it comes up: Every time a buyer asks about compliance data, recent policy changes, or real-time information. The answer is always: ground it, don't trust it. Don't confuse with: Fine-tuning. Fine-tuning updates the model's weights with new training data — expensive, slow, and still subject to a new cutoff. Grounding is runtime; fine-tuning is pre-deployment.
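One way to picture grounding at inference time is as prompt assembly: the application fetches current sources and places them in front of the model along with the question. The sketch below assumes the retrieval step has already happened; the function name, the source format, and the policy text are hypothetical.

```python
def build_grounded_prompt(question: str, sources: list[dict[str, str]]) -> str:
    """Assemble a prompt that tells the model to answer only from supplied sources."""
    source_lines = [f"[{s['id']}] {s['text']}" for s in sources]
    return "\n".join([
        "Answer using only the sources below. Cite the source id you used.",
        "If the sources do not contain the answer, say you do not know.",
        "",
        "Sources:",
        *source_lines,
        "",
        f"Question: {question}",
    ])


# Hypothetical policy snippet, standing in for whatever a retrieval step returns.
retrieved = [
    {"id": "policy-7", "text": "Contractor accounts are disabled within 24 hours of contract end."},
]

prompt = build_grounded_prompt("How fast are contractor accounts disabled?", retrieved)
print(prompt)
```

The model's weights never change here. The sources ride along in the context window on each call, which is why grounding is a runtime choice and also why it consumes tokens.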
Reasoning-as-extra-tokens — Reasoning models (OpenAI's o-series, Anthropic's extended thinking variants) generate internal "thinking" tokens before producing a final answer. This costs more and takes longer, but improves accuracy on multi-step problems. When it comes up: When a buyer asks about complex document analysis, policy interpretation, or multi-step workflows. Also when they ask why one model costs significantly more than another for the same task. Don't confuse with: A smarter model. Reasoning mode is the same underlying architecture spending more tokens to work through a problem. More compute, same engine.
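The cost gap can be sketched with simple arithmetic. This assumes thinking tokens are billed at the output rate, which is how reasoning models are commonly priced but should be verified with the provider. The price and token counts are hypothetical, and the same rate is used for both cases only to isolate the effect of the extra tokens.

```python
def output_cost(visible_tokens: int, reasoning_tokens: int, price_per_million: float) -> float:
    """Cost of output, assuming reasoning tokens are billed at the output rate."""
    return (visible_tokens + reasoning_tokens) / 1_000_000 * price_per_million


PRICE = 10.00  # Hypothetical USD per 1M output tokens, not a real rate.

# Same 500-token visible answer, with and without hidden thinking tokens.
standard = output_cost(visible_tokens=500, reasoning_tokens=0, price_per_million=PRICE)
reasoning = output_cost(visible_tokens=500, reasoning_tokens=4_500, price_per_million=PRICE)
print(f"standard: ${standard:.3f}, reasoning: ${reasoning:.3f}")
```

The buyer sees the same length of answer in both cases. The difference in the bill comes from tokens that were generated but never shown.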
If you remember nothing else from this section: The model is a prediction engine. Everything else — pricing, context, reasoning, grounding — is scaffolding around that engine.
Vocabulary Collision Tables
Three terms from your IDAM world that surface in AI conversations carrying different meanings. The Key Divergence column is where the actual confusion lives.
Table 1: Token
| AI Term | What It Means in AI | IDAM Equivalent | Key Divergence |
|---|---|---|---|
| Token | A word-fragment unit of text; the atomic unit of model input/output and pricing | OAuth access token / SAML assertion | IDAM tokens are credentials: they assert identity or authorization, whether as opaque strings or as signed claims. LLM tokens are the content itself, the actual pieces of language being processed. Conflating them in a buyer conversation signals a lack of fluency. |
Table 2: Context / Session
| AI Term | What It Means in AI | IDAM Equivalent | Key Divergence |
|---|---|---|---|
| Context window | The total token budget available for a single model call: everything the model can "see" at once | Security context / session | An auth session persists across requests and can be revoked. A context window exists only for one inference call. The model itself keeps no state between calls, so there is nothing to persist and nothing to revoke; any continuity comes from the application or API layer resending the content. When the call ends, the context is gone. |
| Conversation history | Prior turns of a chat, re-injected as tokens at each new call | Session state | Conversation history is not stored by the model — it's reconstructed by the application layer, consuming tokens each time. What looks like memory is actually re-reading. |
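
The "re-reading" in the conversation history row can be sketched as a running total. This is a simplified model in which each turn adds a fixed number of tokens to the history and every call resends the whole history; the function name and turn sizes are hypothetical.

```python
def tokens_sent_per_turn(turn_sizes: list[int]) -> list[int]:
    """For each turn, return the tokens sent: all prior turns plus the new one."""
    sent = []
    history = 0
    for size in turn_sizes:
        history += size  # The application appends the new turn to the history.
        sent.append(history)  # The whole history is resent on this call.
    return sent


# Hypothetical chat where every turn adds 500 tokens to the history.
per_turn = tokens_sent_per_turn([500] * 5)
print(per_turn)       # [500, 1000, 1500, 2000, 2500]
print(sum(per_turn))  # 7500
```

Five turns of 500 tokens hold only 2,500 tokens of conversation, yet 7,500 tokens are processed in total, because earlier turns are paid for again on every call.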
Table 3: Agent
| AI Term | What It Means in AI | IDAM Equivalent | Key Divergence |
|---|---|---|---|
| AI agent | An AI system that takes actions in the world — calling APIs, running code, reading files — based on model outputs, often across multiple steps | IDAM agent (service account / non-human identity) | IDAM agents have stable identities, credentials, and scopes that can be governed. AI agents may have none of these by default. The identity question for AI agents — who are they, what can they access, who authorized them — is largely unsolved and is where your next section begins. |
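
To make the three identity questions in the agent row concrete, here is a minimal deny-by-default sketch. The `AgentIdentity` record, its field names, and the tool names are hypothetical illustrations, not a standard schema or any framework's API.

```python
from dataclasses import dataclass, field


@dataclass
class AgentIdentity:
    """Hypothetical identity record for an AI agent; not a standard schema."""
    agent_id: str        # Who is the agent?
    authorized_by: str   # Who authorized it?
    allowed_tools: set[str] = field(default_factory=set)  # What can it access?


def authorize_tool_call(agent: AgentIdentity, tool_name: str) -> bool:
    """Deny by default: allow only tools explicitly granted to this agent."""
    return tool_name in agent.allowed_tools


agent = AgentIdentity(
    agent_id="helpdesk-bot",
    authorized_by="it-ops-manager",
    allowed_tools={"read_ticket", "lookup_user"},
)

for tool in ("lookup_user", "disable_account"):
    decision = "allow" if authorize_tool_call(agent, tool) else "deny"
    print(f"{agent.agent_id} -> {tool}: {decision}")
```

An AI agent deployed without a record like this has no place where the "no" can be enforced, which is the gap the Key Divergence column describes.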
The Question the Spec Sheet Doesn't Answer
Every provider spec sheet tells you what the model can do in isolation: how many tokens it holds, what modalities it accepts, how it's priced, when its knowledge ends.
None of them tell you how to connect your organization's data, tools, and users to it.
That gap — between what the model can do and what it can reach — is where the next section lives. Retrieval-augmented generation, tool calling, API integrations, agent frameworks: these are the mechanisms that close the distance between a capable model and a useful one. And every one of them raises questions that are, at their core, identity questions: what does the model have access to, who authorized that access, and what happens when the answer should be no.
That's where we're going next.
For More Information
| Concept | Source Lesson |
|---|---|
| Next-token prediction | Lesson 1: What Is a Language Model? |
| Tokens as units; context window as budget | Lesson 2: Tokens — The Unit of Everything |
| Token pricing and cost modeling | Lesson 2: Tokens — The Unit of Everything |
| Model taxonomy; modalities; reasoning vs. standard models | Lesson 3: Model Taxonomy — Which Model for Which Job |
| Embeddings and semantic representation | Lesson 4: Embeddings — How Models Represent Meaning |
| Hallucination mechanics; knowledge cutoff | Lesson 5: Hallucination — Why Models Confabulate |
| Grounding and the knowledge cutoff problem | Lesson 5: Hallucination — Why Models Confabulate |
| Reasoning models; thinking tokens; cost/accuracy tradeoffs | Lesson 6: Reasoning Models — Thinking as a Token Budget |
| Conversation context; why models don't "remember" | Lesson 7: Context and Memory — What the Model Actually Holds |
| Provider spec sheet structure; field-by-field annotation | This document |

