Five mental models. One spec sheet. Every term you'll hear in a CAIO conversation.
Annotated Spec Sheet: GPT-4o (Illustrative, May 2026)
Fields drawn from current OpenAI platform documentation. Numbers are representative figures for annotation purposes.
Context Length: 128,000 tokens
Mental model activated: Tokens-as-currency. This number is a budget, not a feature. A 10-page policy document runs roughly 7,500 tokens. Load three of them (about 22,500 tokens) plus a system prompt and you've spent ~25,000 tokens before the model produces a single word of output.
Mental model activated: Reasoning-as-extra-tokens. If extended thinking is enabled, the model's internal chain-of-thought draws from this same window. The context limit is the ceiling for everything: your input, the model's scratchpad, and its response.
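The budget arithmetic can be sketched in a few lines. This is a minimal illustration using the figures from the annotation; the 2,500-token system prompt is an assumption (the text only implies its size), and `remaining_budget` is a hypothetical helper, not part of any SDK.

```python
# Illustrative figures from the spec sheet annotation.
CONTEXT_LIMIT = 128_000        # tokens, from the spec sheet
TOKENS_PER_POLICY_DOC = 7_500  # rough estimate for a 10-page document
SYSTEM_PROMPT_TOKENS = 2_500   # assumed size; not stated in the sheet


def remaining_budget(num_docs: int, reasoning_tokens: int = 0) -> int:
    """Tokens left for the response once input and scratchpad are counted."""
    spent = num_docs * TOKENS_PER_POLICY_DOC + SYSTEM_PROMPT_TOKENS + reasoning_tokens
    return CONTEXT_LIMIT - spent


print(remaining_budget(3))                           # 103000
print(remaining_budget(3, reasoning_tokens=20_000))  # 83000
```

The second call shows the scratchpad drawing from the same window: 20,000 tokens of reasoning leave 20,000 fewer for the response.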
Token Pricing: $5.00 / 1M input · $15.00 / 1M output
Mental model activated: Tokens-as-currency. Output costs 3x input. That asymmetry matters in agentic workflows. Verbose responses aren't just slow; they're expensive. Buyers who ask "what will this cost?" are asking a token question, whether they know it or not.
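A rough sketch of the "what will this cost?" calculation, using the illustrative prices above. The token counts are made-up examples and `request_cost` is a hypothetical helper.

```python
INPUT_PRICE_PER_M = 5.00    # USD per 1M input tokens (illustrative)
OUTPUT_PRICE_PER_M = 15.00  # USD per 1M output tokens (illustrative)


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed per-token prices."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Same 25,000-token input, two response lengths.
terse = request_cost(25_000, 300)
verbose = request_cost(25_000, 3_000)
print(f"terse: ${terse:.4f}  verbose: ${verbose:.4f}")
```

Per request the difference is cents. Multiplied across an agentic workflow that makes thousands of calls, it becomes a line item.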
Supported Modalities: Text, image, audio, code
Mental model activated: Embeddings-as-meaning. Each modality gets converted to a numerical vector before the model processes it. An image isn't "seen"; it's encoded into the same embedding space as text. The model reasons across modalities because they share a coordinate system.
Reasoning Mode: Available (o-series architecture)
Mental model activated: Reasoning-as-extra-tokens. Extended thinking generates a scratchpad before the final answer. That scratchpad is billed. For complex multi-step tasks, this trades token cost for accuracy. It is a deliberate tradeoff, not a free upgrade.
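To make the tradeoff concrete, here is a small sketch of what a billed scratchpad does to the output side of a request. It assumes scratchpad tokens are charged at the output rate, which should be checked against the provider's current pricing page; the token counts are invented for illustration.

```python
OUTPUT_PRICE_PER_M = 15.00  # USD per 1M output tokens (illustrative figure from the sheet)


def output_cost(answer_tokens: int, scratchpad_tokens: int = 0) -> float:
    """USD cost of the generated side of a request.

    Assumption: scratchpad tokens are billed at the output rate.
    """
    return (answer_tokens + scratchpad_tokens) * OUTPUT_PRICE_PER_M / 1_000_000


plain = output_cost(500)
with_reasoning = output_cost(500, scratchpad_tokens=4_000)
print(f"plain: ${plain:.4f}  with reasoning: ${with_reasoning:.4f}")
```

Under these assumptions the same 500-token visible answer costs nine times as much once a 4,000-token scratchpad is added.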
Knowledge Cutoff: January 2025
Mental model activated: Grounding-not-trusting. Everything after this date is a gap. The model doesn't know it has a gap. It answers questions about post-cutoff events with the same confidence it answers questions about 2020. This field, more than any other, drives the retrieval conversation.
Architecture: Transformer-based, autoregressive
Mental model activated: Next-token prediction. "Autoregressive" means the model generates one token at a time, each conditioned on everything before it. The architecture is the explanation for why the model can be fluent and wrong simultaneously. Fluency is a property of token prediction. Accuracy is not.
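A toy version of the generation loop shows how fluent-and-wrong happens. This is a deliberately simplified sketch: the probability table is invented, and it conditions only on the previous token, whereas a real model conditions on the entire preceding sequence.

```python
# Hypothetical next-token table: for each token, candidate continuations and probabilities.
NEXT_TOKEN_PROBS = {
    "<start>": {"The": 1.0},
    "The": {"capital": 1.0},
    "capital": {"of": 1.0},
    "of": {"Australia": 1.0},
    "Australia": {"is": 1.0},
    # The common-but-wrong continuation is the more probable one in this toy table.
    "is": {"Sydney": 0.6, "Canberra": 0.4},
    "Sydney": {".": 1.0},
    "Canberra": {".": 1.0},
}


def generate(max_tokens: int = 10) -> str:
    """Greedy decoding: repeatedly append the most probable next token."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        candidates = NEXT_TOKEN_PROBS.get(tokens[-1])
        if not candidates:
            break  # no known continuation, so stop generating
        tokens.append(max(candidates, key=candidates.get))
    return " ".join(tokens[1:])


print(generate())  # The capital of Australia is Sydney .
```

The sentence is grammatical and confident. Nothing in the loop checks whether it is true; it only checks which token is most probable.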
If you remember nothing else: A spec sheet is a budget document. Context length is the account balance; token pricing is the burn rate; knowledge cutoff is the date the data stopped.
Five Mental Models: Reference Entries
Next-Token Prediction
The model's only job is to predict the most probable next token given everything before it.
When it comes up: When a buyer asks why the model said something wrong so confidently. Fluency and accuracy are independent outputs of the same mechanism.
Don't confuse with: Retrieval. The model isn't looking anything up. It's pattern-completing from training data.
Tokens-as-Currency
Every word, subword, and character costs tokens; tokens cost money and consume context window capacity.
When it comes up: Pricing conversations, context window sizing, agentic workflow cost modeling.
Don't confuse with: OAuth tokens. Same word, completely different concept. See Table 1.
Embeddings-as-Meaning
The model converts text and other inputs into numerical vectors that encode semantic relationships. Similar meaning, similar coordinates.
When it comes up: When buyers ask how semantic search works, or why the model recognizes a paraphrase it's never seen verbatim.
Don't confuse with: Keyword matching. Embeddings find meaning proximity; keyword search finds string matches. The difference is why RAG works at all.
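The contrast between meaning proximity and string matching can be sketched with toy vectors. The three-dimensional embeddings below are invented for illustration; real embedding models produce vectors with hundreds or thousands of dimensions, and the values here are not output from any actual model.

```python
import math

# Hypothetical embeddings, hand-picked so the paraphrase sits near the query.
EMBEDDINGS = {
    "reset password": [0.9, 0.1, 0.2],
    "forgot login credentials": [0.8, 0.2, 0.3],
    "quarterly budget report": [0.1, 0.9, 0.4],
}


def cosine(a: list, b: list) -> float:
    """Cosine similarity: close to 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def shares_keyword(a: str, b: str) -> bool:
    """Naive keyword match: do the two strings share any word?"""
    return bool(set(a.lower().split()) & set(b.lower().split()))


query = "reset password"
for text, vector in EMBEDDINGS.items():
    if text == query:
        continue
    score = cosine(EMBEDDINGS[query], vector)
    print(f"{text!r}: similarity={score:.2f}, keyword match={shares_keyword(query, text)}")
```

With these toy values the paraphrase scores about 0.98 and the unrelated phrase about 0.28, while the keyword check reports no match for either. A keyword search would miss the paraphrase entirely.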
Grounding-not-Trusting
Models don't know what's true. They know what's probable given training data. Grounding means supplying authoritative context at inference time.
When it comes up: Every conversation about knowledge cutoffs, retrieval-augmented generation, and why the model needs access to agency documents to be useful.
Don't confuse with: Fine-tuning. Fine-tuning changes the model's weights permanently. Grounding changes what the model sees at runtime. They solve different problems on different timelines at different costs.
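In its simplest form, grounding is prompt assembly: authoritative text is placed in front of the model at request time, and the weights are untouched. A minimal sketch follows; `build_grounded_prompt`, the instruction wording, and the policy passage are all hypothetical, and a real system would add a retrieval step to choose the passages.

```python
def build_grounded_prompt(question: str, passages: list[str]) -> str:
    """Assemble a prompt that supplies source text at runtime."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the sources below. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{numbered}\n\n"
        f"Question: {question}"
    )


prompt = build_grounded_prompt(
    "What is the current telework policy?",
    ["Policy 12-4 (hypothetical): staff may telework up to three days per week."],
)
print(prompt)
```

Swapping in a newer policy document changes the answer on the next request. Fine-tuning would require retraining to achieve the same update.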
Reasoning-as-Extra-Tokens
Extended thinking models generate a chain-of-thought scratchpad before answering. More tokens, more cost, better accuracy on complex tasks.
When it comes up: When buyers ask what makes reasoning models different, or why o-series pricing is higher than standard GPT-4o pricing.
Don't confuse with: Retrieval augmentation. Reasoning is internal computation. Retrieval is external data access. A model can do both, neither, or either independently.
If you remember nothing else: The model is always doing one thing — predicting the next token. Every other capability is a consequence of doing that well at scale.
Vocabulary Collision Tables
Table 1: Token
| AI Term | What It Means in AI | IDAM Equivalent | Key Divergence |
|---|---|---|---|
| Token | Subword unit of text; the atomic unit of LLM input/output and pricing | OAuth bearer token / SAML assertion | An LLM token carries no identity, grants no access, and has no expiration in the credential sense. An OAuth token is a credential. These share a name and nothing else. |
Table 2: Context and Session
| AI Term | What It Means in AI | IDAM Equivalent | Key Divergence |
|---|---|---|---|
| Context window | The full text the model can process at one time: input, history, documents, instructions | Session / security context | A security context carries authenticated identity and persists across requests with enforcement. A context window carries text and ends when the conversation ends. There is no authenticated principal inside a context window unless you put one there explicitly. |
| System prompt | Instructions prepended to every conversation, not visible to the end user | Policy enforcement point / session-level policy | A system prompt is not enforced by any external authority. It is text the model is asked to follow. It can be overridden, leaked, or ignored. A PEP is enforced by infrastructure. In any compliance conversation, this gap surfaces immediately. |
Table 3: Agent and Scope
| AI Term | What It Means in AI | IDAM Equivalent | Key Divergence |
|---|---|---|---|
| AI agent | A model that takes actions — calling APIs, running code, browsing — based on instructions | Service account / non-human identity | A service account has a defined identity, scoped credentials, and an audit trail. An AI agent's identity is whatever the orchestration layer assigns it, which may be nothing. The agent problem is an identity problem that most agent frameworks have not solved. |
| Tool scope | The set of functions and APIs an agent is permitted to call | OAuth scope | OAuth scope is declared, enforced by the authorization server, and auditable. Tool scope in most agent frameworks is advisory — the model is asked to stay in bounds, not prevented from leaving them. |
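
The difference between advisory and enforced tool scope can be sketched as a check in the orchestration layer, outside the model. Every name below (`dispatch`, `ToolNotPermitted`, the two tools) is hypothetical and not taken from any particular agent framework.

```python
class ToolNotPermitted(Exception):
    """Raised when the model requests a tool outside its granted scope."""


def search_documents(query: str) -> str:
    return f"results for {query!r}"


def delete_record(record_id: str) -> str:
    return f"deleted {record_id}"


# Everything the orchestration layer is able to run.
TOOLS = {"search_documents": search_documents, "delete_record": delete_record}


def dispatch(tool_name: str, argument: str, granted_scope: set[str]) -> str:
    """Run a tool call only if the orchestration layer has granted it."""
    if tool_name not in granted_scope:
        # Enforcement happens in code; the model's cooperation is not required.
        raise ToolNotPermitted(tool_name)
    return TOOLS[tool_name](argument)


scope = {"search_documents"}
print(dispatch("search_documents", "telework policy", scope))
try:
    dispatch("delete_record", "42", scope)
except ToolNotPermitted as exc:
    print(f"blocked: {exc}")
```

An instruction in the system prompt asks the model to stay in bounds. A check like this one refuses the call regardless of what the model asks for, which is closer to how an authorization server treats OAuth scope.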
If you remember nothing else: When a buyer uses an AI term that sounds like an IDAM term, assume the meanings diverge until you've confirmed otherwise. The collision is where the confusion lives.
For More Information
| Recap Entry | Source Lesson |
|---|---|
| Next-token prediction; autoregressive architecture | Lesson 1: How Language Models Actually Work |
| Tokens-as-currency; context window sizing | Lesson 2: Tokens — The Unit of Everything |
| Embeddings; semantic search; modality encoding | Lesson 3: How Models Understand Meaning |
| Knowledge cutoffs; grounding vs. fine-tuning | Lesson 4: What Models Know and Don't Know |
| Reasoning models; chain-of-thought; o-series pricing | Lesson 5: Reasoning Models — What's Actually Different |
| System prompts; context window limits; model behavior | Lesson 6: Model Behavior and Limits |
| Multimodal inputs; embedding spaces | Lesson 7: Multimodal Models in Practice |
| Agent identity gap; tool scope; orchestration risk | Lesson 8: Agentic Systems — Architecture and Risk |
| Spec sheet interpretation; model selection | Lesson 9: Reading a Model Card |
What the Spec Sheet Doesn't Answer
Every field above describes what the model can do. None of them tells you how your agency's documents, policies, and internal tools get in front of it at runtime.
That's the next section: retrieval architecture, tool-calling, and the infrastructure that connects a model's capabilities to the data it needs to be useful.

