What this document is.You've finished the AI Foundations curriculum. This Recap is not a re-read — it's a retrieval scaffold. Use it before a meeting with a CAIO or CISO to confirm the vocabulary is loaded. Use it six weeks from now when you can't remember what "context length" actually means in a pricing conversation. Every entry here points back to a source article. Nothing here is new unless it's labeled as such.
Note: GPT-4o figures used throughout are illustrative placeholders. Specific numbers — pricing, context limits, cutoff dates — are subject to change. Verify against the OpenAI platform documentation before citing in a customer conversation.
The Five Mental Models
Next-token prediction — The model generates output one token at a time, each token chosen based on probability given everything that came before it. When it comes up: When a buyer asks "does it actually understand what I'm asking?" The answer is: it behaves as if it does, because predicting the next token at scale produces outputs that look like comprehension. Don't confuse with: Retrieval. The model generates forward; it doesn't look things up.
Tokens-as-currency — Tokens are the unit of measure for both cost and capacity. Input tokens and output tokens are priced separately. Context length is denominated in tokens. When it comes up: Every pricing conversation. Every discussion of what fits in a single call to the model. Don't confuse with: Words. A token is roughly four characters in English, but that ratio shifts with technical content, code, and non-Latin scripts.
Embeddings-as-meaning — The model converts text into vectors — lists of numbers that encode semantic relationships. Similar meaning produces similar vectors. When it comes up: When the conversation turns to retrieval-augmented generation or semantic search. Similarity in vector space drives retrieval, not keyword matching. Don't confuse with: The model's context window. Embeddings live in a separate index; the context window is what the model sees during a single inference call.
Grounding-not-trusting — Model output is a probability distribution over plausible text, not a lookup from verified facts. Grounding means supplying authoritative content in the prompt so the model generates against it, not against training data alone. When it comes up: Every time a buyer asks about accuracy or hallucination. The posture is: verify outputs against sources, and supply those sources in the prompt when accuracy matters. Don't confuse with: Fine-tuning. Grounding is a runtime operation. Fine-tuning changes the model's weights permanently and is a different, more expensive intervention.
Reasoning-as-extra-tokens — Some models expose a reasoning mode that generates a chain of intermediate steps before producing a final answer. Those steps consume tokens. More tokens, more cost, more latency, better performance on complex tasks. When it comes up: When a buyer asks about "thinking" models or asks why one model costs more than another on the same task. Don't confuse with: A smarter model. Reasoning mode is the same underlying mechanism — next-token prediction — applied to a longer, structured internal monologue before the visible output.
If you remember nothing else: All five models reduce to one mechanism. Next-token prediction, running at scale, with tokens as the unit of cost, vectors as the representation of meaning, grounding as the reliability intervention, and extended token chains as the performance lever.
The Spec Sheet, Field by Field
Context length — 128,000 tokens (illustrative) Recall tokens-as-currency. Context length is the total token budget for a single inference call: system prompt, conversation history, retrieved documents, and the model's output all draw from the same pool. At 128K tokens, a GPT-4o call can hold roughly 90,000 words of combined input and output. In practice, performance can degrade near the limit. Buyers who ask "how much can it handle?" are asking a tokens question, not a pages question.
Token pricing — ~$5 input / ~$15 output per million tokens (illustrative) The asymmetry matters. Output costs three times more than input because generating tokens is computationally heavier than reading them. A workflow that produces long outputs — summaries, drafts, structured reports — costs more than one that produces short answers. When a buyer asks about cost at scale, the output token rate is usually the number that moves the estimate.
Supported modalities — text, image, audio GPT-4o was designed as a multimodal-native model, not a text model with vision bolted on. Recall next-token prediction: the mechanism is the same across modalities; non-text inputs are tokenized before prediction begins. In a federal account context, modality support affects what data types can enter the model and what compliance obligations follow. An image of a document is still data.
Reasoning mode — not available on GPT-4o GPT-4o does not expose a reasoning mode. OpenAI's o-series models (o1, o3, o4-mini) do. Recall reasoning-as-extra-tokens: those models generate a hidden chain-of-thought before producing visible output, which consumes additional tokens and increases latency. If a buyer is comparing GPT-4o to an o-series model on a complex analytical task, the performance difference is largely explained by this mechanism. GPT-4o is faster and cheaper; o-series is slower, more expensive, and better on tasks requiring multi-step logic.
Knowledge cutoff — April 2024 (illustrative) The model's training data ends here. Recall grounding-not-trusting: anything that happened after April 2024 is outside the model's training distribution. The model will still generate plausible-sounding text about post-cutoff events — which is exactly the failure mode grounding is designed to prevent. For federal buyers, this matters most when the use case involves current policy, recent threat intelligence, or updated regulations.
Architecture notes — transformer-based, decoder-only, multimodal-native This field is background, not a selling point. The attention mechanism you encountered in the embeddings lesson weighs relationships across the full context window. "Decoder-only" means the model generates forward; there's no separate encoder reading the input and decoder writing the output. Multimodal-native means the architecture was trained on mixed-modality data from the start, not adapted afterward.
If you remember nothing else: A spec sheet is a token budget, a cost structure, a capability boundary, and a reliability caveat. Every field maps to one of the five mental models. If a field doesn't map, ask why it's on the sheet.
Vocabulary Collision Zones
These are the terms where your IDAM vocabulary and the AI vocabulary use the same word to mean different things. The collision happens mid-conversation, and it's invisible until something breaks.
| AI Term | What It Means in AI | IDAM Equivalent | Key Divergence |
|---|---|---|---|
| Token | A chunk of text (roughly 4 characters); the unit of cost and context | OAuth access token; a credential authorizing a request | An LLM token has no identity, no expiry, no scope. It's a unit of text, not a unit of trust. |
| Context | Everything the model can see in a single inference call | Security context; the identity and authorization state of a request | LLM context is ephemeral and stateless. It doesn't persist between calls unless you explicitly reconstruct it. |
| Session | Informally used to describe a conversation thread; not a formal LLM concept | An authenticated, stateful session with a defined lifecycle | LLM inference is stateless by default. "Session" in AI products is usually an application-layer construct built on top of stateless API calls. |
| AI Term | What It Means in AI | IDAM Equivalent | Key Divergence |
|---|---|---|---|
| Knowledge cutoff | The date after which training data was excluded; the model has no native awareness of later events | Stale directory data; an identity store that hasn't synced | A stale directory can be refreshed. A knowledge cutoff is fixed in the model's weights. You can't sync the model; you can only ground it with current data at runtime. |
| Modality | The data type the model can process natively (text, image, audio) | Authentication factor type (something you know, have, are) | Modality in AI describes input/output capability, not assurance level. A model that accepts images is not making any claim about the trustworthiness of those images. |
If you remember nothing else: When an AI vendor uses a word you recognize from IDAM, assume the meaning has shifted until you've confirmed it hasn't. The collision is most dangerous with "token" and "session" because those carry specific trust semantics in your world that don't exist in theirs.
The Question This Spec Sheet Cannot Answer
You now know what GPT-4o can process, what it costs, how far its knowledge extends, and what mechanism underlies its outputs. The spec sheet is complete.
It tells you nothing about how your data gets in front of the model.
Not your agency's policy documents. Not the user's case history. Not the tools the model would need to call to take action in a system. The spec sheet describes a capable, stateless function. It says nothing about how that function connects to the identity infrastructure, the data stores, and the authorization layer that make it useful in an actual enterprise workflow.
That connection problem is what the next section covers. It has a name, and it has emerging standards, and it has implications for every Okta account where an AI agent is being considered. The spec sheet is where the model ends. The next section is where your work begins.
For More Information
| Recap Entry | Source Article | Section |
|---|---|---|
| Next-token prediction | "How Language Models Actually Work" | AI Foundations, Lesson 1 |
| Tokens-as-currency | "The Token Economy: Cost, Context, and Capacity" | AI Foundations, Lesson 2 |
| Embeddings-as-meaning | "How Models Represent Meaning: Vectors and Retrieval" | AI Foundations, Lesson 3 |
| Grounding-not-trusting | "Accuracy, Hallucination, and the Grounding Posture" | AI Foundations, Lesson 4 |
| Reasoning-as-extra-tokens | "Reasoning Models: What They Cost and What They Buy" | AI Foundations, Lesson 5 |
| Vocabulary collision: token | "The Token Economy: Cost, Context, and Capacity" | AI Foundations, Lesson 2 |
| Vocabulary collision: context, session | "How Language Models Actually Work" | AI Foundations, Lesson 1 |
| Knowledge cutoff vs. stale data | "Accuracy, Hallucination, and the Grounding Posture" | AI Foundations, Lesson 4 |
| Reasoning mode (o-series) | "Reasoning Models: What They Cost and What They Buy" | AI Foundations, Lesson 5 |

