You just worked through seven vocabulary collisions. What follows is the scaffold; the lesson lives in the reading. Use this as a refresher before calls, not as a substitute for the reading.
Four clusters. One load-bearing claim each. If the scaffold holds, the details reconstruct themselves.
Cluster 1: The Model Is a Predictor
Prediction (next-token prediction) — The model generates output one token at a time, choosing each next token from a probability distribution conditioned on everything preceding it. It does not look anything up.
- When it comes up: A buyer says "can't we just have the model pull the right answer from its training data?" They're picturing a database. It's not one.
- Don't confuse with: Retrieval-augmented generation (RAG), where a separate retrieval system does fetch documents and places them in the context. The model still predicts its response from what's retrieved. No copy-paste happens.
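A toy sketch of the mechanism, assuming a made-up bigram probability table. Real models compute each distribution with a neural network over the entire preceding context and a vocabulary of many thousands of tokens; the table and its words here are hypothetical. The point is that generation is repeated prediction, not lookup: nothing in the table stores an answer, only transition probabilities.

```python
# Hypothetical next-token probabilities: for each previous token,
# a distribution over possible next tokens. Real models compute this
# distribution from the whole preceding context, not just one token.
NEXT_TOKEN_PROBS: dict[str, dict[str, float]] = {
    "<start>": {"access": 0.6, "identity": 0.4},
    "access": {"is": 0.7, "was": 0.3},
    "identity": {"is": 0.9, "was": 0.1},
    "is": {"granted": 0.5, "denied": 0.3, "pending": 0.2},
    "was": {"granted": 0.6, "revoked": 0.4},
}


def generate_greedy(start: str = "<start>", max_tokens: int = 5) -> list[str]:
    """Repeatedly pick the highest-probability next token (greedy decoding)."""
    tokens: list[str] = []
    current = start
    for _ in range(max_tokens):
        dist = NEXT_TOKEN_PROBS.get(current)
        if dist is None:  # no known continuation: stop generating
            break
        current = max(dist, key=dist.get)  # most likely next token
        tokens.append(current)
    return tokens


print(" ".join(generate_greedy()))  # access is granted
```

The output reads as a confident answer, but "granted" won only because it was the likeliest continuation in the table. Nothing checked a policy.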
Training data — The corpus the model learned patterns from. Frozen at training time. The model cannot access it at inference; it produces outputs whose statistical shape was influenced by it.
- When it comes up: A public sector buyer asks whether agency data used in fine-tuning is "in the model," and what that means for data residency or FedRAMP boundaries. The mechanism is pattern absorption: the model learned statistical shapes from the data, and the data isn't stored inside it as retrievable records. That distinction changes the data governance conversation entirely. It is not a leakage guarantee, though: models can memorize and reproduce fragments of training data verbatim.
- Don't confuse with: Context (Cluster 2). Training data shaped the model's weights. Context is what you hand it at runtime.
The model predicts plausible next tokens. Every capability and every failure follows from this single fact. When a buyer treats the model as a knowledge store, correct the frame before the conversation builds on it.
Cluster 2: Context Is Instruction + Data + History
Context window — The total text (measured in tokens) a model can hold and attend to during one interaction. Anthropic's documentation calls it "working memory." The analogy is apt: finite, expensive, and when it's full, something has to give (older content gets dropped, or the request is rejected).
- Where it maps: Like a security context, the context window defines what the model can attend to during a single interaction. Both are bounded, both are per-interaction, both determine what's "visible." The difference: a security context governs access; a context window governs attention. Similar shape, different job entirely.
- When it comes up: A buyer asks how much data the model can "see." Current windows reach 1M tokens on some Claude and Gemini 3 models. These numbers shift with releases.
- Don't confuse with: Training data. The context window is what the model works with right now. Training data is what it learned from before deployment.
System prompt — The directive telling the model how to behave. Part of the context window. Consumes tokens. Not free.
- Where it maps: Like a policy rule, it defines expected behavior. Both exist to constrain what the system does.
- When it comes up: Buyers designing agentic workflows assume the system prompt is a fixed configuration, like a policy sitting outside the system it governs. It isn't. It's text inside the window, competing with conversation history and retrieved documents for the same finite space.
- Don't confuse with: A Policy Decision Point. A PDP's decision is enforced (by the Policy Enforcement Point); a system prompt only suggests. No request gets past an enforced PDP decision, but a model can drift outside its system prompt, and nothing in the model prevents it.
Context is the model's entire reality at inference time. System prompt, user input, conversation history, retrieved documents: all share one finite window. What's outside the window does not exist to the model. Any control you exert through the prompt lives inside that budget; there is no enforcement layer above it.
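A minimal sketch of the shared budget, assuming a hypothetical 200-token window and token counts already measured per item. The `ContextItem` type and `fit_to_window` helper are illustrative names, not any provider's API. When the total overflows, this sketch drops the oldest history turns first; that drop policy is an assumption for illustration, and some APIs reject an oversized request instead.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    kind: str    # "system", "retrieved", "history", or "user"
    tokens: int  # token count for this item (assumed already measured)


def fit_to_window(items: list[ContextItem], window: int) -> list[ContextItem]:
    """Drop the oldest history items until everything fits in the window.

    Assumption for illustration: the system prompt, retrieved documents,
    and current user turn are never dropped; only history is expendable.
    """
    kept = list(items)  # work on a copy; the caller's list is untouched
    total = sum(i.tokens for i in kept)
    while total > window:
        # Index of the oldest remaining history item, if any.
        idx = next((n for n, i in enumerate(kept) if i.kind == "history"), None)
        if idx is None:
            raise ValueError(f"{total} tokens exceed the {window}-token window")
        total -= kept.pop(idx).tokens
    return kept


conversation = [
    ContextItem("system", 50),
    ContextItem("history", 40),  # oldest turn
    ContextItem("history", 40),
    ContextItem("retrieved", 60),
    ContextItem("user", 30),
]
kept = fit_to_window(conversation, window=200)
print([i.kind for i in kept])
```

The 220-token request loses its oldest turn to fit in 200. The system prompt survived here only because the sketch protects it; it still spent 50 tokens of the same budget.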
Cluster 3: Token ≠ Token
Token (AI) — A chunk of text the model processes as a single unit. Not a word. Roughly 4 characters or 0.75 words in English, per OpenAI. Google's estimate runs slightly wider: 60–80 words per 100 tokens. The exact breakdown depends on the tokenizer, and tokenizers differ across providers and models.
- When it comes up: Every pricing conversation. Every context window sizing discussion. Every time a buyer asks "how much text can it handle?"
- Don't confuse with: An OAuth token, a SAML token, a session token, or any other identity artifact. The word is identical. The concepts share nothing. In AI, a token is a unit of text. In IDAM, a token is a security credential. In mixed conversations, specify which one you mean. Every time.
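For rough sizing before a call, the published rules of thumb turn into simple arithmetic. A sketch, assuming English text: the OpenAI heuristic (about 4 characters or 0.75 words per token) and Google's range (100 tokens per 60–80 words). The function names and the 300-page report are hypothetical; an exact count requires running the target model's own tokenizer.

```python
def estimate_tokens_from_chars(text: str) -> int:
    """OpenAI rule of thumb: ~4 characters of English text per token."""
    return round(len(text) / 4)


def estimate_tokens_from_words(word_count: int) -> int:
    """OpenAI rule of thumb: ~0.75 words per token."""
    return round(word_count / 0.75)


def token_range_google(word_count: int) -> tuple[int, int]:
    """Google rule of thumb: 100 tokens per 60-80 English words."""
    low = round(word_count * 100 / 80)   # 80 words per 100 tokens
    high = round(word_count * 100 / 60)  # 60 words per 100 tokens
    return low, high


# Hypothetical: a 300-page report at ~500 words per page.
words = 300 * 500
print(estimate_tokens_from_words(words))  # 200000
print(token_range_google(words))          # (187500, 250000)
```

The same document lands anywhere from roughly 188K to 250K tokens depending on whose rule of thumb you use. That spread is the tokenizer dependence in miniature.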
Tokenizer variance — Different models break the same text into different token counts. Not covered in the source articles, but worth knowing: Anthropic's Opus 4.7 uses a new tokenizer producing up to 35% more tokens for identical input compared to previous Claude models. Same text, same provider, different model, different bill.
- When it comes up: Cost estimation across models. A buyer comparing Claude to GPT pricing can't just compare per-token rates. The token counts themselves diverge.
- Don't confuse with: A bug. Tokenizer changes are deliberate engineering tradeoffs. More tokens can mean better model performance on code and structured data.
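Why per-token rates alone mislead, sketched with hypothetical numbers: a $5.00-per-million-token price and a 10M-token workload are invented for illustration. The 1.35 multiplier mirrors the "up to 35% more tokens" ceiling above; it is a worst case, not a measured value for any particular input.

```python
def estimated_cost(base_tokens: int, tokenizer_multiplier: float,
                   price_per_million: float) -> float:
    """Dollar cost for text that measures base_tokens on a reference tokenizer.

    tokenizer_multiplier scales that count to the target model's tokenizer.
    """
    tokens = base_tokens * tokenizer_multiplier
    return tokens / 1_000_000 * price_per_million


# Same workload, same per-token price, two different tokenizers.
workload = 10_000_000
old = estimated_cost(workload, 1.00, price_per_million=5.00)
new = estimated_cost(workload, 1.35, price_per_million=5.00)
print(f"${old:,.2f} vs ${new:,.2f}")  # $50.00 vs $67.50
```

The rate card never changed; the bill rose 35% because the count did. Comparing providers has the same problem with both numbers moving at once.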
When someone says "token" in an AI conversation, they mean a chunk of text, not a credential. And the size of that chunk changes depending on which model is doing the chunking. Say this out loud in mixed meetings before the terminology creates a twenty-minute detour.
Cluster 4: Determinism Doesn't Exist Here
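Temperature — A parameter controlling randomness in token selection. Lower temperature means more predictable output. Temperature 0 approximates greedy decoding: the model picks the highest-probability token at each step. "Most predictable" is still not "deterministic," because small numerical differences in how the model is served can change which token comes out on top.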
- When it comes up: A buyer says "we need reproducible outputs for audit." Temperature 0 is the closest lever, but Anthropic's documentation states directly: "Even with temperature set to 0, the results will not be fully deterministic."
- Don't confuse with: A configuration guarantee. In IDAM, setting a policy parameter produces a deterministic enforcement outcome. Temperature is a statistical dial. Policies enforce or they don't. Temperature sort of enforces, mostly.
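What the dial does mathematically, as a sketch: the model's raw scores (logits) are divided by the temperature before a softmax turns them into probabilities. The logits below are made up. Lower temperature sharpens the distribution toward the top token; at 0 the division is undefined, so implementations typically treat it as picking the top token directly. The sketch shows the math only and does not reproduce the serving-side numerical variation behind Anthropic's non-determinism caveat.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities, scaled by temperature (> 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Pick a token index. Temperature 0 is treated as greedy (argmax)."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (1.0, 0.5, 0.1):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Even the greedy branch is only as stable as the logits feeding it. If those shift slightly between runs, the argmax can flip between near-tied tokens.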
Confabulation — NIST's preferred term for what the industry calls "hallucination": the production of confidently stated but erroneous or false content. One of the 12 risks in NIST AI 600-1's taxonomy for generative AI.
- Where it maps: As you saw in the prediction collision, the closest IDAM analog is a misconfigured policy producing unintended access. Both produce wrong outputs from a system that's technically running as designed.
- When it comes up: Every conversation about trust, compliance, or AI in decision-making workflows. Use "confabulation" with government buyers. It's the term in the framework they're reading.
- Don't confuse with: A model error or malfunction. Confabulation is the system working as designed. A misconfigured policy is fixable. Confabulation is inherent to the architecture. Mitigation is the ceiling.
No setting makes a language model produce guaranteed-identical outputs. No setting prevents confident falsehoods. Both properties are structural. A buyer who needs auditability or factual guarantees needs architectural controls around the model. Configuration within it won't get there. That reframe matters in every public sector conversation.
Vocabulary Mapping Tables
Core Vocabulary Collisions
| AI Term | What It Means in AI | IDAM Equivalent | Key Divergence |
|---|---|---|---|
| Token | A chunk of text (~4 characters) processed as one unit | OAuth/SAML token: a security credential carrying claims | Zero overlap. Same word, unrelated concepts. Specify which you mean in mixed conversations. |
| Context window | Finite working memory during a single model interaction | Security context: attributes and claims for an authenticated session | Security context governs access. Context window governs attention. Same shape, different job. |
| Session | A conversation thread with history consuming the context window | An authenticated session with TTL and revocation | AI "sessions" have no authentication, no cryptographic binding, no revocation. History is just tokens in the window. |
| Scope | Not a native AI concept; used loosely for "what the model can access" | OAuth scope: a formally defined permission boundary | OAuth scopes are enforced by the authorization server. No equivalent enforcement layer exists in a base model. |
| Agent | An AI system taking autonomous actions via tools and APIs | A service account or machine identity acting on behalf of a user | IDAM agents have scoped credentials and audit trails by design. AI agents have neither unless you build both. |
Behavioral Concepts
| AI Term | What It Means in AI | IDAM Equivalent | Key Divergence |
|---|---|---|---|
| Confabulation | Confidently stated false output; structural property of prediction | No direct equivalent; closest is a misconfigured policy producing unintended access | Misconfigured policies are fixable. Confabulation is inherent to the architecture. Mitigation is the ceiling. |
| Temperature | Parameter controlling output randomness (0 = most predictable) | No direct equivalent; loosely analogous to a policy strictness setting | Temperature 0 still isn't deterministic. No IDAM policy works this way. Policies enforce or they don't. |
| Hallucination | Colloquial term for confabulation; NIST's draft profile flagged it as anthropomorphizing, and the final AI 600-1 adopted "confabulation" as the preferred term | N/A | "Hallucination" implies a malfunction. "Confabulation" names a structural property. Use the term your buyer's framework uses. |
Source Index
Every entry traces to sources used across the AI Foundations section. NIST frameworks and provider documentation prioritized.
NIST Frameworks
- NIST AI 600-1 — Generative AI risk profile, 12-risk taxonomy, confabulation definition. Final, July 2024.
- NIST AI 100-1 — AI Risk Management Framework. Core governance reference.
- NIST AI 100-3 — "The Language of Trustworthy AI" glossary. Published 2023. The AIRC online glossary remains pending final release as of May 2026.
- NIST CSRC Glossary — Standing definitions for AI model, artificial intelligence, generative AI.
Provider Documentation (verify at press time; these shift without notice)
- Anthropic API Glossary — Context window as "working memory," non-determinism caveat.
- Anthropic Pricing — Token costs, Opus 4.7 tokenizer change, 1M context window.
- OpenAI Platform Docs — Token definition, "1 token ≈ 4 characters or 0.75 words."
- Google Gemini API — Tokens — Token definition, multimodal rates, "100 tokens ≈ 60–80 English words."
- Google Gemini API — Long Context — Context window as "short term memory," Gemini 3 at 1M input tokens.
Things to Follow Up On
- Opus 4.7 kills temperature control: Anthropic's newest model returns a 400 error if you set temperature, top_p, or top_k to any non-default value, removing even the partial determinism lever previous models offered. For context, see the Anthropic API glossary entry on non-determinism.
- Mechanistic interpretability is accelerating: MIT Technology Review named it a top breakthrough technology for 2026, with Anthropic, OpenAI, and DeepMind publishing research that begins to explain what's actually happening inside the layers. The "black box" framing may have an expiration date.
- NIST critical infrastructure AI profile: On April 7, 2026, NIST released a concept note for an AI RMF Profile on Trustworthy AI in Critical Infrastructure, a separate document from AI 600-1 that will guide operators toward specific risk management practices for AI-enabled capabilities.
- Reasoning models hide their work: Anthropic's own research found that Claude 3.7 Sonnet mentioned the hints it actually used only 25% of the time, and DeepSeek's R1 only 39% of the time. That finding complicates any audit strategy built on inspecting chain-of-thought traces.

