Recap — Anchoring Your Mental Model. A consolidation of the foundational vocabulary so the rest of the guide builds on solid ground. If anything in this chapter still feels fuzzy, slow down here — the next four chapters assume it.
You've done the reading. This is the structure that makes it stick. Use it before a call to reload the vocabulary; use it to identify which source article to revisit if something still feels soft.
How AI Is Built
Transformer — The neural network architecture underlying every major language model in production today. Not a synonym for "AI" or "LLM"; it's the specific architectural pattern that makes large-scale language modeling tractable. When it comes up: When a buyer asks "is this the same technology as ChatGPT?" Broadly, yes — same architecture class. Don't confuse with: The model itself. GPT-4, Claude, Gemini are all transformer-based models. Transformer is the blueprint; the model is the building.
Training — The process of exposing a model to massive text datasets and adjusting its internal weights until it gets good at predicting what comes next. Happens once (or periodically) at enormous compute cost. The model does not "learn" during a conversation. When it comes up: When a buyer asks "can we train it on our data?" They almost always mean fine-tuning or RAG, not training from scratch. Don't confuse with: Inference. Training is baking the cake. Inference is serving a slice.
Parameters / Weights — The numerical values inside a model that encode everything it learned during training. "A 70-billion-parameter model" means 70 billion of these values. More parameters generally means more capability and more compute cost. When it comes up: Model selection conversations — "what size model do we need for this use case?" Don't confuse with: Configuration settings. These parameters are baked into the model at training time; they are not things you adjust at deployment.
Foundation Model — A large model trained on broad data at scale, intended to be adapted for many downstream tasks. GPT-4, Claude 3, Gemini 1.5 are foundation models. The "foundation" is that you build on top of it rather than training from scratch. When it comes up: Vendor lock-in conversations — which foundation model is the agency's deployment built on, and what does switching cost? Don't confuse with: Fine-tuned model. A fine-tuned model is a foundation model that's been further trained on domain-specific data.
Fine-Tuning — Additional training on a smaller, domain-specific dataset to specialize a foundation model's behavior. Changes the model's weights. More durable than prompting; cheaper than training from scratch. When it comes up: "Can we make the model only answer questions about our policies?" Fine-tuning is one answer; RAG is often the better one. Don't confuse with: RAG. Fine-tuning changes what the model knows. RAG changes what the model can see at call time.
Embeddings — Numerical representations of text that capture semantic meaning. The mechanism that lets a model find "similar" content. Core to how RAG retrieval works. When it comes up: When a buyer asks how the model "searches" their documents. It is not keyword search; it's similarity across a high-dimensional numerical space. Don't confuse with: Vectors in a general math sense. In this context, embeddings are specifically the output of encoding text for semantic similarity comparison.
If you remember nothing else: Training is when the model learns; inference is when it works. Everything else in this section is detail on those two moments.
How It Behaves
Token (AI sense) — A chunk of text, roughly ¾ of a word on average. The unit the model processes and the unit API pricing is based on. "This prompt is 800 tokens" means roughly 600 words. When it comes up: Constantly — cost estimates, context window limits, latency discussions. See the collision table below before using this word with a buyer. Don't confuse with: OAuth/OIDC tokens. These share a word and nothing else.
Context Window — The total amount of text a model can process in a single call, measured in tokens. Both the input you send and the output it generates count against the limit. When the conversation exceeds the window, earlier content falls out. When it comes up: "Can the model hold our entire security policy?" Depends on policy length and the model's window. GPT-4 Turbo: 128K tokens (~96K words). Claude 3 Opus: up to 200K tokens. Don't confuse with: Memory. The context window is not persistent. It resets with each new call unless you explicitly reconstruct it from storage.
Temperature — A parameter that controls how deterministic or varied the model's output is. Low temperature (near 0): consistent, predictable. High temperature (near 1): more varied, occasionally more creative, more prone to drift. When it comes up: When buyers want consistent, auditable outputs — set temperature low. When they ask why the model gives different answers to the same question — temperature is part of the answer. Don't confuse with: A quality setting. Temperature doesn't make the model smarter or dumber; it makes it more or less random.
Hallucination — When the model generates confident, fluent, plausible-sounding output that is factually wrong. Not a traditional software bug. An emergent property of how prediction-based models work. Cannot be fully eliminated; can be mitigated with architecture. When it comes up: Every security and compliance conversation. Buyers will ask whether it hallucinates. The more productive conversation is about mitigation architecture. Don't confuse with: Model failure. Hallucination is an emergent property of prediction-based generation. The model predicts plausible next tokens; sometimes those predictions are factually wrong. That's the mechanism, not a defect.
RAG (Retrieval-Augmented Generation) — An architecture where the model retrieves relevant documents from an external store at call time and includes them in the context window before generating a response. The model doesn't "know" the documents; it reads them fresh each call. When it comes up: "How do we give the model access to our knowledge base without fine-tuning?" RAG is the standard answer. Don't confuse with: Fine-tuning. RAG is a runtime architecture. Fine-tuning is a training-time modification.
System Prompt — Instructions placed at the start of the context window that shape the model's behavior for the entire conversation. Invisible to the end user in most deployments. The mechanism by which an application developer constrains what the model will and won't do. When it comes up: "Who controls what the model says?" The system prompt owner does. In an enterprise deployment, that's the application team, not the model vendor. Don't confuse with: User prompt. The system prompt is set by the developer; the user prompt is what the end user types.
If you remember nothing else: The model is a prediction engine working inside a fixed window of text. Hallucination and context limits are both consequences of that same underlying fact.
How It's Deployed and Paid For
Inference — The act of running a trained model to generate output. What happens when a user sends a message and the model responds. Most enterprise AI spend is inference spend, not training spend. When it comes up: Cost and latency conversations. "How much does this cost to run?" is an inference question. Don't confuse with: Training. Training is a one-time (or periodic) cost. Inference is the ongoing operational cost.
Inference Endpoint — The API surface through which applications call a model. Could be a cloud provider's managed endpoint (OpenAI API, Azure OpenAI, AWS Bedrock) or a self-hosted endpoint. Where authentication and authorization decisions get made. When it comes up: "Where does identity fit in the AI stack?" The inference endpoint is one answer — it's where you enforce who can call the model and under what conditions. Don't confuse with: The model itself. The endpoint is the access layer; the model is what sits behind it.
Token Pricing — The billing model for most commercial LLM APIs: pay per thousand tokens of input plus per thousand tokens of output, at different rates. Context window size directly affects cost. When it comes up: TCO conversations. A large context window is powerful but expensive — every token in that window costs money on every call. Don't confuse with: Seat licensing. Token pricing is consumption-based, not per-user. Usage patterns matter more than headcount.
Agent (AI sense) — A model-driven process that takes multi-step actions autonomously — calling tools, querying APIs, making decisions across a sequence of steps — to complete a goal. The model is the reasoning engine; the agent is the behavioral pattern built on top of it. When it comes up: Every agentic AI conversation. The identity question is: what credentials does the agent use when it calls those APIs? That's your entry point. Don't confuse with: A service account or non-human identity. The agent is the behavior; it needs an identity to act. These are distinct layers. See the collision table below.
Latency vs. Throughput — Latency: how long a single request takes end-to-end. Throughput: how many requests the system handles simultaneously. These trade off against each other and require different optimization approaches. When it comes up: When buyers evaluate whether a model is "fast enough." Interactive use cases need low latency; batch processing needs high throughput. The answer to "is it fast enough?" depends entirely on which one they mean. Don't confuse with: Each other. Optimizing for one often degrades the other.
If you remember nothing else: You pay per token, the agent needs an identity, and the inference endpoint is where your access control conversation starts.
Vocabulary Mapping: Where IDAM and AI Collide
These four terms do completely different work in AI conversations than they do in IDAM conversations. The Key Divergence column is the part worth reading twice.
Table 1 — Required Collisions
| AI Term | What It Means in AI | IDAM Equivalent | Key Divergence |
|---|---|---|---|
| Token | A chunk of text (~¾ of a word); the unit of model input/output and the basis for API pricing | OAuth/OIDC bearer token; a signed credential proving authorization | AI tokens are content units. No security property attaches to them. IDAM tokens are credentials. Same word, entirely different objects. |
| Agent | A model-driven process that takes autonomous multi-step actions by calling tools and APIs | A non-human identity (service account, workload identity) that authenticates to systems | The AI agent is a behavioral pattern, not an identity. It needs an IDAM identity to act, but the two are distinct layers. Governing the identity doesn't govern the behavior. |
| Context | The full input sent to the model in a single call — everything it can "see" at that moment | Contextual signals (device posture, location, risk score) that modify an adaptive authentication decision | IDAM context is a signal that shapes a policy decision. AI context is the model's entire working memory for a task. One is an input to a decision; the other is the decision's environment. |
| Session | A conversation thread or task run — the span of a single context window's use | An authenticated session with defined lifetime, idle timeout, and revocation capability | IDAM sessions are governed objects with enforced lifecycle. AI "sessions" are typically stateless at the model layer; context is reconstructed from external storage on each call. There is no native revocation. |
Table 2 — Secondary Collisions
| AI Term | What It Means in AI | IDAM Equivalent | Key Divergence |
|---|---|---|---|
| Model | A trained artifact — the weights and architecture that produce outputs when called | An identity model or trust model (zero trust model, threat model, access model) | In AI, "model" always refers to the trained artifact. In IDAM, "model" is usually a framework or paradigm. A buyer saying "our security model" and a buyer saying "the AI model" are talking about categorically different things. |
| Endpoint | The API surface through which applications call a model | A managed device or workload in the EDR/endpoint security sense | AI endpoints are API call surfaces. IDAM and security endpoints are devices or workloads. The word does completely different work in each context — and both will come up in the same conversation. |
| Parameter | A numerical weight inside a trained model (billions of them); also a configuration value passed in an API call | A policy parameter or configuration attribute in an access policy | "Parameters" in AI most often refers to model weights. In IDAM, parameters are configuration values. Context usually disambiguates — but "parameter tuning" means something very different to each side of this conversation. |
Source Index
Every entry in this Recap traces back to one of the following articles in the AI Foundations section. If an entry still feels soft, the source article is where to go.
| Entry / Concept | Source Article | Section |
|---|---|---|
| Transformer, Training, Parameters, Foundation Model, Fine-Tuning | "What the Model Actually Does" | AI Foundations |
| Embeddings, RAG | "RAG: Giving the Model Access to Real Information" | AI Foundations |
| Token (AI sense), Context Window, System Prompt | "The Context Window" | AI Foundations |
| Temperature, Hallucination, Grounding | "Why Models Get Things Wrong" | AI Foundations |
| Agent (AI sense), Tool Use, Function Calling | "Agents: When the Model Starts Taking Actions" | AI Foundations |
| Inference, Inference Endpoint, Token Pricing, Latency vs. Throughput | "How You Pay for This" | AI Foundations |
| All collision tables | Cross-section; compiled for this Recap | AI Foundations |

