AI Foundations Recap: Anchoring Your Mental Model

By Leigh Garrity, May 8, 2026


How Models Are Built

Training (pre-training) — The process of adjusting a model's internal weights by exposing it to massive amounts of text and correcting its prediction errors, billions of times, until the model gets good at predicting what comes next. When it comes up: Whenever a buyer asks "can we train it on our data?" They almost never mean this. They mean fine-tuning. The distinction matters because pre-training a large model costs millions of dollars and months of compute, which is not a procurement conversation you're having. Don't confuse with: Fine-tuning. Pre-training builds the model from nothing. Fine-tuning specializes a model that already exists.
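
Here is a deliberately tiny sketch of what "correcting prediction errors" looks like. It uses one weight instead of billions and numbers instead of text. The function name `train` and the toy data are hypothetical. The mechanism is the one pre-training applies at vastly larger scale: nudge a weight in whichever direction shrinks the error, then repeat.

```python
# Toy illustration only: one weight learning y = 3x, not a language model.
def train(data, steps=200, lr=0.01):
    """Fit y ≈ w * x by repeatedly correcting prediction errors (gradient descent)."""
    w = 0.0  # start from an uninformed weight
    for _ in range(steps):
        # Average gradient of the squared error (w*x - y)^2 with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad  # nudge the weight in the direction that reduces error
    return w


data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # hidden rule: y = 3x
print(round(train(data), 3))  # 3.0
```

Nobody told the model "multiply by 3." It was pushed there one small correction at a time, which is also why nobody can point to the single weight that "knows" a given fact in a real model.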

Parameters — The numerical weights inside a model that encode what it learned during training. When you see "7 billion parameters," that's 7 billion numbers, each adjusted incrementally during training. When it comes up: Model size comparisons and capability discussions. Bigger parameter counts don't automatically mean better performance on a specific task. A smaller model fine-tuned on relevant data can outperform a larger general-purpose model. Don't confuse with: Configuration settings. Parameters in the ML sense are fixed once training ends. Users don't tune them. Settings like temperature are inference-time options, not model parameters.
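
The arithmetic behind "7 billion numbers" is worth having at your fingertips because it explains why model size comes up in hosting and hardware conversations. This is a minimal sketch. It assumes a plain fully connected layer with a hypothetical width of 4,096 and weights stored at 16 bits (2 bytes) each. The function names are illustrative.

```python
def dense_layer_params(n_in: int, n_out: int) -> int:
    """One weight per input-output connection, plus one bias per output."""
    return n_in * n_out + n_out


def weight_memory_gb(n_params: int, bytes_per_param: int) -> float:
    """Memory needed just to store the weights (excludes runtime overhead)."""
    return n_params * bytes_per_param / 1e9


print(dense_layer_params(4096, 4096))      # 16781312 numbers in a single layer
print(weight_memory_gb(7_000_000_000, 2))  # 14.0 GB for 7B weights at 16 bits each
```

The memory figure covers only storing the weights. Actually running the model needs more memory on top of that.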

Tokenization — The process of breaking input text into chunks (tokens) before the model processes it. A token is roughly three-quarters of a word in English. The model never sees text. It sees sequences of numbers representing tokens. When it comes up: API pricing (charged per token), context window limits, and any conversation about what the model actually processes. Also: every time someone asks why the model "cut off" a long document. Don't confuse with: OAuth tokens, session tokens, access tokens. Same word. Completely different concept. See the mapping table below — this collision will happen in front of a customer.
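
A toy tokenizer shows what "the model never sees text" means in practice. This sketch assumes a hypothetical nine-entry vocabulary and a simple greedy longest-match rule. Production tokenizers (byte-pair encoding and its relatives) learn far larger vocabularies from data, but their output has the same shape: a list of integers.

```python
# Hypothetical vocabulary; real tokenizers learn tens of thousands of entries from data.
VOCAB = {"token": 0, "ization": 1, " is": 2, " the": 3, " unit": 4,
         " of": 5, " every": 6, "thing": 7, " ": 8}


def tokenize(text: str) -> list[int]:
    """Greedy longest-match: repeatedly take the longest vocabulary entry that fits next."""
    pieces = sorted(VOCAB, key=len, reverse=True)  # try longer pieces first
    ids, i = [], 0
    while i < len(text):
        match = next((p for p in pieces if text.startswith(p, i)), None)
        if match is None:
            # Real tokenizers fall back to byte-level pieces instead of failing.
            raise ValueError(f"no token covers {text[i]!r}")
        ids.append(VOCAB[match])
        i += len(match)
    return ids


print(tokenize("tokenization is the unit of everything"))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Six words became eight tokens, which matches the three-quarters-of-a-word rule of thumb. Those integers are all the model receives, and they are what per-token pricing and context limits count.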

Fine-tuning — Additional training on a smaller, domain-specific dataset to adjust a pre-trained model's behavior without rebuilding it from scratch. Changes how the model responds; does not update its knowledge cutoff date. When it comes up: When agencies ask about customizing a model for their mission. Fine-tuning shapes behavior and tone. It does not inject new factual knowledge reliably, and it does not replace retrieval. Don't confuse with: RAG (retrieval-augmented generation). Fine-tuning changes the model itself. RAG changes what the model can see at inference time. Different levers, different costs, different tradeoffs.
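
The difference between the two levers is easiest to see side by side. In this sketch, the function names are hypothetical and the "model" is a toy with one weight. Fine-tuning continues gradient descent from an existing weight on a small domain dataset. RAG leaves the weight alone and changes only the text handed to the model. The retrieval step that produces `retrieved_docs` is assumed and omitted.

```python
def gradient_steps(w, data, steps, lr):
    """Nudge weight w to reduce squared prediction error on data (same mechanism as training)."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def fine_tune(pretrained_w, domain_data, steps=50, lr=0.01):
    # Starts from the existing weight, not from zero, and uses a small dataset.
    # The result is a new, specialized weight.
    return gradient_steps(pretrained_w, domain_data, steps, lr)


def rag_prompt(question, retrieved_docs):
    # No weights change here; only the input the model sees at inference time.
    sources = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(retrieved_docs, start=1))
    return f"Answer using only these sources:\n{sources}\n\nQuestion: {question}"


pretrained_w = 3.0                      # general-purpose behavior
domain_data = [(1.0, 3.5), (2.0, 7.0)]  # hypothetical agency-specific pattern: y = 3.5x
tuned_w = fine_tune(pretrained_w, domain_data)
print(round(tuned_w, 2))                # has moved toward 3.5
print(rag_prompt("Who approves exceptions?", ["Exceptions require CISO sign-off."]))
```

Fine-tuning produces a different model. RAG produces a different prompt for the same model. That distinction is why the two carry such different costs and update cycles.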

If you remember nothing else: When a buyer says "train it on our data," they mean fine-tuning. Pre-training is not on the table. Clarifying this early saves everyone a painful scope conversation later.

How They Run

Inference — The act of running a trained model to generate a response. Every prompt a user sends triggers inference. The model's weights don't change; inference is read-only. When it comes up: Cost and latency discussions. Training happens once per model version. Inference happens millions of times a day. Once a system is deployed, its ongoing operating cost is dominated by inference, which is why model efficiency matters to procurement. Don't confuse with: Training. It's a common conflation. Inference uses the model; it doesn't modify it.
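
Why inference dominates operating cost comes down to multiplication. The sketch below assumes a single blended per-token price and made-up traffic figures. Real pricing usually differs for input and output tokens and varies by model and provider, so treat every number as a placeholder.

```python
def daily_inference_cost(requests_per_day, tokens_per_request, usd_per_million_tokens):
    """Per-token pricing: cost scales linearly with traffic and with prompt size."""
    total_tokens = requests_per_day * tokens_per_request
    return total_tokens * usd_per_million_tokens / 1_000_000


# Placeholder figures for illustration only.
per_day = daily_inference_cost(
    requests_per_day=1_000_000,
    tokens_per_request=2_000,
    usd_per_million_tokens=5.0,
)
per_year = per_day * 365
print(f"${per_day:,.0f} per day, ${per_year:,.0f} per year")  # $10,000 per day, $3,650,000 per year
```

Unlike a training run, this bill recurs every day the system is in use. It also doubles if traffic or average prompt length doubles.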

Context Window — The maximum amount of text, measured in tokens, that a model can process in a single interaction. Everything outside the window is invisible to the model — it doesn't exist as far as the model is concerned. When it comes up: Agent memory, multi-turn conversations, document analysis, and any question about why the model "forgot" something it was told earlier. Context window size is one of the most practically significant specs when evaluating a model for an agentic use case. Don't confuse with: Security context. In IDAM, context carries authorization signals — who the user is, what conditions apply. In LLMs, context is the model's working memory for the current interaction. The word is doing completely different work.
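
One common way applications cope with a fixed window is to keep only the most recent messages that fit. That is exactly how an agent "forgets" early instructions. This sketch assumes a crude word-count token estimate (real systems count with the model's own tokenizer) and uses hypothetical function names. Other strategies, such as summarizing older turns, also exist.

```python
def approx_tokens(text: str) -> int:
    """Rough estimate at ~0.75 words per token; real systems use the model's tokenizer."""
    return max(1, round(len(text.split()) / 0.75))


def fit_to_window(messages: list[str], window_tokens: int) -> list[str]:
    """Keep the most recent messages that fit; everything older falls outside the window."""
    kept, used = [], 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = approx_tokens(msg)
        if used + cost > window_tokens:
            break  # this message and everything older is dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order


history = [
    "My badge ID is 4471.",
    "Summarize the policy.",
    "Now list the exceptions.",
    "Which ones apply to contractors?",
]
print(fit_to_window(history, window_tokens=16))
# ['Summarize the policy.', 'Now list the exceptions.', 'Which ones apply to contractors?']
```

The badge ID was never deleted from the conversation log. It simply no longer fits, so as far as the model is concerned, it was never said.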

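Temperature — A user-configurable setting at inference time that controls output randomness. It is sometimes called a sampling parameter, but it is not one of the model's trained parameters. At temperature 0, the model picks its most likely next token at every step, so the same prompt usually produces the same answer. Many hosted APIs still don't guarantee identical outputs across runs. Higher temperature introduces variation. It doesn't change what the model knows. It changes how deterministic the output is. When it comes up: Auditability and consistency requirements. Federal buyers asking about reproducible outputs need to understand three things: that this setting exists, what it does, and that temperature 0 reduces variation without guaranteeing bit-for-bit reproducibility. A model running at temperature 0 behaves very differently from one running at 0.8. Don't confuse with: A quality dial. Temperature doesn't make the model smarter or more accurate. It controls predictability, not capability.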
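
Temperature acts on the model's raw scores (logits) for each candidate next token. The scores are divided by the temperature before being turned into probabilities. This is a minimal sketch. It uses hypothetical logits for three candidate tokens and treats temperature 0 as greedy selection of the top-scoring token, which is the mathematical limit. Hosted APIs may add their own sources of variation on top of this.

```python
import math
import random


def probabilities(logits, temperature):
    """Softmax over logits divided by temperature (requires temperature > 0)."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - top) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_next(logits, temperature, rng):
    if temperature == 0:
        return max(logits, key=logits.get)  # greedy: always the top-scoring token
    probs = probabilities(logits, temperature)
    return rng.choices(list(probs), weights=list(probs.values()))[0]


logits = {"approved": 2.0, "denied": 1.0, "pending": 0.5}  # hypothetical scores
rng = random.Random(42)  # seeded only so this demo is repeatable
for t in (0.2, 1.0, 2.0):
    print(t, {tok: round(p, 2) for tok, p in probabilities(logits, t).items()})
print([sample_next(logits, 0, rng) for _ in range(5)])    # always 'approved'
print([sample_next(logits, 1.5, rng) for _ in range(5)])  # a mix of tokens is possible
```

The logits, which stand in for what the model "knows," are identical in every run. Only how sharply the probability concentrates on the top choice changes.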

If you remember nothing else: The context window is the model's working memory. When an agent forgets something it was told, this is the mechanism. Knowing this lets you ask the right diagnostic question.

How They Fail

Hallucination — When a model generates a confident, fluent, factually wrong response. Not a bug in the traditional sense. The model is doing exactly what it was trained to do: predict plausible next tokens. Sometimes plausible is not accurate, and there's no reliable built-in signal for "I don't know this." When it comes up: Every security, compliance, and audit conversation. Assume it will hallucinate. Ask what the guardrails are when it does, and whether the use case can tolerate the failure mode. Don't confuse with: A defect that gets patched in the next version. Hallucination is a structural property of how these models work. Mitigation strategies exist (retrieval grounding, output validation, human review). Elimination doesn't.
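
Output validation, one of the mitigation strategies in the entry above, is easy to picture in small form. The sketch below is hypothetical: the `[S1]`-style citation format, `validate_answer`, and the review routing are assumptions, not a specific product. It rejects answers that cite nothing or that cite sources that were never retrieved. Note what it cannot do: it doesn't check that a real source actually supports the claim, which is why human review stays in the loop.

```python
import re
from dataclasses import dataclass


@dataclass
class Verdict:
    approved: bool
    reason: str


def validate_answer(answer: str, retrieved_ids: set[str]) -> Verdict:
    """Require at least one [S#] citation, and every citation must match a retrieved source."""
    cited = set(re.findall(r"\[(S\d+)\]", answer))
    if not cited:
        return Verdict(False, "no citations: route to human review")
    unknown = cited - retrieved_ids
    if unknown:
        return Verdict(False, f"cites unretrieved sources {sorted(unknown)}: route to human review")
    # Passing means the citations resolve, NOT that the claims are true.
    return Verdict(True, "citations resolve to retrieved sources")


retrieved = {"S1", "S2"}
print(validate_answer("Contractors need sponsor approval [S1].", retrieved))
print(validate_answer("Contractors are exempt [S7].", retrieved))
print(validate_answer("Contractors are exempt.", retrieved))
```

This is a system-design answer rather than a model-quality answer. It catches one class of fabrication before it reaches a decision point and sends everything else to a person.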

If you remember nothing else: Hallucination is architectural, not accidental. Claiming "our model doesn't hallucinate" to a security-conscious buyer is a credibility-ending move. The conversation that works is about system design: how hallucinations are caught before they reach a decision point.

Vocabulary Mapping: Terms That Mean Something Different Here

These are the collisions that will happen in a live conversation. The Key Divergence column is the one to read.

Core Term Collisions

AI Term	What It Means in AI	IDAM Equivalent	Key Divergence
Token	A chunk of text (roughly ¾ of a word); the unit of input/output measurement for LLMs	OAuth token, session token, access token	Completely unrelated concepts sharing a word. An LLM token has no security properties, no expiry, no bearer semantics. Conflating them in front of a technical buyer signals you're new to the space.
Context	The full text visible to the model in a single interaction (the context window)	Security context: the set of attributes and conditions that inform an authorization decision	In IDAM, context enriches a policy decision. In LLMs, context is the conversation itself: working memory, not an attribute. The word is central in both domains, but it does different work in each.
Model	A trained artifact — the weights and architecture that produce outputs from inputs	Data model, threat model, trust model	In AI, "the model" is the thing you're running. In IDAM, "model" is almost always a framework or schema. When a buyer says "what model does it use," they're asking about the AI artifact, not a governance framework.
Agent	An AI system that takes actions autonomously — calls tools, makes decisions, executes multi-step tasks	Directory agent, service agent, software agent in legacy IAM contexts	An AI agent is a principal that acts on behalf of a user or system, often with delegated credentials. The IDAM concept of an agent is passive infrastructure. The AI concept is an active, decision-making entity. The identity implications are entirely different.

Behavioral and Boundary Concepts

AI Term	What It Means in AI	Closest IDAM Parallel	Key Divergence
Hallucination	Confident, fluent, factually incorrect output — a structural property of text prediction	False positive in anomaly detection	A false positive is a system error you can tune. Hallucination is a feature of the architecture operating as designed. The mitigation logic is different: you're not tuning a threshold, you're designing a review layer.
Temperature	Inference-time setting controlling output randomness and determinism	Session timeout, retry policy (configurable runtime behavior)	Temperature has no security analog. It controls output variability, not security and not quality. It's relevant to auditability requirements, not to access policy.
Embeddings	Numerical representations of text that encode semantic meaning, used for similarity search and retrieval	Attribute values in a directory schema	Embeddings are dense vectors in high-dimensional space — they're not human-readable and not queryable like directory attributes. The similarity they encode is semantic, not categorical.
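
The "semantic, not categorical" point in the Embeddings row is clearer with numbers. The vectors in this sketch are hypothetical and only three-dimensional. Real embeddings come from a model and have hundreds or thousands of dimensions. The query vector stands in for an embedding of "I'm locked out." Retrieval ranks entries by cosine similarity instead of matching attribute values exactly.

```python
import math


def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings for three stored phrases.
index = {
    "reset my password": [0.9, 0.1, 0.0],
    "recover account access": [0.8, 0.2, 0.1],
    "quarterly budget report": [0.0, 0.2, 0.9],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of "I'm locked out"

ranked = sorted(index, key=lambda text: cosine(query, index[text]), reverse=True)
for text in ranked:
    print(f"{cosine(query, index[text]):.3f}  {text}")
```

Neither of the top matches shares a word with "I'm locked out." An exact-match directory query for that string would return nothing.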

Source Index

Return to these when you need the full argument, not just the anchor.

Concept	Source Article	Section
Training, Parameters	"What a Language Model Actually Is"	The Architecture Underneath
Tokenization	"Tokenization: The Unit of Everything"	How Text Becomes Numbers
Fine-tuning	"Fine-Tuning and What It Costs"	Full article
Inference	"From Prompt to Response: How Inference Works"	Inference vs. Training
Context Window	"The Context Window Problem"	Full article
Temperature	"Temperature, Sampling, and Determinism"	Controlling Output Behavior
Hallucination	"Why Models Hallucinate"	Full article
Embeddings	"Retrieval and Semantic Search"	How Embeddings Work

If anything above still feels uncertain, stop here before moving to Chapter 2. The next four chapters (agents, identity, authorization, and agentic workflows) assume this vocabulary is solid. Gaps here will compound there.