A language model is a system that assigns probabilities to the next token in a sequence, based on patterns learned from training data. That description is mechanical, unglamorous, and precise. It is also the only description that makes the rest of this section make sense.
Every concept that follows — context windows, hallucination, temperature, reasoning modes — is a consequence of that baseline. If the baseline is wrong, the concepts don't land correctly. And the baseline is wrong for most people walking into AI conversations right now, including sellers who've been briefed on AI and buyers who've been deploying it.
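To make the baseline concrete, here is a minimal sketch of "assigning probabilities to the next token." It is a toy that counts word pairs, not how production models are built: real models use neural networks over subword tokens and far more text. The corpus and the `next_token_probs` helper are hypothetical, introduced only for illustration.

```python
from collections import Counter, defaultdict

# Toy training data (hypothetical). Real models train on vastly more text
# and use subword tokens rather than whitespace-separated words.
corpus = [
    "the agency approved the request",
    "the agency approved the budget",
    "the agency denied the request",
]

# Count how often each token follows each preceding token.
follow_counts = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        follow_counts[prev][nxt] += 1


def next_token_probs(prev_token):
    """Return {token: probability} for the token that follows prev_token."""
    counts = follow_counts[prev_token]
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


probs = next_token_probs("agency")
print(probs)  # "approved" gets 2/3, "denied" gets 1/3
```

Nothing in this sketch evaluates whether approving or denying is correct. The numbers come only from how often each word followed "agency" in the training text.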
Why This Matters in Your Conversations
CAIOs and federal IT leaders are testing vendors on AI vocabulary. They've been burned by vendors who used AI terms without understanding what they were describing. The three errors that surface most often aren't obscure — they're reasonable inferences from adjacent domains. Anyone who's worked with expert systems, search infrastructure, or knowledge management has a mental model that partially fits and then fails at exactly the wrong moment.
The three errors: that a language model reasons, that it retrieves, and that it knows. Each one maps to a specific failure mode in a buyer conversation. Clearing them here means every subsequent lesson builds on something solid.
It Doesn't Reason
When a language model produces output that looks like reasoning ("given X, therefore Y"), it is not running an inference engine. It is producing a token sequence that is statistically likely given the preceding context. The word "therefore" appears because "therefore" frequently follows patterns like these in the training data, not because a deliberative process evaluated the logical relationship between the premises.
A buyer who asks whether an AI system can explain its decisions is asking a harder question than it sounds, and the word "reasoning" hides why. A model can produce text that resembles an explanation without that text corresponding to any process the model actually ran. Sellers who treat model output as the product of deliberation will answer that question incorrectly, and federal buyers, who care deeply about explainability and auditability, will notice.
It Doesn't Retrieve
When a language model produces a fact, it is not fetching a record from a database. Its "knowledge" is encoded in the weights of the model — numerical parameters adjusted during training to make certain token sequences more probable. There is no lookup. There is no query. There is no record that can be audited, updated, or revoked.
Federal buyers think about information in terms of sources, authorities, and currency. A language model has none of those properties by default. When it produces a date, a statute, or a policy number, it is generating a statistically likely token sequence, not retrieving a verified fact. A technique called retrieval-augmented generation adds an actual retrieval layer on top of a base model; that's a later lesson. The base model, on its own, is not a retrieval system, and treating it like one produces confident errors.
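The contrast between lookup and generation can be sketched in a few lines. This is an illustration under stated assumptions, not a model of any real system: the `records` store, its contents, and the `generate_policy_id` stand-in are all made up for the example.

```python
import random

# A retrieval system: each record carries a source and a date, and a missing
# record is an explicit failure. All entries are invented for illustration.
records = {
    "policy-001": {
        "title": "Remote Access Policy",
        "source": "CIO office",
        "updated": "2023-04-01",
    },
}


def retrieve(policy_id):
    """Look up a record; raises KeyError when no record exists."""
    return records[policy_id]


def generate_policy_id(rng):
    """Stand-in for a base model: always emits a plausible-looking ID,
    with no check that a matching record exists."""
    return f"policy-{rng.randint(0, 999):03d}"


rng = random.Random(0)
generated = generate_policy_id(rng)
print(generated, "exists in records:", generated in records)

try:
    retrieve("policy-999")
except KeyError:
    print("retrieval refused: no such record")
```

The lookup can fail, and that failure is useful information. The generator cannot fail in that way: it returns something well-formed every time, whether or not anything stands behind it.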
It Doesn't Know
"Knowing" implies a correspondence between a claim and reality: some mechanism by which the claim can be checked against ground truth. A language model has no such mechanism. It has only statistical weight, meaning how strongly the training data favors one continuation over another. A token sequence that was common in training data will be produced confidently regardless of whether it corresponds to anything true.
This is a structural feature of how these systems work, and it has a name — hallucination — which the next lesson covers in detail. But hallucination only makes sense once you've accepted that there was never a ground truth mechanism to begin with. A model producing a false statement with high confidence is doing exactly what it was built to do: generating a statistically likely continuation. Statistical likelihood and factual accuracy are different properties, and the model has no way to distinguish between them.
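The gap between likelihood and accuracy can be shown with a toy example. It assumes a hypothetical training corpus in which a common misconception outnumbers the correct statement; the counting approach is a simplification for illustration, not how a real model is trained.

```python
from collections import Counter

# Hypothetical training text in which a common misconception outnumbers
# the correct statement. (Canberra is Australia's capital.)
corpus = [
    "the capital of australia is sydney",
    "the capital of australia is sydney",
    "the capital of australia is sydney",
    "the capital of australia is canberra",
]

prompt = "the capital of australia is"

# Count which token follows the prompt in the training text.
continuations = Counter(
    line[len(prompt):].split()[0]
    for line in corpus
    if line.startswith(prompt + " ")
)
total = sum(continuations.values())
probs = {tok: n / total for tok, n in continuations.items()}

# The most probable continuation is the most frequent one, not the true one.
most_likely = max(probs, key=probs.get)
print(most_likely, probs[most_likely])  # sydney 0.75
```

The sketch has no step where truth could enter. Frequency in the training text is the only signal available, so the wrong answer wins with 75 percent probability.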
What This Section Covers
The lessons that follow build on this baseline in sequence. Tokens and context windows explain the unit of processing and the limits it imposes on what a model can work with at any given moment. Hallucination explains what happens when statistical confidence diverges from factual accuracy — and why that divergence is predictable rather than random. Temperature explains how the probability distribution gets shaped at inference time. Reasoning modes, including chain-of-thought and related techniques, explain what "reasoning" actually means in this context, now that you know it doesn't mean deliberation.
None of those concepts require mathematical depth. They require the mechanical baseline you now have: a system that assigns probabilities to the next token, based on patterns learned from training data. That's the whole foundation. Everything else is a consequence.
IDAM Bridge — In identity, every claim about a principal traces back to an authoritative source: the IdP, the directory, the certificate authority. The system's confidence in a claim is proportional to the trustworthiness of that source, and the source can be audited, queried, and revoked. The closest AI equivalent is the model's training corpus — the body of text from which statistical patterns were learned. It diverges here: the training corpus is not queryable at inference time, not auditable per-output, and not revocable. When a model produces a confident claim, there is no IdP to verify it against. The confidence is statistical, not authoritative. Buyers who expect an AI system to behave like an identity-aware system — traceable claims, verifiable sources, revocable assertions — are applying a mental model that doesn't transfer.

