Every output a language model produces is a confabulation. Whether it happens to match reality on that particular prompt is a separate matter entirely. "Hallucination" implies something different — that the model deviated from a correct path — and that implication is doing real damage in buyer conversations about AI reliability.
This matters in your accounts because the mental model a CISO or CAIO holds about AI failure shapes the governance architecture they build. A buyer who thinks AI failure is anomalous will build for anomaly detection. A buyer who understands AI failure as structural will build for systematic verification. These are different programs, different budgets, and different conversations about trust.
This piece profiles both framings — the "hallucination" term and the more accurate framing of confabulation as baseline behavior — then runs them against the dimensions that matter for your next discovery call. It closes with a field language guide for both directions of buyer misuse: the buyer who uses "hallucination" to mean "this thing is fundamentally untrustworthy" and the buyer who uses it to mean "occasional glitch, we'll catch it in review."
One thing this piece does not do: re-litigate why model outputs are non-deterministic. That's 1.1's territory. This piece goes deeper into the specific failure mode of confident wrongness — what it is mechanically, why it can't be patched, and why the word you use to describe it changes what you do about it.
Profile 1: "Hallucination"
What it is: A term describing model outputs that are confidently wrong, borrowed from the clinical vocabulary for sensory experiences without external stimuli.
What it does in a conversation: It gives buyers a mental model of AI failure as an anomalous event — something the model does occasionally, when it goes off the rails. This makes the failure mode feel manageable. If hallucinations are discrete events, they can be caught, counted, and mitigated. The term also implies the model has a correct path it sometimes strays from, which shapes how buyers think about verification: find the deviations, flag them, route them to human review.
Where it comes from: The term gained traction in NLP research around 2018–2020, initially in the context of neural machine translation and summarization systems that would generate fluent text unsupported by the source document. It migrated into the broader LLM conversation as GPT-3 and its successors demonstrated the same behavior at scale. OpenAI, Anthropic, and Google all use the term in their public documentation, which has cemented its position in the field vocabulary even as a visible subset of researchers and practitioners has been pushing back on it since at least 2023. It remains the dominant term in vendor communications, press coverage, and most buyer conversations.
What makes it distinct: The term carries an implicit claim about internal state. To hallucinate, in the clinical sense, is to perceive something that isn't there, which presupposes a perceiver whose normal state has been disrupted. Applied to a language model, this implies the model has a normal state (correct output) and an abnormal state (wrong output). It doesn't: the generation process is the same in both cases, and the "hallucination" framing obscures this.
IDAM Concept Mapping
In access control, a wrong decision is auditable: the policy said X, the system did Y, here's the log entry. "Wrong" has a reference point. When a model produces a wrong output, there's no equivalent reference point at inference time — the model wasn't checking against a ground truth and getting it wrong. It was generating plausible text. The audit analogy holds for understanding what "correct" means in principle; it breaks when you try to apply the same audit and remediation logic to model outputs. You can log what the model said. You can't log what it "should have" said, because the model had no access to that.
Profile 2: Confabulation as Baseline
What it is: A framing in which language models are understood to always generate text by statistical pattern-matching, with no internal mechanism for verifying factual accuracy — meaning every output is a confabulation, and correctness is a property of the output, not the process.
What it does in a conversation: Buyers stop asking "how do we catch the occasional wrong output" and start asking "how do we verify outputs systematically, given that the model cannot tell us which ones to check." This is a different architecture conversation. It also removes the implicit assumption that the model has a correct path — there's no deviation to detect, only outputs to verify against external sources.
Where it comes from: The term "confabulation" comes from neuropsychology, where it describes the brain's tendency to fill gaps in memory with plausible-seeming fabrications, without awareness that it's doing so. It entered technical discussions of language models as a more mechanistically accurate alternative to "hallucination," particularly among researchers who objected to the anthropomorphization the latter implies. The framing has gained traction in AI safety research and among practitioners working on reliability and verification architectures. It is not yet the dominant term in vendor documentation or mainstream coverage, but it's where serious technical conversations about AI reliability tend to land.
What makes it distinct: The framing is mechanistically accurate. A language model generates the next token based on probability distributions learned during training. There is no separate module that evaluates factual correctness before emitting output. The model doesn't have access to a ground-truth database it checks against at inference time. It has patterns. When those patterns produce a wrong answer, the model isn't overriding an internal uncertainty flag — there is no flag. Confabulation-as-baseline captures this: the behavior is constant, and correctness is a property of the output, not a property of the process.
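If a buyer wants to see the mechanism rather than hear it described, a toy sketch can help. The one below is an illustration under loud assumptions: the `NEXT_TOKEN_PROBS` table and the `generate` function are hypothetical, and the probabilities are invented for the example rather than taken from any real model. What it shows is that the same sampling loop produces the right answer and the wrong one, and that nothing in the loop checks which is which.

```python
import random

# Hypothetical toy "model": next-token probabilities keyed by the previous word.
# The numbers are invented for illustration, not measured from any real system.
NEXT_TOKEN_PROBS = {
    "capital": {"of": 1.0},
    "of": {"Australia": 1.0},
    "Australia": {"is": 1.0},
    "is": {"Sydney": 0.6, "Canberra": 0.4},  # plausible vs. correct
}

def generate(start, steps, rng):
    """Sample one token at a time from the learned distribution."""
    tokens = [start]
    for _ in range(steps):
        dist = NEXT_TOKEN_PROBS.get(tokens[-1])
        if dist is None:
            break  # no learned continuation, so stop
        words = list(dist)
        weights = [dist[w] for w in words]
        # The only step that decides the output: a weighted draw.
        # There is no fact check before or after it.
        tokens.append(rng.choices(words, weights=weights, k=1)[0])
    return tokens

if __name__ == "__main__":
    for seed in range(5):
        print(" ".join(generate("capital", 4, random.Random(seed))))
```

Run it a few times and some completions end in "Canberra" (correct) while others end in "Sydney" (wrong). The code path is identical in both cases, which is the point of the confabulation-as-baseline framing.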
Some models have been fine-tuned to express uncertainty: "I'm not sure about this" or "I don't have reliable information on that topic." This is a trained behavior, not an internal confidence signal. The model has learned to produce hedged tokens in certain contexts. It's not accessing an internal state that says "confidence: low." A trained hedge can be bypassed by prompting, may not trigger on novel input types, and can be absent precisely when the model is most confidently wrong — because the training data for uncertainty expressions didn't include the specific failure mode you're encountering.
IDAM Concept Mapping
Authentication systems are designed to fail closed: when uncertain, deny. A language model fails plausibly: when uncertain, produce something that sounds right. These are opposite failure postures, and buyers who have spent their careers in IDAM will carry the fail-closed intuition into AI conversations without realizing it. The confabulation-as-baseline framing makes this explicit — the model producing a wrong answer is succeeding at its actual objective (generating plausible text). The failure is in the assumption that plausible and correct are the same thing.
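The two failure postures can be put side by side in a few lines. This is a minimal sketch, not a real auth system or a real model: `VALID_TOKENS`, `LEARNED_PATTERNS`, `authenticate`, and `generate_answer` are all hypothetical names, and the lookup table is only a stand-in for learned patterns.

```python
from typing import Optional

VALID_TOKENS = {"token-abc"}  # hypothetical credential store

def authenticate(token: Optional[str]) -> str:
    """Fail closed: anything not positively verified is denied."""
    if token in VALID_TOKENS:
        return "ALLOW"
    return "DENY"

# Hypothetical stand-in for learned patterns; a real model has no lookup table.
LEARNED_PATTERNS = {
    "Who signs form A?": "The agency director signs form A.",
}

# Fluent, confident, and unverified: what comes back when no pattern matches.
PLAUSIBLE_FALLBACK = "The deputy director signs that form."

def generate_answer(question: str) -> str:
    """Fail plausibly: every input yields fluent text; there is no deny path."""
    return LEARNED_PATTERNS.get(question, PLAUSIBLE_FALLBACK)

if __name__ == "__main__":
    print(authenticate("unknown-token"))         # denied, and says so
    print(generate_answer("Who signs form Z?"))  # answers anyway
```

The auth function has exactly one way to say "I can't verify this." The generation function has none, so an unfamiliar question gets the same confident sentence shape as a familiar one.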
Comparison: Two Framings, One Phenomenon
These profiles describe the same behavior. This isn't a comparison of two independent technical concepts — it's a contest between a misleading frame and a more accurate one, and the question is what each frame causes buyers to do.
Structure used here: trait-led analysis across three dimensions. A flat A/B table would show what each framing says; this structure shows what each framing does. The three dimensions are: accuracy of the mental model, implication for risk posture, and what each framing causes buyers to do or not do. Both framings appear on every dimension.
Dimension 1: Accuracy of the mental model
"Hallucination" implies a model with a correct internal state that sometimes goes wrong. This is not the mechanism. The model generates tokens by pattern-matching. It has no internal state representing "the right answer" and no process that checks output against that state before emitting it.
Confabulation-as-baseline is mechanistically accurate. The model is always doing the same thing. The output either matches reality or it doesn't, and the model has no way to know which.
Buyers who hold the "hallucination" mental model will look for ways to detect when the model is "going wrong." Buyers who hold the confabulation-as-baseline mental model will look for ways to verify outputs systematically, because they understand the model can't signal which outputs need checking.
Dimension 2: Implication for risk posture
The "hallucination" frame implies a risk posture built around anomaly detection: find the bad outputs, flag them, route them to human review. This is a reasonable approach, but it rests on an assumption the mechanism doesn't support: that bad outputs are distinguishable from good ones by some signal the model emits. They aren't. The model sounds equally confident when it's wrong.
Confabulation-as-baseline implies a risk posture built around verification architecture: assume all outputs require verification, design the verification layer based on the stakes of the specific use case, and don't rely on the model to tell you which outputs to check. This is a higher bar. It's also the accurate bar.
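One way to picture a verification architecture is as a routing table keyed on stakes, not on anything the model says about itself. The sketch below assumes three hypothetical stakes tiers and made-up check names (`Stakes`, `VERIFICATION_BY_STAKES`, `plan_verification` are illustrative, not a reference design); an actual program would define its own tiers and controls.

```python
from enum import Enum

class Stakes(Enum):
    LOW = 1     # assumption: e.g. an internal brainstorming draft
    MEDIUM = 2  # assumption: e.g. a staff-facing summary
    HIGH = 3    # e.g. a federal record, policy decision, or citizen interaction

# Hypothetical check names. Every tier gets at least one check,
# because no output is assumed correct by default.
VERIFICATION_BY_STAKES = {
    Stakes.LOW: ["sampled_human_spot_check"],
    Stakes.MEDIUM: ["check_against_source_documents"],
    Stakes.HIGH: ["check_against_source_documents", "domain_expert_review"],
}

def plan_verification(output_text: str, stakes: Stakes) -> list:
    """Choose checks from the use case's stakes alone.

    The output text is accepted but deliberately not inspected for
    hedges or confidence cues: the model cannot say which outputs to check.
    """
    return list(VERIFICATION_BY_STAKES[stakes])

if __name__ == "__main__":
    print(plan_verification("Draft benefits determination...", Stakes.HIGH))
```

The design choice worth pointing out to a buyer is what is missing: there is no branch that skips verification because the output "sounded sure."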
The difference matters most in high-stakes public sector use cases. An agency that builds its AI governance around catching "hallucinations" will design for the wrong failure mode.
Dimension 3: What each framing causes buyers to do
"Hallucination" causes buyers to ask: how often does this happen, and can the vendor reduce it? These are reasonable questions, but they lead to a conversation about frequency rather than architecture. Vendors can reduce the frequency of wrong outputs through fine-tuning, retrieval augmentation, and output filtering. They cannot remove the underlying behavior, which is structural to how these models work, not a defect in a specific code path. A buyer who frames the problem as frequency will accept a "we've reduced hallucinations by X%" answer as meaningful closure. It isn't: a lower error rate still leaves no way to know which outputs are the wrong ones.
Confabulation-as-baseline causes buyers to ask: what verification architecture do we need, and what does it cost? That's the right conversation for a public sector deployment where the output touches a federal record, a policy decision, or a citizen interaction.
IDAM Concept Mapping
Authentication flows have defined error states: 401, 403, token expired. These are signals the system emits when it can't complete a transaction correctly. A language model has no equivalent error state for factual incorrectness: it doesn't emit a "confidence below threshold" signal by default. Some implementations approximate one through retrieval-augmented generation (RAG), which supplies sources that outputs can be checked against, or through explicit uncertainty training, but these are architectural additions, not native behaviors. A buyer who expects the model to signal its own unreliability, the way an auth system signals a failed token, will be surprised when it doesn't. The model's silence on its own uncertainty is not a bug. It's the default.
How to Say This in the Field
The table below handles both directions. The left column is the buyer language that signals a framing problem: either overstating the risk ("this thing is fundamentally untrustworthy") or understating it ("we'll catch the bad ones in review"). Every "Do say" is usable verbatim.
| Buyer says | Don't say | Do say | Why it matters |
|---|---|---|---|
| "It hallucinates sometimes" | "Yes, that's a known limitation" | "Every output is a confabulation — whether it matched reality on that prompt is a separate matter. There's no internal check that runs first." | Reframes the risk from occasional to structural |
| "We can catch hallucinations in review" | "Great, that's the right mitigation" | "Review catches wrong outputs after the fact. The problem is the model sounds equally confident when it's wrong — your reviewers need domain expertise, not just attention." | Identifies the detection gap |
| "The vendor is working on reducing hallucinations" | "Good to know, let's revisit when that's fixed" | "Vendors can reduce the frequency. They can't remove the behavior — it's structural to how these models work, not a defect in a specific code path." | Prevents false closure on the risk |
| "This thing just makes stuff up, we can't trust it for anything" | "That's a bit overstated" | "It doesn't make stuff up randomly — it generates plausible text. The problem is that plausible and correct aren't the same thing, and the model can't tell the difference." | Accurate framing without dismissing the concern |
| "We need a human to review every output" | "That's the safest approach" | "That's one architecture. The question is whether your reviewers can detect confident wrongness — they need domain knowledge the model doesn't have." | Surfaces the reviewer competence dependency |
| "It hallucinated a citation" | "That's a common problem with LLMs" | "The model generated a plausible-looking citation. It wasn't retrieving a real source and mangling it — it was generating text that looks like a citation." | Distinguishes retrieval failure from generation behavior |
| "We turned on the uncertainty warnings" | "Good, that should help" | "That's a trained behavior — the model has learned to hedge in certain contexts. Novel inputs may not trigger it, and it can be prompted around." | Prevents over-reliance on trained uncertainty signals |
| "Hallucinations are a prompt engineering problem" | "Partially true" | "Better prompts reduce frequency. They don't change the underlying behavior — the model is still generating plausible text, not retrieving verified facts." | Keeps the architectural conversation honest |
| "We're using RAG so hallucinations aren't an issue" | "RAG helps significantly" | "RAG gives the model better source material. It doesn't guarantee the model uses it accurately — it can still generate plausible text that diverges from the retrieved content." | RAG reduces but doesn't eliminate the behavior |
| "It's like a bug we need to fix" | "It's more complex than that" | "Software bugs are reproducible and patchable. This isn't — same prompt, different outputs, and the behavior is structural, not a defect in a specific code path." | Prevents buyers from expecting a patch |
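
For the citation and RAG rows above, it can help to show a technical buyer what an external check looks like in its simplest form. The sketch below is an assumption-heavy illustration: it supposes answers cite sources with bracketed markers such as `[doc1]`, which is a made-up convention, and `unsupported_citations` is a hypothetical helper, not part of any RAG product.

```python
import re

def unsupported_citations(answer: str, retrieved_ids: set) -> list:
    """Return citation markers in the answer that match no retrieved document.

    Assumes a hypothetical "[docN]" citation style. The check runs outside
    the model and compares its text against what retrieval actually returned.
    """
    cited = re.findall(r"\[(\w+)\]", answer)
    return [marker for marker in cited if marker not in retrieved_ids]

if __name__ == "__main__":
    answer = "Policy X applies [doc1] and was amended in 2019 [doc7]."
    retrieved = {"doc1", "doc2"}
    print(unsupported_citations(answer, retrieved))  # ['doc7']
```

This catches a citation that points at nothing. It does not catch a real citation attached to a claim the source never makes, which is why the reviewer's domain expertise still matters.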

