Agent Failure Modes That Aren't Model Failures

Three infrastructure failures routinely blamed on broken models, diagnosed through their mechanics, mapped to production fixes, and translated into field-ready buyer language.

By Leigh Garrity— May 9, 2026

Agent Failure Modes That Aren't Model Failures

Three infrastructure failures routinely blamed on broken models, diagnosed through their mechanics, mapped to production fixes, and translated into field-ready buyer language.

Three infrastructure failures routinely get misdiagnosed as broken models in production agent systems: context rot, instruction centrifugation, and stale-world reasoning. You'll encounter them when a buyer says their agent "started hallucinating after twenty minutes," or when an engineering lead describes an agent that "worked great in demos but falls apart in production." The precise language that buys you credibility: "Which layer failed: retrieval, context management, or the model itself?" That question alone signals you understand the architecture.

The concrete version. In March 2026, Amazon engineers used the company's AI coding assistant, Amazon Q, to pull implementation guidance for a production change. Q retrieved advice from an internal wiki page. The documentation had been superseded. The tool had no freshness signal, no version check, no mechanism to tell the model the page was stale. An engineer followed the advice. The change went to production. Amazon's own published statement confirmed the cause:

“

"An engineer following inaccurate advice that an agent inferred from an outdated internal wiki."

The result: roughly 120,000 lost orders and a customer-facing outage that lasted hours.

The model reasoned correctly over information that used to be true. The fix lived in the infrastructure layer. A better model would have made the same mistake. That distinction runs through everything below.

The Three Failure Modes

Same structure for each. Same depth. In a buyer conversation you need to distinguish them quickly, not rank them.

Context Rot

What it is: Quality degradation as accumulated tokens fill the context window with noise.

What it does: Over many turns, an agent's context fills with tool call results, intermediate reasoning, error traces, prior conversation. The useful signal from early turns gets buried under volume. The model drowns. Everything is technically still in context, just buried. Practitioner-cited Databricks Mosaic research (referenced in TianPan.co's 12-Factor Agents analysis, though the original publication isn't directly linked) found model correctness starts dropping after roughly 32K tokens of accumulated context, with agents favoring repetitive actions from their growing history over correct next steps.

Where it comes from: The harness layer. Every tool call returns tokens. Every error trace returns tokens. Nothing in the default architecture discards what's no longer relevant. A design omission in the orchestration layer, full stop.

What makes it distinct: It's gradual. No single bad turn causes it. The agent just gets progressively worse, like a desk accumulating papers until you can't find the one you actually need. Production teams describe the signature: systems that work fine in demos, then hit a reliability ceiling that no amount of prompt engineering fixes. And the counterintuitive part: bigger context windows make it worse. More room to accumulate noise.

Instruction Centrifugation

What it is: The system prompt loses effective attention weight as conversation turns accumulate, even though it remains technically present in context.

What it does: The agent stops following its instructions. The instructions are still there in the context window, but dozens of turns of tool outputs now sit between the system prompt and where the model is generating. Autoregressive models have a strong recency bias: tokens near the generation point exert disproportionate influence on output. After forty turns, the system prompt is still present. The model just pays no meaningful attention to it.

Where it comes from: Transformer attention mechanics. This is how the architecture works. The term "instruction centrifugation" is practitioner vocabulary. The earliest published usage I can find is from Prassanna Ravishankar (an engineer who has published detailed analysis of agent failure modes on SWE-bench Pro benchmarks). You won't find it in published ML papers, but you'll hear it from engineers who've debugged goal drift in long-running agents. The metaphor is precise: accumulated execution logs push original instructions to the periphery, the way a centrifuge separates layers by density. Per Ravishankar's analysis of SWE-bench Pro results, context overflow accounted for 35.6% of Claude Sonnet 4's failures on enterprise-level coding tasks.

What makes it distinct: The instructions are intact. If you inspect the full context, everything looks correct. The failure is entirely in attention allocation. The agent had the skill. It lost the target.

Stale-World Reasoning

What it is: The agent acts on outdated tool output as though it reflects current state.

What it does: A tool returns information. The information was accurate when written. By the time the agent acts on it, the world has changed. The agent has no way to know. This is exactly what happened in the Amazon Q incident: the wiki page existed, the content was real, it had been superseded months earlier. As Adaline.ai's engineering team documented (a production-focused AI memory platform publishing from direct multi-agent system experience): when retrieval returns a plausible but wrong result, the model treats it as signal. High relevance combined with incorrect information is worse than irrelevance, because it doesn't trigger the model's uncertainty.

Where it comes from: The retrieval and tool layer. No freshness metadata. No version signaling. No mechanism for the model to distinguish "retrieved now, written today" from "retrieved now, written eighteen months ago."

What makes it distinct: In a multi-turn agent loop, stale-world reasoning compounds. Each subsequent turn builds on the outdated premise. But it's the only one of these three that can cause catastrophic damage on a single retrieval. Context rot and instruction centrifugation are slow burns requiring accumulated turns. The Amazon Q incident came down to one retrieval of one outdated page, and the change went to production.

IDAM Anchor: Stale Authorization State

You already know this failure mode. A session token is valid, but the permissions it encodes were revoked two hours ago. The system enforces rules that are no longer true. Stale-world reasoning is that same pattern at the agent layer: the retrieval succeeded, the content was real, the world moved on. Your instinct about freshness enforcement and cache invalidation applies directly. Where it breaks: agents don't have a TTL on retrieved content the way tokens have expiry. That's the infrastructure gap nobody has cleanly solved.

Production Fixes and What They Cost You

Structure choice: trait-led analysis, anchored on solutions. The dimensions that matter in a buyer conversation are: which failure does this fix, what does it mechanically do, and what does it sacrifice? A flat comparison table collapses when you're mapping four solutions against three failure modes with different coverage profiles. So I'm leading with each solution and mapping its coverage explicitly.

Compaction

Compaction summarizes the full conversation history into a compressed form and reinitializes the context. Anthropic's implementation triggers at a token threshold (their cookbook example uses 180K tokens), preserving architectural decisions and unresolved issues while discarding redundant tool outputs. Directly addresses context rot by reducing volume. Partially helps instruction centrifugation by bringing objectives back to prominence in the summary. Does nothing for stale-world reasoning. Compaction summarizes what was retrieved. Whether that information is still true is a separate problem entirely.

The tradeoff: overly aggressive compaction can lose subtle context whose importance only surfaces later. A constraint dropped at turn 12 may not matter until turn 50. A practitioner benchmark (Victor Dibia, testing agents on a 44-file repository review) found that no-compaction baselines scored highest on quality but at 2–6× the token cost. Compaction trades quality ceiling for cost and runtime.

Tool-Result Clearing

Tool-result clearing selectively removes old tool outputs from conversation history without summarizing the rest. Anthropic's API replaces each cleared result with placeholder text so the model knows something was removed. LangChain takes a different approach: when a tool result exceeds 20,000 tokens, it saves it to a virtual filesystem and replaces it with a pointer (offloading rather than discarding). Clearing targets context rot from tool-specific bloat. It's the lightest intervention available.

The tradeoff: if the agent needs to re-reference a cleared result, it must re-fetch or reason from an incomplete picture. Has no effect on instruction centrifugation or stale-world reasoning.

Sub-Agent Isolation

Sub-agent isolation runs complex subtasks in separate context windows, returning only final output to the orchestrating agent. The main agent receives a clean summary, not the dozens of tool calls that produced it. This is the only approach that addresses all three failure modes structurally. It prevents context rot in the orchestrator, prevents instruction centrifugation by giving each sub-agent a clean context with its task instructions prominent, and mitigates stale-world reasoning by allowing sub-agents fresh tool access with explicit freshness constraints.

The tradeoff: coordination overhead. Each sub-agent spawn is a separate LLM call. If the orchestrator passes full conversation history to sub-agents, you can reach 150K tokens in three hops, most of it irrelevant. Sub-agent isolation is an architectural choice made at design time. You can't bolt it on later as a runtime patch.

Session Resets

Session resets clear the entire context and start fresh, either with a summary or from external state stored in a filesystem or database. The agent rediscovers where it was from structured artifacts rather than conversation memory. This addresses all three failure modes completely, at the cost of continuity. Anthropic documented a revealing case: Claude Sonnet 4.5 exhibited "context anxiety", wrapping up tasks prematurely as it sensed its context limit approaching. Session resets fixed it. When the same harness ran on Claude Opus 4.5, the behavior was gone and the resets were dead weight.

Worth sitting with that for a second: harnesses encode model-specific assumptions that need revisiting as models change. The infrastructure that fixes one model's failure can become unnecessary overhead for the next one.

One pattern across all of this deserves attention: stale-world reasoning, the failure mode with the highest single-incident risk, has the least mature production tooling. Compaction and clearing don't touch it. Sub-agent isolation only mitigates it if the sub-agent is specifically designed with freshness constraints. Session resets address it as a side effect. The Amazon Q incident required governance changes (mandatory two-person review, executive-level audits of production code changes) because the infrastructure layer didn't have a clean technical answer. When a buyer asks about stale-world reasoning, the honest response is that process discipline still matters more than tooling here.

IDAM Anchor: Session Timeout Policies

Compaction is to context rot what sliding session expiry is to session management: it extends the useful life of accumulated state without starting over. Session resets are the hard timeout. The architectural question is the same one you've navigated in federation design: how long do you trust accumulated state before forcing a refresh? In agent systems, nobody has a clean answer yet. When a buyer asks about context management strategy, they're asking about session policy for agents. That framing will land.

How to Say This in the Field

Don't say	Do say	Why it matters
"The model hallucinated"	"The agent reasoned correctly over stale data. The retrieval layer returned outdated information."	Shifts conversation from model quality to infrastructure, where the fix actually lives
"AI makes mistakes"	"The failure was in context management, not the model's reasoning"	Positions you as someone who understands the architecture
"You need a better model"	"You need freshness signals on your retrieval layer. The model can only evaluate what it can see."	Directly addresses stale-world reasoning without overselling model upgrades
"Agents drift over time"	"After 20-30 turns, accumulated tool outputs bury the original instructions. Engineers call it instruction centrifugation."	Names the mechanism; the buyer remembers the term and associates it with you
"Just reset the session"	"Session resets are one option, but the tradeoff is continuity. Compaction or sub-agent isolation might preserve more state."	Shows you understand the solution space, not just one fix
"Context windows are getting bigger, so this goes away"	"Bigger windows actually make context rot worse. More room to accumulate noise."	Counterintuitive and correct; this is a credibility-building observation
"That's a model problem"	"Which layer failed? Was it retrieval, context management, or the model's reasoning?"	Diagnostic framing that mirrors how production engineers actually triage
"AI coding assistants are risky"	"In the Amazon Q incident, the assistant retrieved superseded documentation and had no way to flag it as outdated"	Specific, accurate, demonstrates you've read the postmortem
"You need guardrails"	"You need context hygiene: compaction, tool-result clearing, and freshness enforcement on your retrieval layer"	Replaces a vague term with specific infrastructure concepts
"We can help with AI governance"	"The identity question is: what did the agent think was true about permissions when it acted?"	Bridges to identity and audit concerns without forcing a pitch

IDAM Anchor: Credential Lifecycle Management

Context rot, instruction centrifugation, and stale-world reasoning are all lifecycle problems. Credentials expire. Permissions go stale. Session state accumulates until it no longer reflects reality. Your buyers already manage these lifecycles for human identities. Agent context is a new lifecycle surface with the same fundamental dynamics: issue, validate, refresh, revoke. The tooling is immature. The governance pattern is one your buyers already own.

What This Means for Your Next Call

When a buyer says their agent "started hallucinating," ask which kind of failure they saw. Gradual quality degradation over a long session is context rot. Agent stopped following instructions despite the instructions being present: instruction centrifugation. Confident action on information that turned out to be outdated: stale-world reasoning.

Each has a different infrastructure fix. A better model would have hit the same wall. And the diagnostic question that earns you the room, "which layer failed?", is one you can ask on Tuesday without bluffing, because you now know what the answers mean.

The Amazon Q incident caused 120,000 lost orders because a retrieval layer returned superseded documentation and nothing in the infrastructure flagged it. Your buyers have been solving the identity version of that problem for years. The agent version is newer, less mature, and waiting for the same rigor they already bring to credential and session lifecycle.

Things to follow up on...

Amazon's 90-day code reset: After the Q incident, Amazon instituted mandatory two-person review across 335 Tier-1 systems and executive-level deployment audits — a response their SVP Dave Treadwell called "controlled friction" that's worth understanding as a governance template.
SWE-bench Pro failure breakdowns: Practitioner analysis of enterprise-level coding benchmarks found that 35.6% of agent failures traced to context overflow rather than reasoning errors, with detailed failure-mode taxonomy in Prassanna Ravishankar's agent drift analysis.
LangChain's Deep Agents architecture: Released March 2026, Deep Agents implements automatic tool-result offloading and sub-agent isolation as first-class context management primitives, documented in their context management engineering post.
Anthropic's context engineering cookbook: The most specific published guidance on compaction thresholds, clearing strategies, and when to use session resets versus summarization lives in Anthropic's context engineering tools cookbook, updated March 2026.

“