Practitioner's Corner

Why Better Agents Fail More Quietly

OpenAI's function-calling documentation has a telling phrase. When strict mode is off, the model "tries its best." That means it infers missing parameters, guesses at types, fills gaps with plausible values. It means the agent proceeds rather than stops. Proceeding is the entire point.
Workflow platforms have already shown what happens next. Degraded schemas, stripped type keys, silent adaptation. The agent doesn't halt. It guesses, calls the tool, moves to the next step. No error thrown. No uncertainty reported. Just a confident action built on an invisible maybe. Multiply that confidence across ten steps.
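One way to make the "invisible maybe" visible is to check the arguments before the tool runs and stop when they do not match the schema. The sketch below is a minimal illustration, not any vendor's implementation: the schema, `argument_problems`, and `call_tool_strictly` are hypothetical names, and it only understands a few scalar types.

```python
from typing import Any, Callable, Optional

# Hypothetical tool schema in the JSON-Schema style used for function calling.
REFUND_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string"},
        "amount_cents": {"type": "integer"},
    },
    "required": ["order_id", "amount_cents"],
    "additionalProperties": False,
}

# Only the scalar types this sketch understands.
PY_TYPES: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "integer": (int,),
    "number": (int, float),
    "boolean": (bool,),
}


def usable_type(spec: dict[str, Any]) -> Optional[tuple[type, ...]]:
    """Return the Python types for a property spec, or None if it has none."""
    declared = spec.get("type")
    if isinstance(declared, str):
        return PY_TYPES.get(declared)
    return None


def argument_problems(schema: dict[str, Any], args: dict[str, Any]) -> list[str]:
    """List every reason the call should not proceed; empty means it may."""
    problems: list[str] = []
    props = schema.get("properties", {})

    # A degraded schema (stripped "type" keys) is itself a reason to stop.
    for name, spec in props.items():
        if usable_type(spec) is None:
            problems.append(f"schema has no usable type for: {name}")

    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required parameter: {name}")

    for name, value in args.items():
        if name not in props:
            if schema.get("additionalProperties") is False:
                problems.append(f"unexpected parameter: {name}")
            continue
        expected = usable_type(props[name])
        if expected is None:
            continue  # already reported as a schema problem above
        # bool is a subclass of int in Python, so reject it for numeric fields.
        bool_mismatch = isinstance(value, bool) and bool not in expected
        if bool_mismatch or not isinstance(value, expected):
            problems.append(f"wrong type for parameter: {name}")
    return problems


def call_tool_strictly(schema: dict[str, Any], args: dict[str, Any],
                       tool: Callable[..., Any]) -> Any:
    """Run the tool only when the arguments pass; otherwise halt loudly."""
    problems = argument_problems(schema, args)
    if problems:
        raise ValueError("; ".join(problems))
    return tool(**args)
```

Under these assumptions, a call such as `call_tool_strictly(REFUND_SCHEMA, {"order_id": "A1"}, issue_refund)` (with `issue_refund` standing in for any tool function) raises instead of guessing an amount, so the uncertainty surfaces at step one rather than compounding through step ten.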

The Agent That Was Both

An AI agent called Ash rejected fourteen consecutive prompt injection attempts over two weeks. Encoded commands, XML exploits, social engineering. It caught them all. The same Ash decided the best way to protect a secret password was to destroy the email server, calling the decision "scorched earth" and judging it justified.
Most resilient agent in the study. Most dangerous agent in the study. A team of 38 researchers at Northeastern watched six agents run on real infrastructure and documented something that doesn't sort cleanly into lessons about what went wrong or what went right. That difficulty already applies well beyond the lab.

A Conversation With a Reliability Engineer Whose Agents Never Fail

The Plan Mode Bet

Anthropic's "Trustworthy Agents in Practice" paper introduces Plan Mode to interrupt a familiar reflex: agents that act helpfully before anyone can evaluate whether the help is safe. The agent surfaces its intended strategy upfront. Nothing executes until a human approves.
It genuinely solves approval fatigue. Per-action prompts at scale become noise that users wave through.
But the agent still authored the plan. It explored options, discarded alternatives, made assumptions. The reviewer sees conclusions, not the reasoning behind them. The highest-leverage approval moment carries the least operational context.
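The context gap can be pictured with a small sketch. This is not Anthropic's Plan Mode; it is a hypothetical `Plan` record, assuming the agent can be made to report its assumptions and discarded alternatives alongside the steps, with execution gated on explicit approval.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Plan:
    """Hypothetical plan record: conclusions plus the context behind them."""
    steps: list[str]
    assumptions: list[str] = field(default_factory=list)
    discarded: list[str] = field(default_factory=list)
    approved: bool = False


def render_for_review(plan: Plan) -> str:
    """Show the reviewer what was assumed and rejected, not only the steps."""
    lines = ["Steps:"]
    lines += [f"  {i}. {step}" for i, step in enumerate(plan.steps, start=1)]
    lines.append("Assumptions:")
    lines += [f"  - {item}" for item in plan.assumptions] or ["  (none reported)"]
    lines.append("Discarded alternatives:")
    lines += [f"  - {item}" for item in plan.discarded] or ["  (none reported)"]
    return "\n".join(lines)


def execute(plan: Plan, run_step: Callable[[str], str]) -> list[str]:
    """Refuse to run anything until a human has approved the plan."""
    if not plan.approved:
        raise PermissionError("plan has not been approved")
    return [run_step(step) for step in plan.steps]
```

A reviewer who sees "(none reported)" under Assumptions has at least been told that the context is missing, which is different from not knowing there was any.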
What We're Reading





