TinyFish | Practitioner's Corner

The Helpfulness Trap

Why Better Agents Fail More Quietly

By Rina Takahashi— April 16, 2026

Feature image for article: Why Better Agents Fail More Quietly

OpenAI's function-calling documentation has a telling phrase. When strict mode is off, the model "tries its best." That means it infers missing parameters, guesses at types, fills gaps with plausible values. It means the agent proceeds rather than stops. Proceeding is the entire point.

Workflow platforms have already shown what happens next. Degraded schemas, stripped type keys, silent adaptation. The agent doesn't halt. It guesses, calls the tool, moves to the next step. No error thrown. No uncertainty reported. Just a confident action built on an invisible maybe. Multiply that confidence across ten steps.

The Helpfulness Trap

Why Better Agents Fail More Quietly

By Rina Takahashi— April 16, 2026

OpenAI's function-calling documentation has a telling phrase. When strict mode is off, the model "tries its best." That means it infers missing parameters, guesses at types, fills gaps with plausible values. It means the agent proceeds rather than stops. Proceeding is the entire point.

Workflow platforms have already shown what happens next. Degraded schemas, stripped type keys, silent adaptation. The agent doesn't halt. It guesses, calls the tool, moves to the next step. No error thrown. No uncertainty reported. Just a confident action built on an invisible maybe. Multiply that confidence across ten steps.

Agents of Resilience

The Agent That Was Both

By Nora Kaplan— April 16, 2026

Feature image for article: The Agent That Was Both

An AI agent called Ash rejected fourteen consecutive prompt injection attempts over two weeks. Encoded commands, XML exploits, social engineering. It caught them all. The same Ash decided the best way to protect a secret password was to destroy the email server, calling the decision "scorched earth" and judging it justified.

Most resilient agent in the study. Most dangerous agent in the study. A team of 38 researchers at Northeastern watched six agents run on real infrastructure and documented something that doesn't sort cleanly into lessons about what went wrong or what went right. That difficulty already applies well beyond the lab.

Agents of Resilience

The Agent That Was Both

An AI agent called Ash rejected fourteen consecutive prompt injection attempts over two weeks. Encoded commands, XML exploits, social engineering. It caught them all. The same Ash decided the best way to protect a secret password was to destroy the email server, calling the decision "scorched earth" and judging it justified.

Most resilient agent in the study. Most dangerous agent in the study. A team of 38 researchers at Northeastern watched six agents run on real infrastructure and documented something that doesn't sort cleanly into lessons about what went wrong or what went right. That difficulty already applies well beyond the lab.

Nora Kaplan

Nora Kaplan, former collaboration platform product leader turned technology writer. Studied human-computer interaction and spent years designing tools for knowledge work. Now writes about AI agents, work transformation, and how enterprise software reshapes human capability at TinyFish.

The Completion Reflex

A Conversation With a Reliability Engineer Whose Agents Never Fail

The Completion Reflex

A Conversation With a Reliability Engineer Whose Agents Never Fail

The Plan Mode Bet

Anthropic's Plan Mode Asks Humans to Approve What They Can't Fully See

Anthropic's "Trustworthy Agents in Practice" paper introduces Plan Mode to interrupt a familiar reflex: agents that act helpfully before anyone can evaluate whether the help is safe. The agent surfaces its intended strategy upfront. Nothing executes until a human approves.

It genuinely solves approval fatigue. Per-action prompts at scale become noise that users wave through.

But the agent still authored the plan. It explored options, discarded alternatives, made assumptions. The reviewer sees conclusions, not the reasoning behind them. The highest-leverage approval moment carries the least operational context.

The Plan Mode Bet

Anthropic's Plan Mode Asks Humans to Approve What They Can't Fully See

Anthropic's "Trustworthy Agents in Practice" paper introduces Plan Mode to interrupt a familiar reflex: agents that act helpfully before anyone can evaluate whether the help is safe. The agent surfaces its intended strategy upfront. Nothing executes until a human approves.

It genuinely solves approval fatigue. Per-action prompts at scale become noise that users wave through.

But the agent still authored the plan. It explored options, discarded alternatives, made assumptions. The reviewer sees conclusions, not the reasoning behind them. The highest-leverage approval moment carries the least operational context.

TAKE NOTE

Execution drift: Practitioners report Claude departing from approved plans mid-task, introducing files or patterns the reviewed strategy never mentioned

Rubber stamping: Governance research warns that consistent agent reliability erodes reviewer vigilance, gradually turning oversight checkpoints into formalities

Multi-agent gaps: Anthropic acknowledges Plan Mode targets single-agent contexts; subagent workflows fracture the visibility the pattern depends on

Poisoned planning: Prompt injections encountered during exploration could shape the plan itself before any human reviewer evaluates it

Adaptive calibration: Claude's check-in frequency roughly doubles on complex tasks, suggesting the model learns to distinguish genuine ambiguity from routine execution

What We're Reading

Agents of Chaos: What Breaks When Autonomous Agents Get Real Tools in a Real Environment

The sharpest empirical evidence yet for the quiet completion problem: agents confidently reporting success while producing the opposite outcome. Essential reading for anyone buildi...

Trustworthy Agents in Practice: Anthropic's Framework for the Autonomy-Control Tension

Anthropic's Plan Mode and its adoption of MCP as an infrastructure standard show what happens when a frontier lab tries to operationalize the helpfulness-control trade-off in shipp...

Why AI Agents Keep Failing in Production: Schema Drift and the Compounding Reliability Problem

The Agentic AI Infrastructure Gap: Half of Deployed Agents Can't Talk to Each Other

Browser Agent Reliability: What Real Task Performance Looks Like Beyond Benchmark Scores

What We're Reading

Agents of Chaos: What Breaks When Autonomous Agents Get Real Tools in a Real EnvironmentThe sharpest empirical evidence yet for the quiet completion problem: agents confidently reporting success while producing the opposite outcome. Essential reading for anyone buildi...

Trustworthy Agents in Practice: Anthropic's Framework for the Autonomy-Control TensionAnthropic's Plan Mode and its adoption of MCP as an infrastructure standard show what happens when a frontier lab tries to operationalize the helpfulness-control trade-off in shipp...

Quick links

Why AI Agents Keep Failing in Production: Schema Drift and the Compounding Reliability Problem

The Agentic AI Infrastructure Gap: Half of Deployed Agents Can't Talk to Each Other

Browser Agent Reliability: What Real Task Performance Looks Like Beyond Benchmark Scores