Anthropic's Plan Mode Asks Humans to Approve What They Can't Fully See

Anthropic's "Trustworthy Agents in Practice" paper introduces Plan Mode to interrupt a familiar reflex: agents that act helpfully before anyone can evaluate whether the help is safe. The agent surfaces its intended strategy up front. Nothing executes until a human approves.

Plan Mode does address approval fatigue. Per-action prompts at scale become noise that users wave through; a single plan review asks for one considered decision instead.

But the agent still authored the plan. It explored options, discarded alternatives, made assumptions. The reviewer sees conclusions, not the reasoning behind them. The highest-leverage approval moment carries the least operational context.
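To make the gap concrete, here is a minimal sketch of what a plan record could look like if it carried its assumptions and discarded alternatives alongside the steps. This is not Anthropic's implementation: the `Plan` class, its field names, and the `review_summary` and `execute` helpers are all hypothetical, chosen only to illustrate the idea of an approval gate that shows reasoning as well as conclusions.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """A plan an agent submits for review (hypothetical structure)."""
    steps: list[str]
    # Context a reviewer usually does not see; these fields are illustrative.
    assumptions: list[str] = field(default_factory=list)
    rejected_alternatives: list[str] = field(default_factory=list)
    approved: bool = False


def review_summary(plan: Plan) -> str:
    """Render the steps together with the reasoning behind them."""
    lines = ["Steps:"]
    lines += [f"  {i}. {step}" for i, step in enumerate(plan.steps, 1)]
    lines += ["Assumptions:"]
    lines += [f"  - {a}" for a in plan.assumptions]
    lines += ["Rejected alternatives:"]
    lines += [f"  - {r}" for r in plan.rejected_alternatives]
    return "\n".join(lines)


def execute(plan: Plan) -> list[str]:
    """Run the steps only after a human has approved the plan."""
    if not plan.approved:
        raise PermissionError("plan has not been approved")
    # Stand-in for real work: report each step as completed.
    return [f"done: {step}" for step in plan.steps]
```

In this sketch the reviewer would read `review_summary(plan)` before setting `plan.approved = True`; calling `execute` on an unapproved plan raises `PermissionError`. Whether the agent fills in the assumption and alternative fields honestly and completely is exactly the open question the paragraph above raises.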