A person clicks a button on a webpage. Before that click fires, a lot has already happened. They loaded the page, read it (or skimmed it), understood roughly where they were, decided what they wanted to do, and accepted that the outcome is on them. The click is the only part of that sequence the infrastructure ever saw. Everything before it was invisible and assumed.
The web never had to encode any of this because it was never in doubt. Session cookies, confirmation dialogs, terms-of-service checkboxes, login flows. Every one of these assumes a present, perceiving human. Nobody documented this as a deliberate design decision. It was a condition so obvious it functioned like gravity. You just lived in it.
When an agent clicks the same button, the HTTP request is identical. The page updates the same way. The mechanical event reproduces perfectly, absent everything that gave it meaning.
Benchmark results trace the outline of what's missing. WebArena's human baseline reaches 78.24% task completion; the best reported agent manages 58.1% on the same tasks. TheAgentCompany, which simulates workplace tasks involving browsing and communicating with coworkers, reports its strongest autonomous agent completing roughly 30%. The gap shows up most clearly in exploration, failure recovery, contextual judgment. Exactly the things a human click implicitly signals.
That gap cascades into infrastructure. The MCP authorization specification now requires access tokens to be validated as issued specifically for the server receiving them, and explicitly prohibits passing tokens through to upstream APIs. The spec names the risk: a confused-deputy problem, where downstream systems incorrectly trust a token because the entity presenting it looks authorized. These rules exist because when the clicker is an agent acting on behalf of a user acting through a tool chain three layers deep, "who authorized this" stops having a clean answer.
But properly authorized agents still lack judgment about what to do with that authority. OWASP classifies prompt injection as a risk that scales with the agency the system has been given, independent of attack sophistication. An agent browsing a webpage can encounter instructions embedded in content and follow them, even when those instructions conflict with the user's original intent. Anthropic's computer-use documentation states this plainly: the model may follow commands found in content even when they conflict with user instructions. Their recommended mitigations are sandboxed containers, minimal privileges, domain allowlists, and human confirmation for consequential actions. Containment measures, because the click itself carries no judgment.
OWASP's Agentic Applications Top 10 names the structural version of this problem: an "attribution gap" created when agents operate through identity systems designed for humans, lacking distinct governed identities of their own.
Their recommended principle is least-agency. Where least privilege limits what a system can access, least agency limits how much autonomous judgment a system should exercise. That distinction matters once you accept that an agent's click and a human's click are fundamentally different events, even though they produce identical network traffic.
Nothing broke. The systems kept working. But the clicks carry less now than every downstream system expects them to carry. Every system that quietly relied on human presence has to decide what to do without it.
-
Reliability across repeated trials: Tau-bench's pass^k metric reveals how agent success rates decay sharply over multiple attempts, with state-of-the-art agents dropping below 25% reliability in retail tasks when measured across eight runs rather than one.
-
Non-human identity risks: OWASP's Non-Human Identities Top 10 catalogs risks including secret leakage, overprivilege, and identity reuse that become sharper when agents inherit service accounts originally provisioned for background jobs.
-
Observability naming agent internals: OpenTelemetry's GenAI semantic conventions are beginning to define span operations like
invoke_agent,execute_tool, andplanthat would make agent decision traces auditable rather than opaque. -
Token audience and confused deputies: MCP's security best-practices guide walks through the confused-deputy scenario in detail, showing how token passthrough breaks audit trails and lets downstream APIs incorrectly trust credentials they were never meant to see.

