The Distance Between Breaking and Fixing

Each workaround built by the people suffering breakage insulates the people causing it, compounding the distance between failure and fix.

A browser agent tries to select a date on a booking page. The calendar is a grid of tightly packed cells, each about 24 pixels wide. The agent fails repeatedly, clicking the wrong day, sometimes spending minutes on what should take a moment. The failure log records a model capability problem.

The page's markup points somewhere older. Most production browser agent systems read the web through the accessibility tree, a structured representation that strips away decorative noise and surfaces meaning: roles, names, states. When a button has no accessible label, it appears in that tree as a button with an empty string where the name should be. The agent finds the element. It just can't tell what it does.
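To make the empty-name problem concrete, here is a minimal sketch of how an agent-side check might surface it. It assumes the accessibility tree has already been captured as nested dicts with `role`, `name`, and `children` keys; that shape, the `unnamed_interactive` helper, the role list, and the sample snapshot are all hypothetical and chosen only for illustration.

```python
from typing import Iterator

# Roles treated as interactive here; an illustrative subset, not a complete list.
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox", "combobox", "gridcell"}


def unnamed_interactive(node: dict, path: str = "") -> Iterator[str]:
    """Yield a role path for each interactive node whose accessible name is empty."""
    role = node.get("role", "")
    here = f"{path}/{role}" if path else role
    name = (node.get("name") or "").strip()
    if role in INTERACTIVE_ROLES and not name:
        yield here
    for child in node.get("children", []):
        yield from unnamed_interactive(child, here)


# Hypothetical snapshot of a booking page: one labeled button, one unlabeled
# button, and a calendar grid whose cell carries no name.
snapshot = {
    "role": "main",
    "name": "Booking",
    "children": [
        {"role": "button", "name": "Search"},
        {"role": "button", "name": ""},
        {"role": "grid", "name": "", "children": [{"role": "gridcell", "name": ""}]},
    ],
}

missing = list(unnamed_interactive(snapshot))
print(missing)
```

Under these assumptions the check reports the unlabeled button and the unnamed calendar cell. Both elements are found; neither says what it does.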

The accessibility tree was built for screen readers, not agents.

WAI-ARIA emerged in the mid-2000s to make dynamic web content navigable for blind and low-vision users. When developers implemented it well, navigation worked. When they didn't, screen readers adapted.

JAWS, the most widely used commercial screen reader, infers missing form labels from nearby text. A developer forgets to label an input field. JAWS guesses from context. The user navigates successfully. The developer receives no signal that anything was broken.
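The shape of that compensation can be sketched in a few lines. This is not JAWS's actual algorithm, which the text above does not specify; it is a deliberately simple stand-in that falls back to the nearest preceding visible text. The `Node` type, the `announced_name` helper, and the sample page are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    kind: str        # "text" or "input"
    label: str = ""  # author-supplied accessible name, if any
    text: str = ""   # visible text content for text nodes


def announced_name(nodes: list[Node], index: int) -> Optional[str]:
    """Return the name a user would hear for the input at `index`."""
    node = nodes[index]
    if node.label:
        return node.label
    # Fallback guess (an assumption): use the nearest preceding visible text.
    for prev in reversed(nodes[:index]):
        if prev.kind == "text" and prev.text.strip():
            return prev.text.strip()
    return None


# Hypothetical form: the first input has no label, the second has one.
page = [
    Node("text", text="Email address"),
    Node("input"),
    Node("input", label="Password"),
]

heard = [announced_name(page, i) for i, n in enumerate(page) if n.kind == "input"]
still_unlabeled = [i for i, n in enumerate(page) if n.kind == "input" and not n.label]
print(heard, still_unlabeled)
```

In this sketch the user hears a plausible name for both fields, while the markup itself is unchanged: the first input is still unlabeled, and nothing in the exchange reports that back to whoever wrote the page.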

The compensation worked well enough that it became invisible. Accessibility auditors noted that testing with JAWS could mask issues a stricter screen reader would expose. But JAWS dominated enterprise environments precisely because it handled imperfect markup gracefully. The tool that made broken pages usable also made broken pages durable. Over half of homepages today still have unlabeled form inputs.

Then agents arrived and chose the same infrastructure as their perception layer. They had good reason to. The accessibility tree already does the filtering work agents need, and it's dramatically cheaper to process than raw page markup or screenshots. So agents plugged in. And they inherited two decades of unrepaired damage, because the screen reader's workarounds had prevented the repairs from ever becoming urgent.

The same dynamic played out in enterprise automation. When a UI changed and a bot broke, someone called a bot wrangler. That's the actual role: a dedicated operator who handles exceptions, patches scripts, keeps the automation running. The application developers who changed the UI often didn't know the bot existed. ThoughtWorks described it as "pouring concrete over your interfaces." The wrangler absorbed the breakage. The signal never traveled upstream.

Each workaround, built by the people closest to the pain, widens the distance between the failure and the person whose decisions caused it. Nobody has to decide to sever that connection. Each working adaptation just makes it less necessary for the people furthest from it.

The calendar widget is still broken. It was broken for screen reader users before agents existed, and the feedback never arrived.

Things to follow up on...

Capability or reliability failure: A 2026 paper argues that single-run evaluations conflate capability and reliability, systematically misattributing infrastructure breakage to model limitations.
Architecture over model quality: Research on browser agent design finds that architectural decisions determine success or failure more than model capability does, reinforcing that the root cause often sits in page structure, not the agent.
RPA's concrete layer persists: ThoughtWorks described RPA as pouring concrete over existing interfaces, creating a maintenance burden that grew silently as application teams changed UIs without knowing bots depended on them.
Agentic debt gets a name: A May 2026 paper formally defines "agentic technical debt" as the liability that accumulates when agent orchestration, prompts, and tool schemas are patched together faster than they can be validated.
