Foundations

Foundations

The Number You Didn't Ask About

An agent succeeds at a task 70% of the time. Frame the results one way and you can report 97%. Frame them differently and you report 34%. Same system, same math, a 63-point gap created entirely by which question the benchmark decided to answer. Nearly every major agent benchmark chooses the generous framing. The number that captures what production actually feels like, consecutive successes across a full workload, is one that almost nobody reports.
The Number You Didn't Ask About
An agent succeeds at a task 70% of the time. Frame the results one way and you can report 97%. Frame them differently and you report 34%. Same system, same math, a 63-point gap created entirely by which question the benchmark decided to answer. Nearly every major agent benchmark chooses the generous framing. The number that captures what production actually feels like, consecutive successes across a full workload, is one that almost nobody reports.

The Accessibility Tree Wasn't Built for AI Agents

Accessibility trees were built for screen readers, not AI agents. But both share the same constraint: neither can see a web page. Both need it translated into structured, sequential text. Playwright MCP builds its entire browser automation architecture around that convergence, using accessibility snapshots that cost a fraction of what screenshots require per interaction. The token savings are real and they compound across multi-step workflows. So does the downside: on 95.9% of the web's top million pages, the underlying markup has detectable failures. The agent inherits every one of them.

The Accessibility Tree Wasn't Built for AI Agents
Accessibility trees were built for screen readers, not AI agents. But both share the same constraint: neither can see a web page. Both need it translated into structured, sequential text. Playwright MCP builds its entire browser automation architecture around that convergence, using accessibility snapshots that cost a fraction of what screenshots require per interaction. The token savings are real and they compound across multi-step workflows. So does the downside: on 95.9% of the web's top million pages, the underlying markup has detectable failures. The agent inherits every one of them.
Further Reading







