Echoes

Echoes

The Thread That Runs Through the Browser

In 2004, a developer at ThoughtWorks wrote a JavaScript tool to stop his colleagues from manually clicking through a billing app. He named it Selenium, as a joke. Twenty-two years later, AI agents open browsers using protocols descended from that tool, reading page structures originally built for screen readers, deciding what to do on sites they've never encountered. Every layer of browser automation was a reasonable response to the problem directly in front of it. The problem in front of it kept quietly changing.

The Thread That Runs Through the Browser
In 2004, a developer at ThoughtWorks wrote a JavaScript tool to stop his colleagues from manually clicking through a billing app. He named it Selenium, as a joke. Twenty-two years later, AI agents open browsers using protocols descended from that tool, reading page structures originally built for screen readers, deciding what to do on sites they've never encountered. Every layer of browser automation was a reasonable response to the problem directly in front of it. The problem in front of it kept quietly changing.
Before the Click

Before an agent can decide what to click, something has to confirm the click will actually land. Playwright runs four actionability checks on every single click action. Visible. Stable. Receives events. Enabled. All four must pass simultaneously, or the action doesn't fire.
The third check is worth pausing on. An element can be fully rendered, motionless, and enabled, yet a transparent overlay sitting above it will silently capture the click. Playwright calculates the exact pointer coordinates and asks: at this pixel, which element would actually receive the event? The documentation concedes that intentional overlay behavior is "indistinguishable from a bug." The infrastructure can't always tell the difference.
These checks formalized what earlier automation left to guesswork and sleep timers. They're also what agents inherit. Playwright MCP exposes this same machinery to LLMs. When an agent clicks a button, these four checks still run underneath. Agent reasoning sits on top of a layer that's still working out whether the action is physically possible.

The Standard and the Cycle

The W3C Moment
In June 2018, the W3C published a spec describing software that "emulates the actions of a real person using the browser." They meant testing tools. That sentence now reads like a product description for AI agents. WebDriver's authors knew they were building something more general than a test harness. They just scoped it as one. The word "primarily" left room for everything that followed.

The Screen Scraping Cycle
In 1986, IBM shipped a tool that let programs read characters off a terminal screen because mainframe applications had no APIs. The assumption was that this was temporary. Forty years and several technology generations later, browser agents automate the web using the same basic logic: parsing whatever the human sees. Each generation believed it was building a workaround. Each time, the workaround became infrastructure.
Further Reading




Past Articles

RPA's governance problems didn't arrive all at once. They accumulated quietly: orphaned bots running on departed employe...

Browser automation spent twenty years learning to distrust the gap between issuing a command and the command doing what ...

In 1977, a Fortran compiler overwrote a billing file because a user asked it to. Nobody hacked anything. The compiler ju...

If voluntary coordination almost never closes a severed feedback loop, what has? HTTPS adoption and GDPR enforcement are...
