The W3C Moment

When the W3C published WebDriver as a formal Recommendation in June 2018, the announcement described it as enabling "out-of-process programs to remotely control a browser in a way that emulates the actions of a real person using the browser."

That sentence now reads like a product description for AI agents.

The spec itself was more measured. It called WebDriver "a remote control interface that enables introspection and control of user agents," adding that it was "primarily intended to allow web authors to write tests." The working group charter was blunter: the Browser Testing and Tools Working Group existed to produce "technologies for automating testing of Web applications running in browsers." Testing was the stated scope, full stop.

The word "primarily" is worth pausing on. A small qualifier, easy to read past, but it left room. The spec authors knew they were building something more general than a testing tool, even as they defined it as one.

The origin story makes this legible in retrospect. Jason Huggins built the original Selenium at ThoughtWorks to test an internal time-and-expenses application that worked poorly for remote employees. Simon Stewart built WebDriver at the same company to solve a similar testing problem. The two projects merged into Selenium 2.0 in 2011, and by the time the W3C standardized the protocol, what it was formalizing had already outgrown the mundane problems that motivated it.

An interface built so that QA engineers could verify a button click had become a general-purpose vocabulary for making browsers do things on behalf of software. The abstraction was more general than its authors needed it to be: click this element, read that text, navigate here, submit this form. Testing operations and browser operations turned out to be the same vocabulary. The spec encoded the verbs of browser interaction without restricting who could conjugate them.

What's happening now with WebDriver BiDi extends this trajectory further. Classic WebDriver used HTTP request-response: send a command, wait, get an answer. BiDi, whose latest revision dates from February 2026, switches to WebSockets, so the browser can push events to the controlling program: DOM mutations, network activity, console output, unhandled prompts. Instead of polling to ask "did anything change?", the program listens.
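To make the difference between asking and listening concrete, here is a minimal sketch in Python of how a controlling program might route BiDi-style messages. It assumes the general message shapes the protocol uses (commands carrying an id, method and params; events carrying a type of "event" plus a method and params) and leaves out the WebSocket transport entirely. The BiDiDispatcher class and its method names are hypothetical, invented for illustration, and are not part of any real client library.

```python
import json
from collections import defaultdict
from typing import Any, Callable


class BiDiDispatcher:
    """Hypothetical router for BiDi-style messages (no network I/O here)."""

    def __init__(self) -> None:
        self._next_id = 0
        self._pending: dict[int, str] = {}      # command id -> method name
        self._handlers = defaultdict(list)      # event name -> callbacks
        self.results: dict[int, Any] = {}       # command id -> result payload

    def command(self, method: str, params: dict) -> str:
        """Build the JSON text a client would send over the socket."""
        self._next_id += 1
        self._pending[self._next_id] = method
        return json.dumps(
            {"id": self._next_id, "method": method, "params": params}
        )

    def on(self, event: str, handler: Callable[[dict], None]) -> None:
        """Register a callback for an event the browser may push."""
        self._handlers[event].append(handler)

    def receive(self, raw: str) -> None:
        """Route one incoming message: an unsolicited event or a reply."""
        msg = json.loads(raw)
        if msg.get("type") == "event":
            # Pushed by the browser; no command asked for it.
            for handler in self._handlers[msg["method"]]:
                handler(msg["params"])
        elif msg.get("id") in self._pending:
            # Reply to a command sent earlier.
            del self._pending[msg["id"]]
            self.results[msg["id"]] = msg.get("result")


# Simulated exchange: subscribe once, then react to whatever arrives.
dispatcher = BiDiDispatcher()
seen: list[str] = []
dispatcher.on("log.entryAdded", lambda params: seen.append(params["text"]))

subscribe = dispatcher.command(
    "session.subscribe", {"events": ["log.entryAdded"]}
)
dispatcher.receive(json.dumps({"type": "success", "id": 1, "result": {}}))
dispatcher.receive(json.dumps({
    "type": "event",
    "method": "log.entryAdded",
    "params": {"level": "info", "text": "cart updated"},
}))
print(seen)  # ['cart updated']
```

The point of the sketch is the shape of the loop: after a single subscribe command, the program never asks again. The console message arrives on its own, and the handler decides what to do with it.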

This matters because agents don't interact with browsers the way test suites do. A test knows what it expects. An agent needs to react to what it finds. Bidirectional events let the controlling program respond to a live page rather than interrogate a static one, which is closer to how agents actually work: discovering states, adapting, proceeding. Mozilla's implementation notes make the subtext nearly explicit: the work is "foundational for future use cases, including controlling Firefox via MCP servers."

The charter still says testing. The spec still lives under the Browser Testing and Tools umbrella. But the protocol's actual constituency has quietly expanded well beyond the people who wrote it. A standard designed to let QA engineers verify that a button works now provides the low-level interface through which AI systems operate on the live web. None of this was planned. The word "primarily" just left enough room for everything that followed.