What the Click Was Holding Together

The Xerox Star designers wanted users to feel they were operating directly on data, not through an intermediary. The earliest HTML form spec described fields "edited by the user," submitted when "the user indicates submission." The human was load-bearing. Operator and silent guarantor at once, ensuring that four distinct assertions landed together: I'm here. I see what I'm about to do. I have the authority to do it. I mean to do it now.

Nobody needed to name those components separately, because nobody needed to separate them. The person in the chair held them together.

So the click became an integration surface. An accident of proximity, nothing more. Audit logs could record "user did X" without distinguishing who authorized it from who performed it, because those were the same body. Session tokens could stand in for identity because the session implied continuous human presence. Approval workflows could compress a complex organizational decision into a single button, because the approver was assumed to have read the screen, understood the context, and chosen to proceed. The click was shorthand that worked because everyone involved shared the same unstated grammar.

This is what compression looks like when it's working. You forget the components exist.

When software tries to reproduce the gesture, the seams show immediately. The W3C WebDriver spec defines a click as: scroll the element into view, hit-test its center point against the paint order, confirm nothing obscures it, verify it can receive events, then dispatch. Playwright adds its own preconditions: the element must be visible, stable, enabled. A finger just lands. The infrastructure to simulate that landing without a finger is, itself, a small catalog of everything the finger was quietly providing.

And still, even after all those checks succeed, the gesture arrives intact while the meaning doesn't. A browser agent can locate an element and click its center point with geometric precision. It has timing. It may have authorization. But its intent is borrowed from somewhere upstream, carried through a chain of delegation that the receiving system has no way to inspect.

The systems surrounding the click have no vocabulary for the separated components. An audit log entry that says "the approve button was clicked at 2:47 PM" used to be a complete sentence. It now reads more like a sentence missing its subject and most of its context. Token standards already have delegation claims that can distinguish who requested an action from who performed it. Most web systems don't speak that language yet. They still assume the click is self-interpreting.

This may be why agent infrastructure is harder than the demo suggests. Making software click a button has been a solved problem for decades. Automation tools have been perfectly capable of dispatching a click event since before most current engineers started their careers. Those tools could dispatch the event. They couldn't replicate what the click was quietly serving as: proof of presence, evidence of intent, exercise of authority, and timestamp of decision. All compressed into a single gesture.

The screen can operate. It can find the element, dispatch the event, move on. What it cannot do is testify. It cannot say who meant this, whether the action performed was the action approved, or whether anyone with standing was watching when it happened.

Things to follow up on...

Payment rails force legibility: Visa recently embedded its payment network into ChatGPT with spending limits, approval steps, and merchant restrictions, suggesting that money may decompress the click faster than ordinary enterprise systems because disputes need structured records.
Non-human identity as category: OWASP's 2025 Non-Human Identities Top 10 treats service accounts, API keys, and agent credentials as a distinct security surface, mapping the gap between sessions designed for people and the entities now acting through them.
Selenium's accidental origin: The project began in 2004 when a ThoughtWorks engineer built a test runner for an internal Time and Expenses application, making the browser's first serious automation use case not a grand integration play but a workaround for a missing machine interface.
Bainbridge's enduring irony: A 1983 paper on the ironies of automation argued that automation leaves human operators responsible for the hardest residual tasks while reducing their practice, a pattern that maps directly onto today's agent approval buttons.