Practitioner's Corner

Practitioner's Corner

The Web Wasn't Built for This

To a person with a browser, a form field is a text box. To a machine trying to operate that same page, it's a hundred lines of nested HTML with the semantically important element buried under presentational scaffolding a visual cortex would simply ignore. Magnus Müller's Browser Use project started from a hobbyist's question about connecting LLMs to the web. His design choices since — rejecting vision-based navigation, treating site instability as a condition to absorb — trace the real complexity: the web was built for human interpreters, and operating it as a machine surfaces a mismatch that no selector strategy can bridge.

The Web Wasn't Built for This
To a person with a browser, a form field is a text box. To a machine trying to operate that same page, it's a hundred lines of nested HTML with the semantically important element buried under presentational scaffolding a visual cortex would simply ignore. Magnus Müller's Browser Use project started from a hobbyist's question about connecting LLMs to the web. His design choices since — rejecting vision-based navigation, treating site instability as a condition to absorb — trace the real complexity: the web was built for human interpreters, and operating it as a machine surfaces a mismatch that no selector strategy can bridge.
Retry Logic Assumes a Contract That LLM Agents Can't Keep

Retry logic in distributed systems works because of a quiet contract: same input, same execution path. An idempotency key ties attempts together, and the server returns a cached result instead of executing again. Agent systems inherit this discipline. But when an LLM decides which tools to call, token sampling variance means the same prompt can produce different tool selections, different parameters, different ordering on retry. Each individual tool call can be perfectly idempotent, and the composed sequence still produces different side effects on each attempt. One developer documented an agent double-charging a customer $847 when a timed-out payment triggered a retry that took a different path through the same workflow.
Retry Logic Assumes a Contract That LLM Agents Can't Keep
Retry logic in distributed systems works because of a quiet contract: same input, same execution path. An idempotency key ties attempts together, and the server returns a cached result instead of executing again. Agent systems inherit this discipline. But when an LLM decides which tools to call, token sampling variance means the same prompt can produce different tool selections, different parameters, different ordering on retry. Each individual tool call can be perfectly idempotent, and the composed sequence still produces different side effects on each attempt. One developer documented an agent double-charging a customer $847 when a timed-out payment triggered a retry that took a different path through the same workflow.


Isolation Ward

OpenAI's Lockdown Mode disables live browsing, deep research, agent mode, image retrieval, and file downloads. Everything that makes a web agent act like a web agent, turned off to keep data from leaving after a prompt injection attack.
OpenAI is candid about what remains. "Lockdown Mode does not prevent prompt injections from appearing in the content ChatGPT processes," their documentation states. A malicious instruction embedded in a cached page or uploaded file still reaches the model, still shapes its reasoning, still degrades accuracy. The agent can be manipulated into producing wrong outputs. It just can't send your data to an attacker's server.
So the containment works. But it works by removing the agent's reach, which was the point of having an agent. OpenAI's own framing: users "trade elements of product functionality for stricter product guardrails." The agent still ingests poisoned content and still acts on bad instructions. It's in the room with the liar. It just can't call anyone about what it heard.
Further Reading




Past Articles

An agent running on an 8K-context model that entered a loop used to crash. The window filled, the API errored, the run d...

Suchintan Singh describes Skyvern as building "APIs for websites that don't have APIs." It sounds like a pitch deck line...

Skyvern's architecture compiles AI-learned browser workflows into deterministic scripts, then calls the model back only ...

Airworthiness rules carry the fingerprints of specific crashes. Banking regulations followed specific crises. The Five E...

