Practitioner's Corner

Practitioner's Corner

Retries Were Built for a Deterministic World

A request times out. You send it again. Retry logic is so fundamental to distributed systems that most frameworks ship it by default. It works because the retried operation is identical to the original. Replace the executor with an LLM, and that assumption dissolves. The agent may reason differently on the second attempt, choose different tools, take a different path. Deterministic side effects, produced by a nondeterministic executor. Reliability infrastructure, doing exactly what it was designed to do, starts producing incorrect outcomes.

Retries Were Built for a Deterministic World
Arequest times out. You send it again. Retry logic is so fundamental to distributed systems that most frameworks ship it by default. It works because the retried operation is identical to the original. Replace the executor with an LLM, and that assumption dissolves. The agent may reason differently on the second attempt, choose different tools, take a different path. Deterministic side effects, produced by a nondeterministic executor. Reliability infrastructure, doing exactly what it was designed to do, starts producing incorrect outcomes.
Between Complete and Correct

An agent asked to add a specific sneaker to a cart navigated Amazon, found a shoe, added it, and reported success. It was the wrong shoe. Every step completed. The task failed. Browser Use co-founders Gregor Zunic and Magnus Müller have made two moves that trace this gap between "complete" and "correct": a migration to raw Chrome DevTools Protocol for real-time browser visibility, and an unsolved problem listed quietly on their hiring page in five words. One reaches for more observation. The other, reproducibility. Neither closes the gap.
Between Complete and Correct
An agent asked to add a specific sneaker to a cart navigated Amazon, found a shoe, added it, and reported success. It was the wrong shoe. Every step completed. The task failed. Browser Use co-founders Gregor Zunic and Magnus Müller have made two moves that trace this gap between "complete" and "correct": a migration to raw Chrome DevTools Protocol for real-time browser visibility, and an unsolved problem listed quietly on their hiring page in five words. One reaches for more observation. The other, reproducibility. Neither closes the gap.


Governing the Thirty Percent That Was Polite Enough to Introduce Itself
CONTINUE READINGThe Overnight Run

New Year's Eve 2025. A Claude Code sub-agent entered a loop, retried the same failing command over 300 times across 4.6 hours, and burned roughly 27 million tokens. The user got the bill. GitHub issue #15909 was closed without a fix.
From the orchestrator's vantage, everything looked fine. The agent was running. Running looks like progress when you haven't built anything that measures progress.
Nobody ships a circuit breaker before the first overnight run teaches them why they needed one.
Further Reading




Past Articles

A production agent runs for twelve minutes, calls nine tools, and returns a clean result. Every dashboard is green. The ...

Every layer of web infrastructure, from crawling to indexing to ranking, was designed for a user who clicks and scans. A...

Government portals, insurance dashboards, vendor procurement systems. No API, no programmatic access. Just a browser and...

Most enterprise teams running agent workflows have never checked the approval rate on their human-in-the-loop controls. ...
