TinyFish | Practitioner's Corner

Reliability vs. Correctness

Retries Were Built for a Deterministic World

By Rina Takahashi— May 27, 2026

Feature image for article: Retries Were Built for a Deterministic World

A request times out. You send it again. Retry logic is so fundamental to distributed systems that most frameworks ship it by default. It works because the retried operation is identical to the original. Replace the executor with an LLM, and that assumption dissolves. The agent may reason differently on the second attempt, choose different tools, take a different path. Deterministic side effects, produced by a nondeterministic executor. Reliability infrastructure, doing exactly what it was designed to do, starts producing incorrect outcomes.

Reliability vs. Correctness

Retries Were Built for a Deterministic World

By Rina Takahashi— May 27, 2026

A request times out. You send it again. Retry logic is so fundamental to distributed systems that most frameworks ship it by default. It works because the retried operation is identical to the original. Replace the executor with an LLM, and that assumption dissolves. The agent may reason differently on the second attempt, choose different tools, take a different path. Deterministic side effects, produced by a nondeterministic executor. Reliability infrastructure, doing exactly what it was designed to do, starts producing incorrect outcomes.

Builder Profile

Between Complete and Correct

By Rina Takahashi— May 28, 2026

Feature image for article: Between Complete and Correct

An agent asked to add a specific sneaker to a cart navigated Amazon, found a shoe, added it, and reported success. It was the wrong shoe. Every step completed. The task failed. Browser Use co-founders Gregor Zunic and Magnus Müller have made two moves that trace this gap between "complete" and "correct": a migration to raw Chrome DevTools Protocol for real-time browser visibility, and an unsolved problem listed quietly on their hiring page in five words. One reaches for more observation. The other, reproducibility. Neither closes the gap.

Builder Profile

Between Complete and Correct

By Rina Takahashi— May 28, 2026

An agent asked to add a specific sneaker to a cart navigated Amazon, found a shoe, added it, and reported success. It was the wrong shoe. Every step completed. The task failed. Browser Use co-founders Gregor Zunic and Magnus Müller have made two moves that trace this gap between "complete" and "correct": a migration to raw Chrome DevTools Protocol for real-time browser visibility, and an unsolved problem listed quietly on their hiring page in five words. One reaches for more observation. The other, reproducibility. Neither closes the gap.

Governing the Invisible

Governing the Thirty Percent That Was Polite Enough to Introduce Itself

Governing the Invisible

Governing the Thirty Percent That Was Polite Enough to Introduce Itself

The Overnight Run

27 Million Tokens, Zero Progress

New Year's Eve 2025. A Claude Code sub-agent entered a loop, retried the same failing command over 300 times across 4.6 hours, and burned roughly 27 million tokens. The user got the bill. GitHub issue #15909 was closed without a fix.

From the orchestrator's vantage, everything looked fine. The agent was running. Running looks like progress when you haven't built anything that measures progress.

Nobody ships a circuit breaker before the first overnight run teaches them why they needed one.

The Overnight Run

27 Million Tokens, Zero Progress

New Year's Eve 2025. A Claude Code sub-agent entered a loop, retried the same failing command over 300 times across 4.6 hours, and burned roughly 27 million tokens. The user got the bill. GitHub issue #15909 was closed without a fix.

From the orchestrator's vantage, everything looked fine. The agent was running. Running looks like progress when you haven't built anything that measures progress.

Nobody ships a circuit breaker before the first overnight run teaches them why they needed one.

TAKE NOTE

Recurring pattern: GitHub's tracker shows at least four distinct infinite-loop variants filed between late 2025 and mid-2026, each draining full context windows

Quota collapse: Claude Code Max subscribers reported burning through daily limits in under twenty minutes, a problem Anthropic publicly acknowledged

Silent drain: Rate-limit errors look like generic failures to the agent, triggering automatic retries that empty budgets before anyone notices

Wrong instrument: The completion signal worked perfectly. What was absent, entirely, was a progress signal. Two very different measurements.

Design lesson: Circuit breakers aren't observability tooling for mature systems. They're the minimum viable stop condition for anything running autonomously.