Practitioner's Corner

APIs for Websites That Don't Have APIs

Suchintan Singh describes Skyvern as building "APIs for websites that don't have APIs." It sounds like a pitch-deck line. But the phrase does real work if you've spent years watching ML platforms get measured by marketplace revenue rather than model accuracy, and watching Selenium scripts break every time a frontend developer renames a CSS class. His team built a harder benchmark, scored below 50% on it, and published the results anyway. The architecture and the measurement both point in the same direction: web agents are an infrastructure problem that the AI framing keeps obscuring.

The Wall That Used to Be There

An agent running on an 8K-context model that entered a loop used to crash. The window filled, the API errored, the run died. That was an accidental safety mechanism, and it worked. Context windows now stretch past a million tokens, and the accidental wall is gone. A recent multi-agent loop ran for eleven days and $47,000 before a human noticed the billing dashboard. The team had monitoring. They could see it happening. They couldn't stop it programmatically. Watching and stopping, it turns out, require entirely different infrastructure.
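The difference between watching and stopping can be made concrete with a hard budget that the agent loop itself enforces. The following is a minimal sketch, not any team's actual implementation: `RunBudget`, `BudgetExceeded`, `run_agent`, and the `step_fn` contract (returning a `(done, cost_usd)` pair) are all hypothetical names chosen for illustration.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a run crosses a hard limit; the loop must not continue."""


@dataclass
class RunBudget:
    # Hypothetical hard limits; a deliberate replacement for the
    # accidental wall that a small context window used to provide.
    max_steps: int
    max_cost_usd: float
    max_seconds: float
    steps: int = 0
    cost_usd: float = 0.0
    started_at: float = field(default_factory=time.monotonic)

    def charge(self, cost_usd: float) -> None:
        """Record one completed step and raise if any limit is exceeded."""
        self.steps += 1
        self.cost_usd += cost_usd
        elapsed = time.monotonic() - self.started_at
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step limit exceeded: {self.steps}")
        if self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded(f"cost limit exceeded: ${self.cost_usd:.2f}")
        if elapsed > self.max_seconds:
            raise BudgetExceeded(f"time limit exceeded: {elapsed:.0f}s")


def run_agent(step_fn, budget: RunBudget) -> int:
    """Call step_fn until it reports done, charging the budget every step.

    step_fn is assumed to return (done, cost_usd) for the step it just ran.
    """
    while True:
        done, cost_usd = step_fn()
        budget.charge(cost_usd)  # enforcement lives inside the loop
        if done:
            return budget.steps
```

The design choice worth noting is that the check sits in the control path rather than on a dashboard: a loop that never reports `done` is halted by an exception, with no human required to notice it.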

The Crash That Stopped Coming and the Engineer Who Had to Replace It

Compound Failure Math

P(success) = a^m. Per-step accuracy raised to the number of sequential steps. At 85% accuracy across 10 steps, you land at 19.7% end-to-end success. Bump to 95% per step and a 10-step workflow still falls just short of 60%.
That's the generous version. The formula assumes independent errors. In practice, a botched early step poisons downstream context, so failures correlate. The math gives you a floor on the failure rate, not a ceiling. And most production workflows run longer than ten steps.
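The arithmetic is easy to check. The sketch below assumes the same independent-errors model as the formula; the function names are illustrative, and the inverse function (the per-step accuracy needed to hit a target) is an addition for convenience, not something from the original analysis.

```python
def end_to_end_success(per_step_accuracy: float, steps: int) -> float:
    """P(success) = a^m, assuming every step fails independently."""
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_accuracy ** steps


def required_per_step_accuracy(target_success: float, steps: int) -> float:
    """Invert the formula: a = target^(1/m)."""
    if not 0.0 < target_success <= 1.0:
        raise ValueError("target_success must be in (0, 1]")
    if steps <= 0:
        raise ValueError("steps must be positive")
    return target_success ** (1.0 / steps)


if __name__ == "__main__":
    for accuracy in (0.85, 0.95):
        rate = end_to_end_success(accuracy, 10)
        print(f"a={accuracy:.2f}, m=10 -> {rate:.1%}")
    # Per-step accuracy a 10-step workflow needs to succeed 90% of the time.
    print(f"needed for 90% over 10 steps: {required_per_step_accuracy(0.9, 10):.4f}")
```

Running the inverse for longer workflows shows how quickly the required per-step accuracy approaches 100% as the step count grows.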

Past Articles

Skyvern's architecture compiles AI-learned browser workflows into deterministic scripts, then calls the model back only ...

Airworthiness rules carry the fingerprints of specific crashes. Banking regulations followed specific crises. The Five E...

An agent asked to add a specific sneaker to a cart navigated Amazon, found a shoe, added it, and reported success. It wa...

A request times out. You send it again. Retry logic is so fundamental to distributed systems that most frameworks ship i...
