What Operator's 38% Success Rate Actually Means

OpenAI announced Operator in January 2025: an agent that handles web tasks autonomously. Book restaurants, order groceries, plan vacations. The demo looked smooth.

What you actually get is a $200/month research preview that stops at every CAPTCHA and password field and refuses financial transactions. Early users compared its performance to "watching an arthritic half-blind grandma use a rusty typewriter."

The 38.1% benchmark success rate tells you most of the story: the agent fails more often than it succeeds. The rest is operational. Rate limits on concurrent tasks. Ninety-day data retention. Computational costs OpenAI calls "cost-prohibitive for widespread use." The gap between "can interact with browsers" and "can reliably complete tasks" remains enormous.

Novel architecture, genuine innovation. But production reality? Still distant.
