OpenAI's Agent Hits Production Reality

OpenAI launched ChatGPT agent in January 2025 promising autonomous browser-based task execution. The demo showed smooth navigation, intelligent decision-making, workflows that run themselves.

Then people tried to run it.

Leon Furze tested it extensively. His conclusion: "OpenAI hasn't released this product because it works. They've released it to be first." The system takes 1-2 seconds per action. It fails silently on dynamic interfaces. No telemetry when things break. Users report treating it "like a junior intern with admin access: watch carefully and limit what they see."

The vision-based approach creates inherent constraints. Taking screenshots and deciding where to click works differently than structured API calls. Modern web interfaces confuse the visual understanding model. Hidden state, hover effects, multi-step forms. What scores 68.9% on BrowseComp breaks in production environments where consistency matters.

The gap between announcement and production reveals where agent technology actually stands right now.

OpenAI launched ChatGPT agent in January 2025 promising autonomous browser-based task execution. The demo showed smooth navigation, intelligent decision-making, workflows that run themselves.

Then people tried to run it.

The gap between announcement and production reveals where agent technology actually stands right now.