The Metric You Pick Determines the Architecture You Ship

Agentic workflows burn 5 to 30 times more tokens per task than a chatbot interaction. Context compounds with every loop, every tool call, every retry. If each call resends the full history, the tokens from step 1 are billed again on every later call, so by step 50 you have paid for them 50 times over.
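A minimal sketch of how that compounding adds up. It assumes an agent loop that appends a fixed number of new tokens per step and resends the entire context on every call, with no prompt caching, truncation, or summarization. The function names and token counts are hypothetical, chosen only for illustration.

```python
def prompt_sizes(steps: int, system_tokens: int, tokens_per_step: int) -> list[int]:
    """Input tokens sent on each call of a hypothetical agent loop that
    resends the full context every step (no caching or truncation)."""
    context = system_tokens
    sizes = []
    for _ in range(steps):
        context += tokens_per_step  # new tool result / observation appended
        sizes.append(context)       # the whole history is billed again on this call
    return sizes


def total_input_tokens(steps: int, system_tokens: int, tokens_per_step: int) -> int:
    """Sum of input tokens billed across all calls in the loop."""
    return sum(prompt_sizes(steps, system_tokens, tokens_per_step))


if __name__ == "__main__":
    # Hypothetical run: 2,000-token system prompt, 500 new tokens per step.
    sizes = prompt_sizes(50, 2_000, 500)
    print("first call:", sizes[0], "last call:", sizes[-1])
    print("total input tokens:", sum(sizes))
```

Under these assumptions total input grows quadratically with step count, following `steps * system_tokens + tokens_per_step * steps * (steps + 1) / 2`, and the tokens added at step 1 appear in all 50 prompts.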

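Most teams track cost-per-run. The number that actually matters is cost-per-successful-task, which folds in every failed attempt, every retry, every cleanup. At a 70% success rate, and assuming a failed run costs about as much as a successful one, your true unit cost is 1/0.7 ≈ 1.43 times the per-run figure: roughly 43% higher than the number on your dashboard, before counting any cleanup.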
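The arithmetic behind that 43% fits in a few lines. This is a sketch, assuming attempts are independent and retried until one succeeds (so expected attempts per success is 1 / success_rate). The optional `cleanup_cost_per_failure` parameter is a hypothetical addition for teams that pay extra to undo a failed run.

```python
def cost_per_successful_task(
    cost_per_run: float,
    success_rate: float,
    cleanup_cost_per_failure: float = 0.0,  # hypothetical extra cost per failed run
) -> float:
    """Expected spend per successful task, assuming independent attempts that
    are retried until success and that failed runs cost the same as successful ones."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    attempts = 1 / success_rate               # expected runs per success
    failures = (1 - success_rate) / success_rate  # expected failed runs per success
    return cost_per_run * attempts + cleanup_cost_per_failure * failures


if __name__ == "__main__":
    true_cost = cost_per_successful_task(cost_per_run=1.00, success_rate=0.70)
    print(f"overhead vs. dashboard: {true_cost / 1.00 - 1:.0%}")
```

Dividing by the success rate is the whole correction; the cleanup term only matters when failures leave work behind that someone has to pay to undo.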

That gap is where architecture decisions go wrong. A cheaper model with a lower pass rate can quietly cost more per successful outcome than an expensive one that finishes in fewer steps. Teams measuring the wrong thing optimize confidently in the wrong direction, and the spreadsheet agrees with them the whole way down.
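A minimal sketch of how the two metrics can rank models in opposite order. Every number below (price per step, steps per run, pass rate) is hypothetical, picked only to illustrate the effect; `ModelProfile` is an illustrative type, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class ModelProfile:
    name: str
    price_per_step: float  # hypothetical average $ per agent step
    steps_per_run: int     # hypothetical average steps per attempt
    success_rate: float    # hypothetical pass rate on the task suite

    def cost_per_run(self) -> float:
        return self.price_per_step * self.steps_per_run

    def cost_per_success(self) -> float:
        # Failed runs are assumed to cost as much as successful ones.
        return self.cost_per_run() / self.success_rate


cheap = ModelProfile("cheap", price_per_step=0.01, steps_per_run=30, success_rate=0.55)
premium = ModelProfile("premium", price_per_step=0.04, steps_per_run=8, success_rate=0.90)

if __name__ == "__main__":
    for m in (cheap, premium):
        print(f"{m.name}: ${m.cost_per_run():.2f}/run, ${m.cost_per_success():.2f}/success")
```

With these made-up figures the cheap model wins on cost-per-run but loses on cost-per-successful-task, because its lower pass rate and longer runs are invisible to the per-run number.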