Market Pulse

Market Pulse

When Infrastructure Writes the Spec

A support platform prices its AI agents per automated resolution: no escalation, ticket not reopened within 72 hours. Clean metric — one where a customer who gives up counts the same as one who's satisfied. Most organizations deploying agents haven't specified what "good" means in operational terms. Infrastructure is filling that vacuum. Pricing models, protocol defaults, eval frameworks each encoding their own answer. And the specification decision already happened, whether anyone noticed or not.

When Infrastructure Writes the Spec
Asupport platform prices its AI agents per automated resolution: no escalation, ticket not reopened within 72 hours. Clean metric — one where a customer who gives up counts the same as one who's satisfied. Most organizations deploying agents haven't specified what "good" means in operational terms. Infrastructure is filling that vacuum. Pricing models, protocol defaults, eval frameworks each encoding their own answer. And the specification decision already happened, whether anyone noticed or not.
Research Radar
Towards a Science of AI Agent Reliability
It can't distinguish an agent that fails predictably from one that fails randomly, or benign mistakes from catastrophic ones.
Replit's AI assistant deleted a production database; OpenAI's Operator made an unauthorized $31.43 Instacart purchase. Accuracy captured neither risk.
Research Radar
Beyond Accuracy: A Multi-Dimensional Framework for Evaluating Enterprise Agentic AI Systems
Agents with the highest raw scores cost 4.4–10.8x more than Pareto-efficient alternatives delivering comparable real-world performance.
Do-nothing agents pass 38% of a leading benchmark's tasks, meaning evaluations can't reliably tell working agents from idle ones.
Regulation as Specification

Organizations have been rationally avoiding the hardest upstream work in AI deployment: articulating what their systems actually do, what they optimize for, and why those choices are defensible. That avoidance is political as much as practical. Writing the spec forces disagreements into the open.
The EU AI Act's documentation requirements read like the design specification most teams never wrote. Intended purpose, trade-off rationale, metric justification. A regulator, in effect, mandating the homework the ecosystem kept deferring.
The GDPR parallel is instructive. Before 2018, most organizations couldn't inventory their own data. The deadline created the discipline. The AI Act forces something harder: not just knowing what you have, but explaining why it's defensible. Whether that discipline outlasts the compliance scramble will determine if this is a one-time audit or a genuine shift in how systems get built.
Further Reading




