The Furniture Is Arriving Before the House Is Built

The most revealing developments in the agent ecosystem lately have been aggressively mundane: admin consoles, spend controls, identity plumbing, protocol governance. This is the kind of operational infrastructure that usually shows up once a technology category has settled. OpenAI shipped a Global Admin Console. Anthropic introduced workload identity federation. OpenTelemetry published a dedicated repository for GenAI semantic conventions, defining spans for operations like create_agent and execute_tool, all still labeled "Development." The A2A protocol moved to the Linux Foundation at version 0.3.0.

Institutional furniture. And the natural reading is that it signals maturity. Governance frameworks and admin dashboards are what you build once the hard problems are solved. That's how it worked in cloud, in SaaS, in mobile. But the timing here is strange.

McKinsey's 2025 global AI survey puts useful numbers on the gap. Eighty-eight percent of organizations report regular AI use. Only 23 percent say they're scaling agentic AI anywhere, and within any single business function no more than 10 percent report scaled deployment. Nearly two-thirds remain in experimentation or pilot phases. That's a wide space between "we're using AI" and "we're operating agents accountably," and the furniture seems to be arriving in that space, not after it.

One way to read this: the furniture is responding to the growing cost of unsettled practice. Nobody needs an admin console to run a demo. But when three different teams have spun up agents with three different providers and nobody knows what's being spent or who authorized what, the absence of one starts to hurt. Observability conventions are similarly unremarkable during experimentation. They become urgent when experiments start touching production data and someone asks what happened at 2 a.m.

A historical parallel is instructive here. When NIST published its cloud security guidance in 2011, the document acknowledged that cloud computing "remains a work in progress." When Google moved Kubernetes to the newly formed CNCF in 2015, container orchestration was promising but far from standardized. In both cases, governance structures arrived because enough organizations were using the technology in enough different ways that the lack of shared operational norms was itself becoming a problem.

Something similar seems to be playing out with agents, compressed into a shorter timeline. A protocol at v0.3.0 gets foundation governance. Observability conventions are being defined for agent operations most organizations haven't built yet. Identity federation is being offered for workloads that are still largely experimental. Each is reasonable on its own. Together, they suggest something about where the ecosystem actually sits: past curiosity, past early experimentation, but well before anyone has converged on what "running agents in production" reliably looks like.

And that might be the most useful thing the furniture tells us. Enough organizations have bumped into the same operational gaps simultaneously to make the boring stuff urgent. The demand signal worth watching is probably accumulating quietly in the admin consoles.

Things to follow up on...

Payments as authority testbed: Visa's integration into ChatGPT for agent-completed purchases surfaces the question of who authorized what, under which limit, and with what record when an agent spends money on someone's behalf.
MCP's confused-deputy warning: The MCP authorization specification explicitly forbids token passthrough to prevent confused-deputy risk, a concrete example of protocol designers encoding governance constraints before most implementations exist.
Agent reliability denominators: WebArena's ICLR 2024 paper reported a 14.41 percent end-to-end task success rate for its best GPT-4-based agent, a reminder that operational furniture needs something reliable to furnish.
Enterprise spend is shifting: Menlo Ventures estimates companies spent $37 billion on generative AI in 2025, up from $11.5 billion in 2024, with 76 percent of enterprise use cases purchased rather than built internally.