Dashiell "Dash" Greenfield is, by his own admission, a fictional composite assembled from the operational realities that multiple platform engineers encounter roughly 90 days into governing an enterprise agent layer. He agreed to this interview on the condition that we note he is "at least as real as the agent activity in most enterprise audit logs." As will become clear, that's a more loaded statement than it sounds.
The role itself is new. Mid-size enterprises are appointing people to own "agent layer governance," usually senior platform engineers or SREs. The job description sounds straightforward: ensure AI agents operating across the enterprise are visible, compliant, and controlled. The reality is that the instruments inherited for this job were built for a different world entirely.
You're about ninety days in. What's the first thing you learned that nobody warned you about?
Dash: That every dashboard I inherited is answering a question I didn't ask. I've got APM dashboards showing green across the board. Response times under 500ms, error rates at 0.2%, CPU healthy. Great. But did the agent do the right thing?
A UC Berkeley study analyzed over 1,600 agent execution traces. Failure rates between 41 and 87 percent. Every single failure returned HTTP 200.[1] The system is "up." The system is also wrong. My dashboard cannot tell the difference.
That's a monitoring problem, though. Hasn't observability tooling caught up?
Dash: You'd think. The observability market split into two camps: traditional APM platforms bolting on AI tabs, and AI-native platforms building tracing from scratch.[2] Both camps claim they've solved it. Neither actually answers the question: is the output correct?
An agent can execute five model calls, pick the wrong tool on the third one, get back garbage data, and then accurately summarize that garbage into a confident, well-formatted response. Your trace looks clean. The output is wrong. The aggregate metrics show a slight bump in latency somewhere. They don't show where reasoning failed.[3]
I spent ten years as an SRE. I built monitoring systems. And the thing I keep circling back to is that the instruments exist. They're just pointed at the wrong layer.
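One way to picture pointing the instruments at the right layer is to record an outcome check next to the transport status for every step of a run. The following is a minimal Python sketch, not any vendor's API: `StepResult`, `evaluate_run`, and the per-step validators are hypothetical names, and the validator is a toy string check standing in for whatever correctness test a team can actually write.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class StepResult:
    """One step of an agent run (hypothetical record shape)."""
    name: str          # step or tool name
    http_status: int   # what a traditional APM dashboard sees
    output: str        # what the step actually returned


def evaluate_run(
    steps: List[StepResult],
    validators: Dict[str, Callable[[str], bool]],
) -> Tuple[bool, Optional[str]]:
    """Return (transport_ok, name of the first step failing its outcome check)."""
    transport_ok = all(200 <= s.http_status < 300 for s in steps)
    for step in steps:
        check = validators.get(step.name)
        if check is not None and not check(step.output):
            return transport_ok, step.name
    return transport_ok, None


# Illustrative run: all five calls return HTTP 200, but the third call
# used the wrong tool and came back with data unrelated to the order.
run = [
    StepResult("plan", 200, "look up order 1182"),
    StepResult("rewrite_query", 200, "order status 1182"),
    StepResult("call_tool", 200, "sunny, 21C"),
    StepResult("extract", 200, "status: sunny"),
    StepResult("summarize", 200, "Order 1182 is sunny and 21C."),
]
# Assumed check: the tool output should at least mention an order.
validators = {"call_tool": lambda out: "order" in out.lower()}

transport_ok, failed_step = evaluate_run(run, validators)
print(transport_ok, failed_step)
```

In this toy run the transport check passes while the outcome check names `call_tool` as the step where things went wrong, which is the distinction an availability dashboard cannot draw.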
Tell me about the identity problem. You mentioned agents running under human credentials.
Dash: This one keeps me up at night.
Traditional IAM was designed for humans logging in and out. AI agents don't do that. They run continuously, span multiple applications, accumulate permissions. And in most enterprise deployments right now, they authenticate using a human's credentials. A developer's agent runs under that developer's AWS account. In the audit log, it looks like a very productive employee.[4]
I run an access review, which I'm required to do for SOC 2, and I get a clean picture. Because the agent's actions are attributed to the person whose credentials it borrowed. The audit trail doesn't show an agent. It shows Dave from engineering having a very busy Tuesday.
And the part that really got me: SOC 2 Type II audits examine access control effectiveness, but the scope defaults to human user access. Agent credentials that were never provisioned through formal IAM don't appear in user access reviews. They pass outside the audit scope entirely.[5]
I inherited a SOC 2-compliant environment that is functionally dark on agent activity.
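A borrowed credential leaves the audit log with a human name on non-human behavior, so one rough place to start looking is the shape of the activity rather than the identity attached to it. The sketch below is a heuristic under stated assumptions, not a SOC 2 control: the `(principal, ISO-8601 timestamp)` event shape, the `suspect_principals` name, and the 20-hour threshold are all hypothetical, and a real review would need far more signal than this.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta
from typing import Dict, Iterable, List, Set, Tuple


def suspect_principals(
    events: Iterable[Tuple[str, str]],
    min_active_hours: int = 20,
) -> List[str]:
    """Flag principals active in at least `min_active_hours` distinct hours of one day."""
    hours_seen: Dict[Tuple[str, date], Set[int]] = defaultdict(set)
    for principal, timestamp in events:
        moment = datetime.fromisoformat(timestamp)
        hours_seen[(principal, moment.date())].add(moment.hour)
    flagged = {
        principal
        for (principal, _day), hours in hours_seen.items()
        if len(hours) >= min_active_hours
    }
    return sorted(flagged)


# Illustrative data: "dave" has activity in every hour of one Tuesday,
# "erin" only between 09:00 and 17:00.
start = datetime(2026, 3, 3)
events = [("dave", (start + timedelta(hours=h)).isoformat()) for h in range(24)]
events += [("erin", (start + timedelta(hours=h)).isoformat()) for h in range(9, 17)]

print(suspect_principals(events))
```

A flag from a check like this proves nothing on its own. It only marks a "very busy Tuesday" as worth asking about.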
How many agents are we talking about?
Dash: That's the question I can't answer, which is sort of the whole problem. Non-human identities outnumber human identities by something like 45 to 1 across enterprise environments, and in cloud-native setups, ratios hit 144 to 1.[5] That was before the agentic AI acceleration. Ninety-two percent of organizations say they're not confident their legacy IAM tools can manage agent security risks.[6]
But the number that really haunts me is the shadow population.
Shadow agents.
Dash: The agents I know about, the ones formally provisioned, those I can at least inventory. The ones I can't see are the ones built by someone on the data team who needed to automate a workflow, connected it to three internal APIs, and never filed a ticket. The Cloud Security Alliance draws a distinction between operational visibility and assurance-grade oversight.[7] I have the first one, maybe. I definitely don't have the second.
And the paradox that kills me: the shadow agents worth worrying about were built by the best engineers. People competent enough to solve real problems, confident enough to ship without waiting for a governance process that didn't exist three months ago.[8] Lenovo's research found 70 percent of enterprise AI operates outside IT oversight.[9]
Seventy percent. I'm governing the thirty percent that was polite enough to introduce itself.
What about cost? Surely that surfaces eventually.
Dash: Monthly. It surfaces monthly.
There's a GitHub issue for claude-code, number 15909, where a sub-agent got stuck in a loop and consumed 27 million tokens. Four and a half hours of compute, billed in full, providing zero value.[10] The user didn't know until after.
Cost is the one signal that will eventually find every shadow agent and every runaway process. But it's a lagging indicator. By the time the invoice shows the anomaly, the tokens are consumed, the actions are taken, the data's been accessed. It's forensic evidence, not a warning system.
Only about 15 percent of companies can forecast AI costs to within plus or minus ten percent.[11] And that's the companies trying. The ones with shadow spend scattered across departmental credit cards, personal API keys, and SaaS subscriptions that went through different procurement processes? They don't have a number that represents total AI spend. They have several numbers in several dashboards that nobody's added together.
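Because the invoice reports a runaway loop only after the fact, one complementary control is a hard token budget enforced while the agent is still running. Here is a minimal sketch, assuming each model call reports its token usage when it returns: `TokenBudget`, `TokenBudgetExceeded`, and the numbers are hypothetical, and this is not how claude-code or any particular agent framework implements limits.

```python
class TokenBudgetExceeded(RuntimeError):
    """Raised when a run tries to spend past its token budget."""


class TokenBudget:
    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def charge(self, tokens: int) -> None:
        """Record usage and stop the run once the limit is crossed."""
        self.used += tokens
        if self.used > self.limit:
            raise TokenBudgetExceeded(f"used {self.used} of {self.limit} tokens")


def run_looping_subagent(budget: TokenBudget, tokens_per_call: int) -> int:
    """Simulate a sub-agent stuck in a loop; return calls completed before the halt."""
    calls = 0
    try:
        while True:  # the loop never terminates on its own
            budget.charge(tokens_per_call)
            calls += 1
    except TokenBudgetExceeded:
        return calls


# Illustrative numbers only.
budget = TokenBudget(limit=200_000)
completed_calls = run_looping_subagent(budget, tokens_per_call=30_000)
print(completed_calls, budget.used)
```

Usage is charged after each call reports it, so a run can overshoot the limit by at most one call. That is still a warning at runtime rather than an autopsy at month end.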
So what is the governance problem, actually?
Dash: A visibility problem. You can't govern what you can't see. The instruments I have measure availability, not behavior. They capture agent actions as human actions. They can't find agents that were never registered. And the one universal signal, cost, arrives too late to be anything but an autopsy.
I was hired to govern the agent layer. Ninety days in, I'm still trying to find it.
Footnotes
1. UC Berkeley MAST study, NeurIPS 2025 Spotlight. Analysis of 1,642 agent execution traces across 7 frameworks. https://medium.com/data-science-collective/your-ai-agent-isnt-down-it-s-wrong-building-observability-for-multi-agent-systems-aeb9fb6badd3
2. Confident AI, "Best AI Observability Tools in 2026." https://www.confident-ai.com/knowledge-base/compare/best-ai-observability-tools-2026
3. Sentry Blog, "AI agent observability: The developer's guide to agent monitoring." https://blog.sentry.io/ai-agent-observability-developers-guide-to-agent-monitoring/
4. Entro Security, "Every NHI Has a Human Owner. Your IAM System Can't See Them." https://entro.security/blog/non-human-identity-lineage-iam-governance/
5. Cloud Security Alliance, "The Non-Human Identity Governance Vacuum." May 20, 2026. https://labs.cloudsecurityalliance.org/research/csa-whitepaper-nonhuman-identity-agentic-ai-governance-v1-cs/
6. MightyBot, "What Is Non-Human Identity Management for AI Agents?" https://mightybot.ai/blog/what-is-non-human-identity-management-ai-agents/ (cites IANS Research figure).
7. Cloud Security Alliance Blog, "Shadow AI Agents: Enterprise Governance." April 28, 2026. https://cloudsecurityalliance.org/blog/2026/04/28/the-shadow-ai-agent-problem-in-enterprise-environments
8. Matt Hopkins, "Shadow agents: the enterprise governance crisis nobody planned for." April 25, 2026. https://matthopkins.com/business/shadow-agents-enterprise-governance-crisis/
9. Sphere Partners, citing Lenovo April 2026 research (6,000 enterprise employees). https://www.sphereinc.com/blogs/shadow-ai-governance-gap
10. GitHub Issue #15909, anthropics/claude-code. December 31, 2025. Primary-source incident documentation.
11. Trussed AI, "AI Cost Observability: A Practical Guide." April 20, 2026. https://feeds.trussed.ai/blog/ai-cost-observability (cites Flexera 2025 State of Cloud data).
