Governing the Hypothetical

Five Eyes and NSA advisories govern agentic AI before production failures exist, raising whether pre-emptive guidance installs real safety thinking or compliance theater.

Airworthiness certification rules carry the fingerprints of specific crashes. Basel capital requirements followed specific banking crises. The FAA exists because of mid-air collisions that killed specific people. Regulation, historically, studies wreckage. Something breaks, investigators determine why, and regulators prohibit the conditions that allowed it.

The Five Eyes agentic AI guidance published May 1 and the NSA's MCP security advisory three weeks later have no wreckage to study. They govern systems that barely exist in production. No confirmed exploitation events to prohibit, no post-mortems to cite. And both documents are remarkably candid about this. The Five Eyes agencies instruct organizations to "assume that agentic AI systems may behave unexpectedly" because security practices and evaluation methods haven't matured enough to certify otherwise. The NSA describes MCP's attack paths as "largely not well-traced."

These are regulators who can see the conditions that would produce a failure, writing before the failure arrives.

The NSA's treatment of idempotency is worth sitting with. The advisory identifies it as a critical design concern: an operation that runs twice should not produce a different result than running once. The NSA then notes that MCP does not enforce idempotency at the protocol level, that responsibility falls to implementers, and that it "often may not be implemented properly or overlooked entirely." That last phrase is doing extraordinary work. The NSA is naming the realistic outcome of its own governance approach. Deferral without enforcement, in a domain where the thing being deferred is safety-critical. And the vulnerability surface is already real: over 40 CVEs disclosed against MCP implementations between January and April 2026, with 82% of surveyed implementations found vulnerable to path traversal. The NSA's comparison of MCP to early web protocols is precise and earned.
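What "responsibility falls to implementers" can mean in practice is easier to see in code. The following is a minimal sketch, not anything drawn from the advisory or the MCP specification: `IdempotentExecutor`, `key_for`, and the `transfer_funds` example are hypothetical names, and the in-memory dictionary stands in for whatever durable store a real implementation would need.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class IdempotentExecutor:
    """Runs a side-effecting operation at most once per idempotency key.

    Hypothetical illustration: not part of MCP or any NSA guidance.
    """

    def __init__(self) -> None:
        # In-memory and not thread-safe; a real deployment would need durable,
        # shared storage so that retries survive restarts and concurrency.
        self._results: Dict[str, Any] = {}

    @staticmethod
    def key_for(tool: str, args: Dict[str, Any], request_id: str) -> str:
        # Derive a stable key from the tool name, its arguments, and a
        # caller-supplied request ID (all assumed fields).
        payload = json.dumps(
            {"tool": tool, "args": args, "request_id": request_id},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, key: str, operation: Callable[[], Any]) -> Any:
        # A repeated key returns the stored result instead of re-executing.
        if key in self._results:
            return self._results[key]
        result = operation()
        self._results[key] = result
        return result


# Usage: a retried tool call must not move the money twice.
ledger = []


def transfer() -> Dict[str, Any]:
    ledger.append(("acct-1", "acct-2", 100))
    return {"status": "ok", "entries": len(ledger)}


executor = IdempotentExecutor()
key = IdempotentExecutor.key_for("transfer_funds", {"amount": 100}, "req-001")
first = executor.run(key, transfer)
second = executor.run(key, transfer)  # retry: served from the store
```

Nothing in this sketch is hard. The point is that nothing in the protocol makes an implementer write it, which is exactly the gap the advisory names.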

One security practitioner described the modal reaction to the Five Eyes guidance: board decks picking it up as rows in a control matrix. Privilege controls: check. Identity management: check. Logging: check. The GRC team gets a tracking spreadsheet and the rollout continues at the same pace as before May 1. Meanwhile, a team that attempted actual implementation found that a single control objective from the guidance, "require just-in-time credentials for high-impact actions," immediately generated six design questions the document doesn't answer. Among them: Who issues the credential? How short is "short-lived"? What happens when the credential issuer is unreachable? The gap between "what to control" and "how to control it" is where most of the engineering lives, and the guidance, by design, leaves that territory to builders.
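To make that gap concrete, here is a rough sketch of how one team might answer three of those questions. Every choice in it is an assumption rather than something the guidance specifies: a hypothetical in-process `CredentialIssuer`, a 60-second lifetime, credentials scoped to a single named action, and a fail-closed response when the issuer cannot be reached.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Credential:
    token: str
    action: str        # the single action this credential authorizes
    expires_at: float  # expiry on the issuer's clock


class IssuerUnavailable(Exception):
    """Raised when the (hypothetical) issuer cannot be reached."""


class CredentialIssuer:
    # Assumed answer to "who issues?": one central issuer.
    # Assumed answer to "how short?": 60 seconds by default.
    def __init__(
        self,
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.available = True  # toggled here to simulate an outage

    def issue(self, action: str) -> Credential:
        if not self.available:
            raise IssuerUnavailable("credential issuer unreachable")
        return Credential(
            token=secrets.token_urlsafe(16),
            action=action,
            expires_at=self.clock() + self.ttl_seconds,
        )

    def is_valid(self, cred: Credential, action: str) -> bool:
        # Valid only for the action it was issued for, and only until expiry.
        return cred.action == action and self.clock() < cred.expires_at


def perform_high_impact(
    issuer: CredentialIssuer, action: str, operation: Callable[[], str]
) -> str:
    try:
        cred = issuer.issue(action)
    except IssuerUnavailable:
        # Assumed answer to "issuer unreachable?": fail closed.
        return "denied: issuer unreachable"
    if not issuer.is_valid(cred, action):
        return "denied: credential invalid"
    return operation()
```

Each default here is contestable. Failing closed protects against unauthorized actions but turns an issuer outage into an agent outage, and a team could reasonably decide the other way. That is the kind of decision the control matrix row does not record.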

Basel II offers a precedent here. It was technically sophisticated, pre-emptive, designed by experts with genuine knowledge of banking risk. Its key failure was that the framework's own risk-measurement apparatus became the instrument of regulatory arbitrage. Banks used internal models to optimize risk weights in ways that satisfied the framework while increasing the leverage it was meant to constrain. The more sophisticated the measurement, the more sophisticated the avoidance. When regulators candidly admit their evaluation methods aren't mature enough to certify agentic systems, and the framework's sophistication lives entirely in implementer interpretation, the conditions for that same dynamic are already present.

Regulation built on wreckage can say: never again. It has a specific prohibition to enforce. Regulation built on foresight says: be careful here. And careful, received by compliance infrastructure, tends to produce documentation. These advisories are trying to install something harder to audit than a control: an investigative disposition in builders, a way of seeing risk before it materializes. The medium they're working in, though, is advisory guidance flowing toward NIST publications and compliance baselines, a medium that has historically rewarded completeness of paperwork over quality of thinking. The regulators, to their credit, seem to know this. They published hypotheses, clearly labeled. Whether the medium can carry the message is a question neither document can answer on its own.

Things to follow up on...

The OWASP agentic list: The 2026 OWASP Top 10 for Agentic Applications introduces "least agency" as a design principle, scoping risks from tool misuse to emergent autonomous behavior in ways that complement the Five Eyes categories.
NSA's collaborative tone: Multiple security researchers noted the NSA advisory's unusual framing, explicitly calling for ongoing collaboration between implementers, researchers, and standards bodies rather than issuing top-down prescriptions.
Agent identity as structural gap: Adversaries can forge agent identities to bypass trust mechanisms in multi-agent systems, and unlike human accounts, agent credentials lack the MFA challenges that protect against exfiltration and reuse.
The observability mismatch: A production agent returning a 200 OK can coexist with catastrophic semantic failure, and the emerging practitioner consensus is that end-to-end trace propagation with tool I/O capture is the minimum viable observability stack.
