The Pilot Worked. That's the Problem.

By Carey Whitten, May 5, 2026

Federal AI pilots aren't stalling because agencies lack enthusiasm, executive sponsorship, or change management discipline. The GAO has documented the pattern across enough programs now that the cultural explanation has worn thin: agencies that ran successful pilots, generated real usage data, and secured follow-on funding are still not shipping to production. The pilots worked. The production path doesn't exist.

The failure is architectural. And if you've been selling into federal accounts long enough, you've seen it before.

The Pattern You Already Know

Between 2015 and 2020, federal cloud migration programs hit a specific wall at a specific moment. The sequence was almost formulaic: a team stood up workloads in AWS or Azure, often under an innovation mandate or a FITARA-driven modernization push, outside the enterprise IAM boundary. They got real usage. They demonstrated cost savings. They got executive attention. Then someone asked the production question — "who has access to what, and how do we prove it?" — and the answer was: we don't know, and the infrastructure to find out doesn't exist here.

The pilot worked because it was insulated from the compliance and access control requirements that production demands. The ATO process exposed the gap. The path forward wasn't configuration; it was reconstruction. Teams had to rebuild the access control layer, instrument the audit trail, and integrate with the enterprise identity infrastructure they'd bypassed in the name of speed. Some programs never recovered the momentum. Others took two to three years longer than the original timeline assumed.

That's the cloud migration story. For AI, it rhymes — but in three specific places, the rhyme breaks badly.

Where the Cloud Analogy Stops Working

The structural parallel between cloud migration stalls and AI pilot stalls is real, but it understates the problem. The identity infrastructure gaps for AI agents aren't just larger versions of the gaps that existed for cloud workloads. They're different in kind, and the enterprise identity stack wasn't built to close them.

The Agent Identity Problem

In cloud migration, the identity subjects were humans and service accounts. Both are known types with established provisioning patterns. A service account has an owner, a purpose, a lifecycle tied to the application it serves, and a decommission path when that application retires. Standard provisioning processes handle this. SCIM provisioning and the joiner-mover-leaver framework handle the human side. Enterprise identity infrastructure was built around the assumption that the subjects it governs are either people or stable, purpose-specific non-human actors.

AI agents are neither. An agent operating in a federal environment acts autonomously, may spawn sub-agents mid-task to handle delegated subtasks, and has a "purpose" that shifts based on what the user asked it to do in a given session. Its lifecycle doesn't map to an HR record. Its identity doesn't map to a service account, because service accounts don't spawn child processes that need their own access decisions. The provisioning model that enterprise identity infrastructure runs on assumes a relatively stable subject with attributes that can be synchronized across systems. An agent that creates other agents mid-session, each needing scoped access to different resources, breaks that model at the architectural level.
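
To make the gap concrete, here is a minimal Python sketch of what a record for an agent principal might need to carry. Everything in it is hypothetical: the AgentPrincipal class, its field names, and the rule that a sub-agent's scopes can only narrow are illustrative assumptions, not any standard or product schema.

```python
import uuid
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class AgentPrincipal:
    """Hypothetical identity record for an agent (illustrative only)."""
    agent_id: str
    on_behalf_of: str                  # the human user who started the session
    scopes: FrozenSet[str]             # what this agent is allowed to touch
    parent: Optional["AgentPrincipal"] = None

    def spawn(self, requested_scopes: Set[str]) -> "AgentPrincipal":
        """Create a sub-agent whose scopes can only narrow, never widen."""
        extra = requested_scopes - self.scopes
        if extra:
            raise PermissionError(
                f"sub-agent requested scopes outside parent grant: {sorted(extra)}"
            )
        return AgentPrincipal(
            agent_id=f"agent-{uuid.uuid4().hex[:8]}",
            on_behalf_of=self.on_behalf_of,
            scopes=frozenset(requested_scopes),
            parent=self,
        )

    def delegation_chain(self) -> List[str]:
        """Walk back to the root so a reviewer can see who delegated to whom."""
        chain: List[str] = []
        node: Optional[AgentPrincipal] = self
        while node is not None:
            chain.append(node.agent_id)
            node = node.parent
        return chain


# Example: a root agent delegates a read-only subtask to a child agent.
root = AgentPrincipal("agent-root", "user:jdoe",
                      frozenset({"docs:read", "calendar:write"}))
child = root.spawn({"docs:read"})
```

The point of the sketch is the two fields a service account record has no place for: the parent link and the user the agent is acting for. Neither fits the owner-purpose-lifecycle shape described above.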

"What identity does this agent have?" doesn't have an answer in most enterprise identity deployments today, because the question assumes a category of principal that the infrastructure wasn't designed to accommodate. A missing lane, not a misconfigured one.

Non-Human Credential Management

Cloud workloads needed secrets management. The enterprise response was vault solutions, rotation schedules, and secrets injection at deployment time. The credential lifecycle was relatively static: rotate every 90 days, revoke on decommission, audit access to the vault. This was a real operational improvement over hardcoded credentials, and it worked well enough for the workload patterns cloud migration created.

AI agents operating in production need a different credential model entirely. The credentials an agent uses to access a downstream system need to be short-lived — measured in minutes or the duration of a specific task, not days or rotation cycles. They need to be scoped to what the agent is doing right now, not what it was generally provisioned to do. And they need to be revocable mid-session if the agent's behavior deviates from expected parameters, not at the next rotation cycle, not at decommission, but while the session is active.

The vault infrastructure enterprises built for human and service identity credential management doesn't fit this pattern. Rotation frequency alone breaks most configurations: a vault deployment built around a 90-day rotation schedule isn't set up to issue and expire credentials on a per-task basis. The revocation model is even further off. Revoking a service account credential is a deliberate, relatively infrequent administrative action. Revoking an agent's credential mid-session because a behavioral anomaly was detected is an automated, real-time response to a runtime signal. Those are different systems doing different things, and most agencies don't have the second one.
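
A rough Python sketch shows how different the second system looks. It assumes a hypothetical in-memory issuer (TaskCredentialIssuer, with its five-minute default lifetime, is invented for illustration). A real deployment would sit behind the agency's identity stack and persist its state.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class TaskCredential:
    token: str
    agent_id: str
    scope: str          # the single thing this credential is good for
    expires_at: float   # clock reading after which the token is dead


class TaskCredentialIssuer:
    """Hypothetical issuer of short-lived, task-scoped credentials."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._active: Dict[str, TaskCredential] = {}

    def issue(self, agent_id: str, scope: str,
              ttl_seconds: float = 300.0) -> TaskCredential:
        """Mint a credential that lives for minutes, not a rotation cycle."""
        cred = TaskCredential(secrets.token_urlsafe(16), agent_id, scope,
                              self._clock() + ttl_seconds)
        self._active[cred.token] = cred
        return cred

    def revoke_agent(self, agent_id: str) -> int:
        """Kill every live credential for one agent, e.g. on an anomaly signal."""
        doomed = [t for t, c in self._active.items() if c.agent_id == agent_id]
        for token in doomed:
            del self._active[token]
        return len(doomed)

    def is_valid(self, token: str, scope: str) -> bool:
        """Check at the moment of use: known, unexpired, and scoped to this task."""
        cred = self._active.get(token)
        if cred is None:
            return False
        if self._clock() >= cred.expires_at:
            del self._active[token]
            return False
        return cred.scope == scope
```

In this sketch, revoke_agent is what an anomaly detector would call while the session is still running. The 90-day rotation model has no equivalent hook.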

Dynamic Scope Assignment

OAuth scopes in enterprise deployments are defined at application registration time. The application knows what resources it needs to access, the authorization server is configured with those scopes, the admin approves the grant, and the scope set is stable for the life of the application. This works because traditional applications have predictable, bounded resource requirements. A payroll application needs access to payroll data. It doesn't need access to email. That's true today and it will be true next quarter.

AI agents don't work this way. An agent that was instantiated to summarize a document may be asked, mid-session, to also schedule a follow-up meeting, pull a related contract from a different system, and draft a response to a stakeholder. Each of those tasks requires a different scope set. The agent's resource requirements are determined at runtime by the user's intent, not at registration time by the developer's design. The governance model for "permissions that change based on what the agent is doing right now" doesn't exist in most enterprise identity deployments, because the model was never needed. Static applications don't have this problem.

For production deployment, that means an agency that wants to govern an AI agent's access to federal systems needs fine-grained authorization infrastructure that can evaluate access decisions dynamically, at the moment of access, based on context that includes what the agent is currently doing. That's a different architecture from the role-based access control model most agencies have implemented. It's closer to what the NIST AI Risk Management Framework gestures at in its emphasis on monitoring and managing AI systems after deployment, but gesturing at it and having the infrastructure to implement it are different things. This is also where the most confident-sounding takes about "just use OAuth" go wrong.
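
As a sketch of what "evaluated at the moment of access" could mean in practice, the Python below makes each decision from the agent's current task and the entitlements of the user it is acting for. The policy tables, the decide function, and the resource naming convention ("type:id") are all assumptions made for illustration. They are not drawn from any product or standard.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AccessRequest:
    agent_id: str
    on_behalf_of: str    # user whose intent the agent is carrying out
    current_task: str    # what the agent is doing right now
    action: str
    resource: str        # hypothetical "type:id" convention, e.g. "document:123"


# Hypothetical policy: which (action, resource type) pairs each task may use.
TASK_POLICY = {
    "summarize_document": {("read", "document")},
    "schedule_meeting": {("read", "calendar"), ("write", "calendar")},
    "pull_contract": {("read", "contract")},
}

# Hypothetical stand-in for an enterprise directory lookup of user entitlements.
USER_ENTITLEMENTS = {
    "user:jdoe": {"document", "calendar"},
}


def decide(req: AccessRequest) -> Tuple[bool, str]:
    """Evaluate one access request against task policy and user entitlements."""
    resource_type = req.resource.split(":", 1)[0]
    allowed_pairs = TASK_POLICY.get(req.current_task, set())
    if (req.action, resource_type) not in allowed_pairs:
        return False, "action not permitted for current task"
    if resource_type not in USER_ENTITLEMENTS.get(req.on_behalf_of, set()):
        return False, "user lacks entitlement; agent cannot exceed it"
    return True, "permitted"
```

The same agent gets a different answer depending on what it is doing when it asks. A scope set fixed at registration time cannot express that.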

The Rebuild Problem

When these three gaps surface in a production review — and they will surface, because the ATO process will ask about agent identity, credential management, and access governance — the answer isn't a configuration change. It's a rebuild.

The pilot's access control layer has to be reconstructed to accommodate agent principals. The credential management infrastructure has to be extended to support short-lived, task-scoped credentials with real-time revocation. The authorization model has to be replaced with something that can evaluate dynamic scope assignments against policy at runtime. None of this is incremental. All of it takes time that wasn't in the original program plan, budget that wasn't in the original appropriation, and expertise that's genuinely scarce.

Okta's machine identity management capabilities and fine-grained authorization work address parts of this — the non-human credential lifecycle and the dynamic authorization model, respectively — but the integration work required to bring those capabilities into an existing federal identity architecture is non-trivial, and most agencies haven't started it. The technology exists. Getting it deployed through the enterprise identity stack is the problem.

The pilot was built outside the envelope. Getting it inside the envelope requires building the envelope first.

What to Ask Before the Next Meeting

If you're in a discovery conversation with a CAIO or a federal IT leader who's talking about moving an AI pilot to production, four questions will tell you whether they're architecturally stranded:

Where does your pilot's identity infrastructure live relative to your enterprise IAM boundary? If the answer involves a separate tenant, a standalone identity provider, or "we're using the application's built-in auth," the pilot is outside the envelope.

When your pilot agent needs to access a downstream system, what credential does it use, and who manages its lifecycle? Vague answers here — "it uses a service account," "the vendor manages it" — indicate the credential lifecycle hasn't been designed for production.

If the agent's behavior changed unexpectedly mid-session, what's the mechanism for revoking its access in real time? Most buyers can't answer this question. That's the signal.

How are the agent's access decisions being logged in a format your audit infrastructure can consume? Not "are they being logged" — they probably are, somewhere. But whether the log format, retention policy, and query capability meet the audit requirements for a production federal system is a different question.
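
For a sense of what a usable answer to that last question might look like, here is a minimal Python sketch that emits one structured JSON line per access decision. The audit_record function and every field name in it are hypothetical. An agency's actual audit schema and retention rules would govern the real format.

```python
import json
from datetime import datetime, timezone
from typing import Iterable


def audit_record(agent_id: str, on_behalf_of: str, action: str, resource: str,
                 allowed: bool, reason: str,
                 delegation_chain: Iterable[str] = ()) -> str:
    """Build one JSON line describing a single access decision.

    Field names are illustrative assumptions, not a mandated schema.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
        "on_behalf_of": on_behalf_of,
        "delegation_chain": list(delegation_chain),
        "action": action,
        "resource": resource,
        "decision": "allow" if allowed else "deny",
        "reason": reason,
    }
    # Sorted keys keep the line stable for downstream parsers and diffs.
    return json.dumps(record, sort_keys=True)


# Example: log a denied request made by a sub-agent.
line = audit_record("agent-1a2b", "user:jdoe", "read", "contract:9",
                    allowed=False, reason="user lacks entitlement",
                    delegation_chain=["agent-1a2b", "agent-root"])
```

A record like this can answer "which agent, acting for whom, tried what, and why was it allowed or denied" in a single query. That is the capability the fourth question is probing for.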

A buyer who can answer all four with specificity is in better shape than most. A buyer who stumbles on the third question has found the gap that's going to stop their production timeline. Either way, you've just had a more useful conversation than the one about features.

The pilots are working exactly as designed. The design is the problem.