This is the dominant state of federal AI right now. GAO's 2024 review of AI adoption across civilian agencies found that while agencies had catalogued hundreds of AI use cases in their annual inventories — required under OMB M-24-10 — the vast majority remained in pilot or proof-of-concept status, with production deployment rates that the report characterized as "limited relative to stated investment levels." (GAO-24-106821, published February 2024; the sample covered 23 CFO Act agencies, so the finding is broadly representative of the large-agency landscape, though smaller agencies weren't included.) The programs that made it to production weren't distinguished by better vendor selection, more enthusiastic leadership, or superior change management. They were distinguished by a set of architectural and governance decisions that most pilots never made.
The seller's instinct, shaped by years of enterprise software deals, is to read stalled adoption as a people problem. Insufficient training. Skeptical middle management. Leadership that talks AI but doesn't fund AI. That instinct captures something real but stops well short of the specific diagnosis. The programs that died in the pilot graveyard were often killed by something more concrete: they were built outside the agency's approved enterprise ecosystem, without an identity and compliance envelope, and with no governance infrastructure that could survive a production audit. That's an architecture problem. It has an identity fingerprint. And it's diagnosable before you walk into the room.
Why the Change Management Frame Stops Short
Change management frameworks were built for a different class of problem — deploying ERP systems, migrating to cloud email, rolling out new HR platforms. In those contexts, the technology itself was usually enterprise-approved before the change management work began. Someone had already run the FedRAMP authorization, connected the system to the agency's identity provider, and gotten the ISSO's sign-off. The change management challenge was adoption, not architecture.
Federal AI pilots are different. They frequently start as skunkworks: a motivated program office, a vendor willing to run a free or low-cost pilot, and an informal agreement to "see what happens." The tool gets deployed to a small group of users, often through individual accounts rather than enterprise provisioning, with data handling that's been informally blessed rather than formally reviewed. The pilot produces results. The program manager wants to scale.
And then the architecture catches up with them.
The ISSO asks how users were provisioned. The answer is "we sent them invites." The security officer asks where the audit logs are. The answer is "in the vendor portal, I think." The privacy officer asks whether the data processed by the AI system was covered by the existing PIA. The answer is silence. The ATO process begins, and the pilot — which was never built to survive an ATO — effectively has to be rebuilt from scratch. Most of them don't survive that rebuild. The program manager moves on, the vendor loses the deal, and the pilot joins the graveyard.
The failure is structural, and it is recognizable in advance.
The Failure Patterns
Ecosystem isolation. The clearest leading indicator of a stuck program is that the AI tool was deployed outside the agency's approved enterprise ecosystem. It's not in the software asset management system. It's not connected to the agency's SSO. Users authenticate with personal or vendor-managed credentials rather than PIV or enterprise identity. This isn't always visible from the outside, but it surfaces quickly in discovery: ask how the tool was provisioned to pilot users, and listen for whether the answer involves the agency's identity provider or a workaround.
Ecosystem isolation isn't always intentional. Program offices often don't know what "approved enterprise ecosystem" means in practice for AI tools, because most agencies haven't published clear guidance. But the isolation creates a structural ceiling: the tool cannot scale past the pilot without going through the enterprise integration process, and that process — FedRAMP authorization, ATO, SSO integration, SCIM provisioning — takes time the program office didn't budget and requires stakeholders who weren't in the original pilot conversation.
The identity envelope gap. A production AI system in a federal environment needs to know who is using it, what they're authorized to access, and how to enforce those boundaries at the data layer. That's not a vendor feature; it's an integration requirement. The agency's identity infrastructure has to be the source of truth for user identity, roles, and entitlements, and the AI system has to consume that truth in real time.
Most pilots skip this. They use flat access models — everyone in the pilot group can see everything the pilot group can see — because building role-based access controls into a proof-of-concept is expensive and the program office is trying to demonstrate value, not build production infrastructure. The problem is that the flat access model is disqualifying for production. When the ISSO asks "how does the system enforce need-to-know?" and the answer is "it doesn't, we trust the users," that's not a gap you close with a configuration change. That's an architectural rebuild.
The identity envelope gap is the single most reliable predictor of a stuck pilot that I've seen in the federal AI landscape. Programs that integrated with the agency's identity provider early, even imperfectly, have a path to production. Programs that didn't are structurally frozen until someone is willing to fund the rebuild.
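The difference between the flat pilot model and need-to-know enforcement can be made concrete with a small sketch. This is an illustration only: the `User` and `Document` shapes, the role names, and the idea that each record carries a single `required_role` are all assumptions made for the example, not a description of any agency's or vendor's actual access model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Identity as asserted by the agency IdP (hypothetical shape)."""
    user_id: str
    roles: frozenset[str]


@dataclass(frozen=True)
class Document:
    """A record the AI system might retrieve (hypothetical shape)."""
    doc_id: str
    required_role: str  # assumption: one role gates each record


def flat_access(user: User, docs: list[Document]) -> list[Document]:
    # Pilot-style model: everyone in the pilot group sees everything.
    return list(docs)


def need_to_know(user: User, docs: list[Document]) -> list[Document]:
    # Production-style model: filter at the data layer, using roles
    # that come from the identity provider rather than the tool itself.
    return [d for d in docs if d.required_role in user.roles]


docs = [
    Document("case-001", "benefits-analyst"),
    Document("case-002", "fraud-investigator"),
]
analyst = User("jdoe", frozenset({"benefits-analyst"}))

print([d.doc_id for d in flat_access(analyst, docs)])
print([d.doc_id for d in need_to_know(analyst, docs)])
```

The second function is only a few lines, but it depends on roles arriving from the identity provider. That dependency is the integration work a flat-access pilot never did, which is why the gap is a rebuild and not a configuration change.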
Audit readiness failure. Federal production systems generate audit logs, not because auditors watch every transaction, but because the audit log is the evidence base for incident response, compliance reporting, and the annual FISMA review. An AI system that can't produce a coherent audit trail (who queried what, when, with what result, using what data) cannot pass a production security review.
A lot of AI pilots are quietly failing on exactly this right now. The tools generate logs, but the logs are in vendor-proprietary formats that don't integrate with the agency's SIEM. Or the logs capture model outputs but not the input data that generated them. Or the retention period doesn't match the agency's records management requirements. The ISSO knows this is a problem. The program office doesn't, because nobody told them audit log architecture was part of the pilot scope.
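As a rough sketch of what "coherent audit trail" means in practice, the record below carries the five elements named above, and a small check flags the gaps this section describes. The field names, the JSON serialization, and the retention figures are assumptions chosen for illustration; an agency's SIEM schema and records schedule would dictate the real ones.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class AuditRecord:
    """Hypothetical record shape; field names are illustrative."""
    user_id: str          # who queried
    query: str            # what they asked
    timestamp: str        # when (ISO 8601, UTC)
    data_sources: list    # using what data
    output_summary: str   # with what result


REQUIRED_FIELDS = ("user_id", "query", "timestamp", "data_sources", "output_summary")


def audit_gaps(record: dict, retention_days: int, required_retention_days: int) -> list:
    """Return the gaps found in one exported log record."""
    gaps = [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    if retention_days < required_retention_days:
        gaps.append("retention shorter than records schedule")
    return gaps


complete = asdict(AuditRecord(
    user_id="jdoe",
    query="summarize case-001",
    timestamp=datetime(2024, 5, 1, 14, 30, tzinfo=timezone.utc).isoformat(),
    data_sources=["case-001"],
    output_summary="summary returned",
))
# A log that captures model outputs but not the inputs behind them.
outputs_only = {"timestamp": complete["timestamp"], "output_summary": "summary returned"}

# Plain JSON is used here as a stand-in for a format a SIEM could ingest.
print(json.dumps(complete))
# Retention values below are placeholders, not regulatory figures.
print(audit_gaps(complete, retention_days=1095, required_retention_days=1095))
print(audit_gaps(outputs_only, retention_days=90, required_retention_days=1095))
```

A seller does not need to run anything like this. The point is that each gap is a concrete, checkable property of the log export, which is what makes the audit question useful in discovery.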
The governance survival test. OMB M-24-10 requires agencies to establish AI governance structures — risk management frameworks, use case review processes, accountability mechanisms. Most agencies have published something that satisfies this requirement on paper. The question is whether the governance structure is real enough to make a production decision.
A governance structure that can make a production decision has: a named accountable official who can sign off on production deployment, a documented risk assessment that covers the specific use case, a monitoring plan that specifies how the system's outputs will be reviewed post-deployment, and a decommissioning plan. Programs that have all four of these things are rare. Programs that have a governance committee that meets quarterly and produces minutes are common. The distinction matters enormously when a program office is trying to get a production deployment approved.
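The four-part test above reduces to a simple checklist. The sketch below is one hypothetical way to record it; the class name, field names, and example values are invented for illustration and do not correspond to any OMB or agency template.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class GovernanceEvidence:
    """Hypothetical checklist; each field holds a name or document reference."""
    accountable_official: Optional[str] = None   # named person with sign-off authority
    risk_assessment: Optional[str] = None        # covers this specific use case
    monitoring_plan: Optional[str] = None        # how outputs are reviewed post-deployment
    decommissioning_plan: Optional[str] = None   # how the system is retired


def missing_items(evidence: GovernanceEvidence) -> list:
    # An empty or absent entry counts as missing.
    return [f.name for f in fields(evidence) if not getattr(evidence, f.name)]


def can_make_production_decision(evidence: GovernanceEvidence) -> bool:
    return not missing_items(evidence)


# A committee that meets quarterly but has produced none of the four artifacts.
committee_only = GovernanceEvidence()

# Placeholder values standing in for real names and document references.
partial = GovernanceEvidence(
    accountable_official="named official (placeholder)",
    risk_assessment="use-case risk assessment (placeholder)",
)

print(missing_items(committee_only))
print(missing_items(partial))
print(can_make_production_decision(partial))
```

The check is all-or-nothing on purpose: under the framing here, a structure with three of the four artifacts still cannot approve a production deployment.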
When a CAIO says at a GovCIO event that their agency is "building a culture of responsible AI," that's an aspirational statement, not evidence of governance infrastructure. Treat published AI strategies the same way — they document intent, not outcomes. The gap between the strategy document and the governance structure that can actually approve a production deployment is where most programs live.
What Production-Ready Looks Like
The programs that have made it to production share a set of observable characteristics that you can find before you walk in.
They integrated with the agency's enterprise identity provider before the pilot ended, not after. The AI tool authenticates users through the agency's existing SSO, inherits role assignments from the agency's HR system or directory, and enforces access boundaries at the data layer based on those roles. The ISSO was in the room when this was designed, not called in to review it after the fact.
They have a named ISSO who has reviewed the system and documented the review. This is a named individual in a documented position, and it is checkable: ask who the ISSO is for the AI program, and whether they've issued a formal finding.
They have FedRAMP-authorized components or an explicit, documented exception with a compensating control rationale. The exception path is legitimate, but it requires paperwork that most pilots haven't done.
They have a use case that fits within an existing data classification boundary. The AI system processes data at a classification level the agency already has infrastructure to handle, rather than requiring new data handling agreements or cross-domain solutions. Programs that require novel data handling are not impossible, but they're on a longer timeline than programs that fit within existing boundaries.
And they have a CAIO who has personally reviewed the program and can articulate what production deployment means for their agency's AI governance posture. One who has made a specific decision about this specific program, not one who is generically "supportive of AI innovation."
The Discovery Questions That Actually Reveal Structure
Generic AI governance questions produce generic answers. These don't.
"When your ISSO reviewed the pilot, what did they flag about access provisioning?" If the answer is "they haven't reviewed it yet" or "we haven't formally engaged the ISSO," you're looking at a program that hasn't started the production path. A specific finding with a specific remediation means the program is moving.
"How are pilot users provisioned — through your enterprise identity provider or through the vendor?" The answer to this question tells you more about production readiness than any roadmap slide. Enterprise IdP provisioning means someone made an architectural decision early. Vendor-managed credentials means the integration work hasn't started.
"What does your ATO path look like for this system?" Programs that have a path — even an incomplete one — have thought about production. Programs that respond with "we're hoping to leverage an existing ATO" without specifics haven't.
"Who in your CAIO's office has sign-off authority for production deployment?" This separates governance structures that can make decisions from governance structures that can produce documents. If the answer is a committee, ask when the committee last approved a production deployment of anything.
"What's your audit log architecture for AI system activity, and where does it feed?" This surfaces the SIEM integration gap faster than any security questionnaire.
The Account Qualification Call
Given this framework, the seller's job in the first discovery call is to find the architectural fingerprint, not assess enthusiasm. Enthusiasm is abundant and nearly worthless as a qualification signal.
Is the AI tool inside or outside the enterprise ecosystem? Is there an ISSO who has reviewed it? Is there a provisioning model that connects to the agency's identity infrastructure? Is there a governance structure that can make a production decision, or one that can only make a pilot decision?
Programs with the right fingerprint are worth pursuing aggressively. Programs without it aren't necessarily dead — but they're on a different timeline, and the seller needs to know which timeline they're on before committing resources. The pilot graveyard is full of programs that vendors chased for two years before realizing the architectural conditions for production were never present.
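The four fingerprint questions can be written down as a small qualification record. This is a sketch under stated assumptions: the field names and the two timeline labels are hypothetical, and treating every question as a hard yes/no is a simplification of what a real discovery call produces.

```python
from dataclasses import dataclass


@dataclass
class Fingerprint:
    """Hypothetical record of the four discovery answers."""
    inside_enterprise_ecosystem: bool
    isso_has_reviewed: bool
    provisioning_via_agency_idp: bool
    governance_can_approve_production: bool


def qualify(fp: Fingerprint) -> tuple:
    # Collect the questions answered "no", in the order they were asked.
    gaps = [name for name, present in vars(fp).items() if not present]
    timeline = "production path" if not gaps else "different timeline"
    return timeline, gaps


ready = Fingerprint(True, True, True, True)
skunkworks = Fingerprint(
    inside_enterprise_ecosystem=False,
    isso_has_reviewed=True,
    provisioning_via_agency_idp=False,
    governance_can_approve_production=True,
)

print(qualify(ready))
print(qualify(skunkworks))
```

The gap list is the useful output. It tells the seller which stakeholders and which integration work stand between the pilot and production before any resources are committed.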
Carry one question into every federal AI discovery call: have they built the envelope that AI can live in? The answer is almost always visible if you know what to look for. Now you know what to look for.
Workforce data referenced in this piece reflects pre-2025 baselines where applicable. Federal staffing disruptions from 2025 reduction-in-force (RIF) activity have altered agency capacity in ways that may affect AI program timelines independent of architectural readiness. That variable is worth surfacing in discovery but is outside the scope of this framework.

