When your buyer tells you their agency has "over forty AI initiatives underway," they are almost certainly telling you the truth. They are also almost certainly describing something that looks nothing like what you'd call a production deployment.
That gap, between the number agencies report and the number that reflects systems with real users, real governance, and real infrastructure, is the most important calibration problem in federal AI selling right now. The data tells a specific story.
What the OMB Inventory Counts
OMB's AI use case inventory, maintained under requirements established by the 2023 Executive Order on AI and formalized through M-24-10, had grown to approximately 1,820 reported use cases across the 24 CFO Act agencies as of December 2025. That figure gets cited in agency AI strategy documents, congressional testimony, and, frequently, in buyer conversations as evidence of institutional momentum.
Read the methodology before you let that number anchor your expectations. The inventory is self-reported. Agencies classify their own use cases, apply their own definitions of "AI," and determine their own distinction between "in development" and "in production." OMB provides guidance on categorization but does not audit the classifications. A natural language processing tool that routes help desk tickets, deployed to 200 users with an ATO and a governance board, sits in the same inventory column as a proof-of-concept chatbot running on a program office laptop.
Of the 1,820 reported use cases, agencies self-classified roughly 940 as "in production" and the remainder as in development, piloting, or planning stages. That 940 number is the one that will appear in your buyer's briefing deck.
What GAO Found When It Looked Closer
GAO's October 2025 report on federal AI implementation (Artificial Intelligence: Agencies Made Progress but Face Persistent Challenges in Responsible Deployment, GAO-26-105412) surveyed 23 of the 24 CFO Act agencies and applied a more demanding production standard: active user base, documented governance process, completed ATO or equivalent risk authorization, and evidence of ongoing monitoring. Under those criteria, the number of systems that qualified dropped sharply.
Across the 23 agencies surveyed, GAO identified 94 AI systems meeting all four criteria. Forty-one of those were concentrated in three agencies: DoD components, DHS, and the Social Security Administration. The remaining 20 civilian agencies shared the other 53, an average of fewer than three qualifying production deployments each. Several reported zero.
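The arithmetic behind that comparison is worth making explicit. The sketch below uses only the figures quoted in this article; the variable names are illustrative. The comparison is approximate, since the inventory covers 24 agencies and GAO surveyed 23, and the two sources do not count the same unit in the same way.

```python
# Figures as quoted in this article; variable names are illustrative.
inventory_total = 1820        # OMB inventory: all reported use cases
inventory_production = 940    # self-classified as "in production"
gao_qualifying = 94           # met all four GAO criteria (23 agencies)
top_three_qualifying = 41     # DoD components, DHS, SSA
remaining_agencies = 20       # the other agencies GAO surveyed

# Use cases the agencies themselves place before production.
not_in_production = inventory_total - inventory_production

# Rough ratio of GAO-qualifying systems to self-reported production use cases.
share_meeting_gao = gao_qualifying / inventory_production

# What is left once the three concentrated agencies are set aside.
remaining_qualifying = gao_qualifying - top_three_qualifying
avg_per_remaining = remaining_qualifying / remaining_agencies

print(f"Self-reported as not yet in production: {not_in_production}")
print(f"GAO-qualifying vs. self-reported production: {share_meeting_gao:.0%}")
print(f"Qualifying systems outside the top three: {remaining_qualifying}")
print(f"Average per remaining agency: {avg_per_remaining:.2f}")
```

On these figures, roughly one self-reported production use case in ten would clear the GAO standard, which is a useful order of magnitude to carry into a first meeting.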
GAO was careful to note this wasn't a finding of failure — agencies are early in a genuinely difficult transition. But the report explicitly flagged that "the gap between use cases reported in the OMB inventory and systems meeting minimum production readiness criteria is substantial and inconsistently understood by agency leadership." That last clause matters. Buyers aren't always lying when they cite the inventory number. They sometimes don't know what the inventory is measuring.
The Licenses-Provisioned Problem
If you've spent time in identity and access management, you've seen this before. When a new system rolls out, the first metric everyone tracks is licenses provisioned. It's the number that shows up in the implementation report, the one that gets presented to leadership as evidence of progress. Active authenticated users — the number that tells you whether the system is actually doing anything — comes later, is harder to measure, and is usually smaller by a factor that makes people uncomfortable.
Federal AI pilots are the licenses-provisioned number. Production deployments are the active-users number. The OMB inventory, as currently structured, counts what has been provisioned. GAO is trying to count active users. The difference is where your discovery conversation needs to live.
The Segmentation That Changes Everything
Treating the federal market as a single population is the first mistake. The distribution of production AI deployments is heavily skewed, and the skew is predictable.
Defense and intelligence components are genuine outliers. DoD's AI adoption has been accelerating since the 2023 Data, Analytics, and AI Adoption Strategy, and the department has the acquisition infrastructure (OTAs, CDAO program offices, and dedicated AI funding lines) to move faster than civilian agencies. Selling into a defense component means the pilot-to-production gap is narrower, though the governance requirements are correspondingly more complex.
Large civilian agencies with established IT modernization programs — SSA, IRS, HHS, Treasury — represent a second tier. These agencies have existing enterprise architecture, active CDO and CAIO functions, and enough scale to justify the infrastructure investment that production AI requires. SSA's deployment of AI-assisted claims processing, for instance, has been running at scale long enough to generate the kind of operational data that informs real governance decisions. When a buyer at one of these agencies cites production numbers, the numbers are more likely to reflect something real.
Mid-size and smaller civilian agencies are where the inventory-to-reality gap is widest. The Partnership for Public Service's 2025 Federal AI Readiness Assessment (which surveyed 18 non-defense agencies and disclosed its methodology, unlike the OMB inventory) found that agencies with fewer than 5,000 employees had completed an average of 4.2 AI pilots but averaged only 0.6 systems in what the assessment defined as sustained production. The CAIO function at these agencies is often a collateral duty, not a dedicated role. The ATO pipeline is the bottleneck nobody's talking about in the strategy documents.
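To put the small-agency figures on one scale, here is a minimal sketch using the two averages quoted above. The variable names are illustrative, and the result is a ratio of averages, not a measured conversion rate for specific pilots, so treat it as a rough prior only.

```python
# Averages quoted above for agencies with fewer than 5,000 employees.
avg_pilots_completed = 4.2
avg_sustained_production = 0.6

# Ratio of averages: sustained production systems per completed pilot.
production_per_pilot = avg_sustained_production / avg_pilots_completed

# The same ratio inverted: completed pilots per sustained production system.
pilots_per_production_system = avg_pilots_completed / avg_sustained_production

print(f"Production systems per completed pilot: {production_per_pilot:.0%}")
print(f"Pilots per production system: about {pilots_per_production_system:.0f}")
```

Read this as roughly seven completed pilots for every system in sustained production at the smaller agencies in the assessment.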
The Definitional Problem Is the Discovery Opportunity
Your buyer may not have fully confronted a basic definitional issue: "pilot" and "production" mean different things in different agencies, and sometimes in different offices within the same agency. GAO's 2025 report noted that 14 of the 23 agencies surveyed lacked a formal internal definition distinguishing pilot from production deployment. That reflects how new this governance layer is, not a finding of negligence. But it means that when your buyer says "we have twelve AI systems in production," you don't yet know what they mean.
Skip "how many AI systems do you have?" and ask "what does your agency count as a production deployment?" The answer tells you more about their governance maturity than any number they'll volunteer.
A buyer who answers with reference to their ATO process, their monitoring and incident response procedures, or their model risk management framework is describing a real production environment. A buyer who answers by describing the number of use cases in their inventory is describing something earlier in the pipeline, and probably hasn't fully made the distinction themselves. That's a positioning signal, not a disqualifier.
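One way to structure discovery notes is to record, for each system the buyer names, which of the four GAO criteria actually came up in conversation. The sketch below is an illustration under stated assumptions: the class, field names, and sample systems are hypothetical, and the criteria are paraphrased from this article's summary of the report, not taken from GAO's own wording.

```python
from dataclasses import dataclass, fields


@dataclass
class ProductionEvidence:
    """The four production criteria as summarized in this article (paraphrased)."""
    active_user_base: bool = False
    documented_governance: bool = False
    ato_or_equivalent: bool = False
    ongoing_monitoring: bool = False

    def missing(self) -> list[str]:
        # Names of the criteria the buyer has not yet evidenced.
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def meets_all_four(self) -> bool:
        return not self.missing()


# Hypothetical discovery notes, echoing the two examples given earlier.
claimed = {
    "help desk ticket router": ProductionEvidence(True, True, True, True),
    "program office chatbot": ProductionEvidence(),
}

for name, evidence in claimed.items():
    if evidence.meets_all_four():
        print(f"{name}: evidence for all four criteria")
    else:
        print(f"{name}: still to confirm {evidence.missing()}")
```

The list of missing criteria doubles as the follow-up agenda: each gap is a question to ask, not a reason to disqualify the account.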
Reading the Room Before the Meeting
The data suggests a rough calibration framework for account planning. Treat these as priors to be tested, not conclusions to be assumed.
Calling on a defense component or large civilian agency with a named CAIO and a published AI strategy that references specific systems: assume some genuine production deployments exist, ask about governance infrastructure and ATO timelines, and listen for whether they distinguish between their inventory count and their production count. The sophisticated buyer at these agencies will know the difference.
Calling on a mid-size civilian agency whose AI strategy was published in 2024 and references "exploring opportunities": assume the inventory number is mostly pilots, assume the CAIO is managing this alongside other responsibilities, and ask about what happened to the pilots they've already run. The answer to "what did you learn from your last pilot?" is more diagnostic than any number they'll give you.
When the buyer cites a specific number from their OMB inventory submission with confidence: ask what criteria they use to classify something as production. If they can't answer, you've learned something important about where the governance infrastructure actually is.
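For teams that keep account plans in a structured form, the three cases above can be sketched as a simple lookup. Everything here is an assumption made for illustration: the segment labels, field names, and function are hypothetical, and the output is a starting prior to test in conversation, not a score.

```python
from dataclasses import dataclass

DEFINITION_QUESTION = "What does your agency count as a production deployment?"
CRITERIA_QUESTION = "What criteria do you use to classify something as production?"


@dataclass
class Account:
    # Hypothetical labels: "defense", "large_civilian", "mid_size_civilian"
    segment: str
    named_caio: bool = False
    strategy_names_specific_systems: bool = False
    cites_inventory_number: bool = False


def discovery_plan(account: Account) -> dict:
    """Return a starting prior and opening questions for a first meeting."""
    likely_mature = (
        account.segment in {"defense", "large_civilian"}
        and account.named_caio
        and account.strategy_names_specific_systems
    )
    if likely_mature:
        plan = {
            "prior": "some genuine production deployments exist",
            "questions": [
                "What governance infrastructure surrounds your production systems?",
                "What do ATO timelines look like for AI systems?",
            ],
        }
    elif account.segment == "mid_size_civilian":
        plan = {
            "prior": "inventory number is mostly pilots; CAIO likely has other duties",
            "questions": ["What did you learn from your last pilot?"],
        }
    else:
        # Cases the framework does not cover: fall back to the definitional question.
        plan = {"prior": "unknown", "questions": [DEFINITION_QUESTION]}

    if account.cites_inventory_number:
        plan["questions"].append(CRITERIA_QUESTION)
    return plan


plan = discovery_plan(Account("mid_size_civilian", cites_inventory_number=True))
print(plan["prior"])
for question in plan["questions"]:
    print("-", question)
```

The fallback branch is deliberate: when an account does not match either profile, the definitional question is the safest opener because it assumes nothing about maturity.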
A Note on the Numbers You'll Encounter
Buyers will cite the OMB inventory. Some will cite GAO reports selectively. A few will reference the Partnership for Public Service assessments or Brookings Institution analyses of federal AI capacity, which tend to be more methodologically transparent than agency self-reports but cover a narrower slice of the landscape.
When a buyer's number and a GAO finding conflict, the GAO number is almost always the more conservative and more reliable one. That isn't because agencies are being dishonest; it's because GAO applies consistent criteria across agencies and the inventory doesn't. The inventory is a useful signal of intent and activity. GAO is measuring something closer to operational reality.
The federal AI market is real, it's growing, and the agencies that have crossed from pilot to production are building the governance infrastructure that creates durable procurement relationships. But the aggregate numbers, taken at face value, will consistently make the market look more mature than it is. Your job in the first conversation is to figure out which side of the gap your buyer is actually on, and the fastest path there is asking them to define their terms.
Sources cited: GAO-26-105412 (October 2025), OMB AI Use Case Inventory public data (December 2025), Partnership for Public Service Federal AI Readiness Assessment (2025). The OMB inventory data is publicly available; the GAO report and PPS assessment include methodology disclosures that the inventory does not. Weight accordingly.

