Three categories of model availability show up in public sector AI conversations: true open source (per the OSI's OSAID v1.0), open weights (downloadable model files under various licenses), and closed (API access only, provider holds everything). Your buyer will use "open source" to mean all three, sometimes in the same sentence. The vocabulary that earns you credibility: "open-weight" when they mean downloadable, "open source" only when the training data and process are also published, and "closed" when they mean API-only. That distinction alone puts you ahead of most people in the room, because the license page tells a very different story than the press release.
Open Source (OSI OSAID v1.0 Definition)
What it is: A model released with weights, training code, and sufficient data information to reproduce the training process, all under OSI-approved licenses.
What it does: Gives you the full recipe. You can verify how the model was trained, what data shaped its behavior, and reproduce the result independently. This is the only category where "we audited the model" can mean something beyond "we tested its outputs."
Who's behind it: Research groups rather than commercial product teams. The validated models are Pythia (EleutherAI), OLMo (AI2), Amber and CrystalCoder (LLM360), and T5 (Google). That list hasn't changed since October 2024. No frontier-class model has achieved OSAID compliance, and the reason is structural: commercial labs treat training data as competitive advantage. Disclosing detailed training data information and the full training process is what OSAID requires and what every commercial incentive opposes. Meta has been openly critical of the definition, arguing "there is no single open source AI definition." Google and Microsoft, after discussions with OSI, agreed to stop calling their models "open source." Meta has not.
What makes it distinct: Training data transparency. You get the ingredients list and the finished product. The catch: none of these models appear in enterprise procurement conversations. They're research artifacts, useful for understanding what "open source" should mean but not for running inference at scale in a federal environment. The definition itself is under revision, with OSI targeting a Q4 2026 update, so even this narrow category is a moving target.
Open Weights
What it is: A model whose weight files are publicly downloadable, but whose training data and full training process are withheld.
What it does: Lets you download the model, host it on your own infrastructure, fine-tune it, and run inference without calling a vendor API. You control where the computation happens and where your data flows. You do not get visibility into what the model learned or how it learned it.
Who's behind it: This is where the license terms diverge sharply, and where the word "open" starts doing work it can't support:
- DeepSeek R1 — MIT license. Maximally permissive. No commercial restrictions, no MAU thresholds. Training data not released.
- Llama 4 — Meta Community License. Commercial use permitted, but services exceeding 700 million MAUs must negotiate separately with Meta. An Acceptable Use Policy prohibits military, ITAR, nuclear, and espionage applications. Multimodal models are restricted for EU-domiciled entities. You must display a "Built with Llama" badge.
- Qwen 3 — Apache 2.0 for the open-weight releases. No MAU thresholds, no commercial restrictions. Note: some Qwen variants (Qwen3.5-Omni, Qwen3.6-Plus) are proprietary and API-only.
- Mistral Large 3 — Apache 2.0. No commercial restrictions.
- Grok 2.5 — Custom community license. Explicitly prohibits using weights to train other AI models. Businesses over $1M annual revenue must negotiate commercial terms. The license is revocable. Note: xAI has not published the full license text at a stable public URL; the terms described here are sourced from secondary reporting and community analysis, which is itself a data point about transparency.
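The license terms in the list above can be captured as structured data so a use case gets screened against them before legal review does it for you. The sketch below is illustrative only: `LicenseTerms`, `screen`, and the use-case labels are hypothetical names, and the values simply restate the terms as described in this section (for Grok 2.5, from secondary reporting), not the license texts themselves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LicenseTerms:
    """Hypothetical record of the license terms summarized in this section."""
    model: str
    license_name: str
    prohibited_uses: frozenset = frozenset()
    mau_threshold: Optional[int] = None          # separate terms required above this
    revenue_threshold_usd: Optional[int] = None  # separate terms required above this
    revocable: bool = False


TERMS = [
    LicenseTerms("DeepSeek R1", "MIT"),
    LicenseTerms(
        "Llama 4",
        "Meta Community License",
        prohibited_uses=frozenset({"military", "itar", "nuclear", "espionage"}),
        mau_threshold=700_000_000,
    ),
    LicenseTerms("Qwen 3", "Apache 2.0"),
    LicenseTerms("Mistral Large 3", "Apache 2.0"),
    LicenseTerms(
        "Grok 2.5",
        "Custom community license",
        prohibited_uses=frozenset({"train_other_models"}),
        revenue_threshold_usd=1_000_000,
        revocable=True,
    ),
]


def screen(terms, use_case, monthly_active_users=0, annual_revenue_usd=0):
    """Return a list of license flags for one model and one intended use."""
    flags = []
    if use_case in terms.prohibited_uses:
        flags.append(f"use case '{use_case}' is prohibited")
    if terms.mau_threshold is not None and monthly_active_users > terms.mau_threshold:
        flags.append("MAU threshold exceeded: separate terms required")
    if (terms.revenue_threshold_usd is not None
            and annual_revenue_usd > terms.revenue_threshold_usd):
        flags.append("revenue threshold exceeded: separate terms required")
    if terms.revocable:
        flags.append("license is revocable")
    return flags


# Example: screen every model for a defense use case at a $5M-revenue integrator.
for t in TERMS:
    print(t.model, "->", screen(t, "military", annual_revenue_usd=5_000_000))
```

An empty list means none of the terms summarized here were triggered. It does not mean the license has been reviewed.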
What makes it distinct: You control the hosting. That's genuinely valuable for data sovereignty, air-gapped environments, and keeping inference traffic off third-party infrastructure. But "downloadable" and "auditable" are different words for a reason.
Open-weight models work like accepting a SAML assertion from a federated IdP. You trust the output (the assertion, the inference result) without auditing the internal process that produced it (the partner's directory, the training data). Where this analogy breaks: with federation, you can require specific authentication assurance levels via LoA policies. Nothing equivalent exists for specifying training data quality on a model you downloaded from Hugging Face.
Closed
What it is: A model accessible only through the provider's API, with no weight release and no self-hosting option.
What it does: You send prompts, you get responses. The provider handles infrastructure, scaling, and model updates. You get contractual commitments (data handling, SOC 2, sometimes FedRAMP authorization) but no ability to run the model on your own infrastructure or inspect it beyond its outputs.
Who's behind it: OpenAI (GPT-4o, o3), Anthropic (Claude 4 Opus and Sonnet), Google (Gemini Ultra via API). These are the models most federal agencies encounter first because they're the easiest to procure as a service. Increasingly, though, closed models are accessed through intermediary platforms: Amazon Bedrock, Azure AI, OpenRouter. Each intermediary adds its own contractual layer and data-flow path, which means the question "where does my data go?" has more stops than the buyer might assume. The provider's terms of service govern the model; the intermediary's terms govern the routing.
What makes it distinct: The vendor holds the security obligation. You can't audit the training process (same limitation as open weights), and you can't inspect the weights at all, but you get contractual and compliance artifacts: SOC 2 reports, data processing agreements, incident response commitments. For agencies that need a vendor to point at during an audit, this is the path of least organizational resistance. The tradeoff is control: the provider can update, deprecate, or modify the model without your consent, and your data transits infrastructure you don't own.
Comparison — Trait-Led Analysis
I'm organizing this around the four dimensions that actually drive public sector model-availability decisions: self-hosting capability, audit depth, license restrictions, and compliance posture. The table splits audit depth into two rows and adds three related rows: data sovereignty, update/recall control, and origin transparency. A flat category-by-category summary would obscure the fact that the same model can be strong on one dimension and weak on another. Trait-led analysis lets the buyer's priority determine which row matters most.
| Dimension | Open Source (OSAID) | Open Weights | Closed |
|---|---|---|---|
| Self-hosting | Yes, full stack available | Yes, weights downloadable, run on your infra | No, API only (or via intermediary platforms like Bedrock/Azure) |
| Audit: training process | Yes, training code and data information disclosed | No, weights only; training data withheld | No, fully proprietary |
| Audit: model behavior | Full: output testing plus training reproduction | Partial: output testing, red-teaming, fine-tuning observation (you can test what it does, though you can't verify why) | Partial: output testing only, within the provider's API constraints |
| License restrictions | Apache 2.0 or equivalent; minimal | Varies wildly: MIT on DeepSeek R1; a separate-agreement requirement above 700M MAUs and a military-use prohibition on Llama 4; a revocable license that bars training other models on Grok 2.5 | Commercial ToS; usage limits, rate limits, data retention terms set by provider |
| Federal compliance artifacts | None standard; research models lack SOC 2 or vendor support | You own the compliance burden unless a cloud provider wraps it (e.g., Azure hosting Llama 4) | Vendor provides SOC 2, data handling agreements, potentially FedRAMP authorization |
| Data sovereignty | Full control | Full control of inference; no control over what's embedded in weights from training | Provider controls data routing; contractual restrictions are your lever |
| Update/recall control | You control versioning | You control versioning; once downloaded, weights can't be recalled by the developer | Provider can update, deprecate, or modify the model without your consent |
| Origin transparency | High, provenance documented | Low to moderate; RAND's research (May 2026) found only 1 of 37 open-weight model families met proportional evaluation standards | Low, vendor attestations only |
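One way to read the table trait-first is to start from the buyer's non-negotiable requirements and see which categories survive. The sketch below is a simplification under stated assumptions: the trait names and the `categories_meeting` helper are hypothetical, and each True/False collapses a table cell that carries more nuance (the "Partial" behavior-audit cells and the cloud-provider wrapper for open weights are not modeled).

```python
# Yes/no reading of four rows from the comparison table (simplified on purpose).
TRAITS = {
    "open_source_osaid": {
        "self_hosting": True,
        "training_process_audit": True,
        "vendor_compliance_artifacts": False,  # research models lack SOC 2 / vendor support
        "buyer_controls_versioning": True,
    },
    "open_weights": {
        "self_hosting": True,
        "training_process_audit": False,
        "vendor_compliance_artifacts": False,  # buyer owns the burden unless a cloud wraps it
        "buyer_controls_versioning": True,
    },
    "closed": {
        "self_hosting": False,
        "training_process_audit": False,
        "vendor_compliance_artifacts": True,
        "buyer_controls_versioning": False,
    },
}


def categories_meeting(required_traits):
    """Return the categories where every required trait is True.

    Raises KeyError for a trait name that is not in TRAITS.
    """
    return [
        name
        for name, traits in TRAITS.items()
        if all(traits[t] for t in required_traits)
    ]


# Example: a buyer who needs both self-hosting and vendor compliance artifacts.
print(categories_meeting(["self_hosting", "vendor_compliance_artifacts"]))
```

That last query returns an empty list, which is the point of going trait-first: no single category delivers both hosting control and vendor-supplied compliance artifacts, so the buyer has to pick which one matters more.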
Two policy facts worth knowing. First, OMB M-26-04 (December 2025) directs agencies to avoid requesting sensitive technical data such as specific model weights during procurement, steering them instead toward "enough information" to assess risk management. The audit gap began as a technical limitation, and OMB is now codifying it into policy posture. Second, OMB M-25-22 instructs agencies to "maximize the use of AI products and services that are developed and produced in the United States." Neither memo defines what "developed and produced in the United States" means for a model trained on global data by a multinational team, but the signal matters in conversations about DeepSeek and Qwen.
IBM's framing remains the cleanest statement of the structural problem across all three categories:
"Without the training data or training code, others can't scrutinize or recreate the training process."
That sentence applies to every model in your buyer's shortlist except the five research models nobody is deploying.
Llama 4's AUP prohibiting military and ITAR use functions like scope restrictions on an OAuth access token. The artifact (the weight file) is technically valid, but the terms constrain what you're authorized to do with it. An AE who's explained OAuth scopes to a buyer can use the same framing: the license is the authorization policy, and it travels with the artifact.
The "self-host for security" argument for open weights maps to the on-prem vs. cloud identity debate your buyers already know. Running your own IdP on-prem gives you infrastructure control. That doesn't mean you wrote the authentication code or can audit every decision the system makes. Same principle: hosting weights on your infrastructure controls data flow. It tells you nothing about model provenance. When a buyer conflates self-hosting with auditability, that's a familiar correction to make.
How to Say This in the Field
| Don't say | Do say | Why it matters |
|---|---|---|
| "DeepSeek is open source" | "DeepSeek R1 is open-weight under MIT. You can host it yourself, but the training data isn't published, so you can't audit what it learned." | Precision builds credibility with technical buyers who know the difference. |
| "Llama is open source" | "Llama 4 is open-weight under Meta's community license. It prohibits military and ITAR applications, requires a separate agreement with Meta above 700M MAUs, and restricts multimodal models for EU-domiciled entities." | Defense and IC buyers need to hear the ITAR restriction early, not after legal review. |
| "Open source means you can audit it" | "Open weights let you test the model's behavior. Auditing what it learned requires the training data, and none of the frontier models release that." | Separates output testing from training-process audit, the distinction RAND's May 2026 research centers on. |
| "Closed models are black boxes" | "Closed models and open-weight models are both black boxes at the training level. They diverge on who holds the infrastructure and the compliance obligation." | Reframes around who-owns-what. |
| "You should use open source for security" | "Self-hosting gives you data sovereignty. Training transparency is a separate security property, and open weights don't provide it." | Prevents the buyer from conflating two distinct benefits. |
| "The OSI says it's not open source" | "Only five research models meet the OSI's open source AI definition. Every model in enterprise conversations right now — Llama, DeepSeek, Qwen, Mistral — is open-weight, not open source by that standard." | Grounds the claim in a specific, countable fact. |
| "Chinese models are a security risk" | "DeepSeek and Qwen are under permissive licenses, but OMB guidance directs agencies to prioritize US-developed AI. The procurement conversation runs on origin policy, and the OMB guidance reflects that." | Keeps the conversation on documented policy rather than geopolitical speculation. |
| "Open weights are always better for compliance" | "Open weights give you hosting control. Closed models give you a vendor with SOC 2 and contractual obligations. Which matters more depends on your agency's compliance posture." | Matches the recommendation to the buyer's actual constraint. |
| "We don't need to worry about the license" | "Llama 4's license prohibits military and ITAR use. Grok 2.5's license is revocable and blocks training other models. License terms are the first thing your buyer's legal team will flag." | Surfacing restrictions before legal does earns trust. |
| "Just use the open-source version" | "Which model, under which license? Apache 2.0 on Mistral Large 3 is very different from Meta's community license on Llama 4. The word 'open' is doing a lot of work in this market." | Forces specificity, which is where the real conversation starts. |
The honest summary for 2026: "open" in AI mostly means you control where inference runs. That's a real, material benefit for agencies with data sovereignty requirements or air-gapped environments, and that's where the benefit ends. The training data that shaped every open-weight model remains undisclosed, and RAND found that only one of 37 open-weight model families reviewed in 2025–2026 met proportional evaluation standards. The OSI's definition is met by exactly five research models nobody is deploying in production. And OMB is steering agencies toward governance artifacts and vendor documentation rather than weight-level inspection. Know what "open" buys your buyer. Know what it doesn't. That's the conversation worth having next Tuesday.
Things to follow up on...
- OSAID v1.0 revision timeline: OSI is targeting a Q4 2026 update to the Open Source AI Definition, which could shift the validated model list and change which models can credibly call themselves "open source" under the OSI standard.
- RAND's open-weight evaluation gap: RAND's May 2026 research found that only 1 of 37 open-weight model families met proportional evaluation standards, establishing the most current baseline for how weak the audit story actually is.
- NIST Cyber AI Profile draft: NIST's preliminary Cybersecurity Framework Profile for AI adapts CSF 2.0 for AI systems but notably lacks guidance for agentic orchestration patterns where one AI directs another.
- OMB's weight-inspection stance: M-26-04 explicitly directs agencies to avoid requesting model weights during procurement, which codifies the audit gap as federal policy rather than just a technical limitation.

