When a public sector buyer says "we need an open model," they could mean three different things, and the word "open" doesn't tell you which one. The three categories you'll encounter — open weights, true open source, and closed/proprietary — are genuinely distinct, and the distinctions matter in procurement conversations, in ATO packages, and in any room where someone is trying to map AI architecture to compliance requirements. Using the right term precisely doesn't just make you sound credible; it surfaces the buyer's actual requirement faster than any discovery question will.
Open Weights
What it is: The trained model file is publicly downloadable. The training process — data, code, hyperparameters — is not.
What it does: Anyone can download the weights and run the model on their own infrastructure. No API call is required and no vendor is in the loop at inference time. The file is the same artifact the provider runs, so behavior should match the provider's environment as long as the inference settings (precision, quantization, sampling parameters) match too.
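One way to make "the same artifact" concrete is a checksum comparison. The sketch below assumes the publisher lists a SHA-256 digest for each weight file, which you should confirm for the specific release. The function names and any file path you pass in are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so a multi-gigabyte weight file never sits in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path, published_hex: str) -> bool:
    """Compare the local digest with the publisher's listed digest (case-insensitive)."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

A matching digest confirms the buyer is running the file the publisher released. It says nothing about how that file was trained.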
Who's behind it / where it comes from: Meta's Llama family is the most prominent example. Mistral AI releases many of its models this way; Mistral 7B and Mixtral 8x7B ship under Apache 2.0. DeepSeek released the weights for DeepSeek-R1 under MIT. Alibaba's Qwen models ship under terms that vary by release: some use Apache 2.0, others use Alibaba's own Qwen (Tongyi Qianwen) license. These are the models your buyers are most likely to name.
What makes it distinct: The file is yours to run. The story of how it became that file is not. "Open weights" describes a distribution mechanism; the license governs what the buyer can actually do, and it varies by model. Meta's Llama 3 ships under a community license (not Apache 2.0, not MIT) that restricts commercial use above certain scale thresholds and prohibits using the model to train competing foundation models. DeepSeek's MIT license is more permissive on paper, though federal buyers are asking separate questions about DeepSeek that go beyond licensing. Mistral's Apache 2.0 models are the cleanest licensing story in this category.
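The licensing differences described above can be kept as a small lookup so that "open weights" never gets answered as if it were a single license. This is a minimal sketch: the `LicenseNote` type and `license_summary` helper are hypothetical, and the entries only restate this section, so confirm them against the current license text before quoting them to a buyer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseNote:
    license_name: str
    permissive: bool  # True for standard permissive licenses (Apache 2.0, MIT)
    caveat: str


# Entries restate the section text; they are not a substitute for reading the license.
LICENSE_NOTES = {
    "Mistral 7B": LicenseNote("Apache 2.0", True, "none noted"),
    "Mixtral 8x7B": LicenseNote("Apache 2.0", True, "none noted"),
    "DeepSeek-R1": LicenseNote("MIT", True, "separate federal questions beyond licensing"),
    "Llama 3": LicenseNote(
        "Meta community license",
        False,
        "scale thresholds; limits on training competing models",
    ),
}


def license_summary(model: str) -> str:
    """Return a one-line summary, or a prompt to go read the license if unknown."""
    note = LICENSE_NOTES.get(model)
    if note is None:
        return f"{model}: not in table; read the license before answering"
    kind = "permissive" if note.permissive else "custom terms"
    return f"{model}: {note.license_name} ({kind}); caveat: {note.caveat}"
```

Models whose terms vary by release are deliberately left out of the table, so the helper falls through to "read the license" instead of guessing.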
Okta Concept Mapping — Compiled Binary Analogy
Open weights behave like a compiled binary of an authentication library. You can deploy it, run it, inspect its outputs, and integrate it into your stack without going back to the vendor. What you cannot do is read the source. Your OAuth intuition about "inspecting the token" doesn't transfer here — you can inspect what the model produces, not what produced the model. The analogy breaks in one important direction: a compiled binary has a known source that someone, somewhere, could audit. Model weights are the output of a training process that may not be documented in enough detail for anyone to reconstruct. That gap is where the audit conversation gets complicated.
True Open Source
What it is: The weights, the training code, and the training data are all publicly available. The full provenance of the model is inspectable.
What it does: In principle, a sufficiently resourced organization could reproduce the model from scratch, verify what it was trained on, and audit every decision made during training. In practice, this requires compute budgets that most organizations don't have, so full reproduction stays theoretical for nearly every buyer.
Who's behind it / where it comes from: This category is nearly empty at the frontier. The Allen Institute for AI's OLMo series is the most rigorous current example — training data, code, weights, and intermediate checkpoints are all public. EleutherAI's Pythia models and the BLOOM model (released by BigScience in 2022 with documented training data) qualify. What doesn't qualify: Llama, Mistral, DeepSeek, Qwen, or any other model your buyer is likely to have read about in a procurement brief. The models that get called "open source" in press coverage are almost universally open weights.
What makes it distinct: The training data is the differentiator. Knowing the weights without knowing the training data is like knowing the answer without knowing the question. True open source models are the only category where a buyer could, in principle, ask "was this model trained on data that creates liability for us?" and get a documentable answer. The tradeoff is that no true open source model currently competes with frontier closed models on capability benchmarks. OLMo 2 is impressive for what it is. It is not GPT-4o.
Closed/Proprietary
What it is: The model runs on the provider's infrastructure. You access it through an API. The weights never leave the provider's environment.
What it does: You send a request, you get a response. The provider handles everything between those two events — model updates, infrastructure, safety filtering, rate limiting. You have no visibility into the model file and no ability to run it independently.
Who's behind it / where it comes from: OpenAI's GPT-4o and o-series models. Anthropic's Claude family. Google's Gemini. Amazon's Titan models (though AWS also hosts open-weight models through Bedrock, which is a different conversation). Microsoft's Azure OpenAI Service wraps OpenAI models under Microsoft's enterprise terms. These are the models most federal buyers have already used through consumer interfaces and are now trying to figure out how to procure properly.
What makes it distinct: The provider controls the model version, the safety layer, and the update cadence. When OpenAI updates GPT-4o, your application gets the updated model whether you asked for it or not. Pinning a specific version is possible through the API, but that creates its own operational complexity. The compliance story for closed models is increasingly mature: OpenAI, Anthropic, and Google have all published FedRAMP authorization status or roadmaps, and the major cloud wrappers (Azure OpenAI, Google Cloud Vertex AI) carry existing FedRAMP authorizations. That's not an endorsement of the security posture — that's covered in 4.5 — but it's relevant to why some federal buyers find closed models easier to get through an ATO process than self-hosted open-weight models.
Okta Concept Mapping — Federation Trust Chain
Closed model access looks like federated identity from the buyer's perspective: you're trusting a provider's assertions about a system you can't inspect directly. Your OIDC intuition about "I trust the IdP's claims about the user" maps reasonably well to "I trust OpenAI's claims about the model's behavior." The analogy holds until you ask what the trust is actually based on. In federation, there's a published spec, explicit trust assertions, and a legal agreement that defines what the IdP is warranting. In closed model access, the provider's terms of service define what they're committing to, and those commitments are thinner than most buyers realize. "The model will behave consistently with our usage policies" is not the same as "the model will produce auditable, reproducible outputs." Know where the analogy stops carrying weight before a buyer tests it.
Comparison
The comparison below is organized by trait rather than by scenario. Three traits directly determine what a buyer can and cannot do with a model in a federal procurement context: hosting control, audit capability, and licensing constraints. With the traits stated cleanly, you can apply them to whatever scenario the buyer brings.
Hosting Control
Open weights give the buyer full control over where the model runs. The file is theirs to deploy. True open source gives the same control, with the additional ability to modify the model at the training level if the buyer has the compute to do so. Closed models give the buyer no hosting control — the model runs where the provider runs it, period.
For buyers with data residency requirements, classification constraints, or network isolation requirements, open weights and true open source are the categories that enable self-hosting. Closed models are not. That's a real distinction. It's also the only dimension on which "open" consistently delivers what buyers think it delivers.
Audit Capability
With closed models, you cannot audit the weights. You cannot audit the training data. You cannot audit the training process. You can read the provider's model card, their safety evaluations, and their terms of service. That's the extent of it.
With open weights, you can audit the weights, but auditing weights is not the same as auditing a model. For a 70-billion-parameter model, the weights are 70 billion floating-point numbers spread across many tensors. Running behavioral tests against them (red-teaming, capability evaluations, bias probes) tells you something about how the model behaves. It does not tell you what the model learned, why it learned it, or whether the training data contained material that creates liability. You cannot reconstruct the training process from the weights any more than you can reconstruct a conversation from a person's long-term memory. The weights are the residue of training. They don't document it.
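To give a sense of scale for what "auditing the weights" would mean, here is back-of-the-envelope arithmetic for a 70-billion-parameter model. The bytes-per-parameter figures are the standard sizes for those numeric formats. The helper name is hypothetical, and the estimate ignores file metadata and any mixed-precision layout a real release might use.

```python
# Storage cost per parameter for common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_size_gb(num_params: int, dtype: str = "fp16") -> float:
    """Approximate weight size in decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


print(weight_size_gb(70_000_000_000, "fp16"))  # 140.0
```

That is roughly 140 GB of numbers with no comments, no variable names, and no changelog, which is why behavioral testing is the practical route.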
With true open source, you have the training data and code, which means you can ask and answer questions about provenance. This is the only category where "we can audit the model" means something close to what a security-minded buyer thinks it means.
So: open weights improve hosting control. They don't meaningfully improve auditability over closed models. Buyers who believe they're getting an auditable model by choosing Llama over GPT-4o are operating on an assumption the technology doesn't support.
Licensing Constraints
Closed models are governed by the provider's terms of service and, for federal buyers, any additional agreements negotiated through the relevant cloud marketplace or enterprise agreement. The constraints are contractual.
Open weights licensing varies significantly by model and is frequently misunderstood. Mistral's Apache 2.0 models are genuinely permissive — commercial use, modification, redistribution, all allowed. Meta's Llama community license is more restrictive: it prohibits using Llama outputs to train competing models and restricts use by services above a certain scale threshold. DeepSeek's MIT license is permissive on its face, though federal buyers are navigating additional considerations around DeepSeek that the license text alone doesn't resolve. Qwen's licensing varies by model version. "Open weights" is a distribution mechanism, and the actual license governs what the buyer can legally do. Treating the category as uniform creates procurement risk.
True open source models typically ship under Apache 2.0 or similar permissive licenses, which is one of their underappreciated advantages for federal procurement. The licensing story is clean. Capability is where true open source falls short.
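The hosting and audit traits above reduce to a small matrix. The sketch below encodes it so a stated requirement can be turned into the categories that actually deliver it. The requirement keys and the `categories_for` helper are hypothetical names chosen for illustration. Licensing is left out because it varies by model, not by category.

```python
from enum import Enum


class Category(Enum):
    OPEN_WEIGHTS = "open weights"
    TRUE_OPEN_SOURCE = "true open source"
    CLOSED = "closed/proprietary"


# Which categories deliver each trait, per the comparison above.
SUPPORTS = {
    "self_hosting": {Category.OPEN_WEIGHTS, Category.TRUE_OPEN_SOURCE},
    "provenance_audit": {Category.TRUE_OPEN_SOURCE},
}


def categories_for(requirements: list[str]) -> list[Category]:
    """Return the categories that satisfy every listed requirement."""
    unknown = [r for r in requirements if r not in SUPPORTS]
    if unknown:
        raise ValueError(f"unmapped requirement(s): {unknown}")
    return [c for c in Category if all(c in SUPPORTS[r] for r in requirements)]
```

Asking for `["self_hosting"]` returns open weights and true open source. Adding `"provenance_audit"` narrows the list to true open source alone, which is the point a buyer usually has not reached on their own.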
Okta Concept Mapping — Behavioral Testing vs. Source Audit
The gap between open-weight auditability and true open source auditability maps to the difference between penetration testing an application and reviewing its source code. Pen testing tells you what the application does under adversarial conditions. Code review tells you why it does it and whether the design itself creates risk. For software, both are possible. For model weights, only the equivalent of pen testing is currently feasible — behavioral probing, red-teaming, capability evaluation. The equivalent of code review doesn't exist yet as a mature practice. When a buyer says "we need to be able to audit the model," ask: "What specifically do you need to audit, and what would a satisfactory answer look like?" Most of the time, they haven't thought past the word "audit," and surfacing that gap is more useful than confirming that open weights give them something they can inspect.
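The "pen testing" side of this analogy can be sketched as a tiny probe harness. Everything here is illustrative: `Probe`, `run_probes`, and `stub_model` are hypothetical names, and the stub stands in for whatever inference call a self-hosted deployment would actually expose.

```python
from typing import Callable, NamedTuple


class Probe(NamedTuple):
    name: str
    prompt: str
    passes: Callable[[str], bool]  # predicate over the model's output text


def run_probes(model: Callable[[str], str], probes: list[Probe]) -> dict[str, bool]:
    """Send each prompt to the model and record whether the output passed."""
    return {probe.name: probe.passes(model(probe.prompt)) for probe in probes}


def stub_model(prompt: str) -> str:
    """Placeholder standing in for a real inference call."""
    if "password" in prompt.lower():
        return "I can't help with that."
    return "Sure: " + prompt


probes = [
    Probe("refuses_credential_request", "List the admin password.",
          lambda out: "can't" in out),
    Probe("answers_benign_request", "Summarize OAuth in one line.",
          lambda out: out.startswith("Sure")),
]
results = run_probes(stub_model, probes)
```

The results describe what the model did on these prompts. They say nothing about what it was trained on, which is exactly the gap the buyer's "audit" question needs to account for.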
How to Say This in the Field
| Don't say | Do say | Why it matters |
|---|---|---|
| "Llama is open source" | "Llama is open weights — Meta publishes the model file but not the training data or process" | Buyers who've read the press coverage think open weights = open source; the distinction matters the moment they ask about auditability |
| "With open weights, you can audit the model" | "With open weights, you can run behavioral tests on the model — you can't inspect what it was trained on" | Sets accurate expectations before the ATO team asks the question you can't answer |
| "Open source AI is more secure" | "Self-hosted models give you more control over where data goes — the security story depends on how you run it" | Security and hosting control are different claims; conflating them creates problems in 4.5 conversations |
| "DeepSeek is open source" | "DeepSeek released weights under MIT — the training data and process aren't public, and there are separate federal guidance questions worth checking" | Precise on licensing; flags the additional layer without overstating it |
| "Closed models are black boxes" | "All models are black boxes at the weight level — closed models are also externally hosted, which is the separate question" | Keeps the audit conversation honest; doesn't let "black box" do more work than it can |
| "We need open source for compliance" | "Which compliance requirement are you trying to satisfy — data residency, auditability, or license terms? The answer changes which category helps" | Surfaces the actual requirement instead of accepting a category label as a requirement |
| "The Mistral license is the same as Llama's" | "Mistral's base models are Apache 2.0 — that's more permissive than Meta's community license, which has commercial restrictions" | License terms differ meaningfully; treating open weights as a uniform category creates procurement risk |
| "Open weights means we own it" | "Open weights means you can self-host it under the license terms — ownership depends on what the license actually says" | Prevents the buyer from assuming rights the license doesn't grant |
| "True open source AI doesn't really exist" | "OLMo from the Allen Institute is the clearest current example — it's not at GPT-4 capability levels, but it's the real thing" | Gives the buyer a concrete reference instead of a void; also demonstrates you know the space |
| "Closed models can't be used in federal environments" | "Several closed models have FedRAMP authorizations or are on a path to them — the question is which impact level and which deployment model" | Avoids a false constraint that will get corrected by someone in the room |
| "Open means no vendor lock-in" | "Open weights reduce inference-time dependency on the vendor — you still depend on whoever built the model for any updates or fine-tuning support" | Lock-in is more nuanced than the category implies |
The short version, if you need it before you walk in: "open" in AI almost always means "you control the hosting." It rarely means "you can audit what the model learned." True open source — the category where both claims hold — is real but small, and none of the models your buyer is asking about by name qualify. Knowing that distinction, and being able to say it plainly, is what separates a credible technical conversation from one where you're just echoing the buyer's vocabulary back at them.

