A weights file is a serialized collection of floating-point numbers. Billions of them, in the case of a large model: Llama 3 70B is roughly 140 gigabytes of parameters at 16-bit precision, stored in a format called safetensors. When Meta "releases" Llama, they're publishing those files to a download endpoint. Not a running service. Not an API. A file you download, load onto compute you control, and run yourself.
That's the mechanical reality underneath the word "open," and it's worth holding onto precisely, because the word is currently doing more work than it can support.
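To make the "file of numbers" point concrete, here is a minimal sketch in Python using only the standard library. It first reproduces the 140-gigabyte figure, assuming 16-bit floats (2 bytes per parameter). It then writes and reads back a tiny file laid out the way the safetensors documentation describes: an 8-byte little-endian header length, a JSON header describing each tensor, then the raw bytes. The tensor name `demo.weight` and all helper names are invented for illustration; real checkpoints are normally read with the safetensors library rather than by hand.

```python
import json
import struct
import tempfile
from pathlib import Path


def estimated_size_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Rough size of the parameters alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


def write_tiny_safetensors(path: Path, name: str, values: list[float]) -> None:
    """Write one 1-D float32 tensor in the documented safetensors layout."""
    data = struct.pack(f"<{len(values)}f", *values)
    header = {
        name: {"dtype": "F32", "shape": [len(values)], "data_offsets": [0, len(data)]}
    }
    header_bytes = json.dumps(header).encode("utf-8")
    # 8-byte little-endian header length, then the JSON header, then raw tensor bytes.
    path.write_bytes(struct.pack("<Q", len(header_bytes)) + header_bytes + data)


def read_header(path: Path) -> dict:
    """Read only the JSON header; no tensor data is loaded."""
    with path.open("rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def read_tensor(path: Path, name: str) -> list[float]:
    """Decode one float32 tensor using the offsets recorded in the header."""
    with path.open("rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        begin, end = header[name]["data_offsets"]
        # Offsets are relative to the start of the data section.
        f.seek(8 + header_len + begin)
        raw = f.read(end - begin)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


# 70 billion parameters at 2 bytes each.
size_70b = estimated_size_gb(70_000_000_000)

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.safetensors"
    write_tiny_safetensors(demo_path, "demo.weight", [0.5, -1.25, 2.0])
    demo_header = read_header(demo_path)
    demo_values = read_tensor(demo_path, "demo.weight")

print(size_70b)
print(demo_header)
print(demo_values)
```

Nothing in the file is executable: the header is a table of names, shapes, and byte offsets, and everything after it is numbers. A production checkpoint has the same shape, just with far more entries and usually split across several files.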
Three Things "Open" Might Mean
The software industry has a definition for open source. The Open Source Initiative requires that software be freely redistributable, include source code, allow modifications and derived works, and impose no restrictions on who can use it or what they can use it for. That definition is specific, and most AI models that call themselves "open" don't meet it.
The honest term for what most labs release is open-weight: the trained model artifact is publicly available, but the training data and training code are not. You get the output of the process, not the process itself. You can run the model, fine-tune it, build products on it. You cannot audit what it was trained on, reproduce the training run, or verify the data pipeline.
Open source in the strict sense would require all of that. Almost no major model meets this bar. EleutherAI's work comes closest, with released training code and documented datasets. The models most commonly described as "open" — Llama, Mistral, Gemma, Qwen, DeepSeek — are open-weight.
Source-available is a third category, mostly relevant to software rather than models: the code is visible, but the license restricts what you can do with it, most often commercial use. You'll encounter this framing occasionally in AI tooling, less often in model releases themselves.
When a procurement document or a vendor says "open source AI," ask which of these three they mean. The answer changes what you can do with it.
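As a rough illustration, the three definitions can be reduced to a checklist. The sketch below is a deliberate simplification, not a legal test: the `Release` fields and the `classify` function are hypothetical names, and applying "source-available" to a model release is an extrapolation from the software usage described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    weights_public: bool
    training_code_public: bool
    training_data_public: bool
    license_restricts_use: bool  # limits on who may use it, or for what


def classify(release: Release) -> str:
    """Map a release onto the categories discussed above (simplified)."""
    if not release.weights_public:
        return "not released"
    # "The process itself" means both the training code and the training data.
    process_visible = release.training_code_public and release.training_data_public
    if process_visible:
        return "source-available" if release.license_restricts_use else "open source"
    # Weights only: you get the output of the process, not the process.
    return "open-weight"


typical_lab_release = Release(True, False, False, license_restricts_use=False)
print(classify(typical_lab_release))  # open-weight
```

Note that a permissive license alone does not move a release out of the open-weight bucket. What separates the categories is whether the training code and data are published, not how generous the terms on the weights are.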
The Licensing Landscape, Mapped
Open-weight doesn't mean license-free. The weights come with terms, and the terms vary.
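Apache 2.0 is the most permissive common license. Mistral 7B and Mixtral use it, as do most models in Alibaba's Qwen 2.5 series (a few sizes ship under Alibaba's own Qwen license instead). Google's Gemma is a useful counterexample: it has shipped under Google's custom Gemma Terms of Use rather than Apache 2.0, so check the model card for the specific release. Under Apache 2.0, commercial use is unrestricted. You can build a product, charge for it, and modify the model without releasing your changes. Attribution is required (you need to preserve copyright and license notices), but that's a documentation obligation, not an operational one. For enterprise deployment, Apache 2.0 is effectively a green light.
MIT is similarly permissive, arguably simpler. DeepSeek's R1 release is MIT-licensed, though some earlier DeepSeek weights shipped under a custom model license, so confirm the license for each release. Same commercial freedom, same attribution requirement, shorter license text. If you've ever shipped software under MIT, the obligations here are identical.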
Meta's Llama Community License is where it gets interesting, though less interesting than it sounds for most enterprise use cases. The license is broadly permissive — commercial use allowed, fine-tuning allowed, redistribution allowed — with one notable carve-out: if your product or service has more than 700 million monthly active users, you need a separate license from Meta. That threshold exists to prevent the hyperscalers from building Llama-powered products that compete with Meta at scale without a commercial arrangement.
For a federal agency or an enterprise deploying an internal tool, 700 million MAU is not a number you will reach. The clause matters to Google, Microsoft, and a handful of consumer platforms. It does not matter to your accounts.
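The landscape above is small enough to fit in a lookup table. The following Python sketch encodes only the terms described in this section; the `LICENSES` keys, the `LicenseTerms` fields, and the `needs_separate_license` helper are hypothetical names for illustration, and the table is a summary, not a substitute for reading the license text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LicenseTerms:
    commercial_use: bool
    # Monthly-active-user ceiling above which a separate license is needed.
    # None means the license has no user-count carve-out.
    mau_threshold: Optional[int]


# Summary of the terms as described in this section (illustrative only).
LICENSES = {
    "apache-2.0": LicenseTerms(commercial_use=True, mau_threshold=None),
    "mit": LicenseTerms(commercial_use=True, mau_threshold=None),
    "llama-community": LicenseTerms(commercial_use=True, mau_threshold=700_000_000),
}


def needs_separate_license(license_id: str, monthly_active_users: int) -> bool:
    """True if the deployment exceeds the license's user-count threshold."""
    terms = LICENSES[license_id]
    if terms.mau_threshold is None:
        return False
    # The carve-out applies to "more than" the threshold, so use a strict comparison.
    return monthly_active_users > terms.mau_threshold


# An internal agency tool with 25,000 users, running a Llama model.
print(needs_separate_license("llama-community", 25_000))  # False
# A consumer platform with a billion monthly users.
print(needs_separate_license("llama-community", 1_000_000_000))  # True
```

For almost every account the function returns False regardless of which row is selected, which is the practical point: the threshold is a question for a handful of consumer platforms, not for an internal deployment.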
When You'll Actually Need This
An agency RFP includes language requiring "open source" AI components, or a prospect tells you their security team will only approve "open" models. Neither statement means anything until you know which definition they're using.
An agency that wants open-weight models for air-gapped deployment has a solvable problem — Apache 2.0 or MIT models can be downloaded, run on-premises, and integrated with Okta's workforce identity stack without ongoing vendor dependency. One that wants open source in the OSI sense, with auditable training data and reproducible training runs, has a much harder problem, because that tier of transparency barely exists in production-quality models yet.
In competitive conversations, it's worth pressing on the specific license. A vendor claiming their model is "open source" when it's actually open-weight under a custom license isn't necessarily lying; the terminology is genuinely unsettled. Ask for the license text. The answer is usually a URL.
Okta Concept Mapping: The closest IDAM analogy is receiving a compiled binary under a permissive license. You can run it, redistribute it, build products on it — you just can't see the source that produced it. This is where the "open source" intuition you hold from software starts to mislead you: in software, open source means you can audit the code for vulnerabilities and verify what it does. With open-weight models, you get the artifact that does inference, not the data or process that shaped it. The artifact is yours to use. What produced it remains opaque.

