What "Open" Actually Means in AI Licensing

By Carey Whitten, May 5, 2026

The Spectrum, Defined

Open-weight means the trained model weights have been released publicly: you can download the parameters, run inference, fine-tune, and deploy. It says nothing about whether the training data, training code, or evaluation methodology are available. You have the artifact, not the recipe. Llama, in all its versions, is open-weight. So are Mistral's Mixtral family, Qwen, and Gemma.

Open-source, applied rigorously, means the weights, training code, and training data are all publicly available under a license that meets the Open Source Initiative's ten criteria, including free redistribution, no discrimination against fields of endeavor, and no discrimination against persons or groups. In practice, almost no major AI model meets this bar. Training datasets at scale involve licensed, scraped, or proprietary data that can't be fully redistributed. The OSI itself has acknowledged that its software definition does not map cleanly onto AI systems: in October 2024 it published a separate Open Source AI Definition (version 1.0), which asks for detailed information about the training data rather than the full dataset. When someone calls a model "open source" in a vendor pitch, ask which of the OSI criteria it satisfies. The answer is usually partial.

Source-available means the code or weights are visible — you can inspect them — but the license includes restrictions that disqualify it from OSI's definition. Use restrictions are the most common form: no commercial use, no use for training competing models, no deployment above a certain scale. Source-available is not open source. It is auditable proprietary software. The distinction matters enormously in procurement.
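
The definitions above turn on two separate questions: what was actually published, and what the license permits. As a minimal sketch, the rigorous open-source bar can be written as a checklist that reports what is missing. The `Release` record and the `open_source_gaps` function are hypothetical names invented for this illustration, not part of any OSI tooling, and the sample input is made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Release:
    """Hypothetical record of what a vendor actually published."""
    weights_public: bool
    training_code_public: bool
    training_data_public: bool
    license_osi_approved: bool


def open_source_gaps(release: Release) -> List[str]:
    """Return the elements missing from the rigorous open-source bar."""
    checks = [
        (release.weights_public, "weights"),
        (release.training_code_public, "training code"),
        (release.training_data_public, "training data"),
        (release.license_osi_approved, "OSI-approved license"),
    ]
    return [name for present, name in checks if not present]


def is_open_weight(release: Release) -> bool:
    """Open-weight only asks whether the weights can be downloaded."""
    return release.weights_public


# Illustrative input: weights under a permissive license, nothing else.
weights_only = Release(
    weights_public=True,
    training_code_public=False,
    training_data_public=False,
    license_osi_approved=True,
)
print(is_open_weight(weights_only))    # True
print(open_source_gaps(weights_only))  # ['training code', 'training data']
```

An empty list is the only result that supports an unqualified "open source" claim under the rigorous reading. Anything else tells you which follow-up question to ask the vendor.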

The Licensing Landscape

Three license types cover most of what you'll encounter in the field.

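Apache 2.0 is the cleanest commercial tier. It permits unrestricted commercial use, modification, and redistribution, requires attribution and preservation of license notices, and includes an explicit patent grant. Many of Mistral's openly released models and most of Alibaba's recent Qwen releases use Apache 2.0, but neither vendor applies it to every model, so check the license file on the exact release. Gemma needs the same check: Google has shipped Gemma releases under its own Gemma Terms of Use, which carry use restrictions that Apache 2.0 does not. If a buyer's procurement policy requires an OSI-approved license, Apache 2.0 satisfies it, though the training data question remains open.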

MIT is similarly permissive, with even lighter attribution requirements. DeepSeek's recent releases, including R1, carry an MIT license. For commercial deployment purposes, MIT and Apache 2.0 are functionally equivalent. The main difference, that MIT lacks Apache 2.0's explicit patent grant, matters more to lawyers than to practitioners.

The Llama Community License is where the nuance lives. Meta's license for the Llama model family permits commercial use (fine-tuning, deployment, building products) with one notable exception: if your products or services had more than 700 million monthly active users in the calendar month before the model version's release date, you must request a separate commercial license from Meta. For the overwhelming majority of enterprise and public sector deployments, that threshold is irrelevant. A federal agency running Llama on-premise for internal workflows is nowhere near 700M MAU. But the clause does two things worth understanding. First, it means Llama is not OSI-compliant: singling out the largest licensees conflicts with the "no discrimination against persons or groups" criterion, and the license's acceptable use policy separately conflicts with "no discrimination against fields of endeavor." Second, it means hyperscalers building consumer-facing products on Llama are in a different licensing situation than an enterprise running it behind a firewall. When a buyer's legal team flags "the Llama license," this is what they found.

Okta Concept Mapping

The closest IDAM parallel here is access control policy attached to a resource — the model weights are the resource, the license is the policy. Open-weight is roughly analogous to making a resource publicly readable: the ACL permits access, but the policy still governs what the accessor can do with it. Where the analogy breaks: in IDAM, policy is enforced by a system at runtime. In AI licensing, policy is enforced by contract law after the fact. There is no technical mechanism that prevents a 700M-MAU platform from deploying Llama without a commercial license — only legal exposure. Identity architects are trained to ask "who enforces this, at what layer, with what mechanism?" In AI licensing, the honest answer to that question is "your legal department, retroactively."
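
The enforcement gap is easier to see side by side. The sketch below is purely illustrative: `read_resource`, `load_weights`, and the `terms` dictionary are hypothetical names, and no real model loader or identity product is being modeled. The first function refuses the request before access happens. The second receives the license terms as data, and nothing in its code path evaluates them.

```python
class AccessDenied(Exception):
    """Raised when the policy check refuses a request."""


def read_resource(principal: str, allowed: set) -> str:
    """IDAM-style control: the check runs before access, on every request."""
    if principal not in allowed:
        raise AccessDenied(principal)
    return "resource contents"


def load_weights(deployer: str, license_terms: dict) -> str:
    """License-style control: the terms travel with the weights as text.

    Nothing in the load path evaluates them, so loading always succeeds.
    """
    return "weights loaded"


# Hypothetical callers.
try:
    read_resource("contractor", {"alice", "bob"})
except AccessDenied as exc:
    print(f"denied at runtime: {exc}")  # denied at runtime: contractor

terms = {"mau_limit": 700_000_000}
print(load_weights("any-platform", terms))  # weights loaded
```

The second call succeeds no matter who the deployer is. Whether it was permitted is a question for the contract, answered later and elsewhere.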

When You'll Need This

Take a public sector discovery call where the CIO mentions that their agency's AI policy requires "open source models" for any on-premise deployment. You are proposing a solution that uses Llama.

Confirming that Llama is open source and moving on is the wrong call. Ask what the policy is trying to achieve. If the intent is vendor independence and the ability to run inference without ongoing licensing fees, Llama satisfies that, and you can say so accurately. If the intent is OSI-compliant licensing, Llama does not satisfy it, and you need to know that before procurement legal does. A model released under Apache 2.0, such as many of the Qwen and Mistral releases, may be the better fit for that specific requirement.
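
The triage in that conversation can be sketched as a small lookup. This is an illustration of the reasoning, not a procurement tool: `PolicyIntent` and `license_fits` are hypothetical names, the two intents are the ones described above, and the table covers only the three licenses discussed in this piece.

```python
from enum import Enum


class PolicyIntent(Enum):
    """What the buyer's "open source" policy is actually trying to achieve."""
    VENDOR_INDEPENDENCE = "run it ourselves, no ongoing license fees"
    OSI_COMPLIANCE = "license must be OSI-approved"


# OSI approval status for the three licenses discussed in this piece.
OSI_APPROVED = {
    "Apache-2.0": True,
    "MIT": True,
    "Llama Community License": False,
}


def license_fits(intent: PolicyIntent, license_name: str) -> bool:
    """Return True if the license meets the stated policy intent."""
    if license_name not in OSI_APPROVED:
        raise KeyError(f"unknown license: {license_name}")
    if intent is PolicyIntent.OSI_COMPLIANCE:
        return OSI_APPROVED[license_name]
    # Vendor independence: all three licenses allow self-hosted inference
    # without per-seat fees (setting aside Llama's 700M MAU clause).
    return True


for intent in PolicyIntent:
    print(intent.name, license_fits(intent, "Llama Community License"))
# VENDOR_INDEPENDENCE True
# OSI_COMPLIANCE False
```

The same model and the same license produce two different answers. The only thing that changed is the definition the buyer is applying.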

"Open" in AI is a marketing gradient. Buyers who say "open source" usually mean "we can run it ourselves without a per-seat bill." That's a reasonable requirement, and most open-weight models under Apache 2.0 or MIT satisfy it. But the OSI definition exists for a reason — it was designed to prevent exactly the kind of license drift that "open source AI" now exhibits. When a buyer's procurement team applies it literally, they're not being difficult. They're doing their job.

Your job is to know which definition is in the room.