A trained AI model is a file. A large file, granted. Llama 4 Scout, Meta's mixture-of-experts model with 109 billion total parameters, runs somewhere between 54 and 216 GB depending on how aggressively you compress it. DeepSeek R1 is considerably larger. But the physical reality is mundane: these are collections of numerical weights, billions of parameters encoding statistical patterns learned during training. You can download them. Copy them. Move them between servers. Run them on hardware you control.
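The size range is plain arithmetic: parameter count times bits per weight, divided by eight. Here is a minimal sketch of that calculation, assuming decimal gigabytes and ignoring file metadata and any layers kept at higher precision. The function name `weights_size_gb` is illustrative and does not come from any library.

```python
def weights_size_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate size of a weights file in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1e9


SCOUT_PARAMS = 109e9  # total parameter count quoted in the text

# Compare common storage precisions for the same set of weights.
for bits in (16, 8, 4):
    size = weights_size_gb(SCOUT_PARAMS, bits)
    print(f"{bits:>2}-bit: about {size:.1f} GB")
```

The 4-bit and 16-bit results land close to the range quoted above. Small differences are to be expected, since published figures are usually rounded and real files often mix precisions.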
Once training is done, the weights don't change. They sit there, inert, until someone sends them a query and the model runs inference, generating a response token by token. Every deployment decision that actually matters to your buyer — what hardware runs it, where it's hosted, which jurisdiction the data touches, what the license permits, what it costs, how it gets customized — is about the envelope around that file.
The file is the invariant. The envelope is where all the variables live. That distinction organizes everything in this section.
Where the distinction earns its keep
Your buyers are evaluating models right now, and the procurement conversations have moved past "should we use AI" into "which model, hosted where, under what terms." When a public sector buyer asks about DeepSeek, they're asking about the envelope — where it runs, who touches the data, what the license permits.
DeepSeek R1 is an MIT-licensed model developed by a Chinese AI lab. If your buyer sends a query to DeepSeek's own API, that data routes to servers in China. If your buyer runs the same model through Amazon Bedrock, inference happens in AWS US regions — Virginia, Ohio, Oregon — and AWS states that "inputs and outputs aren't shared with any model providers."
Same weights. Byte-for-byte identical model. Radically different security posture. Entirely a function of the envelope.
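For copies of a model that you hold yourself, "byte-for-byte identical" is a checkable claim rather than a figure of speech. A minimal sketch, assuming two local weight files and using a SHA-256 digest as the comparison; the helper names and the file paths in the usage note are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so a multi-gigabyte file is never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_weights(a: Path, b: Path) -> bool:
    """True when both files produce the same digest, i.e. their contents match."""
    return sha256_of_file(a) == sha256_of_file(b)
```

Usage would look like `same_weights(Path("server_a/model.safetensors"), Path("server_b/model.safetensors"))`, with both paths standing in for wherever the copies actually live. A matching digest says the file is the same. It says nothing about the envelope, which is the point.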
A CISO will sign off on one of those configurations and send you back to the parking lot on the other. And the distinction collapses the moment anyone thinks of "DeepSeek" as a monolithic thing, instead of a file that can sit inside very different envelopes.
You already know that where a cloud tenant runs determines which data residency rules apply. That instinct transfers directly: where model inference runs determines the security posture of the interaction. Here's where the analogy breaks. In your IaaS world, the data is unique to each tenant, so it matters intrinsically. A model weight file is the same artifact everywhere. So the security question lands squarely on the envelope: is this deployment trustworthy? Your jurisdiction instinct is right. Your instinct to evaluate the model by who built it, though, will mislead you in front of a buyer who's done their homework on where it actually runs.
"Open source" doesn't mean what your buyer thinks
If the file/envelope frame feels clean, licensing is where you feel it flex under load.
Meta calls Llama 4 "open source." The Open Source Initiative says it isn't. The Free Software Foundation classified Llama's license as nonfree. Meta's official response: "There is no single open source AI definition." The OSI released its Open Source AI Definition v1.0 in October 2024, but the definition itself is contested enough that the OSI has a planned update for Q4 2026, and its own board elections were suspended in early 2026 amid governance disputes.
By the OSAID's strict criteria, almost none of the models your buyers are evaluating qualify as open source. DeepSeek R1, Llama 4, Qwen: they release weights but not training data. The models that do meet the standard aren't the ones topping benchmarks.
Buyers will use "open source" and "open weights" interchangeably, and the difference has real procurement implications. The license attached to the file constrains what the envelope is allowed to look like.
In traditional software, "open source" means access to source code: you can read, modify, and rebuild the binary. Releasing model weights is closer to releasing the compiled binary — you can run it, but you can't reproduce how it was built without the training data and training code. When a buyer says a model is "open source," ask whether they mean the weights are available (probably yes) or the training pipeline is reproducible (almost certainly no). The distinction is roughly analogous to having a vendor's compiled SAML library versus having their source and build toolchain. Both let you run the software. Only one lets you audit it.
What this section covers
The six articles that follow each take one dimension of the envelope and make it concrete.
Hardware covers what inference actually requires and why GPU memory is the binding constraint. Llama 4 Scout needs one H100 at 4-bit quantization; at full 16-bit precision, it needs four. Same model.
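The one-versus-four GPU figure can be sanity-checked with a rough estimate. This is a sketch under stated assumptions, not a sizing tool: it uses the 80 GB H100 variant, takes the 54 GB and 216 GB sizes quoted earlier, and adds a 20% headroom factor for the KV cache and activations. That headroom figure is an assumption chosen for illustration; real overhead depends on context length, batch size, and the serving stack.

```python
import math

H100_MEMORY_GB = 80       # memory on the 80 GB H100 variant
OVERHEAD_FRACTION = 0.20  # ASSUMPTION: headroom for KV cache and activations


def gpus_needed(
    weights_gb: float,
    gpu_memory_gb: float = H100_MEMORY_GB,
    overhead: float = OVERHEAD_FRACTION,
) -> int:
    """Smallest number of GPUs whose combined memory holds the weights plus headroom."""
    required_gb = weights_gb * (1 + overhead)
    return math.ceil(required_gb / gpu_memory_gb)


for label, size_gb in (("4-bit", 54), ("16-bit", 216)):
    print(f"{label}: {size_gb} GB of weights -> {gpus_needed(size_gb)} x H100")
```

With zero headroom, the 16-bit weights alone would squeeze onto three cards. The fourth appears as soon as you leave room for the model to actually do work, which is why GPU memory is the number to watch.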
Hosting maps the spectrum from API calls to self-hosted deployment, and what each option means for data handling, latency, and control.
Jurisdiction addresses where data goes during inference, why that's a different question from where the model was trained, and how Bedrock-style hosting changes the calculus for public sector buyers.
Licensing unpacks the actual terms behind "open" models, what they permit and restrict, and why this is the least settled dimension of all six.
Cost examines what inference costs at scale, why token pricing is harder to predict than it looks, and where the numbers are moving.
Customization covers fine-tuning, retrieval-augmented generation, and the shift toward what practitioners now call context engineering: shaping the full information environment around the model, well beyond the prompt itself.
All of this follows from one thing worth holding onto: the model is a fixed file. The envelope is where your buyer's actual decisions live.
Let's open the envelope.
Things to follow up on...
- Bedrock's inference routing tiers: AWS now offers in-region, geographic, and global cross-region inference options for DeepSeek models, each with different data residency implications that map directly to public sector procurement requirements.
- Agent cost compounding math: A Stanford Digital Economy Lab study found that agentic tasks consume 1,000× more tokens than single-turn queries because each step re-reads the entire accumulated context at full price.
- OSI governance in flux: The organization behind the Open Source AI Definition suspended its 2026 board elections to redesign its selection process, after candidates who disagreed with current directors were excluded in 2025.
- Red Hat's quantization findings: Red Hat ran over half a million evaluations on quantized LLMs and confirmed that 4-bit compression preserves model integrity while cutting memory requirements by 75% relative to 16-bit weights, which matters for the hardware dimension of the envelope.
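The agent cost item above rests on a simple mechanism: if every step re-reads the whole accumulated context, billed input tokens grow much faster than the number of steps. A minimal sketch of that mechanism follows. The prompt size, tokens added per step, and step count are hypothetical numbers chosen for illustration, and the sketch does not attempt to reproduce the study's figure.

```python
def agent_input_tokens(prompt_tokens: int, tokens_added_per_step: int, steps: int) -> int:
    """Total input tokens billed when each step re-reads the full accumulated context."""
    total = 0
    context = prompt_tokens
    for _ in range(steps):
        total += context                  # this step reads everything so far
        context += tokens_added_per_step  # its output and tool results join the context
    return total


# Hypothetical workload: 1,000-token prompt, 500 tokens added per step.
single_turn = agent_input_tokens(1_000, 500, steps=1)
agent_run = agent_input_tokens(1_000, 500, steps=50)
print(f"single turn: {single_turn} input tokens")
print(f"50-step agent: {agent_run} input tokens ({agent_run / single_turn:.1f}x)")
```

Doubling the step count roughly quadruples the context-driven part of the bill, which is why agent pricing is hard to eyeball from a per-token rate.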

