Where the Model Physically Runs

By Leigh Garrity, May 8, 2026

When a federal buyer asks where their AI runs, the answer depends on which of four configurations they've deployed. Those configurations (air-gapped on-premises hardware, a hyperscaler's managed AI service, a sovereign or government cloud environment, and the model provider's own API endpoint) each have distinct answers for hardware ownership, data movement, and authorization path. The vocabulary matters because buyers increasingly know the difference. Saying "we're using DeepSeek" in a federal account means something very different if the weights are loaded on AWS hardware in a US data center than if the application is calling DeepSeek's own API endpoint. The model weights can be identical in both cases. The infrastructure is not.

The Four Configurations

Configuration 1: Air-Gapped On-Premises Hardware

What it is: The model weights (the billions of numerical parameters that define the model's behavior) are loaded into GPU memory on hardware that the buyer physically owns and operates, with no connection to external networks.

What it does: Inference, the process of generating a response from a prompt, happens entirely inside the buyer's facility. A prompt goes in, a response comes out, and nothing crosses a network boundary that the buyer doesn't control. The hardware is typically a GPU cluster sized to the model's memory requirements. A 70-billion-parameter model needs roughly 140GB of GPU memory for the weights alone at 16-bit precision (2 bytes per parameter), which means multiple high-end GPUs before you've accounted for inference overhead.
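The memory figure above is simple arithmetic, and it can help to have it on hand when a buyer asks about a different model size. The following is a minimal sketch, assuming 16-bit weights (2 bytes per parameter) and counting only the weights themselves. The function names and the 80GB-per-GPU figure are illustrative assumptions, not a sizing recommendation; real deployments also need memory for inference overhead.

```python
import math


def weight_memory_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Memory needed for the weights alone, in decimal gigabytes.

    Assumes every parameter is stored at the same precision
    (2 bytes per parameter corresponds to 16-bit weights).
    """
    return num_params * bytes_per_param / 1e9


def min_gpus_for_weights(num_params: float, gpu_memory_gb: float,
                         bytes_per_param: float = 2.0) -> int:
    """Lower bound on GPU count: weights only, no inference overhead."""
    return math.ceil(weight_memory_gb(num_params, bytes_per_param) / gpu_memory_gb)


if __name__ == "__main__":
    # 70 billion parameters at 2 bytes each
    print(weight_memory_gb(70e9))                        # 140.0
    # Hypothetical 80GB accelerator; a floor, not a recommendation
    print(min_gpus_for_weights(70e9, gpu_memory_gb=80))  # 2
```

The result is a floor. Any production sizing would add headroom on top of it.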

Who controls the hardware: The buyer. Procurement goes through capital acquisition — hardware on an approved vehicle such as GSA Schedule or NASA SEWP. The hardware sits in the buyer's data center or secure facility. No third party has physical or logical access unless the buyer explicitly grants it.

What makes it distinct: The data never leaves the building. Never, by design. That's a physical description of the network topology, full stop. The tradeoff is that the buyer owns the entire operational burden: hardware procurement, model updates, inference optimization, and capacity planning. Nobody else is minding the infrastructure.

Okta Concept Mapping: This maps to on-premises Active Directory — the buyer owns the directory, the hardware, and the operational responsibility. The analogy holds for hardware ownership and data boundary. It breaks at access governance: AD has decades of tooling for managing who can authenticate and what they can reach; an air-gapped inference cluster typically doesn't. In a buyer conversation, that gap is worth naming explicitly: "You own the hardware, but who's authorized to submit prompts, and how is that enforced?"

Configuration 2: Hyperscaler Managed AI Service (DeepSeek on Bedrock)

What it is: The model weights are hosted and served by a major cloud provider — in this case, AWS — through a managed inference service, with the buyer accessing the model via API within their existing cloud environment.

What it does: AWS loads the DeepSeek weights onto their own GPU infrastructure in US data centers and exposes the model through the Bedrock API. The buyer's application sends prompts to Bedrock; Bedrock runs inference on AWS hardware; the response comes back. The buyer's data stays within the AWS network boundary. It does not reach DeepSeek's infrastructure.

Who controls the hardware: AWS. The buyer has no visibility into or control over the physical hardware. AWS operates the GPU clusters in their own data centers under their own security and operational controls. The buyer controls their application layer and the AWS IAM policies governing who can call the Bedrock API.
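To make "the IAM policies governing who can call the Bedrock API" concrete, here is a hedged sketch of what such a policy document might look like, built as a Python dictionary. The `bedrock:InvokeModel` action is the IAM action for invoking a model; the region, the model identifier, and the exact ARN shape shown here are placeholders and assumptions that should be checked against current AWS documentation before use.

```python
import json

# Placeholder ARN: the region and model identifier are hypothetical,
# and the ARN layout should be verified against AWS documentation.
MODEL_ARN = "arn:aws:bedrock:us-east-1::foundation-model/EXAMPLE-MODEL-ID"


def bedrock_invoke_policy(model_arn: str) -> dict:
    """Build an IAM policy document allowing invocation of one model only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowInvokeOneModel",
                "Effect": "Allow",
                "Action": ["bedrock:InvokeModel"],
                "Resource": [model_arn],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(bedrock_invoke_policy(MODEL_ARN), indent=2))
```

The point for a buyer conversation is that this layer belongs to the buyer: who may submit prompts is governed by the buyer's own identity policies, even though the hardware is not theirs.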

What makes it distinct: The model provider — DeepSeek — never sees the buyer's data. AWS is the operator. DeepSeek is the model file. "DeepSeek on Bedrock" is a categorically different statement from "calling DeepSeek's API," even though the model producing the response is the same. The buyer is trusting AWS's infrastructure, not DeepSeek's.

Okta Concept Mapping: This maps to federated SSO with a trusted identity provider. The buyer doesn't own the infrastructure, but they've established a trust relationship with a provider operating under known, contractually defined terms. The analogy holds for the trust boundary: you're trusting the operator, not the upstream vendor. It breaks when buyers assume "AWS manages it" means AWS manages the model's behavior — they manage the infrastructure, not the model's outputs or training data provenance. In a buyer conversation: "AWS is the operator here, the same way they're the operator for your RDS instances. You're trusting their infrastructure controls, not DeepSeek's."

Configuration 3: Sovereign or Government Cloud Environment

What it is: The model is hosted in a cloud environment specifically designated for government use, operated under government oversight, and physically located within a defined jurisdictional boundary.

What it does: Inference runs on hardware in a government-designated cloud — AWS GovCloud (US), Azure Government, or a purpose-built sovereign environment. The buyer accesses the model through an API or managed service within that environment. Data stays within the jurisdictional boundary and is subject to the operational controls required for that classification tier.

Who controls the hardware: A government-designated cloud provider, operating under a specific authorization framework. In the US federal context, this typically means a FedRAMP-authorized environment; for defense accounts, it may mean DoD Impact Level 4, 5, or 6 (IL4, IL5, IL6) environments with additional operational requirements. The provider owns the hardware; the government sets the operational terms.

What makes it distinct: The environment was built for government workloads. The authorization baseline already exists — the buyer inherits it rather than constructing it from scratch. That's the practical difference from spinning up a commercial cloud environment and starting an authorization package from zero. It's a procurement and authorization distinction as much as a physical one.

Configuration 4: Model Provider's Own API Endpoint

What it is: The buyer's application calls the model provider's API directly, with inference running on the provider's own infrastructure wherever that infrastructure is located.

What it does: A prompt leaves the buyer's environment, crosses the public internet (or a private connection), arrives at the model provider's infrastructure, runs inference on the provider's hardware, and returns a response. The provider's systems handle the request end-to-end. The provider has access to the prompt, the response, and any metadata associated with the call.

Who controls the hardware: The model provider. For OpenAI, that's infrastructure under OpenAI's operational control. For Anthropic, that's cloud infrastructure under Anthropic's operational control. For DeepSeek, that's infrastructure operated by DeepSeek, a Chinese company, in data centers outside the United States. The buyer has no visibility into or control over any of it.

What makes it distinct: The model provider is the operator. Their infrastructure, their logs, their network, their terms of service. For most commercial use cases, this is the fastest path to a working integration. For federal buyers, it's the configuration that generates the most questions in an authorization package — questions about where the data goes and who operates the environment receiving it, which often don't have clean answers yet in agency AI governance policies.

Comparing the Four Configurations

Three dimensions matter for buyer conversations: hardware control, data boundary, and federal procurement path. The four configurations differ on all three, so the comparison below is organized by dimension rather than by scenario. That lets you scan for the one dimension a buyer is asking about in a live conversation.

Hardware Control

Air-gapped on-premises is the only configuration where the buyer owns the physical hardware outright. Hyperscaler and sovereign cloud environments are both operated by third-party providers, but with different contractual structures and oversight requirements. Model provider API is operated by the model provider, with the least buyer visibility of the four.

Hardware ownership determines who bears the operational burden, who can be compelled to produce data under legal process, and which authorization frameworks apply. For federal buyers, it also determines who is responsible when something breaks at 2am.

Data Boundary

Air-gapped: data never leaves the buyer's physical facility. Hyperscaler (Bedrock): data stays within the cloud provider's network; does not reach the model provider. Sovereign/government cloud: data stays within the designated jurisdictional boundary. Model provider API: data crosses to the model provider's infrastructure.

The DeepSeek case makes this concrete. A prompt sent to DeepSeek on Bedrock stays on AWS infrastructure in the United States. A prompt sent to api.deepseek.com leaves the United States and reaches infrastructure operated by a Chinese company. The model producing the response is the same. The data boundary is completely different. This is the distinction worth having precise language for before you walk into a federal account.
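One way to keep this distinction honest in practice is to look at the hostname an application actually calls, since the hostname identifies the operator receiving the prompt. The sketch below is illustrative only: the suffix table is a hypothetical, incomplete mapping, and `bedrock-runtime.us-east-1.amazonaws.com` is used as an assumed example of a regional Bedrock runtime hostname. `api.deepseek.com` comes from the paragraph above.

```python
from urllib.parse import urlparse

# Illustrative, incomplete mapping from hostname suffix to the party
# operating the infrastructure behind it. Extend for your own environment.
OPERATOR_BY_SUFFIX = {
    "amazonaws.com": "AWS",
    "api.deepseek.com": "DeepSeek",
}


def operator_for_endpoint(url: str) -> str:
    """Return the infrastructure operator implied by an endpoint URL."""
    host = (urlparse(url).hostname or "").lower()
    for suffix, operator in OPERATOR_BY_SUFFIX.items():
        # Match the suffix itself or any subdomain of it
        if host == suffix or host.endswith("." + suffix):
            return operator
    return "unknown"


if __name__ == "__main__":
    print(operator_for_endpoint("https://bedrock-runtime.us-east-1.amazonaws.com"))
    print(operator_for_endpoint("https://api.deepseek.com/chat/completions"))
```

The same model name can sit behind either URL. Only the endpoint tells you who receives the prompt.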

Federal Procurement and Authorization Path

Air-gapped hardware goes through capital acquisition — hardware procurement on an approved vehicle, followed by an ATO process for the software stack running on that hardware. Timeline is measured in months to years, depending on classification requirements.

Hyperscaler managed AI services often ride existing cloud contract vehicles. If an agency already has an AWS GovCloud task order, adding Bedrock may be a contract amendment rather than a new procurement. The model itself may require additional review depending on the agency's AI governance policy, but the infrastructure authorization is typically already in place.

Sovereign and government cloud environments carry existing FedRAMP or DoD authorization baselines. The buyer's ATO can inherit from that baseline, which shortens the authorization path compared to a net-new commercial environment. The procurement vehicle is typically a government-specific GWAC or agency contract.

Model provider API typically requires a direct commercial agreement with the provider. For federal buyers, this means either a new procurement action or an existing GWAC that covers SaaS. The authorization path depends heavily on what data the buyer intends to send and at what classification level — questions that often don't have clean answers yet in agency AI governance policies.
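The three dimensions can be collapsed into a small lookup structure for quick reference. This is a sketch that restates the comparison in this section as data. The class and field names are hypothetical, and the procurement entries are shorthand for the fuller descriptions in the text, not contract guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    name: str
    hardware_operator: str
    data_boundary: str
    procurement_path: str


CONFIGURATIONS = [
    Deployment("air-gapped on-premises", "buyer",
               "buyer's physical facility",
               "capital acquisition, then ATO for the software stack"),
    Deployment("hyperscaler managed service", "cloud provider",
               "cloud provider's network",
               "often an existing cloud contract vehicle"),
    Deployment("sovereign/government cloud", "government-designated cloud provider",
               "designated jurisdictional boundary",
               "inherits an existing FedRAMP or DoD baseline"),
    Deployment("model provider API", "model provider",
               "model provider's infrastructure",
               "direct commercial agreement or a GWAC covering SaaS"),
]


def reaches_model_provider(d: Deployment) -> bool:
    """True when prompts land on infrastructure the model provider operates."""
    return d.hardware_operator == "model provider"


if __name__ == "__main__":
    for d in CONFIGURATIONS:
        print(f"{d.name}: operator={d.hardware_operator}; boundary={d.data_boundary}")
    print([d.name for d in CONFIGURATIONS if reaches_model_provider(d)])
```

Filtering on `hardware_operator` surfaces the one configuration where the model provider sees the data, which is usually the first question in a federal account.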

Okta Concept Mapping: The authorization path for each configuration maps loosely to the difference between delegated and direct authorization in OAuth. Air-gapped is like a local credential store — you own everything, you authorize everything, you maintain everything. Hyperscaler and sovereign cloud are like federated SSO — you're delegating operational responsibility to a trusted party under defined terms. Model provider API is like a third-party OAuth client requesting access to your data — the terms of that grant matter, and the scope of what the external party can see is the question. In a buyer conversation: "The question isn't just whether the model is authorized; it's whether the infrastructure operator is authorized to handle your data at the classification level you need."

How to Say This in the Field

Don't say	Do say	Why it matters
"We're using DeepSeek"	"We're running DeepSeek weights on AWS Bedrock in GovCloud"	Names the operator, not just the model
"It's on-prem"	"The model is loaded on hardware inside your facility, air-gapped from external networks"	"On-prem" is ambiguous in hybrid environments
"It's in the cloud"	"It runs on [provider] managed infrastructure in [region/authorization tier]"	"The cloud" tells a federal buyer nothing about data boundary
"It's a sovereign cloud"	"It runs in a government-designated cloud environment operating under [specific authorization baseline]"	"Sovereign cloud" has no standard definition in US federal procurement
"AWS handles the security"	"AWS is the infrastructure operator; your data doesn't leave their US data centers"	"AWS handles security" sounds like a deflection; the data boundary is the specific claim
"It's the same model"	"The model weights are identical; the operator and data flow are different depending on how it's deployed"	Buyers need to understand the model-versus-deployment distinction
"It's open source so you can run it anywhere"	"The weights are available under an open license, which means your team can host them on hardware you control — that's a separate procurement and operational path"	"Open source" doesn't imply self-hosted or pre-authorized
"It's not Chinese"	"The model was developed by a Chinese company; in this configuration, AWS is the operator and your data doesn't reach the model provider's infrastructure"	The geopolitical framing is imprecise; the data flow is the verifiable fact
"We can deploy it on-prem if needed"	"The weights can be loaded on hardware inside your ATO boundary; that's a different procurement path than the managed service, and the timeline is different"	"If needed" undersells the authorization and procurement difference
"The data is secure"	"Your prompts stay within [specific boundary] and don't reach [specific party's] infrastructure"	"Secure" is a conclusion; buyers need the mechanism

The next piece covers what each of these configurations means for compliance posture, security review, and latency. Those are the "so what" questions that come after you've established the physical facts. The physical facts are the prerequisite.