The Model Is a File. The Hardware Is the Risk.

By Leigh Garrity, May 9, 2026

Four AI hosting scenarios show up in federal accounts: a hyperscaler's own model served through their public API, open-weight models hosted on hyperscaler infrastructure, sovereign or GovCloud deployments, and on-premises or air-gapped installations. Your buyers use all four, often simultaneously, and they frequently conflate the model with the infrastructure running it. That conflation is where conversations go sideways. The language that buys you something in a federal account is precise language about physical location — who owns the hardware the weights are loaded into, not which weights they are. DeepSeek on AWS Bedrock and DeepSeek's own API can produce byte-identical outputs. They are not the same risk posture. That distinction is what this piece is built around.

The Four Scenarios

Scenario 1: Hyperscaler Public API

What it is: A model designed, trained, and served by the same company, accessed via their API endpoint.

What it does: Your application sends prompts over HTTPS to the provider's inference endpoint. The provider's hardware — in their data centers, under their operational control — runs the model and returns completions. You have no visibility into where specifically that inference happens within their global infrastructure.

Who's behind it: The model developer and the infrastructure operator are the same entity. OpenAI, Anthropic, Google (Gemini API direct) — they built the model and they own the servers it runs on. Their terms of service, their data retention policies, and their security controls govern what happens to your prompts.

What makes it distinct: You're renting compute from the model's author. There's no separation between "who made this" and "who sees your data." That's simple, which is also why it's the scenario with the fewest knobs to turn.

Okta concept mapping: In OAuth terms, the hyperscaler public API is the resource server and the authorization server rolled into one entity. The provider issues the API key (credential), validates it, and controls the protected resource (inference). In a traditional enterprise architecture, you'd never let a single vendor own both sides of that transaction for a sensitive workload, but that's exactly what this scenario is. The analogy holds until you ask "what's the equivalent of a token scope?" The answer is: there isn't one. Prompt content isn't scoped the way resource access is. That's where the OAuth intuition stops being useful.

Scenario 2: Hyperscaler-Hosted Open Weights

What it is: A third-party model's weights loaded into and served from a hyperscaler's own infrastructure.

What it does: AWS Bedrock, Azure AI Foundry, and Google Cloud Vertex AI all host model weights from external developers: Meta's Llama family, Mistral, Cohere, and, relevantly, DeepSeek. When you call the Bedrock endpoint for DeepSeek R1 in a US region, your prompt goes to AWS infrastructure in AWS-controlled data centers in that region. AWS loads the weights, runs inference on their hardware, and returns the result. The weights themselves may have originated from a third-party developer, but the compute environment is entirely AWS.

Who's behind it: The model weights come from the third-party developer (DeepSeek AI, in the example). The hardware, network, and operational environment belong to the hyperscaler. AWS has documented that Bedrock hosts model weights within AWS infrastructure and that customer data processed through Bedrock is not shared with the model provider. The model developer's infrastructure is not in the data path.

What makes it distinct: Model provenance and infrastructure provenance are separated. The weights are DeepSeek's; the hardware is Amazon's. These are independent facts with independent implications, and federal account conversations about this scenario routinely mix them up.
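One way to keep those two facts apart is to record them as separate fields. The sketch below is illustrative only: the `Deployment` class, its field names, and the sample values are assumptions made for this example, and the rule it encodes (the developer sees prompts only when it also runs the hardware) is a simplification of the data-path point above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Hypothetical record keeping model provenance and infrastructure provenance apart."""

    weights_developer: str  # who trained and released the weights
    hardware_operator: str  # who owns and runs the machines doing inference

    def developer_in_data_path(self) -> bool:
        # Simplifying assumption: the developer can see prompts only
        # when it also operates the hardware that runs inference.
        return self.weights_developer == self.hardware_operator


bedrock_hosted = Deployment(weights_developer="DeepSeek", hardware_operator="AWS")
direct_api = Deployment(weights_developer="DeepSeek", hardware_operator="DeepSeek")

print(bedrock_hosted.developer_in_data_path())  # False
print(direct_api.developer_in_data_path())      # True
```

Same weights in both records; only the second field changes the answer.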

Scenario 3: Sovereign / GovCloud Deployment

What it is: AI inference running within a physically and logically separated cloud environment built to meet government-specific access and residency requirements.

What it does: AWS GovCloud (US), Azure Government, and Google Cloud's government offerings provide infrastructure where physical access is restricted to US persons, data residency is within specific US regions, and the operational staff supporting the environment meet government-mandated screening requirements. Models running in these environments — whether the hyperscaler's own models or third-party weights they've brought into the environment — operate within that boundary.

Who's behind it: The hyperscaler operates the environment under specific contractual and regulatory commitments to government customers. The model weights may be the same as those available in commercial regions, but the infrastructure operating them is a distinct, separately authorized environment.

What makes it distinct: The authorization boundary is the product. Commercial cloud already puts compute in US regions. What GovCloud adds is an authorization boundary with documented access controls, personnel requirements, and operational separation. That boundary is what makes the environment relevant to federal workloads, not geography alone.

Okta concept mapping: The GovCloud authorization boundary maps reasonably well to the concept of a security domain in identity architecture — a perimeter within which a specific set of trust relationships and access controls apply. When Okta operates in FedRAMP-authorized environments, the authorization boundary defines what's inside the trust domain and what isn't. The same logic applies here: the boundary is the thing, not the zip code of the data center. Where the analogy breaks: in identity, you can cryptographically verify that a token was issued within a specific domain. In AI infrastructure, you're largely trusting the provider's attestation that inference is happening within the authorized boundary. There's no equivalent of a signed assertion. That gap matters in conversations about auditability — but that's 4.5's territory.

Scenario 4: On-Premises / Air-Gapped

What it is: Model weights loaded onto hardware you own and operate, with no network path to external infrastructure.

What it does: The organization acquires model weights (through a license, a download, or a government-specific distribution), loads them onto on-premises GPU hardware, and runs inference entirely within their own environment. Nothing leaves the building — not prompts, not completions, not telemetry. This is the scenario that intelligence community and DoD components reach for when the data classification level makes external compute non-negotiable.

Who's behind it: The organization itself. They own the hardware, they operate the software stack (typically something like vLLM or Ollama for inference serving), and they are responsible for model updates, security patching, and operational continuity. The model developer has no ongoing relationship with the running system.

What makes it distinct: Maximum control, maximum operational burden. The organization that chooses this scenario is accepting full responsibility for everything that happens to the model after they download the weights. There's no vendor to call when inference latency spikes at 2am.

Comparing the Four: A Trait-Led Analysis

A flat table comparing these four scenarios would obscure more than it reveals. The dimensions that actually matter for federal account conversations aren't evenly distributed across rows — they cut differently depending on what the buyer is trying to protect. So this section uses trait-led analysis: three dimensions, all four scenarios, no "best overall" conclusion because the right answer depends on the workload.

Why trait-led: Buyers in federal accounts aren't asking which scenario is better. They're asking which scenario fits their data classification, their operational capacity, and their procurement vehicle. A framework they can apply to their own situation is more useful than a recommendation that ignores their constraints.

Dimension 1: Where Do Prompts Physically Go?

Hyperscaler public API: To the provider's global infrastructure. The specific region may be configurable; the specific data center is not. The provider's infrastructure team can access the systems your prompts traverse.

Hyperscaler-hosted open weights: To the hyperscaler's infrastructure, in the region you specify. The third-party model developer is not in the data path. AWS has been explicit about this for Bedrock: customer data stays in AWS infrastructure and is not transmitted to the model provider.

Sovereign / GovCloud: To the hyperscaler's government-specific infrastructure, within the authorization boundary. Personnel with access to that infrastructure meet government-mandated screening requirements.

On-premises / air-gapped: Nowhere. Prompts never leave the organization's physical environment.
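A rough first pass at this question can be made from the endpoint hostname an application is configured to call. The sketch below is a triage aid under stated assumptions, not proof of where inference happens: the function name `hosting_hint` and its labels are invented for this example, the `.internal` suffix is only a guess at how a private host might be named, and a proxy or gateway in front of the real endpoint would defeat it.

```python
from urllib.parse import urlparse


def hosting_hint(endpoint_url: str) -> str:
    """Return a rough label for where an endpoint lives, from its hostname alone."""
    host = (urlparse(endpoint_url).hostname or "").lower()
    if host.endswith(".amazonaws.com"):
        # AWS GovCloud (US) region names begin with "us-gov-".
        if ".us-gov-" in host:
            return "AWS GovCloud (US)"
        return "AWS commercial region"
    if host == "api.deepseek.com":
        return "model developer's own API"
    # Assumption: private hosts are named like this in the environment at hand.
    if host in ("localhost", "127.0.0.1") or host.endswith(".internal"):
        return "local or private host"
    return "unknown: ask who operates this host"


for url in (
    "https://bedrock-runtime.us-east-1.amazonaws.com",
    "https://bedrock-runtime.us-gov-west-1.amazonaws.com",
    "https://api.deepseek.com/chat/completions",
):
    print(url, "->", hosting_hint(url))
```

The hostname only tells you which question to ask next; the provider's documentation and contract terms are what establish who operates the hardware.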

Dimension 2: What Can You Configure and Control?

Hyperscaler public API: API parameters (temperature, max tokens, system prompt), data retention settings within the provider's options, and which model version you're calling. You cannot configure the infrastructure, the network path, or the operational security controls.

Hyperscaler-hosted open weights: Same infrastructure controls as any other hyperscaler workload — VPC configuration, endpoint policies, IAM controls on who can call the inference endpoint. You cannot modify the weights or the inference runtime. You can control who in your organization can invoke the model.

Sovereign / GovCloud: The same infrastructure controls as the commercial hyperscaler offering, plus the access restrictions that come with the authorization boundary. Some model options available in commercial regions are not yet available in GovCloud environments — availability varies by provider and changes frequently.

On-premises / air-gapped: Full control of everything, including the inference runtime, the hardware configuration, and the network isolation. Also full responsibility for everything, including updates, availability, and security patching of the inference stack.
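For the hyperscaler-hosted case, "who in your organization can invoke the model" usually comes down to an IAM policy. The following is a minimal sketch of what such a policy document might look like for Bedrock. The helper name `invoke_only_policy` and the model ID are placeholders invented for this example, and some models are invoked through other resource types with different ARNs, so check the provider's documentation before relying on the exact resource string.

```python
import json


def invoke_only_policy(region: str, model_id: str, partition: str = "aws") -> dict:
    """Build an IAM policy document that allows invoking one Bedrock foundation model."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowInvokeOneModel",
                "Effect": "Allow",
                "Action": ["bedrock:InvokeModel"],
                # Foundation-model ARNs carry a region but no account ID.
                "Resource": f"arn:{partition}:bedrock:{region}::foundation-model/{model_id}",
            }
        ],
    }


# "example-model-id" is a placeholder, not a real Bedrock model identifier.
policy = invoke_only_policy("us-east-1", "example-model-id")
print(json.dumps(policy, indent=2))
```

In AWS GovCloud (US) the ARN partition is `aws-us-gov` rather than `aws`, which is why the partition is a parameter here. Note what the policy does not cover: it governs who may call the model, not which weights or runtime sit behind the endpoint.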

Dimension 3: Can You Verify What's Actually Running?

This dimension gets the least attention in these conversations. It probably deserves the most.

Hyperscaler public API: You're trusting the provider's representation that the model version you're calling is what they say it is. Model cards and version documentation exist; cryptographic verification of the running weights does not.

Hyperscaler-hosted open weights: The model developer may publish hashes you can check a downloaded copy against; DeepSeek publishes model weights on Hugging Face with SHA-256 hashes. That check covers only the copy you download, though. Whether AWS Bedrock is running those exact weights rests on AWS's attestation, because the customer has no mechanism to independently verify the loaded weights.

Sovereign / GovCloud: Same verification situation as commercial hyperscaler, within the authorization boundary. The boundary adds operational controls; it doesn't add cryptographic weight verification.

On-premises / air-gapped: You downloaded the weights. You can verify the hash yourself before loading. This is the only scenario where independent verification of what's running is actually possible.
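The on-premises check described above can be sketched in a few lines. This is a minimal example using only the Python standard library; the function names are invented for illustration, and the demo hashes a small temporary file standing in for a weights file rather than any real published hash.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so multi-gigabyte weight files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, published_hex: str) -> bool:
    """Compare the file's SHA-256 with a published hex digest."""
    return hmac.compare_digest(sha256_of_file(path), published_hex.strip().lower())


# Demo with stand-in bytes; a real check would use the developer's published digest.
with tempfile.TemporaryDirectory() as tmp:
    weights = Path(tmp) / "stand-in-weights.bin"
    weights.write_bytes(b"not real weights")
    published = hashlib.sha256(b"not real weights").hexdigest()
    print(matches_published_hash(weights, published))  # True
```

A matching hash shows the file on disk is the file the developer published. It says nothing about what the model does, and in the hosted scenarios there is no file on your side to run it against.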

Okta concept mapping: The verification gap across all four scenarios maps to a concept identity architects know well: the difference between authentication and attestation. In zero trust architecture, you don't just authenticate a device — you attest to its configuration state. The model hosting question has an analogous gap: you can authenticate that you're talking to an AWS endpoint, but you can't attest to the configuration state of the model running behind it. Okta's device trust implementation addresses this gap for endpoints by requiring a posture signal alongside an identity assertion. No equivalent mechanism exists for AI inference endpoints yet. That's not a criticism of any specific vendor; it's an accurate description of where the industry is.

How to Say This in the Field

The conversation you'll have: a federal buyer mentions they're using DeepSeek. What follows is usually either unnecessary alarm or insufficient curiosity. Neither serves the account. The table below is built for that specific conversation.

Don't say	Do say	Why it matters
"DeepSeek is a Chinese model, that's a problem."	"Where are you running it — through Bedrock or Azure, or are you calling DeepSeek's API directly?"	The model's origin country is less relevant than who owns the hardware it's running on. The first question gets you to the actual risk question.
"That's a security risk."	"The risk posture depends entirely on the hosting environment, not the model name."	Vague alarm signals you don't know the space. The precise statement positions you as someone who does.
"Is that FedRAMP authorized?"	"Is this running in a GovCloud environment, or commercial cloud?"	FedRAMP authorization is a compliance question; this is an infrastructure question. Ask the infrastructure question first.
"You should use a US model instead."	"If it's on Bedrock, your prompts are staying on AWS infrastructure — the model's developer isn't in the data path."	Recommending a different model doesn't address the actual concern. Explaining the data path does.
"The weights are open source, so it's fine."	"Open weights means anyone can download and inspect the model file. It doesn't tell you anything about where inference is happening."	"Open source" is doing a lot of work in AI conversations and most of it is wrong. Separate model availability from hosting reality.
"We need to check with legal before you go further with this."	"Let's map out where this is running and who's in the data path — that's what legal is going to ask you anyway."	Punting to legal without doing the infrastructure analysis wastes everyone's time and makes you look like a speed bump rather than a resource.
"DeepSeek on Bedrock is the same as DeepSeek's API."	"Same model, completely different infrastructure. On Bedrock, AWS owns the hardware. Calling DeepSeek's API directly, DeepSeek's infrastructure is in the data path."	This is the core distinction. If the buyer understands this, the rest of the conversation gets easier.
"What's your data classification for this workload?"	"What's the highest classification of data that might end up in a prompt to this system?"	The first version sounds like a checklist. The second version sounds like you're thinking about their actual exposure.
"I'm not sure how Bedrock works exactly."	"Bedrock hosts the weights in AWS infrastructure — the model developer doesn't have access to your prompts or completions."	Uncertainty about basic infrastructure facts in this conversation costs you credibility you won't get back. Know this cold.
"That's probably fine for unclassified."	"For unclassified workloads on Bedrock, you're in the same infrastructure posture as any other AWS-hosted workload — the question is whether your agency's policies treat AI inference differently than other cloud compute."	"Probably fine" is not a posture. The precise version gives the buyer something they can actually take to their ISSO.
"The model is the risk."	"The model is a file. The risk is in who owns the hardware it's loaded into."	This reframe changes the whole conversation. Use it early.

The buyer who says "we're using DeepSeek" is usually not asking you to evaluate DeepSeek. They're asking — sometimes without knowing they're asking — whether they've made a defensible infrastructure decision. Your job is to help them answer that question precisely, which means asking where it runs before you say anything else.

A model file loaded into AWS GPU memory in us-east-1 and a model file loaded into DeepSeek's own inference cluster in China are not the same thing, even if you could diff the weights and find nothing. The hardware is the answer. Get there fast.