Why Hosting Location Is the Whole Ballgame

Five hosting arrangements compared across jurisdiction, legal exposure, breach surface, audit capability, and agentic latency so you know what changes when the same model runs somewhere else.

By Leigh Garrity, May 9, 2026

DeepSeek R1 running on AWS Bedrock and DeepSeek R1 running on DeepSeek's own API use identical weights. Same model. Same file. The legal, security, and operational envelope wrapped around the inference, though, is entirely different: which government can compel disclosure of your prompts, what an attacker who breaches the provider actually sees, whether you can prove what happened during an audit, and how long every agentic task takes when the model sits far from its tools. In procurement conversations, your buyer's CISO or CAIO will weight one of these dimensions above the others. Knowing which one, and being precise about it, is the conversation worth having.

The Five Hosting Arrangements

Self-Hosted (On-Prem or Own Cloud Tenant)

What it is: You run the model on infrastructure you control, whether that's a GPU rack in your data center or a private cloud instance you manage.

What it does: Inference happens entirely within your security perimeter. No prompts or outputs leave your environment unless you send them somewhere.

Who's behind it: You. The model weights come from an open-weights provider (Meta's Llama, Mistral, DeepSeek's published models), but once downloaded, the provider has no access to your deployment.

What makes it distinct: You own the entire envelope. Every security control, every log, every access policy is yours to build. Every gap is also yours to fill. Ollama ships without authentication by default. Nobody's going to fix that for you.
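Closing that particular gap can be as small as a bearer-token check in front of the inference server. The sketch below is a minimal WSGI middleware built only on the Python standard library. `BearerAuthMiddleware`, `fake_inference_app`, and the token value are hypothetical names, and the actual proxying to an Ollama, vLLM, or llama.cpp backend is left out. It assumes the inference server itself stays bound to localhost, so the only network-reachable path goes through this check.

```python
import hmac
from typing import Callable, Iterable

# Hypothetical sketch: these names and the token are illustrative, not part of
# any inference server's API. Keep the real inference server bound to localhost
# so this middleware is the only network-reachable path to it.

StartResponse = Callable[..., object]
WSGIApp = Callable[[dict, StartResponse], Iterable[bytes]]


class BearerAuthMiddleware:
    """Reject requests that lack the expected 'Authorization: Bearer <token>' header."""

    def __init__(self, app: WSGIApp, expected_token: str) -> None:
        self.app = app
        self.expected = expected_token.encode("utf-8")

    def __call__(self, environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        header = environ.get("HTTP_AUTHORIZATION", "")
        scheme, _, token = header.partition(" ")
        # compare_digest avoids content-based short-circuiting, so response timing
        # doesn't reveal how many leading bytes of a guessed token were right.
        if scheme.lower() == "bearer" and hmac.compare_digest(token.encode("utf-8"), self.expected):
            return self.app(environ, start_response)
        start_response("401 Unauthorized", [
            ("Content-Type", "text/plain"),
            ("WWW-Authenticate", "Bearer"),
        ])
        return [b"unauthorized\n"]


def fake_inference_app(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
    """Hypothetical stand-in for the proxied inference server."""
    start_response("200 OK", [("Content-Type", "application/json")])
    return [b'{"response": "ok"}']


app = BearerAuthMiddleware(fake_inference_app, expected_token="replace-with-a-long-random-secret")
```

In a real deployment you'd load the token from a secret store, terminate TLS in front of the middleware, and hand `app` to whatever WSGI server you already run. The point is that none of this exists until you build it.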

US Hyperscaler Region (AWS Bedrock, Azure AI)

What it is: You call a managed inference service running in a US cloud provider's commercial data center, in a region you select.

What it does: The hyperscaler hosts the model, manages the serving infrastructure, and processes your prompts in their environment. You configure access, logging, and networking through their control plane.

Who's behind it: AWS, Microsoft, or Google. The model provider (Anthropic, Meta, DeepSeek) delivers the weights but has no access to your data after handoff.

What makes it distinct: You inherit the hyperscaler's security posture and compliance certifications, plus their legal exposure. The model provider is out of the loop. Your data is in the cloud provider's custody.

Sovereign/Government Cloud (GovCloud, Azure Gov)

What it is: A hyperscaler region operated under additional access restrictions, typically limited to screened US persons and entities, with infrastructure isolated from commercial regions.

What it does: Same managed inference as commercial hyperscaler regions, but within a boundary designed for controlled unclassified information and regulated workloads. ITAR screening and FIPS-validated crypto endpoints are standard.

Who's behind it: The same hyperscalers (AWS GovCloud, Azure Government, Google Cloud for Government), operating under stricter personnel and physical controls.

What makes it distinct: Same US jurisdiction as commercial regions, tighter perimeter, and the compliance artifacts your buyer's authorizing official actually needs to see. The legal envelope is identical. The documentation and authorization posture is where the gap opens.

Provider API, US-Based (OpenAI, Anthropic Direct)

What it is: You call the model provider's own API endpoint. Inference runs on infrastructure the provider operates, typically in US data centers.

What it does: The provider handles everything: model serving, scaling, updates. You send prompts, get responses, and interact through their platform.

Who's behind it: The AI lab itself. OpenAI, Anthropic, Google (Gemini API). US-incorporated entities operating US infrastructure.

What makes it distinct: The provider sees your data. It retains that data under its own published policies, not yours, and those retention terms define your breach surface. You're a tenant, not an operator.

Provider API, Non-US (DeepSeek Direct)

What it is: You call an API endpoint operated by a non-US company, with inference running on infrastructure in that company's home jurisdiction.

What it does: Functionally identical to a US provider API. You send prompts, get responses. Where those prompts land and whose laws govern them is the whole distinction.

Who's behind it: DeepSeek, a Chinese company whose servers are located in the People's Republic of China.

What makes it distinct: Your prompts are subject to the legal regime of the provider's home jurisdiction. For DeepSeek, that means China's National Intelligence Law and Data Security Law. No Zero Data Retention offering. Data is retained for the life of the account.

What Changes Across Each Dimension

I'm anchoring on each dimension and positioning all five hosting arrangements against it. The buyer's decision is usually driven by whichever dimension their compliance office or CISO weights most heavily, so this structure lets you find the dimension your buyer cares about and see the full picture in one pass.

Legal Jurisdiction

Jurisdiction follows corporate ownership, full stop. Where the server sits is a separate question. Commit that to memory.

Self-hosted: Your jurisdiction. Your legal entity controls the infrastructure. Foreign governments have no direct compulsion mechanism against your inference data.

US hyperscaler: US jurisdiction, regardless of which region you select. The CLOUD Act gives US authorities the power to compel disclosure of data in a US company's possession, custody, or control, wherever that data physically sits. Selecting eu-west-1 changes where your data is stored, not which government can compel the company that stores it.

Sovereign/government cloud: Also US jurisdiction. The legal process here involves a US government request to a US company handling US government data. No cross-jurisdiction conflict. The CLOUD Act still applies, but the scenario it creates is less adversarial.

US provider API: US jurisdiction. Same CLOUD Act exposure, plus FISA Section 702, which authorizes intelligence collection targeting non-US persons reasonably believed to be located outside the United States, without individualized warrants. If your buyer's end users include foreign nationals, this matters.

Non-US provider API: Chinese jurisdiction. Article 7 of China's National Intelligence Law requires organizations to "support, assist, and cooperate with national intelligence efforts." The scope of this obligation is genuinely contested among legal scholars, but the US government interprets it broadly.

Okta Concept Mapping: Federation Trust Boundaries

Your IDAM intuition about federation trust is directly useful here. In SAML federation, the trust boundary is defined by who operates the IdP, regardless of where the SP sits. Jurisdiction works the same way: it follows the operator, regardless of where the data center sits. When a buyer says "we'll just use the EU region," that's like saying "we'll point the SP at a different URL." The trust boundary hasn't moved.

Legal Process Exposure

This is the operational version of jurisdiction: what specific instruments can compel disclosure, and what does the provider actually do when served?

Self-hosted: Domestic warrants and subpoenas apply to your organization directly. No third-party provider is in the chain to be compelled independently.

US hyperscaler: CLOUD Act plus standard law enforcement instruments. AWS states publicly that it challenges overbroad requests, and its transparency reports show no disclosures to the US government of enterprise or government customer content stored outside the US. That's a track record. Whether it holds under sustained pressure is a different bet.

Sovereign/government cloud: Same instruments, different political dynamics. A request for US government data in GovCloud is a US-to-US matter, with none of the cross-border friction.

US provider API: CLOUD Act plus FISA 702. The provider holds your data under their retention policy, which means there's something to compel. OpenAI retains API data for up to 30 days by default. Anthropic retains for seven days. Zero Data Retention is available from both but requires approval and eliminates your own audit trail in the process.

Non-US provider API: China's NIL, Data Security Law, and domestic legal process. DeepSeek's privacy policy explicitly permits sharing with law enforcement and retains data for the life of the account. No ZDR option exists.

One conflict worth naming explicitly: GDPR Article 48 provides that a third-country court order or administrative decision requiring the transfer or disclosure of personal data is recognized or enforceable only if it rests on an international agreement, such as a mutual legal assistance treaty. No US-EU CLOUD Act executive agreement exists. This conflict is live for any US-incorporated provider holding EU-protected data, whether that's a hyperscaler running Bedrock in Frankfurt or an AI lab processing prompts from European users through its US API. A US-incorporated entity caught between GDPR and the CLOUD Act faces two legal regimes with no clean resolution. "We'd fight it" is an argument about willingness. The CLOUD Act is an argument about power.

Breach Surface

What does an attacker who compromises the hosting environment actually get?

Self-hosted: The model weights, whatever tools the inference server has credentials to reach, and any logs you configured. Prompts that have already been processed and not logged are gone. The blast radius is defined by what you connected to the server. The risk is misconfiguration: between October 2025 and January 2026, researchers captured over 91,000 attack sessions targeting exposed LLM inference endpoints, most running without authentication.

US hyperscaler: Prompt data retention depends on your configuration. AWS Bedrock does not log prompt content by default. If you haven't enabled invocation logging, there's no prompt data to steal from the logging layer. The hyperscaler's infrastructure is a high-value target, but you're also betting on a security team with resources most organizations can't match. For most buyers, that bet is sound.

Sovereign/government cloud: Same as hyperscaler, with additional physical and personnel controls. Smaller attack surface in practice because the environment is more restricted.

US provider API: The provider's retention policy defines your exposure. An attacker who breaches OpenAI's infrastructure could access up to 30 days of your prompts. Anthropic, seven days. ZDR eliminates this, but then you have no audit trail either. Pick your gap.

Non-US provider API: DeepSeek retains data for the life of the account, with broad rights to use it including training and analytics. In January 2025, researchers found a publicly accessible ClickHouse database containing API secrets, chat logs, and backend details, exposed without authentication. The breach surface is the provider's entire retention store. And the retention store is everything they've ever collected.
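The "retention store is the breach surface" point reduces to arithmetic: prompts at rest are roughly daily volume times the retention window. A rough sketch, assuming a hypothetical workload of 5,000 prompts per day on a one-year-old account. The retention windows are the defaults cited above, treated as upper bounds, and `prompts_at_rest` is an illustrative helper, not anyone's published model.

```python
from typing import Optional

# Retention windows as cited in this article (defaults, treated as upper bounds).
# None means "retained for the life of the account".
RETENTION_DAYS: dict[str, Optional[int]] = {
    "OpenAI API (default)": 30,
    "Anthropic API (default)": 7,
    "Either provider with ZDR": 0,
    "DeepSeek API": None,
}


def prompts_at_rest(daily_prompts: int, retention_days: Optional[int], account_age_days: int) -> int:
    """Upper-bound estimate of prompts held in the provider's store on a given day."""
    if retention_days is None:
        window = account_age_days  # nothing ages out: everything since signup
    else:
        window = min(retention_days, account_age_days)
    return daily_prompts * window


# Hypothetical workload: 5,000 prompts/day on a one-year-old account.
for provider, days in RETENTION_DAYS.items():
    exposed = prompts_at_rest(5_000, days, account_age_days=365)
    print(f"{provider:<26}{exposed:>12,} prompts exposed to a breach")
```

Under those assumptions the estimate is 150,000 prompts for a 30-day window, 35,000 for seven days, zero under ZDR, and 1,825,000 when nothing is ever deleted.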

Audit Capability

Can you prove what happened?

Self-hosted: Unlimited ceiling, zero floor. You build every log, every schema, every retention policy. None of vLLM, llama.cpp, or Ollama ships with durable prompt-and-response logging enabled by default. If you didn't build it, it doesn't exist.

US hyperscaler: Two tiers on AWS Bedrock, and the gap between them matters. CloudTrail is on by default and captures metadata: who called what, when, from where. Model invocation logging is off by default and captures actual prompts and responses. Without the second tier, you know an API call happened but have no record of what was said. Most deployments have the first without the second. Azure AI Foundry integrates with Azure Monitor, inheriting existing RBAC and retention policies.

Sovereign/government cloud: Same two-tier architecture, with FIPS-validated endpoints and the compliance certifications the authorizing official needs to see.

US provider API: You see what the provider's dashboard shows you. The logs live on their infrastructure. ZDR removes retention entirely. Your audit trail disappears along with the provider's breach surface.

Non-US provider API: No published audit capability for API customers. DeepSeek's privacy policy describes what they collect and retain, with no corresponding visibility for you to inspect.
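On a self-hosted stack, "build every log" can start as a thin wrapper around whatever function calls your inference server. A minimal sketch, assuming a hypothetical `generate(prompt) -> str` client and an illustrative JSON Lines schema. It records who asked, which model answered, and the full prompt and response: the content tier that metadata-only logging misses. The model name in the usage lines is a placeholder.

```python
import hashlib
import json
import tempfile
import time
import uuid
from pathlib import Path
from typing import Callable

# Hypothetical sketch: `generate` stands in for whatever client call reaches your
# vLLM, llama.cpp, or Ollama server, and the record schema is illustrative.


def audited(generate: Callable[[str], str], log_path: Path, principal: str, model: str) -> Callable[[str], str]:
    """Wrap a model call so every completed request appends one JSON Lines audit record."""

    def wrapper(prompt: str) -> str:
        started = time.time()
        response = generate(prompt)  # if this raises, nothing is logged in this sketch
        record = {
            "id": str(uuid.uuid4()),
            "timestamp": started,
            "latency_s": round(time.time() - started, 4),
            "principal": principal,  # who made the request
            "model": model,
            "prompt": prompt,  # what was said...
            "response": response,  # ...and what came back
            # Kept so a record stays matchable if the raw prompt is later redacted.
            "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        }
        with log_path.open("a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
        return response

    return wrapper


# Usage with a fake model that just upper-cases the prompt.
log_file = Path(tempfile.mkdtemp()) / "inference_audit.jsonl"
ask = audited(str.upper, log_file, principal="alice@example.com", model="llama-3-8b-instruct")
ask("summarize the Q3 contract")
```

Append-only JSON Lines is used here because it ships easily to whatever SIEM you already run. Retention, access control, and tamper evidence for the log file itself are still yours to build.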

Okta Concept Mapping: Authentication Logs vs. Authorization Logs

The Bedrock two-tier logging gap maps to something you already know. CloudTrail without invocation logging is like having authentication events (user X logged in at time Y) without authorization logs (user X accessed record Z). You know someone showed up. You have no record of what they did. When a buyer says "we have CloudTrail enabled," ask whether they've enabled invocation logging. Most haven't.
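To answer that question for a given account, the Bedrock control-plane API exposes the invocation logging configuration. The sketch below assumes boto3 is installed and credentials are configured. `get_model_invocation_logging_configuration` is the boto3 `bedrock` client call; the status labels and helper names are our own. The parsing is split out so it can be tested without AWS access. We assume the setting is regional, so check every region you deploy in.

```python
# Status labels and explanations are our own; the response shape follows the
# boto3 `bedrock` client's get_model_invocation_logging_configuration call.
EXPLANATION = {
    "OFF": "No invocation logging: CloudTrail shows who called the model, not what was said.",
    "TEXT_OFF": "Invocation logging is configured, but text prompts and responses are not delivered.",
    "ON": "Text prompts and responses are delivered to your configured log destinations.",
}


def invocation_logging_status(response: dict) -> str:
    """Classify a get_model_invocation_logging_configuration response."""
    config = response.get("loggingConfig")
    if not config:
        return "OFF"
    if not config.get("textDataDeliveryEnabled", False):
        return "TEXT_OFF"
    return "ON"


def check_region(region: str) -> str:
    """Query one region's setting (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the classifier above runs without boto3 installed

    client = boto3.client("bedrock", region_name=region)
    status = invocation_logging_status(client.get_model_invocation_logging_configuration())
    return f"{region}: {status} - {EXPLANATION[status]}"
```

Looping `check_region` over the regions a buyer actually uses turns "we have CloudTrail enabled" into a concrete yes or no on the content tier.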

Latency to the Tool Stack in Agentic Workloads

This dimension is new to most procurement conversations, and increasingly the one that drives the decision for technical buyers.

An agent makes many calls to a model, often dozens per task. Production agentic workloads average around 20 tool calls per trace for coding tasks and roughly 40 for office work, with a long tail into the hundreds. Each tool call is a round-trip: model calls tool, tool responds, model processes response, model calls next tool. Network distance between the inference endpoint and the tool APIs compounds across every call.

The math: a millisecond or two of overhead per call when colocated in the same region. About 75ms round-trip cross-US, so 20 calls add 1.5 seconds of pure network latency. Trans-Pacific to DeepSeek's infrastructure runs roughly 200ms per round-trip, so 20 calls add 4 seconds and 40 calls add 8. That's network overhead alone, before inference time. An orchestrator-worker flow with reflection loops can take 10 to 30 seconds on its own. Stack network latency on top and the task time balloons.
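The same arithmetic in a few lines of Python, using the per-round-trip figures above. The 1 ms colocated value is an assumed stand-in for same-region overhead, and the model assumes every tool call waits on one sequential round-trip while ignoring inference time entirely.

```python
# Approximate round-trip times from this article; the colocated value is an assumption.
RTT_MS = {
    "colocated (same region)": 1,
    "cross-US": 75,
    "trans-Pacific": 200,
}


def network_overhead_s(tool_calls: int, rtt_ms: float) -> float:
    """Added wall-clock seconds when each tool call waits on one sequential round-trip."""
    return tool_calls * rtt_ms / 1000


for path, rtt in RTT_MS.items():
    for calls in (20, 40):
        print(f"{path:<24}{calls:>3} calls -> {network_overhead_s(calls, rtt):.2f} s")
```

For 20 calls this prints 1.50 s cross-US and 4.00 s trans-Pacific, and 8.00 s for 40 trans-Pacific calls, matching the figures above.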

Self-hosted: If your inference server is on the same network as your tools and data, latency is minimal. Strongest latency story.

US hyperscaler (same region as tools): If your tools are in us-east-1 and your Bedrock endpoint is in us-east-1, you're colocated. Single-digit millisecond overhead per call.

Sovereign/government cloud: GovCloud regions are limited in number. If your tools are in a commercial region and your inference is in GovCloud, you're paying cross-region latency on every tool call.

US provider API: OpenAI's and Anthropic's API endpoints route to their infrastructure, not yours. You don't control the region. Every tool call crosses the boundary between their network and yours.

Non-US provider API: Every tool call crosses the Pacific. At 200ms per round-trip and 20 calls, that's 4 seconds of network overhead per task before compute begins.

Okta Concept Mapping: Token Refresh Overhead

If you've ever debugged a workflow where OAuth token refresh added latency to every API call in a chain, you already understand this pattern. One extra round-trip per call is invisible in a single request. Chain 20 or 40 of them and the overhead dominates the task. Where the analogy breaks: token refresh is a solved problem with caching strategies. Agentic latency compounds in ways that don't have an equivalent caching fix yet, because each round-trip depends on the previous response.
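Why the dependency matters can be shown with a toy simulation. Here `asyncio.sleep` stands in for a network round-trip, and the 50 ms value is illustrative, not measured; `call_tool` and the other names are hypothetical. When each call needs the previous result, the waits add up. Calls that don't depend on each other can be overlapped with `asyncio.gather`, which is exactly the relief a dependent agent loop doesn't get.

```python
import asyncio
import time

RTT_S = 0.05  # simulated round-trip; illustrative value, not a measurement


async def call_tool(step: int, context: str) -> str:
    await asyncio.sleep(RTT_S)  # stand-in for the network round-trip to a tool
    return f"{context}->t{step}"


async def dependent_chain(n: int) -> str:
    context = "task"
    for step in range(n):
        # Each call needs the previous result, so the round-trips run back to back.
        context = await call_tool(step, context)
    return context


async def independent_fanout(n: int) -> list[str]:
    # Calls that don't depend on each other can be in flight at the same time.
    return await asyncio.gather(*(call_tool(step, "task") for step in range(n)))


def timed(coro):
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


chain_result, chain_s = timed(dependent_chain(10))
fanout_result, fanout_s = timed(independent_fanout(10))
print(f"dependent chain: {chain_s:.2f}s  independent fan-out: {fanout_s:.2f}s")
```

Expect the chain to take about ten round-trips and the fan-out about one; exact timings vary by machine.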

How to Say This in the Field

Don't say	Do say	Why it matters
"You should host on-prem for security."	"Self-hosting gives you the smallest legal exposure surface, but you own every security control, including the ones that don't exist yet."	Buyers need to hear the tradeoff stated cleanly.
"The EU region keeps your data out of US jurisdiction."	"The EU region controls where data is stored. Jurisdiction follows corporate ownership, and AWS is a US company. Those are two different questions."	Conflating storage location with jurisdiction is the most common mistake in these conversations.
"DeepSeek is dangerous."	"DeepSeek's model on Bedrock and DeepSeek's API are legally different things. On Bedrock, no data reaches DeepSeek. On their API, your prompts are in Chinese jurisdiction with no retention limits."	Precision prevents the buyer from dismissing you as uninformed.
"GovCloud solves the sovereignty problem."	"GovCloud gives you the access controls and compliance posture your AO needs. The CLOUD Act still applies, but it's a US-to-US legal process with none of the cross-border friction."	Buyers in public sector know GovCloud has limits. Showing you know that too builds trust.
"Just use Zero Data Retention."	"ZDR eliminates the provider's breach surface for your data. Along with your audit trail. Your compliance team needs to decide which gap they'd rather manage."	ZDR trades one gap for another. Framing it as a clean fix will get corrected by the CISO.
"Latency doesn't matter for AI."	"A single inference call, sure. But agents make 20 to 40 tool calls per task. If your model is 200ms from your tools, that's 4 to 8 seconds of network overhead before compute."	The technical insight that separates you from someone reading a slide deck.
"We have CloudTrail enabled."	"CloudTrail captures who called the model and when. Invocation logging captures what was said. It's off by default. Most deployments have the first without the second."	The audit gap between metadata and content is the governance question most buyers haven't asked yet.
"On-prem is always more secure."	"On-prem eliminates third-party legal exposure and gives you full audit control. Every misconfiguration is also yours. Researchers logged over 91,000 attack sessions against exposed LLM endpoints in about three months."	The buyer's security team will raise this. Better if you raise it first.
"China's intelligence law means they can see everything."	"Article 7 of China's NIL requires organizations to cooperate with intelligence efforts. Legal scholars disagree on the scope, but the US government interprets it broadly."	Nuance on a politically charged topic signals credibility.
"The cloud provider will protect your data."	"The cloud provider's security posture protects against external breach. Compelled legal disclosure is a separate threat model entirely."	That distinction is the entire article in one sentence.
"Each hosting option has pros and cons."	"Self-hosted wins on jurisdiction and audit ceiling. A colocated hyperscaler wins on latency. Sovereign cloud wins on compliance posture for the AO. Which dimension matters most to your team?"	Names the wins, then hands the decision back to the buyer.

Things to follow up on...

Bedrock's invocation logging gap: AWS's own prescriptive guidance now walks through enabling model invocation logging via CloudFormation, which tells you how many deployments they think are running without it.
DeepSeek's exposed database incident: Wiz Research's January 2025 disclosure of the publicly accessible ClickHouse database containing chat logs and API secrets is the clearest documented example of what "breach surface equals retention store" looks like in practice.
Agent tool-call economics at scale: Stanford's Digital Economy Lab published a May 2026 analysis showing agentic tasks consume 1,000x more tokens than single-turn code reasoning, with the cost concentrated in input tokens from re-reading accumulated context on every call.
CLOUD Act vs. GDPR, still unresolved: A detailed April 2026 analysis of AWS's European Sovereign Cloud concludes that four German GmbHs don't fix the CLOUD Act problem because corporate ownership, not subsidiary structure, determines jurisdictional reach.