The AI supply chain has layers, and most of the confusion in buyer conversations comes from not knowing which layer someone is talking about.
Anthropic trained Claude. When an enterprise runs Claude in production, the invoice usually comes from Amazon Web Services, Google Cloud, or Microsoft Azure. The company that built the model and the company that serves it are different entities. Different contracts, different SLAs, different security boundaries.
This is separation of concerns, applied to the AI supply chain, and it's the structural idea underneath every topic in this section. Pricing, open weights, model selection, geopolitics: all of them get easier once you can place a vendor name into the correct layer and ask the right questions from there.
Four layers, four functions
The AI stack has four layers. Each one does one thing.
Frontier labs train models. Anthropic, OpenAI, Google DeepMind, Meta. These organizations spend billions in compute to produce a set of model weights. The weights are the product. This layer is manufacturing.
Hyperscalers host models. AWS (Bedrock), Google Cloud (Vertex AI), Microsoft Azure (Foundry). These platforms run the models at scale and wrap them in enterprise access controls, billing, and compliance tooling. The hyperscaler operates the environment where the model runs. Training happened at the lab, on different infrastructure, under a different budget.
Aggregators resell access. This layer is still forming and its boundaries are genuinely unsettled. It includes platforms that offer a single API endpoint across multiple models (OpenRouter and LiteLLM are current examples), SaaS products that embed model capabilities without exposing the model directly, and middleware that handles routing or cost optimization between providers. If the enterprise isn't calling the hyperscaler's API directly, there's probably an aggregator in the path. Expect this layer to look different in twelve months.
Enterprises consume. Your buyer's organization. They pick a model or several, access it through a hyperscaler or aggregator, and integrate it into their workflows. The enterprise rarely has a direct relationship with the lab that trained the model.
These layers aren't always cleanly separated. AWS is a hyperscaler, Anthropic's primary cloud partner, and (through its parent, Amazon) an investor in Anthropic. Google DeepMind trains Gemini, and Google Cloud hosts it, so the lab and the hyperscaler share a parent company. The boundaries blur. The functions, though, remain distinct, and that's what makes the stack useful as a thinking tool. When you hear a vendor name you don't recognize, the first productive question is: which layer are they in?
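For readers who think in data structures, the four layers can be sketched as a small lookup table. This is only an illustration: the names `Layer`, `VENDOR_LAYERS`, and `layers_for` are hypothetical, the entries are limited to vendors named above, and the "Google" row stands for the shared parent company rather than a separate product.

```python
from enum import Enum


class Layer(Enum):
    LAB = "trains models"
    HYPERSCALER = "hosts models"
    AGGREGATOR = "resells access"
    ENTERPRISE = "consumes"


# Illustrative mapping built only from names mentioned in the text.
# A name may occupy more than one layer, which is the blurring described above.
VENDOR_LAYERS = {
    "Anthropic": {Layer.LAB},
    "OpenAI": {Layer.LAB},
    "Google DeepMind": {Layer.LAB},
    "Meta": {Layer.LAB},
    "AWS": {Layer.HYPERSCALER},
    "Google Cloud": {Layer.HYPERSCALER},
    "Microsoft Azure": {Layer.HYPERSCALER},
    "OpenRouter": {Layer.AGGREGATOR},
    "LiteLLM": {Layer.AGGREGATOR},
    # Parent company of both a lab and a hyperscaler.
    "Google": {Layer.LAB, Layer.HYPERSCALER},
}


def layers_for(vendor: str) -> set[Layer]:
    """Return the layers a vendor occupies, or an empty set if it is unknown."""
    return VENDOR_LAYERS.get(vendor, set())
```

An empty result is the useful case: it marks a vendor name that still needs to be placed in a layer before the conversation goes further.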
Why the layers are pulling apart
Through most of 2024 and into early 2025, you could mostly treat "the AI vendor" as a single entity. OpenAI's models ran on Azure. Anthropic's ran on AWS. Google's ran on Google Cloud. Each lab had a primary hyperscaler partner, and the mapping was roughly one-to-one.
That mapping broke. In February 2026, Anthropic announced that Claude would be available on Microsoft Azure, making it the only frontier model accessible on all three major cloud platforms. Azure now offers both Claude and GPT-series models through a single platform. AWS Bedrock hosts Claude, Llama, Mistral, and others. Google Vertex AI runs Gemini alongside Claude and a growing catalog of open-weight models.
The hyperscalers have become model marketplaces. An enterprise running on AWS can use Anthropic's Claude, Meta's Llama, or Mistral's models without leaving Bedrock. Training and hosting have decoupled.
The economics behind this are simple enough. Deloitte projects that inference (running a trained model in production, generating outputs) will account for roughly two-thirds of all AI compute in 2026, up from about half in 2025 and a third in 2023. (Deloitte's TMT Predictions series is industry-standard forecasting with disclosed methodology; these numbers are directional, not precise, but the trend line is consistent across multiple analysts.) Training a model is a periodic capital expenditure. Running it millions of times a day is ongoing operational spend. That operational spend is where the hyperscalers compete, which is why they want every lab's models on their platform.
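The shift is easier to feel as a ratio than as a share. The sketch below converts the approximate shares quoted above into units of inference compute per unit of everything else. It assumes, as a simplification, that all non-inference compute is training, and the function name is hypothetical. These are ratios of shares, not absolute growth figures.

```python
from fractions import Fraction

# Approximate inference shares of AI compute, as quoted in the text.
INFERENCE_SHARE = {
    2023: Fraction(1, 3),
    2025: Fraction(1, 2),
    2026: Fraction(2, 3),
}


def inference_per_other_unit(share: Fraction) -> Fraction:
    """Units of inference compute per unit of all other (assumed training) compute."""
    return share / (1 - share)


for year, share in sorted(INFERENCE_SHARE.items()):
    ratio = inference_per_other_unit(share)
    print(f"{year}: {float(ratio):.1f}x")
```

On these directional numbers, inference goes from half of the remaining compute in 2023 to twice it in 2026.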
What this means in practice: the choice of model and the choice of cloud provider are increasingly independent decisions. Your buyer might standardize on Azure for compliance reasons and run Claude for capability reasons. Two vendors, two contracts, two different sets of questions.
What "available on a platform" actually means
The distinction between hosting arrangements matters more than it might seem. When Claude runs on AWS Bedrock, the inference happens on AWS infrastructure. When Claude runs on Azure Foundry, the billing goes through Azure, but the compute runs on Anthropic's own infrastructure. (This is documented in Anthropic's deployment documentation for Azure Foundry; verify the current architecture before using this in a meeting, as hosting arrangements evolve.)
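One way to keep the two arrangements straight is to record who bills and who runs the compute as separate fields. The sketch below restates only the claims in the paragraph above. The class and field names are hypothetical, and the Bedrock billing party is an assumption, since the text states only where Bedrock inference runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostingArrangement:
    model: str
    platform: str          # where the buyer calls the API
    billed_by: str         # who sends the invoice
    compute_operator: str  # whose infrastructure runs inference


# Values restate the text; verify against current vendor documentation.
ARRANGEMENTS = [
    HostingArrangement(
        model="Claude",
        platform="AWS Bedrock",
        billed_by="AWS",  # assumption: not stated explicitly in the text
        compute_operator="AWS",
    ),
    HostingArrangement(
        model="Claude",
        platform="Azure Foundry",
        billed_by="Microsoft Azure",
        compute_operator="Anthropic",
    ),
]


def needs_residency_followup(arrangement: HostingArrangement) -> bool:
    """True when the billing party and the compute operator differ."""
    return arrangement.billed_by != arrangement.compute_operator
```

With these entries the check flags the Azure Foundry arrangement and not the Bedrock one, which is the case where a data residency question has to go to a party other than the one on the invoice.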
"Available on Azure" and "running on Azure hardware" are different claims with different security implications. This is exactly the kind of distinction that surfaces when a buyer asks about data residency or compliance boundaries. Getting it wrong in that conversation is expensive.
In identity, federation separates concerns cleanly: the IdP issues a credential, the SP consumes it, the user never touches the plumbing. The closest AI equivalent is the lab-to-hyperscaler relationship: Anthropic trains the model, AWS hosts it, the enterprise calls the API. The analogy diverges at verification. In federation, the SP can cryptographically verify the IdP's assertion. In the AI stack, the enterprise cannot verify that the model weights on Bedrock are exactly what Anthropic trained, because nothing plays the role of the signed SAML assertion. The trust is purely contractual. That gap doesn't have a clean solution yet.
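To make the missing check concrete, here is a thought experiment in code: what verification could look like if a lab published a digest of its weights and the enterprise could read the hosted bytes. Neither assumption holds today according to the text above, every name is hypothetical, and a bare digest is weaker than a signed assertion because it says nothing about who published it.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def weights_match(hosted_bytes: bytes, published_digest: str) -> bool:
    """Compare a digest of the hosted bytes with the publisher's digest."""
    return hmac.compare_digest(sha256_hex(hosted_bytes), published_digest)


# Stand-in bytes: real weight files are far larger and not exposed to customers.
lab_weights = b"weights-as-released-by-the-lab"
published = sha256_hex(lab_weights)  # hypothetical digest a lab might publish

print(weights_match(lab_weights, published))         # unmodified copy
print(weights_match(lab_weights + b"!", published))  # altered copy
```

The first call reports a match and the second does not. The point is the absence: nothing in the current stack gives the enterprise either input to this function.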
What this section covers next
This piece is the map. The rest of the section fills it in.
We'll cover how frontier labs differ from each other and why those differences matter to enterprise decisions. We'll look at how hyperscaler platforms work as model marketplaces, what "open weights" means in practice and where the "open source" label misleads, how pricing works across the stack, and how geopolitics is starting to shape which models are available where.
Every one of those topics sits on the four-layer structure introduced here. When a new vendor name comes up in a buyer conversation, the first useful move is still the same: figure out which layer they're in. That's where the right questions start.
Things to follow up on...
- Claude's multi-cloud fine print: Anthropic's Azure Foundry documentation specifies that Claude models run on Anthropic's infrastructure even when billed through Azure, a detail that matters for data residency conversations.
- Inference is overtaking training: Deloitte's 2026 TMT Predictions estimate inference will account for roughly two-thirds of all AI compute this year, up from a third in 2023, which is the economic force pulling the stack layers apart.
- Hyperscaler model catalogs keep expanding: AWS Bedrock now hosts over 40 foundation models from eight providers, while Azure Foundry lists 11,000+ models, though the vast majority come from the Hugging Face open-weight collection rather than curated enterprise offerings.
- Open weights versus open source: Most models called "open source" in vendor marketing are actually open-weight: the weights are downloadable, but the training data and code are not released, so they do not meet the OSI definition of open source. The next lesson in this section unpacks that distinction in detail.

