A large language model is a file. A very large file, tens or hundreds of gigabytes, produced by training on an enormous dataset, then loaded onto specialized hardware and queried through an API. That's the physical reality underneath every AI product conversation you're going to have this year.
This section exists because vendor decks don't tell you that. They show you the capability layer: what the model does, what it can generate, what workflows it fits into. The infrastructure layer gets glossed over: what the model physically is, where it runs, what it costs to run, and what happens when it breaks at scale. That's a reasonable choice for a product demo. It's a problem for a seller who needs to follow a buyer's architecture conversation.
The Moment the Gap Shows Up
Buyers who are actually deploying AI don't talk about it in capability terms. They talk about it in infrastructure terms, because that's where the decisions live. When a buyer says "we're running that on Bedrock," they're telling you something specific: they've chosen AWS's managed inference service, which means they've made a call about data residency, about which model families they have access to, and almost certainly about how this fits into their existing cloud procurement. When they say "we fine-tuned on our own data," they're telling you the model they're using isn't the base model. It's been modified, which has implications for what it knows, how it behaves, and who owns the resulting artifact.
If you're hearing those phrases as brand names rather than architectural choices, you're a beat behind. The buyer knows it. The conversation continues, but something has shifted.
The seller who can follow that conversation isn't necessarily more technical. They just have a concrete picture of what AI systems physically are, so the buyer's language maps onto something real rather than floating as a vague impression of sophistication.
What Gets Glossed Over
The gap between AI-as-magic and AI-as-infrastructure is partly the industry's fault. The marketing around AI systems emphasizes what they can do and deliberately obscures what they cost and where they break. A foundation model that costs $30 million to train gets described in terms of its capabilities, not its operational overhead. A retrieval-augmented generation system that requires careful data pipeline management gets described as "grounding the model in your enterprise data," which is accurate but tells you nothing about what you're actually building.
Some of this is also a function of how fast the space has moved. The people buying AI infrastructure right now are often making decisions faster than the conceptual vocabulary has stabilized. "RAG" means something specific, but it gets used loosely. "Fine-tuning" means something specific, but it gets conflated with things that aren't fine-tuning. When vocabulary is imprecise, conversations become imprecise, and the seller without grounding in the actual concepts can't tell the difference between a buyer who's using terms correctly and one who isn't.
What This Section Will Do
The pieces here cover the mechanical reality that the vendor decks skip. Not at the level of an ML engineer — you don't need to implement any of this. At the level of a technically credible seller who can parse what a buyer is describing and ask the right follow-on questions.
That means five things. What a model actually is as a physical artifact, and why that matters for how it gets deployed. What inference means (what's actually happening when a model responds to a query) and why that process has a cost structure that looks nothing like traditional software licensing. How hosting environments differ, and why a buyer's choice between Bedrock, Azure OpenAI, and self-hosted infrastructure tells you something real about their constraints and priorities. What fine-tuning actually does to a model versus what retrieval-augmented generation does, and why those are fundamentally different architectural choices even though both get described as "customizing the model." And what scale does to all of this: where the economics shift, where the failure modes emerge, and why a system that works fine at a hundred queries a day behaves differently at a million.
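To make the cost-structure point concrete, here is a minimal sketch of usage-based inference pricing. The per-token prices and token counts are made-up placeholders, not any vendor's actual rates, and `daily_inference_cost` is a hypothetical helper. The only point is that spend tracks traffic, which a fixed software license does not.

```python
# Hypothetical prices in USD per 1,000 tokens; NOT any vendor's actual rates.
INPUT_PRICE_PER_1K = 0.003
OUTPUT_PRICE_PER_1K = 0.015


def daily_inference_cost(queries_per_day, input_tokens=1_000, output_tokens=500):
    """Estimate daily spend when every query is billed by tokens in and out."""
    cost_per_query = (
        input_tokens / 1_000 * INPUT_PRICE_PER_1K
        + output_tokens / 1_000 * OUTPUT_PRICE_PER_1K
    )
    return queries_per_day * cost_per_query


# Compare the two traffic levels mentioned in the text.
for volume in (100, 1_000_000):
    print(f"{volume:>9,} queries/day -> ${daily_inference_cost(volume):,.2f}/day")
```

With these placeholder numbers, the same system goes from roughly a dollar a day at a hundred queries to roughly $10,500 a day at a million. Real pricing varies by model and provider, but the shape of the curve is what matters in a buyer conversation.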
The goal is fluency, not depth. "DeepSeek on Bedrock" should parse as a concrete thing — a specific model, running on a specific managed service, with specific implications — not a phrase you nod at and move past.
Note: In identity, where your IdP runs determines your data residency profile, your latency characteristics, and your failure blast radius. The closest AI equivalent is the inference endpoint — where the model actually executes when a query arrives. It diverges here: in identity, you're largely choosing between your infrastructure and the vendor's. In AI, the hosting decision fragments into several interacting choices: where the model weights live, where inference runs, where the data being queried resides, and whether those are even the same environment. A buyer saying "we're on Bedrock" has answered one of those questions. The others are still open.
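One way to picture those interacting choices is as a record with four fields, most of which a single phrase leaves blank. This is an illustrative sketch only: `HostingProfile` and `open_questions` are hypothetical names, and treating "we're on Bedrock" as answering the inference question specifically is an assumption made for the example.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class HostingProfile:
    """Hypothetical record of the four hosting questions named in the note."""
    weights_location: Optional[str] = None    # where the model weights live
    inference_location: Optional[str] = None  # where inference runs
    data_location: Optional[str] = None       # where the queried data resides
    same_environment: Optional[bool] = None   # whether those are one environment


def open_questions(profile):
    """Return the names of the hosting questions not yet answered."""
    return [f.name for f in fields(profile) if getattr(profile, f.name) is None]


# Assumption: "we're on Bedrock" answers only where inference runs.
buyer = HostingProfile(inference_location="Amazon Bedrock")
print(open_questions(buyer))
```

The fields still set to `None` are the follow-on questions worth asking.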
Note: In identity, customizing a connector means adding configuration on top of a base integration, something you can update, roll back, or swap out without touching the underlying system. The closest AI equivalent is fine-tuning, where additional training adjusts the model's weights to improve performance on a specific task or domain. It diverges here: fine-tuning modifies the model itself, not a layer above it. There's no setting to revert. A team can fall back to the base model or an earlier checkpoint, but it can't selectively undo what the training changed. If the fine-tuning data was bad, or the process introduced unintended behavior, remediation means retraining on corrected data. This is why buyers who've fine-tuned on proprietary data are describing a different kind of asset than buyers who are using a base model with a retrieval layer on top.
The rest of this section is organized around those concepts, in the order they tend to surface in buyer conversations. Each piece is designed to stand alone — if you only have five minutes before a call where RAG came up, you can read that piece and have something concrete to work with. Read the section straight through and the concepts build on each other in a way that should make the whole picture legible.
Start wherever the conversation is. That's what it's for.

