Training a large language model from scratch costs somewhere between $10 million and $100 million in compute, depending on who's counting and what they're counting. That number explains the entire structure of the enterprise AI market: almost no organization will ever do it, which means almost every organization is working with a model someone else built and trying to make it useful for their specific situation.
That adaptation problem is what "foundation model" and "fine-tuning" are both about. Understanding them mechanically, not just as vocabulary, is what lets you follow a conversation with a Chief AI Officer (CAIO) instead of nodding through it.
What a Foundation Model Is
A foundation model is a large neural network trained on a massive, general-purpose dataset — text scraped from the web, books, code, scientific papers — at a scale that produces surprisingly broad capability. The training process (covered in another piece) produces a model that can write, reason, summarize, translate, and answer questions across a wide range of domains without having been explicitly programmed for any of them.
The "foundation" framing is intentional and accurate. These models are designed to be adapted, not deployed as-is. OpenAI's GPT series, Anthropic's Claude, Google's Gemini, Meta's Llama — these are all foundation models. The organizations that trained them spent the compute budget; your customers inherit the capability and customize from there.
This is why your customers can deploy AI without becoming AI labs. They're not training models. They're adapting one.
• Foundation model: A large general-purpose neural network trained at massive cost on broad data, designed to be adapted rather than deployed raw. Your customers start here because starting anywhere else is economically irrational.
What Fine-Tuning Is, Precisely
Fine-tuning is continuing the training process on a smaller, task-specific dataset after the foundation model's initial training is complete.
The mechanism, step by step:
The foundation model arrives with billions of parameters — numerical weights that encode everything the model learned during pretraining. These weights determine how the model responds to any given input. Fine-tuning doesn't replace those weights; it adjusts them. You run additional training passes using a curated dataset of examples specific to your use case — say, thousands of question-answer pairs formatted the way you want the model to respond, or examples of the writing style you need, or domain-specific text the model should learn to handle fluently.
Each training pass nudges the weights slightly in the direction of your examples. After enough passes on enough examples, the model's behavior has shifted: it responds differently than the base model would, in ways that reflect your training data.
Shifted is the operative word. The model retains most of what it learned during pretraining, although aggressive fine-tuning on a narrow dataset can erode some of its general capability. Fine-tuning layers your preferences and domain patterns on top of that general knowledge. Think of it as adjusting the model's priors rather than rebuilding its knowledge.
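The mechanism above can be sketched with a toy model. This is a minimal illustration, assuming a model with only two weights instead of billions; the names `train` and `loss` and all of the data are made up for the example. It shows the same idea at small scale: fine-tuning continues the same training loop on a small dataset, and the weights move a little rather than starting over.

```python
def train(weights, examples, passes, learning_rate):
    """Run gradient-descent passes on a toy linear model y = w*x + b."""
    w, b = weights
    for _ in range(passes):
        for x, y in examples:
            error = (w * x + b) - y          # how far off the prediction is
            w -= learning_rate * error * x   # nudge each weight slightly
            b -= learning_rate * error
    return w, b

def loss(weights, examples):
    """Sum of squared errors of the model on a set of examples."""
    w, b = weights
    return sum(((w * x + b) - y) ** 2 for x, y in examples)

# "Pretraining": many examples of a general pattern (y = 2x), many passes.
general_data = [(x / 10, 2 * x / 10) for x in range(-50, 51)]
base = train((0.0, 0.0), general_data, passes=200, learning_rate=0.01)

# "Fine-tuning": a few examples of a slightly different pattern, few passes,
# starting from the pretrained weights instead of from zero.
task_data = [(1.0, 2.5), (2.0, 4.5), (3.0, 6.5)]
tuned = train(base, task_data, passes=5, learning_rate=0.01)

print("base weights: ", base)
print("tuned weights:", tuned)
print("task loss before and after:", loss(base, task_data), loss(tuned, task_data))
```

The tuned weights stay close to the base weights, but the error on the task examples drops. That is the "shift" in miniature.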
The dataset sizes show the scale difference: pretraining consumes on the order of trillions of tokens of text, while fine-tuning typically uses thousands to hundreds of thousands of examples. The compute cost is correspondingly smaller. Fine-tuning a frontier model is expensive by most standards, but it's a rounding error compared to training one.
• Fine-tuning: Continuing a foundation model's training on a smaller, task-specific dataset to shift its behavior toward your use case. The model's general knowledge stays; your examples adjust how it applies that knowledge.
Okta Concept Mapping
The closest IDAM analogy is a commercial IdP, and the point where the analogy breaks matters as much as the point where it holds.
Your customers don't build their own identity providers. They buy Okta, configure it for their environment, and extend it through integrations. Foundation models work the same way: you don't train your own, you configure and extend one. Fine-tuning, though, is where the analogy breaks. Configuring Okta doesn't change how Okta's authentication engine works internally — you're adjusting behavior through policy, not rewriting logic. Fine-tuning actually modifies the model's internal weights. It's closer to forking the IdP's codebase than to writing a policy. Which is why, like forking a codebase, it's almost never the right answer when configuration would do.
The Honest Consensus: Fine-Tuning Is Overrated
This is where a lot of buyer conversations go wrong, and where you need solid footing.
Fine-tuning was the dominant enterprise AI customization strategy roughly three years ago, when it was often the only viable strategy. Context windows were small: the original GPT-3 handled about 2,000 tokens and its later versions about 4,000, which is only a few pages of text. If you needed the model to know your organization's policies, your product documentation, or your domain-specific terminology, you had limited options: fine-tune the model on that material, or accept that the model wouldn't know it.
That constraint has largely dissolved. Current frontier models support context windows of 128,000 tokens or more; some configurations go higher. Prompting techniques have matured significantly — structured prompting, chain-of-thought reasoning, and few-shot examples can coax reliable, specific behavior from a base model without touching its weights. And retrieval-augmented generation, RAG, has become the standard pattern for giving a model access to specific, current, or proprietary information at the moment it needs it, rather than baking that information into the model's weights permanently.
These three developments — longer context, better prompting, and RAG — have absorbed most of the territory fine-tuning used to own. Not all of it. But most of it.
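Few-shot prompting is the easiest of these to show concretely. The sketch below is illustrative only: the function name `build_few_shot_prompt`, the "Q:/A:" layout, and the example tickets are assumptions made for this example, not a required format. The point is that the examples travel inside the prompt, so the model's weights are never touched.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples, and a new query into one prompt."""
    parts = [instruction, ""]
    for question, answer in examples:
        parts.append(f"Q: {question}")
        parts.append(f"A: {answer}")
        parts.append("")
    # The final question is left unanswered for the model to complete.
    parts.append(f"Q: {query}")
    parts.append("A:")
    return "\n".join(parts)

# Hypothetical examples showing the model the output format we want.
examples = [
    ("User cannot log in after password reset", "category=authentication"),
    ("New hire needs access to the finance app", "category=provisioning"),
]

prompt = build_few_shot_prompt(
    "Classify each help desk ticket. Reply with category=<name> only.",
    examples,
    "Contractor account should be disabled on Friday",
)
print(prompt)
```

Changing the behavior means editing the examples and sending a new prompt, which takes minutes rather than a training run.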
When fine-tuning still earns its complexity:
Fine-tuning genuinely helps when you need consistent style or tone at scale — a model that always writes in your agency's voice, not just when you remind it to. It helps when you're deploying at a volume where the cost of longer prompts becomes significant, and a fine-tuned model can achieve the same result with a shorter context. It helps when you have a genuinely novel task type that the base model handles poorly even with extensive prompting — certain specialized classification problems, for instance, or domains with highly specific formatting requirements.
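The volume argument is simple arithmetic. The sketch below uses entirely hypothetical numbers (the request volume, prompt lengths, and price per million tokens are all assumptions chosen for round figures) to show how prompt length multiplies into monthly cost.

```python
def monthly_input_cost(tokens_per_request, requests_per_month, price_per_million_tokens):
    """Dollar cost of input tokens for one month of traffic."""
    total_tokens = tokens_per_request * requests_per_month
    return total_tokens / 1_000_000 * price_per_million_tokens

REQUESTS_PER_MONTH = 10_000_000  # hypothetical volume
PRICE_PER_MILLION = 1.00         # hypothetical price in dollars, not a real vendor rate

# A long prompt carrying instructions and examples on every request,
# versus a short prompt sent to a model already tuned for the task.
long_prompt_cost = monthly_input_cost(3_000, REQUESTS_PER_MONTH, PRICE_PER_MILLION)
short_prompt_cost = monthly_input_cost(300, REQUESTS_PER_MONTH, PRICE_PER_MILLION)

print(f"long prompt:  ${long_prompt_cost:,.0f} per month")
print(f"short prompt: ${short_prompt_cost:,.0f} per month")
```

Under these made-up numbers the shorter prompt costs a tenth as much. A real comparison would also have to include the cost of the fine-tuning run and any difference in per-token pricing for a fine-tuned model, both of which this sketch ignores.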
Fine-tuning does not help with knowledge cutoff problems, hallucination on factual questions, or "our data is proprietary so we can't send it to the model." All three of those are RAG problems. A model fine-tuned on your policy documents doesn't reliably know your policies — it writes in a style that sounds like your policies. If you want accurate answers about what your policies say, you need RAG: retrieve the relevant document sections at query time, inject them into the context, let the model reason over them. That's a fundamentally different mechanism, and it's the right one for most of what buyers describe when they say "fine-tuning."
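The retrieve-then-inject pattern can be sketched in a few lines. This is a toy under stated assumptions: the policy sections are invented, and simple word overlap stands in for the semantic search a production system would more likely use. The functions `retrieve` and `build_prompt` are hypothetical names for this example, not any library's API.

```python
import re

# Hypothetical policy sections standing in for a real document store.
sections = [
    {"id": "travel-3.2", "text": "Employees must book travel at least 14 days in advance."},
    {"id": "purchase-1.1", "text": "Purchases over 5000 dollars require two approvals."},
    {"id": "remote-2.4", "text": "Remote work requires manager approval each quarter."},
]

def tokenize(text):
    """Lowercase a string and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, sections, top_k=1):
    """Rank sections by word overlap with the query and return the best matches."""
    query_words = tokenize(query)
    ranked = sorted(
        sections,
        key=lambda s: len(query_words & tokenize(s["text"])),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(query, retrieved):
    """Inject the retrieved sections into the context ahead of the question."""
    context = "\n".join(f"[{s['id']}] {s['text']}" for s in retrieved)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

query = "How many approvals do purchases over 5000 dollars require?"
prompt = build_prompt(query, retrieve(query, sections))
print(prompt)
```

The model never has to memorize the policy. The relevant text arrives with the question, and updating a policy means updating the document store, not retraining anything.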
The field isn't unanimous on this. There are researchers and practitioners who argue that fine-tuning and RAG are complementary rather than competing, and they're not wrong — you can do both. But the practical consensus among enterprise AI teams who've shipped production systems is that organizations reach for fine-tuning too early, before they've exhausted what's achievable through prompting and retrieval. The complexity cost is real: fine-tuning creates a model artifact you have to version, maintain, and re-fine-tune when the base model updates. That operational burden needs a specific justification.
• Fine-tuning consensus: Better prompting, RAG, and longer context windows have absorbed most of what fine-tuning used to be necessary for. Fine-tuning still earns its complexity for style consistency, cost optimization at scale, and genuinely novel task types — not for knowledge problems, which belong to RAG.
What This Sounds Like in a Real Conversation
A CIO tells you their agency needs to fine-tune a model on three years of procurement policy documents so it can answer questions from contracting officers. This is a specific, reasonable-sounding request. It is also, almost certainly, a RAG problem wearing fine-tuning's clothes.
Fine-tuning on those documents would shift the model's style toward procurement language. It would not reliably make the model accurate about what those documents say. A contracting officer asking "does FAR 52.204-21 apply to this contract type?" needs a factually correct answer, not an answer that sounds like it came from a procurement office. Accuracy on specific factual questions requires retrieval — pulling the relevant clause at query time and letting the model reason over it — not weight adjustment.
The question you can ask, without bluffing: "Is the goal for the model to write in a specific style, or to answer questions accurately about specific documents?" That distinction surfaces the actual requirement. If it's the latter, the conversation moves to RAG architecture, data access controls, and how the retrieval system gets authorized to pull from the agency's document store — which is, not coincidentally, where identity starts to matter.
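To make the identity point concrete, here is a minimal sketch of the authorization step that sits in front of retrieval. The document IDs, group names, and the function `authorized_documents` are all hypothetical; a real deployment would take group membership from the identity provider rather than hard-coding it.

```python
# Hypothetical document store: each entry records which groups may read it.
documents = [
    {"id": "far-overview", "allowed_groups": {"all-staff"}},
    {"id": "active-bids", "allowed_groups": {"contracting-officers"}},
    {"id": "legal-holds", "allowed_groups": {"general-counsel"}},
]

def authorized_documents(docs, user_groups):
    """Keep only documents whose allowed groups overlap the caller's groups."""
    return [d for d in docs if d["allowed_groups"] & user_groups]

# The retrieval step should only ever search what the caller is allowed to see.
officer_view = authorized_documents(documents, {"all-staff", "contracting-officers"})
intern_view = authorized_documents(documents, {"all-staff"})

print([d["id"] for d in officer_view])
print([d["id"] for d in intern_view])
```

The filter runs before anything reaches the model, so two users asking the same question can get answers drawn from different documents.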
You don't need to know how to build a RAG pipeline. You need to know that the buyer's stated solution and their actual problem are often mismatched, and that asking one clarifying question is enough to surface it.
• Practical read: When a buyer says "fine-tune," ask whether they need style or accuracy. Style is a fine-tuning problem. Accuracy on specific documents is a RAG problem. The distinction changes the architecture, the vendor conversation, and the identity requirements downstream.
Fine-tuning is a real technique with real applications. It's also become a default answer to questions it doesn't actually answer, partly because it sounds more technical than "better prompting" and partly because vendors with fine-tuning services have an interest in positioning it as the serious enterprise option. Most organizations deploying AI in 2026 will never fine-tune a model, and most of those that do will wish they'd spent more time on their retrieval architecture first.
When your buyer leads with fine-tuning, they're usually describing a knowledge problem. The conversation that follows — about where the knowledge lives, who can access it, and how access gets controlled — is one you're better positioned to have than most people in the room.

