Your buyer is going to say "customize" in an AI conversation, and they're going to mean it the way you'd mean it: configuration, settings, making the platform behave the way their organization needs. That instinct will carry you through most of the conversation. The part it misses is why we're here.
A foundation model is a large general-purpose AI model trained on massive data, designed to be adapted rather than used as-is. You'll also hear "base model" or "pretrained model" in buyer conversations. Same thing. The Stanford researchers who coined the term in 2021 called it "critically central yet incomplete." That second word is the one doing work. The model ships unfinished on purpose. The buyer's organization finishes it.
NIST's definition adds a useful detail: models "trained on broad data using self-supervised learning that can be adapted such as through fine-tuning for a variety of downstream tasks." Notice "such as through fine-tuning." Fine-tuning is one adaptation method among several, and increasingly the least common.
Your customer can deploy AI without becoming an AI lab because of this architecture. They inherit the model's general capabilities and adapt it to their work. The adaptation is the whole game. And the way they adapt it determines the cost, the complexity, and whether identity infrastructure touches the problem at all.
The adaptation spectrum
When the buyer says "customize," they could mean three very different things. These sit on a spectrum from light to heavy, and the distance between the bottom and the top is categorical.
Prompting is the lightest touch. You write instructions that tell the model how to behave: what format to use, what role to adopt, what constraints to follow. No training data. No model changes. The model's internal weights, the numerical parameters that encode everything the model learned during training, stay exactly as they shipped. You're shaping behavior at runtime through language.
Why the weights matter: changing a prompt is like changing the question you ask someone. Changing the weights is like changing how that person thinks. That distinction runs through everything else in this piece.
"Prompt engineering is typically the best place to start. It is often the only method needed for use cases like summarization, translation, and code generation."
That's OpenAI's own documentation. When the largest model provider tells you to try the free option first, that signal is worth registering.
Retrieval-Augmented Generation (RAG) adds the organization's own knowledge without changing the model. At query time, the system searches the buyer's documents, retrieves relevant chunks, and feeds them to the model alongside the user's question. The model reasons over their data without having been trained on it. Agency policy manuals, contract language, FOIA precedents, all available to the model without touching its weights.
RAG is why a general-purpose model can answer questions about a specific agency's procedures. The model didn't learn those procedures. It read them just now, the same way you'd read a briefing document before a call.
Fine-tuning is where you actually retrain the model. You feed it labeled examples of the behavior you want, and the training process adjusts the model's weights. The model itself changes. This requires training data, typically thousands of labeled examples (AWS cites 10,000 for classification tasks), plus dedicated GPU infrastructure for the training runs. That's why fine-tuning costs real money when prompting is essentially free. Add weeks of iteration. The result is a modified version of the model that behaves differently at a fundamental level.
Prompting requires zero training data and takes minutes. RAG requires a document corpus and an indexing pipeline, typically days to weeks. Fine-tuning requires thousands of curated examples, compute infrastructure, and sustained iteration. Most buyers who say "customize" are describing the first two.
Fine-tuning's shrinking territory
This section would have read differently eighteen months ago.
Context windows have grown dramatically. Models now process hundreds of thousands of tokens in a single request, which means examples and reference documents that used to require fine-tuning to "bake in" can simply be included in the prompt. Research confirms that in-context learning performance continues to improve as you add more demonstrations. Tasks that once required retraining the model can now be handled by showing the model what you want, every time you ask.
The providers themselves are making this shift hard to miss. Anthropic does not offer fine-tuning through its public API. Their guidance steers developers toward system prompts, few-shot examples, and RAG. OpenAI is winding down traditional fine-tuning for new users, moving toward Reinforcement Fine-Tuning on reasoning models only. The company that popularized fine-tuning is sunsetting the conventional version of it.
Provider fine-tuning offerings are actively evolving. Confirm specifics before citing in a live conversation.
Fine-tuning still earns its cost in narrow cases: encoding a consistent output format at scale when prompting can't stabilize it (think structured intelligence report templates that must be identical across thousands of daily outputs), reducing latency by training a smaller model to handle a task that would otherwise require a larger one, or baking in a closed-domain classification scheme like medical coding or threat categorization. But better prompting and longer context windows have absorbed most of the territory fine-tuning used to own. Federal AI practitioners estimate that prompting and RAG cover roughly 80% of government generative AI use cases. That's a practitioner rule of thumb, not a research finding, but it matches the pattern everywhere else.
Translating "customize" in real time
When your public sector buyer says "can we customize the model," they are almost certainly describing one of these:
- "We need it to know our policies and procedures." That's RAG.
- "We need it to respond in a specific format or follow specific rules." That's prompting.
- "We need it to use our terminology and follow our classification scheme." Probably prompting with good examples, possibly RAG.
- "We need it to get better at a very specific task where we have thousands of labeled examples and latency constraints." That might actually be fine-tuning. Maybe.
The word "customize" in the buyer's mouth comes from the same place your instincts come from: a world of tenant configuration, policy rules, and integration settings. And the instinct is more right than wrong. Most AI customization really is closer to configuration than to rebuilding.
Where your IDAM intuition helps, and where it starts to mislead
The parallel works well up to a point. In your world, you configure a tenant: set policies, define attribute mappings, establish trust relationships. The platform's core code doesn't change. Prompting and RAG work the same way conceptually. You're instructing and feeding context to a system whose core weights don't change. Configuration on top of a shared foundation.
Your intuition is genuinely useful here. When the buyer says "customize," you can map it to the configuration layer you already understand and ask the right clarifying questions: what behavior do you want to change, what data does the model need access to, how often does that data change?
Now the part where the metaphor cracks. In IDAM, your configuration is deterministic. You set a policy rule, it fires the same way every time. You can inspect it, audit it, predict its behavior. Prompting an AI model is not deterministic. You write an instruction, and the model produces a statistically likely response. It will usually follow your instruction. It will sometimes not. The configuration metaphor holds for the intent of what you're doing but breaks for the reliability of the outcome. Your buyer's compliance team will care about that gap, and you should be the one who names it before they discover it.
Fine-tuning breaks the metaphor further. It's closer to forking the codebase. You've created a new version of the model with different internal behavior, and the changes aren't human-readable. You can't open the weights and see what changed the way you can open a SCIM mapping and see the attribute flow. You evaluate the result by testing outputs, not by inspecting configuration. That opacity is the reason fine-tuning creates governance questions that prompting and RAG mostly don't.
For your next call
When the buyer says "customize," hear "configure" and you'll be right most of the time. Ask what data they need the model to know (that's RAG) and what behavior they need it to follow (that's prompting). If they describe a narrow, high-volume task with thousands of labeled examples and real latency constraints, then fine-tuning enters the conversation. For everything else, the foundation stays intact. They're building on top of it.
That's the whole point of calling it a foundation.
Things to follow up on...
- OpenAI's fine-tuning wind-down: OpenAI is sunsetting its traditional fine-tuning platform for new users and moving toward Reinforcement Fine-Tuning on reasoning models only, a shift that's still actively evolving and worth tracking before citing in buyer conversations.
- Federal AI adoption velocity: GAO found that generative AI use cases across federal agencies increased ninefold in a single year, from 32 to 282, a deployment pace that strongly favors prompting and RAG over fine-tuning given infrastructure constraints.
- Long context limits are real: A January 2026 study in Advances in Artificial Intelligence and Machine Learning found that models fell short of their stated maximum context windows by as much as 99% in practical tasks, so "million-token context" doesn't mean million-token comprehension.
- GAO's documentation gap: Federal agencies are not yet systematically collecting lessons learned from AI acquisitions, which means the fine-tuning vs. RAG decision records that would inform public sector best practices simply don't exist in shareable form yet.

