Three terms show up in every serious AI budget conversation: pre-training, fine-tuning, and inference. You'll hear all three, often in the same meeting, often used interchangeably by people who mean very different things. Getting the vocabulary right means knowing which term maps to which cost regime — and knowing how to ask the question that surfaces what the buyer actually wants without making them feel corrected.
Pre-Training
What it is: Building a large language model from scratch by exposing it to massive amounts of text data until it learns the statistical patterns of language, reasoning, and general knowledge.
What it does: Pre-training produces the base model — the thing that can read, write, summarize, reason, and generate. GPT-4, Claude, Gemini, Llama — these are all pre-trained models. Pre-training is where the capability comes from. Everything downstream (fine-tuning, inference) runs on top of it.
Who's behind it: OpenAI, Google DeepMind, Anthropic, Meta, Mistral, and a small number of other organizations with the infrastructure to run it. Not enterprises. Not agencies. Not system integrators. The compute requirements for a frontier model run into tens of thousands of GPUs operating for months. The investment is measured in the hundreds of millions of dollars — OpenAI's GPT-4 training run is estimated to have cost over $100 million in compute alone, and that number has only moved in one direction since. Pre-training is vendor territory, full stop.
What makes it distinct: The scale is so far outside enterprise reach that "we want to train our own AI" is almost never a real request. It's a proxy for something else — usually control, privacy, or customization — and those problems have different, accessible solutions. When a buyer says they want to train their own model, find out what outcome they're actually after, because the answer almost certainly isn't pre-training.
Fine-Tuning
What it is: Taking an existing pre-trained model and adjusting its internal weights (the numerical values that determine how it responds) using a smaller, domain-specific dataset, so the model behaves differently than it did out of the box.
What it does: Fine-tuning shifts the model's behavior without rebuilding it. You can use it to make a general-purpose model speak in your agency's terminology, follow a specific response format, prioritize certain types of answers, or perform better on a narrow task like document classification or policy summarization. The base capability stays intact; the tuning changes how that capability expresses itself.
Who's behind it: The enterprise or agency, typically with tooling from a cloud provider (Azure OpenAI Service, AWS Bedrock, Google Vertex AI) or directly from the model vendor. It requires a dataset — usually hundreds to thousands of examples of the behavior you want — and an ML engineer or data scientist to run the process. The compute cost is a fraction of pre-training: a fine-tuning run on a mid-size model typically takes hours to a few days and costs anywhere from a few hundred to tens of thousands of dollars depending on model size and dataset volume. Expensive relative to a software license; cheap relative to what the buyer imagines when they say "train."
What makes it distinct: Fine-tuning is the only regime where the enterprise's data actually changes the model. The weights get updated. The model is, in a meaningful sense, different after fine-tuning than before. This matters for buyers who want their AI to "know" something in a durable way — not retrieved at runtime (that's RAG, covered in Lesson 3), but baked into how the model thinks. The tradeoff is that fine-tuning is relatively static: once you've tuned, updating the model with new information requires another tuning run.
Okta Analog: Tenant-Specific Policy Configuration
Fine-tuning is structurally similar to tenant-level policy configuration in Okta — you're taking a base capability and adjusting its behavior for a specific context without rebuilding the engine underneath. The analogy holds for explaining customization to a buyer. It breaks when you push it too far: policy configuration doesn't change the underlying Okta platform, but fine-tuning actually modifies the model's weights. In a buyer conversation, this distinction matters if they ask whether fine-tuning is reversible (it is, but you're not just toggling a setting — you're running a new process).
Inference
What it is: Running the model to generate a response to a specific input.
What it does: Inference is what happens every time a user sends a message, submits a document for summarization, or triggers an AI-powered workflow. The model receives the input, processes it through its weights, and produces an output. This is the operational regime — the one that runs continuously, in production, at whatever scale the deployment demands.
Who's behind it: The organization hosting the model, which is usually a cloud provider or the model vendor's API. The enterprise pays per call — typically measured in tokens (roughly, word-fragments) consumed and generated. Costs vary widely by model: a frontier model like GPT-4o might run $5–$15 per million tokens; smaller, optimized models can run under $1 per million. For a high-volume deployment, inference is the line on the bill that actually matters operationally.
What makes it distinct: Inference is the only regime that happens at runtime, on every interaction, indefinitely. Pre-training happens once. Fine-tuning happens occasionally. Inference happens constantly. This means inference cost is a function of usage, not a one-time investment — and it's the cost that scales with adoption. An agency that deploys an AI assistant to 10,000 employees is not paying 10,000 times the cost of a pilot; they're paying for every query those employees generate, every day, forever. That's a different budget conversation than the one that started with "train it on our data."
Okta Analog: Authentication Events
Inference cost maps cleanly to how Okta bills for authentication events — per transaction, scaling with usage, predictable at the unit level but variable in aggregate. The analogy is useful for buyers who already understand MAU or MFA transaction pricing. Where it breaks: authentication events are largely stateless, while inference carries context (the conversation history, the system prompt) that affects both cost and behavior. A buyer who asks "why does it cost more when the conversation gets longer?" needs to know about context windows, not just transaction volume.
Comparison: Three Scenarios, Three Conversations
Comparison structure: Scenario mapping. The three regimes matter to an AE not as abstract categories but as buyer moments — specific things a buyer says that signal which regime they're actually talking about. Each scenario below maps a buyer statement to the correct regime and the cost reality behind it.
Scenario 1: "We want to build our own AI."
Buyers frame this as a pre-training conversation. Almost none of them are actually in one. No federal agency and no enterprise is going to spend $100M+ and 18 months to produce a model that will be outperformed by the next OpenAI release before they finish the paperwork. What the buyer usually means: we want control, we want our data to stay on-premises, or we want something that feels like ours. Those are fine-tuning conversations, or RAG conversations, or infrastructure conversations about where inference runs. Ask what "our own" means to them specifically.
Scenario 2: "We want to train it on our data."
The most common mismatch in enterprise AI conversations. The buyer says "train," they mean "customize." The distinction matters because the solutions are different. If they want the model to answer questions using their documents, that's RAG — retrieval at inference time, no training involved. If they want the model to adopt their terminology, follow their formats, or perform better on their specific tasks, that's fine-tuning — hours to days, their data, accessible. If they want the model to have genuinely internalized their institutional knowledge in a way that persists across all interactions without retrieval, fine-tuning is the closer answer, with the caveat that it's not a live feed — updates require new runs.
The question that moves this conversation forward: "When you say train it, are you thinking about how it responds, or what it knows?" That's not a correction. It's a diagnostic.
Scenario 3: "What does this cost to run?"
An inference conversation, and the one that catches buyers off guard. They've budgeted for the implementation — the fine-tuning run, the integration work, the pilot — and they haven't modeled the operational cost. Inference pricing is real and it scales. A deployment that looks affordable at 100 users can look very different at 10,000. Useful framing: inference cost is usage cost, and usage cost is a function of adoption success. If the tool works, it costs more to run. That's a planning input, not a problem.
Okta Analog: Building Your Own IdP
Pre-training is the AI equivalent of building your own identity provider from scratch. Nobody does it unless they have a very specific reason and the engineering capacity to sustain it indefinitely. The analogy is useful precisely because your buyers already know why they didn't build their own IdP — the same logic applies. In a buyer conversation, this framing can defuse the "we want our own AI" instinct quickly: "What would have to be true for you to build your own identity provider? Same question applies here." It's a genuine diagnostic for whether the buyer has a pre-training problem (almost never) or a customization and control problem (almost always).
How to Say This in the Field
The table below is built around the specific vocabulary buyers use when they mean something different from what they say. Every "Do say" is usable as written.
| Don't say | Do say | Why it matters |
|---|---|---|
| "That's not really training, that's fine-tuning." | "When you say train it, are you thinking about how it responds, or what it knows?" | Correcting the term loses the room; the question surfaces the actual requirement. |
| "Pre-training is way too expensive for you." | "Pre-training is what OpenAI and Google do — that's not where we'd start. What outcome are you trying to get to?" | Reframes the conversation without making the buyer feel naive. |
| "You don't need to train it, you need RAG." | "If the goal is getting it to answer questions from your documents, there's a faster path than training — want me to walk through how that works?" | Keeps you in the conversation instead of handing it to a technical resource. |
| "Fine-tuning is cheap." | "Fine-tuning runs in hours to days and uses your data — it's the accessible version of what people mean when they say 'train it.'" | Sets accurate expectations without underselling the capability. |
| "Inference is just the API call." | "Inference is the operational cost — it runs every time a user interacts with the system, so it scales with adoption." | Prepares the buyer for the budget conversation that comes after the pilot. |
| "We can't tell you what it'll cost to run." | "Inference pricing is per-token, and we can model it once we know your expected query volume and which model you're deploying on." | Turns an uncertainty into a planning conversation. |
| "Their data won't be in the model." | "If the requirement is that your data never influences the model's weights, fine-tuning is off the table — but RAG keeps your data in retrieval, not training." | Precise enough to hold up if a security officer asks the follow-up. |
| "You want a custom model." | "When you say custom — are you thinking about behavior, like how it responds? Or knowledge, like what it knows? Those have different paths." | Disambiguates the two fine-tuning use cases before you're committed to one. |
| "Training takes months." | "Pre-training takes months. Fine-tuning — which is what most enterprises actually do — takes hours to days." | Prevents sticker shock from killing a conversation that was actually about fine-tuning. |
| "The model learns from your users over time." | "Out of the box, inference doesn't update the model — what users do doesn't change how it responds to the next user. Continuous learning requires a separate process." | Corrects a common assumption before it becomes a support issue. |
What to Hold Onto
Pre-training is what built the model you're selling access to. Fine-tuning is what an enterprise does when they want the model to behave differently — their data, their timeline, their cost. Inference is what runs every time a user interacts with the system, and it's the cost that scales.
When a buyer says "train it on our data," they almost always mean fine-tuning or RAG. The question that finds out which one: ask whether they're trying to change how the model responds or what it knows. One answer points to fine-tuning. The other points back to Lesson 3.
These terms aren't jargon for their own sake. They're the difference between a buyer who leaves the meeting thinking AI is a research project and one who leaves with a scoped, priceable path forward. That's the conversation you're trying to have.
Lesson 5 covers model hosting and deployment architecture — where inference actually runs, and why it matters for data residency requirements in federal accounts.

