Pre-Training, Fine-Tuning, Inference: Three Regimes Your Buyer Is Conflating Into One

By Leigh Garrity— May 6, 2026

Pre-Training, Fine-Tuning, Inference: Three Regimes Your Buyer Is Conflating Into One

Three distinct things happen to a language model before it answers your buyer's question. Your buyer has a name for all three of them, and it's usually the same name: training.

That vocabulary collapse is expensive. It sets wrong expectations about cost, timeline, and who does the work. It turns a conversation about a $50,000 customization project into a conversation about a $500 million infrastructure investment — or vice versa, which is worse, because then the buyer feels misled when the actual scope lands. AEs who can separate these three regimes — pre-training, fine-tuning, and inference — can redirect the conversation before it goes sideways. AEs who can't will nod along and find out six months later what "train it on our data" actually meant to the buyer.

Pre-Training

What it is: The process of building a foundation model from scratch by exposing a neural network to massive amounts of text (or other data) and adjusting billions of parameters until the model develops general language understanding.

What it does: Pre-training produces the base capability of a model — the thing that lets GPT-4 or Claude or Gemini understand syntax, reason about context, follow instructions, and generate coherent text. It's not teaching the model facts about your company. It's teaching the model what language is. The model learns from hundreds of billions of tokens of text scraped from the internet, digitized books, code repositories, and curated datasets. By the time pre-training ends, the model has internalized statistical patterns across human knowledge at a scale no human team could review.

Who's behind it: Model providers. OpenAI, Anthropic, Google DeepMind, Meta. No enterprise customer is doing this. The compute requirements are measured in thousands of A100 or H100 GPUs running continuously for months. Published estimates for training frontier models — GPT-4 class and above — range from $50 million to over $500 million in compute costs alone, before accounting for the data pipeline, the engineering team, and the iteration cycles when early runs fail. Meta's published research on the Llama 3 family describes training runs on clusters exceeding 16,000 GPUs. These are not numbers that appear in a software procurement conversation.

What makes it distinct: Pre-training is the only regime where the model's fundamental capabilities are established. Everything downstream — fine-tuning, prompting, retrieval — works within the envelope that pre-training created. You cannot fine-tune a model into understanding something it has no pre-trained basis for. You can specialize it; you cannot build it. The timeline is months, not days. The investment is nine figures, not five. And the people doing it are researchers with PhDs in machine learning, not your IT team.

When your buyer says "we want to train our own model," they almost certainly don't mean this. Almost no enterprise does. The ones that do — large financial institutions building proprietary models for regulatory reasons, defense contractors with classification requirements — know exactly what they're getting into and have already had a very different conversation.

Fine-Tuning

What it is: The process of taking a pre-trained model and continuing to train it on a smaller, domain-specific dataset to shift its behavior toward a particular task, vocabulary, or style.

What it does: Fine-tuning adjusts the model's weights — the internal parameters that govern how it processes and generates text — based on examples you provide. If you give a model 10,000 examples of your company's support tickets paired with ideal responses, the model updates its weights to become better at generating responses that match that pattern. It doesn't forget what it learned during pre-training; it layers specialization on top of general capability. The result is a model that handles your domain vocabulary more fluently, follows your preferred response format more consistently, and reflects your specific use case more accurately than the base model would.

Who's behind it: This is where enterprise customization lives. Fine-tuning is accessible to organizations with a data science team, a curated dataset, and a budget in the range of thousands to low hundreds of thousands of dollars depending on model size and dataset volume. OpenAI's fine-tuning API allows customers to submit training files and receive a customized model endpoint. Google's Vertex AI platform offers supervised fine-tuning for Gemini models. The compute cost for fine-tuning a mid-size model on a few thousand examples can be under $1,000. Fine-tuning a larger model on a richer dataset might run $20,000–$100,000. The timeline is hours to days, not months.

What makes it distinct: Fine-tuning actually changes the model. The weights update. Prompting guides a static model with instructions. Retrieval-augmented generation (RAG) feeds the model relevant documents at runtime without touching its parameters. Fine-tuning is the right choice when you need consistent behavioral changes — a specific tone, a domain vocabulary, a task format — that would be impractical to enforce through prompting alone. It's a poor fit when you need the model to access current information (fine-tuning bakes in a snapshot; it doesn't stay current) or when your customization need is really about retrieving specific documents rather than changing how the model reasons.

This is the clarification that matters most in a buyer conversation: when someone says "train it on our data," they almost always mean fine-tuning or RAG. The question to ask back is whether they want the model to behave differently (fine-tuning) or know specific things (RAG). Those are different problems with different solutions and different cost profiles.

Inference

What it is: The process of running a trained model to generate a response to a specific input — one call, one output, one bill line item.

What it does: Inference is what happens when a user submits a prompt and the model produces a response. The model's weights are fixed. No learning occurs. The model processes the input tokens, runs them through its layers, and generates output tokens according to the probability distributions it learned during training (and fine-tuning, if applicable). From the model's perspective, every inference call is stateless — it doesn't remember the last call unless memory is explicitly engineered into the application layer.

Who's behind it: Inference is the operational cost that shows up on your customer's bill. When an enterprise deploys an AI application — a support chatbot, a document summarizer, a code assistant — every user interaction is an inference call. Model providers charge per token: typically fractions of a cent per thousand tokens for input, slightly more for output. At low volumes, this is negligible. At enterprise scale, millions of calls per day across thousands of users, inference cost becomes a significant line item that requires active management. Published pricing from OpenAI as of early 2026 puts GPT-4o input tokens at roughly $2.50 per million tokens. A moderately complex enterprise application processing 10 million tokens per day runs approximately $25/day in model costs alone, before infrastructure, latency optimization, and caching.

What makes it distinct: Inference is the only regime that happens continuously, at scale, in production. Pre-training happens once (per model version). Fine-tuning happens occasionally (when behavior needs to change). Inference happens every time someone uses the application. This means inference cost scales with adoption in a way that pre-training and fine-tuning costs don't. It also means inference latency — how long the model takes to respond — is a product quality issue, not just a technical one. Buyers who ask about "AI cost" are usually asking about inference cost without knowing that's the specific question.

Comparing the Three: A Trait-Led Analysis

Pulling one dimension at a time and mapping it across all three regimes is the fastest way to orient a buyer conversation. Cost, time, and ownership are the three that matter.

Cost

Pre-training: $50M–$500M+. Vendor-only. Not a customer conversation. Fine-tuning: $1K–$100K for most enterprise use cases. Customer-accessible with the right data and tooling. Inference: Fractions of a cent per call, scaling with volume. The ongoing operational cost.

When a buyer asks "how much does this cost," they're almost certainly asking about inference (the recurring bill) or fine-tuning (the customization investment). Pre-training cost is irrelevant to their decision — it's already been paid by the model provider and amortized into the API pricing.

Time

Pre-training: Months. Measured in training runs that span weeks, with iteration cycles on top. Fine-tuning: Hours to days. A well-prepared dataset and a clear objective can produce a fine-tuned model in an afternoon. Inference: Milliseconds to seconds per call. The latency the end user experiences.

Who performs it

Pre-training: Model providers. Research teams. Not your customer. Fine-tuning: The customer's data science team, or a vendor offering fine-tuning as a managed service. Increasingly, model providers offer fine-tuning APIs that reduce the technical barrier significantly. Inference: Automated. The model serves calls; no human is in the loop per call.

When an AE encounters it

Pre-training: When a buyer says "we want to build our own model" or "we don't want to use a third-party model." The right response is to understand whether the concern is capability, data privacy, or regulatory — because each of those has a different answer, and almost none of them require pre-training. Fine-tuning: When a buyer says "train it on our data," "make it understand our terminology," or "customize it for our use case." This is the most common conversation. Inference: When a buyer asks about cost at scale, latency, or "how does it perform in production." This is the ongoing operational conversation.

Pre-training is the right choice when you need a capability that doesn't exist in any available model — a narrow circumstance that applies to almost no enterprise buyer. Fine-tuning fits when you need consistent behavioral adaptation. Inference optimization is the conversation when you're in production and the bill is growing. The choice depends entirely on where the buyer is in their deployment lifecycle and what problem they're actually trying to solve.

Field Language Guide

Don't say	Do say	Why it matters
"We can train it on your data"	"We can fine-tune it on your data"	"Train" implies pre-training; "fine-tune" names the actual process and sets accurate expectations
"Build your own AI"	"Deploy a model customized for your use case"	"Build" implies pre-training from scratch; most buyers want customization, not construction
"The AI learns from your interactions"	"The model can be fine-tuned on your interaction data"	Implies continuous learning during inference, which doesn't happen by default
"We need to train it on your documents"	"We can use RAG to give the model access to your documents at runtime"	Document retrieval is usually a RAG problem, not a fine-tuning problem
"How much does training cost?"	"Are you asking about the customization cost or the per-call cost?"	Separates fine-tuning investment from inference spend — two different budget conversations
"It'll learn your company's terminology"	"Fine-tuning can adapt the model's outputs to your domain vocabulary"	Precise about mechanism; avoids implying the model is continuously updating
"We don't want our data in the model"	"Fine-tuning uses your data to update weights; RAG retrieves your data at runtime without embedding it in the model permanently"	Distinguishes two very different data handling approaches with different privacy implications
"Can we retrain it when it gets something wrong?"	"Specific errors are usually addressed through prompt engineering or RAG; systematic behavioral issues can be corrected with fine-tuning"	Calibrates what fine-tuning is for versus what it can't efficiently fix
"The model knows everything up to its training date"	"The base model's knowledge has a cutoff; RAG can give it access to current information at runtime"	Separates pre-training knowledge cutoff from inference-time retrieval capability
"Training is expensive"	"Pre-training is expensive — that's the model provider's investment. Fine-tuning is accessible; inference is the cost that scales with your usage"	Prevents the buyer from thinking customization requires a nine-figure budget

“

Okta Concept Mapping

The closest IDAM analog to these three regimes is the distinction between Okta's platform infrastructure, a customer's tenant configuration, and an individual authentication event. Okta built and maintains the Universal Directory, the policy engine, the protocol implementations — that's the pre-training analog: foundational capability, vendor-owned, not something the customer touches. A customer's attribute mappings, authorization policies, and app integrations are the fine-tuning analog: configuration that specializes the platform for a specific organization's needs, built on top of the foundation. Each authentication or authorization check is the inference analog: a discrete, per-call operation that consumes compute and produces an outcome. Where the analogy breaks: a customer's Okta configuration doesn't change the underlying platform code, but fine-tuning actually modifies a model's weights — it's a deeper form of customization than attribute mapping, closer to what it would mean if your policy rules could alter how the Universal Directory itself processes lookups. In a buyer conversation, that distinction matters: fine-tuning isn't just configuration, it's a structural change to the model, which is why it requires data, compute, and a deliberate process rather than a UI toggle.

The vocabulary problem at the center of AI budget conversations runs deeper than buyers not understanding the technology. The same word — "training" — describes three things that differ by three orders of magnitude in cost and two orders of magnitude in time. The buyer who says "train it on our data" is almost always describing a fine-tuning or RAG project. The AE who doesn't catch that is setting up a conversation that will eventually require a very uncomfortable recalibration.

Ask whether they want the model to behave differently or know specific things. That question separates fine-tuning from RAG, which separates the behavioral customization conversation from the knowledge retrieval conversation. Pre-training doesn't enter the picture. It almost never does.