Pre-Training, Fine-Tuning, and Inference

By Carey Whitten— May 5, 2026

Three cost regimes. One phrase that conflates all of them.

Three terms show up in almost every enterprise AI conversation: pre-training, fine-tuning, and inference. Buyers use them interchangeably. They aren't. Pre-training is what OpenAI did to build GPT-4 — a nine-figure infrastructure project that no enterprise is replicating. Fine-tuning is what a buyer almost always means when they say "train it on our data" — a process that modifies an existing model using the buyer's own content. Inference is the line item on the bill — what happens every time the model answers a question. Knowing which regime a buyer is describing, and being able to redirect cleanly when they're not, is the difference between a productive scoping conversation and one that goes sideways on budget before the second meeting.

Pre-Training

What it is. The process of building a large language model from scratch by exposing it to massive quantities of text data until it learns statistical patterns across language, reasoning, and knowledge.

What it does. Pre-training produces the base model — the artifact that everything else runs on. A pre-trained model has no knowledge of your agency's policies, your data classification schema, or your mission context. It has general language capability. Think of it as the foundation before any walls go up.

Who's behind it. Vendors, exclusively. OpenAI, Anthropic, Google DeepMind, Meta — these organizations run pre-training on clusters of thousands of GPUs over periods measured in months. The compute costs for a frontier model run are estimated in the range of $50 million to several hundred million dollars per training run (illustrative benchmark; costs shift as hardware and efficiency improve). No enterprise customer initiates a pre-training run. No agency procurement vehicle funds one. This is not a line item that appears in a federal AI acquisition.

What makes it distinct. Pre-training is the only regime where the model's fundamental capabilities are established. Fine-tuning and inference both operate on top of a pre-trained base. The capabilities established at pre-training set the ceiling; customer projects work within that ceiling, not above it. This is the regime buyers invoke when they don't know they're invoking it, and the one they're almost never actually asking for.

Fine-Tuning

What it is. A supervised training process that updates the weights of an existing pre-trained model using a smaller, curated dataset — typically provided by the customer — to shift the model's behavior toward a specific domain, style, or task.

What it does. Fine-tuning changes the model. Not the base architecture, but the learned parameters that determine how it responds. A fine-tuned model on federal acquisition regulations will answer acquisition questions differently than the base model — more accurately, more consistently, in the right vocabulary. The change is persistent: the fine-tuned weights are a new artifact, distinct from the base model, that can be hosted and versioned separately.

Who's behind it. This is where the customer enters the picture. Major model providers offer fine-tuning APIs — OpenAI's fine-tuning endpoint, for example, accepts training files in a structured format and returns a fine-tuned model identifier the customer can then call at inference. The customer supplies the training data; the provider runs the compute. Some agencies with higher data sensitivity requirements pursue fine-tuning in private cloud or on-premise environments using open-weight models (Meta's Llama family being the most common). Either way, the customer's data is the input and the customer's use case is the target.

What makes it distinct. Fine-tuning is the only regime where customer data directly modifies model behavior at the parameter level. That distinction matters for data governance conversations: the training data is consumed and encoded into weights, not stored as retrievable documents. It also matters for cost — a fine-tuning run on a mid-size dataset using a hosted API might run from a few hundred to tens of thousands of dollars (illustrative; varies significantly by model size, dataset volume, and provider pricing). Time is measured in hours to days, not months. Accessible, but not trivial.

Inference

What it is. The process of running a trained model against an input — a prompt, a query, a document — to produce an output.

What it does. Inference is what happens in production. Every time a user submits a question to an AI assistant, every time an agent calls a model to decide its next action, every time a document gets summarized — that's an inference call. The model's weights don't change. The model receives input, processes it through its learned parameters, and returns output. Repeat, at scale, indefinitely.

Who's behind it. The model provider bills for inference, typically on a per-token basis. OpenAI, Anthropic, and Google each publish inference pricing for their hosted models; as of early 2026, costs range from fractions of a cent to several cents per thousand tokens depending on model capability tier (illustrative benchmark; pricing changes frequently — verify against current provider documentation before quoting). Agencies running open-weight models in private infrastructure pay for compute directly rather than per-token, but the economic logic is the same: inference is a recurring operational cost, not a one-time project expense.

What makes it distinct. Inference is the only regime that generates ongoing cost at runtime. Pre-training and fine-tuning are capital expenditures, discrete projects with defined endpoints. Inference is operational expenditure that scales with usage. This is the number that shows up on the cloud bill every month. It's also the number most buyers aren't thinking about when they ask about "AI costs." They're imagining a project, not a meter running.

Comparison Strategy

The comparison below uses trait-led analysis: each dimension is examined across all three regimes simultaneously. This structure is the most efficient choice for this content because the buyer confusion is fundamentally about cost and time, and a side-by-side trait view makes those differences visible at a glance, sparing the reader from reconstructing them across sequential profiles.

Dimension	Pre-Training	Fine-Tuning	Inference
Who initiates	Model vendor	Customer (via vendor API or private infra)	Customer (every production call)
Time horizon	Months	Hours to days	Milliseconds per call
Cost structure	$50M–$500M+ (illustrative)	Hundreds to tens of thousands of dollars (illustrative)	Per-token or per-compute-hour, ongoing
What changes	Model is created from nothing	Model weights are updated	Nothing — weights are read, not written
Customer data involved	No	Yes — training data shapes the weights	Yes — as input (prompt), not stored
Output artifact	Base model	Fine-tuned model variant	Response (text, structured data, action)
Reversible	N/A	Yes, by reverting to base or retraining	N/A — each call is stateless
Appears in procurement	Never	Sometimes, as a project	Always, as an operational line item

The column that most often surprises buyers: What changes. Inference doesn't modify the model. Fine-tuning does. Pre-training creates it. Each is a structurally different operation with different principals, different economics, and different governance implications.

Field Language Guide

The phrase "train it on our data" is the most common vocabulary collision in enterprise AI sales. Buyers rarely mean pre-training. In practice, they mean one of three things: fine-tune the model on proprietary data, connect the model to proprietary data at inference time (RAG — covered separately), or both. The table below covers the full range of scenarios where these three terms surface in buyer conversations.

Don't say	Do say	Why it matters
"We'd need to train a model from scratch on your data."	"We'd fine-tune an existing model on your data — that's a weeks-long process, not months, and the costs are in a different order of magnitude."	Pre-training framing kills budget conversations before they start.
"Training costs are prohibitive for most agencies."	"Fine-tuning costs are accessible; pre-training is vendor-only and not part of any agency acquisition."	Conflating the two makes the right option sound impossible.
"The model learns from your data over time."	"Fine-tuning updates the model on your data once, as a discrete project. Inference doesn't change the model — it just uses it."	"Learns over time" implies continuous retraining, which isn't how production inference works.
"What's your training budget?"	"Are you scoping a fine-tuning project, or asking about ongoing inference costs? Those are different line items."	The question sounds the same; the answers are orders of magnitude apart.
"We can customize the AI for your agency."	"We can fine-tune the model on your agency's data, or configure it to retrieve from your document repositories at inference time."	"Customize" is too vague to anchor a procurement conversation.
"Your data stays private during training."	"Your data is used to update the model weights during fine-tuning and isn't stored as retrievable documents afterward — but confirm data handling terms with the provider before scoping."	Data governance is a real concern; don't wave it away, but don't misrepresent the mechanics.
"Inference is basically free."	"Inference costs scale with usage — it's the recurring line item that grows as adoption grows."	Underselling inference costs creates budget surprises that damage the relationship post-deployment.
"We'd retrain it whenever your policies change."	"Policy updates are typically handled by re-fine-tuning or updating the retrieval layer — not a full retraining cycle."	"Retrain" sounds like months of work; the actual answer is usually much faster.
"The AI doesn't know your terminology yet, but it will learn."	"We can fine-tune it on documents that use your terminology, or use prompt engineering to establish that context at inference time."	Vague learning language erodes credibility with technical buyers.
"Building your own model gives you full control."	"Fine-tuning an existing model gives you behavioral control without the infrastructure cost of building from scratch."	"Your own model" implies pre-training; the buyer almost certainly wants fine-tuning.
"What model are you training on?"	"Which base model are you considering fine-tuning from, or are you evaluating hosted inference options?"	The question assumes fine-tuning; clarifying that assumption opens the right conversation.
"Once it's trained, you're done."	"Fine-tuning is a one-time project, but inference is ongoing — and so is the cost."	The "done" framing misses the operational reality of production AI.

Callout: Okta Concept Mapping

“

The identity lifecycle analog — and where it breaks.

The three-regime structure maps loosely onto the identity lifecycle: pre-training is like building a new directory service from the ground up (a vendor-side infrastructure project no customer initiates), fine-tuning is like provisioning and schema extension (configuring the system to reflect your organization's specific structure and attributes), and inference is like an authentication transaction (a per-call operation that reads from the configured system without changing it). The mapping is useful for grounding the conversation; it gives buyers who think in identity terms a place to stand. Where it breaks: in identity, provisioning is cheap, fast, and reversible with minimal friction. Fine-tuning is expensive relative to provisioning, takes days rather than minutes, and produces a new artifact that has to be managed, versioned, and potentially re-run when the training data changes. The other break point is cost structure: enterprise identity doesn't typically bill per-authentication at the scale that makes inference economics interesting. When a buyer asks "what does it cost to run," they're in inference territory, and the answer requires a different kind of scoping conversation than IDAM pricing usually does.