Training, Fine-Tuning, and Inference: Three Regimes, One Phrase That Conflates All of Them

By Carey Whitten— May 5, 2026

Training, Fine-Tuning, and Inference: Three Regimes, One Phrase That Conflates All of Them

When a CAIO says "we want to train it on our data," they're usually not describing a technical requirement. They're describing a governance concern, and the technical question underneath it is which of three very different regimes they're actually asking about. Pre-training, fine-tuning, and inference describe decisions made at different moments by different actors, with budgets that differ by orders of magnitude. The AE who can decode that phrase in real time, without making the buyer feel corrected, controls the next step of the conversation.

The Three Regimes

Same fields, same order, same depth — because these subjects only become useful when you can compare them directly.

Pre-Training

What it is: Building a foundation model from scratch by training on internet-scale text and data.

What it does: Produces the base model, the entity that has internalized language structure, general reasoning patterns, and broad world knowledge before any customer-specific context is introduced. Everything downstream, including fine-tuning, starts here.

Who's behind it: Model vendors. OpenAI, Anthropic, Google DeepMind, Meta. Not enterprises, not system integrators, not agencies. The compute infrastructure, the data curation, and the training runs are entirely vendor-side operations. By the time a customer is evaluating a model, pre-training is already history.

What makes it distinct: The cost and time regime is categorically different from everything downstream. Illustrative benchmarks, subject to change and model-specific: training runs at GPT-4 scale have been estimated in published analyses at $50M to over $500M in compute costs, taking months of continuous GPU cluster operation. These numbers are not customer budget items. They are vendor capital expenditures that get amortized into API pricing and licensing. A customer who says "we want to train our own model" is describing a pre-training project, and that conversation needs to be redirected immediately.

Fine-Tuning

What it is: Adapting an existing pre-trained model using a smaller, domain-specific dataset to shift its behavior toward a particular task, domain, or style.

What it does: Adjusts the model's internal weights, the numerical parameters that govern how it processes and generates text, using examples drawn from the customer's own data. The base model's general capabilities remain; the fine-tuning layer shapes how those capabilities express themselves in a specific context.

Who's behind it: The enterprise or agency, using their own data, on top of a vendor's base model. Fine-tuning is the regime where customer data actually enters the model. This is the decision that has data governance implications, data ownership questions, and contractual terms that vary significantly by vendor.

What makes it distinct: The cost and time regime is accessible to enterprises in a way pre-training is not. Illustrative benchmarks: fine-tuning a mid-size model on a domain-specific dataset might run hours to a few days of compute, at costs ranging from a few thousand to tens of thousands of dollars depending on model size, dataset volume, and infrastructure. (Vendor-hosted fine-tuning APIs, like those offered by OpenAI and others, publish per-token pricing for training runs; check current documentation for current numbers.) Fine-tuning changes the model's weights. Customer data doesn't just inform the model's responses; it becomes part of how the model reasons. That's a different data governance posture than providing data at query time.

Inference

What it is: Running the model to generate a response to a specific input, one call, one output.

What it does: Produces the actual answer, summary, classification, or generation for each request. Every interaction a user has with an AI system is an inference call. This is the operational regime, the one that runs continuously from the moment a system goes live.

Who's behind it: Whoever operates the deployment. Vendor-hosted inference (API calls to OpenAI, Anthropic, Google) or self-hosted inference on agency infrastructure. The model being run may be a base model or a fine-tuned model; inference is the act of running it, regardless of what shaped it.

What makes it distinct: Inference is the line on the bill that never stops. Costs are per-token; illustrative benchmarks as of mid-2026 range from roughly $0.002 to $0.06 per thousand tokens depending on model and provider, with significant variation for frontier versus smaller models. Volume is the variable that matters. An agency running a high-traffic document summarization workflow will accumulate inference costs that dwarf any one-time fine-tuning expense. At inference time, customer data is in the context window, the text the model is reading right now, not in the model's weights. It doesn't persist. It doesn't change the model. This distinction is the technical answer to "we don't want our data used for training."

Comparison: Three Clocks, Three Actors, Three Budgets

The organizing structure here is temporal and organizational. Pre-training, fine-tuning, and inference are not a menu of alternatives; they are decisions made at different moments by different parties. The comparison that earns its keep is: who decides, when, at what cost, and with whose data.

Cost

Pre-training cost is absorbed by the vendor and invisible to the customer except as it's reflected in licensing and API pricing. It is not a customer budget line. Fine-tuning cost is a customer-paid, one-time (or periodic) expense, typically in the thousands to low hundreds of thousands of dollars for enterprise use cases, depending on model and dataset size. Inference cost is ongoing, scales directly with usage volume, and is often the largest AI cost item in a mature deployment. The common mistake in early procurement conversations is treating fine-tuning as the big-ticket item. For high-volume deployments, inference accumulates faster.

Time

Pre-training: months. Fine-tuning: hours to days, sometimes a week for large datasets and large models. Inference: milliseconds per call. The time dimension matters for procurement because it shapes the project timeline. An agency that wants a domain-adapted model needs to budget for a fine-tuning cycle before deployment; that's not a day-one capability. An agency that wants to start using a base model immediately can do so through inference from day one.

Data Ownership

Buyer concern about "training it on our data" usually lives here, and this is where the three regimes diverge most sharply.

Pre-training uses vendor-curated data. Customer data is not involved. Customer data ownership is not implicated.

Fine-tuning uses customer data to modify model weights. What the customer owns after fine-tuning varies by vendor contract. Generally: the customer owns the fine-tuned weights (the adapted model), but not the base model those weights were built on. The base model remains licensed, not owned. Data submitted for fine-tuning is subject to vendor data handling terms, which vary, and which matter for FedRAMP-authorized environments. Read the contract.

Inference uses customer data in the context window, the input to each call. That data is processed to generate a response and, under most vendor terms for enterprise and government customers, is not retained or used for further training. The model's weights are unchanged. This is the "we don't want our data used for training" posture, and it's achievable through inference-only deployment with appropriate vendor terms.

The phrase "train it on our data" most often describes a fine-tuning intent or a RAG intent, retrieval-augmented generation, where customer data is retrieved and placed in the context window at inference time without modifying the model at all. RAG is an inference-regime solution to a problem buyers often frame as a training problem. It has different cost, time, and data governance characteristics than fine-tuning, and it's frequently the right answer for agencies with sensitive data and strict retention requirements.

Field Language Guide

Don't say	Do say	Why it matters
"Train it on our data"	"Are you looking to fine-tune on your data, or retrieve it at inference time through RAG?"	These are different cost regimes and different data governance postures
"We want to build our own model"	"Do you mean fine-tune an existing model, or pre-train from scratch?"	Pre-training is vendor-only territory; the question resets scope before it becomes a problem
"How much does training cost?"	"Which regime — fine-tuning or inference — are you budgeting for? The numbers differ by orders of magnitude"	Prevents a number from landing in the wrong context
"We own the model"	"You'd own the fine-tuned weights; the base model underneath is licensed, not owned"	Partial ownership is the accurate framing; "own the model" overstates it
"We don't want our data used for training"	"That's an inference-only posture — your data stays in the context window, not the weights"	Gives the buyer a technically accurate way to describe their governance requirement
"Can we retrain it when it gets things wrong?"	"You can fine-tune again on corrected examples — you're not rebuilding from scratch each time"	Reframes the feedback loop as an incremental cost, not a major project
"We want a private model"	"Private inference endpoint, or a fine-tuned model on private infrastructure — which are you describing?"	These are different procurement paths with different timelines and costs
"Make it smarter about our domain"	"Fine-tuning or RAG — which fits your data sensitivity requirements?"	Surfaces the governance question before the technical one, which is where the buyer actually needs to decide
"What does it cost to run?"	"Inference cost is per-token — what's your expected call volume?"	Volume is the variable; without it, the number is meaningless
"We want to update it regularly"	"Fine-tuning on a cadence, or RAG with a live data source? Depends on how current the data needs to be"	Distinguishes batch model updates from real-time retrieval — different architectures, different costs
"The model doesn't know our policies"	"That's a fine-tuning or RAG decision — what's your data classification on those documents?"	Routes to the right conversation and signals you understand the governance layer
"We need it to be accurate about our data"	"Accuracy on your data is a fine-tuning or RAG question, separate from model selection"	Prevents the buyer from chasing a different base model when the real fix is a different data strategy

“

Okta Concept Mapping

The closest IDAM analog to these three regimes is the directory schema. Pre-training built the base schema, the universal structure that defines what a model knows about language and the world, the way a base LDAP schema defines what an identity object can contain. Fine-tuning is like extending that schema with your organization's custom attributes: your data, your domain, your behavioral requirements, layered onto the base. Inference is the lookup, every query hitting the model the way every authentication request hits the directory, per-call, per-millisecond.

The analogy holds for data flow and cost structure. It breaks for behavioral impact, and that's the part that matters in a buyer conversation. In a directory, adding custom attributes to the schema doesn't change how the authentication engine evaluates a login. The logic stays the same; you've just added fields. In fine-tuning, your data changes the model's actual reasoning behavior, not just what it knows, but how it processes and responds. "We'll just add our data" undersells what fine-tuning actually does to a model, and oversells what inference-with-context does. Knowing where the analogy holds and where it doesn't is what separates a fluent answer from a confident one that lands wrong.