Lesson 4 — AI Foundations
Every language model conversation in an enterprise account eventually arrives at one of three regimes: pre-training, fine-tuning, or inference. They operate on different timescales (months, days, milliseconds), carry different price tags, and involve different actors. When a buyer says "we want to train it on our data," they're describing one of these regimes, and of the two that involve training, only fine-tuning is realistically within an enterprise's reach. Knowing which clock is running is what lets you redirect that conversation without bluffing, without freezing, and without pulling in an SE.
Pre-Training
What it is: The process of training a language model from scratch on a massive corpus of text, establishing the model's foundational knowledge and capabilities.
What it does: Pre-training is where the model learns language — grammar, reasoning patterns, world knowledge, the ability to follow instructions. A model that hasn't been pre-trained isn't a model; it's a random weight initialization. Pre-training is what turns that into GPT-4, Claude, or Llama 3. The training corpus typically spans hundreds of billions to trillions of tokens drawn from web crawls, books, code repositories, and licensed datasets. The model sees all of it, adjusts its internal parameters to predict patterns in that data, and emerges with general-purpose capabilities.
Who's behind it: AI vendors and well-funded research labs. OpenAI, Anthropic, Google DeepMind, Meta, Mistral. The compute cost for a frontier model pre-training run is estimated in the range of $50 million to $100 million or more — figures that are widely cited but difficult to verify precisely and that shift as hardware costs change. The time commitment is months of continuous GPU cluster operation. Nobody in your buyer's organization is doing this. Nobody in your buyer's organization should be doing this.
What makes it distinct: Pre-training is the only regime that creates a new model. Fine-tuning and inference both assume a pre-trained model already exists. When a buyer says "we want to build our own AI," pre-training is what they'd actually have to do — and it's almost never what they mean.
Okta Concept Mapping
The closest IDAM analog to pre-training is standing up your own identity provider from scratch — writing the authentication logic, the token issuance, the directory schema, all of it. Nobody does that. You buy Okta, or you buy a competitor, because the foundational infrastructure already exists and the value is in adaptation, not construction. Pre-training follows the same logic: the foundational model already exists. The buyer's question is how to adapt it. Where the analogy breaks: unlike an IDP, a pre-trained model can't be "configured" in the traditional sense. Adaptation requires a separate process — fine-tuning — that actually modifies the model's weights. Configuration and weight modification are not the same operation, and conflating them will mislead a buyer who's trying to understand data governance.
Fine-Tuning
What it is: The process of continuing to train an existing pre-trained model on a smaller, domain-specific dataset to adjust its behavior.
What it does: Fine-tuning takes a pre-trained model and shifts its outputs toward a specific style, domain, or task. A model fine-tuned on federal acquisition regulations will produce outputs that reflect that vocabulary and those patterns. A model fine-tuned on your agency's incident response playbooks will write incident reports that sound like your agency writes them. Fine-tuning doesn't replace the model's foundational knowledge; it layers on top of it. The base model's weights shift slightly, its general capabilities remain largely intact, and its behavior in the target domain improves.
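For readers who want to see the "weights shift slightly" idea in miniature, here is a toy sketch. It is not how a language model is trained in practice: it assumes a two-parameter linear model and plain gradient descent, and the names `train`, `loss`, `general_data`, and `domain_data` are invented for illustration. The point is only that fine-tuning starts from existing weights and nudges them with a small dataset.

```python
def loss(weights, data):
    """Mean squared error of the toy model y = w*x + b on a dataset."""
    w, b = weights
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(weights, data, lr, steps):
    """Plain gradient descent. Returns NEW weights; the input tuple is untouched."""
    w, b = weights
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


# Stand-in for pre-training: lots of general examples, many steps, from zero.
general_data = [(k / 10, 2 * (k / 10) + 1) for k in range(-50, 51)]
base_model = train((0.0, 0.0), general_data, lr=0.05, steps=2000)

# Stand-in for fine-tuning: three domain examples, few steps, small learning rate,
# starting from the base model's weights rather than from zero.
domain_data = [(1.0, 3.4), (2.0, 5.4), (3.0, 7.4)]
tuned_model = train(base_model, domain_data, lr=0.01, steps=50)

print("base weights: ", base_model)
print("tuned weights:", tuned_model)
print("domain loss before/after:", loss(base_model, domain_data), loss(tuned_model, domain_data))
```

The tuned weights end up close to the base weights but not identical, and the error on the domain examples drops. The result is a separate artifact from the base model, which is why ownership and storage questions come up.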
Who's behind it: Enterprises, agencies, and developers with access to a pre-trained base model and a curated dataset. Cloud providers — Azure OpenAI, AWS Bedrock, Google Vertex AI — offer fine-tuning as a managed service, which means the buyer doesn't need to run GPU infrastructure. Open-source base models like Llama 3 and Mistral can be fine-tuned on the buyer's own infrastructure, which matters when data residency is a constraint. The time commitment is hours to days, not months. The cost is a fraction of pre-training — typically thousands to low hundreds of thousands of dollars in compute, depending on dataset size and the fine-tuning technique used.
What makes it distinct: Fine-tuning is the only regime where the buyer's data actually modifies the model. This is what most buyers mean when they say "train it on our data." The data goes in, the weights change, the model's behavior shifts. The buyer owns the fine-tuned model, or the fine-tuned adapter, depending on the technique. The data questions are real: where does the training data live, who processes it, what happens to it after the fine-tuning run completes. These are answerable questions, and they're worth surfacing before the run starts.
Okta Concept Mapping
Fine-tuning maps reasonably well to configuring an Okta tenant for a specific agency environment — you're working with an existing platform and adjusting its behavior to match your context. The analogy holds for the "your data shapes the output" intuition. Where it breaks: configuring Okta doesn't change the underlying software. Fine-tuning modifies the model's parameters and produces a new artifact. This matters in a buyer conversation when data governance comes up. Who owns that artifact, and where it lives, are questions worth surfacing early.
Inference
What it is: The process of running a trained model to generate a response to a specific input.
What it does: Inference is what happens every time a user sends a prompt and gets a response. The model's weights are fixed. The input goes in, the output comes out, nothing about the model changes. Inference is the operational state — the model doing its job. It's also the only regime that runs continuously in production. Pre-training and fine-tuning are one-time or periodic events. Inference is ongoing, which means its cost structure is fundamentally different: a consumption cost that scales with usage, not a capital line item. When a buyer asks "what does it cost to run this," they're asking about inference.
Who's behind it: The entity running the model infrastructure. For cloud-hosted models, that's the vendor: OpenAI, Anthropic, Azure, Google. For self-hosted models, that's whoever is running the servers. For hosted models, the billing model is per-call or per-token, and it shows up as a recurring line item, not a project budget. For self-hosted models, the same usage shows up as infrastructure spend instead.
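A back-of-the-envelope estimate makes the consumption model concrete. The sketch below assumes per-token pricing with separate input and output rates. The prices and volumes are made-up placeholders, not any vendor's actual rates, and `TokenPricing` and `monthly_inference_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPricing:
    """Hypothetical price sheet, in dollars per one million tokens."""
    input_per_million: float
    output_per_million: float


def monthly_inference_cost(pricing, calls_per_day, input_tokens_per_call,
                           output_tokens_per_call, days=30):
    """Estimate monthly spend as calls x tokens per call x price per token."""
    calls = calls_per_day * days
    input_cost = calls * input_tokens_per_call * pricing.input_per_million / 1_000_000
    output_cost = calls * output_tokens_per_call * pricing.output_per_million / 1_000_000
    return input_cost + output_cost


# Placeholder numbers only; substitute the buyer's real price sheet and volumes.
pricing = TokenPricing(input_per_million=3.0, output_per_million=15.0)
pilot = monthly_inference_cost(pricing, calls_per_day=500,
                               input_tokens_per_call=1000, output_tokens_per_call=300)
rollout = monthly_inference_cost(pricing, calls_per_day=200_000,
                                 input_tokens_per_call=1000, output_tokens_per_call=300)

print(f"pilot:   ${pilot:,.2f} per month")
print(f"rollout: ${rollout:,.2f} per month")
```

With these placeholder figures, the same workload moves from a rounding error at pilot volume to a meaningful monthly bill at rollout volume, because cost scales linearly with calls.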
What makes it distinct: Inference is stateless with respect to the model. Your prompts don't change the weights. The model you're running today is the same model you'll run tomorrow unless someone explicitly runs a fine-tuning process. That's what kills the "it's learning our secrets" concern. It's also what kills the "it gets smarter over time" assumption. Neither is true at inference time.
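The "prompts don't change the weights" point can be illustrated with a toy check. This is a sketch under simplifying assumptions: the "model" is three numbers, and `FrozenModel`, `infer`, and `fingerprint` are hypothetical names, not any vendor's API. It fingerprints the weights, runs a thousand inference calls, and fingerprints them again.

```python
import hashlib


class FrozenModel:
    """Toy stand-in for a deployed model: fixed weights, read-only inference."""

    def __init__(self, weights):
        # Stored as a tuple so nothing in this class can modify them in place.
        self._weights = tuple(weights)

    def fingerprint(self):
        """Hash of the weights; it changes only if the weights change."""
        return hashlib.sha256(repr(self._weights).encode("utf-8")).hexdigest()

    def infer(self, features):
        """Compute an output from the input. Reads the weights, never writes them."""
        return sum(w * x for w, x in zip(self._weights, features))


model = FrozenModel([0.2, -1.3, 0.7])
before = model.fingerprint()

# A thousand "prompts" go through the model.
outputs = [model.infer([i, i + 1, i + 2]) for i in range(1000)]

after = model.fingerprint()
print("weights unchanged after inference:", before == after)
```

Nothing in the inference path writes to the weights, so the fingerprint is identical before and after. Changing it would take a separate training step, which is the fine-tuning regime.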
Okta Concept Mapping
Inference billing maps cleanly to the per-transaction cost model that IDAM teams already manage: think SCIM provisioning calls, MFA authentications, or API gateway throughput. Each call costs something, volume drives the bill, and optimization is about reducing unnecessary calls. Where the analogy gets complicated: in IDAM, a failed authentication still costs a transaction. In inference, a poorly constructed prompt doesn't just cost tokens. It may require a follow-up call to get a usable response, roughly doubling the cost of that interaction. Prompt efficiency is the inference analog of reducing redundant authentication requests, and it's worth naming it that way in a buyer conversation about AI operational costs.
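The follow-up-call effect is simple arithmetic. The sketch below assumes that a fixed fraction of prompts need exactly one follow-up call of the same size. The function name `cost_per_usable_response` and the dollar figures are illustrative assumptions, not measured data.

```python
def cost_per_usable_response(cost_per_call, follow_up_rate):
    """Average spend per usable answer, assuming at most one follow-up call.

    follow_up_rate is the fraction of prompts (0.0 to 1.0) that need a second call.
    """
    if not 0.0 <= follow_up_rate <= 1.0:
        raise ValueError("follow_up_rate must be between 0 and 1")
    return cost_per_call * (1.0 + follow_up_rate)


# Hypothetical figures: two cents per call, two different prompt-quality levels.
careful = cost_per_usable_response(cost_per_call=0.02, follow_up_rate=0.05)
sloppy = cost_per_usable_response(cost_per_call=0.02, follow_up_rate=1.0)

print(f"well-built prompts:   ${careful:.4f} per usable response")
print(f"poorly built prompts: ${sloppy:.4f} per usable response")
```

When every prompt needs a second attempt, the per-answer cost doubles even though the per-call price never changed. That is the same shape as redundant authentication requests inflating a transaction bill.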
Comparing the Three Regimes
The comparison below uses five dimensions that map directly to buyer decision points: time, cost, who runs the process, data involvement, and what the output is. These are the dimensions that actually come up in an enterprise AI conversation, and every regime appears on every dimension.
| Dimension | Pre-Training | Fine-Tuning | Inference |
|---|---|---|---|
| Time | Months (continuous GPU cluster operation) | Hours to days | Milliseconds to seconds per query, ongoing |
| Cost | $50M–$100M+ in compute (estimates; subject to change as hardware costs shift) | Thousands to low hundreds of thousands of dollars | Per-token or per-call; consumption billing |
| Who does it | AI vendors and well-funded research labs | Enterprises, agencies, developers with cloud access | Whoever runs the model infrastructure |
| Your data's role | None — pre-training uses publicly available or licensed corpora | Central — your data modifies the model's weights | None — your input is processed, not stored in the model |
| Output | A new base model | A modified version of an existing model | A response to a specific prompt |
Control deserves its own treatment, because it means something different at each layer. In pre-training, the buyer controls nothing — they're buying the output of someone else's process. In fine-tuning, the buyer controls the training data and, depending on the deployment model, the resulting artifact. In inference, the buyer controls the prompt and the context window, but not the model itself.
A buyer who wants "control over the AI" might mean they want to own the model weights (fine-tuning territory), or they might mean they want to control what the model can see and say (inference-time controls). Those are different problems with different solutions, and the conversation stalls until you've established which one they're describing.
How to Say This in the Field
The primary scenario: redirecting "train it on our data" to the correct regime. Secondary scenarios: timeline conversations, pricing conversations, data governance questions, and the "does it learn from us" misconception.
| Don't say | Do say | Why it matters |
|---|---|---|
| "We can train it on your data" | "What you're describing is fine-tuning. That's a process that takes days, not months, and it uses your data to adjust how the model behaves. That's different from building a model from scratch." | Commits you to the right regime without overpromising |
| "Training is expensive" | "Pre-training a frontier model from scratch is estimated to cost $50 million to $100 million or more in compute. Fine-tuning an existing model costs a fraction of that: thousands to low hundreds of thousands of dollars, depending on the dataset and the approach." | Separates the regimes by cost before the buyer conflates them |
| "Every time it runs, it learns from your data" | "Inference doesn't update the model. The model you're running today is the same model you'll run tomorrow unless someone explicitly runs a fine-tuning process. Your prompts don't change the weights." | Kills the "it's learning our secrets" concern before it derails the conversation |
| "We'd have to build a custom model" | "You don't need a custom model — you need a fine-tuned one. That's a different process, a different timeline, and a different price point." | Prevents the buyer from walking away because they think the cost is nine figures |
| "The AI gets smarter over time" | "The model's capabilities are fixed at training time. What improves over time is how you use it — the prompts, the context, the workflow integration." | Accurate, and it shifts the conversation to where the buyer actually has leverage |
| "Training takes a while" | "Fine-tuning takes hours to days. Pre-training takes months. Which one are we talking about?" | Forces the buyer to specify the regime, which is the whole point |
| "Your data is safe" | "During fine-tuning, your data is used to adjust the model's weights. The model doesn't keep a copy of the dataset, but a fine-tuned model can sometimes reproduce fragments of what it was trained on. Where the data lives, who processes it, and who can query the resulting model are real governance questions, and they're worth working through with your security team." | Honest, specific, and doesn't overclaim |
| "It's just a configuration" | "Fine-tuning actually modifies the model's parameters — it produces a new artifact. That's different from configuring a software product, and it has implications for who owns the output and where it lives." | Prevents the buyer from underestimating what fine-tuning involves |
| "We can do that at inference time" | "What you're describing would need to happen at fine-tuning time, not inference time. Inference is read-only — the model doesn't update while it's running." | Keeps the buyer from expecting runtime learning |
| "The inference cost is negligible" | "Inference is a consumption cost — it scales with usage. At low volume it's cheap; at enterprise scale it's a real line on the bill. Worth modeling the expected call volume before you commit to a pricing structure." | Prevents sticker shock at renewal |
Next: Lesson 5 covers grounding — the techniques for connecting a model's outputs to your organization's current data without modifying the model itself.

