Recap — Models & Vendors
This is not a summary. You've done the reading. What follows is the structure that makes it retrievable when you're in a room and someone asks a question you've already answered somewhere in the last eight articles.
Three questions. In this order. Every time.
The Sequence
The order is load-bearing. Buyers who start with vendor preference skip the questions that would tell them whether that preference is costing them money or creating compliance exposure they haven't mapped yet. Walk them back to question one.
Question One: Capability Tier
The principle: Route, don't maximize. Most enterprise AI tasks don't require frontier capability. Deploying a frontier model for document summarization or classification is the equivalent of licensing enterprise software for a use case the standard tier handles without breaking a sweat.
Frontier Model — The highest-capability, highest-cost model a given lab offers at a given point in time.
- When it comes up: When a buyer says "we want GPT-4" or "we need the best model" without having specified a task.
- Don't confuse with: The right model for the task. Frontier is a market position, not a fitness designation. The capability gap between frontier and capable tiers is closing faster than the price gap.
Capable Model — Mid-tier models that handle the majority of enterprise tasks: summarization, classification, extraction, structured generation, moderate reasoning.
- When it comes up: When you're routing after the task is defined. This is where most workloads land.
- Don't confuse with: A downgrade. "Capable" is not a formal designation — different vendors use different names for the same tier. Claude Sonnet, GPT-4o mini, Gemini Flash. Same tier, different labels.
Task-Specific / Fine-Tuned Model — A base model adapted on domain-specific data for a narrow, high-volume, well-defined task.
- When it comes up: High-volume, low-variability workloads where latency and cost matter more than generality.
- Don't confuse with: Configuration. Fine-tuning creates a new artifact. It's closer to custom development than a settings change, and it may require a separate security review cycle.
| AI Term | What It Means in AI | IDAM Equivalent | Key Divergence |
|---|---|---|---|
| Frontier model | Highest-capability, highest-cost offering from a given lab | Enterprise tier / premium SKU | Capability gap between tiers is closing faster than pricing gap — last year's frontier is this year's mid-tier |
| Capable model | Mid-tier, handles most enterprise tasks | Professional / standard tier | "Capable" is not a formal designation — vendor naming varies widely for the same performance band |
| Fine-tuned model | Base model adapted on domain-specific data | Customization / configuration | Fine-tuning produces a new artifact requiring separate security review — not a config change |
If you remember nothing else from Question One: The buyer who insists on frontier models for every task is paying for capability they won't use. Your job is to ask what the task is before you agree that the tier is right.
Question Two: Procurement and Compliance Constraints
The principle: Your procurement office has already made most of this decision. Find out which hyperscaler has the existing contract vehicle before you spec the model.
In federal accounts, roughly 70% of AI workloads in FY2025 ran on infrastructure already covered by an existing hyperscaler IDIQ or BPA. Model selection reads as a technical conversation. In practice, it's a contracting decision, and the procurement office often answered it first.
Model Provider — The lab that trained and maintains the model weights.
- When it comes up: When a buyer asks "is this an OpenAI product?" or "are we using Anthropic?"
- Don't confuse with: The contracting party. The lab may have no direct contractual relationship with the agency. The hyperscaler does.
Deployment Environment — Where the model runs: which cloud region, under whose ATO, on whose infrastructure.
- When it comes up: Every compliance conversation. This is the only variable that FedRAMP authorization attaches to.
- Don't confuse with: Model origin. A model trained in one country and deployed in a FedRAMP High environment is a different compliance profile than the same model running on ambiguous infrastructure.
The Geopolitics Question — Resolved to One Principle
Buyers will ask about Chinese-developed models, European data sovereignty, and model origin generally. The answer that holds across every variation: who operates the datacenter matters, not who trained the weights.
FedRAMP authorization, data residency requirements, and contractual data handling obligations all attach to the operator, not the model origin. A Llama model running in Azure Government under an existing EA is a different risk profile than a US-trained model running on infrastructure with no clear data handling agreement. The compliance question is always about the operator. Model provenance is a secondary consideration that matters for supply chain risk assessments, not for primary compliance determinations.
| AI Term | What It Means in AI | IDAM Equivalent | Key Divergence |
|---|---|---|---|
| Model provider | Lab that trained the model | Software vendor / ISV | The lab may have no contractual relationship with the agency — the hyperscaler does |
| Deployment environment | Where the model runs (cloud region, infrastructure) | Hosting / infrastructure layer | FedRAMP authorization attaches to the environment, not the model — same model, different compliance profile depending on where it runs |
| Open-weight model | Model with publicly available weights, deployable on your own infrastructure | Open-source software | Weights are available; training data and full methodology often aren't — "open" is partial, not equivalent to OSI-definition open source |
If you remember nothing else from Question Two: The compliance question is always about the operator. Who trained the weights is a supply chain question. Who runs the datacenter is the compliance question.
Question Three: Volume Profile
The principle: Per-token is a credit card. Provisioned throughput is a lease. Know which one the buyer's volume justifies before the conversation about cost.
The crossover point is approximately 10 million model calls per month, though this varies by model tier and provider. Below that threshold, per-token pricing is almost always cheaper. Above it, provisioned throughput typically wins on cost and provides the latency predictability that production workloads require.
Per-Token Pricing — Pay per unit of text processed, billed on consumption.
- When it comes up: Early-stage deployments, pilots, variable workloads, anything where call volume hasn't stabilized.
- Don't confuse with: Cheap. Token volume in agentic workflows is unpredictable in ways that auth events aren't. A single multi-step agent task can consume thousands of tokens invisibly. Buyers who benchmark on simple queries and then deploy agents get surprised by the bill.
Provisioned Throughput — Reserved model capacity billed regardless of use, in exchange for guaranteed availability and latency.
- When it comes up: Production workloads above the per-token crossover, or any workload where latency SLAs matter.
- Don't confuse with: A performance guarantee. You're reserving capacity, not guaranteeing output quality. The SLA covers availability, not results.
Context Window — The maximum amount of text the model can process in a single call, measured in tokens.
- When it comes up: When buyers ask about document length limits, or when agentic workflows start hitting errors.
- Don't confuse with: A session. Context windows don't expire on inactivity — they're a hard ceiling on input size per call, not a timeout mechanism.
| AI Term | What It Means in AI | IDAM Equivalent | Key Divergence |
|---|---|---|---|
| Per-token pricing | Pay per unit of text processed | Consumption-based licensing (per-auth, per-event) | Token volume in agentic tasks is unpredictable in ways auth events aren't — a single task can consume thousands of tokens invisibly |
| Provisioned throughput | Reserved model capacity, billed regardless of use | Reserved instance / committed use in cloud contracts | No SLA on output quality, only on availability — you're reserving capacity, not guaranteeing results |
| Context window | Maximum text processable in one model call | Session size limit | Context windows don't expire on inactivity — they're a hard input ceiling per call, not a timeout |
If you remember nothing else from Question Three: The buyer who asks "what does this cost?" before knowing their call volume is asking the wrong question first. Volume profile determines which pricing model applies. Get the volume estimate before you quote.
The Open-Weight Default
Most enterprises assume open-weight means unsupported, frontier means enterprise-grade, and proprietary means safer. All three assumptions are wrong in the current market.
Open-weight models — Llama 3, Mistral, Falcon — running on Azure, AWS, or GCP have enterprise support agreements, SLAs, and existing contract vehicles. The model origin is less relevant than the deployment wrapper. For buyers with high-volume, well-defined tasks and existing hyperscaler relationships, open-weight on a hyperscaler comes out ahead more often than the default conversation suggests.
The cases where proprietary frontier models are clearly right: tasks requiring the highest available reasoning capability, multimodal tasks at the edge of current open-weight performance, or workloads where the lab's ongoing model improvements are part of the value proposition. Everything else is a routing decision.
If you remember nothing else from this section: For most enterprise workloads, open-weight on a hyperscaler passes all three questions simultaneously. Start there.
For More Information
| Topic | Source Article |
|---|---|
| Capability tier definitions and routing logic | Frontier, Capable, Specialized: How to Read the Model Tier Map |
| Per-token vs. provisioned throughput crossover math | The Pricing Conversation: What to Say Before the Quote |
| Open-weight model landscape and hyperscaler deployment options | Open Weights, Closed Assumptions: The Enterprise Case for Llama and Friends |
| Geopolitics, data residency, and the datacenter operator principle | Who Trained It vs. Who Runs It: Resolving the Model Provenance Question |
| FedRAMP authorization and AI deployment environments | Compliance Vocabulary for AI: What FedRAMP Actually Covers |
| Context windows and agentic token consumption | Why Your Agent's Bill Is Bigger Than You Expected |
| Procurement vehicles and hyperscaler contract alignment | The Contracting Layer: Why the Hyperscaler Decision Usually Comes First |
| Fine-tuning as custom development, not configuration | Fine-Tuning Is Not a Setting: What Buyers Mean and What They're Asking For |

