A Decision Framework for Picking a Model and a Vendor

By Carey Whitten— May 5, 2026

A Decision Framework for Picking a Model and a Vendor

Recap — Models & Vendors

This is not a summary. You've done the reading. What follows is the structure that makes it retrievable when you're in a room and someone asks a question you've already answered somewhere in the last eight articles.

Three questions. In this order. Every time.

The Sequence

The order is load-bearing. Buyers who start with vendor preference skip the questions that would tell them whether that preference is costing them money or creating compliance exposure they haven't mapped yet. Walk them back to question one.

Question One: Capability Tier

The principle: Route, don't maximize. Most enterprise AI tasks don't require frontier capability. Deploying a frontier model for document summarization or classification is the equivalent of licensing enterprise software for a use case the standard tier handles without breaking a sweat.

Frontier Model — The highest-capability, highest-cost model a given lab offers at a given point in time.

When it comes up: When a buyer says "we want GPT-4" or "we need the best model" without having specified a task.
Don't confuse with: The right model for the task. Frontier is a market position, not a fitness designation. The capability gap between frontier and capable tiers is closing faster than the price gap.

Capable Model — Mid-tier models that handle the majority of enterprise tasks: summarization, classification, extraction, structured generation, moderate reasoning.

When it comes up: When you're routing after the task is defined. This is where most workloads land.
Don't confuse with: A downgrade. "Capable" is not a formal designation — different vendors use different names for the same tier. Claude Sonnet, GPT-4o mini, Gemini Flash. Same tier, different labels.

Task-Specific / Fine-Tuned Model — A base model adapted on domain-specific data for a narrow, high-volume, well-defined task.

When it comes up: High-volume, low-variability workloads where latency and cost matter more than generality.
Don't confuse with: Configuration. Fine-tuning creates a new artifact. It's closer to custom development than a settings change, and it may require a separate security review cycle.

AI Term	What It Means in AI	IDAM Equivalent	Key Divergence
Frontier model	Highest-capability, highest-cost offering from a given lab	Enterprise tier / premium SKU	Capability gap between tiers is closing faster than pricing gap — last year's frontier is this year's mid-tier
Capable model	Mid-tier, handles most enterprise tasks	Professional / standard tier	"Capable" is not a formal designation — vendor naming varies widely for the same performance band
Fine-tuned model	Base model adapted on domain-specific data	Customization / configuration	Fine-tuning produces a new artifact requiring separate security review — not a config change

If you remember nothing else from Question One: The buyer who insists on frontier models for every task is paying for capability they won't use. Your job is to ask what the task is before you agree that the tier is right.

Question Two: Procurement and Compliance Constraints

The principle: Your procurement office has already made most of this decision. Find out which hyperscaler has the existing contract vehicle before you spec the model.

In federal accounts, roughly 70% of AI workloads in FY2025 ran on infrastructure already covered by an existing hyperscaler IDIQ or BPA. Model selection reads as a technical conversation. In practice, it's a contracting decision, and the procurement office often answered it first.

Model Provider — The lab that trained and maintains the model weights.

When it comes up: When a buyer asks "is this an OpenAI product?" or "are we using Anthropic?"
Don't confuse with: The contracting party. The lab may have no direct contractual relationship with the agency. The hyperscaler does.

Deployment Environment — Where the model runs: which cloud region, under whose ATO, on whose infrastructure.

When it comes up: Every compliance conversation. This is the only variable that FedRAMP authorization attaches to.
Don't confuse with: Model origin. A model trained in one country and deployed in a FedRAMP High environment is a different compliance profile than the same model running on ambiguous infrastructure.

The Geopolitics Question — Resolved to One Principle

Buyers will ask about Chinese-developed models, European data sovereignty, and model origin generally. The answer that holds across every variation: who operates the datacenter matters, not who trained the weights.

FedRAMP authorization, data residency requirements, and contractual data handling obligations all attach to the operator, not the model origin. A Llama model running in Azure Government under an existing EA is a different risk profile than a US-trained model running on infrastructure with no clear data handling agreement. The compliance question is always about the operator. Model provenance is a secondary consideration that matters for supply chain risk assessments, not for primary compliance determinations.

AI Term	What It Means in AI	IDAM Equivalent	Key Divergence
Model provider	Lab that trained the model	Software vendor / ISV	The lab may have no contractual relationship with the agency — the hyperscaler does
Deployment environment	Where the model runs (cloud region, infrastructure)	Hosting / infrastructure layer	FedRAMP authorization attaches to the environment, not the model — same model, different compliance profile depending on where it runs
Open-weight model	Model with publicly available weights, deployable on your own infrastructure	Open-source software	Weights are available; training data and full methodology often aren't — "open" is partial, not equivalent to OSI-definition open source

If you remember nothing else from Question Two: The compliance question is always about the operator. Who trained the weights is a supply chain question. Who runs the datacenter is the compliance question.

Question Three: Volume Profile

The principle: Per-token is a credit card. Provisioned throughput is a lease. Know which one the buyer's volume justifies before the conversation about cost.

The crossover point is approximately 10 million model calls per month, though this varies by model tier and provider. Below that threshold, per-token pricing is almost always cheaper. Above it, provisioned throughput typically wins on cost and provides the latency predictability that production workloads require.

Per-Token Pricing — Pay per unit of text processed, billed on consumption.

When it comes up: Early-stage deployments, pilots, variable workloads, anything where call volume hasn't stabilized.
Don't confuse with: Cheap. Token volume in agentic workflows is unpredictable in ways that auth events aren't. A single multi-step agent task can consume thousands of tokens invisibly. Buyers who benchmark on simple queries and then deploy agents get surprised by the bill.

Provisioned Throughput — Reserved model capacity billed regardless of use, in exchange for guaranteed availability and latency.

When it comes up: Production workloads above the per-token crossover, or any workload where latency SLAs matter.
Don't confuse with: A performance guarantee. You're reserving capacity, not guaranteeing output quality. The SLA covers availability, not results.

Context Window — The maximum amount of text the model can process in a single call, measured in tokens.

When it comes up: When buyers ask about document length limits, or when agentic workflows start hitting errors.
Don't confuse with: A session. Context windows don't expire on inactivity — they're a hard ceiling on input size per call, not a timeout mechanism.

AI Term	What It Means in AI	IDAM Equivalent	Key Divergence
Per-token pricing	Pay per unit of text processed	Consumption-based licensing (per-auth, per-event)	Token volume in agentic tasks is unpredictable in ways auth events aren't — a single task can consume thousands of tokens invisibly
Provisioned throughput	Reserved model capacity, billed regardless of use	Reserved instance / committed use in cloud contracts	No SLA on output quality, only on availability — you're reserving capacity, not guaranteeing results
Context window	Maximum text processable in one model call	Session size limit	Context windows don't expire on inactivity — they're a hard input ceiling per call, not a timeout

If you remember nothing else from Question Three: The buyer who asks "what does this cost?" before knowing their call volume is asking the wrong question first. Volume profile determines which pricing model applies. Get the volume estimate before you quote.

The Open-Weight Default

Most enterprises assume open-weight means unsupported, frontier means enterprise-grade, and proprietary means safer. All three assumptions are wrong in the current market.

Open-weight models — Llama 3, Mistral, Falcon — running on Azure, AWS, or GCP have enterprise support agreements, SLAs, and existing contract vehicles. The model origin is less relevant than the deployment wrapper. For buyers with high-volume, well-defined tasks and existing hyperscaler relationships, open-weight on a hyperscaler comes out ahead more often than the default conversation suggests.

The cases where proprietary frontier models are clearly right: tasks requiring the highest available reasoning capability, multimodal tasks at the edge of current open-weight performance, or workloads where the lab's ongoing model improvements are part of the value proposition. Everything else is a routing decision.

If you remember nothing else from this section: For most enterprise workloads, open-weight on a hyperscaler passes all three questions simultaneously. Start there.

For More Information

Topic	Source Article
Capability tier definitions and routing logic	Frontier, Capable, Specialized: How to Read the Model Tier Map
Per-token vs. provisioned throughput crossover math	The Pricing Conversation: What to Say Before the Quote
Open-weight model landscape and hyperscaler deployment options	Open Weights, Closed Assumptions: The Enterprise Case for Llama and Friends
Geopolitics, data residency, and the datacenter operator principle	Who Trained It vs. Who Runs It: Resolving the Model Provenance Question
FedRAMP authorization and AI deployment environments	Compliance Vocabulary for AI: What FedRAMP Actually Covers
Context windows and agentic token consumption	Why Your Agent's Bill Is Bigger Than You Expected
Procurement vehicles and hyperscaler contract alignment	The Contracting Layer: Why the Hyperscaler Decision Usually Comes First
Fine-tuning as custom development, not configuration	Fine-Tuning Is Not a Setting: What Buyers Mean and What They're Asking For