Nine pieces. What follows is organized by decision point, not by lesson, so you can find the right section before a call rather than reconstruct the whole sequence.
The Decision Framework
Run these in order. The answer to each one constrains the next.
Question 1: What capability tier does this task actually require?
Three tiers, established in Lesson 5:
- Small — Classification, extraction, routing, simple lookup. Runs fast, costs almost nothing, handles the majority of enterprise AI tasks once you break them apart.
- Efficient — Drafting, summarization, moderate reasoning, structured output. The workhorse tier. Most enterprise use cases land here.
- Frontier — Complex multi-step reasoning, novel synthesis, agentic task chains. Genuinely necessary for a narrow slice of what agencies are actually building.
The most common enterprise mistake, per Lesson 5: defaulting to frontier because it feels safer. Slower, more expensive, harder to audit. That's what frontier delivers when the task doesn't require it. Route by what the task requires. Vendor marketing will always point you toward frontier.
Open-weight models on a hyperscaler (Lesson 7) are often the right answer at the Efficient tier — comparable output quality, no proprietary lock-in, deployed inside infrastructure you already control.
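The routing rule above can be sketched as a lookup table. This is a minimal illustration, not a production router: the task-type labels are hypothetical names derived from the tier descriptions, and the fallback to the Efficient tier for unknown tasks is an assumption chosen to mirror the "don't default to frontier" guidance.

```python
from enum import Enum


class Tier(Enum):
    SMALL = "small"
    EFFICIENT = "efficient"
    FRONTIER = "frontier"


# Hypothetical task-type labels, taken from the three tier descriptions.
TASK_TIERS = {
    "classification": Tier.SMALL,
    "extraction": Tier.SMALL,
    "routing": Tier.SMALL,
    "simple_lookup": Tier.SMALL,
    "drafting": Tier.EFFICIENT,
    "summarization": Tier.EFFICIENT,
    "moderate_reasoning": Tier.EFFICIENT,
    "structured_output": Tier.EFFICIENT,
    "multi_step_reasoning": Tier.FRONTIER,
    "novel_synthesis": Tier.FRONTIER,
    "agentic_task_chain": Tier.FRONTIER,
}


def route(task_type: str) -> Tier:
    """Return the tier mapped to a task type.

    Assumption: unknown task types fall back to EFFICIENT, not FRONTIER.
    """
    return TASK_TIERS.get(task_type, Tier.EFFICIENT)


print(route("extraction").value)
print(route("agentic_task_chain").value)
```

The point of the table is that the task decides the tier. Breaking a workflow into typed sub-tasks first is what lets most of them land in Small or Efficient.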
If you remember nothing else: Route by task complexity first. Everything else follows from that.
Question 2: What procurement and compliance constraints already exist?
This question usually answers itself. From Lesson 3:
- Existing AWS enterprise agreement → Bedrock
- Microsoft EA or Azure credits → Azure OpenAI Service
- GCP contract or Google Workspace → Vertex AI
Technical merit is rarely the selection driver. The hyperscaler is selected by procurement, and the model catalog comes with it. Your job in the call is to confirm which hyperscaler the agency is already committed to, then work within that catalog. The infrastructure decision is already made.
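As a quick illustration of "procurement selects the platform," the mapping above fits in a small lookup. The commitment keys are hypothetical labels invented for this sketch; only the three platform names come from the list above.

```python
# Hypothetical commitment labels mapped to the platforms named above.
PLATFORM_BY_COMMITMENT = {
    "aws_enterprise_agreement": "Bedrock",
    "microsoft_ea": "Azure OpenAI Service",
    "azure_credits": "Azure OpenAI Service",
    "gcp_contract": "Vertex AI",
    "google_workspace": "Vertex AI",
}


def candidate_platforms(commitments):
    """Return the distinct platforms implied by existing commitments, in input order."""
    platforms = []
    for commitment in commitments:
        platform = PLATFORM_BY_COMMITMENT.get(commitment)
        if platform is not None and platform not in platforms:
            platforms.append(platform)
    return platforms


print(candidate_platforms(["microsoft_ea", "azure_credits"]))
```

An agency with more than one commitment gets more than one candidate back. That is the case where the conversation actually needs to happen.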
Data residency requirements and FedRAMP authorization status narrow the model list further. The hyperscaler's compliance posture covers the infrastructure layer. Model-level authorization is a separate question, and the list of FedRAMP-authorized model deployments is shorter than most customers assume.
This is also where the geopolitics question resolves (covered below).
If you remember nothing else: The hyperscaler is usually already chosen. Find out which one before you discuss any specific model.
Question 3: What is the volume profile?
From Lesson 4:
- Under roughly 10 million API calls per month: Per-token pricing. Predictable enough to budget, flexible enough to absorb variable workloads. Watch for prompt caching opportunities when the same system prompt appears across many calls — the savings are real and the math is immediate.
- Over roughly 10 million calls per month: Provisioned throughput. The 30–50% cost reduction threshold from Lesson 4 applies here. Treat it like a capacity reservation, not a subscription — you're committing to a specific model version at a specific throughput level.
Bursty workloads complicate this. A use case that averages 8 million calls per month but spikes to 25 million during peak periods is not cleanly a per-token account. Lesson 4 has the framework for that.
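The volume rule, including the bursty case, can be sketched as a three-way check. This is a simplified illustration: the function name and return labels are hypothetical, treating exactly 10 million as "provisioned" is an assumption, and the bursty branch only flags the account for the Lesson 4 framework instead of reproducing it.

```python
THRESHOLD_CALLS_PER_MONTH = 10_000_000  # rough crossover stated above


def pricing_mode(avg_calls: int, peak_calls: int) -> str:
    """Classify a workload by average and peak monthly API calls."""
    if peak_calls < avg_calls:
        raise ValueError("peak_calls cannot be lower than avg_calls")
    if peak_calls < THRESHOLD_CALLS_PER_MONTH:
        return "per-token"
    if avg_calls >= THRESHOLD_CALLS_PER_MONTH:
        return "provisioned"
    # Average is under the threshold but peaks cross it.
    return "needs burst analysis"


# The example from the text: averages 8 million, spikes to 25 million.
print(pricing_mode(8_000_000, 25_000_000))
```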
If you remember nothing else: Per-token under 10 million. Provisioned above. The math in Lesson 4 shows you when the crossover pays.
Vocabulary Mapping: Pricing Terms
The collision zone where AI vendor pricing vocabulary lands on enterprise procurement ears trained by software deals.
| AI Term | What It Means in AI | IDAM / Procurement Equivalent | Key Divergence |
|---|---|---|---|
| Token (pricing unit) | Roughly ¾ of a word; the unit vendors charge against | API call / transaction | Token count varies by content length and language. You cannot predict monthly cost from call volume alone without knowing average prompt and response length. |
| Per-token pricing | Pay for what you use, billed by token consumed | Consumption-based / metered billing | Familiar model, unfamiliar unit. The budget conversation requires token estimates, not just call estimates. |
| Provisioned throughput | Reserved processing capacity for a specific model at a committed rate | Reserved capacity / committed use discount | Commitment is to a model version, not a service tier. If the vendor deprecates that version, the reservation terms change. Read the contract. |
| Prompt caching | Storing repeated prompt prefixes so they aren't re-processed on each call | Session persistence / stateful connection | Caching reduces cost but does not maintain state. Each call is still stateless. The savings come from not re-processing the same system prompt 10,000 times a day. |
| Context window | Maximum tokens the model can hold in a single call | Session timeout / token lifetime | A context window is a capacity limit, not a time limit. It doesn't expire; it fills. When it fills, the request is rejected or the application must truncate or summarize earlier content. Nothing is archived automatically. |
If you remember nothing else: Provisioned throughput is a capacity reservation against a specific model version. It behaves more like a reserved instance than a software subscription.
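To see why call volume alone does not predict cost, here is a rough per-token estimate. All prices and the cache discount are hypothetical placeholders, not any vendor's rates, and the function ignores details such as cache-write charges and cache expiry that real pricing may include.

```python
def monthly_token_cost(calls, prompt_tokens, response_tokens,
                       input_price_per_mtok, output_price_per_mtok,
                       cached_prefix_tokens=0, cache_discount=0.0):
    """Estimate monthly spend in dollars under per-token pricing.

    cached_prefix_tokens: part of each prompt served from the prompt cache.
    cache_discount: fraction taken off the input price for cached tokens.
    Prices are per million tokens. All values are caller-supplied assumptions.
    """
    if not 0 <= cached_prefix_tokens <= prompt_tokens:
        raise ValueError("cached prefix must fit inside the prompt")
    if not 0.0 <= cache_discount <= 1.0:
        raise ValueError("cache_discount must be between 0 and 1")
    billable_input = (prompt_tokens - cached_prefix_tokens
                      + cached_prefix_tokens * (1.0 - cache_discount))
    per_call = (billable_input * input_price_per_mtok
                + response_tokens * output_price_per_mtok) / 1_000_000
    return calls * per_call


# Same call volume, different prompt lengths (hypothetical $1 / $4 per million tokens).
short = monthly_token_cost(1_000_000, 1_000, 200, 1.00, 4.00)
long_ = monthly_token_cost(1_000_000, 4_000, 200, 1.00, 4.00)
# Short prompts again, with 800 of the 1,000 prompt tokens cached at 50% off.
cached = monthly_token_cost(1_000_000, 1_000, 200, 1.00, 4.00,
                            cached_prefix_tokens=800, cache_discount=0.5)
print(round(short), round(long_), round(cached))
```

Under these made-up prices, one million calls cost about $1,800 with short prompts, about $4,800 with long prompts, and about $1,400 with short prompts plus caching. The call count never changed.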
Vocabulary Mapping: Architecture and Access Terms
The collision zone where AI system design vocabulary overlaps with IDAM vocabulary — same words, different mechanisms.
| AI Term | What It Means in AI | IDAM Equivalent | Key Divergence |
|---|---|---|---|
| Session (model context) | The active context window for a conversation or task chain | Auth session / SSO session | An AI session has no authentication state. It holds prompt history. An auth session holds identity assertions. These are unrelated objects that happen to share a name. |
| Context (prompt context) | The accumulated input the model can see in a given call | Identity context / claims | Identity context is about who the user is. Prompt context is about what the model has been told. Conflating them in a customer conversation creates real confusion about what data the model can access. |
| Scope (model access) | What data or tools an agent or model has been granted access to | OAuth scope | OAuth scope is enforced by an authorization server. Model scope is enforced by prompt design and system instructions — a fundamentally weaker guarantee. |
| Agent (AI agent) | A model configured to take sequential actions using tools and APIs | Endpoint agent / device agent | An AI agent acts on behalf of a user or system using credentials it was given. It is not an identity principal. It does not authenticate. This is the gap Okta's agentic identity work addresses. |
| Open weights | Model parameters published publicly; can be run on any infrastructure | Open source software | Open weights ≠ open source. The weights are available; the training data and methodology often aren't. Governance and licensing terms vary significantly by model family. |
If you remember nothing else: When a customer says "session," ask which kind. An AI context session and an auth session have nothing to do with each other, and the confusion will surface in a security review if not in the sales call.
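The "Session (model context)" row can be made concrete with a toy object. This sketch assumes one common application-side strategy (drop the oldest messages when the budget is exceeded) and uses the rough ¾-word-per-token heuristic in place of a real tokenizer. The class and function names are hypothetical.

```python
import math
from collections import deque


def estimate_tokens(text: str) -> int:
    """Rough heuristic: one token is about 3/4 of a word. Real tokenizers differ."""
    return math.ceil(len(text.split()) * 4 / 3)


class ModelSession:
    """Prompt history with a token budget. Holds no identity or auth state."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.messages = deque()

    def used_tokens(self) -> int:
        return sum(estimate_tokens(m) for m in self.messages)

    def add(self, text: str) -> None:
        self.messages.append(text)
        # The window fills, it does not expire: drop oldest messages until it fits.
        while self.used_tokens() > self.max_tokens and len(self.messages) > 1:
            self.messages.popleft()


session = ModelSession(max_tokens=8)
for message in ["one two three", "four five six", "seven eight nine"]:
    session.add(message)
print(list(session.messages))
```

Nothing in this object knows who the user is. That is the whole difference from an auth session, which exists only to carry identity assertions.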
Geopolitics, Resolved
Lesson 6 established the question. Here's the answer you can say in a meeting:
Who runs the datacenter matters. Who trained the weights does not.
A model trained by a Chinese lab and deployed on AWS Bedrock inside a FedRAMP-authorized region is, from a compliance standpoint, an AWS workload. The datacenter operator (the hyperscaler) determines the applicable compliance posture, data residency, and audit rights. The model's country of origin affects none of those things once it's running inside a controlled environment. Whether that specific model deployment is itself FedRAMP-authorized remains the separate check from Question 2.
That's a procurement framing, not a political one. The datacenter operator is the answer to Question 2.
Where this breaks: if the model requires a call-home to an external API during inference, or is otherwise not fully self-contained in the hyperscaler environment, the datacenter argument doesn't hold. Confirm that the deployment is isolated from the model vendor's infrastructure, with no inference-time calls leaving the hyperscaler environment, before using this framing with a customer.
For More Information
| Recap Entry | Source | Section |
|---|---|---|
| Four-layer stack | Opening | Models & Vendors |
| Frontier / Efficient / Small tier taxonomy; routing pattern; defaulting-to-frontier mistake | Lesson 5 | Models & Vendors |
| Hyperscaler selection; procurement as selection driver; data residency | Lesson 3 | Models & Vendors |
| Per-token vs. provisioned throughput; 30–50% savings threshold; prompt caching | Lesson 4 | Models & Vendors |
| Open-weight models on hyperscalers | Lesson 7 | Models & Vendors |
| Geopolitics and the stack; datacenter operator vs. model origin | Lesson 6 | Models & Vendors |
| Token definition; context window mechanics | Lesson 1 | Models & Vendors |
| Agent architecture; scope enforcement gaps | Lesson 2 | Models & Vendors |

