The Reference Architecture
Every layer you covered maps to something you already manage.
| Layer | Function in AI Stack | Enterprise IT Equivalent | Okta's Position |
|---|---|---|---|
| Identity / SSO | Authenticates the user or application; provides per-user context to every downstream layer | Identity Provider — the same SAML/OIDC flows you've deployed for SaaS | Core Okta territory; Workforce Identity feeds user context into gateway policy |
| AI Gateway | Intercepts every prompt and completion; enforces auth, routing, DLP, caching, and logging | Reverse proxy / API gateway — the layer in front of your internal APIs | Okta identity signals feed gateway policy; the gateway itself is a separate product category |
| Provider(s) | Executes inference; holds negotiated data terms (ZDR, residency, retention) | SaaS vendor with a DPA — except the "data" is your users' prompts | Not Okta's layer; where legal and procurement live |
| Observability / Evals | Monitors model behavior, response quality, and drift in production | APM + SIEM — except the signals are semantic, not just latency and errors | Not Okta's layer; feeds the audit trail that compliance will eventually ask for |
| FinOps / Chargeback | Tracks token consumption by user, team, and application; allocates cost | Showback/chargeback in cloud FinOps — same model, different unit (tokens, not compute-hours) | Okta user identity is the attribution key that makes per-team chargeback possible |
| Provisioning | Controls which users and applications can access which AI services | Lifecycle Management / SCIM provisioning — the flows that onboard users to SaaS | Okta Lifecycle Management; Entra ID Governance where Microsoft is the IdP |
If you remember nothing else from this section: The gateway is not the identity layer. Both are required, and conflating them is the fastest way to lose credibility with a security architect.
Vocabulary Collision Zones
Three terms that mean different things to different people in the same meeting.
Token
| AI Term | What It Means in AI | IDAM Equivalent | Key Divergence |
|---|---|---|---|
| Token | A unit of text — roughly ¾ of a word — that the model processes; the billing unit for every API call | OAuth access token — a credential that proves authorization | An OAuth token grants access. An LLM token gets consumed during inference. Same word, completely different function — and confusing them in a FinOps conversation will end the conversation. |
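
To make the LLM sense of "token" concrete, the ¾-of-a-word rule of thumb from the table can be turned into back-of-the-envelope arithmetic. This is a minimal sketch: the function names and the per-1,000-token price are hypothetical, and real tokenizers and provider pricing vary.

```python
import math

# Rule of thumb from the table: one token is roughly 3/4 of a word.
# Real tokenizers differ by model and by language.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count: int) -> int:
    """Estimate how many LLM tokens a given number of English words consumes."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def estimate_cost(word_count: int, price_per_1k_tokens: float) -> float:
    """Estimate cost from a hypothetical price per 1,000 tokens."""
    return estimate_tokens(word_count) / 1000 * price_per_1k_tokens
```

Under these assumptions, a 750-word prompt comes to about 1,000 tokens. Nothing in this arithmetic involves an OAuth access token, which is the point of the distinction.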
Gateway
| AI Term | What It Means in AI | Enterprise IT Equivalent | Key Divergence |
|---|---|---|---|
| AI Gateway | A proxy layer that intercepts prompts and completions to enforce policy: auth, routing, DLP, caching, logging | API gateway (Kong, Apigee, AWS API GW) — a layer that manages API traffic | A traditional API gateway is primarily concerned with routing and rate-limiting. An AI gateway adds semantic inspection — it reads the content of the request, not just its headers. That's a fundamentally different threat surface. |
Observability
| AI Term | What It Means in AI | Enterprise IT Equivalent | Key Divergence |
|---|---|---|---|
| AI Observability | Monitoring model behavior, output quality, and drift — did the model start answering differently than it did last month? | APM / SIEM — latency, error rates, log aggregation | Traditional observability is structural: did the system respond? AI observability is semantic: did the system respond correctly? You can have green dashboards and a model that's quietly hallucinating. |
If you remember nothing else from this section: When a buyer says "token," ask which kind. The answer tells you whether you're talking to finance or security — and whether the conversation is about billing or breach.
Layer-by-Layer Reference
Ordered as the lessons introduced them, Lesson 1 through Lesson 6.
Sanctioned AI — AI services formally approved, provisioned, and governed by IT. When it comes up: Any conversation about AI policy, acceptable use, or why the CISO is involved in what looks like a productivity tool purchase. Don't confuse with: Managed AI — sanctioned means approved for use; managed means actively governed with logging, attribution, and policy enforcement. You want both.
Shadow AI — AI services employees are using without IT's knowledge or approval, usually because the sanctioned path is too slow. When it comes up: When a buyer says "we don't have an AI problem yet." Fourteen teams on one personal API key is an AI problem. Don't confuse with: Shadow IT broadly — shadow AI carries additional exposure because the data leaving the perimeter is often conversational and context-rich in ways that a rogue SaaS subscription isn't.
AI Gateway — The proxy layer between your users and your AI providers; the enforcement point for auth, routing, DLP, caching, and logging. When it comes up: Any architecture conversation where the buyer asks "how do we control what goes to the model." Don't confuse with: The identity layer — the gateway enforces policy; the IdP establishes who the user is. The gateway needs the IdP to do its job. Neither replaces the other.
Per-User Attribution — The ability to trace every prompt and completion back to a specific authenticated user identity. When it comes up: The moment legal or HR asks "who sent that?" — which will happen. Don't confuse with: Logging — you can log everything and still not know who sent it if the gateway is sitting behind a shared service account. Attribution requires identity, not just records.
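The difference between logging and attribution can be sketched in a few lines. This is an illustrative example only: `build_audit_record` is a hypothetical helper, and `claims` stands in for OIDC ID token claims that the gateway has already validated (`sub` and `email` are standard OIDC claim names).

```python
from datetime import datetime, timezone


def build_audit_record(claims: dict, prompt: str) -> dict:
    """Attach the authenticated user's identity to a prompt log entry.

    `claims` is assumed to hold already-validated OIDC ID token claims.
    """
    user_id = claims.get("sub")
    if not user_id:
        # A request with no user identity can be logged but not attributed,
        # which is the shared-service-account situation described above.
        raise ValueError("no user identity: request cannot be attributed")
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "email": claims.get("email"),
        "prompt_chars": len(prompt),
    }
```

The sketch refuses to write a record without a user identity. A real gateway might instead flag the request, but either way the record is only as useful as the identity attached to it.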
Token Budget — A defined limit on how many LLM tokens a user, team, or application can consume in a period. When it comes up: FinOps conversations, budget cycle planning, any buyer who's had a surprise AI bill. Don't confuse with: Rate limiting — a rate limit controls requests per second; a token budget controls total consumption over time. Both are real controls; they answer different questions.
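A small sketch can show why the two controls answer different questions. The class and method names here are hypothetical, and the window handling is deliberately simplified: the caller decides when a new rate-limit window starts.

```python
class UsageGuard:
    """Tracks two independent controls for one user, team, or application."""

    def __init__(self, max_requests_per_window: int, token_budget: int):
        self.max_requests_per_window = max_requests_per_window
        self.token_budget = token_budget
        self.requests_in_window = 0
        self.tokens_used = 0

    def start_new_window(self) -> None:
        """Reset the rate-limit counter. The token budget is not reset."""
        self.requests_in_window = 0

    def check(self, tokens_requested: int) -> str:
        """Return 'ok' and record usage, or name the control that blocked the call."""
        if self.requests_in_window >= self.max_requests_per_window:
            return "rate_limited"
        if self.tokens_used + tokens_requested > self.token_budget:
            return "budget_exceeded"
        self.requests_in_window += 1
        self.tokens_used += tokens_requested
        return "ok"
```

In this sketch a caller can stay well under the rate limit and still be blocked by the budget, because the budget accumulates across windows while the request counter does not.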
Chargeback — Allocating AI infrastructure costs to the business units that generated them, based on token consumption. When it comes up: Any conversation with a CFO or IT finance lead who wants to know why the AI line item is growing. Don't confuse with: Showback — showback reports consumption without transferring cost; chargeback actually moves money. Buyers often say chargeback when they mean showback. Clarify before you architect anything.
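The allocation itself is simple arithmetic once every usage event carries a team label. The following is a minimal sketch under stated assumptions: the event shape (`team`, `tokens`) and the flat per-1,000-token price are hypothetical, and real provider pricing usually differs by model and by input versus output tokens.

```python
from collections import defaultdict


def allocate_costs(usage_events: list, price_per_1k_tokens: float) -> dict:
    """Sum token consumption per team and convert it to cost."""
    tokens_by_team = defaultdict(int)
    for event in usage_events:
        tokens_by_team[event["team"]] += event["tokens"]
    return {
        team: tokens / 1000 * price_per_1k_tokens
        for team, tokens in tokens_by_team.items()
    }
```

The same report serves both models. Publishing it is showback; billing each team for its figure is chargeback.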
Zero Data Retention (ZDR) — A contractual term with an AI provider specifying that prompt and completion data is not stored after inference completes. When it comes up: When legal asks whether the provider can be subpoenaed for a client's prompts. ZDR is your answer — or the absence of it is your problem. Don't confuse with: Data residency — ZDR governs duration (how long data is held); residency governs location (where it's processed and stored). You can have residency without ZDR and vice versa. Most regulated buyers need both.
Data Residency — A contractual and architectural constraint specifying the geographic region where data is processed and stored. When it comes up: Any federal, financial services, or healthcare buyer with data sovereignty requirements. Don't confuse with: Encryption at rest — encryption governs protection of stored data, not its location. A provider can encrypt everything and still process it in the wrong jurisdiction.
Evals — Structured tests that measure whether a model's outputs meet defined quality criteria, run in CI/CD pipelines and in production. When it comes up: When a buyer asks "how do we know if the model is still working correctly" — especially after a provider updates the underlying model without notice. Don't confuse with: Unit tests — evals assess semantic correctness, not functional correctness. A model can pass every unit test and still give confidently wrong answers.
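As a rough illustration of what an eval run looks like, the sketch below scores a model against a small set of cases. Everything here is hypothetical: `model_fn` stands in for whatever function calls the model, the case format is invented for the example, and the substring check is a crude placeholder for the semantic scoring a real eval suite would use.

```python
def run_evals(model_fn, cases: list, pass_threshold: float = 0.9) -> dict:
    """Run each case through the model and report the share that passed."""
    if not cases:
        raise ValueError("at least one eval case is required")
    passed = 0
    failures = []
    for case in cases:
        output = model_fn(case["prompt"]).lower()
        # A case passes only if every required term appears in the output.
        if all(term.lower() in output for term in case["must_contain"]):
            passed += 1
        else:
            failures.append(case["prompt"])
    pass_rate = passed / len(cases)
    return {
        "pass_rate": pass_rate,
        "ok": pass_rate >= pass_threshold,
        "failures": failures,
    }
```

Running the same cases on a schedule, and again whenever the provider changes the underlying model, is one way to notice that answers have shifted even though nothing in the application code changed.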
AI Observability — The practice of monitoring model behavior, output quality, and response drift in production. When it comes up: Any buyer who's deployed AI and is now asking "how do we govern it ongoing" rather than "how do we deploy it." Don't confuse with: Traditional APM — latency and error rates tell you the system is up; AI observability tells you whether the system is right. Green dashboards are not a compliance posture.
If you remember nothing else from this section: ZDR is what you point to when legal asks what a subpoena served on the provider could produce about a client's prompts. Under ZDR, the provider has no stored prompts or completions to hand over. If you don't know whether your buyer's provider offers it, find out before the legal review does.
What Comes Next
The plumbing is governed. The identity layer is in place. Every prompt is attributed, every token is counted, every completion is logged, and the data terms are in the contract.
The next question belongs to the auditor: what does all of this look like from a compliance standpoint?
The risk and compliance chapter picks up from here. The observability pipeline from Lesson 6 is the audit trail. The ZDR and residency terms from Lesson 5 are the contractual foundation. The per-user attribution from Lesson 3 is the accountability layer. The infrastructure you built is the evidence base. The next chapter is about presenting it.
For More Information
| Entry | Source |
|---|---|
| The full layered architecture | Section Opener: The Enterprise AI Stack |
| Sanctioned AI, Shadow AI, provisioning | Lesson 1: Sanctioned vs. Shadow AI — The Provisioning Problem |
| AI gateway, DLP, prompt routing, caching, logging | Lesson 2: AI Gateways — The Proxy Layer |
| Per-user attribution, SSO integration, identity federation | Lesson 3: Identity, SSO, and Per-User Attribution |
| Token budget, chargeback, cost variance, FinOps | Lesson 4: FinOps for AI — Token Budgets, Chargeback, and Cost Variance |
| ZDR, data residency, retention, data perimeter | Lesson 5: Data Governance — Residency, Retention, and What Leaves Your Perimeter |
| Evals, AI observability, production monitoring, drift | Lesson 6: Observability and Evals in Production |

