Chapter Recap: Enterprise Deployment

By Carey Whitten, May 5, 2026

The Reference Architecture

Every layer you covered maps to something you already manage.

Layer	Function in AI Stack	Enterprise IT Equivalent	Okta's Position
Identity / SSO	Authenticates the user or application; provides per-user context to every downstream layer	Identity Provider — the same SAML/OIDC flows you've deployed for SaaS	Core Okta territory; Workforce Identity feeds user context into gateway policy
AI Gateway	Intercepts every prompt and completion; enforces auth, routing, DLP, caching, and logging	Reverse proxy / API gateway — the layer in front of your internal APIs	Okta identity signals feed gateway policy; the gateway itself is a separate product category
Provider(s)	Executes inference; holds negotiated data terms (ZDR, residency, retention)	SaaS vendor with a DPA — except the "data" is your users' prompts	Not Okta's layer; where legal and procurement live
Observability / Evals	Monitors model behavior, response quality, and drift in production	APM + SIEM — except the signals are semantic, not just latency and errors	Not Okta's layer; feeds the audit trail that compliance will eventually ask for
FinOps / Chargeback	Tracks token consumption by user, team, and application; allocates cost	Showback/chargeback in cloud FinOps — same model, different unit (tokens, not compute-hours)	Okta user identity is the attribution key that makes per-team chargeback possible
Provisioning	Controls which users and applications can access which AI services	Lifecycle Management / SCIM provisioning — the flows that onboard users to SaaS	Okta Lifecycle Management; Entra ID Governance where Microsoft is the IdP

If you remember nothing else from this section: The gateway is not the identity layer. Both are required, and conflating them is the fastest way to lose credibility with a security architect.

Vocabulary Collision Zones

Three terms that mean different things to different people in the same meeting.

Token

AI Term	What It Means in AI	IDAM Equivalent	Key Divergence
Token	A unit of text — roughly ¾ of a word — that the model processes; the billing unit for every API call	OAuth access token — a credential that proves authorization	An OAuth token grants access. An LLM token gets consumed during inference. Same word, completely different function — and confusing them in a FinOps conversation will end the conversation.
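A minimal sketch of the LLM side of the collision, assuming the ¾-of-a-word rule of thumb from the table. Real tokenizers vary by model and language, so treat this as an estimate for conversation, not a billing calculation. The function names and the price passed in are hypothetical.

```python
import math

# Rule of thumb from the table above: one LLM token is roughly 3/4 of a word.
# Assumption: whitespace-separated words are a good enough proxy for English text.
WORDS_PER_TOKEN = 0.75


def estimate_llm_tokens(text: str) -> int:
    """Estimate how many LLM tokens a piece of text will consume."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def estimate_cost(tokens: int, price_per_million_tokens: float) -> float:
    """Convert a token count to cost at a (hypothetical) per-million-token price."""
    return tokens / 1_000_000 * price_per_million_tokens
```

An OAuth access token never appears in this arithmetic. It is presented once per request as a credential; LLM tokens are what the request then consumes.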

Gateway

AI Term	What It Means in AI	IDAM Equivalent	Key Divergence
AI Gateway	A proxy layer that intercepts prompts and completions to enforce policy: auth, routing, DLP, caching, logging	API gateway (Kong, Apigee, AWS API GW) — a layer that manages API traffic	A traditional API gateway is primarily concerned with routing and rate-limiting. An AI gateway adds semantic inspection — it reads the content of the request, not just its headers. That's a fundamentally different threat surface.

Observability

AI Term	What It Means in AI	IDAM Equivalent	Key Divergence
AI Observability	Monitoring model behavior, output quality, and drift — did the model start answering differently than it did last month?	APM / SIEM — latency, error rates, log aggregation	Traditional observability is structural: did the system respond? AI observability is semantic: did the system respond correctly? You can have green dashboards and a model that's quietly hallucinating.

If you remember nothing else from this section: When a buyer says "token," ask which kind. The answer tells you whether you're talking to finance or security — and whether the conversation is about billing or breach.

Layer-by-Layer Reference

Ordered by lesson, from provisioning (Lesson 1) through observability (Lesson 6).

Sanctioned AI — AI services formally approved, provisioned, and governed by IT. When it comes up: Any conversation about AI policy, acceptable use, or why the CISO is involved in what looks like a productivity tool purchase. Don't confuse with: Managed AI — sanctioned means approved for use; managed means actively governed with logging, attribution, and policy enforcement. You want both.

Shadow AI — AI services employees are using without IT's knowledge or approval, usually because the sanctioned path is too slow. When it comes up: When a buyer says "we don't have an AI problem yet." Fourteen teams on one personal API key is an AI problem. Don't confuse with: Shadow IT broadly — shadow AI carries additional exposure because the data leaving the perimeter is often conversational and context-rich in ways that a rogue SaaS subscription isn't.

AI Gateway — The proxy layer between your users and your AI providers; the enforcement point for auth, routing, DLP, caching, and logging. When it comes up: Any architecture conversation where the buyer asks "how do we control what goes to the model." Don't confuse with: The identity layer — the gateway enforces policy; the IdP establishes who the user is. The gateway needs the IdP to do its job. Neither replaces the other.

Per-User Attribution — The ability to trace every prompt and completion back to a specific authenticated user identity. When it comes up: The moment legal or HR asks "who sent that?" — which will happen. Don't confuse with: Logging — you can log everything and still not know who sent it if the gateway is sitting behind a shared service account. Attribution requires identity, not just records.
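The difference between logging and attribution can be sketched as follows. This is an illustration under assumptions, not any gateway's real log schema: the record fields and the service account name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical shared service account that a gateway might sit behind.
SHARED_ACCOUNTS = {"svc-ai-gateway"}


@dataclass(frozen=True)
class PromptLogEntry:
    request_id: str
    principal: str          # the identity the gateway saw on the request
    prompt_tokens: int
    completion_tokens: int


def attributed_user(entry: PromptLogEntry) -> Optional[str]:
    """Return the user behind a log entry, or None if only a shared account was recorded."""
    if entry.principal in SHARED_ACCOUNTS:
        return None  # the request is logged, but nobody can say who sent it
    return entry.principal
```

Both kinds of entry are complete log records. Only the one carrying a user identity from the IdP answers "who sent that?"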

Token Budget — A defined limit on how many LLM tokens a user, team, or application can consume in a period. When it comes up: FinOps conversations, budget cycle planning, any buyer who's had a surprise AI bill. Don't confuse with: Rate limiting — a rate limit controls requests per second; a token budget controls total consumption over time. Both are real controls; they answer different questions.
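The two controls can be sketched side by side. This is a simplified illustration: the class names are hypothetical, the budget period reset is omitted, and a production gateway would persist these counters rather than hold them in memory.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TokenBudget:
    """Caps total LLM tokens consumed in a period (period reset not shown)."""
    limit: int
    used: int = 0

    def try_consume(self, tokens: int) -> bool:
        if self.used + tokens > self.limit:
            return False  # over budget: reject regardless of request pace
        self.used += tokens
        return True


@dataclass
class RateLimit:
    """Caps the number of requests inside a sliding time window."""
    max_requests: int
    window_seconds: float
    _timestamps: deque = field(default_factory=deque)

    def allow(self, now: float) -> bool:
        # Drop requests that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_requests:
            return False  # too many requests right now, whatever their size
        self._timestamps.append(now)
        return True
```

A single very large prompt can pass the rate limit and still exhaust the budget. A burst of tiny prompts can trip the rate limit while barely touching the budget.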

Chargeback — Allocating AI infrastructure costs to the business units that generated them, based on token consumption. When it comes up: Any conversation with a CFO or IT finance lead who wants to know why the AI line item is growing. Don't confuse with: Showback — showback reports consumption without transferring cost; chargeback actually moves money. Buyers often say chargeback when they mean showback. Clarify before you architect anything.
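The allocation arithmetic is the same for showback and chargeback; what differs is whether finance posts the result to a cost center. A minimal sketch, assuming usage records are already attributed to users and that a user-to-team mapping is available (for example, from directory groups). The function name, the mapping, and the flat per-million-token price are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def allocate_cost(
    usage: Iterable[Tuple[str, int]],      # (user, tokens consumed) records
    user_team: Dict[str, str],             # user -> team, keyed on user identity
    price_per_million_tokens: float,
) -> Dict[str, float]:
    """Sum token consumption per team and convert it to cost."""
    totals: Dict[str, int] = defaultdict(int)
    for user, tokens in usage:
        # Usage without a known user lands in a bucket nobody can be billed for.
        totals[user_team.get(user, "unattributed")] += tokens
    return {
        team: tokens / 1_000_000 * price_per_million_tokens
        for team, tokens in totals.items()
    }
```

The "unattributed" bucket is the practical cost of missing identity: the spend is real, but there is no team to show it to or charge it to.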

Zero Data Retention (ZDR) — A contractual term with an AI provider specifying that prompt and completion data is not stored after inference completes. When it comes up: When legal asks whether the provider can be subpoenaed for a client's prompts. ZDR is your answer — or the absence of it is your problem. Don't confuse with: Data residency — ZDR governs duration (how long data is held); residency governs location (where it's processed and stored). You can have residency without ZDR and vice versa. Most regulated buyers need both.

Data Residency — A contractual and architectural constraint specifying the geographic region where data is processed and stored. When it comes up: Any federal, financial services, or healthcare buyer with data sovereignty requirements. Don't confuse with: Encryption at rest — encryption governs protection of stored data, not its location. A provider can encrypt everything and still process it in the wrong jurisdiction.

Evals — Structured tests that measure whether a model's outputs meet defined quality criteria, run in CI/CD pipelines and in production. When it comes up: When a buyer asks "how do we know if the model is still working correctly" — especially after a provider updates the underlying model without notice. Don't confuse with: Unit tests — evals assess semantic correctness, not functional correctness. A model can pass every unit test and still give confidently wrong answers.
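A deliberately small sketch of an eval run. Checking for required phrases is a crude stand-in for the semantic scoring a real eval suite would use, and the `model` callable is a placeholder for whatever client actually calls the provider; both are assumptions for illustration.

```python
from typing import Callable, List, Tuple

# (prompt, phrases a correct answer must contain)
EvalCase = Tuple[str, List[str]]


def run_evals(model: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Return the fraction of cases whose answer contains every required phrase."""
    if not cases:
        return 0.0
    passed = 0
    for prompt, required in cases:
        answer = model(prompt).lower()
        if all(phrase.lower() in answer for phrase in required):
            passed += 1
    return passed / len(cases)
```

Run the same cases on a schedule and compare the pass rate over time. A drop with no change on your side is the signal that the underlying model has shifted, even if every request still returns successfully.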

AI Observability — The practice of monitoring model behavior, output quality, and response drift in production. When it comes up: Any buyer who's deployed AI and is now asking "how do we govern it ongoing" rather than "how do we deploy it." Don't confuse with: Traditional APM — latency and error rates tell you the system is up; AI observability tells you whether the system is right. Green dashboards are not a compliance posture.

If you remember nothing else from this section: ZDR is what you point to when legal asks whether the provider can be subpoenaed for a client's prompts. If you don't know whether your buyer's provider offers it, find out before the legal review does.

What Comes Next

The plumbing is governed. The identity layer is in place. Every prompt is attributed, every token is counted, every completion is logged, and the data terms are in the contract.

The next question belongs to the auditor: what does all of this look like from a compliance posture?

The risk and compliance chapter picks up from here. The observability pipeline from Lesson 6 is the audit trail. The ZDR and residency terms from Lesson 5 are the contractual foundation. The per-user attribution from Lesson 3 is the accountability layer. The infrastructure you built is the evidence base. The next chapter is about presenting it.

For More Information

Entry	Source
The full layered architecture	Section Opener: The Enterprise AI Stack
Sanctioned AI, Shadow AI, provisioning	Lesson 1: Sanctioned vs. Shadow AI — The Provisioning Problem
AI gateway, DLP, prompt routing, caching, logging	Lesson 2: AI Gateways — The Proxy Layer
Per-user attribution, SSO integration, identity federation	Lesson 3: Identity, SSO, and Per-User Attribution
Token budget, chargeback, cost variance, FinOps	Lesson 4: FinOps for AI — Token Budgets, Chargeback, and Cost Variance
ZDR, data residency, retention, data perimeter	Lesson 5: Data Governance — Residency, Retention, and What Leaves Your Perimeter
Evals, AI observability, production monitoring, drift	Lesson 6: Observability and Evals in Production