The Reference Architecture
Each layer owns exactly one thing.
[Employee or Application]
↓
[SSO / IdP]
Identity assertion, per-user attribution
↓
[AI Gateway]
Authentication, routing, DLP, caching, logging
↓
[Provider(s) with Negotiated Data Terms]
Inference, ZDR commitment, regional endpoints
↓
[Observability and Eval Pipeline]
Prompt/completion logging, latency, quality, safety
↓
[FinOps Dashboard with Chargeback]
Token budget, cost attribution, variance reporting
| Layer | Owns | Does Not Own |
|---|---|---|
| SSO / IdP | Who the user is | What they're allowed to do with AI |
| AI Gateway | Policy enforcement at the request level | Identity assertion |
| Provider | Inference and data handling commitments | Your audit log |
| Observability pipeline | What happened, in sequence | Whether it should have happened |
| FinOps dashboard | Cost attribution and chargeback | Usage policy |
If you remember nothing else about the architecture: The gateway is where policy becomes enforcement. Everything above it is identity. Everything below it is evidence.
Layer-by-Layer Entries
Identity and SSO
Sanctioned AI — AI tooling provisioned through IT, with identity lifecycle management attached. When it comes up: When a buyer asks how they'll know who's using what, and whether access gets revoked when someone leaves. Don't confuse with: Approved AI — a model or vendor that legal has reviewed. Sanctioned means provisioned, not just permitted.
Shadow AI — AI usage that bypasses provisioned channels: personal API keys, consumer-tier accounts, browser extensions the IT team doesn't know about. When it comes up: In discovery, when a CISO asks what the organization's AI exposure actually is. The honest answer usually involves a number that surprises someone. Don't confuse with: Unsanctioned use of sanctioned tools — an employee using the enterprise ChatGPT instance for something outside policy. Shadow AI is the tool itself being off the books.
Per-user attribution — The ability to tie a specific AI request to a specific authenticated identity, not just a team or application. When it comes up: In any conversation about audit trails, cost chargeback, or incident investigation. "We log all requests" is not the same as "we know who made them." Don't confuse with: Authentication. A user can be authenticated to an AI tool without their identity being carried through to the request log. The gap between those two states is where attribution lives.
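The gap between "authenticated" and "attributed" can be sketched as a log record that either carries the user's identity or does not. This is a minimal illustration, not a reference implementation: the `AuditRecord` fields, the `build_audit_record` helper, and the example values are all assumptions made for this sketch. The only borrowed convention is `sub`, the standard OIDC subject claim.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional
import uuid


@dataclass(frozen=True)
class AuditRecord:
    """One gateway log entry. All field names are illustrative assumptions."""
    request_id: str
    timestamp: str
    user_id: Optional[str]  # subject from the IdP assertion, if it was carried through
    team: str
    model: str


def build_audit_record(idp_claims: dict, team: str, model: str) -> AuditRecord:
    """Copy the authenticated subject ("sub") from the IdP claims into the log entry."""
    return AuditRecord(
        request_id=str(uuid.uuid4()),
        timestamp=datetime.now(timezone.utc).isoformat(),
        user_id=idp_claims.get("sub"),
        team=team,
        model=model,
    )


def is_attributed(record: AuditRecord) -> bool:
    """A record is attributed only if it names a specific user, not just a team."""
    return bool(record.user_id)


# Same team, same model; only one of these can answer "who made the request?"
attributed = build_audit_record({"sub": "jdoe@example.com"}, "legal", "example-model")
team_only = build_audit_record({}, "legal", "example-model")
```

Both records would count toward "we log all requests." Only the first supports chargeback per user or an incident investigation that ends at a named person.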
The Gateway Layer
AI Gateway — A proxy layer that sits between the user or application and the AI provider, enforcing authentication, routing requests, applying DLP policies, caching responses, and writing to the audit log. When it comes up: When a buyer asks how they enforce policy across multiple AI providers without building separate integrations for each one. The gateway is the single enforcement point. Don't confuse with: An API gateway in the traditional sense. The function is similar — hence the mapping below — but AI gateways handle prompt/completion pairs, and the DLP and eval hooks are purpose-built for unstructured content.
DLP at the gateway — Data loss prevention applied to prompts before they leave the perimeter, scanning for PII, credentials, regulated data, or anything the organization has decided shouldn't reach an external model. When it comes up: In regulated industries, in federal procurement conversations, and whenever a CISO asks "what stops an employee from pasting a contract into GPT-4." Don't confuse with: DLP on completions. Scanning outbound prompts and scanning inbound completions are separate controls. Both matter; most early deployments only implement one.
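A rough sketch of outbound prompt scanning, assuming a small set of regular expressions stands in for a real DLP rule set. The pattern names, the `scan_prompt` and `enforce_outbound_dlp` helpers, and the choice to block rather than redact are all assumptions for illustration; production DLP engines use much broader detection than three regexes.

```python
import re

# Illustrative patterns only; a real rule set would be far broader.
DLP_PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def scan_prompt(prompt: str) -> list[str]:
    """Return the sorted names of every pattern that matches the outbound prompt."""
    return sorted(
        name for name, pattern in DLP_PATTERNS.items() if pattern.search(prompt)
    )


def enforce_outbound_dlp(prompt: str) -> str:
    """Block the request if anything matches; otherwise pass the prompt through."""
    findings = scan_prompt(prompt)
    if findings:
        raise PermissionError(f"prompt blocked by DLP: {', '.join(findings)}")
    return prompt
```

The same `scan_prompt` logic could in principle be pointed at completions on the way back in, but that would be a second control with its own policy, which is the distinction the entry above draws.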
Provider Data Terms
Zero data retention (ZDR) — A contractual commitment from the AI provider that prompts and completions will not be stored, logged, or used for model training after the request completes. When it comes up: In enterprise agreement negotiations, in federal procurement, and in any conversation where legal is in the room. ZDR is what legal wants confirmed before the agency signs. Don't confuse with: Encryption in transit. ZDR is a contractual data handling commitment. The provider can honor it or not — your audit mechanism is the contract and the provider's compliance attestation, not a cryptographic proof.
Regional endpoint — A provider-side configuration that routes inference requests to compute infrastructure in a specified geography, supporting data residency requirements. When it comes up: In EU deployments, in federal civilian agency conversations, and anywhere a DPA specifies where data may be processed. Don't confuse with: ZDR. Regional endpoints govern where processing happens; ZDR governs whether the data persists afterward. A request can satisfy one requirement and fail the other.
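Because residency and retention are independent requirements, it can help to check them separately. The sketch below assumes a simplified model in which a provider's terms reduce to a processing region and a ZDR flag; `ProviderTerms`, `DataRequirement`, `check_terms`, and the region names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderTerms:
    """What the provider has committed to (illustrative fields)."""
    processing_region: str      # where inference runs
    zero_data_retention: bool   # whether a contractual ZDR commitment is in place


@dataclass(frozen=True)
class DataRequirement:
    """What the organization's policy or DPA demands (illustrative fields)."""
    allowed_regions: frozenset
    requires_zdr: bool


def check_terms(terms: ProviderTerms, req: DataRequirement) -> dict:
    """Evaluate residency and retention as two separate pass/fail results."""
    return {
        "residency_ok": terms.processing_region in req.allowed_regions,
        "retention_ok": terms.zero_data_retention or not req.requires_zdr,
    }


eu_policy = DataRequirement(allowed_regions=frozenset({"eu-west"}), requires_zdr=True)
eu_without_zdr = ProviderTerms(processing_region="eu-west", zero_data_retention=False)
result = check_terms(eu_without_zdr, eu_policy)
```

Here `result` reports that residency passes while retention fails: the request stays in the right geography and still violates policy.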
Observability and Evals
Observability — The logging and monitoring layer that captures what happened: which requests were made, by whom, to which model, with what latency, at what cost. When it comes up: In any conversation about audit readiness, incident response, or cost attribution. Observability is the precondition for everything else in this list. Don't confuse with: Evals. Observability tells you what happened. Evals tell you whether it was good.
Evals (evaluation pipeline) — Automated or human-in-the-loop assessment of AI output quality, safety, and policy compliance, running against logged completions in production. When it comes up: When a buyer asks how they'll know if the model starts producing wrong or harmful outputs after deployment. Evals are the answer — a continuous production process running against real traffic, not a benchmark you validate once before launch. Don't confuse with: Testing. Pre-deployment testing validates a model against a fixed benchmark. Evals in production validate behavior against real traffic, which drifts.
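One way to picture evals running against logged traffic is as a set of pass/fail checks applied to each logged completion, with the pass rate tracked over time. This is a deliberately small sketch: the log record shape (`prompt`, `completion`), the two example checks, and the threshold logic are assumptions, and real evaluators are usually model-graded or human-reviewed rather than string checks.

```python
from typing import Callable

# An evaluator takes (prompt, completion) and says whether the completion passed.
Evaluator = Callable[[str, str], bool]


def non_empty(prompt: str, completion: str) -> bool:
    """Fails completions that are blank or whitespace only."""
    return bool(completion.strip())


def no_private_key(prompt: str, completion: str) -> bool:
    """Fails completions that appear to contain PEM private key material."""
    return "PRIVATE KEY-----" not in completion


def pass_rates(logs: list[dict], evaluators: dict[str, Evaluator]) -> dict[str, float]:
    """Fraction of logged completions that pass each evaluator."""
    if not logs:
        raise ValueError("no logged completions to evaluate")
    return {
        name: sum(check(r["prompt"], r["completion"]) for r in logs) / len(logs)
        for name, check in evaluators.items()
    }


def failing(rates: dict[str, float], threshold: float) -> list[str]:
    """Names of evaluators whose pass rate has dropped below the threshold."""
    return sorted(name for name, rate in rates.items() if rate < threshold)
```

Run on a schedule against the observability pipeline's output, a function like `failing` is what turns "what happened" into "something is now going wrong."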
FinOps
Token budget — A pre-allocated spending limit, expressed in LLM tokens, assigned to a team, application, or user for a billing period. When it comes up: When a finance stakeholder asks how AI costs get controlled before they appear on the monthly invoice as a surprise. Don't confuse with: A rate limit. A rate limit caps throughput, typically requests or tokens per minute, and is usually set by the provider. A token budget is an organization-side control on cumulative spend over the billing period. Both can be in effect simultaneously.
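A token budget can be sketched as a running counter checked before each request is forwarded. The `TokenBudget` class below is a hypothetical illustration; it ignores concurrency, period resets, and the usual split between input and output tokens.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Cumulative token allowance for one team, app, or user in a billing period."""
    limit_tokens: int
    used_tokens: int = 0

    def remaining(self) -> int:
        """Tokens left in the period, never negative."""
        return max(self.limit_tokens - self.used_tokens, 0)

    def try_consume(self, tokens: int) -> bool:
        """Record usage if it fits within the budget; refuse it otherwise."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used_tokens + tokens > self.limit_tokens:
            return False
        self.used_tokens += tokens
        return True
```

Unlike a rate limit, nothing here depends on time between requests: a team can spend its whole allowance in one afternoon or spread it across the month, and the check is the same.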
Chargeback — The mechanism by which AI infrastructure costs are allocated back to the consuming team or cost center, based on attributed usage. When it comes up: In any conversation with a CFO or budget owner about how AI costs are governed. Chargeback is how AI spend becomes visible at the team level rather than pooled in IT overhead. Don't confuse with: Showback. Showback reports usage without transferring cost. Chargeback transfers cost. The distinction matters to finance; it often doesn't matter to the team being charged until it does.
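As a worked example of the allocation step, the sketch below sums attributed token usage per cost center and prices it. The record shape (`cost_center`, `tokens`), the `chargeback` function, and the single blended price per thousand tokens are simplifying assumptions; actual provider pricing usually differs by model and by input versus output tokens.

```python
from collections import defaultdict
from decimal import Decimal


def chargeback(usage_records: list[dict], price_per_1k_tokens: Decimal) -> dict[str, Decimal]:
    """Allocate cost to each cost center in proportion to its attributed tokens."""
    tokens_by_center: dict[str, int] = defaultdict(int)
    for record in usage_records:
        tokens_by_center[record["cost_center"]] += record["tokens"]
    return {
        center: (Decimal(tokens) / 1000 * price_per_1k_tokens).quantize(Decimal("0.01"))
        for center, tokens in tokens_by_center.items()
    }


usage = [
    {"cost_center": "legal", "tokens": 1500},
    {"cost_center": "engineering", "tokens": 500},
    {"cost_center": "legal", "tokens": 500},
]
invoice = chargeback(usage, Decimal("0.02"))
```

With these made-up numbers, legal is allocated 0.04 and engineering 0.01. A showback report would compute the same figures and stop there; chargeback is the step where those figures are posted against each team's budget.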
Vocabulary Mapping Tables
Table 1: Token
| AI Term | What It Means in AI | IDAM Equivalent | Key Divergence |
|---|---|---|---|
| Token (LLM) | The unit of text the model processes; the unit of cost you pay for | Bearer token (OAuth) | An LLM token is a billing unit, not an authorization artifact. It carries no identity, has no scope, and cannot be revoked. Calling both "tokens" in the same conversation produces confusion that is entirely avoidable and rarely avoided. |
Table 2: Session, Context, Scope
| AI Term | What It Means in AI | IDAM Equivalent | Key Divergence |
|---|---|---|---|
| Session (LLM) | A conversation thread; the sequence of turns the model holds in context | Authenticated session | An LLM session involves no authentication event of its own, and the model itself typically keeps no state between calls: the client or application resends the conversation each turn. It ends when the context window fills or the client closes it, not when a timeout fires or a token expires. |
| Context window | The maximum amount of text (in tokens) the model can hold in a single inference call | Session scope | A context window is a memory constraint. Expanding it costs money; it grants no additional permissions. |
| Scope (model) | Informally used to describe what a model or agent is allowed to do | OAuth scope | OAuth scope is a formal authorization claim carried in a token and validated by a resource server. "Model scope" is a configuration convention with no equivalent enforcement mechanism unless the gateway implements it. |
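The last row's caveat, "unless the gateway implements it," can be made concrete with a small sketch. Here the gateway holds an allow-list of actions per client and checks it on every request. The policy table, the client names, and the `authorize` function are hypothetical; this is a configuration lookup, not an OAuth scope check, and nothing in it is validated by a resource server.

```python
# Hypothetical policy table held at the gateway: client -> actions it may request.
ALLOWED_ACTIONS: dict[str, set[str]] = {
    "support-bot": {"chat", "search_kb"},
    "finance-agent": {"chat"},
}


def authorize(client_id: str, action: str) -> bool:
    """Allow the action only if the gateway's policy table lists it for this client."""
    return action in ALLOWED_ACTIONS.get(client_id, set())
```

Unknown clients get an empty allow-list, so the default is deny. Without a check like this somewhere in the request path, "model scope" remains a description of intent rather than a control.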
Table 3: Agent and Identity
| AI Term | What It Means in AI | IDAM Equivalent | Key Divergence |
|---|---|---|---|
| Agent | An AI system that takes actions autonomously, often calling tools or APIs on behalf of a user or process | Service account / non-human identity | A service account has a stable identity, a credential lifecycle, and an audit trail. An AI agent may have none of these by default. The agent identity problem (who is this thing, what is it allowed to do, how do you revoke it) is unsettled. Not covered above, but worth knowing: MCP (Model Context Protocol) working groups are actively addressing agent identity; the spec is in active revision as of this writing. |
Enterprise IT Mapping Table
| Architecture Layer | Closest Enterprise IT Equivalent | What Transfers Directly | What Doesn't Transfer |
|---|---|---|---|
| AI Gateway | Reverse proxy / API gateway | Request interception, routing, policy enforcement, logging | DLP on unstructured content; prompt/completion pair handling; eval hooks |
| SSO / IdP integration | Identity provider federation (SAML, OIDC) | Authentication, SSO, session initiation | Per-request attribution to the AI audit log; token budget assignment |
| Per-user attribution | SCIM provisioning | User identity carried through to downstream systems | Cost attribution by token consumption; the unit here is spend, not entitlement |
| ZDR commitment | Data processing addendum (DPA) | Contractual data handling obligations; vendor accountability | Technical enforceability; ZDR is attested, not cryptographically verified |
| Observability pipeline | SIEM / log aggregation (Splunk, Sentinel) | Centralized log collection, alerting, retention policy | Quality and safety evaluation; SIEM doesn't know if a completion was wrong |
| FinOps dashboard with chargeback | IT chargeback / showback system | Cost center allocation, budget tracking, variance reporting | Token-level granularity; the unit of cost is novel and requires new instrumentation |
| Provider with negotiated data terms | Vendor contract + DPA | Data handling obligations, SLA, liability terms | Inference-time controls; the contract governs what happens to data, not what the model does with it |
If you remember nothing else about the enterprise IT mapping: The gateway maps cleanly to a reverse proxy until you need to evaluate what's inside the request. At that point the analogy holds the shape but not the tooling.
If You Remember Nothing Else
On architecture: The stack has six layers. Each layer owns exactly one thing. When a governance question lands without a clear owner, a layer is missing.
On vocabulary: "Token" means something in OAuth and something different in an LLM cost report. In the same conversation, they will collide. Name the collision before the buyer does.
On ZDR: It is a contract term. Verification runs through the provider's compliance attestation — there is no log you can pull.
On observability vs. evals: Observability tells you what happened. Evals tell you whether it should have. Both are required. Most deployments have one.
On chargeback: Chargeback is how AI spend becomes a team-level accountability problem rather than an IT overhead line. Finance will eventually ask for it. Better to have the infrastructure before they do.
For More Information, See…
| Entry | Source |
|---|---|
| The full stack overview; the fourteen-teams-on-one-API-key origin story | Opening: The Enterprise AI Stack: From Raw API Key to Governed Platform |
| Sanctioned vs. shadow AI; the provisioning gap; AI as an identity lifecycle problem | Lesson 1: Sanctioned vs. Shadow AI: The Provisioning Problem |
| Gateway architecture; DLP at the proxy layer; routing and caching | Lesson 2: AI Gateways: The Proxy Layer |
| SSO integration; per-user attribution; the gap between authentication and audit attribution | Lesson 3: Identity, SSO, and Per-User Attribution |
| Token budgets; chargeback vs. showback; cost variance and forecasting | Lesson 4: FinOps for AI: Token Budgets, Chargeback, and Cost Variance |
| ZDR commitments; regional endpoints; data processing addenda; what leaves the perimeter | Lesson 5: Data Governance: Residency, Retention, and What Leaves Your Perimeter |
| Observability pipeline; evals in production; the distinction between logging and evaluation | Lesson 6: Observability and Evals in Production |
The governed stack described in this chapter makes the audit possible. What the audit actually looks like — who conducts it, what it examines, what a finding means for the deployment — belongs to the next chapter.

