A Reference Architecture for Enterprise AI

By Carey Whitten, May 5, 2026

The Reference Architecture

Each layer owns exactly one thing.

[Employee or Application]
         ↓
      [SSO / IdP]
    Identity assertion, per-user attribution
         ↓
    [AI Gateway]
    Authentication, routing, DLP, caching, logging
         ↓
  [Provider(s) with Negotiated Data Terms]
    Inference, ZDR commitment, regional endpoints
         ↓
  [Observability and Eval Pipeline]
    Prompt/completion logging, latency, quality, safety
         ↓
  [FinOps Dashboard with Chargeback]
    Token budget, cost attribution, variance reporting

Layer	Owns	Does Not Own
SSO / IdP	Who the user is	What they're allowed to do with AI
AI Gateway	Policy enforcement at the request level	Identity assertion
Provider	Inference and data handling commitments	Your audit log
Observability pipeline	What happened, in sequence	Whether it should have happened
FinOps dashboard	Cost attribution and chargeback	Usage policy

If you remember nothing else about the architecture: The gateway is where policy becomes enforcement. Everything above it is identity. Everything below it is evidence.
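
To make the division of labor concrete, here is a minimal Python sketch of the gateway as the single enforcement point. It assumes identity has already been asserted upstream and stubs out the provider call. All names are hypothetical (`Gateway`, `Request`, `approved_models_only`, the model name `model-a`). It illustrates where each responsibility sits and is not a production gateway.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Request:
    user_id: str      # asserted upstream by the SSO / IdP layer
    cost_center: str  # carried through for chargeback
    model: str
    prompt: str

# A policy returns None to allow the request, or a reason string to deny it.
Policy = Callable[[Request], Optional[str]]

@dataclass
class Gateway:
    policies: List[Policy]
    provider: Callable[[str, str], str]  # stand-in for the real provider call
    audit_log: List[dict] = field(default_factory=list)

    def handle(self, req: Request) -> Optional[str]:
        for policy in self.policies:
            reason = policy(req)
            if reason is not None:
                self._log(req, "denied", reason)
                return None
        completion = self.provider(req.model, req.prompt)
        self._log(req, "allowed", None)
        return completion

    def _log(self, req: Request, decision: str, reason: Optional[str]) -> None:
        # Evidence for the observability and FinOps layers below.
        self.audit_log.append({"user": req.user_id, "cost_center": req.cost_center,
                               "model": req.model, "decision": decision, "reason": reason})

def approved_models_only(req: Request) -> Optional[str]:
    return None if req.model in {"model-a"} else "model not approved"

gw = Gateway(policies=[approved_models_only], provider=lambda m, p: f"[{m}] ok")
r1 = gw.handle(Request("alice", "CC-100", "model-a", "Summarize this memo"))
r2 = gw.handle(Request("bob", "CC-200", "model-b", "Summarize this memo"))
print(r1, r2)  # [model-a] ok None
```

Identity arrives already resolved in `Request`, policy is evaluated only inside `handle`, and every decision, allowed or denied, lands in `audit_log` for the layers below to consume.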

Layer-by-Layer Entries

Identity and SSO

Sanctioned AI — AI tooling provisioned through IT, with identity lifecycle management attached. When it comes up: When a buyer asks how they'll know who's using what, and whether access gets revoked when someone leaves. Don't confuse with: Approved AI — a model or vendor that legal has reviewed. Sanctioned means provisioned, not just permitted.

Shadow AI — AI usage that bypasses provisioned channels: personal API keys, consumer-tier accounts, browser extensions the IT team doesn't know about. When it comes up: In discovery, when a CISO asks what the organization's AI exposure actually is. The honest answer usually involves a number that surprises someone. Don't confuse with: Unsanctioned use of sanctioned tools — an employee using the enterprise ChatGPT instance for something outside policy. Shadow AI is the tool itself being off the books.
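
Discovery might start with a sketch like this one. It assumes egress proxy logs that record a source and destination hostname for each entry. The field names, the `ai-gateway` source label, and the host list are assumptions for illustration. The list is far from exhaustive, and it only catches direct API calls. Consumer web apps and browser extensions would need their own domains added.

```python
# Illustrative, not exhaustive: public API hostnames of a few AI providers.
AI_API_HOSTS = {"api.openai.com", "api.anthropic.com", "generativelanguage.googleapis.com"}

# Hypothetical: how the sanctioned gateway's traffic is labeled in the proxy log.
GATEWAY_SOURCE = "ai-gateway"

def find_shadow_ai(egress_log):
    """Return sorted (source, host) pairs that reached an AI provider without the gateway."""
    hits = set()
    for entry in egress_log:
        host = entry["dest_host"].lower()
        if host in AI_API_HOSTS and entry["source"] != GATEWAY_SOURCE:
            hits.add((entry["source"], host))
    return sorted(hits)

log = [
    {"source": "ai-gateway", "dest_host": "api.openai.com"},
    {"source": "laptop-jdoe", "dest_host": "api.anthropic.com"},
    {"source": "laptop-jdoe", "dest_host": "api.anthropic.com"},
    {"source": "build-server", "dest_host": "example.com"},
]
print(find_shadow_ai(log))  # [('laptop-jdoe', 'api.anthropic.com')]
```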

Per-user attribution — The ability to tie a specific AI request to a specific authenticated identity, not just a team or application. When it comes up: In any conversation about audit trails, cost chargeback, or incident investigation. "We log all requests" is not the same as "we know who made them." Don't confuse with: Authentication. A user can be authenticated to an AI tool without their identity being carried through to the request log. The gap between those two states is where attribution lives.
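
The gap between authentication and attribution fits in a few lines of code. This sketch assumes the gateway receives an already-validated OIDC ID token payload and copies its standard `sub` claim into each log record. `LogRecord`, `log_request`, and `attribution_rate` are hypothetical names, and token validation is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LogRecord:
    app: str
    model: str
    tokens: int
    user_sub: Optional[str]  # OIDC "sub" claim; None means the request is not attributable

def log_request(app: str, model: str, tokens: int, claims: Optional[dict] = None) -> LogRecord:
    # claims: an already-validated ID token payload (validation not shown).
    # A caller can be authenticated to the app while these claims never reach
    # the gateway; the record then names the app but not the person.
    sub = claims.get("sub") if claims else None
    return LogRecord(app, model, tokens, sub)

def attribution_rate(records: list) -> float:
    if not records:
        return 0.0
    return sum(1 for r in records if r.user_sub is not None) / len(records)

records = [
    log_request("contracts-bot", "model-a", 1200, claims={"sub": "00u1abc", "email": "a@example.com"}),
    log_request("contracts-bot", "model-a", 800),  # shared service key: team known, user not
]
print(attribution_rate(records))  # 0.5
```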

The Gateway Layer

AI Gateway — A proxy layer that sits between the user or application and the AI provider, enforcing authentication, routing requests, applying DLP policies, caching responses, and writing to the audit log. When it comes up: When a buyer asks how they enforce policy across multiple AI providers without building separate integrations for each one. The gateway is the single enforcement point. Don't confuse with: An API gateway in the traditional sense. The function is similar — hence the mapping below — but AI gateways handle prompt/completion pairs, and the DLP and eval hooks are purpose-built for unstructured content.

DLP at the gateway — Data loss prevention applied to prompts before they leave the perimeter, scanning for PII, credentials, regulated data, or anything the organization has decided shouldn't reach an external model. When it comes up: In regulated industries, in federal procurement conversations, and whenever a CISO asks "what stops an employee from pasting a contract into GPT-4." Don't confuse with: DLP on completions. Scanning outbound prompts and scanning inbound completions are separate controls. Both matter; most early deployments only implement one.
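
Here is a minimal prompt-side DLP check built on regular expressions. The three patterns are illustrative assumptions: AWS access key IDs, US Social Security numbers, and email addresses. `scan_prompt` is a hypothetical name. Real DLP engines add validation, context rules, and classifiers on top of patterns like these. Running the same function over completions would be the separate inbound control.

```python
import re

# Illustrative patterns only; production DLP adds validation and context rules.
DLP_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def scan_prompt(prompt):
    """Return (finding names, redacted prompt) before the prompt leaves the perimeter."""
    findings = []
    redacted = prompt
    for name, pattern in DLP_PATTERNS.items():
        if pattern.search(redacted):
            findings.append(name)
            redacted = pattern.sub(f"[REDACTED:{name}]", redacted)
    return findings, redacted

findings, safe = scan_prompt("Employee 123-45-6789 uses key AKIAABCDEFGHIJKLMNOP")
print(findings)  # ['aws_access_key_id', 'us_ssn']
print(safe)      # Employee [REDACTED:us_ssn] uses key [REDACTED:aws_access_key_id]
```

Whether a finding should redact or block the request outright is a policy decision. The sketch redacts, but a gateway could equally refuse to forward any prompt with a non-empty `findings` list.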

Provider Data Terms

Zero data retention (ZDR) — A contractual commitment from the AI provider that prompts and completions will not be stored, logged, or used for model training after the request completes. When it comes up: In enterprise agreement negotiations, in federal procurement, and in any conversation where legal is in the room. ZDR is what legal wants confirmed before the agency signs. Don't confuse with: Encryption in transit. ZDR is a contractual data handling commitment. The provider can honor it or not — your audit mechanism is the contract and the provider's compliance attestation, not a cryptographic proof.

Regional endpoint — A provider-side configuration that routes inference requests to compute infrastructure in a specified geography, supporting data residency requirements. When it comes up: In EU deployments, in federal civilian agency conversations, and anywhere a DPA specifies where data may be processed. Don't confuse with: ZDR. Regional endpoints govern where processing happens; ZDR governs whether the data persists afterward. A request can satisfy one requirement and fail the other.
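
Residency and retention are independent requirements, so a routing check should evaluate them separately. In this sketch, the region names, `ProviderRoute`, `DataPolicy`, and `check_route` are hypothetical. `zdr_contracted` is a fact recorded from the signed agreement. The code cannot verify it, which is the attestation point in the ZDR entry.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

@dataclass(frozen=True)
class ProviderRoute:
    provider: str
    region: str           # where inference runs
    zdr_contracted: bool  # recorded from the signed agreement; not verifiable in code

@dataclass(frozen=True)
class DataPolicy:
    allowed_regions: FrozenSet[str]
    require_zdr: bool

def check_route(route: ProviderRoute, policy: DataPolicy) -> List[str]:
    """Return failed requirements; residency and retention are checked independently."""
    failures = []
    if route.region not in policy.allowed_regions:
        failures.append("residency")
    if policy.require_zdr and not route.zdr_contracted:
        failures.append("retention")
    return failures

eu_policy = DataPolicy(allowed_regions=frozenset({"eu-west", "eu-central"}), require_zdr=True)
a = check_route(ProviderRoute("provider-x", "eu-west", zdr_contracted=False), eu_policy)
b = check_route(ProviderRoute("provider-x", "us-east", zdr_contracted=True), eu_policy)
print(a, b)  # ['retention'] ['residency']
```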

Observability and Evals

Observability — The logging and monitoring layer that captures what happened: which requests were made, by whom, to which model, with what latency, at what cost. When it comes up: In any conversation about audit readiness, incident response, or cost attribution. Observability is the precondition for everything else in this list. Don't confuse with: Evals. Observability tells you what happened. Evals tell you whether it was good.

Evals (evaluation pipeline) — Automated or human-in-the-loop assessment of AI output quality, safety, and policy compliance, running against logged completions in production. When it comes up: When a buyer asks how they'll know if the model starts producing wrong or harmful outputs after deployment. Evals are the answer — a continuous production process running against real traffic, not a benchmark you validate once before launch. Don't confuse with: Testing. Pre-deployment testing validates a model against a fixed benchmark. Evals in production validate behavior against real traffic, which drifts.
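
The observability/evals split shows up clearly in code. The log record captures what happened (user, model, latency, text), and a separate function judges it. The checks below are assumptions for illustration: `eval_completion`, a hypothetical `BLOCKED_TERMS` rule, and a launch `baseline` pass rate. Production eval pipelines typically add model-graded and human review.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class LoggedCompletion:  # what the observability layer captures
    user: str
    model: str
    latency_ms: int
    prompt: str
    completion: str

BLOCKED_TERMS = {"internal use only"}  # hypothetical policy rule

def eval_completion(rec: LoggedCompletion) -> List[str]:
    """Return the names of failed checks; an empty list means the completion passed."""
    text = rec.completion.lower()
    failed = []
    if not text.strip():
        failed.append("empty")
    if any(term in text for term in BLOCKED_TERMS):
        failed.append("blocked_term")
    return failed

def pass_rate(records: List[LoggedCompletion]) -> float:
    if not records:
        return 1.0
    return sum(1 for r in records if not eval_completion(r)) / len(records)

def drifted(records: List[LoggedCompletion], baseline: float, tolerance: float = 0.05) -> bool:
    # Flag when production quality falls meaningfully below the pre-launch baseline.
    return pass_rate(records) < baseline - tolerance

week = [
    LoggedCompletion("alice", "model-a", 840, "Summarize Q3", "Revenue rose 4%."),
    LoggedCompletion("bob", "model-a", 910, "Draft memo", "INTERNAL USE ONLY: draft"),
    LoggedCompletion("carol", "model-a", 650, "Translate this", ""),
    LoggedCompletion("dan", "model-a", 700, "Classify", "Category: invoice"),
]
print(pass_rate(week), drifted(week, baseline=0.95))  # 0.5 True
```

Every record carries a latency, but no check uses it. That field answers an observability question, not a quality one.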

FinOps

Token budget — A pre-allocated spending limit, expressed in LLM tokens, assigned to a team, application, or user for a billing period. When it comes up: When a finance stakeholder asks how AI costs get controlled before they appear on the monthly invoice as a surprise. Don't confuse with: A rate limit. A rate limit is a provider-side control on throughput, typically requests or tokens per minute. A token budget is an organization-side control on cumulative spend. Both can be in effect simultaneously.
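
Here is a sketch of the two controls side by side. It assumes the gateway knows, or can estimate, the token count before forwarding a request. `TokenBudget`, `RateLimiter`, and `admit` are hypothetical names. The limiter is a simple sliding window in the style of a requests-per-minute cap.

```python
import time
from collections import deque
from typing import Optional

class TokenBudget:
    """Organization-side cap on cumulative tokens for one billing period."""
    def __init__(self, limit_tokens: int):
        self.limit = limit_tokens
        self.used = 0

    def can_spend(self, tokens: int) -> bool:
        return self.used + tokens <= self.limit

    def spend(self, tokens: int) -> None:
        self.used += tokens

class RateLimiter:
    """Sliding-window cap on requests per window, similar to a requests-per-minute limit."""
    def __init__(self, max_requests: int, window_s: float = 60.0):
        self.max_requests = max_requests
        self.window_s = window_s
        self.times = deque()

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop request timestamps that have aged out of the window.
        while self.times and now - self.times[0] >= self.window_s:
            self.times.popleft()
        if len(self.times) >= self.max_requests:
            return False
        self.times.append(now)
        return True

budget = TokenBudget(limit_tokens=5_000)
limiter = RateLimiter(max_requests=2)

def admit(tokens: int, now: float) -> str:
    # Both controls apply; either one can reject the request.
    if not budget.can_spend(tokens):
        return "over_budget"
    if not limiter.allow(now):
        return "rate_limited"
    budget.spend(tokens)
    return "ok"

results = [admit(2_000, 0), admit(2_000, 1), admit(500, 2), admit(500, 61), admit(2_000, 62)]
print(results)  # ['ok', 'ok', 'rate_limited', 'ok', 'over_budget']
```

The third request is rejected for throughput even though budget remains. The fifth is rejected for cumulative spend even though the window has room.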

Chargeback — The mechanism by which AI infrastructure costs are allocated back to the consuming team or cost center, based on attributed usage. When it comes up: In any conversation with a CFO or budget owner about how AI costs are governed. Chargeback is how AI spend becomes visible at the team level rather than pooled in IT overhead. Don't confuse with: Showback. Showback reports usage without transferring cost. Chargeback transfers cost. The distinction matters to finance; it often doesn't matter to the team being charged until it does.
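
Showback and chargeback can share one attribution step and differ only in output. The prices, cost-center codes, and the `IT-SHARED` pool below are hypothetical. Real rates come from the provider agreement, and a real chargeback would post to a finance system rather than return a list.

```python
from collections import defaultdict

# Hypothetical per-million-token prices; real rates come from the provider agreement.
PRICE_PER_MTOK = {"model-a": {"input": 3.00, "output": 15.00}}

def cost_usd(rec):
    price = PRICE_PER_MTOK[rec["model"]]
    return (rec["input_tokens"] * price["input"]
            + rec["output_tokens"] * price["output"]) / 1_000_000

def showback(usage):
    """Report attributed spend per cost center; no money moves."""
    totals = defaultdict(float)
    for rec in usage:
        totals[rec["cost_center"]] += cost_usd(rec)
    return {cc: round(v, 2) for cc, v in totals.items()}

def chargeback(usage, source="IT-SHARED"):
    """Same attribution, emitted as transfers out of the shared IT pool."""
    return [{"debit": cc, "credit": source, "amount_usd": amt}
            for cc, amt in sorted(showback(usage).items())]

usage = [
    {"cost_center": "CC-LEGAL", "model": "model-a", "input_tokens": 2_000_000, "output_tokens": 100_000},
    {"cost_center": "CC-SALES", "model": "model-a", "input_tokens": 500_000, "output_tokens": 50_000},
]
print(showback(usage))  # {'CC-LEGAL': 7.5, 'CC-SALES': 2.25}
```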

Vocabulary Mapping Tables

Table 1: Token

AI Term	What It Means in AI	IDAM Equivalent	Key Divergence
Token (LLM)	The unit of text the model processes; the unit of cost you pay for	Bearer token (OAuth)	An LLM token is a billing unit, not an authorization artifact. It carries no identity, has no scope, and cannot be revoked. Calling both "tokens" in the same conversation produces confusion that is entirely avoidable and rarely avoided.

Table 2: Session, Context, Scope

AI Term	What It Means in AI	IDAM Equivalent	Key Divergence
Session (LLM)	A conversation thread; the sequence of turns the model holds in context	Authenticated session	An LLM session has no authentication event, and the model itself keeps no state between calls: the client, or the application in front of the model, resends the conversation on every turn. It ends when the context window fills or the client closes it, not when a timeout fires or a token expires.
Context window	The maximum amount of text (in tokens) the model can hold in a single inference call	Session scope	A context window is a memory constraint. Expanding it costs money; it grants no additional permissions.
Scope (model)	Informally used to describe what a model or agent is allowed to do	OAuth scope	OAuth scope is a formal authorization claim carried in a token and validated by a resource server. "Model scope" is a configuration convention with no equivalent enforcement mechanism unless the gateway implements it.
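
"Model scope" only means something where it is enforced. This sketch shows the gateway check that turns it into a control. It assumes the IdP passes group membership through to the gateway. The `GROUP_SCOPES` map, the tool names, and `is_permitted` are hypothetical. A request must be fully covered by a single group's scope.

```python
from typing import Iterable, Optional

# Hypothetical mapping from IdP group to what the gateway will permit.
GROUP_SCOPES = {
    "legal": {"models": {"model-a"}, "tools": {"search_contracts"}},
    "engineering": {"models": {"model-a", "model-b"}, "tools": {"run_tests"}},
}

def is_permitted(groups: Iterable[str], model: str, tool: Optional[str] = None) -> bool:
    """Allow only if one of the caller's groups covers both the model and the tool."""
    for group in groups:
        scope = GROUP_SCOPES.get(group)
        if scope is None:
            continue
        if model in scope["models"] and (tool is None or tool in scope["tools"]):
            return True
    return False

print(is_permitted(["legal"], "model-a", "search_contracts"))  # True
print(is_permitted(["legal"], "model-b"))                      # False
```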

Table 3: Agent and Identity

AI Term	What It Means in AI	IDAM Equivalent	Key Divergence
Agent	An AI system that takes actions autonomously, often calling tools or APIs on behalf of a user or process	Service account / non-human identity	A service account has a stable identity, a credential lifecycle, and an audit trail. An AI agent may have none of these by default. The agent identity problem (who is this thing, what is it allowed to do, how do you revoke it) remains unsettled. Not covered above, but worth knowing: MCP (Model Context Protocol) working groups are actively addressing agent identity; the spec is in active revision as of this writing.
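
Here is what "none of these by default" looks like when the gap is closed. This sketch gives an agent the three service-account properties named in the table: a stable identity, a credential lifecycle, and an audit trail of decisions. `AgentIdentity` and `AgentRegistry` are hypothetical names and do not reflect the MCP work in progress.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

@dataclass
class AgentIdentity:
    agent_id: str                 # stable identity
    owner: str                    # accountable human sponsor
    allowed_tools: set
    credential_expires: datetime  # credential lifecycle
    revoked: bool = False

class AgentRegistry:
    def __init__(self):
        self._agents = {}
        self.audit = []  # audit trail of every authorization decision

    def register(self, agent: AgentIdentity) -> None:
        self._agents[agent.agent_id] = agent

    def revoke(self, agent_id: str) -> None:
        self._agents[agent_id].revoked = True

    def authorize(self, agent_id: str, tool: str, now: Optional[datetime] = None) -> bool:
        now = now or datetime.now(timezone.utc)
        a = self._agents.get(agent_id)
        ok = (a is not None and not a.revoked
              and now < a.credential_expires and tool in a.allowed_tools)
        self.audit.append((now.isoformat(), agent_id, tool, ok))
        return ok

now = datetime(2026, 5, 5, tzinfo=timezone.utc)
reg = AgentRegistry()
reg.register(AgentIdentity("invoice-agent-01", "jdoe", {"read_invoices"}, now + timedelta(days=30)))
ok_read = reg.authorize("invoice-agent-01", "read_invoices", now)
ok_pay = reg.authorize("invoice-agent-01", "send_payment", now)
reg.revoke("invoice-agent-01")
ok_after = reg.authorize("invoice-agent-01", "read_invoices", now)
print(ok_read, ok_pay, ok_after)  # True False False
```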

Enterprise IT Mapping Table

Architecture Layer	Closest Enterprise IT Equivalent	What Transfers Directly	What Doesn't Transfer
AI Gateway	Reverse proxy / API gateway	Request interception, routing, policy enforcement, logging	DLP on unstructured content; prompt/completion pair handling; eval hooks
SSO / IdP integration	Identity provider federation (SAML, OIDC)	Authentication, SSO, session initiation	Per-request attribution to the AI audit log; token budget assignment
Per-user attribution	SCIM provisioning	User identity carried through to downstream systems	Cost attribution by token consumption; the unit here is spend, not entitlement
ZDR commitment	Data processing addendum (DPA)	Contractual data handling obligations; vendor accountability	Technical enforceability; ZDR is attested, not cryptographically verified
Observability pipeline	SIEM / log aggregation (Splunk, Sentinel)	Centralized log collection, alerting, retention policy	Quality and safety evaluation; SIEM doesn't know if a completion was wrong
FinOps dashboard with chargeback	IT chargeback / showback system	Cost center allocation, budget tracking, variance reporting	Token-level granularity; the unit of cost is novel and requires new instrumentation
Provider with negotiated data terms	Vendor contract + DPA	Data handling obligations, SLA, liability terms	Inference-time controls; the contract governs what happens to data, not what the model does with it

If you remember nothing else about the enterprise IT mapping: The gateway maps cleanly to a reverse proxy until you need to evaluate what's inside the request. At that point the analogy holds the shape but not the tooling.

If You Remember Nothing Else

On architecture: The stack has five layers. Each layer owns exactly one thing. When a governance question lands without a clear owner, a layer is missing.

On vocabulary: "Token" means something in OAuth and something different in an LLM cost report. In the same conversation, they will collide. Name the collision before the buyer does.

On ZDR: It is a contract term. Verification runs through the provider's compliance attestation — there is no log you can pull.

On observability vs. evals: Observability tells you what happened. Evals tell you whether it should have. Both are required. Most deployments have one.

On chargeback: Chargeback is how AI spend becomes a team-level accountability problem rather than an IT overhead line. Finance will eventually ask for it. Better to have the infrastructure before they do.

For More Information, See…

Entry	Source
The full stack overview; the fourteen-teams-on-one-API-key origin story	Opening: The Enterprise AI Stack: From Raw API Key to Governed Platform
Sanctioned vs. shadow AI; the provisioning gap; AI as an identity lifecycle problem	Lesson 1: Sanctioned vs. Shadow AI: The Provisioning Problem
Gateway architecture; DLP at the proxy layer; routing and caching	Lesson 2: AI Gateways: The Proxy Layer
SSO integration; per-user attribution; the gap between authentication and audit attribution	Lesson 3: Identity, SSO, and Per-User Attribution
Token budgets; chargeback vs. showback; cost variance and forecasting	Lesson 4: FinOps for AI: Token Budgets, Chargeback, and Cost Variance
ZDR commitments; regional endpoints; data processing addenda; what leaves the perimeter	Lesson 5: Data Governance: Residency, Retention, and What Leaves Your Perimeter
Observability pipeline; evals in production; the distinction between logging and evaluation	Lesson 6: Observability and Evals in Production

The governed stack described in this chapter makes the audit possible. What the audit actually looks like — who conducts it, what it examines, what a finding means for the deployment — belongs to the next chapter.