Your AI gateway logs every request. Every token consumed, every model called, every prompt routed. And none of it answers the only question an auditor actually asks: who did this?
A gateway without identity integration is a surveillance camera pointed at a parking lot full of identical cars. You can count the traffic. Good luck writing the ticket. Per-user attribution is the layer that connects a gateway's mechanical logging to a human name, the layer that turns an audit trail into something with actual accountability behind it. In a federal environment where an IG can subpoena your AI usage records, that distinction is the entire ballgame.
What per-user attribution actually is
Per-user attribution means every request through your AI gateway resolves to an authenticated human identity — the person, rather than an application or an API key. The difference: a log entry that says "API key prod-key-7 consumed 14,000 tokens on Claude 3.5 Sonnet at 2:47 PM" versus one that says "jmorales@agency.gov consumed 14,000 tokens on Claude 3.5 Sonnet at 2:47 PM via the procurement analysis workspace."
The first entry tells you what happened. The second tells you who to call.
Mechanically, per-user attribution requires three things working together: an identity provider authenticating the user (your SSO layer), a gateway that receives and logs that identity with every request, and a platform RBAC model that constrains what the authenticated user can do. All three are familiar on their own. The integration pattern is where the work lives.
- Per-user attribution: tying every AI gateway request to an authenticated human identity rather than an API key or application credential. It turns volume logs into accountability logs.
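As a rough illustration of the second of those three pieces, the sketch below shows a gateway building a log record that carries a human identity. The schema, the function name `build_log_entry`, and the workspace and model strings are hypothetical, and the sketch assumes the gateway has already verified the user's ID token (signature, issuer, audience, expiry) before reading its claims.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json


@dataclass(frozen=True)
class AttributedLogEntry:
    """One gateway log record tied to a human identity (hypothetical schema)."""
    user: str        # authenticated identity from the IdP, e.g. the email claim
    workspace: str   # RBAC scope the request ran under
    model: str
    tokens: int
    timestamp: str   # ISO 8601, UTC


def build_log_entry(claims: dict, workspace: str, model: str, tokens: int,
                    now: datetime) -> AttributedLogEntry:
    # Assumption: `claims` comes from an ID token that was ALREADY verified.
    # Token verification itself is out of scope for this sketch.
    user = claims.get("email") or claims.get("sub")
    if not user:
        # Refuse to log an anonymous request rather than fall back to a key name.
        raise ValueError("request has no authenticated identity")
    return AttributedLogEntry(user, workspace, model, tokens,
                              now.astimezone(timezone.utc).isoformat())


if __name__ == "__main__":
    entry = build_log_entry({"email": "jmorales@agency.gov"},
                            "procurement-analysis", "claude-3-5-sonnet", 14000,
                            datetime.now(timezone.utc))
    print(json.dumps(asdict(entry)))
```

The design choice worth noticing is the refusal path: a request without an identity is rejected instead of being logged under the key that carried it.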
The key model progression
Most organizations start AI adoption with a shared API key. One key, distributed across teams, maybe dropped into a .env file or a Slack DM. When the monthly bill spikes, you can't tell which team caused it. When the key leaks, you revoke it and break every application that depended on it. When an auditor asks who accessed the model, you shrug.
Per-application keys are the natural next step. One key per project or team, each with its own quota. You can attribute cost to the procurement app versus the HR chatbot. You can rotate one key without breaking the other. But you still can't see people. If three analysts share the procurement app's key, the log shows the app, not the analyst. You've moved from "someone in the building" to "someone on the third floor." The auditor remains unimpressed.
Per-user attribution completes the progression. Gateways that implement this issue virtual API keys mapped to individual users or service principals, and the gateway resolves every request to the underlying identity before logging it. Cost attribution, access control, and audit trails all operate at the human level. You can set spending limits per person, revoke access by deprovisioning a user in your IdP rather than hunting down key copies, and produce an audit trail that names names.
| Key Model | Identifies | Cost Attribution | Audit Answer |
|---|---|---|---|
| Shared API key | "Someone in the org" | None | Shrug |
| Per-application key | The app or team | By project | "Someone on the third floor" |
| Per-user key | The human | By person, team, cost center | jmorales@agency.gov at 2:47 PM |
- Key model progression: shared keys → per-application keys → per-user keys. Each step narrows the accountability gap. Only the last one answers "who did this?"
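The last step of that progression can be sketched as a lookup from a virtual key to the principal behind it. Everything here is illustrative: the `Principal` type, the key names, and the in-memory table are assumptions, and a real gateway would presumably back this state with its IdP or SCIM feed rather than a dictionary.

```python
from dataclasses import dataclass


@dataclass
class Principal:
    identity: str        # human user or service principal name
    kind: str            # "user" or "service"
    active: bool = True  # flipped to False when the IdP deprovisions the principal


def resolve_identity(virtual_key: str, key_table: dict) -> str:
    """Map a virtual key to the identity that should appear in the log."""
    principal = key_table.get(virtual_key)
    if principal is None:
        raise PermissionError("unknown virtual key")
    if not principal.active:
        # Access is revoked by deprovisioning the principal, not by hunting key copies.
        raise PermissionError(f"{principal.identity} is deprovisioned")
    return principal.identity


# Hypothetical key table for illustration only.
key_table = {
    "vk-001": Principal("jmorales@agency.gov", "user"),
    "vk-002": Principal("svc-procurement-batch", "service"),
}
```

Setting `key_table["vk-001"].active = False` makes every later call with that key fail, which is the "revoke by deprovisioning" behavior described above.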
SSO integration is familiar plumbing with unfamiliar gaps
Your IDAM fluency pays off directly in this layer. The enterprise AI platforms are integrating SSO using the exact protocols you already manage.
Claude Enterprise supports SAML 2.0 and OIDC with Okta, Entra ID, Google Workspace, and Auth0. Anthropic acts as the SP. Your IdP handles authentication and MFA. Enable "Require SSO for Claude" and you've inherited your existing MFA policy without configuring anything new on the AI side. Domain capture routes any login attempt from a company email address to the enterprise workspace, which prevents the shadow-IT problem of employees spinning up personal Claude accounts with corporate credentials.
GitHub Copilot Enterprise supports SAML SSO and SCIM provisioning through Enterprise Managed Users. Portkey supports SSO via OIDC on enterprise plans and SCIM provisioning through Okta SAML apps. (Note the protocol split: Portkey uses OIDC for authentication but requires a SAML app in Okta for SCIM, because Okta doesn't support SCIM provisioning with OIDC apps. If you're configuring both, you need both.)
So far, so familiar. Pay closer attention to the gaps.
SCIM is gated behind enterprise tiers. Claude's SCIM support requires the Enterprise plan, which at current published pricing means a minimum 70-user commitment, 12-month contract, roughly $50K annually. Teams on the $25–30/user/month Team plan get SSO but no SCIM. You can authenticate users but can't automatically provision or deprovision them. For a federal IT team managing hundreds of accounts, manual deprovisioning is a compliance problem that scales linearly with headcount.
Deprovisioning has timing holes. Even when SCIM is available, Entra ID users face a roughly 40-minute propagation delay for SCIM changes to Claude, per Stitchflow, a SaaS management vendor that tested the integration. (This hasn't been confirmed in Microsoft's or Anthropic's primary documentation, but the operational implication is worth flagging: a terminated employee could retain Claude access for nearly an hour after offboarding.) GitHub's EMU model suspends the entire managed user account on SCIM deprovisioning, but locally cached Copilot tokens may persist until expiry.
Audit logs don't all log the same things. Claude Enterprise provides per-user audit logs with exportable CSV breakdowns of individual request counts, token consumption, and model usage. That's real per-user attribution. GitHub Copilot's audit log records seat management events (who has access, when policies changed) but does not include client session data such as prompts sent to Copilot locally. You know who had a Copilot seat. You don't know what they asked it to generate. If the IG's question is "what did the developer ask the AI assistant to produce, and did it touch controlled unclassified information," GitHub's native audit log can't answer it. Closing that gap requires a custom gateway layer or a logging hook that captures prompt-level data before it reaches the model.
The SCIM provisioning pattern in AI platforms is mechanically identical to what Okta manages for SaaS apps today: user/group CRUD via a RESTful API against a defined schema. The constraint is tier gating. SCIM availability depends on the AI vendor's enterprise plan, not on your IdP's capability. Your Okta instance is ready. The receiving end may not be.
- SSO integration: SAML and OIDC are the actual protocols, and they work. The gaps are in SCIM tier gating, deprovisioning timing, and uneven audit log depth across platforms.
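For readers who want to see what SCIM deprovisioning looks like on the wire, here is a minimal sketch of the deactivation request defined by the SCIM 2.0 protocol (RFC 7644). The base URL and user ID are placeholders, the helper name is hypothetical, and the sketch only builds the request without sending it. Whether a given IdP sends this PATCH or a DELETE, and how a given AI vendor handles it, is not something this example asserts.

```python
import json

# Message schema URN for PATCH requests, as defined in RFC 7644.
SCIM_PATCH_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp"


def build_deactivate_request(base_url: str, user_id: str) -> dict:
    """Describe a SCIM PATCH that sets a user's `active` attribute to false."""
    body = {
        "schemas": [SCIM_PATCH_SCHEMA],
        "Operations": [{"op": "replace", "path": "active", "value": False}],
    }
    return {
        "method": "PATCH",
        "url": f"{base_url.rstrip('/')}/Users/{user_id}",
        "headers": {"Content-Type": "application/scim+json"},
        "body": json.dumps(body),
    }


if __name__ == "__main__":
    # Placeholder endpoint and ID; substitute the vendor's real SCIM base URL.
    req = build_deactivate_request("https://scim.example.invalid/v2/", "2819c223")
    print(req["method"], req["url"])
    print(req["body"])
```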
RBAC in AI platforms mirrors cloud IAM, up to a point
The workspace and role models emerging in AI tooling will look immediately recognizable to anyone who's managed AWS IAM policies or Azure RBAC.
Portkey implements an organization → workspace hierarchy where each workspace is a fully isolated unit with its own API keys, logs, usage limits, and prompt configurations. IdP groups map to workspace roles via SCIM. Claude Enterprise maps IdP groups to three roles: Primary Owner (full admin including billing), Admin (users, policies, audit logs), and Member (usage within admin-defined policies). OpenAI's API platform lets workspace Owners create custom roles controlling end-user access to tools, with permissions resolved as the maximum across inherited roles.
The structural parallel to cloud IAM is direct. Organization-level governance, project-level isolation, role-based access inherited from group membership. If you've explained Azure RBAC's scope hierarchy to a buyer, you can explain Portkey's workspace model in the same conversation.
The structural parallel is real. The maturity gap is enormous. Cloud IAM platforms offer conditional access policies, just-in-time privilege elevation, and attribute-based access control that evaluates context at request time. AI platform RBAC has none of that yet. The roles are static. There's no equivalent of Azure PIM where a developer can request elevated model access for a two-hour window and have it automatically expire. Workspace isolation controls who can reach the AI platform, and that's where its authority ends. Claude Enterprise lets you configure system prompts per workspace to enforce use-case constraints, but system prompts are instructions to the model, not policy enforcement. The model can be persuaded to ignore them. That's a fundamentally different security posture than an IAM policy that denies a request at the infrastructure layer.
Cost attribution follows the cloud pattern more closely. Gateways that support per-user identity can slice spend by team, environment, cost center, user, or service principal. This mirrors AWS Cost Explorer's tagging model. For a CIO who already manages cloud cost allocation, AI cost attribution through the same identity layer is a natural extension of what they already do.
Okta's group-to-application assignment model maps directly to AI platform workspace access. Assign an IdP group to a Portkey workspace or a Claude Enterprise role, and provisioning follows the same lifecycle you already govern. The mechanism is familiar. The governance model inside the workspace, once access is granted, is still immature — AI platform RBAC lacks the conditional and attribute-based controls your buyers expect from mature cloud IAM.
- RBAC patterns: AI platforms are replicating cloud IAM's org → project → role hierarchy. Group-based provisioning from your IdP works. Cost attribution follows cloud tagging patterns. But the role models are static and coarse compared to cloud IAM, with no conditional access, no just-in-time elevation, and no policy enforcement below the workspace boundary.
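The cost attribution point is easy to make concrete. The sketch below assumes each gateway log record already carries identity-derived tags; the field names (`user`, `team`, `cost_usd`) and the sample figures are made up for illustration and do not reflect any vendor's schema.

```python
from collections import defaultdict


def spend_by(records: list, dimension: str) -> dict:
    """Total cost per value of one tag, e.g. per user, team, or cost center."""
    totals = defaultdict(float)
    for record in records:
        totals[record[dimension]] += record["cost_usd"]
    return dict(totals)


# Hypothetical log records; real ones would come from the gateway's export.
records = [
    {"user": "jmorales@agency.gov", "team": "procurement", "cost_usd": 1.5},
    {"user": "jmorales@agency.gov", "team": "procurement", "cost_usd": 0.5},
    {"user": "akim@agency.gov", "team": "hr", "cost_usd": 0.25},
]

if __name__ == "__main__":
    print(spend_by(records, "user"))
    print(spend_by(records, "team"))
```

The same records answer both the per-person and the per-team question, which only works because identity was attached at request time.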
Where your OAuth intuition stops helping
Everything above should feel like home. SSO, SCIM, RBAC, group-based provisioning, cost attribution by identity. Your IDAM mental model maps cleanly onto AI platform governance.
It stops mapping cleanly right about here.
You know how OAuth scopes work. An issuer sets them, a resource server checks them. "Read." "Write." "Admin." A token carries its scopes, the resource server enforces them, and the system is well-understood.
Agentic AI changes the shape of the problem. A scope tells you whether the caller has permission. Whether a specific autonomous action is appropriate given the current context is a different question entirely, and scopes weren't designed to answer it. The reasons are structural.
First, granularity. OAuth scopes work well for coarse, role-level permissions, but fine-grained resource-level authorization ("this agent can read document A and document C but not document B") pushes against practical limits of what a token can carry. You can't encode per-resource permissions, time-based rules, or contextual constraints in a static scope string without the mechanism becoming unwieldy. This reflects a design boundary in the token model itself.
Second, agents create chains of requests. Each individual call in the chain might be authorized by its scope. But the combined outcome across a sequence of tool invocations can produce a result that no single token check would have approved. OAuth validates each request on its own terms. Agents execute plans that span multiple requests. The granularity mismatch is fundamental.
Third, the trust model. OAuth and SAML assume that once an entity is authenticated, it remains trustworthy for the session duration. AI agents introduce adversarial risks that break this assumption: prompt injection can modify an agent's behavior mid-session, context drift can shift its intent, and the scope granted at session start says nothing about whether the agent's behavior at minute fifteen still reflects what the user authorized.
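The second point, about chains, can be shown with a toy example. The tool names, scope strings, and the single sequence rule below are all hypothetical, and the rule is deliberately simplistic: it is a sketch of the gap between per-call checks and plan-level outcomes, not a proposal for how action governance should work.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class ToolCall:
    tool: str
    required_scope: str
    reads_sensitive: bool = False
    sends_external: bool = False


def scope_allows(call: ToolCall, granted: Set[str]) -> bool:
    """Per-request check: does the token hold the scope this call needs?"""
    return call.required_scope in granted


def chain_violations(calls: List[ToolCall]) -> List[int]:
    """Sequence check: flag an external send that follows an earlier sensitive read."""
    flagged = []
    seen_sensitive = False
    for index, call in enumerate(calls):
        if call.sends_external and seen_sensitive:
            flagged.append(index)
        if call.reads_sensitive:
            seen_sensitive = True
    return flagged


# Hypothetical three-step agent plan.
plan = [
    ToolCall("read_contract", "docs:read", reads_sensitive=True),
    ToolCall("summarize", "model:invoke"),
    ToolCall("send_email", "mail:send", sends_external=True),
]
granted = {"docs:read", "model:invoke", "mail:send"}
```

With these inputs every call passes `scope_allows`, yet `chain_violations(plan)` flags the final step, because the problem only exists at the level of the sequence.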
The industry is working on pieces of this. RFC 8693 (Token Exchange) provides a partial answer for delegation chains, letting systems exchange tokens with attenuated scopes at each hop. The Continuous Access Evaluation Profile (CAEP), from the OpenID Foundation's Shared Signals work, pushes toward real-time revocation when risk conditions change. Databricks' Unity AI Gateway (a governance layer for model access within Databricks' data platform) supports on-behalf-of user execution, where an agent's tool calls execute with the requesting user's exact permissions rather than a shared service account's. But no shipped standard fully solves the action governance layer: evaluating individual tool calls against policy, independent of the identity layer. IETF draft work on agent authentication exists but isn't standardized. This is genuinely unsettled, and anyone who tells you otherwise is selling something that doesn't exist yet.
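To make the token exchange piece less abstract, the sketch below builds the form body of an RFC 8693 request that asks for a narrower token at one hop of a delegation chain. The grant type and token type URNs are the ones the RFC defines; the helper name, scope strings, and token value are placeholders. The local subset check is only an illustration of attenuation, since in practice the authorization server decides what scopes to grant.

```python
from urllib.parse import urlencode

# URNs defined by RFC 8693.
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def build_exchange_body(subject_token: str, parent_scopes: set,
                        requested_scopes: set) -> str:
    """Form-encode a token exchange request for a subset of the parent's scopes."""
    if not requested_scopes <= parent_scopes:
        # Attenuation only: a hop may narrow authority, never widen it.
        raise ValueError("requested scopes must be a subset of the parent token's scopes")
    return urlencode({
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": subject_token,
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "scope": " ".join(sorted(requested_scopes)),
    })


if __name__ == "__main__":
    # Placeholder token and scopes for illustration.
    print(build_exchange_body("parent-token-value",
                              {"docs:read", "mail:send"}, {"docs:read"}))
```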
Okta's authorization servers issue tokens with scopes that resource servers enforce. This mechanism works for AI platform access control the same way it works for any SaaS app. Where it breaks: scopes are static, coarse, and fixed at issuance. Agentic AI needs dynamic, fine-grained, continuously evaluated authorization. The full action governance layer for autonomous agents remains an industry-wide unsolved problem.
- The analogy break: OAuth scopes govern access. Agentic AI requires action governance: evaluating whether a specific tool call is appropriate given current context, beyond whether the caller holds the right scope. Scopes are the starting point. The rest of the problem is still open.
When you'll need this
A federal CISO is evaluating an AI coding assistant rollout for 500 developers. They've already approved the gateway. They're logging requests. They feel good about observability.
Then someone asks: "When the IG asks which developer used the AI assistant to generate code that touched a controlled unclassified system, can your logs answer that question by name?" If the answer involves tracing an API key back to a team Slack channel and asking who was working that day, they don't have per-user attribution. They have a logging system that will embarrass them under examination.
The follow-up is where your IDAM expertise becomes genuine insight: "You already have the identity infrastructure to solve this. Your IdP authenticates every user. SCIM can provision and deprovision them. RBAC can constrain what they access. The gap is connecting that identity layer to the AI gateway so every request carries a name, not just a key."
And if they're thinking about agentic workflows — agents that call tools, chain actions, operate with some autonomy — you can be honest about where the map runs out. Scopes get the agent through the door. Governing what it does once inside requires something the industry is still building. Nobody has a clean answer yet.
The buyer will respect you more for saying so.
- Practical application: per-user attribution connects existing identity infrastructure to AI gateway logs, turning observability into accountability. For agentic contexts, be honest that scope-based authorization handles the access question but leaves the action governance question open, and that this is an unsolved industry problem.
Things to follow up on...
- Anthropic's Workload Identity Federation: Claude now supports authenticating workloads via short-lived OIDC tokens from your existing IdP instead of static API keys, using the RFC 7523 jwt-bearer grant to attribute API calls to a federated workload identity rather than just "the API key."
- GitHub's SAML/SCIM audit enrichment: GitHub Enterprise Cloud now includes SAML and SCIM identity data in audit log events, surfacing external_identity_nameid and external_identity_username fields so logs from multiple systems can be linked using a common corporate identity.
- Databricks Unity AI Gateway: Databricks merged its AI Gateway into Unity Catalog's governance model, enabling on-behalf-of user execution where an agent's tool calls run with the requesting user's exact permissions rather than a shared service account's.
- IETF agent authentication drafts: The emerging draft-klrc-aiagent-auth-01 proposes concepts like maximum delegation depth to cap how many hops authority can propagate in agent chains, though nothing here is standardized yet.

