Somewhere in your buyer's org, developers are calling OpenAI and Anthropic with API keys stored in environment variables, Slack threads, and config files nobody audits. An AI gateway sits between every internal consumer and every external model provider. It holds the real API keys centrally. It issues virtual keys scoped to teams and applications. It logs every request. When the CISO asks "who sent what to which model last Tuesday," the gateway is what makes the question answerable.
Four products dominate this conversation in enterprise accounts: Cloudflare AI Gateway, LiteLLM, Kong AI Gateway, and Portkey. You'll encounter Cloudflare when the buyer already routes traffic through their network. LiteLLM surfaces in engineering-led orgs that default to open source. Kong appears where the buyer already governs REST APIs through Kong's platform. Portkey shows up in teams building AI-native applications who want observability fast. Knowing which one your buyer is evaluating, and what it actually can and can't do, lets you shape the conversation rather than chase it.
Cloudflare AI Gateway
What it is: A managed SaaS proxy for LLM traffic, running on Cloudflare's global edge network.
What it does: Teams change their API endpoint URL to route through Cloudflare instead of calling providers directly. Cloudflare handles caching, rate limiting, request logging, and unified billing across multiple providers. Provider API keys live in Cloudflare's Secrets Store, not in application code.
Who's behind it: Cloudflare (publicly traded, $40B+ market cap). AI Gateway is part of their broader developer platform alongside Workers, R2, and Vectorize.
What makes it distinct: Zero infrastructure to manage. If the buyer already runs on Cloudflare, this is a URL change, not a deployment. The cost of that simplicity: all prompts and completions transit Cloudflare's infrastructure. For regulated accounts with data residency requirements, that's the first question and possibly the last.
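To make "a URL change, not a deployment" concrete, here is a minimal sketch of what a team actually edits. The URL pattern reflects Cloudflare's documented AI Gateway endpoint format as best understood at the time of writing, so verify it against current docs. `ACCOUNT_ID`, `GATEWAY_ID`, and the `gateway_base_url` helper are hypothetical placeholders for illustration.

```python
# Sketch: routing through a gateway is a base-URL swap, not a code rewrite.
# ACCOUNT_ID and GATEWAY_ID are hypothetical placeholders, not real values.

DIRECT_OPENAI_BASE = "https://api.openai.com/v1"
GATEWAY_HOST = "https://gateway.ai.cloudflare.com/v1"


def gateway_base_url(account_id: str, gateway_id: str, provider: str) -> str:
    """Build the base URL a client would point at instead of the provider's own."""
    return f"{GATEWAY_HOST}/{account_id}/{gateway_id}/{provider}"


before = DIRECT_OPENAI_BASE
after = gateway_base_url("ACCOUNT_ID", "GATEWAY_ID", "openai")

print("before:", before)
print("after: ", after)
```

The application's request shapes stay the same; only the base URL the client is configured with changes. That is also why every prompt and completion then transits the gateway.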
LiteLLM
What it is: An open-source Python proxy server that provides a unified OpenAI-compatible API across 100+ model providers.
What it does: Teams deploy a self-hosted proxy (Docker, Kubernetes, bare metal) that translates every provider's API into a single interface. Virtual keys, spend tracking, rate limiting, and team-level budget enforcement are available. Production deployment requires a self-managed PostgreSQL database and Redis instance.
Who's behind it: BerriAI, a startup. The core proxy is open source under MIT license. SSO, RBAC, and advanced audit logging require a commercial enterprise license.
What makes it distinct: You own the infrastructure. Nothing leaves your network unless you send it to a provider. The cost of that ownership is real: you're running Postgres, Redis, and the proxy itself, and you're responsible for keeping all three healthy. A Hacker News commenter described it as "a locally hosted OpenRouter that doesn't charge you for routing." Accurate for the appeal. Silent on the operational load.
Kong AI Gateway
What it is: A set of AI-specific plugins running on top of Kong Gateway, the enterprise API management platform.
What it does: AI plugins handle prompt routing, token-based rate limiting, per-request OpenTelemetry spans, PII sanitization, and guardrails integration (AWS Guardrails, Azure AI Content Safety, Google Model Armor). AI traffic inherits Kong's existing governance layer: RBAC, workspace isolation, OIDC authentication, signed audit logs.
Who's behind it: Kong Inc. (privately held, Series E). AI Gateway ships as plugins within Kong Enterprise and Konnect, their cloud-managed control plane. There is no standalone AI Gateway product.
What makes it distinct: If the buyer already runs Kong for API governance, AI traffic gets the same policy enforcement their REST APIs already have. The OIDC plugin is the most mature IdP integration in this group, supporting authorization code flow, client credentials, and claims-based authorization. Kong publishes performance benchmarks for AI Gateway, but these were produced by Kong in a proxy-only configuration with no policies applied. Expect different numbers in a production deployment running RBAC and content safety plugins. The flip side of the add-on model: if the buyer doesn't already run Kong, they're adopting an enterprise API gateway to get AI proxy features. That's a different size conversation.
Portkey
What it is: An AI gateway and observability platform focused on application-level routing, logging, and reliability.
What it does: Teams route LLM traffic through Portkey for fallback routing, semantic caching, request logging, and trace analysis. Portkey offers SSO via OIDC and SAML 2.0 (with documented Okta and Azure AD setup guides), RBAC with workspace isolation, and SCIM for user lifecycle management. The gateway was open-sourced in March 2026.
Who's behind it: Portkey AI (startup, venture-backed). The open-source gateway handles routing; enterprise governance features (SSO, advanced RBAC, audit trails) remain in the commercial tiers.
What makes it distinct: Fastest path to observability for a team building an AI application. The tradeoff is scope. Portkey is optimized for individual applications, not organization-wide governance across dozens of teams. Log-based pricing compounds this: the Pro tier caps at 3M logs/month, and exceeding that stops logging, not requests. Observability fails silently at scale.
Every gateway here issues virtual keys to teams while holding the real provider API keys centrally. This is structurally identical to OAuth client credentials: the authorization server (gateway) issues scoped tokens (virtual keys) so individual clients never touch the underlying resource credentials. The analogy holds for issuance and scoping. It breaks on revocation and lifecycle. OAuth tokens expire and refresh through well-defined protocol flows. Virtual keys in most gateways are manually created and manually revoked, with no standard lifecycle protocol between the gateway and the identity provider. That gap is where governance gets hard, and where the IdP conversation starts.
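The virtual key pattern described above can be sketched in a few lines. This is a toy model under stated assumptions, not any vendor's implementation: the `KeyVault` class, the `vk-` prefix, and the `model-a`/`model-b` names are all hypothetical. Note what is missing: no expiry, no refresh, no link to an identity provider. Revocation is a manual call.

```python
import secrets


class KeyVault:
    """Toy gateway key store: real provider keys never leave this object."""

    def __init__(self, provider_keys):
        self._provider_keys = dict(provider_keys)  # provider -> real API key
        self._virtual_keys = {}                    # virtual key -> scope record

    def issue(self, team, allowed_models):
        """Issue a scoped virtual key to a team (hypothetical 'vk-' format)."""
        token = "vk-" + secrets.token_hex(8)
        self._virtual_keys[token] = {
            "team": team,
            "models": frozenset(allowed_models),
            "revoked": False,
        }
        return token

    def revoke(self, token):
        # Manual revocation: nothing here is driven by expiry or by the IdP.
        self._virtual_keys[token]["revoked"] = True

    def resolve(self, token, provider, model):
        """Return the real provider key only if the virtual key permits the call."""
        record = self._virtual_keys.get(token)
        if record is None or record["revoked"]:
            raise PermissionError("unknown or revoked virtual key")
        if model not in record["models"]:
            raise PermissionError(f"{record['team']} is not scoped to {model}")
        return self._provider_keys[provider]


vault = KeyVault({"openai": "sk-real-provider-key"})
team_key = vault.issue("data-science", {"model-a"})
print(vault.resolve(team_key, "openai", "model-a"))  # gateway swaps in the real key
```

If an employee leaves and nobody calls `revoke`, the virtual key keeps working. That is the lifecycle gap in miniature.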
Comparing the Four, Trait by Trait
Gateway capabilities are moving fast. What follows reflects documented, generally available features as of early 2026, sourced from official vendor documentation. Treat it as a snapshot of the category's current shape, not as a set of product reviews that will stay accurate.
A flat four-column feature matrix becomes wallpaper at this scale. Three traits matter most for enterprise evaluation. All four gateways are addressed within each.
Deployment Model
The fundamental split: who runs the infrastructure that sees your prompts?
Cloudflare is SaaS-only. LLM traffic transits their edge network. You can disable logging to reduce data exposure, but that eliminates the observability benefit you bought the gateway for.
LiteLLM is self-hosted-only. BerriAI does not offer a managed option. You deploy the proxy, the database, and the cache. You own the data. You also own the 2 AM pages when Postgres needs a vacuum.
Kong offers the widest deployment flexibility: fully self-hosted (open source or enterprise licensed), hybrid mode (customer data plane, Konnect control plane), or fully managed via Konnect. The hybrid model matters most for regulated buyers. Prompts stay in your infrastructure while the management plane runs in Kong's cloud.
Portkey defaults to SaaS but offers a hybrid enterprise architecture where the data plane (the component that actually processes prompts) runs in your VPC while the control plane stays managed by Portkey. Fully air-gapped deployment with no Portkey-managed component is not clearly documented as a standard offering.
One note for public sector conversations: when a SaaS gateway vendor commits to data residency or zero data retention, that commitment is contractual, not architectural. The contract says they won't store your prompts. The architecture means your prompts still transit their infrastructure. For buyers where that distinction matters, self-hosted or hybrid data-plane models are the only options that resolve it at the technical layer rather than relying on the legal layer alone.
Enterprise Governance Readiness
This is where the products pull apart.
Cloudflare has the thinnest governance layer. Rate limiting operates at the request level, not the token level (you can cap how many calls a team makes, not how many tokens they consume). There is no documented per-gateway RBAC, no workspace isolation model, and no SCIM provisioning. The 10M log cap per gateway is an architectural constraint Cloudflare is actively working to address, but it's the current ceiling. Fine for exploratory workloads. A hard limit for production AI services at agency scale.
LiteLLM documents a governance layer that includes six defined RBAC roles, SSO via OIDC and SAML, team-level budget enforcement, and audit logging. At production scale, all of it requires the commercial enterprise license. The one carve-out is SSO, which is free for up to 5 users as of v1.76.0; that covers a proof of concept and nothing else. Past that limit, the open-source version cannot integrate with corporate Okta. Service Accounts remain explicitly marked beta.
Kong has the most mature governance layer, because it existed before AI Gateway did. RBAC, workspace isolation, signed audit logs, and OIDC-based authentication are GA features of Kong Enterprise. AI traffic inherits these controls. Token-based quota controls (not just request-based) are available for AI traffic specifically. The constraint: RBAC and the OIDC plugin require the enterprise license. Open-source Kong lacks them entirely.
Portkey sits between Cloudflare and Kong. SSO (OIDC and SAML with documented Okta setup), RBAC, workspace isolation, and SCIM are available in the enterprise tier. The governance model works well within a single application's scope. It strains across an organization with dozens of teams running independent AI workloads, each generating logs that count against a shared cap.
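The request-level versus token-level distinction drawn above for Cloudflare and Kong is easier to explain with numbers. The sketch below is illustrative only: the team names, call sizes, and limits are hypothetical, and no vendor's limiter works exactly this way.

```python
from dataclasses import dataclass


@dataclass
class Call:
    team: str
    llm_tokens: int  # prompt + completion tokens consumed by this call


def allowed_by_request_cap(calls, max_requests):
    """Request-level limit: counts calls, ignores how large each one is."""
    return len(calls) <= max_requests


def allowed_by_token_cap(calls, max_tokens):
    """Token-level limit: sums consumption, ignores how many calls there were."""
    return sum(c.llm_tokens for c in calls) <= max_tokens


# Two teams in the same time window (hypothetical numbers).
chatty = [Call("support-bot", 200) for _ in range(50)]       # 50 calls, 10,000 tokens
heavy = [Call("doc-summarizer", 40_000) for _ in range(5)]   # 5 calls, 200,000 tokens

for name, calls in (("chatty", chatty), ("heavy", heavy)):
    print(
        name,
        "| request cap ok:", allowed_by_request_cap(calls, max_requests=20),
        "| token cap ok:", allowed_by_token_cap(calls, max_tokens=50_000),
    )
```

The request cap blocks the cheap, chatty team and waves through the expensive one. Since spend tracks tokens, not calls, a request-only limiter can't act as a cost control.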
In zero trust architecture, a reverse proxy enforces policy on every request between the user and the resource. The AI gateway plays exactly this role for LLM traffic. The concept maps cleanly. What doesn't map is the policy language: zero trust policies evaluate identity, device posture, and network context, while gateway policies evaluate token budgets, content safety, and model routing. The gateway is the PEP, but it needs an external PDP for identity. That's the IdP's job, and it's the seam where these systems meet.
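A rough sketch of that seam, assuming a deliberately simplified model: the gateway (PEP) enforces on every request but hands the identity decision to an external component (PDP). Here `stub_idp` stands in for a real identity provider, and the `GatewayPEP` class, credential strings, group names, and model names are all hypothetical.

```python
from typing import Callable, Optional

# The PDP is modeled as any callable mapping a credential to a principal
# (or None). In a real deployment this role belongs to the IdP.
IdentityPDP = Callable[[str], Optional[dict]]


def stub_idp(credential: str) -> Optional[dict]:
    """Stand-in identity provider with a one-entry directory."""
    directory = {"cred-alice": {"user": "alice", "groups": ["ml-platform"]}}
    return directory.get(credential)


class GatewayPEP:
    """Enforces policy on every request; delegates the identity decision."""

    def __init__(self, identity_pdp: IdentityPDP, model_policy: dict):
        self.identity_pdp = identity_pdp
        self.model_policy = model_policy  # group -> set of allowed models

    def enforce(self, credential: str, model: str) -> str:
        principal = self.identity_pdp(credential)  # identity: external PDP
        if principal is None:
            return "deny: unauthenticated"
        allowed = set()
        for group in principal["groups"]:
            allowed |= self.model_policy.get(group, set())
        if model not in allowed:  # gateway-native policy: model routing
            return f"deny: {principal['user']} may not use {model}"
        return f"allow: {principal['user']} -> {model}"


pep = GatewayPEP(stub_idp, {"ml-platform": {"model-a"}})
print(pep.enforce("cred-alice", "model-a"))
print(pep.enforce("cred-unknown", "model-a"))
```

The gateway never decides who Alice is. It only decides what Alice's groups may do with models, which is why the quality of the identity input sets the ceiling on the quality of the enforcement.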
Integration Surface
How does the gateway connect to the identity and observability infrastructure the buyer already runs?
Cloudflare relies on account-level Cloudflare Access for user authentication. The AI Gateway itself does not expose SSO, OIDC, or SAML integration points. Observability is via Logpush to external S3 or SIEM at usage-based pricing ($0.05/million records after the included tier, as of this writing). No documented OpenTelemetry integration for AI Gateway traffic.
LiteLLM integrates SSO for its Admin UI, with Okta and Google named as supported providers. Observability runs through callback integrations to tools like Langfuse, MLflow, and Helicone rather than native OTel export. No documented SCIM support.
Kong has the deepest integration surface. The OIDC plugin supports all standard flows including claims-based authorization. OpenTelemetry is native, with per-request AI span attributes and aggregated metrics. Audit logs export via SIEM webhook. If the buyer's SOC already consumes Kong telemetry, AI traffic shows up in the same pipeline with no additional plumbing.
Portkey documents OIDC and SAML SSO with step-by-step Okta setup, plus SCIM for automated user provisioning and deprovisioning. Observability is through Portkey's own logging and trace analysis platform, with Langfuse integration available. No documented native OTel export.
When your buyer says "token," they might mean an OAuth access token (the credential that proves identity) or an LLM token (the unit of text consumption that drives cost). These are completely unrelated concepts sharing a word. The gateway tracks LLM tokens for cost and rate limiting. The IdP issues access tokens for authentication. In a well-architected stack, the access token gets the request through the gateway's front door; the LLM token count determines whether the request stays under budget once inside. If you hear "token management" in a buyer conversation, clarify which token before you respond.
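If it helps to see the two meanings side by side, the sketch below models both in one request path. It is a simplification under stated assumptions: `AccessToken`, `TokenBudget`, and `handle` are hypothetical names, and the 401/429 status strings are illustrative choices, not any gateway's documented behavior.

```python
import time
from dataclasses import dataclass


@dataclass
class AccessToken:
    """OAuth-style credential: answers 'who is this?' (issued by the IdP)."""
    subject: str
    team: str
    expires_at: float  # Unix timestamp

    def is_valid(self, now: float) -> bool:
        return now < self.expires_at


@dataclass
class TokenBudget:
    """LLM token allowance: answers 'how much text may this team consume?'"""
    limit: int
    used: int = 0

    def try_spend(self, llm_tokens: int) -> bool:
        if self.used + llm_tokens > self.limit:
            return False
        self.used += llm_tokens
        return True


def handle(access_token, budgets, llm_tokens, now=None):
    """Check identity at the front door, then LLM token cost once inside."""
    now = time.time() if now is None else now
    if not access_token.is_valid(now):
        return "401 invalid access token"
    if not budgets[access_token.team].try_spend(llm_tokens):
        return "429 LLM token budget exceeded"
    return "200 forwarded to provider"


budgets = {"research": TokenBudget(limit=1_000)}
alice = AccessToken("alice", "research", expires_at=200.0)
print(handle(alice, budgets, llm_tokens=600, now=100.0))
print(handle(alice, budgets, llm_tokens=600, now=100.0))
```

The same user, with the same valid access token, succeeds on the first call and is refused on the second. Nothing about her identity changed; the team ran out of LLM tokens.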
How to Say This in the Field
| Don't say | Do say | Why it matters |
|---|---|---|
| "You need an AI gateway." | "How are your teams getting access to model APIs today? Are they sharing keys, or is there a central proxy?" | Discovery, not prescription — lets the buyer reveal their current state. |
| "LiteLLM is free and open source." | "LiteLLM's proxy is open source. The governance features — SSO, RBAC, budget enforcement — those require their enterprise license." | Buyers who hear "free" discover the gap at the worst possible time. |
| "Cloudflare AI Gateway is easy to set up." | "Cloudflare is the fastest path if you're already on their network. The constraint is that all prompts route through Cloudflare infrastructure, and rate limits are request-based, not token-based." | Surfaces the data residency question before the CISO does. |
| "Kong has an AI gateway." | "Kong added AI plugins to their existing API gateway. If you already run Kong, your AI traffic gets the same RBAC and audit logging your APIs already have." | Frames Kong AI correctly as an add-on, which is actually a strength for existing Kong shops. |
| "Portkey handles AI governance." | "Portkey is strong for application-level routing and observability. If you need org-wide governance across many teams, ask how they handle quota enforcement at that scale." | Prevents the buyer from over-scoping Portkey's sweet spot. |
| "The gateway handles identity." | "The gateway enforces policy based on identity it receives from your IdP. It doesn't issue or manage credentials itself." | Keeps the identity layer correctly positioned — sets up the next conversation. |
| "These products all do the same thing." | "They solve the same core problem — centralizing model access — but they differ on who runs the infrastructure, how mature the governance layer is, and what they integrate with." | Shows you understand the category without flattening real differences. |
| "We should replace their gateway." | "The gateway creates the chokepoint. Does that chokepoint connect to your identity layer? That's how you know who's consuming what." | Positions identity as complementary to whatever gateway they chose. |
| "Kong is overkill for AI." | "Kong makes sense if you're already running it. If you're not, you're adopting an enterprise API platform to get AI proxy features. Different decision." | Honest framing that respects the buyer's context. |
| "Cloudflare logs everything." | "Cloudflare logs up to 10 million requests per gateway. Past that, you need their Logpush add-on to stream to your own SIEM." | The cap surprises buyers who assumed unlimited logging — specificity builds credibility. |
What This Means for the Next Conversation
The gateway creates the chokepoint. Every model request becomes visible, attributable, subject to policy. A log full of virtual key IDs with no identity behind them, though, is accounting with anonymous accounts.
The gateway receives identity as an input. Where that identity comes from, how it's provisioned, and how it gets enforced across the gateway's policy layer — that's what connects this infrastructure to the platform you sell. That's Lesson 3.
Things to follow up on...
- LiteLLM's supply chain incident: In March 2026, compromised PyPI publishing credentials led to two poisoned LiteLLM package versions being live for up to 180 minutes before quarantine, a reminder that self-hosted doesn't mean supply-chain-safe — Zscaler's analysis covers the full attack chain.
- Portkey's open-source gateway move: Portkey open-sourced its unified gateway in March 2026, though enterprise governance features like SSO and advanced RBAC remain in commercial tiers — The New Stack's coverage has the details on what shipped and what didn't.
- Cloudflare's log scaling architecture: Cloudflare's engineering team published a detailed post on how they're working to push past the 10M-per-gateway log ceiling using sharded Durable Objects, worth tracking for when that constraint lifts.
- Kong's OTel conventions for AI traffic: Kong now emits per-request OpenTelemetry span attributes specifically for AI and MCP traffic, which aligns with the experimental OTel GenAI semantic conventions that are becoming the observability standard for LLM workloads.

