Why the AI Gateway Layer Exists — and How to Talk About It
When a platform engineering lead says "we need to centralize our AI provider access," they're describing a problem that most enterprises hit around the same moment: someone runs a cost report, sees fourteen different teams billing to the same OpenAI API key, and nobody can explain which team called what, when, or why the bill is three times what was budgeted. The AI gateway category exists to solve that problem. Portkey, LiteLLM, Kong AI Gateway, and Cloudflare AI Gateway are the four names you'll encounter most often in enterprise conversations about this layer. Knowing what each one is — and knowing the precise language that signals fluency to a platform engineering lead — is what separates a productive discovery call from one that ends with "we'll loop in our architects."
The Four Subjects
Portkey
What it is: A SaaS-delivered AI gateway focused on observability, routing, and prompt management for teams building on top of large language models.
What it does: Portkey sits between your application code and your AI providers, handling request routing, fallback logic, semantic caching, and detailed logging. Teams get virtual keys — scoped credentials that route through Portkey rather than hitting provider APIs directly. The platform tracks latency, cost, and token consumption per key, per team, per request. Enterprise tiers add SSO, RBAC, and audit export.
Who's behind it: Founded in 2023, Portkey (portkey.ai) is a venture-backed startup that came out of the AI-native developer tooling wave. Their design philosophy is developer-first — the SDK wraps the OpenAI client interface, so adoption friction is low.
What makes it distinct: Portkey's semantic cache is the differentiator relative to the other three. The cache matches on semantic similarity, so two prompts asking the same thing in different words return the same cached response. For high-volume use cases with repetitive query patterns, the cost reduction is material. No other subject in this comparison ships this as a native feature.
[Accuracy review required: Portkey feature set, founding date, and funding status should be verified against current vendor documentation before production.]
LiteLLM
What it is: An open-source Python proxy that normalizes the API interface across 100-plus AI providers to a single OpenAI-compatible format, with an enterprise tier that adds access controls and audit logging.
What it does: LiteLLM's core function is interface normalization. You write your application against one API shape — OpenAI's — and LiteLLM handles the translation to whatever provider you're actually calling: Anthropic, Azure OpenAI, Bedrock, Cohere, Vertex AI, local models via Ollama. The proxy layer enforces rate limits, tracks spend by team or user, and routes traffic based on rules you define. The enterprise version (LiteLLM Proxy Enterprise) adds SSO via OIDC, RBAC, and audit logging.
Who's behind it: BerriAI, a small team that open-sourced LiteLLM in 2023. The project has significant community traction on GitHub and is frequently cited on r/LocalLLaMA and Hacker News as the default answer to "how do I stop hardcoding provider keys." The community signal here is real — this is what practitioners are actually deploying.
What makes it distinct: The normalization layer is the differentiator. LiteLLM is the only subject in this comparison whose primary value proposition is making provider APIs interchangeable at the code level. The others route traffic and enforce policy; LiteLLM does that and also abstracts the provider interface itself. For teams that want to switch providers without rewriting application code, this matters.
[Accuracy review required: Provider count, BerriAI team size, and enterprise feature set should be verified against current documentation before production.]
Kong AI Gateway
What it is: An AI-specific extension of Kong Gateway, the enterprise API gateway platform, delivered as a set of plugins that add LLM routing, prompt governance, and AI observability to Kong's existing infrastructure.
What it does: Kong AI Gateway adds AI-specific plugins — AI Proxy, AI Rate Limiting, AI Prompt Guard, AI Request Transformer — on top of Kong's mature gateway infrastructure. If you're already running Kong Gateway, the AI capabilities are an incremental add to an existing deployment. If you're not, you're adopting Kong's full stack. Deployment is self-hosted (Kong Gateway open source or enterprise) or managed via Kong Konnect. Enterprise readiness is inherited from Kong's existing platform: SSO, RBAC, audit logging, and a developer portal are all table stakes in Kong's enterprise offering.
Who's behind it: Kong Inc., founded in 2017, with a large installed base of enterprise API management customers. Kong Gateway has been in production at Fortune 500 companies for years. The AI Gateway is a 2024-era extension, not a new product.
What makes it distinct: Procurement continuity. For the large installed base of Kong enterprise customers, adopting Kong AI Gateway means no new vendor, no new security review, no new contract. The AI capabilities plug into an existing trust relationship. For a platform engineering lead who's already managing Kong, the conversation is about plugin configuration, not architecture.
[Accuracy review required: Kong founding date, plugin names, and Konnect managed offering details should be verified against current Kong documentation before production.]
Cloudflare AI Gateway
What it is: A SaaS AI gateway delivered from Cloudflare's global edge network, providing logging, caching, rate limiting, and provider routing for AI API traffic.
What it does: Cloudflare AI Gateway intercepts requests to AI providers by routing them through Cloudflare's network. You change your provider endpoint URL to point at Cloudflare's gateway, and Cloudflare handles logging, caching, rate limiting, and fallback. Because it runs on Cloudflare's edge, latency characteristics are different from a centrally-hosted proxy — requests are handled close to where they originate. DLP capabilities are available through integration with Cloudflare's broader Zero Trust platform (Cloudflare Gateway and CASB). Analytics live in the Cloudflare dashboard alongside your other Cloudflare traffic data.
Who's behind it: Cloudflare, Inc. — public company, established enterprise procurement relationships, existing FedRAMP authorization for some services. The AI Gateway launched in 2024 as part of Cloudflare's Workers AI platform expansion.
What makes it distinct: Network position. Cloudflare AI Gateway is the only subject in this comparison that runs at the CDN edge rather than as a discrete proxy service. For organizations already in the Cloudflare ecosystem — using Cloudflare for DNS, DDoS mitigation, or Zero Trust — the AI Gateway is another surface in an existing control plane. The DLP integration story is also more mature here than in the other three, because Cloudflare's CASB and Gateway products have been doing content inspection for years.
[Accuracy review required: FedRAMP authorization scope, DLP integration details, and launch date should be verified against current Cloudflare documentation before production.]
Comparison
Structure note: Four subjects across three dimensions don't compress cleanly into a single table without losing the nuance that makes this comparison useful. Each dimension gets its own section, with all four subjects present throughout. Category logic before product specifics.
Deployment Model
Portkey is SaaS-only. There is no self-hosted option in the standard offering. For enterprises with data handling requirements that prohibit third-party SaaS in the request path, this is a hard stop. For teams that want fast time-to-value and don't have those constraints, it's the lowest-friction option.
LiteLLM is self-hosted by default. The open-source proxy runs in your environment — Docker, Kubernetes, whatever you're running. The enterprise tier adds managed deployment options, but the architecture assumption is that you control the infrastructure. This makes LiteLLM the natural choice for air-gapped or highly regulated environments where the proxy itself cannot be a third-party service.
Kong AI Gateway is self-hosted (Kong Gateway) or managed via Kong Konnect. The managed option is a hybrid: Kong manages the control plane, you run the data plane in your environment. For enterprises already on Kong, the deployment model is already decided.
Cloudflare AI Gateway is SaaS, delivered from Cloudflare's edge. Like Portkey, there is no self-hosted option — the value proposition is inseparable from Cloudflare's network. Most enterprises have already made the decision to trust Cloudflare in their request path, so this isn't a new evaluation; it's an extension of an existing one.
Enterprise Readiness
SSO integration, audit logging, and role-based access control are the three markers. All four subjects offer them in their enterprise tiers. The differences are in maturity, integration depth, and what "enterprise tier" actually costs.
Portkey added SSO (SAML/OIDC) and RBAC in 2024. Audit logs are exportable. The enterprise tier is priced per seat with a minimum commitment. For a startup or mid-market company, this is accessible. For a large enterprise procurement process, Portkey is still building the paper trail — security questionnaires, SOC 2 Type II, the usual. [Accuracy review required: Portkey's SOC 2 status and enterprise pricing model.]
LiteLLM enterprise adds SSO via OIDC, RBAC with team-level scoping, and audit logging. Because the proxy runs in your environment, the audit logs stay in your environment — which is either a feature or a management burden depending on your logging infrastructure. The enterprise pricing is annual contract, direct with BerriAI. [Accuracy review required: LiteLLM enterprise pricing structure.]
Kong AI Gateway inherits Kong's enterprise readiness story, which is the most mature of the four. Kong has been selling to enterprise security teams since 2017. SOC 2, ISO 27001, SSO, RBAC, audit logging, developer portal, role separation between admins and consumers — all of this exists and has been through enterprise procurement cycles many times. The AI Gateway plugins sit on top of this foundation.
Cloudflare AI Gateway inherits Cloudflare's enterprise posture, which is similarly mature. SSO, RBAC, and audit logging are table stakes in Cloudflare's enterprise tier. The AI Gateway analytics integrate with Cloudflare's existing logging infrastructure, which means if you're already shipping Cloudflare logs to your SIEM, AI Gateway traffic comes along.
Routing Surface
Routing is where the four subjects diverge most sharply, because the word means something different to each of them.
Portkey offers fallback logic (if Provider A fails, route to Provider B), load balancing across providers, and conditional routing based on request metadata. The semantic cache sits here too — a cache hit means no provider call at all. Portkey supports all major providers and can route to self-hosted models via compatible endpoints.
LiteLLM offers the most extensive multi-provider support of the four — 100-plus providers, all normalized to the OpenAI interface. Fallback logic is configurable. Routing can be based on cost (route to the cheapest model that meets latency requirements), load, or explicit rules. The normalization layer means routing changes don't require application code changes.
Kong AI Gateway routing is handled through Kong's plugin architecture. The AI Proxy plugin handles provider routing; fallback and retry logic are configured at the plugin level. Multi-provider support covers the major commercial providers. For organizations running Kong, routing configuration lives alongside their existing API routing configuration — same tooling, same deployment pipeline.
Cloudflare AI Gateway handles fallback and caching. The caching layer is CDN-native — Cloudflare's edge cache, which has different performance characteristics than an application-layer cache. Multi-provider support covers the major commercial providers. Routing logic is less flexible than LiteLLM or Portkey — Cloudflare's strength is network-layer performance, not routing sophistication.
Field Language Guide
The primary accuracy problem in buyer conversations about this layer is conflation: with a WAF (which inspects for malicious payloads), with an API management platform (which handles lifecycle, versioning, developer portals, and monetization), or with a load balancer (which distributes traffic without policy enforcement). A platform engineering lead will catch these conflations immediately. The table below is the correction.
| Don't say | Do say | Why it matters |
|---|---|---|
| "It's like a WAF for your AI traffic" | "It's a policy enforcement layer for AI provider access" | A WAF inspects for attack signatures. This layer enforces access policy and provides observability. Different problem, different architecture. |
| "It's an API management platform" | "It's an AI traffic control layer — narrower scope than full API management" | API management handles versioning, developer portals, monetization, and lifecycle. This layer handles credential abstraction, routing, and logging for AI provider calls specifically. |
| "It's a load balancer" | "It routes AI traffic based on policy rules, not just traffic distribution" | Load balancers distribute requests for performance. This layer makes routing decisions based on cost, availability, model capability, and access policy. |
| "It proxies your API calls" | "It centralizes credential management and enforces per-team access controls" | "Proxy" is technically accurate but undersells the policy function. The credential centralization is the enterprise value proposition. |
| "It prevents prompt injection" | "Some gateways include DLP-adjacent content inspection; prompt injection defense is a separate control" | DLP features in gateways inspect for data exfiltration patterns. Prompt injection is an application-layer problem that requires different tooling. Don't conflate them. |
| "It gives you rate limiting" | "It gives you per-team rate limiting with attribution — you can see which team hit the limit and why" | Generic rate limiting is table stakes. The value is the attribution layer that tells you who's consuming what. |
| "It's a security tool" | "It's an access control and observability layer — security is one function, not the whole product" | Calling it a security tool sends the buyer to their security team, not their platform team. The buyer for this layer is usually platform engineering. |
| "It replaces your API gateway" | "It sits alongside your existing API infrastructure, scoped to AI provider traffic" | Kong is the exception here — if they're already on Kong, the AI Gateway extends their existing gateway. But for the others, this is additive, not replacement. |
| "It handles data residency" | "It centralizes access control and logging; data residency is a separate conversation about provider configuration" | Data residency and retention policy are out of scope for this layer. Conflating them creates compliance expectations the gateway can't meet. |
| "It's open source" | "LiteLLM is open source; the others are commercial products with open-source components or community tiers" | "Open source" means different things for each subject. LiteLLM's core is Apache-licensed. The others are commercial. Precision matters when procurement asks. |
| "It integrates with your identity provider" | "The enterprise tiers support SSO via SAML or OIDC — ask which IdP they're running and confirm compatibility before the next call" | Don't assert compatibility you haven't confirmed. The right move is to surface the question, not close it prematurely. |
| "It's easy to deploy" | "Portkey and Cloudflare are SaaS — no deployment required. LiteLLM and Kong require infrastructure. Which model fits your environment?" | "Easy" is relative to the deployment model. Self-hosted options require Kubernetes or equivalent. Know which model you're selling before the call. |
IDAM Concept Mapping: The Token Broker Analogy
The closest IDAM analog to an AI gateway is an OAuth authorization server acting as a credential broker — teams receive scoped tokens rather than raw credentials, and the broker maintains the audit trail of who got what and when. The analogy holds for the core value proposition: centralized issuance, scope limitation, and attribution. Where it breaks is revocation and token lifecycle. OAuth tokens have expiry, refresh, and revocation semantics built into the protocol. AI gateway "virtual keys" are typically static credentials with routing rules attached — revocation means deleting the key or changing the routing policy, not triggering a protocol-level revocation event. The practical implication for a buyer conversation: if a platform engineering lead asks "how does this integrate with our PAM solution," the answer is more nuanced than it would be for a true token broker. The gateway centralizes access, but the credential lifecycle management story is less mature than what they're used to from IDAM infrastructure.
Preview note: This piece was produced as a demonstration. All vendor feature claims, pricing structures, founding dates, and community statistics require verification against current vendor documentation before production publication. Community source references (r/LocalLLaMA, Hacker News) reflect general practitioner discourse patterns and should be confirmed with specific citations.

