AI Gateways: The Proxy Layer

By Leigh Garrity— May 6, 2026

What This Category Is and Why It Exists

Portkey, LiteLLM, Kong AI Gateway, and Cloudflare AI Gateway all solve the same foundational infrastructure problem: individual application teams should not hold raw provider API keys. You'll encounter these products in discovery when a platform engineering lead mentions "our LLM proxy," "we're routing everything through Kong," or "we set up an AI gateway last quarter." The term that buys you credibility in that conversation is virtual key — a gateway-issued credential that maps to a real provider key inside the gateway's configuration, scoped per application, centrally revocable without touching the underlying provider credential. If you can use that term correctly and ask a coherent question about deployment model, you'll stay in the conversation instead of nodding through it.

The category exists because "put the key in the .env file" doesn't survive contact with a second team. Once you have forty application teams calling OpenAI, you have forty potential sources of credential exposure, forty independent rate limit counters, and no central record of what was sent to which model when. An AI gateway collapses that into one control plane: one virtual key per application, centralized rate limits, fallback routing when a provider returns errors, and a complete prompt-response log for audit. The four products below are different architectural answers to that same operational requirement.

The Products

Portkey

What it is: A SaaS AI gateway purpose-built around the virtual key concept, with a self-hosted option for teams that need data to stay on-premises.

What it does: Portkey sits between your application and your LLM providers. Each application receives a virtual key — a Portkey-issued credential that maps to one or more real provider API keys stored in Portkey's configuration. The gateway handles routing, rate limiting, and fallback logic: if a GPT-4o call returns a 429, Portkey can automatically retry against Claude 3.5 Sonnet or a secondary Azure OpenAI deployment, based on rules you define. Every prompt-response pair is logged with latency, estimated token cost, and model version. Portkey's API surface mirrors OpenAI's, so existing OpenAI SDK calls route through Portkey with a one-line endpoint change.

Who's behind it: Portkey was founded in 2023 by Rohit Agarwal and Ayush Garg, part of Y Combinator's W23 batch. The product is purpose-built for AI traffic — there's no prior API gateway product underneath it.

What makes it distinct: Semantic caching and the depth of its observability layer. Portkey caches responses to semantically similar prompts, not just identical ones — so "summarize this contract" and "give me a summary of this contract" can resolve to the same cached response. The observability dashboard is readable by non-engineers, which matters when a platform team needs to show usage data to an application owner who doesn't live in Grafana.

LiteLLM

What it is: An open-source Python proxy that presents a unified OpenAI-compatible API across 100+ LLM providers, self-hosted by default.

What it does: You deploy LiteLLM as a server — Docker image, Kubernetes deployment, whatever your infra team prefers. Applications call LiteLLM's endpoint using standard OpenAI SDK syntax; LiteLLM translates the request to whichever provider and model you've configured. Virtual keys are issued per application or team through the admin API or a YAML config file. Rate limits are enforced at the key level. Fallback routing is configured declaratively: if gpt-4o is unavailable, fall back to claude-3-5-sonnet, then to gemini-1.5-pro. LiteLLM Cloud is the managed version for teams that want the capability without running the server.

Who's behind it: BerriAI, founded by Krrish Dholakia. The project has accumulated over 13,000 GitHub stars and a substantial community in r/LocalLLaMA and among enterprise ML platform teams. LiteLLM's README has been cited in more internal architecture reviews than most vendor whitepapers — which is either a testament to the project's quality or an indictment of vendor whitepapers, probably both.

What makes it distinct: Provider breadth and the self-hosted default. If a platform team is running models on AWS Bedrock, Azure OpenAI, a fine-tune on Together AI, and an internal Ollama instance simultaneously, LiteLLM is the product most likely to have all four in its provider list with maintained, tested integrations. The open-source model also means the platform team can inspect and modify the proxy code, which matters for security-conscious buyers who want to audit what's handling their prompts.

Kong AI Gateway

What it is: A set of AI-specific plugins layered onto Kong Gateway, extending an existing enterprise API gateway to handle LLM traffic.

What it does: Platform teams that already run Kong Gateway add the AI plugin suite — AI Proxy, AI Rate Limiting, AI Prompt Guard, AI Request Transformer — through Kong's standard declarative configuration. The AI Proxy plugin manages provider routing and virtual key issuance. AI Rate Limiting applies token-based limits rather than request-based limits, which matters because a single LLM call can consume anywhere from 50 to 50,000 tokens depending on prompt length. Request counting doesn't capture the actual load. Configuration is managed through Kong's existing tooling: deck for declarative config, Kong Ingress Controller for Kubernetes environments. The full Kong plugin ecosystem remains available, so existing authentication, logging, and traffic management plugins apply to AI traffic alongside everything else.

Who's behind it: Kong Inc., which has been in the API gateway market since 2015. The AI plugins are a product extension, not a new product. Kong carries SOC 2 Type II certification, offers enterprise support contracts, and has an established federal customer base — the compliance posture was built for the API gateway business and extends to the AI plugins.

What makes it distinct: The integration with existing Kong deployments. A platform team that already manages Kong Gateway doesn't introduce a new vendor, a new support contract, or a new deployment model. The AI capability is additive. For organizations where procurement and security review cycles are measured in quarters, "extend what we already have" is a materially different conversation than "evaluate a new product."

Cloudflare AI Gateway

What it is: An edge-native AI gateway that runs on Cloudflare's global network, proxying requests to Workers AI and external LLM providers without requiring any server deployment.

What it does: Traffic routes through Cloudflare's edge before reaching the provider. Cloudflare logs requests, applies rate limits, caches responses (including semantic caching), and surfaces a dashboard. For teams using Workers AI, the gateway is essentially zero-configuration — it's in the request path by default. For external providers like OpenAI or Anthropic, you replace the provider's base URL with a Cloudflare-provided endpoint; the SDK call is otherwise unchanged. Virtual keys are managed through Cloudflare's dashboard or API. The free tier covers logging and basic rate limiting; paid tiers add semantic caching and higher log retention.

Who's behind it: Cloudflare. This is a platform feature, not a standalone product — it's priced and supported through Cloudflare's existing account structure. There is no dedicated AI Gateway support tier separate from Cloudflare's standard support tiers.

What makes it distinct: The deployment model is the entire differentiator. There is no server to run, no container to manage, no infrastructure to maintain. The gateway runs at Cloudflare's edge across 300+ cities. For teams already in the Cloudflare ecosystem, the time-to-value is measured in minutes. The tradeoff is that you're entirely in Cloudflare's network, with no self-hosted option, and the edge geography is Cloudflare's, not yours.

Comparison: Four Products Against Four Dimensions

Comparison structure: trait-led analysis. The four products don't cluster into clean pairs, and a competitive ranking would misrepresent what's actually a deployment model decision. The dimensions below are the ones a platform engineering lead will use to evaluate; each product's position on each dimension is a fact about its architecture, not an editorial preference.

Dimension	Portkey	LiteLLM	Kong AI Gateway	Cloudflare AI Gateway
Deployment model	SaaS primary, self-hosted available	Self-hosted primary, managed cloud available	Self-hosted (on-prem, cloud, hybrid)	Edge-only, no self-hosted option
Architectural origin	Purpose-built for AI	Purpose-built for AI	Extended from existing API gateway	Extended from existing CDN/network platform
Provider coverage	20+ major providers	100+ providers, including local/private models	Major cloud providers via plugins	Major cloud providers + Workers AI
Enterprise readiness	SOC 2 Type II (in progress as of Q1 2026), paid support tiers	Enterprise support via BerriAI, self-hosted means your security team owns the posture	SOC 2 Type II, established enterprise contracts, federal references	Cloudflare's existing compliance posture (FedRAMP in progress), standard support tiers
Integration surface	OpenAI-compatible API, SDKs for Python/JS/Go	OpenAI-compatible API, 100+ provider SDKs, Python SDK	Kong plugin ecosystem, deck/KIC config, existing Kong integrations	Cloudflare Workers, REST API, dashboard

Deployment model is where the table above understates the real difference. Cloudflare has no self-hosted path. For a buyer with strict data residency requirements, or a network architecture that can't route production AI traffic through a third-party edge network, that's a disqualifier before the evaluation starts. The other three all offer self-hosted deployment, though the operational complexity varies: running LiteLLM in Kubernetes is a different lift than adding Kong plugins to an existing Kong deployment.

Architectural origin shapes the product roadmap more than any feature list will. Kong and Cloudflare are extending existing infrastructure products to handle AI traffic; their AI capabilities are constrained by what the parent platform can express. Portkey and LiteLLM were built for AI traffic from the start, so they can ship AI-specific features (semantic caching, prompt guard, model-specific retry logic) on their own schedule.

Enterprise readiness in this category means audit logging that satisfies a CISO, a support contract that satisfies procurement, and a compliance posture that satisfies legal. Kong has the most established answer to all three, because it's been selling to enterprises since 2015. Portkey and LiteLLM are earlier in that journey. Cloudflare's answer depends entirely on whether the buyer already has a Cloudflare enterprise contract.

“

Okta Concept Mapping

Virtual keys map most cleanly to OAuth 2.0 client credentials — one credential per application, scoped to a purpose, issued by a central authority, revocable without touching the underlying resource. The issuance and revocation mechanics are genuinely analogous. Where the analogy breaks: an OAuth client_id carries identity that the authorization server recognizes and can make policy decisions against. A virtual key in most gateway implementations is a routing token. The gateway knows which application is using it, but not who within that application, and it doesn't make an authorization decision in the OIDC sense. The gateway is doing access control (this virtual key may call GPT-4o but not Claude 3 Opus) without doing authentication. When a platform lead asks whether Okta app catalog entries can drive virtual key provisioning automatically, the honest answer is: not natively. That's a configuration integration, not a protocol one, and it's covered in the next lesson.

Field Language Guide

Don't say	Do say	Why it matters
Fake API key	Virtual key	"Fake" implies it doesn't work; virtual keys are real credentials the gateway honors
Wrapper around the OpenAI API	AI gateway	Understates routing, rate limiting, and logging; "wrapper" signals you haven't thought about the infrastructure
Logs your prompts	Prompt-response logging	"Logs your prompts" sounds like a privacy incident; "prompt-response logging" is the audit capability
Reroutes if it breaks	Fallback routing	Platform engineers use this term; matching it signals you understand the mechanism, not just the outcome
Cheaper API calls	Semantic caching	The mechanism matters; caching near-identical prompts is different from negotiating rates
Manages your AI spend	Rate limiting	Spend management is a separate capability; rate limiting is what the gateway enforces at the key level
Works with all the AI models	Multi-provider coverage	"All the models" is an overstatement that will get you tested; "multi-provider coverage" is accurate and specific
Easy to set up	Zero-config (Cloudflare) / drop-in proxy (LiteLLM)	Platform engineers don't trust "easy"; specific deployment characteristics are credible
Enterprise-grade	SOC 2 Type II, dedicated support tier	"Enterprise-grade" is marketing; the compliance posture is a fact you can state
Better security	Central credential management	The security claim is vague; the mechanism — one virtual key per app, real key never leaves the gateway — is specific and testable
Blocks bad prompts	Prompt guard	The feature has a name; using it shows you've read the documentation
Token limits	Token-based rate limiting	Distinguishes from request-based rate limiting, which is the distinction that matters for LLM traffic