What This Category Is and Why It Exists
Portkey, LiteLLM, Kong AI Gateway, and Cloudflare AI Gateway all solve the same foundational infrastructure problem: individual application teams should not hold raw provider API keys. You'll encounter these products in discovery when a platform engineering lead mentions "our LLM proxy," "we're routing everything through Kong," or "we set up an AI gateway last quarter." The term that buys you credibility in that conversation is virtual key — a gateway-issued credential that maps to a real provider key inside the gateway's configuration, scoped per application, centrally revocable without touching the underlying provider credential. If you can use that term correctly and ask a coherent question about deployment model, you'll stay in the conversation instead of nodding through it.
The category exists because "put the key in the .env file" doesn't survive contact with a second team. Once you have forty application teams calling OpenAI, you have forty potential sources of credential exposure, forty independent rate limit counters, and no central record of what was sent to which model when. An AI gateway collapses that into one control plane: one virtual key per application, centralized rate limits, fallback routing when a provider returns errors, and a complete prompt-response log for audit. The four products below are different architectural answers to that same operational requirement.
The Products
Portkey
What it is: A SaaS AI gateway purpose-built around the virtual key concept, with a self-hosted option for teams that need data to stay on-premises.
What it does: Portkey sits between your application and your LLM providers. Each application receives a virtual key — a Portkey-issued credential that maps to one or more real provider API keys stored in Portkey's configuration. The gateway handles routing, rate limiting, and fallback logic: if a GPT-4o call returns a 429, Portkey can automatically retry against Claude 3.5 Sonnet or a secondary Azure OpenAI deployment, based on rules you define. Every prompt-response pair is logged with latency, estimated token cost, and model version. Portkey's API surface mirrors OpenAI's, so existing OpenAI SDK calls route through Portkey with a one-line endpoint change.
Who's behind it: Portkey was founded in 2023 by Rohit Agarwal and Ayush Garg, part of Y Combinator's W23 batch. The product is purpose-built for AI traffic — there's no prior API gateway product underneath it.
What makes it distinct: Semantic caching and the depth of its observability layer. Portkey caches responses to semantically similar prompts, not just identical ones — so "summarize this contract" and "give me a summary of this contract" can resolve to the same cached response. The observability dashboard is readable by non-engineers, which matters when a platform team needs to show usage data to an application owner who doesn't live in Grafana.
LiteLLM
What it is: An open-source Python proxy that presents a unified OpenAI-compatible API across 100+ LLM providers, self-hosted by default.
What it does: You deploy LiteLLM as a server — Docker image, Kubernetes deployment, whatever your infra team prefers. Applications call LiteLLM's endpoint using standard OpenAI SDK syntax; LiteLLM translates the request to whichever provider and model you've configured. Virtual keys are issued per application or team through the admin API or a YAML config file. Rate limits are enforced at the key level. Fallback routing is configured declaratively: if gpt-4o is unavailable, fall back to claude-3-5-sonnet, then to gemini-1.5-pro. LiteLLM Cloud is the managed version for teams that want the capability without running the server.
Who's behind it: BerriAI, founded by Krrish Dholakia. The project has accumulated over 13,000 GitHub stars and a substantial community in r/LocalLLaMA and among enterprise ML platform teams. LiteLLM's README has been cited in more internal architecture reviews than most vendor whitepapers — which is either a testament to the project's quality or an indictment of vendor whitepapers, probably both.
What makes it distinct: Provider breadth and the self-hosted default. If a platform team is running models on AWS Bedrock, Azure OpenAI, a fine-tune on Together AI, and an internal Ollama instance simultaneously, LiteLLM is the product most likely to have all four in its provider list with maintained, tested integrations. The open-source model also means the platform team can inspect and modify the proxy code, which matters for security-conscious buyers who want to audit what's handling their prompts.
Kong AI Gateway
What it is: A set of AI-specific plugins layered onto Kong Gateway, extending an existing enterprise API gateway to handle LLM traffic.
What it does: Platform teams that already run Kong Gateway add the AI plugin suite — AI Proxy, AI Rate Limiting, AI Prompt Guard, AI Request Transformer — through Kong's standard declarative configuration. The AI Proxy plugin manages provider routing and virtual key issuance. AI Rate Limiting applies token-based limits rather than request-based limits, which matters because a single LLM call can consume anywhere from 50 to 50,000 tokens depending on prompt length. Request counting doesn't capture the actual load. Configuration is managed through Kong's existing tooling: deck for declarative config, Kong Ingress Controller for Kubernetes environments. The full Kong plugin ecosystem remains available, so existing authentication, logging, and traffic management plugins apply to AI traffic alongside everything else.
Who's behind it: Kong Inc., which has been in the API gateway market since 2015. The AI plugins are a product extension, not a new product. Kong carries SOC 2 Type II certification, offers enterprise support contracts, and has an established federal customer base — the compliance posture was built for the API gateway business and extends to the AI plugins.
What makes it distinct: The integration with existing Kong deployments. A platform team that already manages Kong Gateway doesn't introduce a new vendor, a new support contract, or a new deployment model. The AI capability is additive. For organizations where procurement and security review cycles are measured in quarters, "extend what we already have" is a materially different conversation than "evaluate a new product."
Cloudflare AI Gateway
What it is: An edge-native AI gateway that runs on Cloudflare's global network, proxying requests to Workers AI and external LLM providers without requiring any server deployment.
What it does: Traffic routes through Cloudflare's edge before reaching the provider. Cloudflare logs requests, applies rate limits, caches responses (including semantic caching), and surfaces a dashboard. For teams using Workers AI, the gateway is essentially zero-configuration — it's in the request path by default. For external providers like OpenAI or Anthropic, you replace the provider's base URL with a Cloudflare-provided endpoint; the SDK call is otherwise unchanged. Virtual keys are managed through Cloudflare's dashboard or API. The free tier covers logging and basic rate limiting; paid tiers add semantic caching and higher log retention.
Who's behind it: Cloudflare. This is a platform feature, not a standalone product — it's priced and supported through Cloudflare's existing account structure. There is no dedicated AI Gateway support tier separate from Cloudflare's standard support tiers.
What makes it distinct: The deployment model is the entire differentiator. There is no server to run, no container to manage, no infrastructure to maintain. The gateway runs at Cloudflare's edge across 300+ cities. For teams already in the Cloudflare ecosystem, the time-to-value is measured in minutes. The tradeoff is that you're entirely in Cloudflare's network, with no self-hosted option, and the edge geography is Cloudflare's, not yours.
Comparison: Four Products Against Four Dimensions
Comparison structure: trait-led analysis. The four products don't cluster into clean pairs, and a competitive ranking would misrepresent what's actually a deployment model decision. The dimensions below are the ones a platform engineering lead will use to evaluate; each product's position on each dimension is a fact about its architecture, not an editorial preference.
| Dimension | Portkey | LiteLLM | Kong AI Gateway | Cloudflare AI Gateway |
|---|---|---|---|---|
| Deployment model | SaaS primary, self-hosted available | Self-hosted primary, managed cloud available | Self-hosted (on-prem, cloud, hybrid) | Edge-only, no self-hosted option |
| Architectural origin | Purpose-built for AI | Purpose-built for AI | Extended from existing API gateway | Extended from existing CDN/network platform |
| Provider coverage | 20+ major providers | 100+ providers, including local/private models | Major cloud providers via plugins | Major cloud providers + Workers AI |
| Enterprise readiness | SOC 2 Type II (in progress as of Q1 2026), paid support tiers | Enterprise support via BerriAI, self-hosted means your security team owns the posture | SOC 2 Type II, established enterprise contracts, federal references | Cloudflare's existing compliance posture (FedRAMP in progress), standard support tiers |
| Integration surface | OpenAI-compatible API, SDKs for Python/JS/Go | OpenAI-compatible API, 100+ provider SDKs, Python SDK | Kong plugin ecosystem, deck/KIC config, existing Kong integrations | Cloudflare Workers, REST API, dashboard |
Deployment model is where the table above understates the real difference. Cloudflare has no self-hosted path. For a buyer with strict data residency requirements, or a network architecture that can't route production AI traffic through a third-party edge network, that's a disqualifier before the evaluation starts. The other three all offer self-hosted deployment, though the operational complexity varies: running LiteLLM in Kubernetes is a different lift than adding Kong plugins to an existing Kong deployment.
Architectural origin shapes the product roadmap more than any feature list will. Kong and Cloudflare are extending existing infrastructure products to handle AI traffic; their AI capabilities are constrained by what the parent platform can express. Portkey and LiteLLM were built for AI traffic from the start, so they can ship AI-specific features (semantic caching, prompt guard, model-specific retry logic) on their own schedule.
Enterprise readiness in this category means audit logging that satisfies a CISO, a support contract that satisfies procurement, and a compliance posture that satisfies legal. Kong has the most established answer to all three, because it's been selling to enterprises since 2015. Portkey and LiteLLM are earlier in that journey. Cloudflare's answer depends entirely on whether the buyer already has a Cloudflare enterprise contract.
Okta Concept Mapping
Virtual keys map most cleanly to OAuth 2.0 client credentials — one credential per application, scoped to a purpose, issued by a central authority, revocable without touching the underlying resource. The issuance and revocation mechanics are genuinely analogous. Where the analogy breaks: an OAuth client_id carries identity that the authorization server recognizes and can make policy decisions against. A virtual key in most gateway implementations is a routing token. The gateway knows which application is using it, but not who within that application, and it doesn't make an authorization decision in the OIDC sense. The gateway is doing access control (this virtual key may call GPT-4o but not Claude 3 Opus) without doing authentication. When a platform lead asks whether Okta app catalog entries can drive virtual key provisioning automatically, the honest answer is: not natively. That's a configuration integration, not a protocol one, and it's covered in the next lesson.
Field Language Guide
| Don't say | Do say | Why it matters |
|---|---|---|
| Fake API key | Virtual key | "Fake" implies it doesn't work; virtual keys are real credentials the gateway honors |
| Wrapper around the OpenAI API | AI gateway | Understates routing, rate limiting, and logging; "wrapper" signals you haven't thought about the infrastructure |
| Logs your prompts | Prompt-response logging | "Logs your prompts" sounds like a privacy incident; "prompt-response logging" is the audit capability |
| Reroutes if it breaks | Fallback routing | Platform engineers use this term; matching it signals you understand the mechanism, not just the outcome |
| Cheaper API calls | Semantic caching | The mechanism matters; caching near-identical prompts is different from negotiating rates |
| Manages your AI spend | Rate limiting | Spend management is a separate capability; rate limiting is what the gateway enforces at the key level |
| Works with all the AI models | Multi-provider coverage | "All the models" is an overstatement that will get you tested; "multi-provider coverage" is accurate and specific |
| Easy to set up | Zero-config (Cloudflare) / drop-in proxy (LiteLLM) | Platform engineers don't trust "easy"; specific deployment characteristics are credible |
| Enterprise-grade | SOC 2 Type II, dedicated support tier | "Enterprise-grade" is marketing; the compliance posture is a fact you can state |
| Better security | Central credential management | The security claim is vague; the mechanism — one virtual key per app, real key never leaves the gateway — is specific and testable |
| Blocks bad prompts | Prompt guard | The feature has a name; using it shows you've read the documentation |
| Token limits | Token-based rate limiting | Distinguishes from request-based rate limiting, which is the distinction that matters for LLM traffic |

