On March 24, 2026, a supply chain attack hit LiteLLM, one of the most widely used open-source AI gateways in production. A threat group called TeamPCP pulled off a cascading CI/CD compromise that began with Aqua Security's Trivy scanner, stole a PyPI publishing token, and pushed two malicious package versions containing a multi-stage credential stealer.
The malware harvested SSH keys, cloud credentials, Kubernetes secrets, and .env files from every system that installed the compromised versions during a roughly five-hour window. According to Wiz, LiteLLM is present in 36% of cloud environments. The incident was assigned CVE-2026-33634, with a CVSS score of 9.4.
This article isn't a post-mortem. It's about the question the incident forced every engineering team to ask: what should you actually look for in an AI gateway in 2026?
Why your gateway choice is a security decision
An AI gateway sits between your application and every LLM provider you use. It holds credentials for each provider, routes every request, and often sees every prompt and response.
The LiteLLM incident made the implication explicit: your gateway is one of the highest-value targets in your infrastructure. How it's deployed — self-hosted package, managed service, edge proxy — determines your attack surface.
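Before comparing products, it helps to see what "routing" means mechanically. The sketch below is a gateway in miniature: provider credentials live in one table on the gateway's side, and requests fail over down a priority list. Every name, key, and endpoint here is a placeholder, not any specific product's API.

```python
from typing import Callable

# Illustrative provider table -- in a real gateway these keys live server-side,
# never in the application's environment. Names and endpoints are placeholders.
PROVIDERS: dict[str, dict] = {
    "primary":  {"key": "sk-...",  "endpoint": "https://api.primary.example/v1"},
    "fallback": {"key": "sk2-...", "endpoint": "https://api.fallback.example/v1"},
}

def route(prompt: str, call: Callable[[str, dict], str],
          order: tuple[str, ...] = ("primary", "fallback")) -> str:
    """Try providers in priority order, failing over on any provider error."""
    last_err: Exception | None = None
    for name in order:
        try:
            return call(prompt, PROVIDERS[name])
        except Exception as err:  # real gateways classify retryable vs. fatal errors
            last_err = err
    raise RuntimeError("all providers failed") from last_err
```

A production gateway layers retries, health checks, rate limits, and logging on top of this loop — which is exactly why it ends up holding so many credentials, and why it is such a valuable target.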
We evaluated six gateways across security, observability, routing, platform breadth, ease of adoption, and governance.
1. Respan — full-stack LLM engineering platform
Type: Managed cloud
Respan bundles an AI gateway with observability, evaluations, prompt management, and optimization in one platform. The gateway is managed infrastructure — not a package you install. Provider credentials live on Respan's side, never in your environment variables or CI/CD secrets. The class of supply chain attack that hit LiteLLM doesn't apply here.
With 300+ models across all major providers, Respan matches the breadth of marketplaces like OpenRouter — but at lower cost and with built-in observability and evals that OpenRouter doesn't offer. Built-in evals test model outputs before deployment. Prompt management provides versioning, A/B testing, and optimization. Cost tracking gives engineering and finance shared visibility. SOC 2 compliant, with a free tier available.
- Strengths: No supply chain risk; 300+ models at lower cost than aggregators; 99.99% uptime SLA; unlimited rate limits; only option combining gateway + observability + evals + prompt optimization; SOC 2; provider credentials never touch your infra; free tier.
- Limitations: Managed-only (no self-hosted option); newest models from smaller providers and open-source ecosystems may appear on OpenRouter first.
- Best for: Teams that want one platform for the full LLM lifecycle without assembling separate tools.
2. OpenRouter — managed LLM marketplace
Type: Managed API aggregator
OpenRouter gives you one API key and credit balance for 300+ models. No infrastructure, no provider accounts. Provider pricing is passed through with a 5.5% fee on credit purchases. Automatic fallback handles provider outages.
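The single-endpoint model is simple enough to show with the standard library alone. The path below is OpenRouter's OpenAI-compatible chat completions route; the model slug and API key are placeholders.

```python
import json
import urllib.request

def openrouter_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build one chat-completion request; switching models is just a string change."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

# urllib.request.urlopen(req) would send it; the same request shape reaches
# any of the 300+ models behind the one key.
req = openrouter_request("provider/model-slug", "Hello", "sk-or-PLACEHOLDER")
```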
- Strengths: Broadest model selection (300+); zero setup; no inference markup; free tier with 25+ models; BYOK option.
- Limitations: Minimal observability; no governance or access controls; no guardrails, evals, or prompt management; 100–150ms added latency; no public SLA.
- Best for: Individual developers and early-stage teams prototyping across multiple models.
3. Vercel AI Gateway — frontend-ecosystem gateway
Type: Managed (Vercel platform)
Vercel's AI Gateway provides a unified endpoint for ~100 models, tightly integrated with their deployment platform and open-source AI SDK. Excellent DX within the Next.js ecosystem. No markup on token pricing.
- Strengths: Seamless Next.js integration; no token markup; built on the popular AI SDK; load balancing and failover; $5/mo free credit.
- Limitations: Limited model selection (~100 models); tightly coupled to Vercel; observability tuned for web vitals, not AI metrics; no semantic caching; serverless timeouts limit agentic workflows; no evals or prompt management.
- Best for: Frontend teams on Next.js/Vercel who want the fastest path to shipping AI features.
4. LiteLLM — open-source self-hosted gateway
Type: Open-source (Python)
LiteLLM provides an OpenAI-compatible interface to 100+ providers with ~97M monthly PyPI downloads and 40K+ GitHub stars. The team responded to the March breach with transparency and speed — engaging Mandiant and publishing detailed remediation. They were victims of a broader campaign, not negligent.
But the incident exposed a structural risk: it's a pip package, and any system that ran an unpinned install during the attack window was compromised. Self-hosting also means you own dependency management, security patching, scaling, and incident response.
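One concrete mitigation for that attack window is to pin the exact version and require hash verification, so pip refuses any artifact that doesn't match what you reviewed. The version number and hashes below are placeholders, not real values:

```
# requirements.txt -- version and hashes are placeholders, not real values
litellm==1.0.0 \
    --hash=sha256:WHEEL_HASH_PLACEHOLDER \
    --hash=sha256:SDIST_HASH_PLACEHOLDER
```

Installing with `pip install --require-hashes -r requirements.txt` fails closed: a republished or tampered artifact with a different hash is rejected rather than installed.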
- Strengths: Broadest provider coverage (100+); fully open source; mature codebase; free.
- Limitations: PyPI supply chain exposure; Python performance ceiling under concurrency; basic observability; no evals or prompt management; 800+ open GitHub issues.
- Best for: Teams with strong DevOps/security capabilities willing to invest in hardening their own infrastructure.
5. Cloudflare AI Gateway — edge caching gateway
Type: Managed edge proxy
Cloudflare AI Gateway routes LLM traffic through their global edge network with response caching, basic analytics, and rate limiting. Setup is a one-line URL change.
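The "one-line URL change" is literal: you repoint an OpenAI-style client's base URL at your gateway slug. The URL shape below follows Cloudflare's documented pattern, but verify it against their current docs; the account and gateway IDs are placeholders.

```python
def cf_gateway_base(account_id: str, gateway_id: str, provider: str = "openai") -> str:
    """Base URL that routes an OpenAI-style client through Cloudflare's AI Gateway."""
    return f"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/{provider}"

# Swap this in as the client's base_url; no other application code changes.
base_url = cf_gateway_base("ACCOUNT_ID", "GATEWAY_ID")
```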
- Strengths: Zero infrastructure; edge caching reduces duplicate API costs; simple setup; Cloudflare security layer; free tier.
- Limitations: Exact-match caching only; 10–50ms proxy latency; limited observability; no guardrails, evals, or prompt tooling.
- Best for: Teams already on Cloudflare who want lightweight caching and traffic management.
6. Portkey — gateway + observability control plane
Type: Open-source + managed
Portkey is a gateway with deep observability, guardrails, and governance. It went fully open source on March 24, 2026, and reports processing 1T+ tokens daily across 24,000+ organizations. Every request is logged with full context. Guardrails include PII redaction and jailbreak detection. Governance covers RBAC, audit trails, and budget enforcement.
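To give a sense of what a PII-redaction guardrail does before a prompt ever leaves your network, here is a deliberately minimal sketch. Production guardrails like Portkey's use far more robust detection than these two regexes; this only illustrates the shape of the transform.

```python
import re

# Two illustrative patterns; real guardrails cover many more PII classes
# and use detection methods well beyond regex matching.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(prompt: str) -> str:
    """Replace matched PII with a typed placeholder before forwarding upstream."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt
```

The gateway applies this (or jailbreak detection, or policy checks) inline on every request, which is why it is the natural enforcement point for governance.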
- Strengths: Best-in-class observability; strong governance; 1,600+ model integrations; MCP Gateway for agents.
- Limitations: Log-based pricing can get expensive at scale; no built-in evals or prompt optimization; enterprise features gated to higher tiers.
- Best for: Teams that need robust observability and governance but already have eval tooling.
Comparison table
| | Respan | OpenRouter | Vercel GW | LiteLLM | Cloudflare GW | Portkey |
|---|---|---|---|---|---|---|
| Type | Managed | Managed | Managed | Self-hosted OSS | Managed edge | OSS + Managed |
| Supply chain risk | None | None | None | High (PyPI) | None | Medium |
| Observability | Built-in | Minimal | Basic | Basic | Basic | Best-in-class |
| Evals & testing | Built-in | — | — | — | — | — |
| Prompt management | Built-in | — | — | — | — | Yes |
| Cost tracking | Built-in | Per-model | Per-model | Basic | Basic | Built-in |
| Guardrails | Yes | — | — | — | — | Yes |
| Reliability / uptime | 99.99% SLA | No public SLA | Vercel SLA | Self-managed | Cloudflare SLA | 99.99% SLA |
| Rate limits | Unlimited | Provider limits | Platform limits | Self-managed | Per-gateway limits | Tier-based |
| Free tier | Yes | Yes | $5/mo credit | Free (OSS) | Yes | Yes |
| SOC 2 | Yes | — | Enterprise | * | Cloudflare | Enterprise |
*LiteLLM's SOC 2 was certified via Delve, a compliance startup facing allegations of inaccurate reporting.
Which one should you choose?
Prototyping with multiple models? Start with OpenRouter — lowest friction, 300+ models, no infrastructure.
Building on Next.js/Vercel? The Vercel AI Gateway is the natural fit.
Need a caching layer on Cloudflare? One URL change, edge caching, done.
Observability and governance are the priority? Portkey has the deepest tracing and policy enforcement in the market.
Want full control and have a strong platform team? LiteLLM remains capable — but after March 24, "self-hosted" means "self-secured."
Want one platform for gateway, observability, evals, and prompts? That's the problem Respan was built to solve. Managed architecture, no supply chain risk, SOC 2 built in. Try it free.
The lesson from the LiteLLM breach isn't "don't use open source." The lesson is that AI gateways are critical infrastructure now, and they deserve the same rigor we apply to databases and auth systems. The teams that ask where their credentials live and who controls the release pipeline are the ones that won't be scrambling when the next attack lands.


