Use the gateway
Call 250+ LLMs through a unified API with automatic tracing, fallbacks, and caching.
Set up Respan
- Sign up — Create an account at platform.respan.ai
- Create an API key — Generate one on the API keys page
- Add credits or a provider key — Add credits on the Credits page or connect your own provider key on the Integrations page
Use AI
Add the Docs MCP to your AI coding tool to get help building with Respan. No API key needed.
What is the AI Gateway?
Respan’s AI Gateway lets you interface with 250+ large language models (LLMs) through one unified API.
Considerations:
- May not suit products with strict latency requirements (the gateway adds roughly 50–150 ms per request).
- May not be ideal if you prefer not to place a third-party service at the core of your application.
Use AI gateway
1. Get your Respan API key
After you create an account on Respan, you can get your API key from the API keys page.
2. Set up LLM provider API key
Environment Management: To separate test and production environments, create a separate API key for each environment instead of relying on an env parameter. Dedicated keys provide better security and a clearer boundary between your development and production workflows.
3. Call an LLM
Point any LLM SDK at the base URL https://api.respan.ai/api/ and authenticate with your Respan API key.
Example snippets are available for:
- OpenAI
- Anthropic
- Gemini
- OpenAI Agents
- Vercel AI
- Claude Agents
- API
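Whichever SDK you use, the request shape is the same. Here is a minimal sketch using only the Python standard library; the /chat/completions path, the OpenAI-style payload, and the gpt-4o-mini model name are assumptions based on the unified API, not confirmed by this page:

```python
import json
import urllib.request

BASE_URL = "https://api.respan.ai/api/"

def build_chat_request(api_key, model, messages):
    """Build an OpenAI-style chat completion request aimed at the gateway.
    The endpoint path and payload shape are assumed, not documented here."""
    payload = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        BASE_URL + "chat/completions",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",  # Respan API key, not a provider key
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "YOUR_RESPAN_API_KEY",
    "gpt-4o-mini",  # assumed model name; see the Models page
    [{"role": "user", "content": "Hello!"}],
)
# urllib.request.urlopen(req)  # would send the request; needs a valid key and credits
```

Because the gateway sits behind one base URL, switching providers is a matter of changing the model string, not the client code.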
Next steps
Now that you can make a basic call:
- Respan params & metadata: customer_identifier, metadata (custom properties), and the three ways to send platform-specific params.
- Providers & models: switch providers, custom model aliases, and per-request routing controls.
- Routing & passthrough: the two endpoint shapes (unified router vs. provider-native passthroughs).
- Reliability: fallback models, load balancing, retries.
- Caching and Limits.
Supported models
Browse available models on the Models page.
Streaming
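A streamed response arrives as a sequence of incremental chunks. Assuming the gateway uses OpenAI-style server-sent events (a "data: {...}" line per chunk, terminated by "data: [DONE]" — an assumption, not confirmed by this page), the client-side assembly looks like this:

```python
import json

def parse_sse_chunks(raw_lines):
    """Join content deltas from OpenAI-style streaming lines.
    The wire format ("data: {...}" / "data: [DONE]") is assumed."""
    parts = []
    for line in raw_lines:
        if not line.startswith("data: "):
            continue  # skip keep-alives and blank separators
        body = line[len("data: "):]
        if body == "[DONE]":
            break
        delta = json.loads(body)["choices"][0]["delta"]
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

# Two fabricated chunks, for illustration only:
chunks = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print(parse_sse_chunks(chunks))  # prints Hello
```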
Function calling
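Function calling lets the model request a structured tool invocation instead of free text. A hedged sketch of an OpenAI-style tools payload; the tool itself (get_weather and its schema) is illustrative, and the exact fields the gateway accepts are assumed from that format:

```python
# Illustrative function-calling payload (OpenAI-style "tools" array).
payload = {
    "model": "gpt-4o-mini",  # assumed model name
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for this example
            "description": "Get current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}
```

When the model decides to call the tool, the response carries the function name and JSON arguments; your code runs the function and sends the result back in a follow-up message.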
Enable thinking
Thinking mode allows supported models to show their reasoning process before providing the final answer.
Choose a model that supports thinking, such as gpt-5 or claude-sonnet-4-20250514. See Log content types for details on the response structure.
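One concrete shape this takes is Anthropic's extended-thinking parameter. A hedged sketch; whether the gateway forwards this field unchanged, and the budget value shown, are assumptions not confirmed by this page:

```python
# Illustrative request enabling extended thinking (Anthropic-style field).
payload = {
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 2048,
    # "thinking" is Anthropic's parameter; pass-through behavior is assumed.
    "thinking": {"type": "enabled", "budget_tokens": 1024},
    "messages": [{"role": "user", "content": "How many primes are below 50?"}],
}
```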
Upload PDF
Upload image
Pass images using image_url content blocks or via prompt variables.
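A minimal sketch of the image_url content-block form described above, using an OpenAI-style multimodal message; the URL is a placeholder and the exact accepted fields are assumed from that format:

```python
# One user message mixing text and an image via an image_url block.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url",
         "image_url": {"url": "https://example.com/photo.png"}},  # placeholder URL
    ],
}
```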
Prompt caching (Anthropic only)
Prompt caching stores the model’s intermediate computation state for a reused prompt prefix. Subsequent requests that share that prefix skip reprocessing it from scratch, which cuts cost and latency while the model still generates fresh responses.
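In Anthropic's API, the cacheable prefix is marked with a cache_control annotation on a content block. A hedged sketch; the ephemeral cache type is Anthropic's, and whether the gateway forwards it unchanged is an assumption:

```python
# Illustrative request marking a long, stable system prompt as cacheable.
payload = {
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 256,
    "system": [{
        "type": "text",
        "text": "You are a support agent. (Imagine a long, stable policy here.)",
        "cache_control": {"type": "ephemeral"},  # Anthropic's cache marker
    }],
    "messages": [{"role": "user", "content": "Summarize the policy above."}],
}
```

Only the annotated prefix is cached; the trailing user turn still varies freely between requests.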