Quickstart

Respan AI gateway lets you call 250+ LLMs using the same input/output format.

  1. Sign up — Create an account at platform.respan.ai
  2. Create an API key — Generate one on the API keys page
  3. Add credits or a provider key — Add credits on the Credits page or connect your own provider key on the Integrations page

Add the Docs MCP to your AI coding tool to get help building with Respan. No API key needed.

{
  "mcpServers": {
    "respan-docs": {
      "url": "https://mcp.respan.ai/mcp/docs"
    }
  }
}

What is AI gateway?

Respan’s AI Gateway lets you interface with 250+ large language models (LLMs) via one unified API.

Considerations:

  • May not be suitable for products with strict latency requirements (adds 50–150 ms per request).
  • May not be ideal for those who do not want to integrate a third-party service into the core of their application.

Use AI gateway

1. Get your Respan API key

After you create an account on Respan, you can get your API key from the API keys page.


2. Set up LLM provider API key

Environment Management: To separate test and production environments, create separate API keys for each environment instead of using an env parameter. This approach provides better security and clearer separation between your development and production workflows.

All AI gateway users must add their own provider credentials to activate AI gateway; we use those credentials to call LLMs on your behalf.
For example, if you want to use OpenAI, add your OpenAI API key. We won’t use your credentials for any other purpose.

3. Call an LLM

You can use a standard API call to connect to any of the 250+ LLMs.

import requests

def demo_call(input,
              model="gpt-4o-mini",
              token="YOUR_RESPAN_API_KEY"
              ):
    headers = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {token}',
    }

    data = {
        'model': model,
        'messages': [{'role': 'user', 'content': input}],
    }

    response = requests.post('https://api.respan.ai/api/chat/completions', headers=headers, json=data)
    return response

messages = "Say 'Hello World'"
print(demo_call(messages).json())

4. Parameters

We support all OpenAI parameters, the standard format for LLM APIs. You can review the most important OpenAI parameters on this page, and learn more about OpenAI parameters here.

Respan also provides its own parameters for specific goals. For example, you can use fallback_models to specify fallback models to try when the primary model is down. You can check out all Respan parameters on this page.
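As a sketch, fallback_models can be added alongside the standard OpenAI-format fields in the request body. The fallback model IDs below are illustrative picks, not a recommended configuration:

```python
def build_payload(prompt: str) -> dict:
    """Build a chat-completions request body with Respan's fallback_models.

    All fields except fallback_models follow the standard OpenAI format;
    fallback_models is a Respan-specific parameter listing models to try,
    in order, if the primary model is unavailable.
    """
    return {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": prompt}],
        # Illustrative fallbacks; pick models from the Models page.
        "fallback_models": ["claude-3-5-haiku-20241022", "gemini-1.5-flash"],
    }

payload = build_payload("Say 'Hello World'")
```

Pass this payload as the json argument to the same endpoint used in step 3.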

Supported models

Browse available models on the Models page. You can see each model’s description, pricing, and other metrics.

Model family

Click an exact model to see its model family — a group of models hosted by different LLM providers.

Integration code

Click the Code button to copy the integration code in the language you are using.

Call models in different frameworks

from openai import OpenAI

client = OpenAI(
    base_url="https://api.respan.ai/api/",
    api_key="YOUR_RESPAN_API_KEY",
)

response = client.chat.completions.create(
    model="claude-3-5-haiku-20241022",
    messages=[
        {"role": "user", "content": "Tell me a long story"}
    ]
)