Use the gateway

Call 250+ LLMs through a unified API with automatic tracing, fallbacks, and caching.

  1. Sign up — Create an account at platform.respan.ai
  2. Create an API key — Generate one on the API keys page
  3. Add credits or a provider key — Add credits on the Credits page or connect your own provider key on the Integrations page

Add the Docs MCP to your AI coding tool to get help building with Respan. No API key needed.

{
  "mcpServers": {
    "respan-docs": {
      "url": "https://mcp.respan.ai/mcp/docs"
    }
  }
}

What is the AI gateway?

Respan’s AI Gateway lets you call 250+ large language models (LLMs) through one unified API.

Considerations:

  • May not be suitable for products with strict latency requirements (adds roughly 50–150 ms per request).
  • May not be ideal for those who do not want to integrate a third-party service into the core of their application.

Use the AI gateway

1. Get your Respan API key

After you create an account on Respan, you can get your API key from the API keys page.


2. Set up LLM provider API key

Environment Management: To separate test and production environments, create separate API keys for each environment instead of using an env parameter. This approach provides better security and clearer separation between your development and production workflows.
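For example, here is a minimal sketch of picking the right key at startup. The environment variable names (APP_ENV, RESPAN_API_KEY_TEST, RESPAN_API_KEY_PROD) are illustrative conventions, not part of the Respan API:

import os

from openai import OpenAI

# Select the Respan key that matches the running environment.
env = os.getenv("APP_ENV", "test")
api_key = os.environ["RESPAN_API_KEY_PROD" if env == "production" else "RESPAN_API_KEY_TEST"]

client = OpenAI(
    base_url="https://api.respan.ai/api/",
    api_key=api_key,
)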

To activate the AI gateway, you must add your own provider credentials; we use them only to call LLMs on your behalf.
For example, to use OpenAI, add your OpenAI API key. We won’t use your credentials for any other purpose.

3. Call an LLM

Point any LLM SDK at https://api.respan.ai/api/ and use your Respan API key.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.respan.ai/api/",
    api_key="YOUR_RESPAN_API_KEY",
)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

All integrations →


Next steps

Now that you can make a basic call:

Supported models

Browse available models on the Models page.
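If you prefer to check availability from code, you can point the OpenAI SDK's model-listing call at the gateway. Whether the gateway exposes an OpenAI-compatible /models endpoint is an assumption here, so treat this as a sketch:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.respan.ai/api/",
    api_key="YOUR_RESPAN_API_KEY",
)

# List the model IDs the endpoint reports (assumes an OpenAI-compatible /models route).
for model in client.models.list():
    print(model.id)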


Streaming

from openai import OpenAI

client = OpenAI(
    base_url="https://api.respan.ai/api/",
    api_key="YOUR_RESPAN_API_KEY",
)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)

for chunk in response:
    print(chunk)
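Each chunk follows the OpenAI streaming format, so if you only want the generated text you can replace the print loop above with something like:

text_parts = []
for chunk in response:
    # Some chunks (for example a trailing usage chunk) may carry no content delta.
    if chunk.choices and chunk.choices[0].delta.content:
        text_parts.append(chunk.choices[0].delta.content)
print("".join(text_parts))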

Function calling

from openai import OpenAI

client = OpenAI(
    base_url="https://api.respan.ai/api/",
    api_key="YOUR_RESPAN_API_KEY",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        }
    }
]
messages = [{"role": "user", "content": "What's the weather like in Boston today?"}]
completion = client.chat.completions.create(
    model="gpt-5.4",
    messages=messages,
    tools=tools,
    tool_choice="auto"
)
print(completion)
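The model's tool request comes back on the message rather than as a final answer. Continuing the example above, here is a minimal sketch of reading the tool call and sending the result back; the local weather lookup is a hypothetical stub:

import json

message = completion.choices[0].message
if message.tool_calls:
    tool_call = message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)

    # Hypothetical local implementation of the tool.
    weather = {"location": args["location"], "temperature": "72", "unit": args.get("unit", "fahrenheit")}

    messages.append(message)  # the assistant turn that requested the tool
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(weather),
    })

    followup = client.chat.completions.create(
        model="gpt-5.4",
        messages=messages,
        tools=tools,
    )
    print(followup.choices[0].message.content)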

Enable thinking

Thinking mode allows supported models to show their reasoning process before providing the final answer.

payload = {
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 16000,
    "thinking": {
        "type": "enabled",
        "budget_tokens": 10000
    },
    "messages": [
        {
            "role": "user",
            "content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"
        }
    ]
}

Choose a model that supports thinking, such as gpt-5 or claude-sonnet-4-20250514. See Log content types for details on the response structure.
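One way to send this payload is through the Anthropic SDK pointed at the gateway's Anthropic-compatible endpoint (the same base URL used in the prompt caching example below); treat this as a sketch rather than the only supported route:

import anthropic

client = anthropic.Anthropic(
    base_url="https://api.respan.ai/api/anthropic/",
    api_key="YOUR_RESPAN_API_KEY",
)

# Unpack the payload from above into the Messages API call.
message = client.messages.create(**payload)

# Thinking blocks and the final answer arrive as separate content blocks.
for block in message.content:
    print(block.type)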


Upload PDF

import os
import base64
import requests
from openai import OpenAI

openai_client = OpenAI()
pdf_url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
response = requests.get(pdf_url)
file_data = response.content
file = openai_client.files.create(file=file_data, purpose="user_data")

client = OpenAI(
    base_url="https://api.respan.ai/api",
    api_key=os.getenv("RESPAN_API_KEY"),
)

file_content = [
    {"type": "text", "text": "What's this file about?"},
    {
        "type": "file",
        "file": {
            "file_id": file.id,
        },
    }
]

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {
            "role": "user",
            "content": file_content,
        }
    ],
)
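The response is a normal chat completion, so the model's summary of the PDF is available in the usual place:

print(response.choices[0].message.content)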

Upload image

Pass images using image_url content blocks or via prompt variables.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.respan.ai/api/",
    api_key="YOUR_RESPAN_API_KEY",
)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": [
        {"type": "text", "text": "What do you see?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}}
    ]}],
)
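For local files, the same image_url block accepts a base64 data URL, so you can inline the image instead of hosting it. A sketch continuing the example above, assuming a local cat.png:

import base64

with open("cat.png", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": [
        {"type": "text", "text": "What do you see?"},
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{encoded}"}}
    ]}],
)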

Prompt caching (Anthropic only)

Prompt caching stores the model’s intermediate computation state for a reused prompt prefix. Later requests that share that prefix can still generate fresh responses while saving computational cost, because the model doesn’t need to reprocess the entire prompt from scratch.

Only available for Anthropic models through the gateway.

import anthropic

client = anthropic.Anthropic(
    base_url="https://api.respan.ai/api/anthropic/",
    api_key="YOUR_RESPAN_API_KEY",
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,  # required by the Messages API
    system=[
        {
            "type": "text",
            "text": "You are an AI assistant tasked with analyzing literary works.",
        },
        {
            "type": "text",
            "text": "<the entire contents of 'Pride and Prejudice'>",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": "Analyze the major themes."}]
)
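To confirm the cache is being used, the usage block on the response reports cache activity (field names per Anthropic's Messages API):

# On the first call this reflects cache writes; on subsequent calls, cache reads.
print(message.usage.cache_creation_input_tokens)
print(message.usage.cache_read_input_tokens)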