Unified router or provider passthrough for 250+ models, with failover, response caching, warn/block limits, and metadata on every logged request.
One endpoint for 250+ models — router or passthrough, failover, response caching, warn/block limits, and metadata on every logged request.
Route OpenAI-style calls through Respan to 500+ models, or keep each provider’s native SDK on a passthrough endpoint—every request is logged.
If a model errors or rate-limits, try the next model in your fallback list, balance load across keys, and retry with backoff from one place.
Set soft warnings or hard caps per API key, get Slack or email alerts when a threshold crosses, and cache repeat prompts to cut cost and latency.
Routing, reliability, limits, and metadata on one surface — tied to tracing and evals on the same trace.
Chat Completions and Responses on the unified router, or Anthropic, Gemini, and Vertex passthroughs. Swap slugs, pass an inline models list, add custom aliases, pin X-Respan-Route-Provider, or set credential_override per model slug.
Platform fallback_models or per-request lists; load_balance_group across deployments and customer_credentials across keys; retry_params with backoff. Limits warn or block on cost, requests, or tokens per API key.
customer_identifier, metadata, thread_identifier, and disable_log on one call, via extra_body, respan_params, or X-Data-Respan-Params when needed. Filter logs and traces by any key; Users page breaks down spend per end user.
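To make the warn/block behavior concrete, here is a minimal sketch of how a soft warning and a hard cap might be evaluated against usage. LimitPolicy, evaluate, and the "at or above" threshold comparison are hypothetical names and assumptions for illustration, not Respan's implementation or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LimitPolicy:
    """Hypothetical policy: warn at a soft threshold, block at a hard cap."""
    warn_at: float
    block_at: float


def evaluate(usage: float, policy: LimitPolicy) -> str:
    """Return "block", "warn", or "ok" for the current usage.

    Assumption: a threshold triggers when usage reaches it (>=).
    """
    if usage >= policy.block_at:
        return "block"
    if usage >= policy.warn_at:
        return "warn"
    return "ok"


# Example: a cost limit in dollars for one API key (values are made up).
cost_policy = LimitPolicy(warn_at=80.0, block_at=100.0)
for spent in (42.0, 85.0, 100.0):
    print(spent, evaluate(spent, cost_policy))
```

The same shape would apply to request or token counts; only the unit of usage changes.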
Router or passthrough, limits, cache, failover, and logging: one lifecycle from your app through Respan to the model provider.
Pick the unified router or passthrough — swap model slugs or native SDKs; Respan logs every request.
Try fallback_models when the primary fails; cache hits return the stored response, scoped per customer.
Tag customer_identifier and metadata on calls; filter Logs for routing, token counts, and cost.
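The failover step in this lifecycle can be pictured as an ordered loop over models. The sketch below is a local simulation under stated assumptions: UpstreamError, call_with_fallback, fake_provider, and the model names are all hypothetical, and the gateway's real logic may differ.

```python
from typing import Callable, Optional, Sequence, Tuple


class UpstreamError(Exception):
    """Hypothetical stand-in for a provider error or rate limit."""


def call_with_fallback(
    models: Sequence[str],
    call_model: Callable[[str], str],
) -> Tuple[str, str]:
    """Try each model in order; return (model_used, response_text)."""
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return model, call_model(model)
        except UpstreamError as exc:
            last_error = exc  # remember the failure and try the next model
    raise RuntimeError("all models failed") from last_error


def fake_provider(model: str) -> str:
    # Hypothetical provider: the primary model is rate-limited.
    if model == "primary-model":
        raise UpstreamError("429 rate limited")
    return f"answer from {model}"


used, text = call_with_fallback(["primary-model", "backup-model"], fake_provider)
print(used, "->", text)
```

With a gateway, this loop runs server-side; the application only supplies the ordered list.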
Six gaps teams hit calling providers directly, and how Respan routing, cache, and automatic logging address each.
Per-team key sprawl with no shared caps. Issue Respan API keys per environment and team, set warn/block limit policies, and tag traffic with customer_identifier.
Upstream errors become user-facing downtime without a fallback list. Set fallback_models in Settings → Fallback or on the request.
Gateway retry_params plus app retries compound load. Configure retry_params in the platform or request and cap retries in your application.
Cache without cache_by_customer can return one user's answer to another. Enable cache_by_customer and validate cache_ttl before launch.
Calls that bypass the gateway miss unified logs. Route through Respan so every router and passthrough request is logged automatically.
Logs that lack customer_identifier or metadata cannot be filtered by feature, tenant, or thread. Send these params on production paths; thread_identifier groups multi-turn traffic.
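The retry-compounding gap above is easiest to see as arithmetic. The following sketch assumes the worst case, where every application attempt triggers a full gateway retry cycle; worst_case_upstream_calls is a hypothetical helper, not part of any SDK.

```python
def worst_case_upstream_calls(app_retries: int, gateway_retries: int) -> int:
    """Upstream attempts for one user request when both layers retry.

    Each application attempt (1 initial + app_retries) can trigger a full
    gateway cycle (1 initial + gateway_retries), so the counts multiply.
    """
    if app_retries < 0 or gateway_retries < 0:
        raise ValueError("retry counts must be non-negative")
    return (app_retries + 1) * (gateway_retries + 1)


for app, gw in [(0, 3), (2, 3), (3, 3)]:
    calls = worst_case_upstream_calls(app, gw)
    print(f"app_retries={app} gateway_retries={gw} -> {calls} upstream calls")
```

Three retries at each layer can mean 16 upstream calls for a single user request. With the OpenAI Python client, the max_retries constructor argument is one place to cap the application side.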
Point your client at https://api.respan.ai/api/, add provider keys, and ship — more examples in the docs.
Get your Respan API key
Sign up and create your first key on the API keys page.
Add provider credentials
Connect providers on Integrations or add credits on Billing.
Choose router or passthrough
One OpenAI-style base URL, or native Anthropic / Gemini URLs.
Send params on every call
Tag users, set fallback models, and enable cache in extra_body.
from openai import OpenAI

# Point the OpenAI client at the Respan unified router.
client = OpenAI(
    base_url="https://api.respan.ai/api/",
    api_key="YOUR_RESPAN_API_KEY",
)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello!"}],
    # Respan params travel in extra_body alongside the standard arguments.
    extra_body={
        "customer_identifier": "user_123",
        "metadata": {"feature": "chatbot", "environment": "production"},
        "fallback_models": ["claude-sonnet-4-20250514", "gemini-2.5-flash"],
        "cache_enabled": True,
        "cache_ttl": 600,
        "cache_options": {"cache_by_customer": True},
    },
)
print(response.choices[0].message.content)
Retries, cache policy, and streaming edge cases: what teams configure before the gateway becomes a single point of dependency.
Gateway retry_params can retry upstream while your app retries the gateway. Configure num_retries and retry_after in the platform or request body, and cap application retries so layers do not stack.
A cache_ttl that is too long can serve stale answers, and leaving cache_by_customer off can serve one user's answer to another. Set cache_options.is_cached_by_model when switching models so the same prompt does not return a cache entry from another model.
disable_log records metrics only, with no request/response payloads. cache_options.omit_log skips writing a new log on cache hits. Use these when you need cost and latency data without storing full bodies.
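Why cache scope matters can be shown with a toy cache key. This is an illustrative sketch only: cache_key and its hashing scheme are hypothetical and say nothing about how Respan builds keys internally. It shows that adding the customer or the model to the key keeps entries from crossing users or models.

```python
import hashlib
import json
from typing import Optional


def cache_key(
    prompt: str,
    model: Optional[str] = None,
    customer: Optional[str] = None,
) -> str:
    """Build a deterministic key; None means that dimension is not scoped."""
    payload = json.dumps(
        {"prompt": prompt, "model": model, "customer": customer},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


prompt = "What is my account balance?"

# Unscoped: two users share one key, so user B would read user A's entry.
shared_a = cache_key(prompt)
shared_b = cache_key(prompt)

# Scoped per customer: keys differ, so entries cannot cross users.
scoped_a = cache_key(prompt, customer="user_a")
scoped_b = cache_key(prompt, customer="user_b")

# Scoped per model: switching models does not reuse the old entry.
model_1 = cache_key(prompt, model="model-one")
model_2 = cache_key(prompt, model="model-two")

print(shared_a == shared_b)  # True
print(scoped_a == scoped_b)  # False
print(model_1 == model_2)    # False
```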
ISO 27001
Respan is fully compliant with ISO 27001, the internationally recognized standard for information security management.
SOC 2
We meet SOC 2 requirements to ensure secure and compliant management of data across all our systems.
GDPR
With operations designed for global compliance, we operate under GDPR - the world's strictest standard for data privacy.
HIPAA
Respan is HIPAA compliant with a Business Associate Agreement available for healthcare organizations.
Practical guides on the architecture decisions that surround the gateway:
Related guides: LLM tracing · LLM evals · LLM observability