Use Respan AI gateway as a proxy for coding agents

  1. Sign up: Create an account at platform.respan.ai
  2. Create an API key: Generate one on the API keys page
  3. Add credits or a provider key: Add credits on the Credits page or connect your own provider key on the Integrations page

Overview

CLI coding agents like Claude Code, Codex CLI, Gemini CLI, and OpenCode talk to provider APIs through environment variables or TOML config. Point those config knobs at the Respan gateway instead of the upstream provider, and every request flows through Respan.

You unlock a lot with a single config change:

  • One kind of key for everyone. Issue a Respan key per developer instead of distributing separate OpenAI, Anthropic, or Google keys.
  • Model switching. Try GPT-5.5, Claude 4.6, or Gemini 3 from the same CLI by changing one string.
  • Fallbacks, retries, and caching. Turn them on through gateway parameters without touching the agent.
  • Cost tracking per developer, project, or sprint. Every request is logged with metadata you control.
  • Audit trail. Every prompt and response is captured, and revoking a key immediately cuts off access for anyone who leaves the team.

This cookbook covers the gateway setup. To also capture agent-level events such as thinking blocks, tool calls, and file edits, pair this with Trace CLI coding agents.

How it works

Each agent supports a “custom base URL” config. Respan exposes provider-compatible endpoints under one host, so the agent does not need to know it is talking to a gateway.

Provider           Native protocol               Respan endpoint
OpenAI             OpenAI Chat / Responses API   https://api.respan.ai/api/
Anthropic          Anthropic Messages API        https://api.respan.ai/api/anthropic/
Google Gemini      Gemini API                    https://api.respan.ai/api/google/gemini
Google Vertex AI   Vertex AI                     https://api.respan.ai/api/google/vertexai/

Authenticate with your RESPAN_API_KEY instead of the provider key. Respan looks up the upstream provider credentials from your account and forwards the request.
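
To sanity-check the gateway before touching any agent config, you can hit the OpenAI-compatible route directly. This is a minimal sketch: it assumes the standard /chat/completions path under the base URL, Bearer auth with your Respan key, and that the openai/ model prefix is accepted on direct calls; check the gateway API reference if any of these differ.

$curl -s https://api.respan.ai/api/chat/completions \
  -H "Authorization: Bearer YOUR_RESPAN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-5.5", "messages": [{"role": "user", "content": "ping"}]}'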

Use Respan AI gateway as a proxy for Claude Code

Claude Code reads ANTHROPIC_BASE_URL and ANTHROPIC_API_KEY from the environment or ~/.claude/settings.json.

Option 1: shell env vars

Add the following to .bashrc or .zshrc (on Windows, set the equivalent variables in your PowerShell $PROFILE):

$unset ANTHROPIC_AUTH_TOKEN
$export ANTHROPIC_BASE_URL="https://api.respan.ai/api/anthropic/"
$export ANTHROPIC_API_KEY="YOUR_RESPAN_API_KEY"

ANTHROPIC_AUTH_TOKEN takes precedence over ANTHROPIC_API_KEY when both are set, so unset it first.
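
Before launching Claude Code, it can help to confirm which Anthropic variables the shell will actually pass along:

$env | grep ANTHROPIC

You should see ANTHROPIC_BASE_URL and ANTHROPIC_API_KEY pointing at the gateway, and no ANTHROPIC_AUTH_TOKEN.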

Option 2: settings.json (persistent)

Use ~/.claude/settings.json for global config or .claude/settings.local.json for per-project:

{
  "env": {
    "ANTHROPIC_BASE_URL": "https://api.respan.ai/api/anthropic/",
    "ANTHROPIC_API_KEY": "YOUR_RESPAN_API_KEY",
    "ANTHROPIC_AUTH_TOKEN": ""
  }
}

The empty ANTHROPIC_AUTH_TOKEN clears any inherited token from your shell or terminal app.

On first interactive launch, Claude Code prompts "Detected a custom API key in your environment"; choose Yes. If you skipped it earlier, run /config, search for custom, and enable Use custom API key. Non-interactive runs (claude -p or claude --print) use ANTHROPIC_API_KEY automatically.

Switch Claude models

$claude --model claude-opus-4-6
$claude --model claude-sonnet-4-5-20250929
$claude --model claude-3-5-haiku-20241022
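
If you would rather pin the model than pass --model on every run, Claude Code also honors an ANTHROPIC_MODEL environment variable (worth confirming on your installed version), which you can set alongside the gateway variables:

$export ANTHROPIC_MODEL="claude-sonnet-4-5-20250929"
$claude -p "say hi"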

Use Respan AI gateway as a proxy for Codex CLI

Codex CLI reads provider config from ~/.codex/config.toml. Add a respan model provider entry and point model at it:

1model = "openai/gpt-5.5" # provider prefix required for Codex CLI
2model_provider = "respan"
3
4[model_providers.respan]
5name = "Respan Gateway"
6base_url = "https://api.respan.ai/api/"
7wire_api = "responses"
8env_key = "RESPAN_API_KEY"

Then export the key:

$export RESPAN_API_KEY="YOUR_RESPAN_API_KEY"

env_key is the name of the environment variable holding your key, not the key itself. wire_api = "responses" tells Codex to speak the OpenAI Responses API rather than Chat Completions; the gateway supports both.
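
Because env_key only names the variable, Codex has to be launched from a shell where RESPAN_API_KEY is actually exported. A quick smoke test:

$printenv RESPAN_API_KEY
$codex exec "say hi"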

Switch Codex models

Edit the model = line in ~/.codex/config.toml:

1model = "openai/gpt-5.5"
2# model = "openai/claude-sonnet-4-5-20250929"
3# model = "openai/gemini-2.5-flash"
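
For a one-off run you can skip the edit and override the model on the command line; recent Codex CLI builds accept a --model flag (check codex --help on your version):

$codex exec --model "openai/gemini-2.5-flash" "say hi"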

Use Respan AI gateway as a proxy for Gemini CLI

Gemini CLI reads GOOGLE_GEMINI_BASE_URL and GEMINI_API_KEY from the environment.

Gemini API endpoint

$export GOOGLE_GEMINI_BASE_URL="https://api.respan.ai/api/google/gemini"
$export GEMINI_API_KEY="YOUR_RESPAN_API_KEY"

Vertex AI endpoint

If your account is set up with a Google Cloud Vertex AI provider key, use:

$export GOOGLE_GEMINI_BASE_URL="https://api.respan.ai/api/google/vertexai/"
$export GEMINI_API_KEY="YOUR_RESPAN_API_KEY"
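
If you prefer not to touch shell startup files, recent Gemini CLI releases also read variables from a .env file (project-local or under ~/.gemini); confirm this against your installed version before relying on it:

$mkdir -p ~/.gemini
$cat >> ~/.gemini/.env <<'EOF'
GOOGLE_GEMINI_BASE_URL=https://api.respan.ai/api/google/gemini
GEMINI_API_KEY=YOUR_RESPAN_API_KEY
EOF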

Switch Gemini models

$gemini --model gemini-3-pro
$gemini --model gemini-2.5-flash
$gemini --model gemini-2.0-flash-exp

Use Respan AI gateway as a proxy for OpenCode

OpenCode speaks the OpenAI-compatible protocol, so point OPENAI_BASE_URL and OPENAI_API_KEY at the gateway:

$export OPENAI_BASE_URL="https://api.respan.ai/api/"
$export OPENAI_API_KEY="YOUR_RESPAN_API_KEY"

Then run with any model the gateway exposes (use the openai/ prefix):

$opencode run -m "openai/gpt-5.5" "your prompt"
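
If only some sessions should go through the gateway, scope the variables to a single invocation instead of exporting them globally:

$OPENAI_BASE_URL="https://api.respan.ai/api/" \
  OPENAI_API_KEY="YOUR_RESPAN_API_KEY" \
  opencode run -m "openai/gpt-5.5" "say hi"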

Switch OpenCode models

$opencode run -m "openai/gpt-5.5" "..."
$opencode run -m "openai/claude-sonnet-4-5-20250929" "..."
$opencode run -m "openai/gemini-2.5-flash" "..."

Verify

Run a single prompt with each agent and confirm the request shows up in Logs:

$claude -p "say hi"
$codex exec "say hi"
$gemini -p "say hi"
$opencode run -m openai/gpt-5.5 "say hi"

Each logged request shows the prompt, response, model, token counts, and cost. See the full model list for the 250+ models reachable through one gateway.

Reliability and cost features

Once requests flow through Respan, layer on gateway features without changing agent code. Set them per-key in the API keys settings, or per-request when you control the request body.

Feature                    What it does                                                                         Reference
Fallback models            If the primary model errors or is rate-limited, automatically retry on a backup.    Fallback models
Load balancing             Spread requests across providers, accounts, or regions.                             Load balancing
Retries                    Automatic retry with backoff for transient errors.                                  Retries
Response caching           Cache repeated prompts. Useful for /init, README scans, and similar boilerplate.    Caches
Per-customer credentials   Bring-your-own provider keys per end-user.                                          Provider keys
Cost limits                Stop runaway sessions before they burn through budget.                              Spend limits

Tag requests for cost tracking

The CLI agents above pass requests verbatim, so the cleanest way to attribute usage is to scope the API key to a developer or team. Issue keys per-developer on the API keys page and group them by tags. The Users dashboard breaks down spend by key out of the box.

For per-session metadata such as a Jira ticket, branch, or sprint, pair this gateway setup with the Trace CLI coding agents cookbook. The respan integrate hook supports RESPAN_CUSTOMER_ID and RESPAN_METADATA env vars that attach the right tags without any agent-side support.
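
As a rough sketch of what that looks like in practice (the exact value format for RESPAN_METADATA is defined in the Trace CLI cookbook; JSON is assumed here, and the tag names are examples):

$export RESPAN_CUSTOMER_ID="dev-jane"
$export RESPAN_METADATA='{"ticket": "PROJ-142", "sprint": "2026-01"}'
$claude -p "say hi"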

Combine with full tracing

The gateway captures every LLM request, but agent-level events such as thinking blocks, tool calls, and file edits live inside the agent process and never hit the network. To capture those too, add the tracing hook on top of the gateway config:

  1. Follow this cookbook to point the agent at https://api.respan.ai/api/....
  2. Then run respan integrate <agent> from Trace CLI coding agents.

The hook produces a parent span and the gateway’s LLM-call spans nest underneath. You see one trace per agent turn with thinking, tools, and the underlying chat.completion calls all linked.
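
Assuming claude is one of the agent names respan integrate accepts, the combined setup is just the gateway variables plus the hook:

$export ANTHROPIC_BASE_URL="https://api.respan.ai/api/anthropic/"
$export ANTHROPIC_API_KEY="YOUR_RESPAN_API_KEY"
$respan integrate claude
$claude -p "say hi"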

Troubleshooting

Claude Code: ANTHROPIC_AUTH_TOKEN (set by claude auth login or the OAuth flow) takes precedence over ANTHROPIC_API_KEY. Run unset ANTHROPIC_AUTH_TOKEN and restart your terminal, or set it to "" in settings.json as shown above. In an interactive session, you may also need to re-approve the custom API key: run /config, search for custom, and enable Use custom API key.

Codex CLI: wire_api = "responses" was added in a recent Codex CLI version. Update Codex with npm i -g @openai/codex and try again. Older versions only support wire_api = "chat", which the gateway also accepts but routes through Chat Completions only.

Gemini CLI: The gateway authenticates with your Respan key in GEMINI_API_KEY, not your Google API key. Confirm that echo $GEMINI_API_KEY returns a value starting with sk_. If you are using the Vertex AI endpoint, your Respan account must also have a Vertex AI provider key connected.

OpenCode: The openai/ provider prefix is required. Use -m "openai/gpt-5.5", not -m "gpt-5.5".

Next steps