Caching

Cache LLM responses to reduce costs and latency.

This page covers Respan’s response caching — storing and reusing exact request/response pairs. For provider-level prompt caching (Anthropic), see Prompt caching in the gateway overview.

Caching stores responses and reuses them for exact repeat requests. Enable caching to reduce LLM costs and improve response times.

  • Reduce latency: Serve stored responses instantly, eliminating repeated API calls.
  • Save costs: Minimize expenses by reusing cached responses.
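Conceptually, response caching is an exact-match, TTL-bound lookup keyed on the request. The sketch below illustrates the idea only; it is not Respan's actual implementation:

```python
import hashlib
import json
import time

class ResponseCache:
    """Exact-match response cache with per-entry TTL (illustrative sketch)."""

    def __init__(self):
        self._store = {}  # key -> (expires_at, response)

    def _key(self, request: dict) -> str:
        # Exact match: identical requests (same messages, same order)
        # produce the same key; any difference is a cache miss.
        return hashlib.sha256(
            json.dumps(request, sort_keys=True).encode()
        ).hexdigest()

    def get(self, request: dict):
        entry = self._store.get(self._key(request))
        if entry is None:
            return None
        expires_at, response = entry
        if time.time() >= expires_at:
            del self._store[self._key(request)]  # expired entry
            return None
        return response

    def put(self, request: dict, response: str, ttl: float):
        self._store[self._key(request)] = (time.time() + ttl, response)

cache = ResponseCache()
req = {"model": "gpt-5.4", "messages": [{"role": "user", "content": "Hi"}]}
cache.put(req, "Hello!", ttl=600)
print(cache.get(req))  # → Hello!
```

A second call with the same request within the TTL returns the stored response without touching the provider; any change to the messages produces a different key and a miss.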

Turn on caching by setting cache_enabled to true. The whole conversation is cached, including the system message, the user messages, and the response.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.respan.ai/api/",
    api_key="YOUR_RESPAN_API_KEY",
)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "user", "content": "Tell me a long story"}
    ],
    extra_body={
        "cache_enabled": True,
        "cache_ttl": 600,
        "cache_options": {
            "cache_by_customer": True
        }
    }
)

Cache parameters

cache_enabled
boolean

Enable or disable caches.

{
  "cache_enabled": true
}
cache_ttl
number

Time-to-live (TTL) for the cache in seconds.

Optional. Defaults to 30 days.

{
  "cache_ttl": 3600
}
cache_options
object

Cache behavior options.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| cache_by_customer | boolean | false | Create separate cache entries per customer_identifier |
| is_cached_by_model | boolean | false | Create separate cache entries per model name. Use this to invalidate caches when switching models; without it, the same prompt returns the cached response from any model. |
| omit_log | boolean | false | Don't log the request when the cache is hit |

Optional.

{
  "cache_options": {
    "cache_by_customer": true,
    "is_cached_by_model": true,
    "omit_log": false
  }
}
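The options above effectively widen the cache key: enabling them folds the customer identifier or model name into the lookup, so otherwise-identical prompts stop sharing one entry. A hedged sketch of the idea (assumed key derivation for illustration, not Respan's actual scheme):

```python
import hashlib
import json

def cache_key(messages, model=None, customer_identifier=None,
              cache_by_customer=False, is_cached_by_model=False):
    """Illustrative only: show how cache_options could widen the cache key."""
    parts = {"messages": messages}
    if cache_by_customer:
        parts["customer"] = customer_identifier  # separate entries per customer
    if is_cached_by_model:
        parts["model"] = model  # separate entries per model name
    return hashlib.sha256(
        json.dumps(parts, sort_keys=True).encode()
    ).hexdigest()

msgs = [{"role": "user", "content": "Tell me a long story"}]

# Without is_cached_by_model, two models share one cache entry:
assert cache_key(msgs, model="gpt-5.4") == cache_key(msgs, model="other-model")

# With it, switching models produces a fresh key, i.e. a cache miss:
assert cache_key(msgs, model="gpt-5.4", is_cached_by_model=True) != \
       cache_key(msgs, model="other-model", is_cached_by_model=True)
```

This is why is_cached_by_model is the documented way to invalidate caches when switching models.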

View caches

You can view the caches on the Logs page. The model tag will be respan/cache. You can also filter the logs by the Cache hit field.


Omit logs when cache hit

Set the omit_log parameter to true, or go to Caches in Settings. When enabled, a cache hit won't generate a new LLM log.
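Per the cache_options table above, the request-level version looks like this (pass it via extra_body, as in the earlier example):

```python
# Request options that suppress logging on cache hits.
extra_body = {
    "cache_enabled": True,
    "cache_options": {
        "omit_log": True,  # don't log the request when the cache is hit
    },
}
print(extra_body["cache_options"]["omit_log"])  # → True
```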