Caching

Cache LLM responses to reduce costs and latency.

This page covers Respan’s response caching — storing and reusing exact request/response pairs. For provider-level prompt caching (Anthropic), see Prompt caching in the gateway overview.

Caching stores responses and reuses them for exact repeat requests. Enable caching to reduce LLM costs and improve response times.

  • Reduce latency: Serve stored responses instantly, eliminating repeated API calls.
  • Save costs: Minimize expenses by reusing cached responses.
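Conceptually, response caching is an exact-match, TTL-bound lookup keyed on the request. The sketch below illustrates the idea only; it is not Respan's actual implementation:

```python
import hashlib
import json
import time

class ResponseCache:
    """Exact-match response cache with per-entry TTL (illustrative sketch)."""

    def __init__(self):
        self._store = {}  # key -> (expires_at, response)

    def _key(self, request: dict) -> str:
        # Exact match: identical requests (same messages, same order)
        # produce the same key; any difference is a cache miss.
        return hashlib.sha256(
            json.dumps(request, sort_keys=True).encode()
        ).hexdigest()

    def get(self, request: dict):
        entry = self._store.get(self._key(request))
        if entry is None:
            return None
        expires_at, response = entry
        if time.time() >= expires_at:
            del self._store[self._key(request)]  # expired entry
            return None
        return response

    def put(self, request: dict, response: str, ttl: float):
        self._store[self._key(request)] = (time.time() + ttl, response)

cache = ResponseCache()
req = {"model": "gpt-5.4", "messages": [{"role": "user", "content": "Hi"}]}
cache.put(req, "Hello!", ttl=600)
print(cache.get(req))  # → Hello!
```

A second call with the same request within the TTL returns the stored response without touching the provider; any change to the messages produces a different key and a miss.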

Turn on caching by setting cache_enabled to true. The whole conversation is cached, including the system message, the user messages, and the response.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.respan.ai/api/",
    api_key="YOUR_RESPAN_API_KEY",
)

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "user", "content": "Tell me a long story"}
    ],
    extra_body={
        "cache_enabled": True,
        "cache_ttl": 600,
        "cache_options": {
            "cache_by_customer": True
        }
    }
)

Cache parameters

cache_enabled
boolean

Enable or disable caches.

{
  "cache_enabled": true
}
cache_ttl
number

Time-to-live (TTL) for the cache in seconds.

Optional. Defaults to 30 days.

{
  "cache_ttl": 3600
}
cache_options
object

Cache behavior options.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| cache_by_customer | boolean | false | Create separate cache entries per customer_identifier |
| is_cached_by_model | boolean | false | Create separate cache entries per model name. Use this to invalidate caches when switching models; without it, the same prompt returns the cached response from any model. |
| omit_log | boolean | false | Don't log the request when the cache is hit |

Optional.

{
  "cache_options": {
    "cache_by_customer": true,
    "is_cached_by_model": true,
    "omit_log": false
  }
}
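The options above effectively widen the cache key: enabling them folds the customer identifier or model name into the lookup, so otherwise-identical prompts stop sharing one entry. A hedged sketch of the idea (assumed key derivation for illustration, not Respan's actual scheme):

```python
import hashlib
import json

def cache_key(messages, model=None, customer_identifier=None,
              cache_by_customer=False, is_cached_by_model=False):
    """Illustrative only: show how cache_options could widen the cache key."""
    parts = {"messages": messages}
    if cache_by_customer:
        parts["customer"] = customer_identifier  # separate entries per customer
    if is_cached_by_model:
        parts["model"] = model  # separate entries per model name
    return hashlib.sha256(
        json.dumps(parts, sort_keys=True).encode()
    ).hexdigest()

msgs = [{"role": "user", "content": "Tell me a long story"}]

# Without is_cached_by_model, two models share one cache entry:
assert cache_key(msgs, model="gpt-5.4") == cache_key(msgs, model="other-model")

# With it, switching models produces a fresh key, i.e. a cache miss:
assert cache_key(msgs, model="gpt-5.4", is_cached_by_model=True) != \
       cache_key(msgs, model="other-model", is_cached_by_model=True)
```

This is why is_cached_by_model is the documented way to invalidate caches when switching models.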

View caches

You can view the caches on the Logs page. The model tag will be respan/cache. You can also filter the logs by the Cache hit field.


Omit logs when cache hit

Set the omit_log parameter to true, or go to Caches in Settings. When enabled, a cache hit won't generate a new LLM log.
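Per the cache_options table above, the request-level version looks like this (pass it via extra_body, as in the earlier example):

```python
# Request options that suppress logging on cache hits.
extra_body = {
    "cache_enabled": True,
    "cache_options": {
        "omit_log": True,  # don't log the request when the cache is hit
    },
}
print(extra_body["cache_options"]["omit_log"])  # → True
```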