# LlamaIndex

LlamaIndex is a framework for building LLM applications with your own data. It provides indexes, query engines, retrievers, and agents for retrieval-augmented generation. Respan gives you full observability over every query, retrieval, agent step, and LLM call — and gateway routing through the OpenAI-compatible Respan endpoint.
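Gateway routing means your LlamaIndex app sends OpenAI-format requests to Respan's endpoint instead of directly to the provider. As a rough sketch of what such a request looks like, the helper below builds a standard OpenAI-compatible chat-completions payload. The base URL shown is an assumption for illustration only (check your Respan dashboard for the real endpoint), and `build_gateway_request` is a hypothetical helper, not part of any SDK:

```python
import json
import os

# Hypothetical gateway base URL -- confirm the real value in your Respan dashboard.
RESPAN_BASE_URL = os.environ.get("RESPAN_BASE_URL", "https://api.respan.ai/v1")

def build_gateway_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible chat request aimed at the gateway endpoint."""
    return {
        "url": f"{RESPAN_BASE_URL}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {os.environ.get('RESPAN_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }
```

Because the wire format is OpenAI-compatible, any OpenAI client that accepts a custom base URL can be pointed at the gateway without code changes beyond configuration.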

Create an account at platform.respan.ai and grab an API key. To use the gateway, also add credits or a provider API key.

Run `npx @respan/cli setup` to set up Respan with your coding agent.

## Setup

### 1. Install packages

```shell
pip install respan-ai openinference-instrumentation-llama-index llama-index llama-index-llms-openai
```
### 2. Set environment variables

```shell
export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
export RESPAN_API_KEY="YOUR_RESPAN_API_KEY"
```

`OPENAI_API_KEY` is used for LLM requests. `RESPAN_API_KEY` is used to export traces to Respan.

### 3. Initialize and run

```python
from llama_index.llms.openai import OpenAI
from respan import Respan
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

respan = Respan(instrumentations=[LlamaIndexInstrumentor()])

llm = OpenAI(model="gpt-4.1-nano")

response = llm.complete("Say hello in three languages.")
print(response)
respan.flush()
```
### 4. View your trace

Open the Traces page to see your LlamaIndex workflow with query spans, retrievers, agent steps, and LLM calls.

## Configuration

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `api_key` | `str \| None` | `None` | Falls back to the `RESPAN_API_KEY` env var. |
| `base_url` | `str \| None` | `None` | Falls back to the `RESPAN_BASE_URL` env var. |
| `instrumentations` | `list` | `[]` | Plugin instrumentations to activate (e.g. `LlamaIndexInstrumentor()`). |
| `customer_identifier` | `str \| None` | `None` | Default customer identifier for all spans. |
| `metadata` | `dict \| None` | `None` | Default metadata attached to all spans. |
| `environment` | `str \| None` | `None` | Environment tag (e.g. `"production"`). |

## Attributes

### In `Respan()`

Set defaults at initialization — these apply to all spans.

```python
from respan import Respan
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

respan = Respan(
    instrumentations=[LlamaIndexInstrumentor()],
    customer_identifier="user_123",
    metadata={"service": "rag-api", "version": "1.0.0"},
)
```

### With `propagate_attributes`

Override per-request using a context scope.

```python
from llama_index.llms.openai import OpenAI
from respan import Respan, propagate_attributes
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

respan = Respan(instrumentations=[LlamaIndexInstrumentor()])
llm = OpenAI(model="gpt-4.1-nano")

def handle_request(user_id: str, question: str):
    with propagate_attributes(
        customer_identifier=user_id,
        thread_identifier="conv_abc_123",
        metadata={"plan": "pro"},
    ):
        response = llm.complete(question)
        print(response)
```
| Attribute | Type | Description |
| --- | --- | --- |
| `customer_identifier` | `str` | Identifies the end user in Respan analytics. |
| `thread_identifier` | `str` | Groups related messages into a conversation. |
| `metadata` | `dict` | Custom key-value pairs. Merged with default metadata. |
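The metadata merge in the last row can be sketched as a plain dict update. Note the precedence shown here (per-request keys winning over defaults on collision) is an assumption for illustration, not a documented guarantee:

```python
# Defaults set once in Respan(...), plus per-request metadata from
# propagate_attributes(...).
default_metadata = {"service": "rag-api", "version": "1.0.0"}
request_metadata = {"plan": "pro"}

# Later (per-request) keys override defaults on collision -- assumed precedence.
merged = {**default_metadata, **request_metadata}
```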

## Decorators (optional)

Decorators are not required. All LlamaIndex query engines, retrievers, agents, and LLM calls are auto-traced by the instrumentor. Use `@workflow` and `@task` to add structure when you want to group related runs into a named workflow with nested tasks.

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
from respan import Respan, workflow, task
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

respan = Respan(instrumentations=[LlamaIndexInstrumentor()])

@task(name="build_index")
def build_index(path: str):
    documents = SimpleDirectoryReader(path).load_data()
    return VectorStoreIndex.from_documents(documents)

@workflow(name="rag_pipeline")
def rag(question: str, path: str):
    index = build_index(path)
    query_engine = index.as_query_engine(llm=OpenAI(model="gpt-4.1-nano"))
    print(query_engine.query(question))

rag("What is the document about?", "./data")
respan.flush()
```

## Examples

### Query engine

Query engines are auto-traced as a single workflow with nested retriever and LLM spans.

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(llm=OpenAI(model="gpt-4.1-nano"))

response = query_engine.query("Summarize this document.")
print(response)
```