# LlamaIndex

LlamaIndex is a framework for building LLM applications with your own data. It provides indexes, query engines, retrievers, and agents for retrieval-augmented generation. Respan gives you full observability over every query, retrieval, agent step, and LLM call — and gateway routing through the OpenAI-compatible Respan endpoint.
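Gateway routing means your LlamaIndex app sends OpenAI-format requests to Respan's endpoint instead of directly to the provider. As a rough sketch of what such a request looks like, the helper below builds a standard OpenAI-compatible chat-completions payload. The base URL shown is an assumption for illustration only (check your Respan dashboard for the real endpoint), and `build_gateway_request` is a hypothetical helper, not part of any SDK:

```python
import json
import os

# Hypothetical gateway base URL -- confirm the real value in your Respan dashboard.
RESPAN_BASE_URL = os.environ.get("RESPAN_BASE_URL", "https://api.respan.ai/v1")

def build_gateway_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible chat request aimed at the gateway endpoint."""
    return {
        "url": f"{RESPAN_BASE_URL}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {os.environ.get('RESPAN_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }
```

Because the wire format is OpenAI-compatible, any OpenAI client that accepts a custom base URL can be pointed at the gateway without code changes beyond configuration.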

Create an account at platform.respan.ai and grab an API key. To use the gateway, also add credits or a provider API key.

Run `npx @respan/cli setup` to set up Respan with your coding agent.

## Setup

### 1. Install packages

```shell
pip install respan-ai openinference-instrumentation-llama-index llama-index llama-index-llms-openai
```
### 2. Set environment variables

```shell
export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
export RESPAN_API_KEY="YOUR_RESPAN_API_KEY"
```

`OPENAI_API_KEY` is used for LLM requests. `RESPAN_API_KEY` is used to export traces to Respan.

### 3. Initialize and run

```python
from llama_index.llms.openai import OpenAI
from respan import Respan
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

respan = Respan(instrumentations=[LlamaIndexInstrumentor()])

llm = OpenAI(model="gpt-4.1-nano")

response = llm.complete("Say hello in three languages.")
print(response)
respan.flush()
```
### 4. View your trace

Open the Traces page to see your LlamaIndex workflow with query spans, retrievers, agent steps, and LLM calls.

## Configuration

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `api_key` | `str \| None` | `None` | Falls back to the `RESPAN_API_KEY` env var. |
| `base_url` | `str \| None` | `None` | Falls back to the `RESPAN_BASE_URL` env var. |
| `instrumentations` | `list` | `[]` | Plugin instrumentations to activate (e.g. `LlamaIndexInstrumentor()`). |
| `customer_identifier` | `str \| None` | `None` | Default customer identifier for all spans. |
| `metadata` | `dict \| None` | `None` | Default metadata attached to all spans. |
| `environment` | `str \| None` | `None` | Environment tag (e.g. `"production"`). |

## Attributes

### In `Respan()`

Set defaults at initialization — these apply to all spans.

```python
from respan import Respan
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

respan = Respan(
    instrumentations=[LlamaIndexInstrumentor()],
    customer_identifier="user_123",
    metadata={"service": "rag-api", "version": "1.0.0"},
)
```

### With `propagate_attributes`

Override per-request using a context scope.

```python
from llama_index.llms.openai import OpenAI
from respan import Respan, propagate_attributes
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

respan = Respan(instrumentations=[LlamaIndexInstrumentor()])
llm = OpenAI(model="gpt-4.1-nano")

def handle_request(user_id: str, question: str):
    with propagate_attributes(
        customer_identifier=user_id,
        thread_identifier="conv_abc_123",
        metadata={"plan": "pro"},
    ):
        response = llm.complete(question)
        print(response)
```
| Attribute | Type | Description |
| --- | --- | --- |
| `customer_identifier` | `str` | Identifies the end user in Respan analytics. |
| `thread_identifier` | `str` | Groups related messages into a conversation. |
| `metadata` | `dict` | Custom key-value pairs. Merged with default metadata. |
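The metadata merge in the last row can be sketched as a plain dict update. Note the precedence shown here (per-request keys winning over defaults on collision) is an assumption for illustration, not a documented guarantee:

```python
# Defaults set once in Respan(...), plus per-request metadata from
# propagate_attributes(...).
default_metadata = {"service": "rag-api", "version": "1.0.0"}
request_metadata = {"plan": "pro"}

# Later (per-request) keys override defaults on collision -- assumed precedence.
merged = {**default_metadata, **request_metadata}
```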

## Decorators (optional)

Decorators are not required. All LlamaIndex query engines, retrievers, agents, and LLM calls are auto-traced by the instrumentor. Use `@workflow` and `@task` to add structure when you want to group related runs into a named workflow with nested tasks.

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
from respan import Respan, workflow, task
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

respan = Respan(instrumentations=[LlamaIndexInstrumentor()])

@task(name="build_index")
def build_index(path: str):
    documents = SimpleDirectoryReader(path).load_data()
    return VectorStoreIndex.from_documents(documents)

@workflow(name="rag_pipeline")
def rag(question: str, path: str):
    index = build_index(path)
    query_engine = index.as_query_engine(llm=OpenAI(model="gpt-4.1-nano"))
    print(query_engine.query(question))

rag("What is the document about?", "./data")
respan.flush()
```

## Examples

### Query engine

Query engines are auto-traced as a single workflow with nested retriever and LLM spans.

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(llm=OpenAI(model="gpt-4.1-nano"))

response = query_engine.query("Summarize this document.")
print(response)
```