Ollama

Ollama lets you run large language models locally. It provides a simple CLI and API for downloading, running, and managing models like Llama, Mistral, and Gemma on your own hardware. Respan gives you full observability over every Ollama call, streamed response, and tool invocation — and gateway routing through the OpenAI-compatible Respan endpoint when you want to mix local and hosted models.
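To route through the gateway, you point an OpenAI-compatible client at the Respan endpoint instead of calling Ollama directly. A minimal sketch using the openai package, assuming a gateway base URL of https://api.respan.ai/v1 and an Ollama-backed model name of "ollama/llama3.1" (both are placeholders; check your Respan dashboard for the exact values):

import os
from openai import OpenAI

# Placeholder base URL and model name; confirm both in your Respan dashboard.
client = OpenAI(
    base_url="https://api.respan.ai/v1",
    api_key=os.environ["RESPAN_API_KEY"],
)

response = client.chat.completions.create(
    model="ollama/llama3.1",
    messages=[{"role": "user", "content": "Say hello in three languages."}],
)
print(response.choices[0].message.content)

The rest of this page covers the local, auto-instrumented path.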

Create an account at platform.respan.ai and grab an API key. To use the gateway, also add credits or a provider key.

Run npx @respan/cli setup to configure Respan with your coding agent.

Setup

1. Install packages

$ pip install respan-ai opentelemetry-instrumentation-ollama ollama
2. Set environment variables

$ export RESPAN_API_KEY="YOUR_RESPAN_API_KEY"

RESPAN_API_KEY is used to export traces to Respan. Ollama runs locally and needs no API key.
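If you would rather not depend on the environment variable, the Respan constructor also accepts the key directly (see the Configuration table below). A minimal sketch:

import os
from respan import Respan
from opentelemetry.instrumentation.ollama import OllamaInstrumentor

# Passing api_key explicitly; if omitted, it falls back to the RESPAN_API_KEY env var.
respan = Respan(
    api_key=os.environ.get("RESPAN_API_KEY", "YOUR_RESPAN_API_KEY"),
    instrumentations=[OllamaInstrumentor()],
)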

3. Initialize and run

from ollama import Client
from respan import Respan
from opentelemetry.instrumentation.ollama import OllamaInstrumentor

respan = Respan(instrumentations=[OllamaInstrumentor()])

client = Client()

response = client.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "Say hello in three languages."}],
)
print(response["message"]["content"])
respan.flush()
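Streaming works the same way: pass stream=True to the Ollama client and iterate over the chunks; the streamed exchange is still captured. A minimal sketch reusing the client and model from the step above:

# Streamed chat: each chunk carries a partial assistant message.
stream = client.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "Write a haiku about local models."}],
    stream=True,
)
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)
print()
respan.flush()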
4. View your trace

Open the Traces page to see your auto-instrumented Ollama spans with prompts, tokens, and latency.

Configuration

Parameter            Type         Default  Description
api_key              str | None   None     Falls back to RESPAN_API_KEY env var.
base_url             str | None   None     Falls back to RESPAN_BASE_URL env var.
instrumentations     list         []       Plugin instrumentations to activate (e.g. OllamaInstrumentor()).
customer_identifier  str | None   None     Default customer identifier for all spans.
metadata             dict | None  None     Default metadata attached to all spans.
environment          str | None   None     Environment tag (e.g. "production").
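environment is the one parameter not shown in the examples below; a minimal sketch that sets it alongside default metadata at initialization (placeholder values):

from respan import Respan
from opentelemetry.instrumentation.ollama import OllamaInstrumentor

# environment tags every exported span; metadata becomes the default merged into all spans.
respan = Respan(
    instrumentations=[OllamaInstrumentor()],
    environment="production",
    metadata={"service": "local-llm", "version": "1.0.0"},
)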

Attributes

In Respan()

Set defaults at initialization — these apply to all spans.

from respan import Respan
from opentelemetry.instrumentation.ollama import OllamaInstrumentor

respan = Respan(
    instrumentations=[OllamaInstrumentor()],
    customer_identifier="user_123",
    metadata={"service": "local-llm", "version": "1.0.0"},
)

With propagate_attributes

Override per-request using a context scope.

from ollama import Client
from respan import Respan, propagate_attributes
from opentelemetry.instrumentation.ollama import OllamaInstrumentor

respan = Respan(instrumentations=[OllamaInstrumentor()])
client = Client()

def handle_request(user_id: str, question: str):
    with propagate_attributes(
        customer_identifier=user_id,
        thread_identifier="conv_abc_123",
        metadata={"plan": "pro"},
    ):
        response = client.chat(
            model="llama3.1",
            messages=[{"role": "user", "content": question}],
        )
        print(response["message"]["content"])

Attribute            Type   Description
customer_identifier  str    Identifies the end user in Respan analytics.
thread_identifier    str    Groups related messages into a conversation.
metadata             dict   Custom key-value pairs. Merged with default metadata.