Log every LLM call, trace agent pipelines, and run quality evals — with two lines of code. No infra to build, no dashboard to maintain.
Works with OpenAI, Anthropic, Gemini, Groq, LangChain, LlamaIndex, and 20+ integrations.
Everything you'd build yourself if you had the time — ready in minutes, free to start.
Debug production failures
Find the exact request that went wrong. See inputs, outputs, latency, and the full agent trace — searchable in seconds.
Trace agent pipelines
View every step of a multi-step agent: which tools were called, what data passed between steps, and where latency accumulated.
Track costs per request
See token usage and dollar cost for every call. Break down spend by model, feature, or user to find where money goes.
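The per-call math behind this is simple to sketch. A minimal example of cost accounting in plain Python — the per-token prices below are illustrative placeholders, not real provider rates, and the log field names are assumptions:

```python
# Sketch of per-request cost accounting. Prices are illustrative
# placeholders (USD per 1,000 tokens: input, output), not real rates.
PRICES_PER_1K = {
    "gpt-4o": (0.005, 0.015),
    "claude-sonnet": (0.003, 0.015),
}

def request_cost(model, input_tokens, output_tokens):
    """Dollar cost of one call from its token usage."""
    in_price, out_price = PRICES_PER_1K[model]
    return (input_tokens / 1000) * in_price + (output_tokens / 1000) * out_price

def spend_by(key, logs):
    """Aggregate cost across logged requests, grouped by any field
    (e.g. "model", "feature", "user")."""
    totals = {}
    for log in logs:
        cost = request_cost(log["model"], log["input_tokens"], log["output_tokens"])
        totals[log[key]] = totals.get(log[key], 0.0) + cost
    return totals
```

Grouping by "user" or "feature" instead of "model" is a one-argument change, which is the point of tagging every request with metadata up front.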
Run quality evals
Apply built-in evaluators (hallucination, relevance, toxicity) or write custom Python checks. Scores are tracked over time.
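A custom check is just a Python function. Here is a deliberately crude relevance evaluator as a sketch — the request-in, score-dict-out shape is an assumption for illustration, not the actual Respan evaluator interface:

```python
# Sketch of a custom evaluator in plain Python. The (prompt, output) ->
# score-dict shape is an assumed interface, not the real SDK contract.
def keyword_relevance(prompt: str, output: str) -> dict:
    """Crude relevance check: fraction of prompt keywords that
    also appear in the model output."""
    stopwords = {"the", "a", "an", "is", "of", "to", "and", "in"}
    keywords = {w for w in prompt.lower().split() if w not in stopwords}
    if not keywords:
        return {"score": 1.0, "passed": True}
    hits = sum(1 for w in keywords if w in output.lower())
    score = hits / len(keywords)
    return {"score": score, "passed": score >= 0.5}
```

Real evaluators would be stricter (embeddings, an LLM judge), but the contract stays this small: score the output, return a result that can be tracked over time.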
Version prompts without deploys
Store prompts in Respan, pull at runtime via SDK. Edit, A/B test, and roll back from the dashboard — no redeploy needed.
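Conceptually, the runtime pull looks like this sketch. A local dict stands in for the hosted prompt store, and the function and field names are assumptions, not the real SDK surface — note the local fallback so a store outage never breaks production:

```python
# Conceptual sketch of pulling a versioned prompt at runtime. The dict
# stands in for the hosted store; names are assumptions for illustration.
PROMPT_STORE = {
    ("summarize", "v1"): "Summarize the following text:\n{text}",
    ("summarize", "v2"): "Summarize in three bullet points:\n{text}",
}

def get_prompt(name, version="latest", fallback=None):
    """Fetch a prompt template by name and version, falling back to a
    hard-coded local template if the store has no match."""
    if version == "latest":
        versions = sorted(v for (n, v) in PROMPT_STORE if n == name)
        version = versions[-1] if versions else None
    template = PROMPT_STORE.get((name, version))
    return template if template is not None else fallback
```

Rolling back is then just pinning `version="v1"` from the dashboard — no code change, no redeploy.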
Search and filter all logs
Full-text search across every request. Filter by user, model, cost, latency, status, or custom metadata tags.
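The filtering model is worth making concrete. A minimal sketch in plain Python, where log fields mirror the filters above (the exact field names are assumptions):

```python
# Sketch of exact-match filtering over logged requests. Field names
# (user, model, status, ...) are illustrative assumptions.
def filter_logs(logs, **criteria):
    """Return the logs matching every given field exactly,
    e.g. filter_logs(logs, user="u1", status="error")."""
    return [log for log in logs
            if all(log.get(k) == v for k, v in criteria.items())]
```

Cost and latency filters would be range predicates rather than exact matches, but the shape is the same: every request is a structured record, so any field is filterable.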
Set up alerts
Threshold alerts for cost spikes, latency regressions, and error rate increases. Get notified before users do.
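The alerting logic reduces to comparing rolling metrics against limits. A sketch with example thresholds — the numbers and metric names here are illustrative, not Respan defaults:

```python
# Sketch of threshold alerting. Limits and metric names are example
# values for illustration, not product defaults.
THRESHOLDS = {
    "hourly_cost_usd": 50.0,   # alert if spend spikes past $50/hour
    "p99_latency_ms": 2000.0,  # alert on latency regressions
    "error_rate": 0.05,        # alert if more than 5% of requests fail
}

def check_alerts(metrics):
    """Return the names of metrics that crossed their thresholds."""
    return [name for name, limit in THRESHOLDS.items()
            if metrics.get(name, 0.0) > limit]
```

A scheduler evaluates this against the last window of traffic and fires a notification for each returned name — that is the whole mechanism.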
Export everything
Pull logs, traces, and eval results via REST API. Stream to S3, BigQuery, or any destination via webhooks.
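An export call is an ordinary authenticated GET with time-window and filter parameters. The sketch below only assembles the URL — the host, path, and parameter names are hypothetical placeholders; check the actual API reference for the real ones:

```python
# Sketch of building a filtered export request. Endpoint and parameter
# names are hypothetical placeholders, not the documented API.
from urllib.parse import urlencode

BASE_URL = "https://api.respan.example/v1/logs/export"  # placeholder host

def export_url(start, end, model=None, limit=1000):
    """Assemble an export URL for a time window, optionally
    filtered by model."""
    params = {"start": start, "end": end, "limit": limit}
    if model:
        params["model"] = model
    return f"{BASE_URL}?{urlencode(params)}"
```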
Daily workflow
Quality & evals
Ship & iterate
From zero to full observability in under 5 minutes. No infrastructure to deploy.
Install the SDK
pip install respan or npm install respan. One package, zero configuration.
→ SDK ready
Wrap your LLM client
Two lines of code. Wrap your OpenAI, Anthropic, or framework client. Every call is logged automatically.
→ Structured logs for every request
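What "wrapping a client" buys you can be shown in miniature. This is plain Python for illustration — a proxy that records inputs, output, and latency around every call — not the Respan SDK's actual wrapper or method names:

```python
# Conceptual sketch of a logging wrapper around an LLM client.
# Class and method names are illustrative, not the real SDK.
import time

LOGS = []  # stand-in for the hosted log store

class LoggedClient:
    """Proxy that records every call's input, output, and latency."""
    def __init__(self, client):
        self._client = client

    def complete(self, prompt, **kwargs):
        start = time.perf_counter()
        output = self._client.complete(prompt, **kwargs)
        LOGS.append({
            "input": prompt,
            "output": output,
            "latency_ms": (time.perf_counter() - start) * 1000,
        })
        return output

class FakeClient:
    """Minimal stand-in for a real provider client."""
    def complete(self, prompt, **kwargs):
        return f"echo: {prompt}"

client = LoggedClient(FakeClient())  # line 1: wrap once
result = client.complete("hello")    # every call is now logged
```

The application code past the wrap line is untouched, which is why instrumenting an existing codebase stays a two-line change.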
Search and debug
Find any request by content, user, cost, or metadata. Trace agent pipelines step-by-step.
→ Root cause identified in seconds
Evaluate quality
Run evaluators on production outputs: hallucination, relevance, coherence, or custom checks.
→ Quality scores per request and over time
Iterate and ship
Update prompts from the dashboard. Compare variants. Deploy changes without touching code.
→ Better outputs, shipped faster
<5 min
from signup to first log
2 lines
of code to instrument
100%
of requests captured
<80ms
P99 ingestion latency
Model providers
Frameworks
Languages