How Respan works
Everything in Respan is built on one core data structure: the span. Every LLM interaction, whether it comes from the tracing SDK, a framework integration, the AI gateway, or a direct API call, is stored as a span with its input, output, model, metrics, and metadata. Spans form hierarchies called Traces (the execution tree of an agent workflow). They can also group into Threads (conversations) and carry Scores (evaluation results). Every feature in the platform reads from this same span data.

The full workflow
Respan has three workflows that feed into each other:
- Trace & monitor captures production data: agent steps, LLM calls, user interactions.
- Evaluate & optimize turns that data into quality measurements, comparing prompt versions, models, and configurations.
- Prompt & gateway deploys the winning configuration and routes traffic.
- Iterate: new traffic flows back into tracing, and the cycle repeats.
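The span structure at the core of this model can be sketched as a small Python class. The field names below are illustrative assumptions, not Respan's actual schema; they only show how one record can carry input/output, link into a trace tree via a parent ID, group into a thread, and hold scores.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Span:
    """One LLM interaction or agent step. Field names are illustrative."""
    span_id: str
    name: str
    input: str
    output: str
    model: Optional[str] = None
    parent_id: Optional[str] = None   # links spans into a trace tree
    thread_id: Optional[str] = None   # groups spans into a conversation
    scores: dict = field(default_factory=dict)  # evaluation results
    latency_ms: float = 0.0
    cost_usd: float = 0.0

def children(spans: list, parent_id: str) -> list:
    """Return the direct children of a span within a trace."""
    return [s for s in spans if s.parent_id == parent_id]

# A two-level trace: a workflow span with one LLM-call child.
trace = [
    Span("s1", "agent_workflow", "user question", "final answer"),
    Span("s2", "llm_call", "prompt", "completion", model="gpt-4o", parent_id="s1"),
]
```

Because traces, threads, and scores are all expressed through fields on the same record, every downstream feature can query one store instead of joining separate systems.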
Trace & monitor
Tracing records the execution tree of your workflows: every agent step, tool call, and model request in one view.
- Tracing
- Monitoring
- User analytics
See agent workflows as trace trees with parent-child spans. Each span shows its input, output, latency, and cost.
- Tracing SDK: add @workflow/@task decorators. LLM calls are auto-captured.
- Framework integrations: pre-built exporters for OpenAI Agents SDK, Vercel AI, Mastra, LangGraph, and others.
- Manual ingestion:
  - Ingest traces: send traces via OTLP or JSON API.
  - Ingest logs: log individual LLM calls.
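As a rough illustration of decorator-based capture, the sketch below implements stand-in @workflow and @task decorators that append a span record around each call. This is a concept sketch, not the Respan SDK: the buffer, decorator factory, and field names are all assumptions for demonstration.

```python
import functools
import time

SPANS = []  # stand-in for the SDK's export buffer

def traced(kind):
    """Record a span (name, kind, latency, output) around a function call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            SPANS.append({
                "name": fn.__name__,
                "kind": kind,
                "latency_ms": (time.perf_counter() - start) * 1000,
                "output": result,
            })
            return result
        return wrapper
    return decorator

# Stand-ins for the SDK's @workflow and @task decorators.
workflow = traced("workflow")
task = traced("task")

@task
def summarize(text):
    return text[:10]

@workflow
def pipeline(text):
    return summarize(text)

pipeline("hello world, this is a demo")
```

Because the inner @task call finishes first, its span lands in the buffer before the enclosing @workflow span, which is how a parent-child tree can later be reconstructed from completion order.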
Evaluate & optimize
With production data flowing, you can start measuring output quality. Respan supports two evaluation modes: online (live traffic) and offline (test datasets).
- Offline evaluation
- Online evaluation
Build datasets from production spans or CSV imports, run them through different prompt versions or models in experiments, and compare evaluator scores side by side.
- Online: production data goes directly to evaluators, which produce scores in real time.
- Offline:
  - Build a dataset: sample production spans or import test cases via CSV.
  - Set up evaluators: LLM evaluators (an LLM judges quality), code evaluators (a Python function checks format or length), or human evaluators (your team reviews manually).
  - Run experiments: test your dataset against different prompt versions or models.
  - Compare scores.
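A code evaluator is just a function that scores an output. The example below is a hypothetical one (the function name and score shape are assumptions, not Respan's evaluator API) that checks two things the text mentions, format and length:

```python
import json

def evaluate_json_reply(output: str, max_len: int = 500) -> dict:
    """Score an LLM output: is it valid JSON, and is it within the
    length budget? Returns scores in [0, 1]; the shape is illustrative."""
    try:
        json.loads(output)
        valid = 1.0
    except (ValueError, TypeError):
        valid = 0.0
    within_budget = 1.0 if len(output) <= max_len else 0.0
    return {"valid_json": valid, "length_ok": within_budget}

scores = evaluate_json_reply('{"answer": 42}')
```

Deterministic checks like this are cheap enough to run online against live traffic, while LLM and human evaluators are usually reserved for sampled or offline runs.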
Prompt & gateway
Manage prompts outside your codebase and route LLM traffic through one API.
- Prompt management
- AI gateway
Create templates with {{variables}} placeholders, commit versions, test in the playground, and deploy without code changes. Your application picks up new versions immediately.
- Create prompts: create templates, commit versions, compare changes.
- Set up gateway: access 250+ models with one line of code.
- Use prompt in code: reference prompts by ID. Deploy new versions without code changes.
- Add fallbacks, retries, load balancing, and caching: configure reliability and cost optimization for your gateway traffic.
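As a rough illustration of how {{variables}} substitution works, here is a client-side sketch. Respan resolves templates on its own side; this regex-based renderer is only a model of the behavior, not its implementation:

```python
import re

def render(template: str, variables: dict) -> str:
    """Substitute {{name}} placeholders; raise if a variable is missing.
    Illustrative only -- not Respan's template engine."""
    def sub(match):
        key = match.group(1)
        if key not in variables:
            raise KeyError(f"missing template variable: {key}")
        return str(variables[key])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", sub, template)

prompt = render("Summarize {{text}} in {{n}} words.",
                {"text": "the report", "n": 20})
```

Failing loudly on a missing variable is the safer default here: a silently empty slot in a deployed prompt is much harder to catch than an error at render time.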
The gateway adds ~50-150ms of latency. If latency is critical, use tracing for observability instead of routing through the gateway.
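The fallback-and-retry behavior listed above can be sketched as a client-side wrapper. Everything here is an assumption for illustration: send is a hypothetical transport function, and the real gateway applies this logic server-side with its own policies.

```python
import time

def call_with_fallbacks(models, send, retries=2, backoff_s=0.0):
    """Try each model in order; retry transient failures before
    falling back. `send(model)` is a hypothetical transport function
    that returns a response or raises on failure."""
    last_error = None
    for model in models:
        for attempt in range(retries + 1):
            try:
                return send(model)
            except Exception as err:
                last_error = err
                time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    raise RuntimeError(f"all models failed: {last_error}")

# Demo transport: the primary model always fails, the fallback succeeds.
calls = []
def flaky_send(model):
    calls.append(model)
    if model == "primary":
        raise ConnectionError("primary unavailable")
    return f"ok from {model}"

result = call_with_fallbacks(["primary", "fallback"], flaky_send)
```

Exhausting retries on one model before moving to the next keeps behavior predictable, and is one reason to let the gateway own this logic rather than duplicating it in every client.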
