How Respan works
Everything in Respan is built on one core data structure: the span. Every LLM interaction, whether it comes from the tracing SDK, a framework integration, the AI gateway, or a direct API call, is stored as a span with its input, output, model, metrics, and metadata. Spans form hierarchies called Traces (the execution tree of an agent workflow). They can also group into Threads (conversations) and carry Scores (evaluation results). Every feature in the platform reads from this same span data.

The full workflow
Respan has three workflows that feed into each other:
- Trace & monitor captures production data: agent steps, LLM calls, user interactions.
- Evaluate & optimize turns that data into quality measurements, comparing prompt versions, models, and configurations.
- Prompt & gateway deploys the winning configuration and routes traffic.
- Iterate: new traffic flows back into tracing, and the cycle repeats.
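The span structure at the core of this model can be sketched as a small Python class. The field names below are illustrative assumptions, not Respan's actual schema; they only show how one record can carry input/output, link into a trace tree via a parent ID, group into a thread, and hold scores.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Span:
    """One LLM interaction or agent step. Field names are illustrative."""
    span_id: str
    name: str
    input: str
    output: str
    model: Optional[str] = None
    parent_id: Optional[str] = None   # links spans into a trace tree
    thread_id: Optional[str] = None   # groups spans into a conversation
    scores: dict = field(default_factory=dict)  # evaluation results
    latency_ms: float = 0.0
    cost_usd: float = 0.0

def children(spans: list, parent_id: str) -> list:
    """Return the direct children of a span within a trace."""
    return [s for s in spans if s.parent_id == parent_id]

# A two-level trace: a workflow span with one LLM-call child.
trace = [
    Span("s1", "agent_workflow", "user question", "final answer"),
    Span("s2", "llm_call", "prompt", "completion", model="gpt-4o", parent_id="s1"),
]
```

Because traces, threads, and scores are all expressed through fields on the same record, every downstream feature can query one store instead of joining separate systems.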
Trace & monitor
Tracing records the execution tree of your workflows: every agent step, tool call, and model request in one view.
- Tracing
- Monitoring
- User analytics
See agent workflows as trace trees with parent-child spans. Each span shows its input, output, latency, and cost.
- Tracing SDK: add @workflow/@task decorators. LLM calls are auto-captured.
- Framework integrations: pre-built exporters for OpenAI Agents SDK, Vercel AI, Mastra, LangGraph, and others.
- Manual ingestion:
  - Ingest traces: send traces via OTLP or JSON API.
  - Ingest logs: log individual LLM calls.
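As a rough illustration of decorator-based capture, the sketch below implements stand-in @workflow and @task decorators that append a span record around each call. This is a concept sketch, not the Respan SDK: the buffer, decorator factory, and field names are all assumptions for demonstration.

```python
import functools
import time

SPANS = []  # stand-in for the SDK's export buffer

def traced(kind):
    """Record a span (name, kind, latency, output) around a function call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            SPANS.append({
                "name": fn.__name__,
                "kind": kind,
                "latency_ms": (time.perf_counter() - start) * 1000,
                "output": result,
            })
            return result
        return wrapper
    return decorator

# Stand-ins for the SDK's @workflow and @task decorators.
workflow = traced("workflow")
task = traced("task")

@task
def summarize(text):
    return text[:10]

@workflow
def pipeline(text):
    return summarize(text)

pipeline("hello world, this is a demo")
```

Because the inner @task call finishes first, its span lands in the buffer before the enclosing @workflow span, which is how a parent-child tree can later be reconstructed from completion order.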
Evaluate & optimize
With production data flowing, you can start measuring output quality. Respan supports two evaluation modes: online (live traffic) and offline (test datasets).
- Offline evaluation
- Online evaluation
Build datasets from production spans or CSV imports, run them through different prompt versions or models in experiments, and compare evaluator scores side by side.
- Online: production data goes directly to evaluators, which produce scores in real time.
- Offline:
  - Build a dataset: sample production spans or import test cases via CSV.
  - Set up evaluators: LLM evaluators (an LLM judges quality), code evaluators (a Python function checks format or length), or human evaluators (your team reviews manually).
  - Run experiments: test your dataset against different prompt versions or models.
  - Compare scores.
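A code evaluator is just a function that scores an output. The example below is a hypothetical one (the function name and score shape are assumptions, not Respan's evaluator API) that checks two things the text mentions, format and length:

```python
import json

def evaluate_json_reply(output: str, max_len: int = 500) -> dict:
    """Score an LLM output: is it valid JSON, and is it within the
    length budget? Returns scores in [0, 1]; the shape is illustrative."""
    try:
        json.loads(output)
        valid = 1.0
    except (ValueError, TypeError):
        valid = 0.0
    within_budget = 1.0 if len(output) <= max_len else 0.0
    return {"valid_json": valid, "length_ok": within_budget}

scores = evaluate_json_reply('{"answer": 42}')
```

Deterministic checks like this are cheap enough to run online against live traffic, while LLM and human evaluators are usually reserved for sampled or offline runs.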
Prompt & gateway
Manage prompts outside your codebase and route LLM traffic through one API.
- Prompt management
- AI gateway
Create templates with {{variables}} placeholders, commit versions, test in the playground, and deploy without code changes. Your application picks up new versions immediately.
- Create prompts: create templates, commit versions, compare changes.
- Set up gateway: access 250+ models with one line of code.
- Use prompt in code: reference prompts by ID. Deploy new versions without code changes.
- Add fallbacks, retries, load balancing, and caching: configure reliability and cost optimization for your gateway traffic.
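As a rough illustration of how {{variables}} substitution works, here is a client-side sketch. Respan resolves templates on its own side; this regex-based renderer is only a model of the behavior, not its implementation:

```python
import re

def render(template: str, variables: dict) -> str:
    """Substitute {{name}} placeholders; raise if a variable is missing.
    Illustrative only -- not Respan's template engine."""
    def sub(match):
        key = match.group(1)
        if key not in variables:
            raise KeyError(f"missing template variable: {key}")
        return str(variables[key])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", sub, template)

prompt = render("Summarize {{text}} in {{n}} words.",
                {"text": "the report", "n": 20})
```

Failing loudly on a missing variable is the safer default here: a silently empty slot in a deployed prompt is much harder to catch than an error at render time.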
The gateway adds ~50-150ms of latency. If latency is critical, use tracing for observability instead of routing through the gateway.
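The fallback-and-retry behavior listed above can be sketched as a client-side wrapper. Everything here is an assumption for illustration: send is a hypothetical transport function, and the real gateway applies this logic server-side with its own policies.

```python
import time

def call_with_fallbacks(models, send, retries=2, backoff_s=0.0):
    """Try each model in order; retry transient failures before
    falling back. `send(model)` is a hypothetical transport function
    that returns a response or raises on failure."""
    last_error = None
    for model in models:
        for attempt in range(retries + 1):
            try:
                return send(model)
            except Exception as err:
                last_error = err
                time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    raise RuntimeError(f"all models failed: {last_error}")

# Demo transport: the primary model always fails, the fallback succeeds.
calls = []
def flaky_send(model):
    calls.append(model)
    if model == "primary":
        raise ConnectionError("primary unavailable")
    return f"ok from {model}"

result = call_with_fallbacks(["primary", "fallback"], flaky_send)
```

Exhausting retries on one model before moving to the next keeps behavior predictable, and is one reason to let the gateway own this logic rather than duplicating it in every client.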
