Set up Respan
- Sign up — Create an account at platform.respan.ai
- Create an API key — Generate one on the API keys page
- Add credits or a provider key — Add credits on the Credits page or connect your own provider key on the Integrations page
Overview
AI agents make multiple LLM calls, use tools, and branch based on intermediate results. When something goes wrong, it’s hard to debug without visibility into each step. This cookbook shows how to:
- Trace an agent’s full execution
- Evaluate agent responses automatically
- Alert when quality drops
1. Trace the agent
Use the Respan tracing SDK to instrument your agent. Each step becomes a span in the trace tree.
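The Respan SDK creates these spans for you, and its setup calls are not reproduced here. The toy in-memory tracer below is only a mental model of how nested steps become a trace tree. It is not the Respan SDK: `Tracer`, `Span`, `run_agent` and the stand-in planning and tool steps are all hypothetical. The `agent="support"` metadata on the root span mirrors the `metadata.agent = "support"` filter used later in this cookbook.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """One step of an agent run (LLM call, tool call, ...)."""
    name: str
    span_id: str
    parent_id: Optional[str]
    metadata: dict
    start: float
    end: Optional[float] = None
    children: List["Span"] = field(default_factory=list)


class Tracer:
    """Hypothetical toy tracer (not the Respan SDK): nested `with` blocks build a span tree."""

    def __init__(self) -> None:
        self.roots: List[Span] = []   # one root span per trace
        self._stack: List[Span] = []  # spans that are currently open

    @contextmanager
    def span(self, name: str, **metadata):
        parent = self._stack[-1] if self._stack else None
        s = Span(
            name=name,
            span_id=uuid.uuid4().hex,
            parent_id=parent.span_id if parent else None,
            metadata=metadata,
            start=time.perf_counter(),
        )
        # Attach to the open parent span, or start a new trace tree.
        (parent.children if parent else self.roots).append(s)
        self._stack.append(s)
        try:
            yield s
        finally:
            s.end = time.perf_counter()
            self._stack.pop()


def print_tree(span: Span, depth: int = 0) -> None:
    print("  " * depth + span.name)
    for child in span.children:
        print_tree(child, depth + 1)


tracer = Tracer()


def run_agent(question: str) -> str:
    with tracer.span("agent_run", agent="support"):
        with tracer.span("plan_llm_call"):
            tool = "search_docs"  # stand-in for the model choosing a tool
        with tracer.span("tool:" + tool):
            docs = [f"doc about {question}"]  # stand-in for a tool result
        with tracer.span("answer_llm_call"):
            return f"Answer based on {len(docs)} doc(s)"


print(run_agent("refunds"))
print_tree(tracer.roots[0])
```

Each `with` block records its start and end times. It attaches itself to whichever span is currently open. That is how one agent run becomes a single tree, with the LLM and tool calls as children of the root span.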
2. Set up online evaluation
Create an automation that evaluates agent responses in real time:
Create an evaluator
Go to Evaluation > Evaluators > + New evaluator:

| Field | Value |
|---|---|
| Name | Agent Response Quality |
| Type | LLM |
| Model | gpt-4o |
| Score type | Numerical (1-5) |
| Definition | Rate the agent’s response quality. Consider: (1) Did it answer the question? (2) Is the answer accurate? (3) Did it use tools appropriately? Score 1 = poor, 5 = excellent. |
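Respan runs the LLM evaluator for you. Conceptually, an LLM-as-judge evaluator with a numerical 1-5 score does two things. It sends the definition, together with the logged question and response, to the judge model. It then reads a number back. The sketch below shows only those two steps. The prompt layout, the function names and the "first integer in range" parsing rule are assumptions for illustration, not how Respan formats or parses judge calls.

```python
import re
from typing import Optional

# Definition text copied from the evaluator settings above.
DEFINITION = (
    "Rate the agent's response quality. Consider: (1) Did it answer the question? "
    "(2) Is the answer accurate? (3) Did it use tools appropriately? "
    "Score 1 = poor, 5 = excellent."
)


def build_judge_prompt(question: str, response: str) -> str:
    """Hypothetical layout: definition first, then the logged input and output."""
    return (
        f"{DEFINITION}\n\n"
        f"Question:\n{question}\n\n"
        f"Agent response:\n{response}\n\n"
        "Reply with a single integer from 1 to 5."
    )


def parse_score(judge_output: str, low: int = 1, high: int = 5) -> Optional[int]:
    """Return the first integer in [low, high] found in the reply, or None."""
    for match in re.finditer(r"\d+", judge_output):
        value = int(match.group())
        if low <= value <= high:
            return value
    return None


print(build_judge_prompt("How do I reset my password?", "Use the 'Forgot password' link."))
print(parse_score("Score: 4/5"))  # 4
```

Returning `None` for an unparseable reply, rather than a default score, keeps bad judge outputs from silently dragging the average up or down.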
Create a condition
Go to Conditions and create a condition:
- Type: Single log
- Filter: `metadata.agent = "support"`
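A single-log condition is checked against each log on its own. The sketch below mimics the `metadata.agent = "support"` filter on plain dictionaries. It is only a mental model of the filter, not Respan's implementation. The log shape (`id`, `metadata`) and the helpers `get_field` and `matches` are hypothetical. It does make one practical point: a log matches only if your traced agent actually attaches `agent: "support"` to its metadata.

```python
from typing import Any


def get_field(log: dict, path: str) -> Any:
    """Resolve a dotted path such as 'metadata.agent' inside a nested dict."""
    current: Any = log
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # missing field: the condition cannot match
        current = current[key]
    return current


def matches(log: dict, path: str, expected: Any) -> bool:
    """Single-log condition: true when the field equals the expected value."""
    return get_field(log, path) == expected


logs = [
    {"id": "a1", "metadata": {"agent": "support"}},
    {"id": "b2", "metadata": {"agent": "billing"}},
    {"id": "c3"},  # no metadata at all
]
selected = [log["id"] for log in logs if matches(log, "metadata.agent", "support")]
print(selected)  # ['a1']
```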
Create an automation
Go to Automations > + New automation:
- Select Online evals as the type
- Select your condition
- Select the evaluator
- Set the sampling rate (start with `0.1` for 10% of traffic)
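This page does not say how Respan decides which logs fall into the sample. One common approach hashes the log ID. With a rate of `0.1`, about 10% of logs are evaluated, and the same log always gets the same decision. It is shown here purely as an illustration; `should_evaluate` and the `log-N` IDs are hypothetical.

```python
import hashlib


def should_evaluate(log_id: str, sampling_rate: float) -> bool:
    """Keep about `sampling_rate` of logs; the same ID always gets the same answer."""
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0 and 1")
    digest = hashlib.sha256(log_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # roughly uniform in [0, 2**64)
    return bucket < sampling_rate * 2**64


ids = [f"log-{n}" for n in range(10_000)]
kept = sum(should_evaluate(log_id, 0.1) for log_id in ids)
print(f"{kept} of {len(ids)} logs sampled")  # close to 1,000, not exactly
```

Hashing instead of calling `random.random()` keeps the decision reproducible. Re-processing the same log never flips it in or out of the sample. Raising the rate later only adds logs, never drops ones already sampled.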
3. Set up alerts
Use webhook notifications to get alerted when quality drops:
- Go to Automations > + New automation
- Select Alert as the type
- Create a condition based on aggregated metrics (e.g., average evaluation score < 3 over last hour)
- Configure your webhook URL (Slack, PagerDuty, email)
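Respan evaluates the alert condition for you. The sketch below only makes "average evaluation score < 3 over the last hour" concrete. It averages the scores whose timestamps fall inside the window and fires when that average drops below the threshold. `should_alert`, `build_webhook_payload` and the payload fields are hypothetical, because this page does not specify what Respan sends to your webhook. The actual HTTP POST to the webhook URL is left out.

```python
import json
from datetime import datetime, timedelta, timezone
from statistics import mean


def should_alert(scores, now, window=timedelta(hours=1), threshold=3.0):
    """scores: (timestamp, score) pairs. Fire when the window's average is below threshold."""
    recent = [score for ts, score in scores if now - window <= ts <= now]
    if not recent:
        return False, None  # no evaluations in the window, nothing to average
    avg = mean(recent)
    return avg < threshold, avg


def build_webhook_payload(avg, now):
    """Hypothetical JSON body; Slack, PagerDuty, etc. each expect their own schema."""
    return json.dumps({
        "alert": "Agent Response Quality below threshold",
        "average_score": round(avg, 2),
        "window": "last 1 hour",
        "fired_at": now.isoformat(),
    })


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
scores = [
    (now - timedelta(minutes=90), 5),  # outside the 1-hour window, ignored
    (now - timedelta(minutes=40), 3),
    (now - timedelta(minutes=20), 2),
    (now - timedelta(minutes=5), 2),
]
fire, avg = should_alert(scores, now)
if fire:
    print(build_webhook_payload(avg, now))  # average_score is 2.33
```

With 10% sampling, an hour of low traffic may contain only a handful of scores. A minimum-sample-count check alongside the threshold helps avoid alerts triggered by one bad response.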
Debugging workflow
When you get an alert:
- Check the dashboard: look for spikes in errors or latency
- Filter traces: use `metadata.agent = "support"` to find recent agent traces
- Inspect spans: open a failing trace and walk through each step
- Identify the issue: bad retrieval? Wrong tool call? Poor synthesis?
- Fix and test: update prompts or logic, then run offline experiments to verify the fix
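The "Inspect spans" step amounts to walking the trace tree until you reach the step that broke. The sketch below does that walk on a trace written as nested dictionaries. The field names `name`, `status` and `children` are assumptions, not Respan's export format, and `first_failure` is a hypothetical helper. It checks children before the span itself, so it reports the innermost failing span rather than a parent that failed because of it.

```python
from typing import Optional, Tuple


def first_failure(span: dict, path: Tuple[str, ...] = ()) -> Optional[Tuple[str, ...]]:
    """Depth-first walk; return the name path to the innermost errored span, or None."""
    path = path + (span["name"],)
    for child in span.get("children", []):
        found = first_failure(child, path)
        if found:
            return found
    # No failing child: report this span only if it failed itself.
    return path if span.get("status") == "error" else None


trace = {
    "name": "agent_run",
    "children": [
        {"name": "plan_llm_call", "status": "ok"},
        {"name": "tool:search_docs", "status": "ok", "children": [
            {"name": "vector_lookup", "status": "error", "error": "timeout"},
        ]},
        {"name": "answer_llm_call", "status": "ok"},
    ],
}
print(" > ".join(first_failure(trace)))  # agent_run > tool:search_docs > vector_lookup
```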