  1. Sign up — Create an account at platform.respan.ai
  2. Create an API key — Generate one on the API keys page
  3. Add credits or a provider key — Add credits on the Credits page or connect your own provider key on the Integrations page

Overview

AI agents make multiple LLM calls, use tools, and branch based on intermediate results. When something goes wrong, it’s hard to debug without visibility into each step. This cookbook shows how to:
  1. Trace an agent’s full execution
  2. Evaluate agent responses automatically
  3. Alert when quality drops

1. Trace the agent

Use the Respan tracing SDK to instrument your agent. Each step becomes a span in the trace tree.
from openai import OpenAI
from respan_tracing.decorators import workflow, task, tool
from respan_tracing.main import RespanTelemetry
from respan_tracing.contexts.span import respan_span_attributes
import json

telemetry = RespanTelemetry()
client = OpenAI()

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search_docs",
            "description": "Search the knowledge base",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }
]


@tool(name="search_docs")
def search_docs(query: str) -> str:
    """Simulated knowledge base search."""
    return f"Found 3 results for '{query}': [doc1, doc2, doc3]"


@task(name="plan")
def plan(user_message: str):
    """Agent decides what to do."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant. Use tools when needed."},
            {"role": "user", "content": user_message},
        ],
        tools=TOOLS,
    )
    return completion.choices[0].message


@task(name="synthesize")
def synthesize(user_message: str, tool_results: list[str]):
    """Generate final answer from tool results."""
    context = "\n".join(tool_results)
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": user_message},
        ],
    )
    return completion.choices[0].message.content


@workflow(name="support_agent")
def support_agent(user_message: str, customer_id: str):
    with respan_span_attributes(
        respan_params={
            "customer_identifier": customer_id,
            "metadata": {"agent": "support", "version": "v1"},
        }
    ):
        # Step 1: Plan
        plan_result = plan(user_message)

        # Step 2: Execute tools
        tool_results = []
        if plan_result.tool_calls:
            for tool_call in plan_result.tool_calls:
                args = json.loads(tool_call.function.arguments)
                # Only one tool is registered, so dispatch directly to search_docs
                result = search_docs(**args)
                tool_results.append(result)

        # Step 3: Synthesize
        if tool_results:
            answer = synthesize(user_message, tool_results)
        else:
            answer = plan_result.content

    return answer


# Run it
result = support_agent("How do I set up tracing?", customer_id="user_789")
print(result)

What you’ll see in Respan

support_agent (workflow)
├── plan (task)
│   └── gpt-4o-mini (LLM call)
├── search_docs (tool)
└── synthesize (task)
    └── gpt-4o-mini (LLM call)
Each span shows input, output, latency, and cost. You can see exactly what the agent decided, what tools it called, and what it returned.

2. Set up online evaluation

Create an automation that evaluates agent responses in real time:

Create an evaluator

Go to Evaluation > Evaluators > + New evaluator:
  • Name: Agent Response Quality
  • Type: LLM
  • Model: gpt-4o
  • Score type: Numerical (1-5)
  • Definition: Rate the agent’s response quality. Consider: (1) Did it answer the question? (2) Is the answer accurate? (3) Did it use tools appropriately? Score 1 = poor, 5 = excellent.
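The evaluator itself runs inside Respan, but it can help to see what an LLM judge of this shape does. Below is a local sketch: the prompt assembly and score parsing are illustrative, not Respan internals, and the function names are placeholders.

```python
import re

# The same definition entered in the evaluator form above
EVALUATOR_DEFINITION = (
    "Rate the agent's response quality. Consider: (1) Did it answer the "
    "question? (2) Is the answer accurate? (3) Did it use tools "
    "appropriately? Score 1 = poor, 5 = excellent."
)

def build_judge_messages(question: str, answer: str) -> list[dict]:
    """Assemble the chat messages an LLM judge would receive."""
    return [
        {
            "role": "system",
            "content": EVALUATOR_DEFINITION + " Reply with a single integer from 1 to 5.",
        },
        {
            "role": "user",
            "content": f"Question:\n{question}\n\nAgent answer:\n{answer}",
        },
    ]

def parse_score(reply: str) -> int:
    """Pull the first 1-5 digit out of the judge's reply."""
    match = re.search(r"[1-5]", reply)
    if match is None:
        raise ValueError(f"No score found in: {reply!r}")
    return int(match.group())
```

You would send `build_judge_messages(...)` to a model like gpt-4o and run `parse_score` on the reply; Respan handles both steps for you when the automation fires.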

Create a condition

Go to Conditions and create a condition:
  • Type: Single log
  • Filter: metadata.agent = "support"

Create an automation

Go to Automations > + New automation:
  1. Select Online evals as the type
  2. Select your condition
  3. Select the evaluator
  4. Set sampling rate (start with 0.1 for 10% of traffic)
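A 0.1 sampling rate means roughly one in ten matching logs is evaluated, which keeps judge-model costs proportional to the rate. If you want a feel for the numbers before enabling it, the selection logic is equivalent to this sketch (Respan's actual sampler may differ):

```python
import random

def should_evaluate(sampling_rate: float, rng: random.Random) -> bool:
    """Return True for roughly `sampling_rate` of calls."""
    return rng.random() < sampling_rate

rng = random.Random(42)  # seeded so the simulation is reproducible
sampled = sum(should_evaluate(0.1, rng) for _ in range(10_000))
print(f"Evaluated {sampled} of 10,000 logs (~{sampled / 100:.1f}%)")
```

Multiply your daily traffic by the rate and by your per-evaluation cost to budget before turning the rate up.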

3. Set up alerts

Use webhook notifications to get alerted when quality drops:
  1. Go to Automations > + New automation
  2. Select Alert as the type
  3. Create a condition based on aggregated metrics (e.g., average evaluation score < 3 over the last hour)
  4. Configure your webhook URL (Slack, PagerDuty, email)
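The webhook body depends on how you configure the automation, so treat the field names below (`metric`, `value`, `threshold`, `window`) as placeholders to adapt. A minimal formatter that turns an alert payload into a Slack-style message might look like:

```python
def format_alert(payload: dict) -> str:
    """Turn an alert payload into a human-readable message.

    The field names here are illustrative -- inspect the actual webhook
    body your automation sends and adjust accordingly.
    """
    return (
        f":rotating_light: {payload.get('metric', 'metric')} dropped to "
        f"{payload.get('value')} (threshold {payload.get('threshold')}) "
        f"over the last {payload.get('window', 'hour')}"
    )

alert = {"metric": "avg_eval_score", "value": 2.6, "threshold": 3, "window": "hour"}
print(format_alert(alert))
```

Your webhook endpoint would call something like this before posting to Slack or paging on-call.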

Debugging workflow

When you get an alert:
  1. Check the dashboard — Look for spikes in errors or latency
  2. Filter traces — Use metadata.agent = "support" to find recent agent traces
  3. Inspect spans — Open a failing trace and walk through each step
  4. Identify the issue — Bad retrieval? Wrong tool call? Poor synthesis?
  5. Fix and test — Update prompts or logic, run offline experiments to verify

Next steps