End-to-end: from first API call to production monitoring
Set up Respan
- Sign up — Create an account at platform.respan.ai
- Create an API key — Generate one on the API keys page
- Add credits or a provider key — Add credits on the Credits page or connect your own provider key on the Integrations page
Overview
This cookbook walks through the complete Respan workflow with a real example: a customer support chatbot. By the end, you’ll have:
- LLM calls routed through the gateway
- Prompts managed in the platform
- Full tracing for every conversation
- Automated evaluation on production traffic
Step 1: Set up the gateway
Point your OpenAI SDK to Respan:
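A minimal sketch, assuming the gateway exposes an OpenAI-compatible endpoint at `https://api.respan.ai/v1` (use the base URL shown on your API keys page) and that your key is stored in the `RESPAN_API_KEY` environment variable:

```python
import os

from openai import OpenAI

# Point the standard OpenAI SDK at the Respan gateway.
# The base URL below is an assumption; use the one shown in your Respan dashboard.
client = OpenAI(
    base_url="https://api.respan.ai/v1",
    api_key=os.environ["RESPAN_API_KEY"],
)
```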
Test that it works:
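For example, send a single chat completion through the gateway and print the reply:

```python
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello, can you hear me?"}],
)
print(response.choices[0].message.content)
```

If the call succeeds, the request should also appear in your Respan logs.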
Step 2: Create a prompt template
Go to Prompts and create a prompt:
- Name: `support_chatbot`
- System message: the instructions for your support assistant (an illustrative example follows below)
- Model: `gpt-4o-mini`
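The exact system message is up to you; for a support use case it might look something like this (illustrative only):

```
You are a helpful customer support assistant. Answer questions about the
product accurately and concisely. If you don't know the answer, say so and
offer to escalate to a human agent.
```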
Step 3: Use the prompt in code
Fetch the prompt at runtime so you can update it without redeploying:
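The sketch below assumes a hypothetical `GET /prompts/{name}` endpoint that returns the active version's `model` and `messages`; the real path and response shape are in the Prompts API reference. It reuses the `client` from Step 1.

```python
import os

import requests

RESPAN_BASE_URL = "https://api.respan.ai/v1"  # assumption, see Step 1

def get_prompt(name: str) -> dict:
    # Hypothetical endpoint; check the Prompts API reference for the real path.
    resp = requests.get(
        f"{RESPAN_BASE_URL}/prompts/{name}",
        headers={"Authorization": f"Bearer {os.environ['RESPAN_API_KEY']}"},
    )
    resp.raise_for_status()
    return resp.json()  # assumed to contain "model" and "messages"

def answer_question(question: str) -> str:
    # Fetch the active prompt version at runtime, then call the gateway.
    prompt = get_prompt("support_chatbot")
    response = client.chat.completions.create(
        model=prompt["model"],  # gpt-4o-mini, per the template
        messages=prompt["messages"] + [{"role": "user", "content": question}],
    )
    return response.choices[0].message.content
```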
Step 4: Add tracing
Wrap the chatbot logic with tracing to see the full execution flow:
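A sketch of the shape this usually takes, assuming the Respan tracing SDK exposes decorators along these lines (the import path and decorator names are assumptions; substitute the ones from the SDK docs):

```python
from respan import task, workflow  # hypothetical import path and names

@task(name="fetch_prompt")
def get_prompt(name: str) -> dict:
    ...  # same as in Step 3

@task(name="llm_call")
def generate_answer(prompt: dict, question: str) -> str:
    response = client.chat.completions.create(
        model=prompt["model"],
        messages=prompt["messages"] + [{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

@workflow(name="support_chat")
def handle_support_question(question: str) -> str:
    # The workflow span wraps both tasks, so the trace shows the full flow.
    prompt = get_prompt("support_chatbot")
    return generate_answer(prompt, question)
```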
You’ll see this trace in Respan:
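Roughly this shape, with span names matching the decorators above (the exact layout depends on the Respan UI):

```
support_chat
├── fetch_prompt
└── llm_call   (gpt-4o-mini: tokens, latency, cost)
```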
Step 5: Create an evaluator
Go to Evaluation > Evaluators > + New evaluator:
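The evaluator used in the rest of this cookbook looks roughly like this (the LLM-as-a-judge type and prompt wording are assumptions; configure whatever fits your use case):

- Name: Support Quality
- Type: LLM-as-a-judge
- Scoring: 1–5, where 5 means the answer fully and accurately resolves the customer's question
- Evaluation prompt: ask the judge to score the response for helpfulness, accuracy, and tone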
Step 6: Run an offline experiment
Before going to production, test your prompt against sample questions:
- Go to Experiments > + New experiment
- Select the `support_chatbot` prompt
- Add test cases (see the examples after this list)
- Run the experiment and evaluate with the “Support Quality” evaluator
- If scores are good, proceed to production
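A handful of illustrative test cases for a support chatbot; in practice, use real questions from your own product:

```
How do I reset my password?
Can I get a refund after 30 days?
Why was my card charged twice this month?
My API key stopped working. What should I check?
```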
Step 7: Set up online evaluation
Create an automation to continuously evaluate production traffic:
- Condition: `metadata.feature = "support_chat"`
- Evaluator: Support Quality
- Sampling rate: `0.2` (evaluate 20% of traffic)
Now roughly one in five support conversations is automatically scored.
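For the condition above to match, each request needs that metadata attached. How metadata is passed depends on the gateway; one common pattern with the OpenAI SDK is to send it in the request body via `extra_body` (the `metadata` field name and this mechanism are assumptions, so confirm them against the gateway docs):

```python
# Inside generate_answer() from Step 4, attach metadata to each gateway call.
response = client.chat.completions.create(
    model=prompt["model"],
    messages=prompt["messages"] + [{"role": "user", "content": question}],
    # The "metadata" field name and extra_body mechanism are assumptions;
    # check how the Respan gateway expects request metadata to be attached.
    extra_body={"metadata": {"feature": "support_chat"}},
)
```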
Step 8: Monitor and iterate
Your production monitoring is now running. Here’s your ongoing workflow:
- Check the dashboard daily — Watch cost, latency, and evaluation scores
- Review low-scoring logs — Filter for conversations that scored below 3
- Add failures to your test dataset — Growing your dataset makes evaluations more robust
- Iterate on the prompt — Edit in the Respan playground, test with experiments
- Deploy updates — Update the active prompt version, monitor the impact
The complete workflow is now a loop: production data feeds evaluation, which feeds prompt improvements, which deploy back to production.