- Build — Connect to LLMs and manage prompts
- Observe — Monitor logs, traces, and user behavior
- Evaluate — Test quality offline and online
- Iterate — Optimize based on data and ship improvements
1. Build
Start by connecting your application to LLMs and setting up prompt management.

Gateway + Prompt management

Use the Respan gateway to call 250+ models through a single API, then manage your prompts in the platform for version control, team collaboration, and A/B testing.

What you'll do:
- Point your OpenAI SDK to the Respan gateway (2 lines of code)
- Create prompt templates in the Respan platform
- Use prompts in your code with the Prompts API
- Iterate on prompts in the playground without redeploying
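The gateway speaks the OpenAI-compatible chat completions format, so the "2 lines of code" amount to pointing the SDK's base URL at the gateway with your Respan API key. Here is a minimal sketch of the request that change produces; the base URL shown is a placeholder, not the real endpoint, so check the gateway quickstart for the actual value.

```python
import json

# Placeholder endpoint -- substitute the real gateway URL from the docs.
GATEWAY_BASE_URL = "https://api.respan.example/v1"

# With the OpenAI SDK, the two-line change looks like:
#   client = OpenAI(base_url=GATEWAY_BASE_URL, api_key="YOUR_RESPAN_API_KEY")
# Under the hood, that sends an OpenAI-compatible body like this one to
# {GATEWAY_BASE_URL}/chat/completions:
def build_chat_request(model: str, user_message: str) -> dict:
    """Build the JSON body for an OpenAI-compatible chat completion call."""
    return {
        "model": model,  # any model the gateway routes to
        "messages": [{"role": "user", "content": user_message}],
    }

request_body = build_chat_request("gpt-4o-mini", "Hello!")
print(json.dumps(request_body, indent=2))
```

Because the body and endpoint shape stay OpenAI-compatible, swapping models means changing only the `model` string, not your integration code.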
- Gateway quickstart — Connect to 250+ models via one API
- Prompt management quickstart — Create and deploy prompt templates
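Conceptually, using a managed prompt in code means fetching a deployed template version and filling in its variables at call time. The sketch below fakes the fetch with an in-memory store (the real Prompts API returns the template over HTTP, and the `{{variable}}` syntax here is illustrative):

```python
import re

# Stand-in for templates stored in the platform, keyed by (prompt_id, version).
PROMPT_STORE = {
    ("support-reply", "v2"): "You are a support agent. Answer {{question}} briefly.",
}

def render_prompt(prompt_id: str, version: str, variables: dict) -> str:
    """Fetch a stored template and fill its {{variable}} placeholders."""
    template = PROMPT_STORE[(prompt_id, version)]
    return re.sub(r"\{\{(\w+)\}\}", lambda m: variables[m.group(1)], template)

rendered = render_prompt("support-reply", "v2", {"question": "how do refunds work?"})
print(rendered)
```

Because the template lives in the platform rather than in your codebase, editing it in the playground and redeploying a new version requires no code change.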
2. Observe
Once your application is running, add observability to understand what's happening in production.

Logging or tracing
Choose based on your application complexity:
- Logging — For simple LLM calls. Send request/response data to Respan for monitoring.
- Tracing — For multi-step agents and workflows. See the full execution tree with parent-child relationships.
What you'll do:
- Instrument your LLM calls with logging or tracing
- Monitor cost, latency, errors, and token usage on the dashboard
- Track users with customer_identifier
- Filter and search logs by metadata, model, status, and more
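A log record for a single LLM call carries the request, the response, and the metadata the dashboard filters on, including customer_identifier for per-user tracking. The field names below are illustrative, not Respan's exact schema:

```python
import time

def make_llm_log(model, prompt, completion, customer_identifier, latency_ms, tokens):
    """Shape of a per-call log record sent to an observability backend.
    Field names are illustrative; consult the logging quickstart for the real schema."""
    return {
        "timestamp": time.time(),
        "model": model,
        "prompt": prompt,
        "completion": completion,
        "customer_identifier": customer_identifier,  # ties this call to a user
        "latency_ms": latency_ms,
        "tokens": tokens,  # token usage feeds the cost dashboard
        "status": "success",
    }

log = make_llm_log(
    model="gpt-4o-mini",
    prompt="Hi",
    completion="Hello! How can I help?",
    customer_identifier="user_123",
    latency_ms=420,
    tokens={"prompt": 3, "completion": 7},
)
```

Attaching customer_identifier at log time is what makes later per-user filtering and usage breakdowns possible.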
- Logging quickstart — Send LLM calls to Respan for observability
- Tracing quickstart — Monitor agent workflows step-by-step
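For multi-step agents, tracing records each step as a span linked to its parent, and the dashboard reconstructs the execution tree from those links. A minimal, generic sketch of that data model (not Respan's SDK):

```python
import uuid

class Span:
    """Minimal trace span: one step in a workflow, linked to its parent."""
    def __init__(self, name: str, parent: "Span | None" = None):
        self.span_id = uuid.uuid4().hex
        self.parent_id = parent.span_id if parent else None  # None = root
        self.name = name
        self.children: list[Span] = []
        if parent:
            parent.children.append(self)

# One agent run with two child steps: a retrieval call and a generation call.
root = Span("agent_run")
retrieve = Span("retrieve_docs", parent=root)
generate = Span("llm_generate", parent=root)
```

The parent_id pointers are all a tracing backend needs to rebuild the full parent-child tree for display.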
3. Evaluate
With observability data flowing, set up evaluations to systematically measure and improve quality.

3.1 Offline evaluation
Test prompts and models before deploying to production. Run experiments over datasets to compare performance.

What you'll do:
- Set up a dataset — Curate test cases with inputs and expected outputs
- Set up evaluators — Define how to score responses (LLM judge, code-based, or human review)
- Run experiments — Test prompt versions, models, or agent configurations against your dataset and compare scores side-by-side
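The three steps above fit together as: dataset in, evaluator scores out, one score per candidate. This toy experiment compares two prompt versions (stubbed here as lookup functions standing in for model calls) with a code-based exact-match evaluator:

```python
# Curated test cases: inputs paired with expected outputs.
dataset = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

def prompt_v1(x: str) -> str:  # stand-in for calling a model with prompt v1
    return {"2+2": "4", "capital of France": "paris"}[x]

def prompt_v2(x: str) -> str:  # stand-in for prompt v2
    return {"2+2": "4", "capital of France": "Paris"}[x]

def exact_match(output: str, expected: str) -> float:
    """Code-based evaluator: 1.0 for an exact match, else 0.0."""
    return 1.0 if output == expected else 0.0

def run_experiment(candidate) -> float:
    """Average evaluator score for one candidate over the whole dataset."""
    scores = [exact_match(candidate(c["input"]), c["expected"]) for c in dataset]
    return sum(scores) / len(scores)

results = {"v1": run_experiment(prompt_v1), "v2": run_experiment(prompt_v2)}
print(results)  # side-by-side comparison: {'v1': 0.5, 'v2': 1.0}
```

Swapping exact_match for an LLM judge or a human-review queue changes the evaluator, not the experiment loop.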
- Evaluation quickstart — Set up datasets, evaluators, and experiments
- Experiments — Run and compare experiments
3.2 Online evaluation
Monitor production quality continuously with automations that evaluate live traffic.

What you'll do:
- Set up automations — Configure rules that trigger evaluators on incoming logs
- Monitor scores — Track evaluation results on the dashboard in real time
- Set alerts — Get notified when quality drops below thresholds
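An automation rule combines the three pieces above: sample incoming logs, score them with an evaluator, and alert when the mean score drops below a threshold. A self-contained sketch of that loop (the evaluator and field names are made up for illustration):

```python
import random

def long_enough(log: dict) -> float:
    """Trivial code-based evaluator: flag suspiciously short completions."""
    return 1.0 if len(log["completion"]) >= 20 else 0.0

def run_automation(logs, evaluator, sample_rate=1.0, alert_threshold=0.8, seed=0):
    """Score a sample of incoming logs; alert when quality drops below threshold."""
    rng = random.Random(seed)
    sampled = [log for log in logs if rng.random() < sample_rate]
    scores = [evaluator(log) for log in sampled]
    mean = sum(scores) / len(scores)
    return {"mean_score": mean, "alert": mean < alert_threshold}

incoming = [
    {"completion": "ok"},  # too short -> scores 0.0
    {"completion": "Here is a detailed, helpful answer."},
]
result = run_automation(incoming, long_enough)
print(result)  # {'mean_score': 0.5, 'alert': True}
```

Sampling keeps evaluation cost bounded on high-traffic endpoints, while the threshold turns continuous scores into an actionable alert.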
- Automations quickstart — Set up automated evaluation pipelines
- Online evaluation — Run evaluators on live traffic
4. Iterate
Use the data from observation and evaluation to continuously improve your AI product.

The loop:
- Review production logs and scores to identify failure modes
- Curate failing examples into your evaluation dataset
- Adjust prompts in the playground and test with experiments
- Deploy improved versions and monitor with automations
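The curation step of the loop is mechanical enough to sketch: pull production logs whose evaluation score fell below a threshold and fold them into the offline dataset as new test cases. Log fields and the threshold below are illustrative:

```python
# Production logs annotated with online evaluation scores (fields illustrative).
production_logs = [
    {"input": "refund policy?",  "output": "I don't know.",             "score": 0.2},
    {"input": "reset password",  "output": "Go to Settings > Security.", "score": 0.95},
]

FAIL_THRESHOLD = 0.5
dataset = []  # the existing offline evaluation dataset

# Curate failures into the dataset; expected=None marks cases that still
# need a human-written reference answer before the next experiment run.
failing = [log for log in production_logs if log["score"] < FAIL_THRESHOLD]
dataset.extend({"input": log["input"], "expected": None} for log in failing)
print(f"added {len(failing)} failure case(s) to the dataset")
```

Each pass through the loop grows the dataset with real failure modes, so the next round of experiments tests exactly where the product last broke.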
This is a continuous cycle. Each deployment generates new observability data, which feeds back into evaluation and optimization. The goal is to make each iteration faster and more data-driven.