Evaluate & optimize

Set up Respan

Sign up: create an account at platform.respan.ai.
Create an API key: generate one on the API keys page.
Add credits or a provider key: add credits on the Credits page, or connect your own provider key on the Integrations page.

Evaluate

Once you have data flowing into Respan, you can start evaluating your AI system. The goal is to compare different versions of your prompts, models, and configurations so you ship the best version with confidence. See the evaluation quickstart for a hands-on walkthrough. Here’s how the eval pipeline works:

Build a dataset: you need test cases to evaluate against. You can create a dataset from your Respan logs (a sample of real production data) or import your own test set via CSV.
Set up evaluators: define how outputs are scored. Respan supports LLM evaluators (an LLM judges quality), code evaluators (a Python function checks format, length, etc.), and human evaluators (your team reviews manually).
Run experiments: bring it all together in experiments. Run your dataset through different prompt versions, models, and configurations, then compare evaluator scores side by side to find the best one. You can also run experiments programmatically via the API.
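To make the three steps above concrete, here is a minimal local sketch in plain Python. It does not use the Respan SDK or API: the dataset, the evaluator name `format_and_length_evaluator`, and the two stand-in configurations `config_a` and `config_b` are all hypothetical, and the real platform would call your prompts and models instead of these stubs. The evaluator is the kind of format and length check a code evaluator might perform.

```python
import json
from statistics import mean
from typing import Callable

# Hypothetical test cases; in practice these would come from logs or a CSV import.
DATASET = [
    {"input": "Return the user's name as JSON.", "max_chars": 60},
    {"input": "Return the order status as JSON.", "max_chars": 60},
]


def format_and_length_evaluator(output: str, max_chars: int) -> float:
    """Return the fraction of checks passed: valid JSON object, within length limit."""
    try:
        is_json_object = isinstance(json.loads(output), dict)
    except json.JSONDecodeError:
        is_json_object = False
    within_limit = len(output) <= max_chars
    return (is_json_object + within_limit) / 2


# Stand-ins for two prompt/model configurations (assumptions, not real model calls).
def config_a(text: str) -> str:
    return '{"result": "ok"}'


def config_b(text: str) -> str:
    return "Sure! Here is the result you asked for: " + text


def run_experiment(generate: Callable[[str], str]) -> float:
    """Run every test case through one configuration and average the scores."""
    return mean(
        format_and_length_evaluator(generate(case["input"]), case["max_chars"])
        for case in DATASET
    )


# Compare the configurations side by side and pick the highest average score.
scores = {"config_a": run_experiment(config_a), "config_b": run_experiment(config_b)}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Averaging per-case scores keeps the comparison simple; a real experiment would likely combine several evaluators and inspect individual failures as well as the mean.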

Optimize

After identifying the best configuration from your experiments, iterate and deploy:

Manage prompts: edit your prompt, commit a new version, and run another experiment. Repeat until you’re confident in the result.
Deploy to production: push the winning version live with one click. All API calls that use this prompt automatically pick up the new version, with no code changes needed.
Online evaluation: keep quality high after deployment by running evaluators on live traffic automatically. Get alerted when scores drop.
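The "get alerted when scores drop" idea can be illustrated with a rolling average over recent evaluator scores. The sketch below is an assumption about how such a check might work, not a description of Respan's alerting: the `ScoreMonitor` class, the window size, and the threshold are all hypothetical.

```python
from collections import deque


class ScoreMonitor:
    """Hypothetical monitor: flags when the rolling mean of recent scores is too low."""

    def __init__(self, window: int = 5, threshold: float = 0.7):
        self.scores = deque(maxlen=window)  # keeps only the most recent `window` scores
        self.threshold = threshold

    def record(self, score: float) -> bool:
        """Add a score; return True if a full window averages below the threshold."""
        self.scores.append(score)
        window_full = len(self.scores) == self.scores.maxlen
        return window_full and sum(self.scores) / len(self.scores) < self.threshold


monitor = ScoreMonitor(window=3, threshold=0.7)
# Simulated evaluator scores from live traffic, degrading over time.
alerts = [monitor.record(s) for s in [0.9, 0.8, 0.9, 0.5, 0.4]]
print(alerts)
```

Waiting for a full window before alerting avoids firing on a single low score right after deployment, at the cost of a short delay before the first alert is possible.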
