Evaluators
Set up Respan
- Sign up — Create an account at platform.respan.ai
- Create an API key — Generate one on the API keys page
- Add credits or a provider key — Add credits on the Credits page or connect your own provider key on the Integrations page
Evaluators score your LLM outputs — automatically with an LLM judge, programmatically with code, or manually with human review. Create them on Respan, then trigger them from code, the gateway, or experiments.
Set up a grader
Within an evaluator, graders are the individual metrics you score. Create a grader from the GRADERS section in the evaluator builder.
A grader starts with a name, an optional description, an output data type, a score range, and a passing score. The description is especially useful for human review because annotators can use it to understand what the metric measures and how it should be judged.
The output data type, score range, and passing score define the shared scoring contract for that grader. In most cases, these stay the same no matter how the grader is evaluated.
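As an illustration, the shared scoring contract can be modeled as a small data structure. The field names below are assumptions for this sketch, not the actual Respan schema:

```python
from dataclasses import dataclass

@dataclass
class Grader:
    # Hypothetical representation of a grader's shared scoring contract;
    # field names are illustrative, not the Respan API.
    name: str
    output_type: str      # e.g. "number"
    score_min: float
    score_max: float
    passing_score: float
    description: str = ""

    def passes(self, score: float) -> bool:
        # A score passes when it is within range and meets the passing threshold.
        return self.score_min <= score <= self.score_max and score >= self.passing_score

accuracy = Grader("accuracy", "number", 0, 10, 7, "Factual accuracy of the answer")
print(accuracy.passes(8))  # True
print(accuracy.passes(5))  # False
```

Because the contract lives on the grader, any evaluation method (LLM, code, or human) reports scores against the same range and passing threshold.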
A single grader can include both an LLM evaluation config and a Code evaluation config. During an evaluation run, Respan loads the config that matches how the grader is assigned, so the same grader can be used with human, LLM, or code evaluation workflows.
To edit a grader, click the pencil icon on the active grader block.
LLM grader
LLM evaluation lets a grader score outputs automatically with a judge model. This config lives inside the grader and uses the grader’s data type, score range, and passing score.
Write the definition
Use the Definition field to describe what the grader should measure and how the model should score it.

The definition must include {{output}}. It can also reference other optional variables.
Use the definition to explain both the grading criteria and how scores within the grader’s range should be interpreted.
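For intuition, here is a minimal sketch of how a definition containing {{output}} might be rendered before being sent to the judge model. The `render` helper is hypothetical, not part of Respan:

```python
import re

# A sample grader definition; {{output}} is required, other variables are optional.
definition = (
    "Score how concise the response is on a scale of 1-5, "
    "where 5 means no unnecessary content.\n\nResponse: {{output}}"
)

def render(template: str, variables: dict) -> str:
    # Hypothetical renderer: replaces each {{name}} placeholder with its value,
    # leaving unknown placeholders untouched.
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(variables.get(m.group(1), m.group(0))),
        template,
    )

prompt = render(definition, {"output": "Paris is the capital of France."})
print(prompt)
```

The rendered prompt is what the judge model actually sees, so the definition should read as complete grading instructions once the placeholders are filled in.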
Create an evaluator
Evaluators are built visually from blocks. Drag blocks from the palette and snap them together magnetically like puzzle pieces to define the evaluation flow.
The evaluator builder includes these block types:
Markers
Markers define the entry and exit points of the flow.
Original input is the starting point of the evaluator. Final result is the output marker that returns the final grading result.
Conditions
Conditions add branching logic to the evaluator.
Use If, If / Then / Else, and comparison operators to route the flow differently based on values in the graph.
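For intuition, the routing an If / Then / Else block performs is equivalent to a plain conditional. This is an illustrative sketch with an assumed pass/fail threshold, not builder code:

```python
def route(score: float, threshold: float = 7.0) -> str:
    # If / Then / Else: compare a value from the graph and route accordingly.
    if score >= threshold:
        return "pass"   # "Then" branch
    else:
        return "fail"   # "Else" branch

print(route(8.5))  # pass
print(route(4.0))  # fail
```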
Graders
Grader blocks use the graders configured in the grader section above. They let the evaluator run LLM, code, or human grading logic as part of the flow.
Compute
Compute blocks perform calculations on values in the graph.
Use them for operations such as averages and weighted averages.
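As a concrete example of a Compute operation, a weighted average combines several grader scores, weighting each by importance. This is plain Python, independent of the builder:

```python
def weighted_average(scores, weights):
    # Combine grader scores, weighting each score by its importance.
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# e.g. accuracy weighted twice as heavily as style
print(weighted_average([8.0, 6.0], [2.0, 1.0]))  # ≈ 7.33
```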
Metrics
Metrics blocks represent values from the original input log.
These blocks expose built-in metrics such as latency, cost, model, and token counts.
Constants
Constants blocks represent fixed values that you provide directly in the evaluator.
Use constants for values such as numbers, text, true, or false.
To save a valid evaluator, the last block in the flow must be either Final result or If / Then / Else.
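That save rule can be expressed as a small check. This is a sketch that mirrors the block names above, not Respan code:

```python
# A flow is saveable only when it ends in one of these terminal blocks.
VALID_TERMINALS = {"Final result", "If / Then / Else"}

def is_valid_flow(blocks: list[str]) -> bool:
    # The last block in the flow must be a valid terminal.
    return bool(blocks) and blocks[-1] in VALID_TERMINALS

print(is_valid_flow(["Original input", "Graders", "Compute", "Final result"]))  # True
print(is_valid_flow(["Original input", "Graders"]))  # False
```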
Test, deploy, and load versions
Once the evaluator flow is ready, you can test the whole evaluator, deploy it as a version, and load previous versions from history.
- Test run runs the entire evaluator against sample data so you can validate the full flow before deployment.
- Deploy publishes the current draft as a new evaluator version.
- Versions opens the evaluator history.
From the version history, you can review the current draft, see previously deployed versions, and load an older version back into the editor when needed.