This is a beta feature. The API documentation is the source of truth for evaluator configuration and behavior.
Set up an evaluator
Go to Evaluators and click + New evaluator. Select the evaluator type:

- LLM evaluator
- Code evaluator
- Human evaluator
LLM evaluators use a language model to score outputs automatically.
Configure the evaluator
Define a Slug — a unique identifier used to reference this evaluator in API calls and logs.

Choose a model for the evaluator. Currently supported: gpt-4o and gpt-4o-mini (OpenAI and Azure OpenAI).
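As a rough sketch, an evaluator configuration could be represented as below. The field names and the slug rule (lowercase letters, digits, and hyphens) are assumptions for illustration, not the documented schema — check the API documentation for the actual constraints:

```python
import re

# Hypothetical evaluator configuration; field names are illustrative,
# not the documented API schema.
evaluator_config = {
    "slug": "helpfulness-check",   # unique identifier used in API calls and logs
    "type": "llm",
    "model": "gpt-4o-mini",        # or "gpt-4o"
}

def is_valid_slug(slug: str) -> bool:
    """Assumed slug constraints: lowercase words separated by hyphens."""
    return bool(re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", slug))

print(is_valid_slug(evaluator_config["slug"]))
```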
Write the definition
The definition is the core instruction that tells the LLM how to evaluate. You can use these variables:
| Variable | Description |
|---|---|
| {{input}} | The input prompt sent to the LLM |
| {{output}} | The response generated by the LLM |
| {{metadata}} | Custom metadata associated with the request |
| {{metrics}} | System-captured metrics (latency, tokens, etc.) |
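To show how these variables fit into a definition, here is a minimal sketch of {{variable}} interpolation. The substitution logic and the sample definition text are illustrations, not the product's actual rendering engine:

```python
import re

# Example definition text using the {{input}} and {{output}} variables.
definition = (
    "Rate the helpfulness of the response on a scale of 1-5.\n"
    "Question: {{input}}\n"
    "Answer: {{output}}"
)

def render(template: str, context: dict) -> str:
    # Replace each {{name}} with its value from the context;
    # unknown names are left as-is.
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(context.get(m.group(1), m.group(0))),
        template,
    )

rendered = render(definition, {
    "input": "What is the capital of France?",
    "output": "Paris is the capital of France.",
})
print(rendered)
```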
Ideal output:
ideal_output is not a standalone variable. To compare against a reference answer, include it in your metadata and reference it as {{metadata.ideal_output}}.
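A sketch of how a dotted reference like {{metadata.ideal_output}} can resolve against nested metadata; the lookup code here is an illustration of the idea, not the evaluator's actual implementation:

```python
import re

# Include the reference answer in the request's custom metadata.
metadata = {"ideal_output": "Paris"}

definition = (
    "Compare the answer to the reference.\n"
    "Answer: {{output}}\n"
    "Reference: {{metadata.ideal_output}}"
)

def render(template: str, context: dict) -> str:
    # Resolve dotted paths like metadata.ideal_output against nested dicts.
    def lookup(match):
        value = context
        for part in match.group(1).split("."):
            value = value[part]
        return str(value)
    return re.sub(r"\{\{([\w.]+)\}\}", lookup, template)

result = render(definition, {"output": "Paris", "metadata": metadata})
print(result)
```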

