# Create evaluator

Creates a new evaluator for your organization. You must specify `type` and `score_value_type`. The `eval_class` field is optional and only used for pre-built templates.

## Authentication

All endpoints require API key authentication:

```bash
Authorization: Bearer YOUR_API_KEY
```

## Evaluator Types and Score Value Types

### Evaluator Types (Required)

<Note>
**Important**: The evaluator `type` field now represents the primary interface/use case, but automation can be added independently via `llm_config` or `code_config`. This decouples the annotation method from the evaluator type.
</Note>

- **`llm`**: Primarily LLM-based evaluators (can also have code automation)
- **`human`**: Primarily human annotation-based (can have LLM or code automation for assistance)
- **`code`**: Primarily code-based evaluators (can also have LLM automation as fallback)

### Score Value Types (Required)

- **`numerical`**: Numeric scores (e.g., 1-5, 0.0-1.0)
- **`boolean`**: True/false or pass/fail evaluations
- **`percentage`**: 0-100 percentage scores (use decimals; 0.0–100.0)
- **`single_select`**: Choose exactly one option from predefined choices
- **`multi_select`**: Choose one or more options from predefined choices
- **`json`**: Structured JSON data for complex evaluations
- **`text`**: Text-based feedback and comments
- (Legacy) **`categorical`** and **`comment`** remain readable for older evaluators

### Pre-built Templates (Optional)

You can optionally use pre-built templates by specifying `eval_class`:

- **`respan_custom_llm`**: LLM-based evaluator with standard configuration
- **`custom_code`**: Code-based evaluator template

## Unified Evaluator Inputs

All evaluator runs now receive a single unified `inputs` object. This applies to all evaluator types (`llm`, `human`, `code`). Structure:

```json
{
  "inputs": {
    "input": {},
    "output": {},
    "metrics": {},
    "metadata": {}
  }
}
```

- `input` (any JSON): The request/input to be evaluated.
- `output` (any JSON): The response/output being evaluated.
- `metrics` (object, optional): System-captured metrics (e.g., tokens, latency, cost).
- `metadata` (object, optional): Context and custom properties you pass; also logged.
- `llm_input` and `llm_output` (string, optional): Legacy convenience aliases.
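For illustration, here is what a populated `inputs` object might look like, expressed as a Python dict. The four field names come from the structure above; the values and specific metric keys are invented for this example:

```python
# Hypothetical populated unified `inputs` object, shown as a Python dict.
# Field names (input/output/metrics/metadata) are from the structure above;
# all values and metric keys below are illustrative, not required.
inputs = {
    "input": {"question": "What is the capital of France?"},
    "output": {"answer": "The capital of France is Paris."},
    "metrics": {"total_tokens": 42, "latency_ms": 310},
    "metadata": {"customer_id": "acme-123", "environment": "staging"},
}
```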
## Required Fields

- **`name`** (string): Display name for the evaluator
- **`type`** (string): Evaluator type - `"llm"`, `"human"`, or `"code"`
- **`score_value_type`** (string): Score format - `"numerical"`, `"boolean"`, `"percentage"`, `"single_select"`, `"multi_select"`, `"json"`, or `"text"` (legacy `"categorical"` and `"comment"` are still accepted)

## Optional Fields

- **`evaluator_slug`** (string): Unique identifier (auto-generated if not provided)
- **`description`** (string): Description of the evaluator
- **`eval_class`** (string): Pre-built template to use (optional)
- **`configurations`** (object): Custom configuration based on evaluator type
- **`categorical_choices`** (array): Required when `score_value_type` is `"single_select"`, `"multi_select"`, or legacy `"categorical"`

## New Format (Recommended)

<Note>
The new evaluator format uses clean, flat configuration fields instead of nested `configurations`. This format allows you to add **both LLM and code automation** to any evaluator type, decoupling the annotation method from the evaluator type.
</Note>

### New Top-Level Fields (All Optional)

| Field | Type | Description |
|-------|------|-------------|
| `score_config` | object | Score type configuration (shape varies by `score_value_type`) |
| `passing_conditions` | object | Passing conditions using universal filter format |
| `llm_config` | object | LLM automation config (if using LLM for scoring) |
| `code_config` | object | Code automation config (if using code for scoring) |

### Score Config Shapes

**Numerical/Percentage:**

```json
{
  "min_score": 0.0,
  "max_score": 5.0,
  "choices": [...] // Optional discrete values
}
```

**Single/Multi Select:**

```json
{
  "choices": [
    {"name": "Professional", "value": "professional"},
    {"name": "Casual", "value": "casual"}
  ]
}
```

### LLM Config

```json
{
  "model": "gpt-4o-mini",
  "evaluator_definition": "Your prompt template with {{input}} and {{output}}",
  "scoring_rubric": "Scoring instructions",
  "temperature": 0.1,
  "max_tokens": 200
}
```

Available LLM config fields (all optional except `model` and `evaluator_definition`):

- Core: `model`, `stream`
- Sampling: `temperature`, `top_p`, `max_tokens`, `max_completion_tokens`
- Penalties: `frequency_penalty`, `presence_penalty`, `stop`
- Formatting: `response_format`, `verbosity`
- Tools: `tools`, `tool_choice`, `parallel_tool_calls`

### Code Config

```json
{
  "eval_code_snippet": "def main(eval_inputs):\n    return 1 if 'success' in eval_inputs.get('output', '') else 0"
}
```
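Because `eval_code_snippet` is ordinary Python, one low-risk way to check it before creating the evaluator is to define `main` locally and call it with a sample `eval_inputs` dict. This is a minimal sketch, assuming `eval_inputs` mirrors the unified `inputs` object described earlier; the sample payload is invented:

```python
# Local sanity check for an eval_code_snippet before uploading it.
# Assumes eval_inputs mirrors the unified `inputs` object described above;
# the sample values are invented for illustration.
def main(eval_inputs):
    return 1 if 'success' in str(eval_inputs.get('output', '')) else 0

sample = {
    "input": {"prompt": "Run the nightly job"},
    "output": "Job finished with success",
    "metrics": {},
    "metadata": {},
}
print(main(sample))  # -> 1
```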
### Passing Conditions

Uses the universal filter format. Example:

```json
{
  "primary_score": {
    "operator": "gte",
    "value": 3
  }
}
```

For complete details, see the [Filters API Reference](/api-reference/reference/filters-api-reference).

## Legacy Format (Still Supported)

The legacy `configurations` format remains fully functional for backward compatibility.

### Configuration Fields by Type

**For `type: "llm"` evaluators:**

- `evaluator_definition` (string): The evaluation prompt/instruction. **Must include `{{input}}` and `{{output}}` template variables**. Legacy `{{llm_input}}` and `{{llm_output}}` are also supported for backward compatibility.
- `scoring_rubric` (string): Description of the scoring criteria
- `llm_engine` (string): LLM model to use (e.g., "gpt-4o-mini", "gpt-4o")
- `model_options` (object, optional): LLM parameters like temperature, max_tokens
- `min_score` (number, optional): Minimum possible score
- `max_score` (number, optional): Maximum possible score
- `passing_score` (number, optional): Score threshold for passing

**For `type: "code"` evaluators:**

- `eval_code_snippet` (string): Python code with a `main(eval_inputs)` function that returns the score

**For `type: "human"` evaluators:**

- No specific configuration fields required
- Use the `categorical_choices` field when `score_value_type` is `"single_select"` or `"multi_select"`

**For `score_value_type: "single_select" | "multi_select"`:**

- `categorical_choices` (array): List of choice objects with `name` and `value` properties

```json
[
  { "name": "Excellent", "value": 5 },
  { "name": "Good", "value": 4 }
]
```

## Examples

### New Format Examples

#### LLM Evaluator with Automation (Numerical)

```python Python
import requests

url = "https://api.respan.ai/api/evaluators/"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

data = {
    "name": "Response Quality",
    "evaluator_slug": "response_quality_v2",
    "score_value_type": "numerical",
    "score_config": {
        "min_score": 1,
        "max_score": 5,
        "choices": [
            {"name": "Poor", "value": 1},
            {"name": "Fair", "value": 2},
            {"name": "Good", "value": 3},
            {"name": "Great", "value": 4},
            {"name": "Excellent", "value": 5}
        ]
    },
    "passing_conditions": {
        "primary_score": {
            "operator": "gte",
            "value": 3
        }
    },
    "llm_config": {
        "model": "gpt-4o-mini",
        "evaluator_definition": "Rate the quality of this response:\n<input>{{input}}</input>\n<output>{{output}}</output>",
        "scoring_rubric": "1=Poor, 5=Excellent",
        "temperature": 0.1
    }
}

response = requests.post(url, headers=headers, json=data)
print(response.json())
```

```bash cURL
curl -X POST "https://api.respan.ai/api/evaluators/" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Response Quality",
    "score_value_type": "numerical",
    "score_config": { "min_score": 1, "max_score": 5 },
    "passing_conditions": {
      "primary_score": { "operator": "gte", "value": 3 }
    },
    "llm_config": {
      "model": "gpt-4o-mini",
      "evaluator_definition": "Rate the quality:\n<input>{{input}}</input>\n<output>{{output}}</output>",
      "temperature": 0.1
    }
  }'
```
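Continuing the Python example above, you can branch on the status code and keep the identifiers returned on success; `id` and `evaluator_slug` appear in the 201 payload documented under Response below:

```python
# Capture the new evaluator's identifiers from a successful create call.
# `id` and `evaluator_slug` are fields of the documented 201 response.
if response.status_code == 201:
    evaluator = response.json()
    print(evaluator["id"], evaluator["evaluator_slug"])
```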
#### Human Evaluator with LLM Assistance

<Note>
This shows how a **human** evaluator can have LLM automation for suggested scoring, decoupling annotation method from evaluator type.
</Note>

```python Python
data = {
    "name": "Human Review with AI Assistance",
    "evaluator_slug": "human_ai_assist_v1",
    "type": "human",
    "score_value_type": "numerical",
    "score_config": {"min_score": 1, "max_score": 5},
    "llm_config": {
        "model": "gpt-4o-mini",
        "evaluator_definition": "Suggest a quality score for this response",
        "temperature": 0.1
    }
}

response = requests.post(url, headers=headers, json=data)
print(response.json())
```

```bash cURL
curl -X POST "https://api.respan.ai/api/evaluators/" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Human Review with AI Assistance",
    "type": "human",
    "score_value_type": "numerical",
    "score_config": {"min_score": 1, "max_score": 5},
    "llm_config": {
      "model": "gpt-4o-mini",
      "evaluator_definition": "Suggest a quality score"
    }
  }'
```

#### Code Evaluator (Boolean)

```python Python
data = {
    "name": "Length Check",
    "evaluator_slug": "length_check_v1",
    "score_value_type": "boolean",
    "description": "Checks if response is longer than 10 characters",
    "code_config": {
        "eval_code_snippet": "def main(eval_inputs):\n    output = eval_inputs.get('output', '')\n    return len(str(output)) > 10"
    }
}

response = requests.post(url, headers=headers, json=data)
print(response.json())
```

```bash cURL
curl -X POST "https://api.respan.ai/api/evaluators/" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Length Check",
    "score_value_type": "boolean",
    "code_config": {
      "eval_code_snippet": "def main(eval_inputs):\n    return len(str(eval_inputs.get(\"output\", \"\"))) > 10"
    }
  }'
```

#### Single Select Evaluator with LLM

```python Python
data = {
    "name": "Tone Classifier",
    "score_value_type": "single_select",
    "score_config": {
        "choices": [
            {"name": "Professional", "value": "professional"},
            {"name": "Casual", "value": "casual"},
            {"name": "Formal", "value": "formal"}
        ]
    },
    "llm_config": {
        "model": "gpt-4o-mini",
        "evaluator_definition": "Classify the tone of this response"
    }
}

response = requests.post(url, headers=headers, json=data)
print(response.json())
```

```bash cURL
curl -X POST "https://api.respan.ai/api/evaluators/" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Tone Classifier",
    "score_value_type": "single_select",
    "score_config": {
      "choices": [
        {"name": "Professional", "value": "professional"},
        {"name": "Casual", "value": "casual"}
      ]
    },
    "llm_config": {
      "model": "gpt-4o-mini",
      "evaluator_definition": "Classify the tone"
    }
  }'
```

### Legacy Format Examples

#### Custom LLM Evaluator (Numerical)

```python Python
import requests

url = "https://api.respan.ai/api/evaluators/"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

data = {
    "name": "Response Quality Evaluator",
    "evaluator_slug": "response_quality_v1",
    "type": "llm",
    "score_value_type": "numerical",
    "description": "Evaluates response quality on a 1-5 scale",
    "configurations": {
        "evaluator_definition": "Rate the response quality based on accuracy, relevance, and completeness.\n<llm_input>{{llm_input}}</llm_input>\n<llm_output>{{llm_output}}</llm_output>",
        "scoring_rubric": "1=Poor, 2=Fair, 3=Good, 4=Very Good, 5=Excellent",
        "llm_engine": "gpt-4o-mini",
        "model_options": {
            "temperature": 0.1,
            "max_tokens": 200
        },
        "min_score": 1.0,
        "max_score": 5.0,
        "passing_score": 3.0
    }
}

response = requests.post(url, headers=headers, json=data)
print(response.json())
```

```bash cURL
curl -X POST "https://api.respan.ai/api/evaluators/" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Response Quality Evaluator",
    "evaluator_slug": "response_quality_v1",
    "type": "llm",
    "score_value_type": "numerical",
    "description": "Evaluates response quality on a 1-5 scale",
    "configurations": {
      "evaluator_definition": "Rate the response quality based on accuracy, relevance, and completeness.\n<llm_input>{{llm_input}}</llm_input>\n<llm_output>{{llm_output}}</llm_output>",
      "scoring_rubric": "1=Poor, 2=Fair, 3=Good, 4=Very Good, 5=Excellent",
      "llm_engine": "gpt-4o-mini",
      "min_score": 1.0,
      "max_score": 5.0,
      "passing_score": 3.0
    }
  }'
```
-H "Content-Type: application/json" \ -d '{ "name": "Response Quality Evaluator", "evaluator_slug": "response_quality_v1", "type": "llm", "score_value_type": "numerical", "description": "Evaluates response quality on a 1-5 scale", "configurations": { "evaluator_definition": "Rate the response quality based on accuracy, relevance, and completeness.\n<llm_input>{{llm_input}}</llm_input>\n<llm_output>{{llm_output}}</llm_output>", "scoring_rubric": "1=Poor, 2=Fair, 3=Good, 4=Very Good, 5=Excellent", "llm_engine": "gpt-4o-mini", "min_score": 1.0, "max_score": 5.0, "passing_score": 3.0 } }' ``` ### Human Categorical Evaluator ```python Python data = { "name": "Content Quality Assessment", "evaluator_slug": "content_quality_categorical", "type": "human", "score_value_type": "categorical", "description": "Human assessment of content quality with predefined categories", "categorical_choices": [ { "name": "Excellent", "value": 5 }, { "name": "Good", "value": 4 }, { "name": "Average", "value": 3 }, { "name": "Poor", "value": 2 }, { "name": "Very Poor", "value": 1 } ] } response = requests.post(url, headers=headers, json=data) print(response.json()) ``` ```bash cURL curl -X POST "https://api.respan.ai/api/evaluators/" \ -H "Authorization: Bearer YOUR_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "name": "Content Quality Assessment", "evaluator_slug": "content_quality_categorical", "type": "human", "score_value_type": "categorical", "description": "Human assessment of content quality with predefined categories", "categorical_choices": [ { "name": "Excellent", "value": 5 }, { "name": "Good", "value": 4 }, { "name": "Average", "value": 3 }, { "name": "Poor", "value": 2 }, { "name": "Very Poor", "value": 1 } ] }' ``` ### Code-based Boolean Evaluator ```python Python data = { "name": "Response Length Checker", "evaluator_slug": "length_checker_boolean", "type": "code", "score_value_type": "boolean", "description": "Checks if response meets minimum length requirement", "configurations": { "eval_code_snippet": "def evaluate(llm_input, llm_output, **kwargs):\n '''\n Check if response meets minimum length requirement\n Returns True if length >= 50 characters, False otherwise\n '''\n if not llm_output:\n return False\n \n return len(llm_output.strip()) >= 50" } } response = requests.post(url, headers=headers, json=data) print(response.json()) ``` ```bash cURL curl -X POST "https://api.respan.ai/api/evaluators/" \ -H "Authorization: Bearer YOUR_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "name": "Response Length Checker", "evaluator_slug": "length_checker_boolean", "type": "code", "score_value_type": "boolean", "description": "Checks if response meets minimum length requirement", "configurations": { "eval_code_snippet": "def evaluate(llm_input, llm_output, **kwargs):\n if not llm_output:\n return False\n return len(llm_output.strip()) >= 50" } }' ``` ### LLM Boolean Evaluator ```python Python data = { "name": "LLM Factual Accuracy Check", "evaluator_slug": "llm_factual_accuracy", "type": "llm", "score_value_type": "boolean", "description": "LLM-based evaluator that checks if response is factually accurate", "configurations": { "evaluator_definition": "Determine if the response is factually accurate and contains no misinformation.\n<llm_input>{{llm_input}}</llm_input>\n<llm_output>{{llm_output}}</llm_output>", "scoring_rubric": "Return True if factually accurate, False if contains errors or misinformation", "llm_engine": "gpt-4o-mini" } } response = requests.post(url, headers=headers, 
#### LLM Boolean Evaluator

```python Python
data = {
    "name": "LLM Factual Accuracy Check",
    "evaluator_slug": "llm_factual_accuracy",
    "type": "llm",
    "score_value_type": "boolean",
    "description": "LLM-based evaluator that checks if response is factually accurate",
    "configurations": {
        "evaluator_definition": "Determine if the response is factually accurate and contains no misinformation.\n<llm_input>{{llm_input}}</llm_input>\n<llm_output>{{llm_output}}</llm_output>",
        "scoring_rubric": "Return True if factually accurate, False if contains errors or misinformation",
        "llm_engine": "gpt-4o-mini"
    }
}

response = requests.post(url, headers=headers, json=data)
print(response.json())
```

```bash cURL
curl -X POST "https://api.respan.ai/api/evaluators/" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "LLM Factual Accuracy Check",
    "evaluator_slug": "llm_factual_accuracy",
    "type": "llm",
    "score_value_type": "boolean",
    "description": "LLM-based evaluator that checks if response is factually accurate",
    "configurations": {
      "evaluator_definition": "Determine if the response is factually accurate and contains no misinformation.\n<llm_input>{{llm_input}}</llm_input>\n<llm_output>{{llm_output}}</llm_output>",
      "scoring_rubric": "Return True if factually accurate, False if contains errors or misinformation",
      "llm_engine": "gpt-4o-mini"
    }
  }'
```

#### Using Pre-built Template

```python Python
data = {
    "name": "Template-based LLM Evaluator",
    "evaluator_slug": "template_llm_eval",
    "type": "llm",
    "score_value_type": "numerical",
    "eval_class": "respan_custom_llm",
    "description": "Uses pre-built LLM evaluator template",
    "configurations": {
        "evaluator_definition": "Evaluate response accuracy and helpfulness.\n<llm_input>{{llm_input}}</llm_input>\n<llm_output>{{llm_output}}</llm_output>",
        "scoring_rubric": "Score from 1-10 based on accuracy and helpfulness",
        "llm_engine": "gpt-4o",
        "min_score": 1.0,
        "max_score": 10.0
    }
}

response = requests.post(url, headers=headers, json=data)
print(response.json())
```

```bash cURL
curl -X POST "https://api.respan.ai/api/evaluators/" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Template-based LLM Evaluator",
    "evaluator_slug": "template_llm_eval",
    "type": "llm",
    "score_value_type": "numerical",
    "eval_class": "respan_custom_llm",
    "description": "Uses pre-built LLM evaluator template",
    "configurations": {
      "evaluator_definition": "Evaluate response accuracy and helpfulness.\n<llm_input>{{llm_input}}</llm_input>\n<llm_output>{{llm_output}}</llm_output>",
      "scoring_rubric": "Score from 1-10 based on accuracy and helpfulness",
      "llm_engine": "gpt-4o",
      "min_score": 1.0,
      "max_score": 10.0
    }
  }'
```

## Response

**Status: 201 Created**

```json
{
  "id": "0f4325f9-55ef-4c20-8abe-376694419947",
  "name": "Response Quality Evaluator",
  "evaluator_slug": "response_quality_v1",
  "type": "llm",
  "score_value_type": "numerical",
  "eval_class": "",
  "description": "Evaluates response quality on a 1-5 scale",
  "configurations": {
    "evaluator_definition": "Rate the response quality based on accuracy, relevance, and completeness.\n<llm_input>{{llm_input}}</llm_input>\n<llm_output>{{llm_output}}</llm_output>",
    "scoring_rubric": "1=Poor, 2=Fair, 3=Good, 4=Very Good, 5=Excellent",
    "llm_engine": "gpt-4o-mini",
    "model_options": {
      "temperature": 0.1,
      "max_tokens": 200
    },
    "min_score": 1.0,
    "max_score": 5.0,
    "passing_score": 3.0
  },
  "created_by": {
    "first_name": "Respan",
    "last_name": "Team",
    "email": "admin@respan.ai"
  },
  "updated_by": {
    "first_name": "Respan",
    "last_name": "Team",
    "email": "admin@respan.ai"
  },
  "created_at": "2025-09-11T09:43:55.858321Z",
  "updated_at": "2025-09-11T09:43:55.858331Z",
  "custom_required_fields": [],
  "categorical_choices": null,
  "starred": false,
  "tags": []
}
```

## Error Responses

### 400 Bad Request

```json
{
  "configurations": [
    "Configuration validation failed: 1 validation error for RespanCustomLLMEvaluatorType\nscoring_rubric\n  Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]"
  ]
}
```

### 401 Unauthorized

```json
{
  "detail": "Your API key is invalid or expired, please check your API key at https://platform.respan.ai/platform/api/api-keys"
}
```
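A minimal sketch for branching on these documented error shapes, assuming `response` comes from one of the `requests.post` calls above:

```python
# 400 returns a mapping of field name -> list of validation messages;
# 401 returns {"detail": "..."} as documented above.
if response.status_code == 400:
    for field, messages in response.json().items():
        print(field, messages)
elif response.status_code == 401:
    print(response.json()["detail"])
```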
