# Create evaluator

Creates a new evaluator for your organization. You must specify `type` and `score_value_type`. The `eval_class` field is optional and only used for pre-built templates.

## Authentication

All endpoints require API key authentication:

```bash
Authorization: Bearer YOUR_API_KEY
```

## Evaluator Types and Score Value Types

### Evaluator Types (Required)

<Note>
**Important**: The evaluator `type` field now represents the primary interface/use case, but automation can be added independently via `llm_config` or `code_config`. This decouples the annotation method from the evaluator type.
</Note>

- **`llm`**: Primarily LLM-based evaluators (can also have code automation)
- **`human`**: Primarily human annotation-based (can have LLM or code automation for assistance)
- **`code`**: Primarily code-based evaluators (can also have LLM automation as fallback)

### Score Value Types (Required)

- **`numerical`**: Numeric scores (e.g., 1-5, 0.0-1.0)
- **`boolean`**: True/false or pass/fail evaluations
- **`percentage`**: 0-100 percentage scores (use decimals; 0.0–100.0)
- **`single_select`**: Choose exactly one option from predefined choices
- **`multi_select`**: Choose one or more options from predefined choices
- **`json`**: Structured JSON data for complex evaluations
- **`text`**: Text-based feedback and comments
- (Legacy) **`categorical`** and **`comment`** remain readable for older evaluators

### Pre-built Templates (Optional)

You can optionally use pre-built templates by specifying `eval_class`:

- **`respan_custom_llm`**: LLM-based evaluator with standard configuration
- **`custom_code`**: Code-based evaluator template

## Unified Evaluator Inputs

All evaluator runs now receive a single unified `inputs` object. This applies to all evaluator types (`llm`, `human`, `code`). Structure:

```json
{
  "inputs": {
    "input": {},
    "output": {},
    "metrics": {},
    "metadata": {}
  }
}
```

- `input` (any JSON): The request/input to be evaluated.
- `output` (any JSON): The response/output being evaluated.
- `metrics` (object, optional): System-captured metrics (e.g., tokens, latency, cost).
- `metadata` (object, optional): Context and custom properties you pass; also logged.
- `llm_input` and `llm_output` (string, optional): Legacy convenience aliases.
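For illustration, here is what a populated `inputs` object might look like, expressed as a Python dict. The four field names come from the structure above; the values and specific metric keys are invented for this example:

```python
# Hypothetical populated unified `inputs` object, shown as a Python dict.
# Field names (input/output/metrics/metadata) are from the structure above;
# all values and metric keys below are illustrative, not required.
inputs = {
    "input": {"question": "What is the capital of France?"},
    "output": {"answer": "The capital of France is Paris."},
    "metrics": {"total_tokens": 42, "latency_ms": 310},
    "metadata": {"customer_id": "acme-123", "environment": "staging"},
}
```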
## Required Fields

- **`name`** (string): Display name for the evaluator
- **`type`** (string): Evaluator type - `"llm"`, `"human"`, or `"code"`
- **`score_value_type`** (string): Score format - `"numerical"`, `"boolean"`, `"percentage"`, `"single_select"`, `"multi_select"`, `"json"`, or `"text"` (legacy `"categorical"` and `"comment"` are still accepted)

## Optional Fields

- **`evaluator_slug`** (string): Unique identifier (auto-generated if not provided)
- **`description`** (string): Description of the evaluator
- **`eval_class`** (string): Pre-built template to use (optional)
- **`configurations`** (object): Custom configuration based on evaluator type
- **`categorical_choices`** (array): Required when `score_value_type` is `"single_select"`, `"multi_select"`, or legacy `"categorical"`

## New Format (Recommended)

<Note>
The new evaluator format uses clean, flat configuration fields instead of nested `configurations`. This format allows you to add **both LLM and code automation** to any evaluator type, decoupling the annotation method from the evaluator type.
</Note>

### New Top-Level Fields (All Optional)

| Field | Type | Description |
|-------|------|-------------|
| `score_config` | object | Score type configuration (shape varies by `score_value_type`) |
| `passing_conditions` | object | Passing conditions using universal filter format |
| `llm_config` | object | LLM automation config (if using LLM for scoring) |
| `code_config` | object | Code automation config (if using code for scoring) |

### Score Config Shapes

**Numerical/Percentage:**

```json
{
  "min_score": 0.0,
  "max_score": 5.0,
  "choices": [...] // Optional discrete values
}
```

**Single/Multi Select:**

```json
{
  "choices": [
    {"name": "Professional", "value": "professional"},
    {"name": "Casual", "value": "casual"}
  ]
}
```

### LLM Config

```json
{
  "model": "gpt-4o-mini",
  "evaluator_definition": "Your prompt template with {{input}} and {{output}}",
  "scoring_rubric": "Scoring instructions",
  "temperature": 0.1,
  "max_tokens": 200
}
```

Available LLM config fields (all optional except `model` and `evaluator_definition`):

- Core: `model`, `stream`
- Sampling: `temperature`, `top_p`, `max_tokens`, `max_completion_tokens`
- Penalties: `frequency_penalty`, `presence_penalty`, `stop`
- Formatting: `response_format`, `verbosity`
- Tools: `tools`, `tool_choice`, `parallel_tool_calls`

### Code Config

```json
{
  "eval_code_snippet": "def main(eval_inputs):\n    return 1 if 'success' in eval_inputs.get('output', '') else 0"
}
```
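Because `eval_code_snippet` is ordinary Python, one low-risk way to check it before creating the evaluator is to define `main` locally and call it with a sample `eval_inputs` dict. This is a minimal sketch, assuming `eval_inputs` mirrors the unified `inputs` object described earlier; the sample payload is invented:

```python
# Local sanity check for an eval_code_snippet before uploading it.
# Assumes eval_inputs mirrors the unified `inputs` object described above;
# the sample values are invented for illustration.
def main(eval_inputs):
    return 1 if 'success' in str(eval_inputs.get('output', '')) else 0

sample = {
    "input": {"prompt": "Run the nightly job"},
    "output": "Job finished with success",
    "metrics": {},
    "metadata": {},
}
print(main(sample))  # -> 1
```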
### Passing Conditions

Uses the universal filter format. Example:

```json
{
  "primary_score": {
    "operator": "gte",
    "value": 3
  }
}
```

For complete details, see the [Filters API Reference](/api-reference/reference/filters-api-reference).

## Legacy Format (Still Supported)

The legacy `configurations` format remains fully functional for backward compatibility.

### Configuration Fields by Type

**For `type: "llm"` evaluators:**

- `evaluator_definition` (string): The evaluation prompt/instruction. **Must include `{{input}}` and `{{output}}` template variables**. Legacy `{{llm_input}}` and `{{llm_output}}` are also supported for backward compatibility.
- `scoring_rubric` (string): Description of the scoring criteria
- `llm_engine` (string): LLM model to use (e.g., "gpt-4o-mini", "gpt-4o")
- `model_options` (object, optional): LLM parameters like temperature, max_tokens
- `min_score` (number, optional): Minimum possible score
- `max_score` (number, optional): Maximum possible score
- `passing_score` (number, optional): Score threshold for passing

**For `type: "code"` evaluators:**

- `eval_code_snippet` (string): Python code with a `main(eval_inputs)` function that returns the score

**For `type: "human"` evaluators:**

- No specific configuration fields required
- Use the `categorical_choices` field when `score_value_type` is `"single_select"` or `"multi_select"`

**For `score_value_type: "single_select" | "multi_select"`:**

- `categorical_choices` (array): List of choice objects with `name` and `value` properties

```json
[
  { "name": "Excellent", "value": 5 },
  { "name": "Good", "value": 4 }
]
```

## Examples

### New Format Examples

#### LLM Evaluator with Automation (Numerical)

```python Python
import requests

url = "https://api.respan.ai/api/evaluators/"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

data = {
    "name": "Response Quality",
    "evaluator_slug": "response_quality_v2",
    "score_value_type": "numerical",
    "score_config": {
        "min_score": 1,
        "max_score": 5,
        "choices": [
            {"name": "Poor", "value": 1},
            {"name": "Fair", "value": 2},
            {"name": "Good", "value": 3},
            {"name": "Great", "value": 4},
            {"name": "Excellent", "value": 5}
        ]
    },
    "passing_conditions": {
        "primary_score": {
            "operator": "gte",
            "value": 3
        }
    },
    "llm_config": {
        "model": "gpt-4o-mini",
        "evaluator_definition": "Rate the quality of this response:\n<input>{{input}}</input>\n<output>{{output}}</output>",
        "scoring_rubric": "1=Poor, 5=Excellent",
        "temperature": 0.1
    }
}

response = requests.post(url, headers=headers, json=data)
print(response.json())
```

```bash cURL
curl -X POST "https://api.respan.ai/api/evaluators/" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Response Quality",
    "score_value_type": "numerical",
    "score_config": { "min_score": 1, "max_score": 5 },
    "passing_conditions": {
      "primary_score": { "operator": "gte", "value": 3 }
    },
    "llm_config": {
      "model": "gpt-4o-mini",
      "evaluator_definition": "Rate the quality:\n<input>{{input}}</input>\n<output>{{output}}</output>",
      "temperature": 0.1
    }
  }'
```
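Continuing the Python example above, you can branch on the status code and keep the identifiers returned on success; `id` and `evaluator_slug` appear in the 201 payload documented under Response below:

```python
# Capture the new evaluator's identifiers from a successful create call.
# `id` and `evaluator_slug` are fields of the documented 201 response.
if response.status_code == 201:
    evaluator = response.json()
    print(evaluator["id"], evaluator["evaluator_slug"])
```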
#### Human Evaluator with LLM Assistance

<Note>
This shows how a **human** evaluator can have LLM automation for suggested scoring, decoupling annotation method from evaluator type.
</Note>

```python Python
data = {
    "name": "Human Review with AI Assistance",
    "evaluator_slug": "human_ai_assist_v1",
    "type": "human",
    "score_value_type": "numerical",
    "score_config": {"min_score": 1, "max_score": 5},
    "llm_config": {
        "model": "gpt-4o-mini",
        "evaluator_definition": "Suggest a quality score for this response",
        "temperature": 0.1
    }
}

response = requests.post(url, headers=headers, json=data)
print(response.json())
```

```bash cURL
curl -X POST "https://api.respan.ai/api/evaluators/" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Human Review with AI Assistance",
    "type": "human",
    "score_value_type": "numerical",
    "score_config": {"min_score": 1, "max_score": 5},
    "llm_config": {
      "model": "gpt-4o-mini",
      "evaluator_definition": "Suggest a quality score"
    }
  }'
```

#### Code Evaluator (Boolean)

```python Python
data = {
    "name": "Length Check",
    "evaluator_slug": "length_check_v1",
    "score_value_type": "boolean",
    "description": "Checks if response is longer than 10 characters",
    "code_config": {
        "eval_code_snippet": "def main(eval_inputs):\n    output = eval_inputs.get('output', '')\n    return len(str(output)) > 10"
    }
}

response = requests.post(url, headers=headers, json=data)
print(response.json())
```

```bash cURL
curl -X POST "https://api.respan.ai/api/evaluators/" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Length Check",
    "score_value_type": "boolean",
    "code_config": {
      "eval_code_snippet": "def main(eval_inputs):\n    return len(str(eval_inputs.get(\"output\", \"\"))) > 10"
    }
  }'
```

#### Single Select Evaluator with LLM

```python Python
data = {
    "name": "Tone Classifier",
    "score_value_type": "single_select",
    "score_config": {
        "choices": [
            {"name": "Professional", "value": "professional"},
            {"name": "Casual", "value": "casual"},
            {"name": "Formal", "value": "formal"}
        ]
    },
    "llm_config": {
        "model": "gpt-4o-mini",
        "evaluator_definition": "Classify the tone of this response"
    }
}

response = requests.post(url, headers=headers, json=data)
print(response.json())
```

```bash cURL
curl -X POST "https://api.respan.ai/api/evaluators/" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Tone Classifier",
    "score_value_type": "single_select",
    "score_config": {
      "choices": [
        {"name": "Professional", "value": "professional"},
        {"name": "Casual", "value": "casual"}
      ]
    },
    "llm_config": {
      "model": "gpt-4o-mini",
      "evaluator_definition": "Classify the tone"
    }
  }'
```

### Legacy Format Examples

#### Custom LLM Evaluator (Numerical)

```python Python
import requests

url = "https://api.respan.ai/api/evaluators/"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

data = {
    "name": "Response Quality Evaluator",
    "evaluator_slug": "response_quality_v1",
    "type": "llm",
    "score_value_type": "numerical",
    "description": "Evaluates response quality on a 1-5 scale",
    "configurations": {
        "evaluator_definition": "Rate the response quality based on accuracy, relevance, and completeness.\n<llm_input>{{llm_input}}</llm_input>\n<llm_output>{{llm_output}}</llm_output>",
        "scoring_rubric": "1=Poor, 2=Fair, 3=Good, 4=Very Good, 5=Excellent",
        "llm_engine": "gpt-4o-mini",
        "model_options": {
            "temperature": 0.1,
            "max_tokens": 200
        },
        "min_score": 1.0,
        "max_score": 5.0,
        "passing_score": 3.0
    }
}

response = requests.post(url, headers=headers, json=data)
print(response.json())
```

```bash cURL
curl -X POST "https://api.respan.ai/api/evaluators/" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Response Quality Evaluator",
    "evaluator_slug": "response_quality_v1",
    "type": "llm",
    "score_value_type": "numerical",
    "description": "Evaluates response quality on a 1-5 scale",
    "configurations": {
      "evaluator_definition": "Rate the response quality based on accuracy, relevance, and completeness.\n<llm_input>{{llm_input}}</llm_input>\n<llm_output>{{llm_output}}</llm_output>",
      "scoring_rubric": "1=Poor, 2=Fair, 3=Good, 4=Very Good, 5=Excellent",
      "llm_engine": "gpt-4o-mini",
      "min_score": 1.0,
      "max_score": 5.0,
      "passing_score": 3.0
    }
  }'
```
-H "Content-Type: application/json" \ -d '{ "name": "Response Quality Evaluator", "evaluator_slug": "response_quality_v1", "type": "llm", "score_value_type": "numerical", "description": "Evaluates response quality on a 1-5 scale", "configurations": { "evaluator_definition": "Rate the response quality based on accuracy, relevance, and completeness.\n<llm_input>{{llm_input}}</llm_input>\n<llm_output>{{llm_output}}</llm_output>", "scoring_rubric": "1=Poor, 2=Fair, 3=Good, 4=Very Good, 5=Excellent", "llm_engine": "gpt-4o-mini", "min_score": 1.0, "max_score": 5.0, "passing_score": 3.0 } }' ``` ### Human Categorical Evaluator ```python Python data = { "name": "Content Quality Assessment", "evaluator_slug": "content_quality_categorical", "type": "human", "score_value_type": "categorical", "description": "Human assessment of content quality with predefined categories", "categorical_choices": [ { "name": "Excellent", "value": 5 }, { "name": "Good", "value": 4 }, { "name": "Average", "value": 3 }, { "name": "Poor", "value": 2 }, { "name": "Very Poor", "value": 1 } ] } response = requests.post(url, headers=headers, json=data) print(response.json()) ``` ```bash cURL curl -X POST "https://api.respan.ai/api/evaluators/" \ -H "Authorization: Bearer YOUR_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "name": "Content Quality Assessment", "evaluator_slug": "content_quality_categorical", "type": "human", "score_value_type": "categorical", "description": "Human assessment of content quality with predefined categories", "categorical_choices": [ { "name": "Excellent", "value": 5 }, { "name": "Good", "value": 4 }, { "name": "Average", "value": 3 }, { "name": "Poor", "value": 2 }, { "name": "Very Poor", "value": 1 } ] }' ``` ### Code-based Boolean Evaluator ```python Python data = { "name": "Response Length Checker", "evaluator_slug": "length_checker_boolean", "type": "code", "score_value_type": "boolean", "description": "Checks if response meets minimum length requirement", "configurations": { "eval_code_snippet": "def evaluate(llm_input, llm_output, **kwargs):\n '''\n Check if response meets minimum length requirement\n Returns True if length >= 50 characters, False otherwise\n '''\n if not llm_output:\n return False\n \n return len(llm_output.strip()) >= 50" } } response = requests.post(url, headers=headers, json=data) print(response.json()) ``` ```bash cURL curl -X POST "https://api.respan.ai/api/evaluators/" \ -H "Authorization: Bearer YOUR_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "name": "Response Length Checker", "evaluator_slug": "length_checker_boolean", "type": "code", "score_value_type": "boolean", "description": "Checks if response meets minimum length requirement", "configurations": { "eval_code_snippet": "def evaluate(llm_input, llm_output, **kwargs):\n if not llm_output:\n return False\n return len(llm_output.strip()) >= 50" } }' ``` ### LLM Boolean Evaluator ```python Python data = { "name": "LLM Factual Accuracy Check", "evaluator_slug": "llm_factual_accuracy", "type": "llm", "score_value_type": "boolean", "description": "LLM-based evaluator that checks if response is factually accurate", "configurations": { "evaluator_definition": "Determine if the response is factually accurate and contains no misinformation.\n<llm_input>{{llm_input}}</llm_input>\n<llm_output>{{llm_output}}</llm_output>", "scoring_rubric": "Return True if factually accurate, False if contains errors or misinformation", "llm_engine": "gpt-4o-mini" } } response = requests.post(url, headers=headers, 
#### LLM Boolean Evaluator

```python Python
data = {
    "name": "LLM Factual Accuracy Check",
    "evaluator_slug": "llm_factual_accuracy",
    "type": "llm",
    "score_value_type": "boolean",
    "description": "LLM-based evaluator that checks if response is factually accurate",
    "configurations": {
        "evaluator_definition": "Determine if the response is factually accurate and contains no misinformation.\n<llm_input>{{llm_input}}</llm_input>\n<llm_output>{{llm_output}}</llm_output>",
        "scoring_rubric": "Return True if factually accurate, False if contains errors or misinformation",
        "llm_engine": "gpt-4o-mini"
    }
}

response = requests.post(url, headers=headers, json=data)
print(response.json())
```

```bash cURL
curl -X POST "https://api.respan.ai/api/evaluators/" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "LLM Factual Accuracy Check",
    "evaluator_slug": "llm_factual_accuracy",
    "type": "llm",
    "score_value_type": "boolean",
    "description": "LLM-based evaluator that checks if response is factually accurate",
    "configurations": {
      "evaluator_definition": "Determine if the response is factually accurate and contains no misinformation.\n<llm_input>{{llm_input}}</llm_input>\n<llm_output>{{llm_output}}</llm_output>",
      "scoring_rubric": "Return True if factually accurate, False if contains errors or misinformation",
      "llm_engine": "gpt-4o-mini"
    }
  }'
```

#### Using Pre-built Template

```python Python
data = {
    "name": "Template-based LLM Evaluator",
    "evaluator_slug": "template_llm_eval",
    "type": "llm",
    "score_value_type": "numerical",
    "eval_class": "respan_custom_llm",
    "description": "Uses pre-built LLM evaluator template",
    "configurations": {
        "evaluator_definition": "Evaluate response accuracy and helpfulness.\n<llm_input>{{llm_input}}</llm_input>\n<llm_output>{{llm_output}}</llm_output>",
        "scoring_rubric": "Score from 1-10 based on accuracy and helpfulness",
        "llm_engine": "gpt-4o",
        "min_score": 1.0,
        "max_score": 10.0
    }
}

response = requests.post(url, headers=headers, json=data)
print(response.json())
```

```bash cURL
curl -X POST "https://api.respan.ai/api/evaluators/" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Template-based LLM Evaluator",
    "evaluator_slug": "template_llm_eval",
    "type": "llm",
    "score_value_type": "numerical",
    "eval_class": "respan_custom_llm",
    "description": "Uses pre-built LLM evaluator template",
    "configurations": {
      "evaluator_definition": "Evaluate response accuracy and helpfulness.\n<llm_input>{{llm_input}}</llm_input>\n<llm_output>{{llm_output}}</llm_output>",
      "scoring_rubric": "Score from 1-10 based on accuracy and helpfulness",
      "llm_engine": "gpt-4o",
      "min_score": 1.0,
      "max_score": 10.0
    }
  }'
```

## Response

**Status: 201 Created**

```json
{
  "id": "0f4325f9-55ef-4c20-8abe-376694419947",
  "name": "Response Quality Evaluator",
  "evaluator_slug": "response_quality_v1",
  "type": "llm",
  "score_value_type": "numerical",
  "eval_class": "",
  "description": "Evaluates response quality on a 1-5 scale",
  "configurations": {
    "evaluator_definition": "Rate the response quality based on accuracy, relevance, and completeness.\n<llm_input>{{llm_input}}</llm_input>\n<llm_output>{{llm_output}}</llm_output>",
    "scoring_rubric": "1=Poor, 2=Fair, 3=Good, 4=Very Good, 5=Excellent",
    "llm_engine": "gpt-4o-mini",
    "model_options": {
      "temperature": 0.1,
      "max_tokens": 200
    },
    "min_score": 1.0,
    "max_score": 5.0,
    "passing_score": 3.0
  },
  "created_by": {
    "first_name": "Respan",
    "last_name": "Team",
    "email": "admin@respan.ai"
  },
  "updated_by": {
    "first_name": "Respan",
    "last_name": "Team",
    "email": "admin@respan.ai"
  },
  "created_at": "2025-09-11T09:43:55.858321Z",
  "updated_at": "2025-09-11T09:43:55.858331Z",
  "custom_required_fields": [],
  "categorical_choices": null,
  "starred": false,
  "tags": []
}
```

## Error Responses

### 400 Bad Request

```json
{
  "configurations": [
    "Configuration validation failed: 1 validation error for RespanCustomLLMEvaluatorType\nscoring_rubric\n  Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]"
  ]
}
```

### 401 Unauthorized

```json
{
  "detail": "Your API key is invalid or expired, please check your API key at https://platform.respan.ai/platform/api/api-keys"
}
```
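A minimal sketch for branching on these documented error shapes, assuming `response` comes from one of the `requests.post` calls above:

```python
# 400 returns a mapping of field name -> list of validation messages;
# 401 returns {"detail": "..."} as documented above.
if response.status_code == 400:
    for field, messages in response.json().items():
        print(field, messages)
elif response.status_code == 401:
    print(response.json()["detail"])
```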
