Overview
Respan’s observability is built around one core concept: different views of the same underlying data. Every LLM interaction is stored as a log — a single record containing the input, output, model, metrics, and metadata. All views present the same log data, just organized differently for different use cases:
- Logs: Plain view of individual LLM requests as they happen
- Traces: Hierarchical view of multi-step workflows and AI agent operations
- Threads: Conversational view with a linear chat interface for dialogue systems
- Scores: Evaluation view with quality assessments and performance metrics
How Respan stores your data
When you send an LLM request through Respan — whether via the LLM Gateway, the logging API, or a tracing SDK — Respan creates a log record. Every log uses a universal input / output design. The system automatically:
- Serializes your data regardless of format (messages array, text string, embeddings vector, audio metadata)
- Extracts type-specific fields (tool calls, thinking blocks, etc.)
- Calculates metrics when possible (cost, token counts, latency)
- Associates the log with traces, threads, and customers based on the identifiers you provide
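As a sketch of how these pieces fit together, the snippet below assembles a log payload carrying the identifiers described above. This is illustrative only: the field names match the ones documented on this page, but the helper function and any endpoint you would send this to are assumptions, not part of the Respan API surface.

```python
def build_log_payload(messages, completion, model, customer_id=None, thread_id=None):
    """Illustrative helper: assemble a log record for one chat interaction."""
    payload = {
        "log_type": "chat",    # chat: messages in, assistant message out
        "input": messages,     # universal input field
        "output": completion,  # universal output field
        "model": model,
    }
    if customer_id:
        payload["customer_identifier"] = customer_id  # associates the log with a customer
    if thread_id:
        payload["thread_identifier"] = thread_id      # groups the log into a thread
    return payload

payload = build_log_payload(
    messages=[{"role": "user", "content": "Hello"}],
    completion={"role": "assistant", "content": "Hi there!"},
    model="gpt-4o-mini",
    thread_id="thread-123",
)
```

Because the identifiers are optional, the same payload shape works for standalone logs, threaded conversations, and traced workflows — you only attach the identifiers relevant to your use case.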
What’s in a log
A log captures everything about a single LLM interaction:
| Category | What it stores | Key fields |
|---|---|---|
| Content | What was sent and received | input, output, model, log_type |
| Metrics | Performance and cost data | latency, cost, usage, time_to_first_token |
| Identity | Who and what | customer_identifier, metadata, thread_identifier, group_identifier |
| Tracing | Where in the workflow | trace_unique_id, span_parent_id, span_name |
| Config | LLM settings | temperature, max_tokens, tools |
| Status | Success or failure | status_code, error_message |
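Putting the table together, a single chat log might look like the record below. All values are illustrative, and fields beyond those named in the table are omitted:

```python
# Illustrative log record combining the field categories from the table above.
example_log = {
    # Content
    "log_type": "chat",
    "model": "gpt-4o-mini",
    "input": [{"role": "user", "content": "Summarize this article."}],
    "output": {"role": "assistant", "content": "Here is a summary..."},
    # Metrics
    "latency": 1.42,              # seconds
    "time_to_first_token": 0.31,  # seconds
    "cost": 0.00042,              # USD
    "usage": {"prompt_tokens": 120, "completion_tokens": 45},
    # Identity
    "customer_identifier": "user-42",
    "thread_identifier": "thread-123",
    "metadata": {"feature": "summarizer"},
    # Tracing
    "trace_unique_id": "trace-abc",
    "span_parent_id": None,       # None means this is a top-level span
    "span_name": "summarize",
    # Config
    "temperature": 0.2,
    "max_tokens": 512,
    # Status
    "status_code": 200,
    "error_message": None,
}
```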
Logs
A log represents a single LLM request. This is the foundational data that powers all other views.
Each log has a log_type that determines its input/output format. The most common is chat (messages in, assistant message out), but Respan also supports embedding, speech, transcription, workflow, agent, and more. See Log types for the full list.
Traces
Traces organize the same log data into hierarchical workflows, perfect for complex AI agent operations and multi-step processes.
Trace structure
Key trace fields
- trace_unique_id: Groups all spans in the same workflow
- span_unique_id: Individual span identifier (maps to the log ID)
- span_parent_id: Creates the hierarchical structure
- span_name: Descriptive name for the operation
- span_workflow_name: The nearest workflow this span belongs to
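These fields are enough to reconstruct the span tree client-side. A minimal sketch, assuming you have already fetched the spans of one trace as dicts:

```python
from collections import defaultdict

def build_trace_tree(spans):
    """Group spans into roots and parent -> children lists via span_parent_id."""
    children = defaultdict(list)
    roots = []
    for span in spans:
        parent = span.get("span_parent_id")
        if parent is None:
            roots.append(span)               # no parent: top of the trace
        else:
            children[parent].append(span)    # nested under its parent span
    return roots, children

spans = [
    {"span_unique_id": "a", "span_parent_id": None, "span_name": "agent_run"},
    {"span_unique_id": "b", "span_parent_id": "a", "span_name": "tool_call"},
    {"span_unique_id": "c", "span_parent_id": "a", "span_name": "llm_call"},
]
roots, children = build_trace_tree(spans)
# roots holds the single top-level span; children["a"] holds its two child spans
```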
Multi-trace grouping
Complex workflows can span multiple traces using trace_group_identifier.
Threads
Threads organize the same log data in a conversational format, ideal for chat applications and dialogue systems.
Thread structure
- Thread ID: Unique identifier for the conversation
- Messages: Ordered sequence of user and assistant messages (each message maps to log entries)
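Given these two pieces, a thread view can be rebuilt by filtering logs on thread_identifier and ordering them chronologically. A minimal sketch — note that the created_at ordering field is an assumption for illustration, not documented above:

```python
def build_thread(logs, thread_id):
    """Return this thread's logs in conversational order (oldest first)."""
    thread_logs = [log for log in logs if log.get("thread_identifier") == thread_id]
    # Assumes each log carries a sortable created_at value (e.g. a timestamp).
    return sorted(thread_logs, key=lambda log: log["created_at"])

logs = [
    {"thread_identifier": "t1", "created_at": 2,
     "output": {"role": "assistant", "content": "Sure, here you go."}},
    {"thread_identifier": "t2", "created_at": 1,
     "output": {"role": "assistant", "content": "Different conversation."}},
    {"thread_identifier": "t1", "created_at": 1,
     "output": {"role": "assistant", "content": "Hello!"}},
]
thread = build_thread(logs, "t1")
# thread contains only the two t1 logs, ordered by created_at
```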
Notice how each message in the thread references a log_id: threads are just a different presentation of the same underlying log data.
Scores
Scores organize the same log data with evaluation metrics and quality assessments, perfect for monitoring LLM performance and conducting evaluations.
Score structure
Scores are linked to logs through the log_id field and can be created by two types of evaluators:
- Platform evaluators: Use evaluator_id (a UUID from the Respan platform)
- Custom evaluators: Use evaluator_slug (your custom string identifier)
Key score fields
- id: Unique score identifier
- log_id: Links the score to its corresponding log entry
- evaluator_id: UUID of a Respan platform evaluator (optional)
- evaluator_slug: Custom evaluator identifier (optional)
- is_passed: Whether the evaluation passed the defined criteria
- cost: Cost of running the evaluation
- created_at: When the score was created
Each evaluator can only have one score per log, ensuring data integrity and preventing duplicate evaluations.
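One way to picture this constraint is an upsert keyed on the (log, evaluator) pair, so a second score from the same evaluator replaces the first instead of duplicating it. A sketch of that invariant (the in-memory dict stands in for whatever storage Respan actually uses):

```python
def upsert_score(scores, new_score):
    """Insert or replace a score, enforcing one score per evaluator per log.

    The evaluator key is evaluator_id for platform evaluators or
    evaluator_slug for custom evaluators."""
    key = (
        new_score["log_id"],
        new_score.get("evaluator_id") or new_score.get("evaluator_slug"),
    )
    scores[key] = new_score  # replaces any existing score for this pair
    return scores

scores = {}
upsert_score(scores, {"log_id": "log-1", "evaluator_slug": "safety-check",
                      "boolean_value": True})
upsert_score(scores, {"log_id": "log-1", "evaluator_slug": "safety-check",
                      "boolean_value": False})
# Only one score survives for (log-1, safety-check): the latest one
```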
Score value types
Scores support four different value types based on the evaluator’s score_value_type:
| Evaluator’s score_value_type | Use This Field | Data Type | Example |
|---|---|---|---|
| numerical | numerical_value | number | 4.5 |
| boolean | boolean_value | boolean | true |
| categorical | categorical_value | array of strings | ["excellent", "coherent"] |
| comment | string_value | string | "Good response quality" |
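The table maps directly to a lookup: given an evaluator's score_value_type, read the matching field on the score record. A small sketch (the helper name is mine, not a Respan API):

```python
# Maps each score_value_type to the score field that holds the value.
VALUE_FIELD_BY_TYPE = {
    "numerical": "numerical_value",
    "boolean": "boolean_value",
    "categorical": "categorical_value",
    "comment": "string_value",
}

def score_value(score, score_value_type):
    """Read a score's value from the field matching the evaluator's type."""
    field = VALUE_FIELD_BY_TYPE[score_value_type]
    return score[field]

score = {"log_id": "log-1", "numerical_value": 4.5}
value = score_value(score, "numerical")  # reads numerical_value
```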
Detailed descriptions
Numerical Scores
- Use case: Ratings, confidence scores, quality metrics
- Range: Defined by the evaluator’s min_score and max_score
- Example: Rating response quality from 1-5
Boolean Scores
- Use case: Pass/fail evaluations, binary classifications
- Values: true or false
- Example: Content safety check
Categorical Scores
- Use case: Multi-choice classifications
- Values: Array of predefined choices from the evaluator’s categorical_choices
- Example: ["relevant", "accurate", "helpful"]
Comment Scores
- Use case: Qualitative feedback, explanations
- Values: Free-form text
- Example: Detailed evaluation reasoning
Evaluator types
Scores are created by evaluators, which come in three types:
LLM Evaluators (type: "llm")
- AI-powered evaluation using language models
- Requires an evaluator_definition prompt
- Supports all score value types
Human Evaluators (type: "human")
- Manual evaluation by human reviewers
- Often used with categorical or comment scores
- Requires predefined choices for categorical scores
Code Evaluators (type: "code")
- Programmatic evaluation using custom code
- Requires an eval_code_snippet
- Most flexible for complex logic
Legacy fields (llm_input, llm_output) are normalized to the universal input / output fields when logs are read.
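That normalization can be pictured as a simple fallback: the legacy field fills the universal field only when the universal field is absent. A sketch of that behavior, assuming the universal fields take precedence:

```python
def normalize_log(raw):
    """Map legacy llm_input / llm_output onto the universal input / output fields.

    Assumed precedence: existing input/output values win; legacy fields
    are only used as fallbacks when the universal field is missing."""
    log = dict(raw)  # don't mutate the caller's record
    if "input" not in log and "llm_input" in log:
        log["input"] = log.pop("llm_input")
    if "output" not in log and "llm_output" in log:
        log["output"] = log.pop("llm_output")
    return log

normalized = normalize_log({"llm_input": "Hello", "llm_output": "Hi there!"})
# normalized exposes only the universal input / output fields
```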