This guide shows you how to create any type of LLM span in Respan using the universal input/output design that supports all span types.
Span size limit: 20MB
Each span payload has a maximum size limit of 20MB. This includes the input, output, and all other fields combined. Spans exceeding this limit will be rejected.
Respan uses universal input and output fields across all span types.
- Chat completions: Messages arrays
- Embeddings: Text strings or arrays
- Transcriptions: Audio metadata → text
- Speech: Text → audio
- Workflows/Tasks: Any custom data structure
- Agent operations: Complex nested objects
How it works:
- You provide input and output fields in any structure (string, object, array, etc.)
- Set log_type to indicate the span type ("chat", "embedding", "workflow", etc.)
- Respan automatically extracts type-specific fields for backward compatibility
- Your data is stored efficiently and retrieved with both universal and type-specific fields
For complete log_type specifications, see log types.
Legacy field support
For backward compatibility, Respan still supports legacy fields:
- prompt_messages array: Legacy field. Use input instead.
- completion_message object: Legacy field. Use output instead.
Request body
Core fields
input string | object | array: Universal input field for the span. Structure depends on log_type:
- Chat: JSON string of messages array or messages array directly
- Embedding: Text string or array of strings
- Workflow/Task: Any JSON-serializable structure
- Transcription: Audio file reference or metadata object
- Speech: Text string or TTS configuration object
See the Span Types section below for complete specifications.
Example for Chat
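A minimal sketch of a chat-type input, assuming the familiar role/content messages format (the message content is illustrative):

```python
# Chat-type "input": a messages array; a JSON string of this array also works
chat_input = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize this support ticket."},
]
```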
Example for Embedding
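An embedding-type input is a text string or an array of strings (values are illustrative):

```python
# Embedding-type "input": one string or an array of strings to embed
embedding_input = ["first document", "second document"]
```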
Example for Workflow
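A workflow-type input can be any JSON-serializable structure; the keys below are illustrative, not a required schema:

```python
# Workflow-type "input": arbitrary JSON-serializable data describing the run
workflow_input = {
    "ticket_id": "T-1042",
    "steps": ["classify", "summarize", "route"],
}
```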
output string | object | array: Universal output field for the span. Structure depends on log_type:
- Chat: JSON string of completion message or message object directly
- Embedding: Array of vector embeddings
- Workflow/Task: Any JSON-serializable result structure
- Transcription: Transcribed text string
- Speech: Audio file reference or base64 audio data
Example for Chat
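A chat-type output can be the completion message object directly (content is illustrative):

```python
# Chat-type "output": a completion message object; a JSON string also works
chat_output = {"role": "assistant", "content": "Here is a summary of the ticket."}
```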
Example for Embedding
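An embedding-type output is an array of vectors, one per input string (the numbers are illustrative):

```python
# Embedding-type "output": one embedding vector per input string
embedding_output = [
    [0.013, -0.220, 0.087],
    [0.004, 0.115, -0.031],
]
```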
log_type string: Type of span being logged. Determines how input and output are parsed.
Supported types:
- "chat" - Chat completion requests (default)
- "completion" - Legacy completion requests
- "response" - OpenAI Response API
- "embedding" - Embedding generation
- "transcription" - Speech-to-text
- "speech" - Text-to-speech
- "workflow" or "agent" - Workflow/agent execution
- "task" or "tool" - Task/tool execution
- "function" - Function call
- "generation" - Generation span
- "handoff" - Agent handoff
- "guardrail" - Safety check
- "custom" - Custom span type
Default Behavior
If not specified, defaults to "chat". For chat types, the system automatically extracts prompt_messages and completion_message from input and output for backward compatibility.
For complete specifications of each type, see log types.
model string: The model used for the inference. Optional but recommended for chat/completion/embedding types.
Example
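One way to set model alongside log_type; the model name here is illustrative:

```python
# "model" lets Respan apply the right pricing and token accounting
payload = {
    "model": "gpt-4o",  # illustrative model name
    "log_type": "chat",
}
```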
Telemetry
Performance metrics and cost tracking for monitoring LLM efficiency.
usage object: Token usage information for the request.
Properties
- prompt_tokens integer: Number of tokens in the prompt/input.
- completion_tokens integer: Number of tokens in the completion/output.
- total_tokens integer: Total tokens (prompt + completion).
- prompt_tokens_details object: Detailed breakdown of prompt tokens (e.g., cached tokens).
- cache_creation_prompt_tokens integer: Tokens used to create the cache (Anthropic models).
Example
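An illustrative usage object; note that total_tokens is the sum of prompt and completion tokens:

```python
# Token usage for one request (values are illustrative)
usage = {
    "prompt_tokens": 1200,
    "completion_tokens": 150,
    "total_tokens": 1350,  # prompt_tokens + completion_tokens
    "prompt_tokens_details": {"cached_tokens": 800},
    "cache_creation_prompt_tokens": 0,
}
```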
- cost float: Cost of the inference in US dollars. If not provided, it is calculated automatically from model pricing.
- latency float: Total request latency in seconds. Previously called generation_time; both field names are supported for backward compatibility.
- time_to_first_token float: Time to first token (TTFT) in seconds. Useful for streaming responses and voice AI applications. Previously called ttft; both field names are supported.
- tokens_per_second float: Generation speed in tokens per second.
Metadata & identifiers
Custom tracking and identification parameters for advanced analytics and filtering.
metadata object: You can add any key-value pair to this metadata field for your reference. Useful for custom analytics and filtering.
Example
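The metadata field accepts arbitrary key-value pairs; the keys below are illustrative:

```python
# Any key-value pairs, useful for later filtering and analytics
metadata = {
    "session_id": "abc-123",
    "environment": "staging",
    "experiment": "v2-prompt",
}
```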
customer_identifier string: An identifier for the customer that invoked this request. Helps with visualizing user activities. See customer identifier details.
Example
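A stable per-user ID works well here (the value is illustrative):

```python
# customer_identifier ties the span to the end user who triggered it
payload = {"customer_identifier": "user_84203"}
```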
customer_params object: Extended customer information (alternative to individual customer fields).
Properties
- customer_identifier string: Customer identifier.
- name string: Customer name.
- email string: Customer email.
Example
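A customer_params object bundling the individual customer fields (values are illustrative):

```python
# Extended customer info in one object, instead of separate fields
customer_params = {
    "customer_identifier": "user_84203",
    "name": "Ada Lovelace",
    "email": "ada@example.com",
}
```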
- thread_identifier string: A unique identifier for the conversation thread. Useful for multi-turn conversations.
- custom_identifier string: Same functionality as metadata, but indexed for faster querying.
Example
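Unlike metadata, custom_identifier is indexed, so it suits values you will query on often (the value is illustrative):

```python
# An indexed identifier for fast lookup across spans
payload = {"custom_identifier": "batch-2024-06-01"}
```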
group_identifier string: Group identifier. Use to group related spans together.
Workflow & tracing
Parameters for distributed tracing and workflow tracking.
- trace_unique_id string: Unique identifier for the trace. Used to link multiple spans together in distributed tracing.
- span_workflow_name string: Name of the workflow this span belongs to.
- span_name string: Name of this specific span/task within the workflow.
- span_parent_id string: ID of the parent span. Used to build the trace hierarchy.
Advanced parameters
tools array: A list of tools the model may call. Currently, only functions are supported as a tool.
Properties
- type string required: The type of the tool. Currently, only function is supported.
- function object required: The function definition. Properties:
  - name string required: The name of the function.
  - description string: A description of what the function does.
  - parameters object: The parameters the function accepts.
Example
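A tools array with a single function tool; the function name and parameter schema are illustrative:

```python
# One function tool; "parameters" follows the JSON Schema convention
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]
```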
tool_choice string | object: Controls which (if any) tool is called by the model. Can be "none", "auto", or an object specifying a specific tool.
Example
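tool_choice can be a string or an object that forces a specific tool (the tool name is illustrative):

```python
# Let the model decide, or force a named function
tool_choice_auto = "auto"
tool_choice_forced = {"type": "function", "function": {"name": "get_weather"}}
```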
Response configuration
response_format object: Setting to { "type": "json_schema", "json_schema": {...} } enables Structured Outputs.
Possible types
- Text: { "type": "text" } - Default response format
- JSON Schema: { "type": "json_schema", "json_schema": {...} } - Structured outputs
- JSON Object: { "type": "json_object" } - Legacy JSON format
Model configuration
- temperature number: Controls randomness in the output (0-2). Higher values produce more random responses.
- top_p number: Nucleus sampling parameter. Alternative to temperature.
- frequency_penalty number: Penalizes tokens based on their frequency in the text so far.
- presence_penalty number: Penalizes tokens based on whether they appear in the text so far.
- max_tokens integer: Maximum number of tokens to generate.
- stop array[string]: Stop sequences where generation will stop.
Error handling and status
status_code integer: The HTTP status code for the request. Default is 200 (success).
Supported status codes
All valid HTTP status codes are supported: 200, 201, 400, 401, 403, 404, 429, 500, 502, 503, 504, etc.
- error_message string: Error message if the request failed. Default is an empty string.
- warnings string | object: Any warnings that occurred during the request.
- status string: Request status. Common values: "success", "error".
Additional configuration
- stream boolean: Whether the response was streamed.
- prompt_id string: ID of the prompt template used. See Prompts documentation.
- prompt_name string: Name of the prompt template.
- is_custom_prompt boolean: Whether the prompt is a custom prompt. Set to true if using a custom prompt_id.
- timestamp string: ISO 8601 timestamp when the request completed.
Example
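A sketch of producing an ISO 8601 timestamp in Python; in practice, use the request's actual completion time:

```python
from datetime import datetime, timezone

# ISO 8601 with an explicit UTC offset, e.g. "2024-06-01T12:34:56.789012+00:00"
timestamp = datetime.now(timezone.utc).isoformat()
```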
- start_time string: ISO 8601 timestamp when the request started.
- full_request object: The full request object. Useful for logging additional configuration parameters. Tool calls and other nested objects are automatically extracted from full_request.
- full_response object: The full response object from the model provider.
Pricing configuration
prompt_unit_price number: Custom price per 1M prompt tokens. Used for self-hosted or fine-tuned models.
Example
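Custom prompt pricing, with the implied cost arithmetic (the price and token count are illustrative):

```python
# $2.50 per 1M prompt tokens; cost for 200k prompt tokens
prompt_unit_price = 2.5
prompt_cost = 200_000 / 1_000_000 * prompt_unit_price  # 0.5 USD
```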
completion_unit_price number: Custom price per 1M completion tokens. Used for self-hosted or fine-tuned models.
Example
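Custom completion pricing works the same way (values are illustrative):

```python
# $10.00 per 1M completion tokens; cost for 150 completion tokens
completion_unit_price = 10.0
completion_cost = 150 / 1_000_000 * completion_unit_price  # 0.0015 USD
```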
API controls
respan_api_controls object: Control the behavior of the Respan logging API.
Properties
block boolean: If false, the server immediately returns initialization status without waiting for log completion.
Example
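A non-blocking logging configuration: the server acknowledges immediately instead of waiting for the log to be written:

```python
# block=False: return initialization status without waiting for log completion
respan_api_controls = {"block": False}
```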
positive_feedback boolean: Whether the user liked the output. true means positive feedback.