This guide shows you how to create any type of LLM span in Respan using the universal input/output design that supports all span types.
Span size limit: 20MB
Each span payload has a maximum size limit of 20MB. This includes the input, output, and all other fields combined. Spans exceeding this limit will be rejected.
Respan uses universal input and output fields across all span types.
- Chat completions: Messages arrays
- Embeddings: Text strings or arrays
- Transcriptions: Audio metadata → text
- Speech: Text → audio
- Workflows/Tasks: Any custom data structure
- Agent operations: Complex nested objects
How it works:
- You provide input and output fields in any structure (string, object, array, etc.)
- Set log_type to indicate the span type ("chat", "embedding", "workflow", etc.)
- Respan automatically extracts type-specific fields for backward compatibility
- Your data is stored efficiently and retrieved with both universal and type-specific fields
For complete log_type specifications, see log types.
Legacy field support
For backward compatibility, Respan still supports legacy fields:
- prompt_messages array: Legacy field. Use input instead.
- completion_message object: Legacy field. Use output instead.
Request body
Core fields
input string | object | array: Universal input field for the span. Structure depends on log_type:
- Chat: JSON string of messages array or messages array directly
- Embedding: Text string or array of strings
- Workflow/Task: Any JSON-serializable structure
- Transcription: Audio file reference or metadata object
- Speech: Text string or TTS configuration object
See the Span Types section below for complete specifications.
Example for Chat
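A minimal sketch of a chat-type input, assuming the familiar role/content messages format (the message content is illustrative):

```python
# Chat-type "input": a messages array; a JSON string of this array also works
chat_input = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize this support ticket."},
]
```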
Example for Embedding
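An embedding-type input is a text string or an array of strings (values are illustrative):

```python
# Embedding-type "input": one string or an array of strings to embed
embedding_input = ["first document", "second document"]
```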
Example for Workflow
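A workflow-type input can be any JSON-serializable structure; the keys below are illustrative, not a required schema:

```python
# Workflow-type "input": arbitrary JSON-serializable data describing the run
workflow_input = {
    "ticket_id": "T-1042",
    "steps": ["classify", "summarize", "route"],
}
```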
output string | object | array: Universal output field for the span. Structure depends on log_type:
- Chat: JSON string of completion message or message object directly
- Embedding: Array of vector embeddings
- Workflow/Task: Any JSON-serializable result structure
- Transcription: Transcribed text string
- Speech: Audio file reference or base64 audio data
Example for Chat
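A chat-type output can be the completion message object directly (content is illustrative):

```python
# Chat-type "output": a completion message object; a JSON string also works
chat_output = {"role": "assistant", "content": "Here is a summary of the ticket."}
```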
Example for Embedding
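An embedding-type output is an array of vectors, one per input string (the numbers are illustrative):

```python
# Embedding-type "output": one embedding vector per input string
embedding_output = [
    [0.013, -0.220, 0.087],
    [0.004, 0.115, -0.031],
]
```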
log_type string: Type of span being logged. Determines how input and output are parsed.
Supported types:
- "chat" - Chat completion requests (default)
- "completion" - Legacy completion requests
- "response" - OpenAI Response API
- "embedding" - Embedding generation
- "transcription" - Speech-to-text
- "speech" - Text-to-speech
- "workflow" or "agent" - Workflow/agent execution
- "task" or "tool" - Task/tool execution
- "function" - Function call
- "generation" - Generation span
- "handoff" - Agent handoff
- "guardrail" - Safety check
- "custom" - Custom span type
Default Behavior
If not specified, defaults to "chat". For chat types, the system automatically extracts prompt_messages and completion_message from input and output for backward compatibility.
For complete specifications of each type, see log types.
model string: The model used for the inference. Optional but recommended for chat/completion/embedding types.
Example
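One way to set model alongside log_type; the model name here is illustrative:

```python
# "model" lets Respan apply the right pricing and token accounting
payload = {
    "model": "gpt-4o",  # illustrative model name
    "log_type": "chat",
}
```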
Telemetry
Performance metrics and cost tracking for monitoring LLM efficiency.
usage object: Token usage information for the request.
Properties
- prompt_tokens integer: Number of tokens in the prompt/input.
- completion_tokens integer: Number of tokens in the completion/output.
- total_tokens integer: Total tokens (prompt + completion).
- prompt_tokens_details object: Detailed breakdown of prompt tokens (e.g., cached tokens).
- cache_creation_prompt_tokens integer: Tokens used to create the cache (Anthropic models).
Example
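An illustrative usage object; note that total_tokens is the sum of prompt and completion tokens:

```python
# Token usage for one request (values are illustrative)
usage = {
    "prompt_tokens": 1200,
    "completion_tokens": 150,
    "total_tokens": 1350,  # prompt_tokens + completion_tokens
    "prompt_tokens_details": {"cached_tokens": 800},
    "cache_creation_prompt_tokens": 0,
}
```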
- cost float: Cost of the inference in US dollars. If not provided, it is calculated automatically from model pricing.
- latency float: Total request latency in seconds. Previously called generation_time; both field names are supported for backward compatibility.
- time_to_first_token float: Time to first token (TTFT) in seconds. Useful for streaming responses and voice AI applications. Previously called ttft; both field names are supported.
- tokens_per_second float: Generation speed in tokens per second.
Metadata & identifiers
Custom tracking and identification parameters for advanced analytics and filtering.
metadata object: You can add any key-value pair to this metadata field for your reference. Useful for custom analytics and filtering.
Example
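The metadata field accepts arbitrary key-value pairs; the keys below are illustrative:

```python
# Any key-value pairs, useful for later filtering and analytics
metadata = {
    "session_id": "abc-123",
    "environment": "staging",
    "experiment": "v2-prompt",
}
```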
customer_identifier string: An identifier for the customer that invoked this request. Helps with visualizing user activities. See customer identifier details.
Example
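A stable per-user ID works well here (the value is illustrative):

```python
# customer_identifier ties the span to the end user who triggered it
payload = {"customer_identifier": "user_84203"}
```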
customer_params object: Extended customer information (alternative to individual customer fields).
Properties
- customer_identifier string: Customer identifier.
- name string: Customer name.
- email string: Customer email.
Example
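A customer_params object bundling the individual customer fields (values are illustrative):

```python
# Extended customer info in one object, instead of separate fields
customer_params = {
    "customer_identifier": "user_84203",
    "name": "Ada Lovelace",
    "email": "ada@example.com",
}
```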
- thread_identifier string: A unique identifier for the conversation thread. Useful for multi-turn conversations.
- custom_identifier string: Same functionality as metadata, but indexed for faster querying.
Example
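Unlike metadata, custom_identifier is indexed, so it suits values you will query on often (the value is illustrative):

```python
# An indexed identifier for fast lookup across spans
payload = {"custom_identifier": "batch-2024-06-01"}
```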
group_identifier string: Group identifier. Use to group related spans together.
Workflow & tracing
Parameters for distributed tracing and workflow tracking.
- trace_unique_id string: Unique identifier for the trace. Used to link multiple spans together in distributed tracing.
- span_workflow_name string: Name of the workflow this span belongs to.
- span_name string: Name of this specific span/task within the workflow.
- span_parent_id string: ID of the parent span. Used to build the trace hierarchy.
Advanced parameters
tools array: A list of tools the model may call. Currently, only functions are supported as a tool.
Properties
- type string required: The type of the tool. Currently, only function is supported.
- function object required: The function definition. Properties:
  - name string required: The name of the function.
  - description string: A description of what the function does.
  - parameters object: The parameters the function accepts.
Example
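A tools array with a single function tool; the function name and parameter schema are illustrative:

```python
# One function tool; "parameters" follows the JSON Schema convention
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]
```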
tool_choice string | object: Controls which (if any) tool is called by the model. Can be "none", "auto", or an object specifying a specific tool.
Example
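tool_choice can be a string or an object that forces a specific tool (the tool name is illustrative):

```python
# Let the model decide, or force a named function
tool_choice_auto = "auto"
tool_choice_forced = {"type": "function", "function": {"name": "get_weather"}}
```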
Response configuration
response_format object: Setting to { "type": "json_schema", "json_schema": {...} } enables Structured Outputs.
Possible types
- Text: { "type": "text" } - Default response format
- JSON Schema: { "type": "json_schema", "json_schema": {...} } - Structured outputs
- JSON Object: { "type": "json_object" } - Legacy JSON format
Model configuration
- temperature number: Controls randomness in the output (0-2). Higher values produce more random responses.
- top_p number: Nucleus sampling parameter. Alternative to temperature.
- frequency_penalty number: Penalizes tokens based on their frequency in the text so far.
- presence_penalty number: Penalizes tokens based on whether they appear in the text so far.
- max_tokens integer: Maximum number of tokens to generate.
- stop array[string]: Stop sequences where generation will stop.
Error handling and status
status_code integer: The HTTP status code for the request. Default is 200 (success).
Supported status codes
All valid HTTP status codes are supported: 200, 201, 400, 401, 403, 404, 429, 500, 502, 503, 504, etc.
- error_message string: Error message if the request failed. Default is an empty string.
- warnings string | object: Any warnings that occurred during the request.
- status string: Request status. Common values: "success", "error".
Additional configuration
- stream boolean: Whether the response was streamed.
- prompt_id string: ID of the prompt template used. See Prompts documentation.
- prompt_name string: Name of the prompt template.
- is_custom_prompt boolean: Whether the prompt is a custom prompt. Set to true if using a custom prompt_id.
- timestamp string: ISO 8601 timestamp when the request completed.
Example
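A sketch of producing an ISO 8601 timestamp in Python; in practice, use the request's actual completion time:

```python
from datetime import datetime, timezone

# ISO 8601 with an explicit UTC offset, e.g. "2024-06-01T12:34:56.789012+00:00"
timestamp = datetime.now(timezone.utc).isoformat()
```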
- start_time string: ISO 8601 timestamp when the request started.
- full_request object: The full request object. Useful for logging additional configuration parameters. Tool calls and other nested objects are automatically extracted from full_request.
- full_response object: The full response object from the model provider.
Pricing configuration
prompt_unit_price number: Custom price per 1M prompt tokens. Used for self-hosted or fine-tuned models.
Example
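Custom prompt pricing, with the implied cost arithmetic (the price and token count are illustrative):

```python
# $2.50 per 1M prompt tokens; cost for 200k prompt tokens
prompt_unit_price = 2.5
prompt_cost = 200_000 / 1_000_000 * prompt_unit_price  # 0.5 USD
```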
completion_unit_price number: Custom price per 1M completion tokens. Used for self-hosted or fine-tuned models.
Example
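Custom completion pricing works the same way (values are illustrative):

```python
# $10.00 per 1M completion tokens; cost for 150 completion tokens
completion_unit_price = 10.0
completion_cost = 150 / 1_000_000 * completion_unit_price  # 0.0015 USD
```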
API controls
respan_api_controls object: Control the behavior of the Respan logging API.
Properties
block boolean: If false, the server immediately returns initialization status without waiting for log completion.
Example
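A non-blocking logging configuration: the server acknowledges immediately instead of waiting for the log to be written:

```python
# block=False: return initialization status without waiting for log completion
respan_api_controls = {"block": False}
```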
positive_feedback boolean: Whether the user liked the output. true means positive feedback.