Speech to text

Transcribe audio to text through the Respan gateway with automatic logging.

Authentication

Authorization (Bearer)

Authenticate with your Respan API key. Enter only the key value; clients send it as Authorization: Bearer <RESPAN_API_KEY>. For /api/responses, OpenAI or Azure OpenAI provider credentials go in Settings -> Providers or in the request body's credential_override field, not in this auth field.
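
As a minimal sketch of the header described above, the following helper builds the Authorization header from a bare key value. The function name build_auth_headers is hypothetical, and rejecting a value that already carries the "Bearer " prefix is an assumption drawn from the "enter only the key value" guidance, not documented gateway behavior.

```python
from typing import Dict


def build_auth_headers(api_key: str) -> Dict[str, str]:
    """Return the Authorization header for a bare Respan API key value."""
    key = api_key.strip()
    if not key:
        raise ValueError("Respan API key is empty")
    # Assumption: only the bare key is expected, so a value that already
    # includes the scheme is treated as a caller mistake.
    if key.lower().startswith("bearer "):
        raise ValueError("pass only the key value, without the 'Bearer ' prefix")
    return {"Authorization": f"Bearer {key}"}
```

In practice the key would usually be read from an environment variable or secret store rather than hard-coded.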

Headers

X-Data-Respan-Params (string, optional)

Base64-encoded JSON object of Respan parameters. Legacy X-Data-Keywordsai-Params is still accepted.
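
The header value can be produced along these lines. This is a sketch under assumptions: the helper name encode_respan_params is hypothetical, standard (not URL-safe) Base64 is assumed because the section only says "Base64-encoded", and passing customer_identifier through the header is an illustration rather than a confirmed list of accepted keys.

```python
import base64
import json


def encode_respan_params(params: dict) -> str:
    """Serialize a dict to compact JSON and Base64-encode it for the header."""
    raw = json.dumps(params, separators=(",", ":")).encode("utf-8")
    # Assumption: the standard Base64 alphabet is what the gateway expects.
    return base64.b64encode(raw).decode("ascii")


# Illustrative parameter only; which keys the header accepts is not listed here.
headers = {
    "X-Data-Respan-Params": encode_respan_params({"customer_identifier": "user_123"})
}
```

Decoding the value with base64.b64decode and json.loads should give back the original object, which is a quick way to check the header before sending it.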

Request

This endpoint expects a multipart form containing a file.

file (file, required)

Audio file. Supported: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm.

model (enum, required)

Model ID.

language (string, optional)

Input audio language (ISO-639-1).

prompt (string, optional)

Text to guide the model's style.

response_format (enum, optional, defaults to json)

Output format.

temperature (double, optional)

Sampling temperature (0-1).

timestamp_granularities (enum, optional)

Timestamp granularities to include. Requires the verbose_json response format.

customer_credentials (object, optional)

Per-customer LLM provider credentials.

disable_log (boolean, optional, defaults to false)

When true, the request input and output are omitted from the log. Metrics are still recorded.

metadata (object, optional)

Custom key-value metadata.

customer_identifier (string, optional)

End user identifier.

thread_identifier (string, optional)

Conversation thread ID.
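
The constraints stated for the request fields can be checked on the client before anything is sent. The sketch below is one way to do that, not the gateway's own validation. The function name validate_transcription_params and the placeholder model ID are hypothetical, and checking the format by filename extension is an assumption. The endpoint URL and the multipart encoding of list and object fields are not given in this section, so the sketch stops at a validated parameter dict and leaves out metadata, customer_credentials, and the other object fields.

```python
from pathlib import Path
from typing import Any, Dict, Optional, Sequence

# Audio formats listed as supported for the file field.
SUPPORTED_EXTENSIONS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}


def validate_transcription_params(
    filename: str,
    model: str,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    response_format: str = "json",
    temperature: Optional[float] = None,
    timestamp_granularities: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Check the documented constraints and return the parameters that were set."""
    # Assumption: the file format is judged by its extension.
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {extension or 'none'}")
    if not model:
        raise ValueError("model is required")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    if timestamp_granularities and response_format != "verbose_json":
        raise ValueError("timestamp_granularities requires response_format='verbose_json'")

    params: Dict[str, Any] = {"model": model, "response_format": response_format}
    optional = {
        "language": language,
        "prompt": prompt,
        "temperature": temperature,
        "timestamp_granularities": list(timestamp_granularities) if timestamp_granularities else None,
    }
    # Keep only the optional fields the caller actually provided.
    params.update({k: v for k, v in optional.items() if v is not None})
    return params


# "your-model-id" is a placeholder; use one of the model IDs the endpoint allows.
params = validate_transcription_params(
    "meeting.mp3", model="your-model-id", language="en", temperature=0.2
)
```

Failing early on the client gives a clearer error message than a rejected upload, though the gateway remains the authority on what it accepts.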

Response

Transcription result.

text (string)

Transcribed text.

language (string)

Detected language.

duration (double)

Audio duration in seconds.

words (list of objects)

Word-level timestamps (if requested).

segments (list of objects)

Segment-level timestamps (if requested).
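
A response body with the fields above could be read into a small typed container, as in this sketch. The Transcription class and parse_transcription function are hypothetical names. The sketch assumes a JSON body in which only text is always present, and it keeps words and segments as raw dicts because their inner structure is not described here. The sample body is made up for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Transcription:
    text: str
    language: Optional[str] = None
    duration: Optional[float] = None
    words: List[Dict[str, Any]] = field(default_factory=list)
    segments: List[Dict[str, Any]] = field(default_factory=list)


def parse_transcription(body: str) -> Transcription:
    """Parse a JSON response body into a Transcription."""
    data = json.loads(body)
    return Transcription(
        text=data["text"],
        # Assumption: every field other than text may be missing.
        language=data.get("language"),
        duration=data.get("duration"),
        words=data.get("words") or [],
        segments=data.get("segments") or [],
    )


# Made-up sample body, for illustration only.
sample = '{"text": "Hello world.", "language": "en", "duration": 1.5}'
result = parse_transcription(sample)
```

With this sample, result.words and result.segments are empty lists, since timestamps were not part of the body.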

Errors

401 Unauthorized Error
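
A client might surface the 401 case explicitly, as in the sketch below. UnauthorizedError and check_status are hypothetical names. Only the 401 status is documented here, so the generic handling of other error codes is an assumption.

```python
class UnauthorizedError(Exception):
    """Raised when the gateway answers with HTTP 401."""


def check_status(status_code: int) -> None:
    """Raise for error statuses; return None otherwise."""
    if status_code == 401:
        raise UnauthorizedError(
            "401 Unauthorized: check the Respan API key sent in the Authorization header"
        )
    # Assumption: any other 4xx/5xx status is treated as a generic failure.
    if status_code >= 400:
        raise RuntimeError(f"request failed with HTTP {status_code}")
```

Calling check_status on the response status before parsing the body keeps authentication problems separate from parsing errors.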