Speech to text

Transcribe audio to text through the Respan gateway with automatic logging.

Headers

Authorization string Required

Bearer token. Use Bearer YOUR_API_KEY.
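
As a minimal sketch of the header described above, the helper below builds the `Authorization` value in the `Bearer YOUR_API_KEY` form. The function name `build_auth_headers` and the environment variable `RESPAN_API_KEY` are hypothetical names chosen for illustration; they are not part of the documented API.

```python
import os


def build_auth_headers(api_key: str) -> dict[str, str]:
    """Return the Authorization header in the 'Bearer YOUR_API_KEY' form."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {"Authorization": f"Bearer {api_key}"}


# RESPAN_API_KEY is a hypothetical variable name; fall back to the placeholder.
headers = build_auth_headers(os.environ.get("RESPAN_API_KEY", "YOUR_API_KEY"))
```

Reading the key from the environment keeps it out of source control; any other secret store would work equally well.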

Request

This endpoint expects a multipart form containing a file.
file file Required

Audio file. Supported: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm.
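
A client may want to reject unsupported files before uploading. The sketch below checks only the file extension against the list above; it does not inspect the file contents, and how the gateway itself validates uploads is not documented here. The helper name `is_supported_audio` is hypothetical.

```python
from pathlib import Path

# Extensions copied from the supported-format list above.
SUPPORTED_EXTENSIONS = {
    "flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm",
}


def is_supported_audio(filename: str) -> bool:
    """Return True if the filename's extension is in the supported list."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS
```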

model enum Required
Model ID.
Allowed values:
language string Optional

Input audio language (ISO-639-1).

prompt string Optional
Optional text to guide the model's style.
response_format enum Optional (defaults to json)
Output format.
Allowed values:
temperature double Optional

Sampling temperature (0-1).

timestamp_granularities enum Optional

Timestamp granularities. Requires verbose_json response format.

Allowed values:
customer_credentials object Optional

Per-customer LLM provider credentials.

disable_log boolean Optional (defaults to false)

When true, omits input/output from the log. Metrics still recorded.

metadata object Optional

Custom key-value metadata.

customer_identifier string Optional
End user identifier.
customer_email string Optional
Customer email address.
thread_identifier string Optional
Conversation thread ID.
request_breakdown boolean Optional
Return response metrics summary in the response body.
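
The request fields above can be assembled and encoded roughly as follows. This is a sketch under several assumptions that the reference does not confirm: object fields such as `metadata` are sent as JSON-encoded strings, booleans are sent as `"true"`, and the helper names `build_form_fields` and `encode_multipart` are hypothetical. It enforces only the constraints stated above (temperature between 0 and 1, and `timestamp_granularities` requiring `verbose_json`), and it covers a subset of the optional fields. The endpoint URL is not given in this section, so the sketch stops short of sending the request.

```python
import json
import uuid
from typing import Any, Optional


def build_form_fields(
    model: str,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    response_format: str = "json",
    temperature: Optional[float] = None,
    timestamp_granularities: Optional[str] = None,
    disable_log: bool = False,
    metadata: Optional[dict[str, Any]] = None,
    customer_identifier: Optional[str] = None,
) -> dict[str, str]:
    """Validate the documented constraints and return text form fields."""
    if not model:
        raise ValueError("model is required")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    if timestamp_granularities is not None and response_format != "verbose_json":
        raise ValueError("timestamp_granularities requires verbose_json")

    fields = {"model": model, "response_format": response_format}
    optional = {
        "language": language,
        "prompt": prompt,
        "timestamp_granularities": timestamp_granularities,
        "customer_identifier": customer_identifier,
    }
    fields.update({k: v for k, v in optional.items() if v is not None})
    if temperature is not None:
        fields["temperature"] = str(temperature)
    if disable_log:
        fields["disable_log"] = "true"  # assumption: booleans sent as text
    if metadata is not None:
        fields["metadata"] = json.dumps(metadata)  # assumption: JSON string
    return fields


def encode_multipart(
    fields: dict[str, str], filename: str, file_bytes: bytes
) -> tuple[bytes, str]:
    """Encode text fields plus one 'file' part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n".encode("utf-8")
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned body and content type can be passed to any HTTP client together with the `Authorization` header. Validating on the client side gives faster feedback, but the gateway remains the authority on what it accepts.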

Response

Transcription result.
text string
Transcribed text.
language string
Detected language.
duration double
Audio duration in seconds.
words list of objects

Word-level timestamps (if requested).

segments list of objects

Segment-level timestamps (if requested).
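
A client might read the response fields above along these lines. The sketch assumes the body is JSON (the `json` or `verbose_json` formats) and treats every field except `text` as possibly absent, since this section does not say which fields appear in which format. The shape of the objects inside `words` and `segments` is not documented here, so they are kept as plain dictionaries. The `Transcription` class and `parse_transcription` function are hypothetical names.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Transcription:
    text: str
    language: Optional[str] = None
    duration: Optional[float] = None
    # Entry structure is not documented here, so keep raw dictionaries.
    words: list[dict[str, Any]] = field(default_factory=list)
    segments: list[dict[str, Any]] = field(default_factory=list)


def parse_transcription(body: str) -> Transcription:
    """Parse a JSON response body into a Transcription."""
    data = json.loads(body)
    return Transcription(
        text=data["text"],
        language=data.get("language"),
        duration=data.get("duration"),
        words=data.get("words") or [],
        segments=data.get("segments") or [],
    )


# Made-up payload for illustration only, not real gateway output.
sample = '{"text": "hello world", "language": "en", "duration": 1.5}'
result = parse_transcription(sample)
```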

Errors

401
Unauthorized Error
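
One way to surface the documented 401 in client code is sketched below. Only the 401 status is listed in this section; treating any other non-2xx status as a generic failure is an assumption, and the names `UnauthorizedError` and `check_status` are hypothetical.

```python
class UnauthorizedError(Exception):
    """Raised when the gateway responds with 401."""


def check_status(status_code: int) -> None:
    """Raise for the documented 401, and generically for other failures."""
    if status_code == 401:
        raise UnauthorizedError(
            "401 Unauthorized: check the Authorization header (Bearer YOUR_API_KEY)"
        )
    if not 200 <= status_code < 300:
        # Assumption: other error codes are not documented in this section.
        raise RuntimeError(f"Request failed with status {status_code}")
```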