Speech to text | Respan Docs

curl -X POST https://api.respan.ai/api/audio/transcriptions \
     -H "Authorization: Bearer <respanApiKey>" \
     -H "Content-Type: multipart/form-data" \
     -F file=@meeting_recording.wav \
     -F model="whisper-1"

{
  "text": "Good morning everyone, let's start the weekly team meeting.",
  "language": "en",
  "duration": 12.5,
  "words": [
    {
      "word": "Good",
      "start": 0,
      "end": 0.3
    },
    {
      "word": "morning",
      "start": 0.3,
      "end": 0.8
    },
    {
      "word": "everyone,",
      "start": 0.8,
      "end": 1.3
    },
    {
      "word": "let's",
      "start": 1.3,
      "end": 1.6
    },
    {
      "word": "start",
      "start": 1.6,
      "end": 2
    },
    {
      "word": "the",
      "start": 2,
      "end": 2.2
    },
    {
      "word": "weekly",
      "start": 2.2,
      "end": 2.7
    },
    {
      "word": "team",
      "start": 2.7,
      "end": 3
    },
    {
      "word": "meeting.",
      "start": 3,
      "end": 3.5
    }
  ],
  "segments": [
    {}
  ]
}

Transcribe audio to text through the Respan gateway with automatic logging.

Authentication

Authorization (Bearer)

Authenticate with your Respan API key. Enter only the key value; clients send it as Authorization: Bearer <RESPAN_API_KEY>. For /api/responses, OpenAI or Azure OpenAI provider credentials go in Settings -> Providers or in the request body's credential_override field, not in this auth field.

Headers

X-Data-Respan-Params (string, optional)

Base64-encoded JSON object of Respan parameters. Legacy X-Data-Keywordsai-Params is still accepted.
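
A minimal sketch of how a client might build this header value, assuming standard (not URL-safe) Base64 over UTF-8 JSON. The helper name encode_respan_params and the sample value are illustrative, and which parameters the gateway accepts through the header is an assumption not confirmed here.

```python
import base64
import json


def encode_respan_params(params: dict) -> str:
    """Serialize params to compact JSON and Base64-encode the UTF-8 bytes."""
    raw = json.dumps(params, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


# Hypothetical parameter value, for illustration only.
header_value = encode_respan_params({"customer_identifier": "user_123"})
headers = {"X-Data-Respan-Params": header_value}
```

The resulting headers dict can be merged with the Authorization header before sending the request.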

Request

This endpoint expects a multipart form containing a file.

file (file, required)

Audio file. Supported: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm.
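
A client can catch unsupported uploads before sending them. The sketch below checks only the file name extension against the list above; it does not inspect the audio content, and the helper name is_supported_audio is hypothetical.

```python
from pathlib import Path

# Extensions taken from the supported formats listed for the file field.
SUPPORTED_EXTENSIONS = {
    "flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm",
}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file extension is in the supported list."""
    suffix = Path(filename).suffix.lower().lstrip(".")
    return suffix in SUPPORTED_EXTENSIONS
```

For example, is_supported_audio("meeting_recording.wav") returns True, while a name with no extension or an unlisted one returns False.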

model (enum, required)

Model ID.

Allowed values:

language (string, optional)

Input audio language (ISO-639-1).

prompt (string, optional)

Optional text to guide the model's style.

response_format (enum, optional, defaults to json)

Output format.

Allowed values:

temperature (double, optional)

Sampling temperature (0-1).

timestamp_granularities (enum, optional)

Timestamp granularities. Requires verbose_json response format.

Allowed values:
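
The constraints stated for temperature (0-1) and timestamp_granularities (requires verbose_json) can be checked on the client before the request is sent. This is a sketch under the assumption that each value is sent as a plain text form field; the helper name build_form_fields is hypothetical, and the allowed enum values are not listed here, so the function does not validate them.

```python
def build_form_fields(model, response_format="json", temperature=None,
                      timestamp_granularities=None):
    """Collect text form fields, rejecting combinations the docs rule out."""
    fields = {"model": model, "response_format": response_format}
    if temperature is not None:
        # The documented sampling temperature range is 0 to 1.
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")
        fields["temperature"] = str(temperature)
    if timestamp_granularities is not None:
        # Timestamps are documented as requiring the verbose_json format.
        if response_format != "verbose_json":
            raise ValueError(
                "timestamp_granularities requires response_format='verbose_json'"
            )
        fields["timestamp_granularities"] = timestamp_granularities
    return fields
```

Failing early with a ValueError avoids a round trip for a request the gateway would presumably reject.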

customer_credentials (object, optional)

Per-customer LLM provider credentials.

disable_log (boolean, optional, defaults to false)

When true, omits input/output from the log. Metrics are still recorded.

metadata (object, optional)

Custom key-value metadata.

customer_identifier (string, optional)

End user identifier.

thread_identifier (string, optional)

Conversation thread ID.
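
The request described above can be sketched in Python with only the standard library. The URL comes from the curl example; the helper names build_multipart and transcribe are hypothetical. Only string fields are shown, because how object fields such as metadata or customer_credentials are encoded in the form is not specified here.

```python
import json
import os
import urllib.request
import uuid

API_URL = "https://api.respan.ai/api/audio/transcriptions"  # from the curl example


def build_multipart(fields: dict, filename: str, file_bytes: bytes):
    """Encode text fields plus one file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode("utf-8")
        )
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, api_key: str, **fields) -> dict:
    """POST the audio file and return the parsed JSON response."""
    with open(path, "rb") as f:
        body, content_type = build_multipart(
            fields, os.path.basename(path), f.read()
        )
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": content_type,
        },
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx, including 401.
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call might look like transcribe("meeting_recording.wav", api_key, model="whisper-1", language="en"). The JSON parsing assumes the default json response format; other formats may need different handling.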

Response

Transcription result.

text (string)

Transcribed text.

language (string)

Detected language.

duration (double)

Audio duration in seconds.

words (list of objects)

Word-level timestamps (if requested).

segments (list of objects)

Segment-level timestamps (if requested).
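
As an illustration of consuming the words list, the sketch below formats each entry as a timing line. It assumes each entry has the word, start, and end keys shown in the sample response; the helper name format_word_timings is hypothetical, and the sample data is abbreviated from that response.

```python
def format_word_timings(transcription: dict) -> list:
    """Return one 'start-end word' line per entry in the words list."""
    lines = []
    # words is only present when word-level timestamps were requested.
    for entry in transcription.get("words", []):
        lines.append(f"{entry['start']:.2f}-{entry['end']:.2f} {entry['word']}")
    return lines


sample = {
    "text": "Good morning",
    "words": [
        {"word": "Good", "start": 0, "end": 0.3},
        {"word": "morning", "start": 0.3, "end": 0.8},
    ],
}
print("\n".join(format_word_timings(sample)))
# 0.00-0.30 Good
# 0.30-0.80 morning
```

A response without a words field yields an empty list rather than an error.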

Errors

401

Unauthorized Error