runanything.ai

Speech to text

POST /v1/audio/transcriptions
Transcribe audio to text

Transcribes audio with distil-whisper large-v3, a distilled model that approaches the accuracy of Whisper large-v3 at a fraction of the size and is tuned for fast, clean English transcripts. Send a multipart/form-data request in the same shape as OpenAI's transcription endpoint.

curl https://runanything.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F file=@recording.wav \
  -F model=distil-whisper-large-v3
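
For callers not using curl, the same request can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the helper names build_multipart and transcribe are hypothetical, and the code assumes the default json response format described below.

```python
import json
import os
import urllib.request
import uuid

# URL taken from the curl example above.
API_URL = "https://runanything.ai/v1/audio/transcriptions"


def build_multipart(fields, filename, file_bytes, content_type="application/octet-stream"):
    """Encode text fields plus one "file" part as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        ).encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, api_key, model="distil-whisper-large-v3"):
    """POST one audio file and return the transcript from the json response."""
    with open(path, "rb") as f:
        audio = f.read()
    body, header = build_multipart({"model": model}, os.path.basename(path), audio)
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": header},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["text"]
```

Calling transcribe("recording.wav", "YOUR_API_KEY") would send the same two form fields as the curl command. Error handling (for example, urllib.error.HTTPError on a non-2xx status) is left out to keep the sketch short.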

Request body

| Parameter | Type | Description |
| --- | --- | --- |
| file (required) | file | The audio to transcribe: webm, mp4, ogg, wav, or mp3. Up to 4 MB during the beta. |
| model | string | distil-whisper-large-v3 (default). whisper-1, gpt-4o-transcribe, and gpt-4o-mini-transcribe are accepted aliases. |
| language | string | ISO 639-1 hint (e.g. en). Defaults to English. The model is English-optimized, so non-English audio is best-effort. |
| response_format | string | json (default), text, or verbose_json. srt and vtt aren't supported yet. |
| prompt, temperature | | Accepted for compatibility and currently ignored. |
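
A client can catch the most common mistakes before uploading. The sketch below checks the three constraints listed in the table; check_request is a hypothetical helper, the extension test is only a proxy for the real container format, and reading "4 MB" as 4,000,000 bytes is an assumption.

```python
import os

MAX_UPLOAD_BYTES = 4 * 1000 * 1000  # assumption: "4 MB" read as 4,000,000 bytes
ALLOWED_EXTENSIONS = {".webm", ".mp4", ".ogg", ".wav", ".mp3"}
ALLOWED_FORMATS = {"json", "text", "verbose_json"}  # srt and vtt are not supported yet


def check_request(filename, size_bytes, response_format="json"):
    """Return a list of problems; an empty list means the request looks sendable."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_UPLOAD_BYTES}")
    if response_format not in ALLOWED_FORMATS:
        problems.append(f"unsupported response_format: {response_format}")
    return problems
```

For example, check_request("talk.flac", 5_000_000, "srt") reports all three problems, while check_request("recording.wav", 1_000) returns an empty list.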

Responses

json (default):

{ "text": "Nah, I don't think so. Hello, how are you doing today?" }

text returns the bare transcript as text/plain. verbose_json adds metadata (segment-level timestamps aren't populated yet):

{
  "task": "transcribe",
  "language": "en",
  "duration": 0,
  "text": "Nah, I don't think so. Hello, how are you doing today?",
  "segments": []
}

Silence is not an error: audio with no discernible speech returns 200 with an empty text field.
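
Because silence comes back as a successful response, callers may want to tell "no speech" apart from a real transcript themselves. A small sketch of that, where extract_transcript is a hypothetical helper operating on the raw body of a 200 response:

```python
import json


def extract_transcript(body, response_format="json"):
    """Return (text, has_speech) from the raw bytes of a 200 response.

    "text" responses are plain text; "json" and "verbose_json" both carry
    the transcript in a top-level "text" field.
    """
    decoded = body.decode("utf-8")
    if response_format == "text":
        text = decoded
    else:
        text = json.loads(decoded).get("text", "")
    # An empty or whitespace-only transcript is treated as "no speech found".
    return text, bool(text.strip())
```

With the json example above this returns the transcript and True; for a silent clip it returns an empty string and False rather than raising.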

4 MB file limit. Our edge rejects request bodies over ~4.5 MB, so uploads are capped at 4 MB during the beta — roughly 4 minutes of 128 kbps mp3 or 20+ minutes of Opus-compressed webm. For longer recordings, chunk the audio and transcribe the pieces.
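
The size arithmetic behind those figures, and a chunk plan built on it, can be sketched as follows. This assumes a constant bitrate, reads "4 MB" as 4,000,000 bytes, and keeps 5% headroom for container overhead; all three are assumptions, and both function names are hypothetical. The actual cutting of the audio (with a tool of your choice) is not shown.

```python
import math

MAX_UPLOAD_BYTES = 4 * 1000 * 1000  # assumption: "4 MB" read as 4,000,000 bytes


def max_seconds_per_chunk(bitrate_kbps, limit_bytes=MAX_UPLOAD_BYTES, headroom=0.95):
    """Longest whole-second duration that fits under the limit at a constant bitrate."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return int(limit_bytes * headroom / bytes_per_second)


def plan_chunks(total_seconds, bitrate_kbps):
    """Return (start, end) offsets in seconds that cover the whole recording."""
    step = max_seconds_per_chunk(bitrate_kbps)
    count = math.ceil(total_seconds / step)
    return [(i * step, min((i + 1) * step, total_seconds)) for i in range(count)]
```

At 128 kbps with no headroom the limit works out to 250 seconds, which matches the "roughly 4 minutes" figure. With the default headroom, plan_chunks(600, 128) splits a ten-minute mp3 into (0, 237), (237, 474), and (474, 600). Fixed offsets can cut through a word, so splitting at pauses may give cleaner transcripts at the joins.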