Speech to text

POST /v1/audio/transcriptions

Transcribe audio to text

Transcribes audio with distil-whisper large-v3, which offers the accuracy of Whisper large-v3 at a fraction of the size and is tuned for fast, clean English transcripts. Send a multipart/form-data request in the same shape as OpenAI's transcription endpoint.

curl:

curl https://runanything.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F file=@recording.wav \
  -F model=distil-whisper-large-v3

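Python:

from openai import OpenAI

client = OpenAI(base_url="https://runanything.ai/v1", api_key="YOUR_API_KEY")

# Open the file in a context manager so the handle is closed after the upload.
with open("recording.wav", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        model="distil-whisper-large-v3",
        file=audio_file,
    )
print(result.text)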

JavaScript:

import OpenAI from "openai";
import fs from "node:fs";

const client = new OpenAI({
  baseURL: "https://runanything.ai/v1",
  apiKey: "YOUR_API_KEY",
});

const result = await client.audio.transcriptions.create({
  model: "distil-whisper-large-v3",
  file: fs.createReadStream("recording.wav"),
});
console.log(result.text);

Request body

Parameter	Type	Description
`file` (required)	file	The audio to transcribe: webm, mp4, ogg, wav, or mp3. Up to 4 MB during the beta.
`model`	string	`distil-whisper-large-v3` (default). `whisper-1`, `gpt-4o-transcribe`, and `gpt-4o-mini-transcribe` are accepted aliases.
`language`	string	ISO-639-1 hint (e.g. `en`). Defaults to English — the model is English-optimized, so non-English audio is best-effort.
`response_format`	string	`json` (default), `text`, or `verbose_json`. `srt` and `vtt` aren't supported yet.
`prompt`, `temperature`	-	Accepted for compatibility and currently ignored.
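
The rules in the table can be checked client-side before uploading. The sketch below is a minimal, hypothetical helper (`build_transcription_fields` is not part of the API or any SDK) that assembles the non-file form fields and rejects values the table lists as unsupported. The defaults mirror the table. Checking the file type by extension is an assumption, since the server may inspect the content instead.

```python
from pathlib import Path

# Values taken from the parameter table above.
SUPPORTED_EXTENSIONS = {".webm", ".mp4", ".ogg", ".wav", ".mp3"}
SUPPORTED_RESPONSE_FORMATS = {"json", "text", "verbose_json"}


def build_transcription_fields(
    filename: str,
    model: str = "distil-whisper-large-v3",
    language: str = "en",
    response_format: str = "json",
) -> dict[str, str]:
    """Hypothetical helper: validate inputs and return the non-file form fields."""
    # Assumption: the file type is judged by its extension.
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio type: {filename!r}")
    if response_format not in SUPPORTED_RESPONSE_FORMATS:
        # srt and vtt are documented as not supported yet.
        raise ValueError(f"unsupported response_format: {response_format!r}")
    return {"model": model, "language": language, "response_format": response_format}


fields = build_transcription_fields("recording.wav", response_format="verbose_json")
print(fields)
```

The returned dictionary could be sent alongside the file as the remaining multipart form fields.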

Responses

`json` (default):

{ "text": "Nah, I don't think so. Hello, how are you doing today?" }

`text` returns the bare transcript as `text/plain`. `verbose_json` adds metadata (segment-level timestamps aren't populated yet):

{
  "task": "transcribe",
  "language": "en",
  "duration": 0,
  "text": "Nah, I don't think so. Hello, how are you doing today?",
  "segments": []
}

Silence is not an error: audio with no discernible speech returns a 200 response with an empty `text` field.
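
To make the three response shapes and the silence case concrete, here is a small sketch of a hypothetical `extract_transcript` helper. It assumes you already hold the raw response body as a string and know which `response_format` you requested. It is not part of the API or the OpenAI SDK.

```python
import json


def extract_transcript(body: str, response_format: str = "json") -> str:
    """Hypothetical helper: pull the transcript out of a raw response body."""
    if response_format == "text":
        # text/plain: the body is the transcript itself.
        return body.strip()
    if response_format in ("json", "verbose_json"):
        # Both JSON shapes carry the transcript under "text".
        return json.loads(body).get("text", "").strip()
    raise ValueError(f"unsupported response_format: {response_format!r}")


body = '{ "text": "Nah, I don\'t think so. Hello, how are you doing today?" }'
transcript = extract_transcript(body)
print(transcript)

silent = extract_transcript('{ "text": "" }')
if not silent:
    # A 200 with empty text means no speech was detected, not a failure.
    print("no speech detected")
```

Treating an empty transcript as a normal result, rather than retrying, matches the behavior described above.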

4 MB file limit. Our edge rejects request bodies over ~4.5 MB, so uploads are capped at 4 MB during the beta — roughly 4 minutes of 128 kbps mp3 or 20+ minutes of Opus-compressed webm. For longer recordings, chunk the audio and transcribe the pieces.
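
The arithmetic behind those duration estimates, and a simple way to plan chunk boundaries, can be sketched as follows. This assumes 4 MB means 4,000,000 bytes (the conservative reading) and a constant bitrate, and it ignores container and multipart overhead. `max_seconds_for_bitrate` and `plan_chunks` are illustrative names, not part of the API, and actually cutting the audio would need a separate tool.

```python
MAX_UPLOAD_BYTES = 4_000_000  # assumption: 4 MB read as 4,000,000 bytes


def max_seconds_for_bitrate(bitrate_kbps: float, limit_bytes: int = MAX_UPLOAD_BYTES) -> float:
    """Longest constant-bitrate audio (in seconds) that fits under the limit."""
    return limit_bytes * 8 / (bitrate_kbps * 1000)


def plan_chunks(total_seconds: float, chunk_seconds: float) -> list[tuple[float, float]]:
    """Split [0, total_seconds) into consecutive (start, end) windows."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    total_seconds = float(total_seconds)
    chunks = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        chunks.append((start, end))
        start = end
    return chunks


mp3_limit = max_seconds_for_bitrate(128)  # 250.0 seconds
print(round(mp3_limit / 60, 1))           # 4.2 (minutes)

# Plan a 10-minute 128 kbps mp3 in 240-second pieces, just under the limit.
print(plan_chunks(600, 240))
```

At an assumed Opus bitrate of 24 kbps, the same formula gives about 22 minutes, consistent with the 20+ minute figure. Cutting at fixed offsets can split a word across two chunks, so overlapping the windows slightly or cutting at pauses may give cleaner transcripts.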