Speech to text
POST /v1/audio/transcriptions

Transcribe audio to text

Transcribes audio with distil-whisper large-v3, which offers the accuracy of Whisper large-v3 at a fraction of the size and is tuned for fast, clean English transcripts. Send a multipart/form-data request, exactly like OpenAI's endpoint.

curl https://runanything.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F file=@recording.wav \
  -F model=distil-whisper-large-v3

from openai import OpenAI
client = OpenAI(base_url="https://runanything.ai/v1", api_key="YOUR_API_KEY")
result = client.audio.transcriptions.create(
    model="distil-whisper-large-v3",
    file=open("recording.wav", "rb"),
)
print(result.text)

import OpenAI from "openai";
import fs from "node:fs";
const client = new OpenAI({
  baseURL: "https://runanything.ai/v1",
  apiKey: "YOUR_API_KEY",
});
const result = await client.audio.transcriptions.create({
  model: "distil-whisper-large-v3",
  file: fs.createReadStream("recording.wav"),
});
console.log(result.text);

Request body

| Parameter | Type | Description |
|---|---|---|
| file (required) | file | The audio to transcribe: webm, mp4, ogg, wav, or mp3. Up to 4 MB during the beta. |
| model | string | distil-whisper-large-v3 (default). whisper-1, gpt-4o-transcribe, and gpt-4o-mini-transcribe are accepted aliases. |
| language | string | ISO 639-1 hint (e.g. en). Defaults to English; the model is English-optimized, so non-English audio is best-effort. |
| response_format | string | json (default), text, or verbose_json. srt and vtt aren't supported yet. |
| prompt, temperature | - | Accepted for compatibility and currently ignored. |
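
The constraints in the table can be checked on the client before any bytes are uploaded. The following is a minimal sketch, not part of the API: `build_transcription_params` is a hypothetical helper, and it assumes "4 MB" means 4,000,000 bytes (the more conservative reading, since the docs don't say whether the limit is decimal or binary).

```python
from pathlib import Path

# Values copied from the parameter table; the helper itself is hypothetical.
ALLOWED_EXTENSIONS = {".webm", ".mp4", ".ogg", ".wav", ".mp3"}
ALLOWED_FORMATS = {"json", "text", "verbose_json"}  # srt and vtt aren't supported yet
MAX_UPLOAD_BYTES = 4 * 1000 * 1000  # assumption: decimal megabytes


def build_transcription_params(path, size_bytes, language="en", response_format="json"):
    """Validate inputs locally and return the non-file form fields to send."""
    suffix = Path(path).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported audio type: {suffix or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError(f"file is {size_bytes} bytes; the beta limit is {MAX_UPLOAD_BYTES}")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    # prompt and temperature are left out: the endpoint accepts but ignores them.
    return {
        "model": "distil-whisper-large-v3",
        "language": language,
        "response_format": response_format,
    }
```

With the OpenAI Python client shown earlier, the returned dictionary could be passed as keyword arguments alongside the file, for example `client.audio.transcriptions.create(file=open(path, "rb"), **params)`.
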

Responses

json (default):

{ "text": "Nah, I don't think so. Hello, how are you doing today?" }

text returns the bare transcript as text/plain. verbose_json adds metadata (segment-level timestamps aren't populated yet):

{
  "task": "transcribe",
  "language": "en",
  "duration": 0,
  "text": "Nah, I don't think so. Hello, how are you doing today?",
  "segments": []
}

Silence is not an error: audio with no discernible speech returns a 200 response with an empty text field.
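
Because the three formats differ only in how the transcript is wrapped, a caller that works with raw response bodies can normalize them in one place. This is a small sketch under that assumption; `extract_transcript` and `is_silent` are hypothetical names, not part of any SDK.

```python
import json


def extract_transcript(body, response_format="json"):
    """Return the transcript string from a raw response body."""
    if response_format == "text":
        # text/plain: the body is the transcript itself
        return body.strip()
    payload = json.loads(body)
    # json and verbose_json both carry the transcript under "text"
    return payload.get("text", "")


def is_silent(transcript):
    """An empty transcript means no discernible speech, not a failure."""
    return transcript.strip() == ""
```

Checking `is_silent` explicitly keeps a silent clip from being mistaken for a failed request, since both cases would otherwise surface as "no text" downstream.
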
4 MB file limit. Our edge rejects request bodies over ~4.5 MB, so uploads are capped at 4 MB during the beta — roughly 4 minutes of 128 kbps mp3 or 20+ minutes of Opus-compressed webm. For longer recordings, chunk the audio and transcribe the pieces.
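
As an illustration of the arithmetic and of chunking, the sketch below computes how much constant-bitrate audio fits under the cap and splits an uncompressed PCM WAV file into pieces that each stay within it. It uses only Python's standard `wave` module; `max_seconds` and `split_wav` are hypothetical helpers, the limit is assumed to be 4,000,000 bytes, and compressed formats such as mp3 or webm would need a different tool.

```python
import io
import wave

MAX_UPLOAD_BYTES = 4 * 1000 * 1000  # assumption: decimal megabytes
WAV_HEADER_BYTES = 44  # size of the PCM header the wave module writes


def max_seconds(bitrate_kbps, limit_bytes=MAX_UPLOAD_BYTES):
    """How many seconds of constant-bitrate audio fit under the limit."""
    return limit_bytes * 8 / (bitrate_kbps * 1000)


def split_wav(path, limit_bytes=MAX_UPLOAD_BYTES):
    """Yield WAV-encoded byte strings, each at most limit_bytes long."""
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frame_size = params.nchannels * params.sampwidth
        frames_per_chunk = (limit_bytes - WAV_HEADER_BYTES) // frame_size
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                dst.setparams(params)  # frame count in the header is fixed up on close
                dst.writeframes(frames)
            yield buf.getvalue()
```

Here `max_seconds(128)` evaluates to 250.0, a little over 4 minutes, which matches the mp3 figure above. Each chunk from `split_wav` can be uploaded as its own `file` and the resulting texts joined in order. Fixed-size cuts can land mid-word, so splitting at pauses may give cleaner transcripts.
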