runanything.ai

Text to speech

POST /v1/audio/speech

Generate speech from text

Synthesizes speech with Kokoro-82M; a typical sentence takes about a second. The request and response match OpenAI's /v1/audio/speech, so the official OpenAI SDKs work once the base URL and API key point at runanything.ai.

curl https://runanything.ai/v1/audio/speech \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro-82m",
    "input": "The quick brown fox jumped over the lazy dog.",
    "voice": "bm_george",
    "response_format": "wav",
    "speed": 1.1
  }' \
  --output speech.wav

Request body

Parameter | Type | Description
input (required) | string | The text to speak. 1–4,096 characters.
voice (required) | string | A Kokoro voice id (af_heart, bm_george; see the full list) or an OpenAI voice name (alloy, nova, onyx, …).
model | string | kokoro-82m (default). tts-1, tts-1-hd, and gpt-4o-mini-tts are accepted aliases for compatibility.
response_format | string | mp3 (default), wav, aac, or pcm. opus and flac aren't supported yet and return a 400.
speed | number | Playback speed, 0.25–4.0. Default 1.0.
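
The limits in the table can be checked on the client before a request is sent. The sketch below uses only the Python standard library. The helper name build_speech_request and its local validation are illustrative assumptions, not part of the API, and the server remains the authority on what it accepts (for example, it may count characters differently from Python's len).

```python
import json
import urllib.request

API_URL = "https://runanything.ai/v1/audio/speech"
SUPPORTED_FORMATS = {"mp3", "wav", "aac", "pcm"}  # opus and flac return a 400


def build_speech_request(api_key, text, voice, model="kokoro-82m",
                         response_format="mp3", speed=1.0):
    """Hypothetical helper: validate parameters locally, then build the POST."""
    # len() counts Unicode code points; the server's count is assumed to match.
    if not 1 <= len(text) <= 4096:
        raise ValueError("input must be 1-4,096 characters")
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")

    body = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    # Nothing is sent here; the caller decides when to open the connection.
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen and writing the response body to a file sends the request and saves the audio.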

Output formats

  • mp3 — 96 kbps mono. The default, and what OpenAI SDK code expects when it doesn't pass a format.
  • wav — 16-bit PCM, 24 kHz mono, standard RIFF file. Largest output, zero decode cost.
  • aac — ADTS AAC. Smallest output at comparable quality.
  • pcm — raw signed 16-bit little-endian samples, 24 kHz mono, no container. The only streaming format — see below.
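
Because pcm has no container, a player needs to be told the sample rate, width, and channel count. As a rough sketch, the standard-library wave module can wrap a captured pcm body in a RIFF header using the parameters listed above. The function names here are hypothetical, and the result is not claimed to be byte-identical to the server's own wav output.

```python
import io
import wave

SAMPLE_RATE = 24_000  # Hz, from the format list above
SAMPLE_WIDTH = 2      # bytes per sample (signed 16-bit)
CHANNELS = 1          # mono


def pcm_duration_seconds(pcm: bytes) -> float:
    """Clip length implied by the byte count: 48,000 bytes per second."""
    return len(pcm) / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw s16le mono samples in a WAV container, in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buf.getvalue()
```

The same arithmetic gives a feel for relative sizes: one second of pcm or wav audio is about 48 KB, while one second of 96 kbps mp3 is about 12 KB.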

Streaming with pcm

With response_format: "pcm" the response body streams raw audio as it's synthesized — the first bytes arrive before the full clip exists, which is what you want for assistants and anything conversational. The response includes X-Sample-Rate: 24000; samples are s16le mono.

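import pyaudio
from openai import OpenAI

# Point the official OpenAI SDK at the runanything.ai endpoint.
client = OpenAI(base_url="https://runanything.ai/v1", api_key="YOUR_API_KEY")

# Output device matching the pcm format: signed 16-bit, mono, 24 kHz.
audio = pyaudio.PyAudio()
player = audio.open(
    format=pyaudio.paInt16, channels=1, rate=24000, output=True
)

try:
    with client.audio.speech.with_streaming_response.create(
        model="kokoro-82m",
        voice="af_heart",
        input="Audio starts playing before this sentence finishes generating.",
        response_format="pcm",
    ) as response:
        # Play each chunk as it arrives instead of waiting for the full clip.
        for chunk in response.iter_bytes(chunk_size=4096):
            player.write(chunk)
finally:
    # Release the audio device even if the request fails.
    player.stop_stream()
    player.close()
    audio.terminate()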

The other formats (mp3, wav, aac) are delivered as complete files, which is fine for clips you save or play whole. If you need low time-to-first-audio, use pcm.
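
If the streamed pcm bytes are processed as samples rather than written straight to an audio device, note that, depending on the client and chunk size, a chunk can end in the middle of a 2-byte sample. The sketch below carries any leftover byte into the next chunk. The generator name iter_samples is hypothetical, and it works on any iterable of byte chunks.

```python
import array
import sys


def iter_samples(chunks):
    """Yield array('h') blocks of whole 16-bit samples from byte chunks."""
    carry = b""
    for chunk in chunks:
        data = carry + chunk
        usable = len(data) - (len(data) % 2)  # largest even prefix
        carry = data[usable:]                 # zero or one leftover byte
        if usable:
            samples = array.array("h")        # signed 16-bit integers
            samples.frombytes(data[:usable])
            if sys.byteorder == "big":
                samples.byteswap()            # the stream is little-endian
            yield samples
    # A trailing odd byte, if any, is an incomplete sample and is dropped.
```

With the streaming response from the SDK, the chunks argument would be response.iter_bytes(); each yielded block can then be measured, resampled, or forwarded without misaligned samples.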