Voxtral TTS
Powered by Mistral AI — 32+ languages supported

Mistral text to speech in your hands.

Voxtral TTS transforms text into natural, expressive speech with voice cloning, emotion control, and broadcast-quality audio. A Mistral-powered alternative to ElevenLabs and Kokoro TTS.

  • 32+ languages
  • <200ms realtime latency
  • Voice clone from 5s of audio
  • 44.1kHz max quality

Mistral Voxtral TTS Features — What It Can Do for You

Everything you need for production-grade Mistral text to speech, in one API. A powerful alternative to ElevenLabs, Kokoro TTS, and Ollama voice generation.

Zero-Shot Voice Cloning

Clone any voice from just 5 seconds of reference audio. No training, no fine-tuning, instant replication.
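
Before uploading a reference clip, it can help to confirm it meets the 5-second minimum. The sketch below does that for WAV input using only the Python standard library. The function names are illustrative and not part of any Voxtral SDK, and it assumes the minimum applies to total clip duration.

```python
import io
import wave

# Taken from the "5 seconds of reference audio" figure above.
MIN_REFERENCE_SECONDS = 5.0


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of an in-memory WAV clip in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as clip:
        return clip.getnframes() / clip.getframerate()


def is_long_enough(wav_bytes: bytes, minimum: float = MIN_REFERENCE_SECONDS) -> bool:
    """Check a reference clip against the minimum length before uploading it."""
    return wav_duration_seconds(wav_bytes) >= minimum


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> bytes:
    """Build a silent mono 16-bit WAV, used here only to exercise the check."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as clip:
        clip.setnchannels(1)
        clip.setsampwidth(2)
        clip.setframerate(sample_rate)
        clip.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buffer.getvalue()


if __name__ == "__main__":
    print(is_long_enough(make_silent_wav(6.0)))  # a 6-second clip passes
    print(is_long_enough(make_silent_wav(3.0)))  # a 3-second clip does not
```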

Emotion Control

Choose from 7 emotions — happy, calm, sad, angry, fearful, disgusted, surprised. More expressive than Kokoro TTS or ElevenLabs.

32+ Languages — Multilingual TTS

Native support for 32+ languages, including English, Chinese, Japanese, Korean, Spanish, French, German, and Arabic. Wider language coverage than Voxtral Mini.

Voxtral Realtime — Ultra-Low Latency

Under 200ms median latency with streaming support. Voxtral realtime voice synthesis for live agents and applications.

Natural Interjections

Add (laughs), (sighs), (coughs), and 20+ human sounds that render naturally — a feature missing in Ollama and other local TTS tools.
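
Since interjections are written inline as parenthesized cues, a script can be checked for misspelled or unsupported cues before synthesis. This is a minimal sketch, assuming cues are single lowercase words in parentheses. Only the three cues named above are listed as known, because the full set of 20+ sounds is not given here.

```python
import re

# Matches parenthesized single-word cues such as "(laughs)" or "(sighs)".
INTERJECTION_PATTERN = re.compile(r"\(([a-z]+)\)")

# Only the cues named in the text above; extend this with the full supported list.
KNOWN_INTERJECTIONS = {"laughs", "sighs", "coughs"}


def find_interjections(text: str) -> list[str]:
    """Return every parenthesized cue in the order it appears."""
    return INTERJECTION_PATTERN.findall(text)


def unknown_interjections(text: str) -> list[str]:
    """Return cues that are not in the known set, so they can be reviewed."""
    return [cue for cue in find_interjections(text) if cue not in KNOWN_INTERJECTIONS]


if __name__ == "__main__":
    script = "Well (sighs) that was close (laughs)"
    print(find_interjections(script))
    print(unknown_interjections("Oh (gasps) no (coughs)"))
```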

Broadcast Quality Audio

Studio-grade output up to 44.1kHz. Ranked #1 on Artificial Analysis and Hugging Face TTS Arena, outperforming Kokoro.

Fine-Grained Control

Adjust speed (0.5x–2x), pitch (-12 to +12), volume, custom pauses. Compatible with vLLM Omni and vLLM serving pipelines.
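
The stated ranges can be enforced client-side before a request is sent. The sketch below builds a request body and rejects out-of-range values. The `speed` and `pitch` field names are assumptions made for illustration, so check the API reference for the actual parameter names.

```python
def build_tts_payload(
    text: str,
    voice: str = "casual_male",
    speed: float = 1.0,
    pitch: int = 0,
) -> dict:
    """Build a request body, validating speed and pitch against the stated ranges.

    The "speed" and "pitch" keys are hypothetical field names.
    """
    if not 0.5 <= speed <= 2.0:
        raise ValueError(f"speed must be between 0.5 and 2.0, got {speed}")
    if not -12 <= pitch <= 12:
        raise ValueError(f"pitch must be between -12 and +12, got {pitch}")
    return {"text": text, "voice": voice, "speed": speed, "pitch": pitch}


if __name__ == "__main__":
    print(build_tts_payload("Slow and low.", speed=0.75, pitch=-3))
```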

Production Ready — Mistral AI Powered

Enterprise-grade Mistral AI TTS API with high throughput. Deploy via Hugging Face, vLLM, or our managed cloud.

Mistral TTS Playground — Try Voxtral Text to Speech

Type or paste text, pick a voice or clone your own, and hear Mistral text to speech come to life. Free to use — no API key required.


Voxtral TTS Pricing — Mistral Text to Speech Plans

Start free. Scale as you grow. Up to 5x more characters per dollar than ElevenLabs, with quality rivaling Kokoro TTS and Ollama local models.

Free

$0 forever

Try Voxtral TTS with no commitment. Perfect for personal projects and evaluation.

  • 10,000 characters / month
  • 5 preset voices
  • Turbo model quality
  • MP3 output
  • Community support
  • Voice cloning
  • Emotion control
  • API access

Starter

$5/month

For indie developers and content creators getting started with TTS at scale.

$0.08 / 1K chars overage

  • 100,000 characters / month
  • All 17+ preset voices
  • Turbo model quality
  • All audio formats
  • REST API access
  • Commercial license
  • Voice cloning
  • Emotion control
Most Popular

Pro

$19/month

Full power for professionals. HD quality, voice cloning, and emotion control.

$0.06 / 1K chars overage

  • 500,000 characters / month
  • All 17+ preset voices
  • HD model quality
  • All audio formats
  • Voice cloning (5 voices)
  • Emotion control
  • Streaming API
  • Priority support

Business

$49/month

High-volume production with unlimited cloning and dedicated support.

$0.04 / 1K chars overage

  • 2,000,000 characters / month
  • All 17+ preset voices
  • HD model quality
  • Unlimited voice clones
  • Emotion control
  • Streaming + Batch API
  • Webhook callbacks
  • Dedicated support
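
To see which paid plan is cheapest for a given volume, the monthly fee and overage rate can be combined into one estimate. This is a rough sketch using the prices listed above. It assumes overage is billed pro rata per character rather than rounded up to whole 1K blocks, which the pricing table does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    monthly_fee: float      # USD per month
    included_chars: int     # characters included each month
    overage_per_1k: float   # USD per 1,000 characters beyond the allowance


# Figures copied from the Starter, Pro, and Business plans above.
PLANS = [
    Plan("Starter", 5.0, 100_000, 0.08),
    Plan("Pro", 19.0, 500_000, 0.06),
    Plan("Business", 49.0, 2_000_000, 0.04),
]


def monthly_cost(plan: Plan, chars_used: int) -> float:
    """Estimate the monthly bill, assuming pro-rata overage billing."""
    extra_chars = max(0, chars_used - plan.included_chars)
    return round(plan.monthly_fee + extra_chars / 1000 * plan.overage_per_1k, 2)


def cheapest_plan(chars_used: int) -> Plan:
    """Pick the plan with the lowest estimated bill for this volume."""
    return min(PLANS, key=lambda plan: monthly_cost(plan, chars_used))


if __name__ == "__main__":
    for volume in (250_000, 300_000, 1_500_000):
        best = cheapest_plan(volume)
        print(volume, best.name, monthly_cost(best, volume))
```

Under these assumptions, Starter with overage stays cheaper than Pro up to 275,000 characters a month, where both cost $19.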

How we compare

  • ElevenLabs Starter: $5/mo, 30K chars
  • Voxtral Starter: $5/mo, 100K chars
  • ElevenLabs Pro: $99/mo, 500K chars
  • Voxtral Pro: $19/mo, 500K chars

Same price, more characters. Switch to Voxtral TTS and save up to 80% vs ElevenLabs.
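
The savings figure can be reproduced from the comparison numbers above. The short sketch below computes the percentage saved and the characters-per-dollar ratio. The helper names are illustrative, and the ElevenLabs figures are taken from this page as listed.

```python
def savings_percent(our_price: float, their_price: float) -> float:
    """Percentage saved by paying our_price instead of their_price."""
    return (their_price - our_price) / their_price * 100


def chars_per_dollar(chars: int, price: float) -> float:
    """Characters included per dollar of monthly fee."""
    return chars / price


if __name__ == "__main__":
    # Pro tiers: $19 vs $99 for the same 500K characters.
    print(round(savings_percent(19, 99), 1))
    # Starter tiers: 100K vs 30K characters at the same $5.
    print(round(chars_per_dollar(100_000, 5) / chars_per_dollar(30_000, 5), 1))
    # Pro tiers: characters per dollar, Voxtral relative to ElevenLabs.
    print(round(chars_per_dollar(500_000, 19) / chars_per_dollar(500_000, 99), 1))
```

With these inputs, the Pro comparison works out to about 80.8% savings and roughly 5.2x the characters per dollar, while the Starter comparison gives about 3.3x the characters at the same price.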

Need more than 2M characters/month or custom deployment?

Contact us for Enterprise pricing →

Mistral Voxtral TTS API — Quick Start in Minutes

A few lines of code to generate your first speech with the Voxtral TTS API. Works with vLLM, vLLM Omni, or our hosted endpoint.

generate_speech.py
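import base64

import requests

# Send the text and voice settings to the hosted endpoint.
response = requests.post(
    "https://voxtralttsai.com/api/tts",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "text": "Hello! Welcome to Voxtral TTS.",
        "voice": "casual_male",
        "emotion": "happy",
        "format": "mp3",
    },
    timeout=30,
)
response.raise_for_status()  # stop on HTTP errors instead of decoding an error body

# The response carries the audio as a base64 string under "audio".
audio = base64.b64decode(response.json()["audio"])
with open("output.mp3", "wb") as f:
    f.write(audio)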

REST API

Simple HTTP endpoints with JSON payloads. Compatible with Mistral AI ecosystem.

Voxtral Realtime Streaming

WebSocket & SSE for real-time audio delivery. Under 200ms latency.
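
For the SSE option, a client reads `data:` lines as they arrive and decodes each one into audio bytes. The event format is not documented on this page, so the sketch below is an assumption throughout: it supposes each `data:` line carries one base64-encoded audio chunk and that a `[DONE]` marker ends the stream. Treat both as hypothetical until checked against the API reference.

```python
import base64
from typing import Iterable, Iterator


def iter_audio_chunks(lines: Iterable[str]) -> Iterator[bytes]:
    """Yield decoded audio chunks from SSE lines.

    Assumes (hypothetically) base64 audio in each "data:" line and a
    "[DONE]" payload as the end-of-stream marker.
    """
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield base64.b64decode(payload)


if __name__ == "__main__":
    # Simulated stream; with requests, the lines could come from
    # response.iter_lines(decode_unicode=True) on a stream=True request.
    fake_stream = [
        "data: " + base64.b64encode(b"first").decode(),
        "",
        "data: " + base64.b64encode(b"second").decode(),
        "data: [DONE]",
    ]
    print(b"".join(iter_audio_chunks(fake_stream)))
```

Writing the parser against a plain iterable of lines keeps it independent of the transport, so the same function can be exercised offline with a simulated stream.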

SDKs & vLLM Omni

Python, TypeScript, and cURL examples. Deploy with vLLM or Hugging Face.

Deploy Voxtral TTS with Hugging Face & vLLM

Self-host Mistral Voxtral TTS on your own infrastructure using Hugging Face open weights and vLLM Omni. The model runs on a single GPU with 16GB+ VRAM — no Ollama required. Alternatively, use our managed API for zero-setup deployment, or compare with Kokoro TTS and ElevenLabs on the playground above.

The next chapter of Mistral voice AI is yours.

Start building with Voxtral TTS today. Free tier available with full Mistral text to speech capabilities — no credit card required. Outperforms Kokoro TTS, Ollama, and Gemini 3.1 Flash Live in voice quality benchmarks.