Endpoint
POST /v1/audio/speech
Generates audio from text input using text-to-speech models.
Request
Headers
x-portkey-provider: The AI provider to use (e.g., openai)
x-portkey-api-key: Your API key for the specified provider
Body Parameters
model: The TTS model to use (e.g., tts-1, tts-1-hd)
input: The text to convert to speech. Maximum length is 4096 characters.
voice: The voice to use for speech generation. Available voices: alloy, echo, fable, onyx, nova, shimmer
response_format: The audio format. Options: mp3 (default), opus, aac, flac, wav, pcm
speed: Playback speed, from 0.25 to 4.0 (1.0 is normal speed)
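The constraints listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Portkey SDK: `validate_speech_request` and the constant names are hypothetical, the limits are the ones stated in this section, and the default speed of 1.0 is an assumption.

```python
# Hypothetical helper; the limits come from the parameter list in this section.
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def validate_speech_request(model, text, voice, response_format="mp3", speed=1.0):
    """Return a request body dict, or raise ValueError for an invalid parameter."""
    if not text:
        raise ValueError("input must not be empty")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input has {len(text)} characters; maximum is {MAX_INPUT_CHARS}")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if response_format not in FORMATS:
        raise ValueError(f"unknown response_format: {response_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

The returned dict has the same keys as the JSON body used in the curl example, so it could be serialized and sent as is.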
Response
Returns the audio file content as a binary stream.
Content-Type: The MIME type of the audio file (e.g., audio/mpeg for MP3)
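When the output format is not known in advance, the response MIME type can be used to pick a file extension. This is a rough sketch: `extension_for` is a hypothetical helper, and the mapping covers only the four MIME types that appear in this document's format example, so other formats fall back to a default.

```python
# MIME types as listed in this document; wav and pcm are deliberately omitted.
EXTENSIONS = {
    "audio/mpeg": ".mp3",
    "audio/opus": ".opus",
    "audio/aac": ".aac",
    "audio/flac": ".flac",
}


def extension_for(content_type, default=".bin"):
    """Map a Content-Type header value to a file extension."""
    # Drop any parameters after ';' and normalize case before the lookup.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return EXTENSIONS.get(media_type, default)
```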
Examples
Basic Text-to-Speech
curl http://localhost:8787/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "x-portkey-provider: openai" \
  -H "x-portkey-api-key: sk-..." \
  -d '{
    "model": "tts-1",
    "input": "Hello! This is a test of text to speech.",
    "voice": "alloy"
  }' \
  --output speech.mp3
Python SDK
from portkey_ai import Portkey

client = Portkey(
    provider="openai",
    Authorization="sk-..."
)

response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello! This is a test of text to speech."
)

# Save to file
response.stream_to_file("speech.mp3")
print("Audio saved to speech.mp3")
JavaScript SDK
import Portkey from 'portkey-ai';
import fs from 'fs';

const client = new Portkey({
  provider: 'openai',
  Authorization: 'sk-...'
});

const mp3 = await client.audio.speech.create({
  model: 'tts-1',
  voice: 'alloy',
  input: 'Hello! This is a test of text to speech.'
});

// Convert the response body to a Buffer and write it to disk
const buffer = Buffer.from(await mp3.arrayBuffer());
fs.writeFileSync('speech.mp3', buffer);
console.log('Audio saved to speech.mp3');
High Definition Audio
from portkey_ai import Portkey

client = Portkey(
    provider="openai",
    Authorization="sk-..."
)

response = client.audio.speech.create(
    model="tts-1-hd",  # High definition model
    voice="nova",
    input="The quick brown fox jumps over the lazy dog."
)

response.stream_to_file("speech_hd.mp3")
Different Voice Options
from portkey_ai import Portkey

client = Portkey(
    provider="openai",
    Authorization="sk-..."
)

voices = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]
text = "Hello, I am demonstrating different voice options."

# Generate one file per voice so they can be compared side by side
for voice in voices:
    response = client.audio.speech.create(
        model="tts-1",
        voice=voice,
        input=text
    )
    response.stream_to_file(f"speech_{voice}.mp3")
    print(f"Generated: speech_{voice}.mp3")
Different Audio Formats
from portkey_ai import Portkey

client = Portkey(
    provider="openai",
    Authorization="sk-..."
)

# Format name -> MIME type of the returned audio
formats = {
    "mp3": "audio/mpeg",
    "opus": "audio/opus",
    "aac": "audio/aac",
    "flac": "audio/flac"
}
text = "Testing different audio formats."

for format_name in formats:
    response = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=text,
        response_format=format_name
    )
    response.stream_to_file(f"speech.{format_name}")
    print(f"Generated: speech.{format_name}")
Adjust Speech Speed
from portkey_ai import Portkey

client = Portkey(
    provider="openai",
    Authorization="sk-..."
)

text = "This demonstrates different playback speeds."

# Slow (0.5x)
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=text,
    speed=0.5
)
response.stream_to_file("speech_slow.mp3")

# Normal (1.0x)
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=text,
    speed=1.0
)
response.stream_to_file("speech_normal.mp3")

# Fast (2.0x)
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=text,
    speed=2.0
)
response.stream_to_file("speech_fast.mp3")
Streaming Audio
from portkey_ai import Portkey

client = Portkey(
    provider="openai",
    Authorization="sk-..."
)

response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="This is streaming audio output."
)

# Write the audio to disk in 4096-byte chunks
with open("speech_stream.mp3", "wb") as f:
    for chunk in response.iter_bytes(chunk_size=4096):
        f.write(chunk)
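The chunked write above can be factored into a small reusable function that also reports how many bytes were written, which is handy for logging or progress output. This is only a sketch; `write_chunks` is a hypothetical helper that accepts any iterable of bytes objects, such as the result of `response.iter_bytes(chunk_size=4096)` in the example above.

```python
def write_chunks(chunks, path):
    """Write an iterable of bytes objects to `path` and return the total byte count."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            total += len(chunk)
    return total
```

Under those assumptions, a call would look like `write_chunks(response.iter_bytes(chunk_size=4096), "speech_stream.mp3")`.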
Long Text Example
from portkey_ai import Portkey

client = Portkey(
    provider="openai",
    Authorization="sk-..."
)

long_text = """
The Portkey AI Gateway is a blazing fast API gateway that routes requests
to over 250 language models. It provides a unified interface for accessing
multiple AI providers with features like fallbacks, load balancing,
and automatic retries. The gateway is designed for high performance and
reliability, making it ideal for production AI applications.
"""

response = client.audio.speech.create(
    model="tts-1-hd",
    voice="nova",
    input=long_text
)

response.stream_to_file("long_speech.mp3")
Voice Characteristics
- alloy: Neutral and balanced, good for general use
- echo: Clear and articulate, professional tone
- fable: Warm and expressive, storytelling quality
- onyx: Deep and authoritative, formal tone
- nova: Friendly and energetic, engaging quality
- shimmer: Bright and cheerful, conversational tone
Model Comparison
tts-1 (Standard)
- Lower latency
- Good quality
- Suitable for real-time applications
- More cost-effective
tts-1-hd (High Definition)
- Higher quality audio
- More natural-sounding
- Slightly higher latency
- Better for pre-recorded content
Audio Formats
- mp3: Most compatible, good compression (default)
- opus: Best for internet streaming, low latency
- aac: Good quality-to-size ratio
- flac: Lossless compression, larger files
- wav: Uncompressed, largest files
- pcm: Raw audio data
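Because pcm output is raw sample data with no header, most players cannot open it directly. A minimal sketch of wrapping it in a WAV container with Python's standard `wave` module follows. It assumes 24 kHz, 16-bit, mono samples, which is how OpenAI describes its TTS pcm output; check this against your provider before relying on it. `pcm_to_wav` is a hypothetical helper name.

```python
import wave


def pcm_to_wav(pcm_bytes, wav_path, sample_rate=24000, sample_width=2, channels=1):
    """Wrap raw PCM samples in a WAV container.

    The defaults (24 kHz, 16-bit, mono) are assumptions about the provider's
    pcm output and should be adjusted if the audio plays at the wrong pitch.
    """
    with wave.open(wav_path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
```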
Best Practices
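- Choose the Right Model: Use tts-1 for real-time applications and tts-1-hd when audio quality matters more than latency
- Text Length: Keep each request at or under 4096 characters; split longer text across multiple requests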
- Voice Selection: Test different voices for your use case
- Format Selection: Use MP3 for general use, OPUS for streaming
- Speed Adjustment: Use 0.9-1.1 for natural variations
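Text longer than the 4096-character limit has to be split before it is sent. The sketch below shows one way to do that, preferring sentence boundaries so the audio does not break mid-sentence. `split_text` is a hypothetical helper, and the sentence detection is a simple regular expression rather than a full tokenizer.

```python
import re

MAX_INPUT_CHARS = 4096  # limit stated in this document


def split_text(text, limit=MAX_INPUT_CHARS):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this sentence would exceed the limit, so start a new chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed as `input` in its own request, with the resulting files played or processed in order.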
Use Cases
- Accessibility: Convert text content to audio for visually impaired users
- Content Creation: Generate voiceovers for videos and presentations
- E-learning: Create audio versions of educational content
- Audiobooks: Convert written content to audio format
- Voice Assistants: Generate spoken responses for AI assistants
- Notifications: Create audio alerts and announcements