ITTSService interface. All providers deliver audio in real-time streaming formats optimized for telephony and WebRTC channels.
Supported providers
The platform includes native integrations for:
- ElevenLabs - Industry-leading voice cloning and multilingual synthesis
- Azure Speech - Microsoft’s neural TTS with 400+ voices
- Deepgram - Ultra-low latency streaming TTS
- Cartesia - Expressive conversational voices
- Google TTS - WaveNet and Neural2 voices
- FishAudio - High-quality voice synthesis
- Minimax - Advanced Chinese language support
- HumeAI - Emotionally intelligent speech
- Inworld - Character voices for gaming
- Speechify - Natural reading voices
- MurfAI - Studio-quality voiceovers
- Neuphonic - Neural voice generation
- ResembleAI - Real-time voice cloning
- Rime - Expressive speech synthesis
- Sarvam - Indic language specialist
- UpliftAI - Enterprise TTS platform
- HamsaAI - Arabic language optimization
- Zyphra Zonos - Fast multilingual synthesis
Popular provider configurations
- ElevenLabs
- Azure Speech
- Deepgram
- Cartesia
- Google Cloud
ElevenLabs Text to Speech
Provider ID: ElevenLabsTextToSpeech
Implementation: ElevenLabsTTSService.cs
Industry-leading voice cloning with support for 30+ languages and ultra-realistic prosody.
Configuration fields
| Field | Type | Required | Description |
|---|---|---|---|
| apiKey | password | Yes | ElevenLabs API key from elevenlabs.io |
| voiceId | text | Yes | Voice identifier (e.g., 21m00Tcm4TlvDq8ikWAM) |
| modelId | select | No | Model: eleven_multilingual_v2, eleven_turbo_v2_5 |
| stability | number | No | Voice consistency (0.0-1.0, default: 0.5) |
| similarityBoost | number | No | Voice clarity (0.0-1.0, default: 0.75) |
| style | number | No | Exaggeration level (0.0-1.0) |
| useSpeakerBoost | boolean | No | Enhance clarity (recommended: true) |
| speed | number | No | Playback speed (0.5-2.0) |
| pronunciationDictionaryIds | array | No | Custom pronunciation dictionaries |
| applyTextNormalization | select | No | auto, on, or off |
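As a rough illustration of how the fields in the table fit together, the following Python sketch builds a configuration and checks it against the documented ranges. The dictionary shape, the validate_elevenlabs_config helper, and the placeholder API key are assumptions made for illustration; the platform itself is implemented in C# and may store or validate these values differently.

```python
from typing import Any

# Ranges taken from the configuration table: field -> (min, max).
NUMERIC_RANGES = {
    "stability": (0.0, 1.0),
    "similarityBoost": (0.0, 1.0),
    "style": (0.0, 1.0),
    "speed": (0.5, 2.0),
}
REQUIRED_FIELDS = ("apiKey", "voiceId")
TEXT_NORMALIZATION_OPTIONS = ("auto", "on", "off")


def validate_elevenlabs_config(config: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not config.get(field):
            problems.append(f"{field} is required")
    for field, (low, high) in NUMERIC_RANGES.items():
        value = config.get(field)
        if value is not None and not (low <= value <= high):
            problems.append(f"{field} must be between {low} and {high}")
    mode = config.get("applyTextNormalization")
    if mode is not None and mode not in TEXT_NORMALIZATION_OPTIONS:
        problems.append("applyTextNormalization must be auto, on, or off")
    return problems


# Example: a low-latency configuration for voice calls.
example_config = {
    "apiKey": "YOUR_ELEVENLABS_API_KEY",  # placeholder, not a real key
    "voiceId": "21m00Tcm4TlvDq8ikWAM",
    "modelId": "eleven_turbo_v2_5",
    "stability": 0.5,
    "similarityBoost": 0.75,
    "useSpeakerBoost": True,
    "applyTextNormalization": "auto",
}
```

Calling validate_elevenlabs_config(example_config) returns an empty list; removing voiceId or setting speed to 3.0 would each add one entry describing the problem.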
Recommended settings for voice calls
Use eleven_turbo_v2_5 for real-time conversations (lowest latency) and eleven_multilingual_v2 for maximum voice quality in non-English languages.
Finding voice IDs
- Go to https://elevenlabs.io/voice-library
- Select a voice or clone your own
- Copy the voice ID from the URL or API settings
Pronunciation dictionaries
Create custom dictionaries in the ElevenLabs dashboard to handle:
- Brand names and acronyms
- Technical terminology
- Non-standard pronunciations
- Regional variations
Then add the dictionary IDs to the pronunciationDictionaryIds array.
Implementation details
Interface contract
Audio format handling
Iqra AI automatically handles format conversion:
- Provider native format - Each TTS service outputs in its preferred format
- Format detection - System identifies optimal format (PCM, μ-law, Opus, etc.)
- Automatic conversion - Converts to telephony format (8kHz μ-law) or WebRTC (16kHz Opus)
- Streaming delivery - Chunks audio for minimal latency
See TTSProviderManager.cs:1-50 for implementation.
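To make the conversion and chunking steps concrete, here is a minimal Python sketch of G.711 μ-law encoding followed by fixed-size framing. It assumes the input is already 8 kHz, 16-bit little-endian mono PCM and that frames are 20 ms long (160 bytes of μ-law), a common telephony packet size. The function names are hypothetical, and this is not the code in TTSProviderManager.cs, which may resample and frame audio differently.

```python
import struct
from typing import Iterator

BIAS = 0x84   # standard G.711 mu-law bias (132)
CLIP = 32635  # largest magnitude that still fits in 15 bits after adding the bias


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as a G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment (exponent) from the highest set bit in bits 14..8.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are stored bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def pcm16_to_ulaw(pcm: bytes) -> bytes:
    """Convert 16-bit little-endian mono PCM to mu-law (one byte per sample)."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    return bytes(linear_to_ulaw(s) for s in samples)


def frames(audio: bytes, frame_bytes: int = 160) -> Iterator[bytes]:
    """Yield fixed-size chunks; the final chunk may be shorter."""
    for start in range(0, len(audio), frame_bytes):
        yield audio[start : start + frame_bytes]
```

One second of 8 kHz PCM (16,000 bytes) encodes to 8,000 μ-law bytes, which frames() splits into fifty 160-byte chunks that can be sent as soon as each one is ready.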
Caching system
The TTSAudioCacheManager optimizes repeated phrases:
- Cache key generation - Hash of text + voice + config
- S3 storage - Persistent cache in RustFS
- TTL management - Configurable expiration
- Cache invalidation - Automatic on config changes
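The cache key step can be sketched as a hash over a canonical serialization of the three inputs. This is an assumption about the general approach, not the actual TTSAudioCacheManager logic: the function name, the choice of SHA-256, and the JSON serialization are all hypothetical.

```python
import hashlib
import json
from typing import Any


def tts_cache_key(text: str, voice_id: str, config: dict[str, Any]) -> str:
    """Derive a deterministic key from text + voice + config (hypothetical scheme)."""
    payload = json.dumps(
        {"text": text, "voice": voice_id, "config": config},
        sort_keys=True,         # key order must not change the hash
        separators=(",", ":"),  # avoid whitespace differences
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because the configuration is part of the hashed payload, changing any setting produces a different key, which is one simple way to get invalidation on config changes without tracking entries explicitly.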
Provider selection guide
- Lowest latency
- Best quality
- Multi-language
- Cost-optimized
Recommended providers for lowest latency:
- Deepgram - Sub-250ms first chunk
- ElevenLabs Turbo - ~300ms latency
- Cartesia - Optimized for streaming
Adding custom providers
To integrate a new TTS provider:
- Add enum value in IqraCore/Entities/Interfaces/InterfaceTTSProviderEnum.cs
- Implement interface in IqraInfrastructure/Managers/TTS/Providers/
- Handle audio formats using TTSProviderAvailableAudioFormat
- Return streaming data via Stream or byte[]
- Restart application for auto-registration
See ElevenLabsTTSService.cs:19-71 for reference implementation.
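The integration steps follow a familiar plugin pattern: an identifier, an implementation of a shared interface, and discovery at startup. The Python sketch below illustrates that pattern only. The real project is C#, and every name here (TTSProvider, TTSService, synthesize, EchoTTSService, the registry) is hypothetical rather than taken from the Iqra AI codebase.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Iterator


class TTSProvider(Enum):
    # Add an identifier for the new provider (ECHO is a made-up example).
    ELEVENLABS = "ElevenLabsTextToSpeech"
    ECHO = "EchoTextToSpeech"


class TTSService(ABC):
    """Stand-in for a shared TTS interface; not the real ITTSService."""

    provider: TTSProvider
    registry: dict = {}

    def __init_subclass__(cls, **kwargs):
        # Subclasses register themselves when their module is loaded,
        # mimicking discovery at application start.
        super().__init_subclass__(**kwargs)
        TTSService.registry[cls.provider] = cls

    @abstractmethod
    def synthesize(self, text: str) -> Iterator[bytes]:
        """Yield audio chunks as they become available."""


class EchoTTSService(TTSService):
    # Implement the interface for the new provider.
    provider = TTSProvider.ECHO

    def synthesize(self, text: str) -> Iterator[bytes]:
        # A real provider would call its API and stream encoded audio;
        # this toy version streams the UTF-8 text in 4-byte chunks instead.
        data = text.encode("utf-8")
        for start in range(0, len(data), 4):
            yield data[start : start + 4]
```

Looking up TTSService.registry[TTSProvider.ECHO] returns the EchoTTSService class without any explicit registration call, which is the effect the restart step achieves in the real application.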
Next steps
- Configure STT - Add speech-to-text for input processing
- Multi-language agents - Configure parallel language contexts
- Voice settings - Fine-tune voice parameters per agent
- Telephony integration - Deploy via phone providers