Overview
LiveKit enables you to build real-time voice AI agents that can listen, think, and respond naturally. This example demonstrates a basic voice agent built on LiveKit's voice pipeline, which combines speech-to-text (Deepgram), a language model (OpenAI), and text-to-speech (ElevenLabs).
This example was tested in real hackathons for voice-based interview automation.
What you’ll build
A voice agent that:
- Listens to user speech in real time
- Converts speech to text using Deepgram
- Processes conversations with GPT-4
- Responds with natural voice using ElevenLabs
- Handles the complete audio pipeline automatically
Prerequisites
Before you start, you'll need API keys and accounts for the services below.
Set up LiveKit
Sign up at LiveKit Cloud and get:
- API Key
- API Secret
- WebSocket URL
Get provider API keys
You’ll need:
- OpenAI API key for the LLM
- Deepgram API key for speech-to-text
- ElevenLabs API key for text-to-speech
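A common way to keep these credentials together is an env file. The variable names below follow the convention the LiveKit plugins read from, but verify them against your plugin versions (for example, older ElevenLabs plugin releases read ELEVEN_API_KEY rather than ELEVENLABS_API_KEY); all values here are placeholders:

```shell
# .env — example credential layout (placeholder values)
export LIVEKIT_URL="wss://your-project.livekit.cloud"
export LIVEKIT_API_KEY="your-livekit-api-key"
export LIVEKIT_API_SECRET="your-livekit-api-secret"
export OPENAI_API_KEY="your-openai-key"
export DEEPGRAM_API_KEY="your-deepgram-key"
export ELEVEN_API_KEY="your-elevenlabs-key"
```

Load it with `source .env` before starting the agent.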
Complete code
Here's the full implementation of a basic voice agent:
voice-agent-basic.py
How it works
1. Entry point and room connection
The entrypoint() function is called when a user joins a LiveKit room. JobContext provides access to the room and handles connection management.
2. Voice pipeline setup
The VoicePipeline orchestrates the entire audio flow:
- STT (Speech-to-Text): Deepgram's Nova-2 model converts user speech to text in real time
- LLM (Language Model): GPT-4 Turbo processes the conversation and generates responses
- TTS (Text-to-Speech): ElevenLabs converts the LLM response back to natural voice
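The orchestration amounts to three stages chained over each user utterance. The stub below mocks every stage with a plain function purely to show the dataflow; no real providers are involved:

```python
def stt(audio: bytes) -> str:
    """Stand-in for Deepgram: audio frames in, transcript out."""
    return audio.decode("utf-8")  # pretend the audio 'is' its transcript

def llm(transcript: str) -> str:
    """Stand-in for GPT-4: transcript in, reply text out."""
    return f"You said: {transcript}"

def tts(reply: str) -> bytes:
    """Stand-in for ElevenLabs: reply text in, audio frames out."""
    return reply.encode("utf-8")

def pipeline_turn(audio_in: bytes) -> bytes:
    # One conversational turn: STT -> LLM -> TTS.
    return tts(llm(stt(audio_in)))

print(pipeline_turn(b"hello"))  # b'You said: hello'
```

In the real pipeline each stage streams, so the TTS can start speaking before the LLM has finished its reply.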
3. LLM configuration
You can customize the agent's personality and behavior through the LLM instructions. The instructions define how the agent should behave; keep responses concise for better voice UX.
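For example, the system instructions might be assembled like this. The prompt wording is illustrative; only the keep-it-concise guidance comes from this guide:

```python
def build_instructions(role: str = "helpful voice assistant") -> str:
    # Voice UX note: long answers read badly over TTS, so the prompt
    # explicitly asks the model for brevity.
    return (
        f"You are a {role}. "
        "Answer in one or two short sentences, since your replies will be "
        "spoken aloud. Avoid lists, markdown, and long explanations."
    )

print(build_instructions("screening interviewer"))
```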
4. Voice selection
ElevenLabs offers a range of voices. Choose one that matches your use case, and preview options in the ElevenLabs Voice Library.
Usage instructions
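A livekit-agents script is typically started through the framework's built-in CLI. Assuming the file name used above, a local run might look like this (the dev and start subcommands come from the livekit-agents CLI; confirm against your installed version):

```shell
# Load credentials, then run the agent worker.
# 'dev' connects to your LiveKit project with hot reload for development;
# 'start' is the production mode.
source .env
python voice-agent-basic.py dev
```

Then open a LiveKit room (for example via the LiveKit Cloud sandbox) and start talking.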
Customization examples
Use cases
Voice interviews
Automate screening interviews with natural conversation flows
Customer support
Build voice assistants that handle support queries 24/7
Virtual receptionists
Create voice agents that greet visitors and route calls
Language learning
Build conversation practice bots for language learners
Accessibility tools
Create voice interfaces for users with visual impairments
Voice surveys
Conduct engaging voice-based surveys and feedback collection
Advanced features
Event handling
You can listen to pipeline events for custom logic.
Multi-participant rooms
The agent can handle multiple users in the same room.
Recording conversations
Record voice interactions for analysis.
Troubleshooting
Agent doesn't respond to speech
- Verify all API keys are set correctly
- Check browser permissions for microphone access
- Ensure the agent started successfully (check console output)
- Try speaking louder or closer to the microphone
High latency in responses
- Use eleven_turbo_v2 instead of eleven_multilingual_v2 for TTS
- Switch to gpt-3.5-turbo for faster (but less capable) responses
- Check your network connection to LiveKit servers
- Consider using a LiveKit instance closer to your region
ModuleNotFoundError for plugins
Install the specific plugins:
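The provider plugins are published as separate PyPI packages. Assuming the providers used in this guide, the install line is likely the following (silero supplies voice activity detection, which the pipeline commonly needs; verify package names against PyPI for your livekit-agents version):

```shell
pip install livekit-agents \
  livekit-plugins-deepgram \
  livekit-plugins-openai \
  livekit-plugins-elevenlabs \
  livekit-plugins-silero
```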
Unexpected charges
- Set shorter timeout values (e.g., await asyncio.sleep(300) for 5 minutes)
- Implement usage monitoring in your code
- Use the free tiers: Deepgram (45K minutes), ElevenLabs (10K characters)
- Track API usage in respective dashboards
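One way to cap cost is to hard-limit session length. The sketch below uses asyncio.wait_for to cancel a session once a time budget is spent; run_session here is a hypothetical stand-in for the coroutine that keeps a real voice pipeline running:

```python
import asyncio

async def run_session() -> None:
    # Stand-in for a live agent session that would otherwise run forever.
    while True:
        await asyncio.sleep(0.01)

async def run_with_budget(max_seconds: float) -> str:
    """Run the session, but cancel it once the time budget is exhausted."""
    try:
        await asyncio.wait_for(run_session(), timeout=max_seconds)
        return "finished"
    except asyncio.TimeoutError:
        return "stopped at budget"

print(asyncio.run(run_with_budget(0.05)))  # stopped at budget
```

In production the budget would be minutes (e.g., 300 seconds), not milliseconds.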
Next steps
- Check out the Voice & Communications resources for more tools
- Explore the AI & ML resources to enhance your agent’s capabilities
- Review the Firecrawl RAG example to add knowledge bases to your agent