Overview
LiveKit enables you to build real-time voice AI agents that can listen, think, and respond naturally. This example demonstrates a basic voice agent built on LiveKit's voice pipeline, which combines speech-to-text (Deepgram), a language model (OpenAI), and text-to-speech (ElevenLabs).
This example was tested in real hackathons for voice-based interview automation.
What you’ll build
A voice agent that:
- Listens to user speech in real time
- Converts speech to text using Deepgram
- Processes conversations with GPT-4
- Responds with natural voice using ElevenLabs
- Handles the complete audio pipeline automatically
Prerequisites
Before you start, you'll need API keys and accounts for the services below.
Set up LiveKit
Sign up at LiveKit Cloud and get:
- API Key
- API Secret
- WebSocket URL
Get provider API keys
You’ll need:
- OpenAI API key for the LLM
- Deepgram API key for speech-to-text
- ElevenLabs API key for text-to-speech
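A common way to keep these credentials together is an env file. The variable names below follow the convention the LiveKit plugins read from, but verify them against your plugin versions (for example, older ElevenLabs plugin releases read ELEVEN_API_KEY rather than ELEVENLABS_API_KEY); all values here are placeholders:

```shell
# .env — example credential layout (placeholder values)
export LIVEKIT_URL="wss://your-project.livekit.cloud"
export LIVEKIT_API_KEY="your-livekit-api-key"
export LIVEKIT_API_SECRET="your-livekit-api-secret"
export OPENAI_API_KEY="your-openai-key"
export DEEPGRAM_API_KEY="your-deepgram-key"
export ELEVEN_API_KEY="your-elevenlabs-key"
```

Load it with `source .env` before starting the agent.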
Complete code
Here's the full implementation of a basic voice agent:
voice-agent-basic.py
How it works
1. Entry point and room connection
The entrypoint() function is called when a user joins a LiveKit room. JobContext provides access to the room and handles connection management.
2. Voice pipeline setup
The VoicePipeline orchestrates the entire audio flow:
- STT (Speech-to-Text): Deepgram's Nova-2 model converts user speech to text in real time
- LLM (Language Model): GPT-4 Turbo processes the conversation and generates responses
- TTS (Text-to-Speech): ElevenLabs converts the LLM response back to natural voice
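The orchestration amounts to three stages chained over each user utterance. The stub below mocks every stage with a plain function purely to show the dataflow; no real providers are involved:

```python
def stt(audio: bytes) -> str:
    """Stand-in for Deepgram: audio frames in, transcript out."""
    return audio.decode("utf-8")  # pretend the audio 'is' its transcript

def llm(transcript: str) -> str:
    """Stand-in for GPT-4: transcript in, reply text out."""
    return f"You said: {transcript}"

def tts(reply: str) -> bytes:
    """Stand-in for ElevenLabs: reply text in, audio frames out."""
    return reply.encode("utf-8")

def pipeline_turn(audio_in: bytes) -> bytes:
    # One conversational turn: STT -> LLM -> TTS.
    return tts(llm(stt(audio_in)))

print(pipeline_turn(b"hello"))  # b'You said: hello'
```

In the real pipeline each stage streams, so the TTS can start speaking before the LLM has finished its reply.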
3. LLM configuration
You can customize the agent's personality and behavior through the LLM instructions. The instructions define how the agent should behave; keep responses concise for better voice UX.
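For example, the system instructions might be assembled like this. The prompt wording is illustrative; only the keep-it-concise guidance comes from this guide:

```python
def build_instructions(role: str = "helpful voice assistant") -> str:
    # Voice UX note: long answers read badly over TTS, so the prompt
    # explicitly asks the model for brevity.
    return (
        f"You are a {role}. "
        "Answer in one or two short sentences, since your replies will be "
        "spoken aloud. Avoid lists, markdown, and long explanations."
    )

print(build_instructions("screening interviewer"))
```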
4. Voice selection
ElevenLabs offers a range of voices. Choose one that matches your use case, and preview options in the ElevenLabs Voice Library.
Usage instructions
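A livekit-agents script is typically started through the framework's built-in CLI. Assuming the file name used above, a local run might look like this (the dev and start subcommands come from the livekit-agents CLI; confirm against your installed version):

```shell
# Load credentials, then run the agent worker.
# 'dev' connects to your LiveKit project with hot reload for development;
# 'start' is the production mode.
source .env
python voice-agent-basic.py dev
```

Then open a LiveKit room (for example via the LiveKit Cloud sandbox) and start talking.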
Customization examples
Use cases
Voice interviews
Automate screening interviews with natural conversation flows
Customer support
Build voice assistants that handle support queries 24/7
Virtual receptionists
Create voice agents that greet visitors and route calls
Language learning
Build conversation practice bots for language learners
Accessibility tools
Create voice interfaces for users with visual impairments
Voice surveys
Conduct engaging voice-based surveys and feedback collection
Advanced features
Event handling
You can listen to pipeline events for custom logic.
Multi-participant rooms
The agent can handle multiple users in the same room.
Recording conversations
Record voice interactions for analysis.
Troubleshooting
Agent doesn't respond to speech
- Verify all API keys are set correctly
- Check browser permissions for microphone access
- Ensure the agent started successfully (check console output)
- Try speaking louder or closer to the microphone
High latency in responses
- Use eleven_turbo_v2 instead of eleven_multilingual_v2 for TTS
- Switch to gpt-3.5-turbo for faster (but less capable) responses
- Check your network connection to LiveKit servers
- Consider using a LiveKit instance closer to your region
ModuleNotFoundError for plugins
Install the specific plugins:
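The provider plugins are published as separate PyPI packages. Assuming the providers used in this guide, the install line is likely the following (silero supplies voice activity detection, which the pipeline commonly needs; verify package names against PyPI for your livekit-agents version):

```shell
pip install livekit-agents \
  livekit-plugins-deepgram \
  livekit-plugins-openai \
  livekit-plugins-elevenlabs \
  livekit-plugins-silero
```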
Unexpected charges
- Set shorter timeout values (e.g., await asyncio.sleep(300) for 5 minutes)
- Implement usage monitoring in your code
- Use the free tiers: Deepgram (45K minutes), ElevenLabs (10K characters)
- Track API usage in respective dashboards
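One way to cap cost is to hard-limit session length. The sketch below uses asyncio.wait_for to cancel a session once a time budget is spent; run_session here is a hypothetical stand-in for the coroutine that keeps a real voice pipeline running:

```python
import asyncio

async def run_session() -> None:
    # Stand-in for a live agent session that would otherwise run forever.
    while True:
        await asyncio.sleep(0.01)

async def run_with_budget(max_seconds: float) -> str:
    """Run the session, but cancel it once the time budget is exhausted."""
    try:
        await asyncio.wait_for(run_session(), timeout=max_seconds)
        return "finished"
    except asyncio.TimeoutError:
        return "stopped at budget"

print(asyncio.run(run_with_budget(0.05)))  # stopped at budget
```

In production the budget would be minutes (e.g., 300 seconds), not milliseconds.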
Next steps
- Check out the Voice & Communications resources for more tools
- Explore the AI & ML resources to enhance your agent’s capabilities
- Review the Firecrawl RAG example to add knowledge bases to your agent