Voice Chat

Voice chat lets you speak to Khoj instead of typing. Ask questions out loud, listen to responses, and have natural spoken conversations with your AI assistant.

How Voice Chat Works

Speak Your Query

Click the microphone icon and talk naturally

Speech Recognition

Khoj transcribes your voice to text using speech-to-text AI

Review & Edit

The transcription appears, and you can edit it before sending if needed

Send & Receive

Submit the message and receive a text or voice response

Using Voice Input

Voice input is available on Web, the Desktop App, and Obsidian. On Web:

Open app.khoj.dev/chat

Click the microphone icon 🎤 in the chat input box

Allow microphone access when prompted by your browser

Speak your question or message

Click stop, or wait for recording to stop automatically

Review the transcription and click send

Try it now at app.khoj.dev: the mic icon is in the chat input box.

Voice Response

Automatic Voice Reply

When you send a voice message, Khoj automatically responds with voice:

Send Voice Message

Use the microphone to ask your question

Receive Voice Response

Khoj speaks the answer automatically

Text Also Shown

The response appears as text too for reference

Manual Voice Playback

Listen to any message, even if typed:

Find the speaker icon 🔊 next to any Khoj message

Click to hear the message read aloud

Click again to pause

Voice response is currently available on the web interface. Desktop and Obsidian support is coming soon.

Voice Chat Tips

Speak Clearly

Use a quiet environment when possible
Speak at a normal pace, neither too fast nor too slow
Enunciate clearly, especially for technical terms
Pause briefly between sentences

Structure Complex Queries

For multi-part questions, use structure:

Good:

"I have three questions. First, what's the weather today? 
Second, summarize my notes from yesterday. Third, what's 
on my calendar tomorrow?"

This helps both transcription accuracy and response quality.

Review Before Sending

Always check the transcription:

Fix any misheard words
Add punctuation if needed
Correct technical terms or names

This ensures Khoj understands exactly what you meant.

Use Voice for Long-Form

Voice shines for:

Lengthy queries or descriptions
Brainstorming sessions
When typing is inconvenient
Dictating notes or ideas

Mix Input Methods

Combine voice and text freely:

Speak your question
Type follow-up refinements
Use voice for the next topic

Use Cases

Hands-Free

While multitasking:

Cooking while asking for recipe help
Exercising while logging workouts
Driving (parked) while reviewing schedule
Cleaning while brainstorming ideas

"Hey Khoj, what ingredients do I need for the pasta recipe in my cooking notes?"

Accessibility

For easier access:

Faster than typing for some users
Helpful for visual impairments
Reduces repetitive strain
More natural communication style

"Read me the summary of the article I saved yesterday"

Brainstorming

Creative thinking:

Stream of consciousness ideation
Verbal processing of concepts
Natural conversational flow
Quick capture of thoughts

"I'm thinking about a new feature for my app. It would let users create custom dashboards with drag and drop widgets. What do you think about that idea?"

Learning

Educational conversations:

Ask questions naturally
Listen to explanations
Practice pronunciation
Verbal comprehension

"Explain quantum entanglement to me like I'm a beginner"
[Listen to response]
"Can you give me an analogy for that?"

Self-Hosting Configuration

Speech-to-Text (Voice Input)

Default (Local)

Automatically configured when you initialize Khoj.

Runs locally on your server
No API keys needed
Privacy-friendly
Works offline
Uses open-source models

The default configuration is sufficient for most users.

OpenAI Whisper API

For potentially better accuracy with cloud processing:

Get OpenAI API Key

If not already configured, get an API key from OpenAI. See chat model setup →

Create Speech-to-Text Config

Navigate to: http://localhost:42110/server/admin/database/speechtotextmodeloptions/
Click “Add Speech to Text Model Options”

Configure Model

Model name: whisper-1
Model type: Openai
AI model API: Select your configured OpenAI API

Save and Restart

Save configuration and restart Khoj server

Benefits:

Very high accuracy
Supports many languages
Better with accents and background noise

Cost: ~$0.006 per minute of audio
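
As a rough worked example of the per-minute rate above, the sketch below estimates what a given amount of audio might cost. The function name and the usage figures are hypothetical, and the rate is only the approximate figure quoted here, so check OpenAI's current pricing before relying on it.

```python
# Approximate rate quoted above; an assumption, not a guaranteed price.
WHISPER_USD_PER_MINUTE = 0.006


def estimate_whisper_cost(
    minutes_of_audio: float,
    usd_per_minute: float = WHISPER_USD_PER_MINUTE,
) -> float:
    """Return the estimated transcription cost in USD for the given audio duration."""
    if minutes_of_audio < 0:
        raise ValueError("minutes_of_audio must be non-negative")
    return round(minutes_of_audio * usd_per_minute, 4)


if __name__ == "__main__":
    # Hypothetical usage: 20 voice messages a day, 30 seconds each, for 30 days.
    monthly_minutes = 20 * 0.5 * 30
    cost = estimate_whisper_cost(monthly_minutes)
    print(f"{monthly_minutes:.0f} minutes of audio: about ${cost:.2f}")
```

At the quoted rate, 300 minutes of audio a month works out to roughly $1.80.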

Text-to-Speech (Voice Output)

Default (Local)

Included by default: uses local text-to-speech. Works immediately with no configuration.

ElevenLabs

For high-quality, natural-sounding voices:

Create ElevenLabs Account

Get API Key

Generate an API key from your ElevenLabs dashboard

Set Environment Variable

Add to your environment:

export ELEVEN_LABS_API_KEY=your_api_key_here

Or in docker-compose.yml:

environment:
  - ELEVEN_LABS_API_KEY=your_api_key_here

(Optional) Choose Voice

Browse voices at ElevenLabs Voice Library

To configure a specific voice:

Navigate to: http://localhost:42110/server/admin/database/voicemodeloption/
Click “Add Voice Model Option”
Enter the Voice ID from your chosen voice
Save

Restart Server

Restart Khoj to apply changes
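
Before restarting, it can help to confirm that the key is actually visible to the process that launches Khoj. The sketch below is a hypothetical pre-flight check and not part of Khoj. Only the variable name ELEVEN_LABS_API_KEY and the placeholder value come from the steps above. The function name and status strings are assumptions.

```python
import os
from typing import Mapping, Optional


def elevenlabs_key_status(env: Optional[Mapping[str, str]] = None) -> str:
    """Report whether ELEVEN_LABS_API_KEY looks usable in the given environment.

    This only inspects the variable locally; it does not contact ElevenLabs,
    so a result of "set" does not prove the key is valid.
    """
    env = os.environ if env is None else env
    key = env.get("ELEVEN_LABS_API_KEY", "").strip()
    if not key:
        return "missing"
    if key == "your_api_key_here":
        # The placeholder from the example above was never replaced.
        return "placeholder"
    return "set"


if __name__ == "__main__":
    print(f"ELEVEN_LABS_API_KEY is {elevenlabs_key_status()}")
```

Run it in the same shell or container that starts Khoj. A result of missing or placeholder suggests the variable was not exported there.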

Benefits:

Extremely natural voices
Multiple voice options
Emotional intonation
Multiple languages

Cost: Character-based pricing (see ElevenLabs pricing)

Language Support

Speech Recognition
Voice Output

Khoj’s voice input supports many languages:

English (all variants)
Spanish
French
German
Italian
Portuguese
Chinese (Mandarin)
Japanese
Korean
Arabic
Hindi
Russian
And many more…

OpenAI Whisper API has broader language support than local models

Voice Chat Best Practices

Good Microphone

Use a quality mic for better transcription accuracy

Quiet Environment

Reduce background noise when possible

Review Transcriptions

Always check before sending for accuracy

Use Headphones

Prevents feedback when listening to voice responses

Troubleshooting

Microphone not working

Browser (Web):

Check browser permissions for microphone access
Look for blocked mic icon in address bar
Try a different browser
Ensure no other app is using the microphone

Desktop:

Check system microphone permissions
Verify mic is selected as input device
Restart the application

General:

Test microphone in other apps
Check physical mic connection
Update audio drivers

Poor transcription accuracy

Improve quality:

Speak more slowly and clearly
Use quieter environment
Get closer to microphone
Switch to OpenAI Whisper API (self-hosted)
Use better microphone hardware

For technical terms:

Spell them out if needed
Edit transcription before sending
Add to custom vocabulary (if available)

No voice response

Check:

Voice output feature is web-only currently
Click speaker icon manually to play
Browser audio permissions granted
Volume not muted
ElevenLabs API key valid (if configured)

Self-hosted:

Check server logs for TTS errors
Verify ELEVEN_LABS_API_KEY environment variable
Ensure sufficient API credits

Voice response cuts off

Check internet connection stability
Try shorter messages
Verify browser audio isn’t interrupted
Check API rate limits (ElevenLabs)

Privacy Considerations

Voice data handling:

Local processing (default):

Voice stays on your server
No data sent to third parties
Maximum privacy

Cloud APIs (OpenAI Whisper, ElevenLabs):

Audio sent to API provider for processing
Subject to provider’s privacy policies
Typically not stored long-term
Check provider terms for details

Keyboard Shortcuts

Speed up voice chat with hotkeys:

Action	Shortcut
Activate microphone	Click mic icon (no global hotkey yet)
Stop recording	Click again or auto-stop
Play/pause voice	Click speaker icon

More keyboard shortcuts are available: see Keyboard Shortcuts →

Combining Voice with Other Features

Voice + Search

🎤 "Find my notes about project planning from last week"

Speak search queries naturally

Voice + Online Search

🎤 "Search online for the best Italian restaurants near me"

Voice triggers web research too

Voice + Image Generation

🎤 "Create an image of a sunset over mountains"

Describe images you want verbally

Voice + Code

🎤 "Write Python code to calculate fibonacci numbers"

Dictate coding tasks

Future Enhancements

Voice features coming soon:

Voice response on Desktop and Obsidian
Push-to-talk hotkey
Voice-only mode (continuous conversation)
Voice activity detection (auto-start/stop)
Custom wake word support
Voice command shortcuts

Feature requests and feedback welcome on Discord!

Next Steps

Learn Chat Commands

Master slash commands and conversation features

Keyboard Shortcuts

Navigate Khoj efficiently without the mouse

Mobile Access

Use voice chat on your phone via web app
