Voice chat lets you speak to Khoj instead of typing. Ask questions out loud, listen to responses, and have natural spoken conversations with your AI assistant.

How Voice Chat Works

1. Speak Your Query: Click the microphone icon and talk naturally.
2. Speech Recognition: Khoj transcribes your voice to text using speech-to-text AI.
3. Review & Edit: The transcription appears; you can edit it before sending if needed.
4. Send & Receive: Submit the message and receive a text or voice response.
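
The four steps above can be pictured as a small pipeline. The following is a minimal sketch only, not Khoj's actual implementation: `run_voice_turn`, `VoiceTurn`, and `fake_transcribe` are hypothetical names introduced for illustration, and the transcriber is a stand-in for whatever speech-to-text model is configured.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoiceTurn:
    """One voice interaction: the raw transcript and the text finally sent."""
    transcript: str
    sent_text: str


def run_voice_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    edit: Optional[Callable[[str], str]] = None,
) -> VoiceTurn:
    """Model steps 2-4: transcribe, optionally edit, then produce the text to send."""
    transcript = transcribe(audio)                        # step 2: speech recognition
    reviewed = edit(transcript) if edit else transcript   # step 3: review and edit
    return VoiceTurn(transcript=transcript, sent_text=reviewed.strip())  # step 4: send


# Hypothetical stand-in for a speech-to-text model (an assumption, not a Khoj API).
def fake_transcribe(audio: bytes) -> str:
    return "summarize my notes from yesterday "


turn = run_voice_turn(b"\x00\x01", fake_transcribe, edit=lambda t: t.capitalize())
print(turn.sent_text)
```

Keeping the raw transcript separate from the sent text mirrors the review step: what the recognizer heard and what you actually send can differ.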

Using Voice Input

1. Click the microphone icon 🎤 in the chat input box
2. Allow microphone access when prompted by your browser
3. Speak your question or message
4. Click stop or wait for automatic detection
5. Review the transcription and click send
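
To give a feel for what "automatic detection" of the end of speech can involve, here is a minimal sketch of one common approach: stop once several consecutive audio frames fall below an energy threshold. This is an illustration under assumptions, not a description of how Khoj detects silence. The function names, the threshold, and the frame count are all hypothetical.

```python
from typing import Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1.0, 1.0])."""
    if not frame:
        return 0.0
    return (sum(s * s for s in frame) / len(frame)) ** 0.5


def should_stop(
    frames: Sequence[Sequence[float]],
    threshold: float = 0.02,        # assumed silence level, tune per microphone
    silent_frames_needed: int = 3,  # assumed number of quiet frames before stopping
) -> bool:
    """Return True once the last `silent_frames_needed` frames are all quiet."""
    if len(frames) < silent_frames_needed:
        return False
    tail = frames[-silent_frames_needed:]
    return all(rms(f) < threshold for f in tail)


speech = [0.5, -0.5] * 4
silence = [0.0] * 8
print(should_stop([speech, silence, silence]))           # not enough quiet frames yet
print(should_stop([speech, silence, silence, silence]))  # three quiet frames in a row
```

A scheme like this also shows why a quiet environment helps: background noise above the threshold keeps the recording from stopping on its own.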
Try it now at app.khoj.dev - the mic icon is in the chat input!

Voice Response

Automatic Voice Reply

When you send a voice message, Khoj automatically responds with voice:
1. Send Voice Message: Use the microphone to ask your question.
2. Receive Voice Response: Khoj speaks the answer automatically.
3. Text Also Shown: The response also appears as text for reference.

Manual Voice Playback

Listen to any message, even if typed:
1. Find the speaker icon 🔊 next to any Khoj message
2. Click to hear the message read aloud
3. Click again to pause
Voice response is currently available on the web interface. Desktop and Obsidian support is coming soon.

Voice Chat Tips

  • Use a quiet environment when possible
  • Speak at a normal pace - not too fast or slow
  • Enunciate clearly, especially for technical terms
  • Pause briefly between sentences
For multi-part questions, use structure.
Good:
"I have three questions. First, what's the weather today? 
Second, summarize my notes from yesterday. Third, what's 
on my calendar tomorrow?"
This helps both transcription accuracy and response quality.
Always check the transcription:
  • Fix any misheard words
  • Add punctuation if needed
  • Correct technical terms or names
This ensures Khoj understands exactly what you meant.
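
If the same names or technical terms are misheard repeatedly, the manual fixes above could in principle be scripted. The sketch below assumes you keep your own map of misheard phrases to intended terms. `apply_corrections` and the sample map are hypothetical and not a Khoj feature.

```python
import re
from typing import Dict


def apply_corrections(transcript: str, corrections: Dict[str, str]) -> str:
    """Replace misheard phrases (case-insensitive, whole words only) with intended terms."""
    fixed = transcript
    for heard, intended in corrections.items():
        pattern = r"\b" + re.escape(heard) + r"\b"
        # A function replacement avoids backslash-escape surprises in `intended`.
        fixed = re.sub(pattern, lambda m, intended=intended: intended,
                       fixed, flags=re.IGNORECASE)
    return fixed


# Hypothetical examples of misheard words.
corrections = {"coach": "Khoj", "obsidian": "Obsidian"}
print(apply_corrections("ask coach to search my obsidian vault", corrections))
```

Whole-word matching keeps a correction for "coach" from rewriting "coaching".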
Voice shines for:
  • Lengthy queries or descriptions
  • Brainstorming sessions
  • When typing is inconvenient
  • Dictating notes or ideas
Combine voice and text freely:
  • Speak your question
  • Type follow-up refinements
  • Use voice for the next topic

Use Cases

While multitasking:
  • Cooking while asking for recipe help
  • Exercising while logging workouts
  • Driving (parked) while reviewing schedule
  • Cleaning while brainstorming ideas
"Hey Khoj, what ingredients do I need for the pasta 
recipe in my cooking notes?"

Self-Hosting Configuration

Speech-to-Text (Voice Input)

Automatically configured when you initialize Khoj.
  • Runs locally on your server
  • No API keys needed
  • Privacy-friendly
  • Works offline
  • Uses open-source models
Default configuration is sufficient for most users

Text-to-Speech (Voice Output)

Included by default - uses local text-to-speech. Works immediately with no configuration.

Language Support

Khoj’s voice input supports many languages:
  • English (all variants)
  • Spanish
  • French
  • German
  • Italian
  • Portuguese
  • Chinese (Mandarin)
  • Japanese
  • Korean
  • Arabic
  • Hindi
  • Russian
  • And many more…
OpenAI Whisper API has broader language support than local models

Voice Chat Best Practices

Good Microphone

Use a quality mic for better transcription accuracy

Quiet Environment

Reduce background noise when possible

Review Transcriptions

Always check before sending for accuracy

Use Headphones

Prevents feedback when listening to voice responses

Troubleshooting

Browser (Web):
  • Check browser permissions for microphone access
  • Look for blocked mic icon in address bar
  • Try a different browser
  • Ensure no other app is using the microphone
Desktop:
  • Check system microphone permissions
  • Verify mic is selected as input device
  • Restart the application
General:
  • Test microphone in other apps
  • Check physical mic connection
  • Update audio drivers
Improve quality:
  • Speak more slowly and clearly
  • Use quieter environment
  • Get closer to microphone
  • Switch to OpenAI Whisper API (self-hosted)
  • Use better microphone hardware
For technical terms:
  • Spell them out if needed
  • Edit transcription before sending
  • Add to custom vocabulary (if available)
Check:
  • Voice output feature is web-only currently
  • Click speaker icon manually to play
  • Browser audio permissions granted
  • Volume not muted
  • ElevenLabs API key valid (if configured)
Self-hosted:
  • Check server logs for TTS errors
  • Verify ELEVEN_LABS_API_KEY environment variable
  • Ensure sufficient API credits
  • Check internet connection stability
  • Try shorter messages
  • Verify browser audio isn’t interrupted
  • Check API rate limits (ElevenLabs)
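
For the self-hosted check of the ELEVEN_LABS_API_KEY environment variable, a quick script like the one below can confirm the variable is visible to a process. It is a minimal sketch: the helper name `elevenlabs_key_status` is hypothetical, and the script only checks that the variable is set and non-empty, not that the key is valid or has credits.

```python
import os
from typing import Mapping, Optional


def elevenlabs_key_status(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "set" if ELEVEN_LABS_API_KEY is present and non-blank, else "missing"."""
    env = os.environ if env is None else env
    key = env.get("ELEVEN_LABS_API_KEY", "")
    return "set" if key.strip() else "missing"


if __name__ == "__main__":
    # Prints only the status, never the key itself.
    print(f"ELEVEN_LABS_API_KEY is {elevenlabs_key_status()}")
```

Run it in the same shell or container that starts the Khoj server, since environment variables set elsewhere will not be visible to it.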

Privacy Considerations

Voice data handling:
Local processing (default):
  • Voice stays on your server
  • No data sent to third parties
  • Maximum privacy
Cloud APIs (OpenAI Whisper, ElevenLabs):
  • Audio sent to API provider for processing
  • Subject to provider’s privacy policies
  • Typically not stored long-term
  • Check provider terms for details

Keyboard Shortcuts

Speed up voice chat with hotkeys:
Action | Shortcut
Activate microphone | Click mic icon (no global hotkey yet)
Stop recording | Click again or auto-stop
Play/pause voice | Click speaker icon
More keyboard shortcuts are available; see Keyboard Shortcuts.

Combining Voice with Other Features

Future Enhancements

Voice features coming soon:
  • Voice response on Desktop and Obsidian
  • Push-to-talk hotkey
  • Voice-only mode (continuous conversation)
  • Voice activity detection (auto-start/stop)
  • Custom wake word support
  • Voice command shortcuts
Feature requests and feedback welcome on Discord!

Next Steps

Learn Chat Commands

Master slash commands and conversation features

Keyboard Shortcuts

Navigate Khoj efficiently without the mouse

Mobile Access

Use voice chat on your phone via web app
