
Overview

The /use-local-whisper skill switches voice transcription from OpenAI’s Whisper API to local whisper.cpp running on your device. All transcription happens locally — no API key, no network calls, no cost.
Requires the /add-voice-transcription skill to be applied first. Currently supports WhatsApp only.

Advantages

  • Zero cost - no OpenAI API usage fees
  • Privacy - audio never leaves your device
  • Offline - works without an internet connection
  • Fast - optimized for Apple Silicon

Prerequisites

  1. Apply the voice-transcription skill first:

     /add-voice-transcription

  2. Install whisper-cpp (this installs the whisper-cli binary):

     brew install whisper-cpp

  3. Install ffmpeg:

     brew install ffmpeg

  4. Download a model:

     mkdir -p data/models

     # Base model (~150MB, balanced)
     curl -L https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin \
       -o data/models/ggml-base.bin

     # Or small model (~500MB, more accurate)
     curl -L https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin \
       -o data/models/ggml-small.bin
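Once the prerequisites are in place, a quick programmatic sanity check can confirm the binaries are on PATH. This is an illustrative sketch, not part of the skill itself; `missingDeps` is a hypothetical helper name.

```typescript
// Hypothetical sketch: verify required CLI tools are installed before applying
// the skill. Uses POSIX `which` to probe the PATH for each binary.
import { spawnSync } from "node:child_process";

export function missingDeps(binaries: string[]): string[] {
  // `which` exits non-zero when the binary is not found on PATH.
  return binaries.filter((bin) => spawnSync("which", [bin]).status !== 0);
}

// Example: missingDeps(["whisper-cli", "ffmpeg"]) lists anything still
// missing; an empty array means the prerequisites are satisfied.
```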

How to Apply

/use-local-whisper
The skill will:
  1. Verify dependencies are installed
  2. Check for model file in data/models/
  3. Modify src/transcription.ts to use local whisper
  4. Update error handling for local execution
  5. Rebuild and restart

What Changes

Files Modified

  • src/transcription.ts - Switches from OpenAI API to whisper-cli
  • src/channels/whatsapp.ts - Audio handling updated

Behavioral Changes

  • Voice messages transcribed locally
  • No network calls for transcription
  • Slightly different transcription quality (depends on model)
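The rewritten transcription path can be sketched roughly as below. The helper names are assumptions for illustration, not the skill's actual output; the `-m`, `-f`, `-l`, `-nt`, and `-np` flags are real whisper.cpp CLI options, and the input is assumed to already be a 16 kHz mono WAV (e.g. converted from a WhatsApp OGG/Opus note with ffmpeg).

```typescript
// Hypothetical sketch of a local-whisper src/transcription.ts.
import { spawnSync } from "node:child_process";

export function buildWhisperArgs(
  wavPath: string,
  modelPath = process.env.WHISPER_MODEL_PATH ?? "data/models/ggml-base.bin",
  language = process.env.WHISPER_LANGUAGE ?? "en"
): string[] {
  return [
    "-m", modelPath, // GGML model file
    "-f", wavPath,   // 16 kHz mono WAV input
    "-l", language,  // language hint for better accuracy
    "-nt",           // no timestamps: plain text output
    "-np",           // suppress progress prints
  ];
}

export function transcribeLocally(wavPath: string): string {
  const result = spawnSync("whisper-cli", buildWhisperArgs(wavPath), {
    encoding: "utf8",
  });
  if (result.status !== 0) {
    throw new Error(`whisper-cli failed: ${result.stderr}`);
  }
  return result.stdout.trim();
}
```

Because the binary runs synchronously per voice note, errors surface as thrown exceptions rather than HTTP failures, which is why the skill also updates error handling for local execution.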

Model Selection

Model    Size     Speed    Quality  Use Case
tiny     ~75MB    Fastest  Lower    Quick transcription
base     ~150MB   Fast     Good     General use
small    ~500MB   Medium   Better   Accuracy priority
medium   ~1.5GB   Slow     Best     Maximum quality

Start with the base model for the best balance of speed and quality.
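The model files all follow the `ggml-<name>.bin` naming scheme shown in the download commands, so switching models is just a matter of pointing WHISPER_MODEL_PATH at a different file. A small illustrative helper (assumed, not part of the skill):

```typescript
// Hypothetical helper: map a model name from the table above to its expected
// GGML file path under data/models/.
export type WhisperModel = "tiny" | "base" | "small" | "medium";

export function modelPath(name: WhisperModel): string {
  return `data/models/ggml-${name}.bin`;
}
```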

Configuration

Environment Variables

  • WHISPER_MODEL_PATH (string) - Path to the GGML model file (default: data/models/ggml-base.bin)
  • WHISPER_LANGUAGE (string) - Language code, for better accuracy (default: en)
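Resolving these variables with their documented defaults might look like the following sketch (the `loadWhisperConfig` name is an assumption for illustration):

```typescript
// Sketch: read the two environment variables, falling back to the
// documented defaults when they are unset.
export interface WhisperConfig {
  modelPath: string;
  language: string;
}

export function loadWhisperConfig(
  env: Record<string, string | undefined> = process.env
): WhisperConfig {
  return {
    modelPath: env.WHISPER_MODEL_PATH ?? "data/models/ggml-base.bin",
    language: env.WHISPER_LANGUAGE ?? "en",
  };
}
```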

Usage

No changes to user experience:
[User sends voice message in WhatsApp]

Assistant:
[Transcription: "Hey, can you check my calendar for tomorrow?"]

Sure! Let me check your calendar...
Transcription happens locally instead of via OpenAI API.

Performance

Apple Silicon (M1/M2/M3)

  • base model: ~2-3x realtime (10s audio = 3-5s to transcribe)
  • small model: ~1-1.5x realtime (10s audio = 7-10s to transcribe)

Intel Mac

  • base model: ~1-1.5x realtime
  • small model: ~0.5-1x realtime (may be slower than audio)
Apple Silicon is highly recommended for local whisper. Intel Macs may experience slower transcription.
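The "Nx realtime" figures above convert to wall-clock time as audio duration divided by the factor, e.g. a 10 s note at 2x realtime takes about 5 s:

```typescript
// Estimate transcription wall-clock time from a realtime-speed factor.
export function estimateSeconds(
  audioSeconds: number,
  realtimeFactor: number
): number {
  return audioSeconds / realtimeFactor;
}
```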

Troubleshooting

whisper-cli not found

Install whisper-cpp:

brew install whisper-cpp

# Verify
which whisper-cli

Model file not found

Check the model path:

ls -lh data/models/

Download the model if it is missing (see Prerequisites above).

Transcription too slow

Try a smaller model:

# Switch to tiny model for speed
WHISPER_MODEL_PATH=data/models/ggml-tiny.bin

Or upgrade to Apple Silicon if on an Intel Mac.

Poor transcription quality

Use a larger model:

curl -L https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin \
  -o data/models/ggml-small.bin

# Update config
WHISPER_MODEL_PATH=data/models/ggml-small.bin

Switching Back to OpenAI

To revert to OpenAI Whisper API:
  1. Restore from git:
    git checkout HEAD -- src/transcription.ts
    
  2. Rebuild:
    npm run build
    
  3. Restart service

Related

  • Voice Transcription - base voice transcription skill
  • WhatsApp Channel - WhatsApp setup and features
