
Overview

The /use-local-whisper skill switches voice transcription from OpenAI’s Whisper API to local whisper.cpp running on your device. All transcription happens locally — no API key, no network calls, no cost.
Requires the /add-voice-transcription skill to be applied first. Currently supports WhatsApp only.

Advantages

  • Zero cost - no OpenAI API usage fees
  • Privacy - audio never leaves your device
  • Offline - works without an internet connection
  • Fast - optimized for Apple Silicon

Prerequisites

  1. Apply the voice-transcription skill first:

     /add-voice-transcription

  2. Install whisper-cpp (this installs the whisper-cli binary):

     brew install whisper-cpp

  3. Install ffmpeg:

     brew install ffmpeg

  4. Download a model:

     mkdir -p data/models

     # Base model (~150MB, balanced)
     curl -L https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin \
       -o data/models/ggml-base.bin

     # Or small model (~500MB, more accurate)
     curl -L https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin \
       -o data/models/ggml-small.bin
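Once the prerequisites are in place, a quick programmatic sanity check can confirm the binaries are on PATH. This is an illustrative sketch, not part of the skill itself; `missingDeps` is a hypothetical helper name.

```typescript
// Hypothetical sketch: verify required CLI tools are installed before applying
// the skill. Uses POSIX `which` to probe the PATH for each binary.
import { spawnSync } from "node:child_process";

export function missingDeps(binaries: string[]): string[] {
  // `which` exits non-zero when the binary is not found on PATH.
  return binaries.filter((bin) => spawnSync("which", [bin]).status !== 0);
}

// Example: missingDeps(["whisper-cli", "ffmpeg"]) lists anything still
// missing; an empty array means the prerequisites are satisfied.
```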

How to Apply

/use-local-whisper
The skill will:
  1. Verify dependencies are installed
  2. Check for model file in data/models/
  3. Modify src/transcription.ts to use local whisper
  4. Update error handling for local execution
  5. Rebuild and restart

What Changes

Files Modified

  • src/transcription.ts - Switches from OpenAI API to whisper-cli
  • src/channels/whatsapp.ts - Audio handling updated

Behavioral Changes

  • Voice messages transcribed locally
  • No network calls for transcription
  • Slightly different transcription quality (depends on model)
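The rewritten transcription path can be sketched roughly as below. The helper names are assumptions for illustration, not the skill's actual output; the `-m`, `-f`, `-l`, `-nt`, and `-np` flags are real whisper.cpp CLI options, and the input is assumed to already be a 16 kHz mono WAV (e.g. converted from a WhatsApp OGG/Opus note with ffmpeg).

```typescript
// Hypothetical sketch of a local-whisper src/transcription.ts.
import { spawnSync } from "node:child_process";

export function buildWhisperArgs(
  wavPath: string,
  modelPath = process.env.WHISPER_MODEL_PATH ?? "data/models/ggml-base.bin",
  language = process.env.WHISPER_LANGUAGE ?? "en"
): string[] {
  return [
    "-m", modelPath, // GGML model file
    "-f", wavPath,   // 16 kHz mono WAV input
    "-l", language,  // language hint for better accuracy
    "-nt",           // no timestamps: plain text output
    "-np",           // suppress progress prints
  ];
}

export function transcribeLocally(wavPath: string): string {
  const result = spawnSync("whisper-cli", buildWhisperArgs(wavPath), {
    encoding: "utf8",
  });
  if (result.status !== 0) {
    throw new Error(`whisper-cli failed: ${result.stderr}`);
  }
  return result.stdout.trim();
}
```

Because the binary runs synchronously per voice note, errors surface as thrown exceptions rather than HTTP failures, which is why the skill also updates error handling for local execution.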

Model Selection

Model    Size     Speed    Quality  Use Case
tiny     ~75MB    Fastest  Lower    Quick transcription
base     ~150MB   Fast     Good     General use
small    ~500MB   Medium   Better   Accuracy priority
medium   ~1.5GB   Slow     Best     Maximum quality

Start with the base model for the best balance of speed and quality.
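The model files all follow the `ggml-<name>.bin` naming scheme shown in the download commands, so switching models is just a matter of pointing WHISPER_MODEL_PATH at a different file. A small illustrative helper (assumed, not part of the skill):

```typescript
// Hypothetical helper: map a model name from the table above to its expected
// GGML file path under data/models/.
export type WhisperModel = "tiny" | "base" | "small" | "medium";

export function modelPath(name: WhisperModel): string {
  return `data/models/ggml-${name}.bin`;
}
```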

Configuration

Environment Variables

  • WHISPER_MODEL_PATH (string) - Path to the GGML model file (default: data/models/ggml-base.bin)
  • WHISPER_LANGUAGE (string) - Language code, for better accuracy (default: en)
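Resolving these variables with their documented defaults might look like the following sketch (the `loadWhisperConfig` name is an assumption for illustration):

```typescript
// Sketch: read the two environment variables, falling back to the
// documented defaults when they are unset.
export interface WhisperConfig {
  modelPath: string;
  language: string;
}

export function loadWhisperConfig(
  env: Record<string, string | undefined> = process.env
): WhisperConfig {
  return {
    modelPath: env.WHISPER_MODEL_PATH ?? "data/models/ggml-base.bin",
    language: env.WHISPER_LANGUAGE ?? "en",
  };
}
```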

Usage

No changes to user experience:
[User sends voice message in WhatsApp]

Assistant:
[Transcription: "Hey, can you check my calendar for tomorrow?"]

Sure! Let me check your calendar...
Transcription happens locally instead of via OpenAI API.

Performance

Apple Silicon (M1/M2/M3)

  • base model: ~2-3x realtime (10s audio = 3-5s to transcribe)
  • small model: ~1-1.5x realtime (10s audio = 7-10s to transcribe)

Intel Mac

  • base model: ~1-1.5x realtime
  • small model: ~0.5-1x realtime (may be slower than audio)
Apple Silicon is highly recommended for local whisper. Intel Macs may experience slower transcription.
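The "Nx realtime" figures above convert to wall-clock time as audio duration divided by the factor, e.g. a 10 s note at 2x realtime takes about 5 s:

```typescript
// Estimate transcription wall-clock time from a realtime-speed factor.
export function estimateSeconds(
  audioSeconds: number,
  realtimeFactor: number
): number {
  return audioSeconds / realtimeFactor;
}
```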

Troubleshooting

whisper-cli not found

Install whisper-cpp:

brew install whisper-cpp

# Verify
which whisper-cli

Model file not found

Check the model path:

ls -lh data/models/

Download the model if it is missing (see Prerequisites above).

Transcription too slow

Try a smaller model:

# Switch to tiny model for speed
WHISPER_MODEL_PATH=data/models/ggml-tiny.bin

Or upgrade to Apple Silicon if on an Intel Mac.

Poor transcription quality

Use a larger model:

curl -L https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin \
  -o data/models/ggml-small.bin

# Update config
WHISPER_MODEL_PATH=data/models/ggml-small.bin

Switching Back to OpenAI

To revert to OpenAI Whisper API:
  1. Restore from git:
    git checkout HEAD -- src/transcription.ts
    
  2. Rebuild:
    npm run build
    
  3. Restart service

Related

  • Voice Transcription - base voice transcription skill
  • WhatsApp Channel - WhatsApp setup and features
