Speech Enhancement

This feature is coming in version 0.5.0 and is not yet available in the current release.

Overview

Speech Enhancement will improve audio quality by removing noise, echo, and other artifacts. Used as a preprocessing step, it is expected to improve STT accuracy and audio clarity on noisy recordings.

Planned Features

Noise Reduction

Remove background noise from recordings

Echo Cancellation

Eliminate echo and reverb artifacts

Audio Normalization

Normalize volume levels

Before/After Preview

Compare original and enhanced audio

Expected API (Preview)

The API is not finalized, but the interface is expected to look like this:

import { createEnhancement } from 'react-native-sherpa-onnx/enhancement';

// Create enhancement engine
const enhancer = await createEnhancement({
  modelPath: { type: 'asset', path: 'models/rnnoise' },
  noiseReductionLevel: 0.8,  // 0..1
});

// Enhance audio file
const enhanced = await enhancer.processFile('/path/to/noisy.wav');

// Save enhanced audio
await saveAudioToFile(enhanced, '/path/to/clean.wav');

// Cleanup
await enhancer.destroy();
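
The preview above calls `saveAudioToFile` without defining it. As a rough sketch of what such a helper would need to do before writing to disk, the function below encodes samples as a mono 16-bit PCM WAV byte array. The name `encodeWavPcm16` is hypothetical, and the code assumes samples are floats in the range -1..1, which this preview does not confirm. Writing the bytes to a file is left to whichever file-system library your app uses.

```typescript
// Hypothetical helper, not part of react-native-sherpa-onnx.
// Assumes mono float samples in the range -1..1.
function encodeWavPcm16(samples: number[], sampleRate: number): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  // RIFF container header
  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true); // remaining file size
  writeAscii(8, 'WAVE');

  // "fmt " chunk describing uncompressed PCM
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true);                          // fmt chunk size
  view.setUint16(20, 1, true);                           // audio format: PCM
  view.setUint16(22, 1, true);                           // channels: mono
  view.setUint32(24, sampleRate, true);                  // sample rate
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true);              // block align
  view.setUint16(34, 16, true);                          // bits per sample

  // "data" chunk with the samples themselves
  writeAscii(36, 'data');
  view.setUint32(40, dataSize, true);

  samples.forEach((sample, i) => {
    // Clamp to the valid range, then scale to signed 16-bit
    const clamped = Math.max(-1, Math.min(1, sample));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(44 + i * bytesPerSample, Math.round(scaled), true);
  });

  return new Uint8Array(buffer);
}
```

The 44-byte header is the minimal canonical layout for uncompressed PCM; stereo or other bit depths would need the channel count, byte rate, and block align fields adjusted.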

Use Cases

1. Pre-processing for STT

Improve transcription accuracy by cleaning audio first:

// Planned API
const enhancer = await createEnhancement(config);
const stt = await createSTT(sttConfig);

const enhanced = await enhancer.processFile('/path/to/noisy.wav');
const result = await stt.transcribeSamples(
  enhanced.samples,
  enhanced.sampleRate
);

console.log('Clean transcript:', result.text);

await enhancer.destroy();
await stt.destroy();

2. Podcast Cleanup

Remove background noise from recordings:

// Planned API
const enhancer = await createEnhancement({
  modelPath: { type: 'asset', path: 'models/rnnoise' },
  noiseReductionLevel: 0.9,
});

const enhanced = await enhancer.processFile('/path/to/podcast.wav');
await saveAudioToFile(enhanced, '/path/to/podcast-clean.wav');

await enhancer.destroy();

3. Real-time Enhancement

Enhance streaming audio:

// Planned API
const enhancer = await createEnhancement(config);

// startRecording() and sttStream stand in for your app's recorder and
// streaming STT session; they are not part of this API preview
const recorder = startRecording();

recorder.on('chunk', async (samples) => {
  const enhanced = await enhancer.processSamples(samples, 16000);

  // Forward to STT or playback
  await sttStream.acceptWaveform(enhanced, 16000);
});
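
Recorder chunks rarely arrive in the size a frame-based enhancer expects. If the final API turns out to require fixed-size frames (the planned `frameSize` option hints at this, but it is not confirmed), incoming chunks would have to be buffered and re-cut first. The sketch below shows one way to do that; `createFrameAccumulator` is a hypothetical helper and not part of the library.

```typescript
// Hypothetical helper, not part of react-native-sherpa-onnx.
interface FrameAccumulator {
  // Adds a chunk and returns every complete frame now available
  push(chunk: number[]): number[][];
  // Number of samples still waiting for a full frame
  pendingCount(): number;
}

function createFrameAccumulator(frameSize: number): FrameAccumulator {
  if (!Number.isInteger(frameSize) || frameSize <= 0) {
    throw new RangeError('frameSize must be a positive integer');
  }

  let pending: number[] = [];

  return {
    push(chunk: number[]): number[][] {
      pending = pending.concat(chunk);

      const frames: number[][] = [];
      let offset = 0;
      while (pending.length - offset >= frameSize) {
        frames.push(pending.slice(offset, offset + frameSize));
        offset += frameSize;
      }

      // Keep the leftover samples for the next chunk
      pending = pending.slice(offset);
      return frames;
    },
    pendingCount(): number {
      return pending.length;
    },
  };
}

// Example: chunks of 3 and 6 samples re-cut into frames of 4
const accumulator = createFrameAccumulator(4);
const firstFrames = accumulator.push([1, 2, 3]);
const secondFrames = accumulator.push([4, 5, 6, 7, 8, 9]);
```

In a chunk handler, each frame returned by `push` would be passed to the enhancer in order, and the leftover samples stay buffered until the next chunk arrives.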

4. Call Quality Improvement

Combine echo cancellation with noise reduction to clean up recorded calls:

// Planned API
const enhancer = await createEnhancement({
  modelPath: { type: 'asset', path: 'models/rnnoise' },
  enableEchoCancel: true,
  enableNoiseReduction: true,
});

const enhanced = await enhancer.processFile('/path/to/call.wav');
await saveAudioToFile(enhanced, '/path/to/call-enhanced.wav');

await enhancer.destroy();

Planned Configuration

// Expected configuration options
interface EnhancementConfig {
  modelPath: ModelPathConfig;
  
  // Noise reduction
  noiseReductionLevel?: number;   // 0 (off) to 1 (max), default 0.5
  enableNoiseReduction?: boolean; // default true
  
  // Echo cancellation
  enableEchoCancel?: boolean;     // default false
  echoSuppressionLevel?: number;  // 0..1
  
  // Normalization
  enableNormalization?: boolean;  // default false
  targetLevel?: number;           // dB, e.g., -20
  
  // Advanced
  frameSize?: number;             // Samples per frame
  hopSize?: number;               // Samples advanced between frames (overlap = frameSize - hopSize)
}
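
To make the two advanced options concrete, here is a small sketch of how a signal is usually cut into frames. It assumes `frameSize` and `hopSize` follow their usual signal-processing meaning: each frame holds `frameSize` samples and the window advances by `hopSize` samples, so consecutive frames share `frameSize - hopSize` samples. `splitIntoFrames` is a hypothetical helper; the library's internal framing is not documented here.

```typescript
// Hypothetical helper illustrating frameSize vs. hopSize.
function splitIntoFrames(
  samples: number[],
  frameSize: number,
  hopSize: number
): number[][] {
  if (frameSize <= 0 || hopSize <= 0) {
    throw new RangeError('frameSize and hopSize must be positive');
  }

  const frames: number[][] = [];
  // Advance by hopSize; stop when a full frame no longer fits
  for (let start = 0; start + frameSize <= samples.length; start += hopSize) {
    frames.push(samples.slice(start, start + frameSize));
  }
  return frames;
}

// 10 samples, frames of 4, hop of 2: frames start at 0, 2, 4, and 6
const demoSignal = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
const demoFrames = splitIntoFrames(demoSignal, 4, 2);
```

With `hopSize` equal to `frameSize` the frames do not overlap at all; smaller hop sizes trade extra computation for smoother transitions between frames.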

Expected Output

interface EnhancedAudio {
  samples: number[];       // Enhanced PCM samples
  sampleRate: number;      // Sample rate
  noiseLevelBefore?: number;  // Noise estimate before
  noiseLevelAfter?: number;   // Noise estimate after
}

Enhancement Levels

// Planned presets
const presets = {
  light: { noiseReductionLevel: 0.3 },
  moderate: { noiseReductionLevel: 0.6 },
  aggressive: { noiseReductionLevel: 0.9 },
};

const enhancer = await createEnhancement({
  modelPath: { type: 'asset', path: 'models/rnnoise' },
  ...presets.moderate,
});

Expected Models

Likely model support:

RNNoise - Lightweight noise suppression
DeepFilterNet - Deep learning-based enhancement
Speex - Classic noise reduction
Custom sherpa-onnx models - Optimized for mobile

Timeline

Enhancement support is planned for:

Version 0.5.0

Initial enhancement with basic noise reduction

Future versions

Advanced features like echo cancellation and real-time processing

Stay Updated

To track progress or contribute:

Watch the GitHub repository
Check the changelog
Join discussions in issues or PRs

Current Workarounds

While enhancement is not available, you can:

External libraries - Use a JavaScript audio library (e.g., a Web Audio API implementation for React Native)
Pre-process offline - Use desktop tools (Audacity, FFmpeg) before importing
Cloud services - Use enhancement APIs from cloud providers

Simple Normalization Example

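function normalizeAudio(samples: number[]): number[] {
  // Find the peak with a loop; Math.max(...samples) can overflow the
  // call stack on long recordings
  let peak = 0;
  for (const s of samples) {
    peak = Math.max(peak, Math.abs(s));
  }
  
  // Silent input: nothing to scale
  if (peak === 0) return samples;
  
  // Normalize to a peak of 0.8 (about -1.9 dBFS)
  const targetPeak = 0.8;
  const gain = targetPeak / peak;
  
  return samples.map(s => s * gain);
}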

// Usage
const samples = getPcmSamples();
const normalized = normalizeAudio(samples);

// Now use for STT
const result = await stt.transcribeSamples(normalized, 16000);
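
Peak normalization is sensitive to a single loud click. An alternative is to normalize the average (RMS) level toward a target in dB, similar in spirit to the planned `targetLevel` option. The sketch below is an assumption about how that could be done in plain TypeScript, not the library's algorithm. It assumes float samples in the range -1..1, and `rmsDbfs` and `normalizeToRms` are hypothetical names.

```typescript
// Hypothetical helpers, not part of react-native-sherpa-onnx.
// RMS level in dB relative to full scale (1.0); -Infinity for silence.
function rmsDbfs(samples: number[]): number {
  if (samples.length === 0) return -Infinity;
  let sumSquares = 0;
  for (const s of samples) {
    sumSquares += s * s;
  }
  // 10 * log10(mean square) equals 20 * log10(rms)
  return 10 * Math.log10(sumSquares / samples.length);
}

function normalizeToRms(samples: number[], targetDb: number = -20): number[] {
  const currentDb = rmsDbfs(samples);
  if (!Number.isFinite(currentDb)) return samples.slice(); // silent input

  // Convert the dB difference into a linear gain
  let gain = Math.pow(10, (targetDb - currentDb) / 20);

  // Limit the gain so the loudest sample never exceeds full scale
  let peak = 0;
  for (const s of samples) {
    peak = Math.max(peak, Math.abs(s));
  }
  if (peak * gain > 1) {
    gain = 1 / peak;
  }

  return samples.map(s => s * gain);
}

// A quiet signal at about -40 dBFS raised to the -20 dBFS target
const quietSignal = [0.01, -0.01, 0.01, -0.01];
const leveledSignal = normalizeToRms(quietSignal, -20);
```

The peak check is a simple safeguard against clipping; it means very spiky recordings may end up below the requested level.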

Simple High-pass Filter

function highPassFilter(samples: number[], alpha: number = 0.95): number[] {
  // First-order high-pass: y[i] = alpha * (y[i-1] + x[i] - x[i-1])
  const filtered: number[] = new Array(samples.length);
  let prevInput = 0;
  let prevOutput = 0;
  
  for (let i = 0; i < samples.length; i++) {
    const current = samples[i];
    filtered[i] = alpha * (prevOutput + current - prevInput);
    prevInput = current;
    prevOutput = filtered[i];
  }
  
  return filtered;
}

// Usage (removes low-frequency noise)
const samples = getPcmSamples();
const filtered = highPassFilter(samples);

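
The `alpha` parameter is easier to choose when expressed as a cutoff frequency. For a first-order filter of the form y[i] = alpha * (y[i-1] + x[i] - x[i-1]), the standard RC relation gives alpha = 1 / (1 + 2π · fc / fs), where fc is the cutoff and fs the sample rate. The helper names below are illustrative only. By this relation, an alpha of 0.95 at 16 kHz corresponds to a cutoff of roughly 134 Hz.

```typescript
// Illustrative helpers for the first-order high-pass filter above.
function alphaForCutoff(cutoffHz: number, sampleRate: number): number {
  if (cutoffHz <= 0 || sampleRate <= 0) {
    throw new RangeError('cutoffHz and sampleRate must be positive');
  }
  return 1 / (1 + (2 * Math.PI * cutoffHz) / sampleRate);
}

// Inverse mapping: which cutoff does a given alpha imply?
function cutoffForAlpha(alpha: number, sampleRate: number): number {
  if (alpha <= 0 || alpha >= 1 || sampleRate <= 0) {
    throw new RangeError('alpha must be in (0, 1) and sampleRate positive');
  }
  return (sampleRate * (1 - alpha)) / (2 * Math.PI * alpha);
}

// Cutoff implied by alpha = 0.95 at a 16 kHz sample rate
const impliedCutoffHz = cutoffForAlpha(0.95, 16000);

// Alpha for an 80 Hz cutoff at 16 kHz
const rumbleAlpha = alphaForCutoff(80, 16000);
```

A lower cutoff (alpha closer to 1) removes only rumble and DC offset, while a higher cutoff starts to thin out the low end of speech.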
Integration with STT Pipeline

When available, enhancement is expected to slot in ahead of the STT step:

// Future combined API (preview)
import { createEnhancement } from 'react-native-sherpa-onnx/enhancement';
import { createSTT } from 'react-native-sherpa-onnx/stt';

const enhancer = await createEnhancement(enhancementConfig);
const stt = await createSTT(sttConfig);

// Process pipeline
const enhanced = await enhancer.processFile('/path/to/noisy.wav');
const transcript = await stt.transcribeSamples(
  enhanced.samples,
  enhanced.sampleRate
);

console.log('Transcript:', transcript.text);
console.log('Noise reduction:', 
  `${enhanced.noiseLevelBefore} -> ${enhanced.noiseLevelAfter} dB`
);

await enhancer.destroy();
await stt.destroy();

Comparison Tool

Expected before/after comparison:

// Planned API
const enhancer = await createEnhancement(config);

const result = await enhancer.processFile('/path/to/noisy.wav', {
  returnOriginal: true,  // Include original for comparison
});

// Play original
await playAudio(result.original.samples, result.original.sampleRate);

// Play enhanced
await playAudio(result.enhanced.samples, result.enhanced.sampleRate);

console.log('SNR improvement:', result.snrImprovement, 'dB');
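
How `snrImprovement` will be computed is not specified in this preview. For evaluating any enhancement step yourself, one common approach, usable only when a clean reference recording exists, is to compare the signal-to-noise ratio before and after enhancement. The sketch below assumes the three arrays are time-aligned and of equal length; `snrDb` and `snrImprovementDb` are hypothetical names.

```typescript
// Hypothetical evaluation helpers; they require a clean reference signal.
function snrDb(reference: number[], estimate: number[]): number {
  if (reference.length === 0 || reference.length !== estimate.length) {
    throw new RangeError('signals must be non-empty and of equal length');
  }

  let signalPower = 0;
  let noisePower = 0;
  reference.forEach((ref, i) => {
    const error = estimate[i] - ref; // whatever differs from the reference
    signalPower += ref * ref;
    noisePower += error * error;
  });

  // Power ratio in decibels; Infinity if the estimate matches exactly
  return 10 * Math.log10(signalPower / noisePower);
}

function snrImprovementDb(
  clean: number[],
  noisy: number[],
  enhanced: number[]
): number {
  return snrDb(clean, enhanced) - snrDb(clean, noisy);
}

// Synthetic example: the residual noise amplitude is halved
const cleanRef = [1, -1, 1, -1];
const noisyInput = [1.5, -0.5, 1.5, -0.5];
const enhancedOutput = [1.25, -0.75, 1.25, -0.75];
const improvementDb = snrImprovementDb(cleanRef, noisyInput, enhancedOutput);
```

Halving the residual noise amplitude corresponds to an improvement of about 6 dB, which is a useful sanity check when reading such figures.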

Related Features

Speech-to-Text

Transcribe enhanced audio

Source Separation

Separate voice from background (coming in v0.6.0)
