Overview
FAD measures the distance between two audio distributions using the Fréchet distance in a learned embedding space. Lower FAD scores indicate better quality and diversity matching the baseline distribution.
fad_setup
Initialize FAD scoring system with baseline dataset and embedding model.
Parameters:
- Path to baseline audio dataset. Can be a Kaldi-style SCP file or directory path depending on the io parameter
- Embedding model to use for feature extraction. See FADTK documentation for available models
- Directory to cache computed embeddings for faster repeated evaluation
- Whether to use infinite FAD calculation (recommended for different dataset sizes)
- Input/output format for audio files. Options: "kaldi" (SCP format) or other supported formats
Returns a dictionary containing:
- module: FAD calculation module
- baseline: Baseline dataset path
- cache_dir: Cache directory path
- use_inf: Infinite FAD flag
- io: I/O format
- embedding: Embedding model name
Requires FADTK installation. Install using tools/install_fadtk.sh or follow FADTK documentation.
fad_scoring
Calculate FAD score between baseline and evaluation datasets.
Parameters:
- Path to evaluation/generated audio dataset (same format as baseline)
- FAD configuration dictionary from fad_setup()
- Prefix for result dictionary keys
Returns a dictionary containing:
- {key_info}_overall (float): FAD score
- {key_info}_r2 (float): R² value (only when use_inf=True)
Usage Examples
- Basic Usage
- Multiple Evaluations
- Custom Embedding
- With Kaldi Files
Understanding FAD Scores
What is a good FAD score?
- Lower is better: FAD measures distribution distance
- FAD = 0: Perfect match (identical distributions)
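- FAD < 1: Excellent quality and diversity
- FAD 1-5: Good quality
- FAD > 10: Significant distribution mismatch
These ranges are rough guides. Absolute values depend on the embedding model, so compare scores only when they were computed with the same model.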
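To make "distribution distance" concrete, the sketch below computes the Fréchet distance between Gaussians fitted to two sets of embeddings, using NumPy only. It is an illustration of the formula, not FADTK's implementation; the function name `frechet_distance` and the random arrays standing in for real embeddings are assumptions made for this example.

```python
import numpy as np


def frechet_distance(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two embedding sets.

    Each input has shape (num_clips, embedding_dim).
    """
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)

    # tr(sqrt(cov_a @ cov_b)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # covariance matrices; clip tiny negative values caused by round-off.
    eigvals = np.linalg.eigvals(cov_a @ cov_b).real
    trace_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, size=(2000, 8))  # stand-in for baseline embeddings
shifted = rng.normal(0.5, 1.0, size=(2000, 8))   # stand-in for a mismatched system

same = frechet_distance(baseline, baseline)  # identical sets: approximately 0
far = frechet_distance(baseline, shifted)    # shifted mean: clearly larger
```

Comparing a set with itself gives a value near zero, while shifting the mean raises the score, which matches the "lower is better" reading above.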
Infinite FAD vs Standard FAD
- Infinite FAD (use_inf=True): Recommended when baseline and evaluation have different sizes. Provides more stable estimates.
- Standard FAD (use_inf=False): Requires equal-sized datasets. Faster but less robust.
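The idea behind infinite FAD, as it is commonly described, is to score subsets of several sizes N, fit a line to the scores against 1/N, and take the intercept as the estimate for an unlimited sample. The R² of that fit indicates how well the linear trend holds. The sketch below shows only the extrapolation step, under that assumption; `extrapolate_to_infinite` and the scores are made up for illustration and are not part of FADTK.

```python
import numpy as np


def extrapolate_to_infinite(sample_sizes, fad_values):
    """Fit FAD = slope * (1 / N) + intercept and return (intercept, r2)."""
    x = 1.0 / np.asarray(sample_sizes, dtype=float)
    y = np.asarray(fad_values, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)

    # Coefficient of determination of the linear fit.
    residual = y - (slope * x + intercept)
    ss_res = float(residual @ residual)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    return float(intercept), r2


# Made-up scores that shrink as the evaluation subset grows.
sizes = [100, 200, 400, 800, 1600]
scores = [2.0 + 50.0 / n for n in sizes]

fad_inf, r2 = extrapolate_to_infinite(sizes, scores)
```

A low R² suggests the scores do not follow a clean 1/N trend, in which case the extrapolated value deserves less trust.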
Embedding Models
FAD uses pre-trained models to extract audio features:
- Different models may give different absolute scores
- Use the same embedding model for fair comparison
- See FADTK models for options
Caching for Performance
Embeddings are cached to disk:
- First run: Computes and caches embeddings for both datasets
- Subsequent runs: Loads cached embeddings (much faster)
- Cache structure:
  - {cache_dir}/baseline/: Baseline embeddings
  - {cache_dir}/eval/: Evaluation embeddings
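The compute-once, load-later pattern described above can be sketched as follows. This is a generic illustration, not the library's caching code: the helper `cached_embedding`, the one-`.npy`-file-per-utterance layout, and `fake_extractor` are all assumptions made for the example.

```python
import tempfile
from pathlib import Path

import numpy as np


def cached_embedding(cache_dir, split, utt_id, compute):
    """Return the embedding for utt_id, computing it only on a cache miss."""
    path = Path(cache_dir) / split / f"{utt_id}.npy"  # file naming is an assumption
    if path.exists():
        return np.load(path)
    embedding = np.asarray(compute(utt_id))
    path.parent.mkdir(parents=True, exist_ok=True)
    np.save(path, embedding)
    return embedding


calls = []


def fake_extractor(utt_id):
    """Stand-in for an expensive embedding model."""
    calls.append(utt_id)
    return np.full(4, float(len(utt_id)))


with tempfile.TemporaryDirectory() as cache_dir:
    first = cached_embedding(cache_dir, "baseline", "utt1", fake_extractor)
    second = cached_embedding(cache_dir, "baseline", "utt1", fake_extractor)
    cache_hit = len(calls) == 1  # the second call read from disk
```

One consequence of this pattern is that stale entries survive changes to the audio or the embedding model, so clearing the cache directory after such changes is a sensible precaution.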
File Format
Kaldi SCP Format
When using io="kaldi", provide SCP (script) files. Each line holds two whitespace-separated fields:
- Utterance ID: Unique identifier
- File Path: Absolute or relative path to audio file
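A minimal sketch of reading this format is shown below, assuming one `<utterance ID> <path>` pair per line with no piped commands. The helper `parse_scp` and the sample paths are hypothetical and only illustrate the layout.

```python
def parse_scp(text):
    """Map utterance IDs to audio paths from Kaldi-style SCP text."""
    entries = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected '<utt_id> <path>'")
        utt_id, path = parts
        if utt_id in entries:
            raise ValueError(f"line {line_no}: duplicate utterance ID {utt_id!r}")
        entries[utt_id] = path
    return entries


example = """\
utt001 /data/audio/utt001.wav
utt002 relative/dir/utt002.wav
"""
entries = parse_scp(example)
```

Running a check like this before scoring catches malformed or duplicated entries early, before any embeddings are computed.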