Overview

RocksDB supports multiple compression algorithms to reduce storage space and I/O. Compression is applied at the block level and can be configured per-level for optimal trade-offs between space, CPU, and performance.
Depending on how compressible the data is, a well-chosen configuration can reduce storage costs by roughly 3-10x while maintaining good read/write performance.

Compression Types

From compression_type.h:18-166, RocksDB supports these algorithms:
enum CompressionType : unsigned char {
  kNoCompression = 0x00,
  kSnappyCompression = 0x01,
  kZlibCompression = 0x02,
  kBZip2Compression = 0x03,
  kLZ4Compression = 0x04,
  kLZ4HCCompression = 0x05,
  kXpressCompression = 0x06,  // Windows only
  kZSTD = 0x07,
  kDisableCompressionOption = 0xff,
};

Algorithm Comparison

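| Algorithm | Compression Ratio | Speed | CPU Usage | Best For |
|-----------|-------------------|-------|-----------|----------|
| Snappy | Low (~2x) | Very Fast | Low | L0, hot data |
| LZ4 | Low-Medium (~2.5x) | Very Fast | Low | L0, L1 |
| LZ4HC | Medium (~3x) | Medium | Medium | Mid-levels |
| Zlib | Medium-High (~3.5x) | Slow | High | Cold levels |
| ZSTD | High (~4x) | Fast | Medium | Most levels |
| BZip2 | Very High (~4.5x) | Very Slow | Very High | Archival |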
ZSTD (recommended): Best balance of compression ratio and speed for most workloads.

Basic Configuration

Single Compression Type

Options options;
options.compression = CompressionType::kZSTD;

Per-Level Compression

From advanced_options.h:505-534:
std::vector<CompressionType> compression_per_level;
Optimize by using fast compression for hot levels, strong compression for cold levels:
Options options;
options.compression_per_level = {
  kNoCompression,      // L0 - memtable flushes, uncompressed
  kNoCompression,      // L1 - still hot
  kLZ4Compression,     // L2 - warm data
  kLZ4Compression,     // L3
  kZSTD,               // L4 - colder data
  kZSTD,               // L5
  kZSTD                // L6 - coldest data
};
If the vector is smaller than num_levels, the last value is used for remaining levels.
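The fallback rule can be illustrated without RocksDB itself. The following is a minimal, simplified sketch: the `Codec` enum and the `CompressionForLevel` helper are hypothetical names introduced here, and the real selection logic also takes other options (such as `bottommost_compression`) into account.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for rocksdb::CompressionType, for illustration only.
enum class Codec { kNone, kLZ4, kZSTD };

// Returns the codec a simplified per-level lookup would pick.
// - Empty vector: fall back to the single default codec (options.compression).
// - Level past the end of the vector: reuse the last entry.
inline Codec CompressionForLevel(const std::vector<Codec>& per_level,
                                 Codec default_codec, int level) {
  assert(level >= 0);
  if (per_level.empty()) {
    return default_codec;
  }
  const std::size_t idx = static_cast<std::size_t>(level);
  return idx < per_level.size() ? per_level[idx] : per_level.back();
}
```

With a three-entry vector such as `{kNone, kLZ4, kZSTD}`, a lookup for level 6 in this sketch returns the last entry, which is why short vectors are enough to describe a tiered layout.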

Compression Options

From compression_type.h:169-317, configure algorithm-specific parameters:
struct CompressionOptions {
  int window_bits = -14;         // zlib only
  int level = kDefaultCompressionLevel;
  int strategy = 0;              // zlib only
  uint32_t max_dict_bytes = 0;
  uint32_t zstd_max_train_bytes = 0;
  uint32_t parallel_threads = 1;
  bool enabled = false;
  uint64_t max_dict_buffer_bytes = 0;
  bool use_zstd_dict_trainer = true;
  int max_compressed_bytes_per_kb = 1024 * 7 / 8;
  bool checksum = false;         // ZSTD only
};

Compression Level

From compression_type.h:175-195:
static constexpr int kDefaultCompressionLevel = 32767;

// kDefaultCompressionLevel selects each algorithm's own default:
//   LZ4: 1
//   ZSTD: 3
//   Zlib: 6
options.compression_opts.level = CompressionOptions::kDefaultCompressionLevel;
LZ4 negative levels configure acceleration (speed over ratio):
options.compression_opts.level = -10;  // LZ4 acceleration=10 (faster)
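To make the level semantics concrete, here is a small standalone sketch of how the sentinel and the LZ4 acceleration rule could be resolved. `Algo`, `ResolveLevel`, and `Lz4Acceleration` are hypothetical names, not RocksDB APIs. The per-algorithm defaults are the ones quoted in this section, and the treatment of non-negative LZ4 levels is an assumption.

```cpp
// Hypothetical stand-in for rocksdb::CompressionType, for illustration only.
enum class Algo { kLZ4, kZSTD, kZlib };

// Mirrors the kDefaultCompressionLevel sentinel shown in this section.
constexpr int kDefaultCompressionLevel = 32767;

// Resolves the sentinel to the per-algorithm defaults quoted in this section
// (LZ4: 1, ZSTD: 3, Zlib: 6); any other value is passed through unchanged.
inline int ResolveLevel(Algo algo, int level) {
  if (level != kDefaultCompressionLevel) {
    return level;
  }
  switch (algo) {
    case Algo::kLZ4:  return 1;
    case Algo::kZSTD: return 3;
    case Algo::kZlib: return 6;
  }
  return level;  // not reached for the enumerators above
}

// LZ4 only: a negative level is read as an acceleration factor.
// Assumption: non-negative levels map to a baseline acceleration of 1.
inline int Lz4Acceleration(int level) {
  return level < 0 ? -level : 1;
}
```

Under these assumptions, `Lz4Acceleration(-10)` yields 10, matching the `level = -10` example, while an explicit ZSTD level such as 9 passes through `ResolveLevel` untouched.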

Parallel Compression

From compression_type.h:229-244:
options.compression_opts.parallel_threads = 4;  // Use 4 threads
Parallel compression adds overhead and is not recommended for lightweight algorithms like Snappy or LZ4.
Restrictions:
  • Only works with BlockBasedTable
  • Disabled if using partition filters without decoupling

Dictionary Compression

From compression_type.h:200-220, use dictionaries for better compression on small values:
// Basic dictionary
options.compression_opts.max_dict_bytes = 16 * 1024;  // 16 KB dictionary

// With ZSTD training
options.compression_opts.zstd_max_train_bytes =
    100 * options.compression_opts.max_dict_bytes;  // 100x the dictionary size
options.compression_opts.use_zstd_dict_trainer = true;

Dictionary Training

Two approaches:
1. ZSTD Trainer (better compression, more CPU):
options.compression_opts.max_dict_bytes = 16384;
options.compression_opts.zstd_max_train_bytes = 100 * 16384;  // 100x training data
options.compression_opts.use_zstd_dict_trainer = true;
2. Direct Finalization (faster, less compression):
options.compression_opts.max_dict_bytes = 16384;
options.compression_opts.use_zstd_dict_trainer = false;  // Use ZDICT_finalizeDictionary

Dictionary Buffering

From compression_type.h:255-272:
options.compression_opts.max_dict_buffer_bytes = 1024 * 1024 * 1024;  // 1 GB
Dictionary building requires buffering SST data in memory. Limit with max_dict_buffer_bytes to prevent excessive memory usage.
Memory is charged to block cache when available:
// If block cache insertion fails (MemoryLimit), dictionary is finalized
// with whatever data has been collected so far
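The buffering behaviour can be pictured with a small standalone model. This is only a sketch: `DictSampleBuffer` is a hypothetical class, not part of RocksDB, and treating a limit of 0 as "no limit" is an assumption made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collects uncompressed blocks as dictionary samples
// until a byte limit is reached. A limit of 0 is treated as "no limit"
// (an assumption in this sketch).
class DictSampleBuffer {
 public:
  explicit DictSampleBuffer(std::uint64_t max_buffer_bytes)
      : max_buffer_bytes_(max_buffer_bytes) {}

  // Buffers the block and returns true while there is still room.
  // Returns false once the limit is reached or exceeded, signalling that the
  // caller should stop buffering and build the dictionary from what it has.
  bool Add(std::string block) {
    buffered_bytes_ += block.size();
    samples_.push_back(std::move(block));
    return max_buffer_bytes_ == 0 || buffered_bytes_ < max_buffer_bytes_;
  }

  std::uint64_t buffered_bytes() const { return buffered_bytes_; }
  std::size_t sample_count() const { return samples_.size(); }

 private:
  std::uint64_t max_buffer_bytes_;
  std::uint64_t buffered_bytes_ = 0;
  std::vector<std::string> samples_;
};
```

The point of the model is the early exit: once `Add` returns false, the samples gathered so far are all the dictionary builder gets, so a small limit trades dictionary quality for bounded memory.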

Compression Ratio Threshold

From compression_type.h:288-299:
// Only compress if achieving at least 1.143:1 ratio (default)
int max_compressed_bytes_per_kb = 1024 * 7 / 8;  // 896 bytes per 1KB

// Require at least 2:1 compression
options.compression_opts.SetMinRatio(2.0);
Blocks that don’t meet the threshold are stored uncompressed.
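The arithmetic behind the threshold can be worked through in a few lines. The sketch below is standalone and uses hypothetical helper names. The rounding in `MaxCompressedBytesPerKb` is assumed to match what `SetMinRatio` stores, and the acceptance rule in `AcceptCompressedBlock` is likewise an assumption about how the budget is applied.

```cpp
#include <cassert>
#include <cstdint>

// Converts a minimum ratio (e.g. 2.0 for 2:1) into a per-KB byte budget,
// rounded to the nearest integer. Assumed to mirror SetMinRatio.
inline int MaxCompressedBytesPerKb(double min_ratio) {
  assert(min_ratio > 0.0);
  return static_cast<int>(1024.0 / min_ratio + 0.5);
}

// Assumed acceptance rule: keep the compressed block only if it fits within
// the budget scaled to the block's uncompressed size.
inline bool AcceptCompressedBlock(std::uint64_t raw_bytes,
                                  std::uint64_t compressed_bytes,
                                  int max_compressed_bytes_per_kb) {
  const std::uint64_t budget =
      raw_bytes * static_cast<std::uint64_t>(max_compressed_bytes_per_kb) /
      1024;
  return compressed_bytes <= budget;
}
```

For a 4096-byte block compressed to 3000 bytes, the default budget of 896 bytes per KB allows up to 3584 bytes, so the block would be kept. With a 2:1 requirement the budget drops to 2048 bytes and the same block would be stored uncompressed.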

ZSTD-Specific Options

Checksum

From compression_type.h:301-307:
options.compression_opts.checksum = true;  // 32-bit checksum per frame
Adds a checksum verified during decompression for additional data integrity.

ZSTD Level Guidelines

// Fast compression (L0-L2)
options.compression_opts.level = 1;   // ~3x ratio, very fast

// Balanced (L3-L4)
options.compression_opts.level = 3;   // Default, ~4x ratio, fast

// High compression (L5+)
options.compression_opts.level = 9;   // ~5x ratio, slower

// Maximum (archival)
options.compression_opts.level = 19;  // ~6x ratio, very slow

Bottommost Level Compression

Apply stronger compression to the bottommost level:
Options options;
options.compression = kLZ4Compression;  // Default for all levels

// Stronger compression for bottommost
options.bottommost_compression = kZSTD;
options.bottommost_compression_opts.level = 9;
options.bottommost_compression_opts.enabled = true;
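The role of the `enabled` flag is easy to miss, so here is a standalone sketch of how the bottommost settings could be resolved. All names (`Codec`, `LevelOpts`, `Effective`, `BottommostSettings`) are hypothetical. The assumed rule is that the bottommost codec applies whenever it is set, while the bottommost options only apply when `enabled` is true and otherwise fall back to the regular compression options.

```cpp
// Hypothetical stand-ins for the RocksDB types, for illustration only.
enum class Codec { kLZ4, kZSTD, kDisabled };

struct LevelOpts {
  int level = 32767;     // sentinel for "algorithm default"
  bool enabled = false;  // only consulted for the bottommost options
};

struct Effective {
  Codec codec;
  int level;
};

// Settings assumed to apply to files written to the bottommost level.
inline Effective BottommostSettings(Codec default_codec,
                                    const LevelOpts& default_opts,
                                    Codec bottommost_codec,
                                    const LevelOpts& bottommost_opts) {
  Effective e;
  // Codec: use the bottommost override unless it is left disabled.
  e.codec = bottommost_codec != Codec::kDisabled ? bottommost_codec
                                                 : default_codec;
  // Options: the bottommost options count only when explicitly enabled.
  e.level = bottommost_opts.enabled ? bottommost_opts.level
                                    : default_opts.level;
  return e;
}
```

In this model, forgetting `enabled = true` keeps ZSTD as the bottommost codec but silently uses the level from `compression_opts`, which is why the example sets the flag explicitly.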

Compression Statistics

From statistics.h:251-283, monitor compression effectiveness:
// Compression operations
NUMBER_BLOCK_COMPRESSED
NUMBER_BLOCK_DECOMPRESSED

// Compression efficiency
BYTES_COMPRESSED_FROM      // Input bytes
BYTES_COMPRESSED_TO        // Output bytes (compressed)
BYTES_COMPRESSION_BYPASSED // Stored uncompressed (kNoCompression)
BYTES_COMPRESSION_REJECTED // Didn't meet ratio threshold

// Block counts
NUMBER_BLOCK_COMPRESSION_BYPASSED
NUMBER_BLOCK_COMPRESSION_REJECTED

// Decompression
BYTES_DECOMPRESSED_FROM    // Compressed bytes read
BYTES_DECOMPRESSED_TO      // Uncompressed output
Calculate compression ratio:
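// Assumes options.statistics was set (e.g. with CreateDBStatistics())
// before the DB was opened; otherwise the pointer is null.
auto stats = options.statistics;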
uint64_t from = stats->getTickerCount(BYTES_COMPRESSED_FROM);
uint64_t to = stats->getTickerCount(BYTES_COMPRESSED_TO);

if (to > 0) {
  double ratio = static_cast<double>(from) / to;
  printf("Overall compression ratio: %.2fx\n", ratio);
}
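The rejected-bytes ticker can be folded into the same calculation. The helper below is a standalone sketch that takes plain counter values rather than a statistics object. `CompressionSummary` and `Summarize` are hypothetical names, and it assumes that `BYTES_COMPRESSED_FROM` counts only blocks whose compressed output was kept, so that attempted input is the sum of that ticker and `BYTES_COMPRESSION_REJECTED`.

```cpp
#include <cstdint>

// Plain-value summary derived from the ticker counts listed in this section.
struct CompressionSummary {
  double ratio;              // BYTES_COMPRESSED_FROM / BYTES_COMPRESSED_TO
  double rejected_fraction;  // share of attempted input bytes that were rejected
};

inline CompressionSummary Summarize(std::uint64_t compressed_from,
                                    std::uint64_t compressed_to,
                                    std::uint64_t rejected) {
  CompressionSummary s{0.0, 0.0};
  if (compressed_to > 0) {
    s.ratio = static_cast<double>(compressed_from) /
              static_cast<double>(compressed_to);
  }
  // Assumption: attempted input = accepted input + rejected input.
  const std::uint64_t attempted = compressed_from + rejected;
  if (attempted > 0) {
    s.rejected_fraction =
        static_cast<double>(rejected) / static_cast<double>(attempted);
  }
  return s;
}
```

A high `rejected_fraction` alongside a healthy `ratio` suggests that much of the data is incompressible at the current threshold, which is a different problem from a weak algorithm or level.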

Advanced Patterns

Tiered Compression Strategy

ColumnFamilyOptions cf_opts;

// L0-L1: No compression (hot data, frequent rewrites)
cf_opts.compression_per_level.push_back(kNoCompression);
cf_opts.compression_per_level.push_back(kNoCompression);

// L2-L3: Fast compression (warm data)
cf_opts.compression_per_level.push_back(kLZ4Compression);
cf_opts.compression_per_level.push_back(kLZ4Compression);

// L4+: Strong compression (cold data)
for (int i = 4; i < 7; i++) {
  cf_opts.compression_per_level.push_back(kZSTD);
}

// Bottommost: Maximum compression (coldest data)
cf_opts.bottommost_compression = kZSTD;
cf_opts.bottommost_compression_opts.enabled = true;
cf_opts.bottommost_compression_opts.level = 9;
cf_opts.bottommost_compression_opts.max_dict_bytes = 16384;

Sample-Based Compression Testing

From advanced_options.h:943-947:
// Test compression on 1 in 100 blocks
options.sample_for_compression = 100;

// Check statistics to evaluate algorithms
// Data stored uncompressed unless compression is also enabled

Performance Tuning

Write-Heavy Workloads

// Minimize compression overhead on writes
options.compression = kLZ4Compression;  // Fast compression
options.compression_opts.level = 1;

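// Or disable for L0. Assign the whole vector: it is empty by default,
// so indexing compression_per_level[0] directly would be out of bounds.
// The last entry (LZ4) is used for the remaining levels.
options.compression_per_level = {kNoCompression, kLZ4Compression};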

Read-Heavy Workloads

// Maximize compression for storage efficiency
options.compression = kZSTD;
options.compression_opts.level = 3;  // Balanced

// Enable dictionary for small values
options.compression_opts.max_dict_bytes = 16384;

Balanced Workload

Options options;
options.compression = kZSTD;
options.compression_opts.level = 3;

// Tiered compression
options.compression_per_level = {
  kNoCompression,  // L0
  kLZ4Compression, // L1
  kZSTD,           // L2+
};

options.bottommost_compression = kZSTD;
options.bottommost_compression_opts.level = 6;  // Stronger for coldest data
options.bottommost_compression_opts.enabled = true;

Troubleshooting

Poor Compression Ratio

  1. Check BYTES_COMPRESSION_REJECTED: Too many blocks failing threshold
    // Lower threshold to compress more blocks
    options.compression_opts.SetMinRatio(1.1);
    
  2. Enable dictionaries: Helps with small, similar values
    options.compression_opts.max_dict_bytes = 16384;
    
  3. Increase compression level: More CPU for better ratio
    options.compression_opts.level = 6;  // ZSTD level 6
    

High Write CPU

  1. Use faster algorithms: Switch from ZSTD to LZ4
    options.compression = kLZ4Compression;
    
  2. Disable parallel compression: May add overhead
    options.compression_opts.parallel_threads = 1;
    
  3. Lower compression level:
    options.compression_opts.level = 1;  // Faster
    

Slow Reads

  1. Profile decompression time: Check DECOMPRESSION_TIMES_NANOS histogram
  2. Consider lighter compression: Trade space for speed
    options.compression = kLZ4Compression;  // Faster decompression
    
  3. Increase block cache: Cache uncompressed blocks

Best Practices

Start with ZSTD level 3: Good default for most workloads. Tune based on profiling.
  1. Use per-level compression: Match compression to data temperature
  2. Monitor statistics: Track compression ratio and CPU time
  3. Test with your data: Compression effectiveness varies by data characteristics
  4. Enable dictionaries for small values: Especially effective for similar records
  5. Profile before optimizing: Measure CPU overhead vs. space savings
Options options;
options.compression = kZSTD;
options.compression_opts.level = 3;

// Tiered approach
options.compression_per_level = {
  kNoCompression,  // L0 - skip compression overhead
  kLZ4Compression, // L1 - fast compression
  kZSTD,           // L2+ - balanced
};

// Stronger compression for bottommost
options.bottommost_compression = kZSTD;
options.bottommost_compression_opts.enabled = true;
options.bottommost_compression_opts.level = 6;
