Configuration Python API
ArchiveBox provides a comprehensive configuration system that can be accessed and modified programmatically. Configuration is organized into several sections and can be sourced from multiple locations.
Configuration Sources
Configuration values are loaded from multiple sources in this order (later sources override earlier ones):
- Built-in defaults - Hard-coded defaults
- Environment variables - ARCHIVEBOX_* prefixed
- Config file - ArchiveBox.conf in data directory
- Machine config - Per-machine overrides in database
- Crawl config - Per-crawl overrides in database
- Snapshot config - Per-snapshot overrides in database
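The precedence order above can be sketched with plain dictionaries. This is a minimal illustration using only the standard library, not ArchiveBox's own loader: `merge_config`, `env_overrides`, and the sample keys are hypothetical names chosen for the example, and the `ARCHIVEBOX_` prefix is taken from the list above.

```python
from collections import ChainMap

ENV_PREFIX = "ARCHIVEBOX_"  # prefix as described in the list above


def env_overrides(environ, prefix=ENV_PREFIX):
    """Pick out prefixed environment variables and strip the prefix."""
    return {k[len(prefix):]: v for k, v in environ.items() if k.startswith(prefix)}


def merge_config(defaults, environ, file_config, machine=None, crawl=None, snapshot=None):
    """Merge layers so that later sources override earlier ones.

    ChainMap looks keys up in the FIRST mapping that contains them,
    so the highest-priority layer (snapshot) is listed first.
    """
    layers = [
        snapshot or {},
        crawl or {},
        machine or {},
        file_config,
        env_overrides(environ),
        defaults,
    ]
    return dict(ChainMap(*layers))


# Hypothetical sample values, for illustration only
defaults = {"TIMEOUT": "60", "RESOLUTION": "1440,2000"}
environ = {"ARCHIVEBOX_TIMEOUT": "120", "HOME": "/root"}
file_config = {"TIMEOUT": "90"}

merged = merge_config(defaults, environ, file_config, snapshot={"RESOLUTION": "1920,1080"})
print(merged["TIMEOUT"])     # 90 (config file overrides the environment variable)
print(merged["RESOLUTION"])  # 1920,1080 (snapshot override wins)
```

Listing the layers from most to least specific keeps the lookup rule in one place, so adding another source would only mean adding one more entry to `layers`.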
Configuration Sections
Configuration is organized into logical sections:
Constants (Read-Only)
Immutable constants for paths and versions.
Shell Configuration
Settings for CLI and shell behavior.
Storage Configuration
Settings for storage, paths, and file handling.
General Configuration
General application settings.
Server Configuration
Web server and UI settings.
Archiving Configuration
Settings that control archiving behavior.
Search Backend Configuration
Settings for search functionality.
Getting All Configuration
Retrieve all configuration sections as a dictionary.
Modifying Configuration Programmatically
Environment Variables
Environment variables are the most common way to override configuration.
Config File
Modify ArchiveBox.conf in your data directory.
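One way to edit the file from Python is with the standard library's `configparser`, assuming ArchiveBox.conf is an INI-style file. The helper `set_config_value`, the section name `ARCHIVING_CONFIG`, and the `TIMEOUT` key are assumptions made for this sketch. The demo writes to a temporary directory rather than a real data directory.

```python
import configparser
import os
import tempfile


def set_config_value(conf_path, section, key, value):
    """Set one key in an INI-style config file, creating the section if needed."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # preserve key case (configparser lower-cases keys by default)
    parser.read(conf_path)    # a missing file is silently skipped
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, key, str(value))
    with open(conf_path, "w") as f:
        parser.write(f)


# Demo against a throwaway file; point conf_path at your data directory in real use
with tempfile.TemporaryDirectory() as data_dir:
    conf_path = os.path.join(data_dir, "ArchiveBox.conf")
    set_config_value(conf_path, "ARCHIVING_CONFIG", "TIMEOUT", 120)  # hypothetical section/key
    with open(conf_path) as f:
        written_text = f.read()

print(written_text)
```

Note that `configparser` does not preserve comments when it rewrites a file, so a hand-annotated config would lose its comments with this approach.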
Per-Snapshot Configuration
Override configuration for specific snapshots.
Per-Crawl Configuration
Override configuration for all snapshots in a crawl.
Configuration Merging
Get merged configuration from multiple sources.
Validation
Some configuration sections have validation.
Examples
Change Timeout for All Future Archives
Create High-Resolution Screenshot Archive
Check Binary Availability
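A rough, ArchiveBox-independent way to check whether external tools are installed is `shutil.which` from the standard library. This sketch does not use ArchiveBox's own binary detection. The list of binary names is an assumption for illustration, and `check_binaries` is a hypothetical helper.

```python
import shutil


def check_binaries(names):
    """Map each binary name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# Hypothetical list of tools; adjust to the extractors you rely on
status = check_binaries(["wget", "curl", "git", "definitely-not-installed-xyz"])
for name, path in status.items():
    print(f"{name}: {path or 'not found'}")
```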
Filter URLs by Configuration
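URL filtering by configured patterns might look like the following, assuming the configuration exposes an allowlist and a denylist as regular-expression strings. The function `is_allowed` and the sample patterns are hypothetical, and the real option names and matching rules may differ.

```python
import re


def is_allowed(url, allowlist=None, denylist=None):
    """Return True if url passes the denylist and (if set) matches the allowlist."""
    if denylist and re.search(denylist, url, re.IGNORECASE):
        return False
    if allowlist and not re.search(allowlist, url, re.IGNORECASE):
        return False
    return True


# Hypothetical patterns, for illustration only
denylist = r"\.(css|js|otf|ttf|woff|woff2)(\?.*)?$"
allowlist = r"^https?://example\.com/"

urls = [
    "https://example.com/page",
    "https://example.com/style.css",
    "https://ads.example.net/banner",
]
kept = [u for u in urls if is_allowed(u, allowlist, denylist)]
print(kept)  # ['https://example.com/page']
```

Checking the denylist first means a URL that matches both patterns is rejected, which is the more conservative choice.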
Export Configuration
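Once the configuration is available as a dictionary of sections, it can be serialized with the standard `json` module. The sketch below assumes a nested `{section: {key: value}}` shape. The section names, keys, and values are made up for the example, and `export_config` is a hypothetical helper.

```python
import json
from pathlib import PurePosixPath


def export_config(config):
    """Serialize a {section: {key: value}} mapping to a JSON string.

    default=str converts values json cannot encode natively (such as paths).
    """
    return json.dumps(config, indent=2, sort_keys=True, default=str)


# Hypothetical configuration contents, for illustration only
config = {
    "SERVER_CONFIG": {"BIND_ADDR": "127.0.0.1:8000"},
    "STORAGE_CONFIG": {"OUTPUT_DIR": PurePosixPath("/data/archive")},
}

exported = export_config(config)
print(exported)
```

If the real configuration contains secrets, consider removing them before writing the export to disk.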
See Also
- Python API Overview - Basic Python API usage
- Models Reference - Django models documentation
- Extractors API - Using and creating extractors
- Configuration Guide - User-facing configuration guide