Introduction to ArchiveBox

What is ArchiveBox?

ArchiveBox is an open-source, self-hosted app that lets organizations and individuals preserve both public and private web content in a variety of formats while retaining control over their data. Without active preservation effort, everything on the internet eventually disappears or degrades.

We aim to make your archived data immediately useful and to keep it in formats that other programs can read directly. ArchiveBox saves standard HTML, PNG, PDF, TXT, JSON, WARC, and SQLite files, all widely supported formats chosen to stay readable for decades to come. It also provides a CLI, a REST API, and webhooks so you can integrate it with other services.

You can use it to save copies of bookmarks, preserve evidence for legal cases, back up photos from Facebook, Instagram, or Flickr, save media from YouTube, SoundCloud, and similar sites, archive research papers, and more.

Key Features

Free & Open Source

Own your data & maintain your privacy by self-hosting. Licensed under the MIT License.

Powerful CLI & Web UI

Comprehensive command-line interface plus a self-hosted web UI for managing your archive.

Multiple Archive Formats

Saves HTML, PDF, screenshots, videos, git repos, WARC files, and more.

Import from Anywhere

Import URLs from bookmarks, RSS feeds, browser history, Pocket, Pinboard, and more.

Scheduled Archiving

Set up automated imports from RSS feeds and other sources on a schedule.
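As a rough sketch of how a scheduled feed import might be set up, the Python snippet below builds an `archivebox schedule` command using the `--every` and `--depth` options that ArchiveBox's CLI offers for recurring imports. The helper names (`schedule_feed_argv`, `register_schedule`) and the example feed URL are hypothetical, and accepted option values may differ between ArchiveBox versions, so check `archivebox schedule --help` on your install before relying on them.

```python
import shlex
import subprocess


def schedule_feed_argv(feed_url: str, every: str = "day", depth: int = 1) -> list[str]:
    """Build an `archivebox schedule` command line for a recurring import.

    every: how often to re-import the source (e.g. "day" or "week").
    depth: 0 archives just the given URL; 1 also archives the pages it
    links to (e.g. the individual entries of an RSS feed).
    """
    return ["archivebox", "schedule", f"--every={every}", f"--depth={depth}", feed_url]


def register_schedule(argv: list[str]) -> None:
    """Run the command.

    Assumes `archivebox` is on PATH and the current directory is an
    initialized ArchiveBox collection.
    """
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    argv = schedule_feed_argv("https://example.com/feed.xml")  # hypothetical feed URL
    print(shlex.join(argv))  # preview the command before registering it
```

Printing the command first keeps the sketch safe to run without ArchiveBox installed; calling `register_schedule(argv)` is the step that would actually register the job.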

Archive.org Integration

Optionally saves all pages to archive.org for redundancy (can be disabled).

Content Extraction

Automatically extracts media (yt-dlp), articles (readability), code (git), and more.

Long-term Preservation

Uses standard, durable formats like HTML, JSON, PDF, PNG, MP4, TXT, and WARC.

What Does ArchiveBox Save?

For each web page you archive, ArchiveBox creates a snapshot and preserves its content in multiple redundant formats:

HTML & DOM: Original HTML+CSS+JS, SingleFile snapshot, DOM dump
Visual: Screenshots (PNG), PDF printouts
Metadata: Title, favicon, headers, response data
Media: Audio/video files via yt-dlp, including subtitles and metadata
Articles: Extracted article text using Readability & Mercury
Source Code: Git repository clones from GitHub, GitLab, Bitbucket
Archives: WARC files via wget, Archive.org permalinks
And more: See Output Formats for the full list

Use Cases

Journalists

Crawl websites during research, preserve cited pages for fact-checking and review.

Lawyers

Collect and preserve evidence, detect changes over time, organize with tags for review.

Researchers

Analyze social media trends, gather LLM training data, build crawling pipelines.

Individuals

Save bookmarks, preserve portfolio content, create legacy archives and memoirs.

How to Import URLs

ArchiveBox supports importing URLs from many sources:

Browser Extension: Real-time archiving from Chrome/Chromium/Firefox
Browser Exports: Import bookmarks or history from any browser
RSS Feeds: Automatically archive new posts from feeds
Bookmark Services: Import from Pocket, Pinboard, Instapaper, etc.
Social Media: Save posts from Reddit, Twitter bookmarks, and more
Plain Text: Any text file containing URLs (CSV, JSON, TXT, Markdown, etc.)
Manual Entry: Add individual URLs via CLI or Web UI
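Because a plain-text import only needs the URLs that appear in a file, a simplified version of that step can be sketched in Python: pull every `http(s)://` link out of arbitrary text, de-duplicate the results, and pipe them to `archivebox add` on stdin. The regular expression and the helper names `extract_urls` and `add_to_archive` are hypothetical and are not ArchiveBox's actual parser, which handles many more formats and edge cases.

```python
import re
import subprocess

# Simplified pattern (an assumption, not ArchiveBox's parser): an http(s) URL runs
# until whitespace, a quote, a bracket, or a comma, so links embedded in Markdown,
# JSON, HTML attributes, and CSV cells are split off cleanly. URLs that legitimately
# contain these characters will be truncated.
URL_PATTERN = re.compile(r"https?://[^\s<>\"'(),\[\]]+")
TRAILING_PUNCTUATION = ".;:!?"


def extract_urls(text: str) -> list[str]:
    """Return the unique http(s) URLs in text, in first-seen order."""
    seen: dict[str, None] = {}  # dicts keep insertion order, so this dedupes stably
    for match in URL_PATTERN.finditer(text):
        # Strip sentence punctuation that often follows a URL in prose.
        url = match.group(0).rstrip(TRAILING_PUNCTUATION)
        seen.setdefault(url, None)
    return list(seen)


def add_to_archive(urls: list[str]) -> None:
    """Pipe URLs, one per line, to `archivebox add` via stdin.

    Assumes `archivebox` is on PATH and the working directory is an
    initialized ArchiveBox collection.
    """
    subprocess.run(["archivebox", "add"], input="\n".join(urls), text=True, check=True)


if __name__ == "__main__":
    sample = 'Read https://example.com/post. See [docs](https://docs.example.org/a), {"url": "https://example.com/post"}'
    print(extract_urls(sample))
```

For example, `add_to_archive(extract_urls(open("notes.md").read()))` would queue every link in a hypothetical `notes.md`. In practice ArchiveBox can also read such a file directly on stdin (e.g. `archivebox add < notes.md`) and do the parsing itself.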

Get Started

Quickstart Guide

Get ArchiveBox running in minutes with Docker Compose

Installation Guide

Detailed installation instructions for all platforms

Community & Support

ArchiveBox is free for everyone to self-host. We also provide professional support, security review, and custom integrations for NGOs, governments, and organizations.

Demo: Try it out at demo.archivebox.io
Documentation: Full wiki at github.com/ArchiveBox/ArchiveBox/wiki
GitHub: Star us at github.com/ArchiveBox/ArchiveBox
Community: Join the discussion in our Web Archiving Community