Overview
The Crawls API allows you to manage crawl sessions. A crawl represents a group of snapshots created from a single import or archiving command (e.g., `archivebox add`).
Base URL: /api/v1/crawls/
Crawl Schema
A crawl object contains:

Field Descriptions
| Field | Type | Description |
|---|---|---|
| `id` | UUID | Unique identifier for the crawl |
| `created_at` | datetime | When the crawl was created |
| `modified_at` | datetime | When the crawl was last modified |
| `created_by_id` | string | ID of the user who created the crawl |
| `created_by_username` | string | Username of the user who created the crawl |
| `status` | string | Current status (see Status Values) |
| `retry_at` | datetime? | When to retry the crawl (null if not scheduled) |
| `urls` | string | Newline-separated list of URLs |
| `extractor` | string | Parser used (e.g., "auto", "wget_log", "rss") |
| `max_depth` | int | Recursion depth for crawling |
| `tags_str` | string | Comma-separated tag names |
| `config` | object | Configuration overrides for this crawl |
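To make the schema concrete, here is a sketch of a crawl object as it might look after parsing the JSON response. Every value below is made up for illustration; only the field names come from the schema above.

```python
# Illustrative crawl object; field names follow the schema above,
# but all values are invented for demonstration.
sample_crawl = {
    "id": "01234567-89ab-cdef-0123-456789abcdef",
    "created_at": "2024-01-01T00:00:00Z",
    "modified_at": "2024-01-01T00:05:00Z",
    "created_by_id": "1",
    "created_by_username": "admin",
    "status": "queued",
    "retry_at": None,       # null while no retry is scheduled
    "urls": "https://example.com\nhttps://example.org",
    "extractor": "auto",
    "max_depth": 1,
    "tags_str": "news,tech",
    "config": {},
}

# urls is newline-separated, so splitting it yields one URL per snapshot
url_list = sample_crawl["urls"].split("\n")
```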
Status Values
- `queued` - Waiting to be processed
- `started` - Currently crawling
- `succeeded` - Successfully completed
- `failed` - Crawl failed
- `sealed` - Cancelled/frozen (no further processing)
List Crawls
Get all crawls in the system.

Response
Returns an array of crawl objects.

Get Single Crawl
Retrieve a specific crawl by ID.

Path Parameters
| Parameter | Description |
|---|---|
| `crawl_id` | Crawl UUID (full or prefix match) |
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `with_snapshots` | bool | false | Include snapshots array |
| `with_archiveresults` | bool | false | Include archiveresults in snapshots |
| `as_rss` | bool | false | Return snapshots as RSS XML feed |
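The query parameters above can be combined on the single-crawl URL. The sketch below builds such a URL with the standard library; the base URL and the trailing-slash convention are assumptions about your local server, and the actual request is left inside a helper so the snippet runs without a server.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000/api/v1/crawls/"  # assumed local server

def crawl_url(crawl_id: str, **params: bool) -> str:
    """Build the URL for a single crawl, e.g. with_snapshots=True.

    Booleans are serialized as lowercase "true"/"false" (an assumption)."""
    query = urlencode({k: str(v).lower() for k, v in params.items()})
    return f"{BASE_URL}{crawl_id}/" + (f"?{query}" if query else "")

def fetch_crawl(crawl_id: str, **params: bool) -> bytes:
    """Perform the GET request (requires a running server)."""
    with urlopen(crawl_url(crawl_id, **params)) as resp:
        return resp.read()

# URL including the snapshots array; pass as_rss=True instead for RSS output.
url = crawl_url("01234567", with_snapshots=True)
```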
Response
Returns a single crawl object.

RSS Feed Export
Get snapshots from a crawl as an RSS feed by setting the `as_rss=true` query parameter.

Update Crawl
Update crawl status or retry time.

Request Body
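A sketch of what the request body might look like, assuming the endpoint accepts the two updatable schema fields, `status` and `retry_at` (field names taken from the schema; acceptance of a partial body is an assumption):

```python
import json

# Sketch of an update payload; `status` must be one of the documented
# status values, and `retry_at` is null or an ISO 8601 datetime string.
update_body = {
    "status": "failed",
    "retry_at": "2024-01-02T00:00:00Z",  # schedule a retry
}
payload = json.dumps(update_body)
```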
Behavior
When setting `status` to `sealed`:

- The crawl's `retry_at` is set to `null`
- All queued or started snapshots in this crawl are also sealed
- Their `retry_at` fields are also set to `null`
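The sealing behavior above can be triggered with a single update request. This sketch builds (but does not send) such a request; the HTTP method is an assumption, so check whether your server expects PUT or PATCH:

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000/api/v1/crawls/"  # assumed local server

def seal_crawl(crawl_id: str) -> Request:
    """Build a request that seals a crawl (method is an assumption)."""
    body = json.dumps({"status": "sealed"}).encode()
    return Request(
        f"{BASE_URL}{crawl_id}/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )

req = seal_crawl("01234567")
# urlopen(req) would send it; the server then also seals the crawl's
# queued/started snapshots and nulls their retry_at fields.
```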
Valid Status Transitions
You can update status to any of these values:

- `queued`
- `started`
- `succeeded`
- `failed`
- `sealed`
Response
Returns the updated crawl object.

Common Workflows
Cancel a Running Crawl
Stop a crawl and all its associated snapshots.

View All Snapshots in a Crawl
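To see every snapshot (and each snapshot's archiveresults) for one crawl, combine the two documented query parameters on the single-crawl URL. The base URL and crawl ID below are placeholders:

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000/api/v1/crawls/"  # assumed local server

# Ask for the crawl together with its snapshots array and, inside each
# snapshot, its archiveresults.
params = urlencode({"with_snapshots": "true", "with_archiveresults": "true"})
url = f"{BASE_URL}01234567/?{params}"
```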
Monitor Recent Crawls
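The list endpoint is not documented to support filtering, so one approach is to filter client-side on the returned array, keeping crawls whose status shows they are still being processed. A sketch with made-up data:

```python
# Statuses that mean a crawl is still being processed (from Status Values).
ACTIVE_STATUSES = {"queued", "started"}

def active_crawls(crawls: list[dict]) -> list[dict]:
    """Return crawls still being processed, newest first."""
    active = [c for c in crawls if c.get("status") in ACTIVE_STATUSES]
    return sorted(active, key=lambda c: c["created_at"], reverse=True)

# Example with invented data:
crawls = [
    {"id": "a", "status": "succeeded", "created_at": "2024-01-01T00:00:00Z"},
    {"id": "b", "status": "started",   "created_at": "2024-01-02T00:00:00Z"},
]
recent = active_crawls(crawls)
```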
Export Crawl as RSS Feed
Useful for sharing or re-importing.

Understanding Crawls vs Snapshots
Relationship:

- Crawl: Represents a single import operation (e.g., one `archivebox add` command)
- Snapshot: Individual URL within a crawl

For example, a single import with a depth of 1 creates:

- 1 Crawl with `max_depth=1`
- Multiple Snapshots (example.com + any linked pages)
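The one-to-many relationship can be pictured as a nested structure, roughly what a fetch with `with_snapshots=true` would return. All values here are invented:

```python
# Made-up data illustrating the relationship: one crawl per import,
# one snapshot per URL reached within max_depth.
crawl = {
    "id": "crawl-1",
    "max_depth": 1,
    "snapshots": [
        {"id": "snap-1", "url": "https://example.com"},
        {"id": "snap-2", "url": "https://example.com/about"},
    ],
}
```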
Crawl Configuration
The `config` field stores configuration overrides that were active when the crawl was created:
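For example, a `config` object might hold overrides like these (the specific option names are illustrative; consult your ArchiveBox configuration for the options your installation supports):

```python
# Illustrative config overrides captured at crawl creation time.
crawl_config = {
    "TIMEOUT": 120,          # per-URL timeout in seconds
    "ONLY_NEW": True,        # skip URLs that were already archived
    "SAVE_SCREENSHOT": False # disable one extractor for this crawl
}
```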
Error Responses
404 Not Found
400 Bad Request
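A client can translate the two documented error statuses into clearer messages. The mapping below is a sketch: the assumption is that 404 means the crawl ID matched nothing and 400 means the request itself was invalid.

```python
from urllib.error import HTTPError
from urllib.request import urlopen

def explain_error(code: int) -> str:
    """Map the documented error statuses to human-readable messages."""
    if code == 404:
        return "crawl not found (no crawl matches the given ID or prefix)"
    if code == 400:
        return "bad request (e.g. an invalid status or field value)"
    return f"unexpected HTTP {code}"

def fetch(url: str) -> bytes:
    """GET a URL, attaching an explanation to the documented errors."""
    try:
        with urlopen(url) as resp:
            return resp.read()
    except HTTPError as err:
        raise RuntimeError(explain_error(err.code)) from err
```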
Related Endpoints
Snapshots API
View snapshots within a crawl
Tags API
Crawls can have tags applied to all snapshots