Webhooks

Webhooks allow you to receive real-time notifications about the progress and completion of asynchronous operations like crawls and batch scrapes. Instead of polling for status, Firecrawl will send HTTP POST requests to your specified endpoint.

Webhook Events

Firecrawl sends webhooks for the following events:

Event	Description	Triggered For
`crawl.started` / `batch_scrape.started`	Job has started processing	Crawl, Batch Scrape
`crawl.page` / `batch_scrape.page`	A single page has been scraped	Crawl, Batch Scrape
`crawl.completed` / `batch_scrape.completed`	Job completed successfully	Crawl, Batch Scrape
`crawl.failed` / `batch_scrape.failed`	Job failed	Crawl, Batch Scrape

Setting Up Webhooks

For Crawls

from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

job = app.start_crawl(
    url="https://docs.firecrawl.dev",
    limit=100,
    webhook={
        "url": "https://your-domain.com/webhook",
        "headers": {
            "Authorization": "Bearer your-secret-token"
        },
        "metadata": {
            "user_id": "12345",
            "project": "documentation"
        },
        "events": ["completed", "failed"]  # Optional: filter events
    }
)

print(f"Crawl started: {job.id}")

For Batch Scrapes

from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

job = app.start_batch_scrape(
    urls=[
        "https://firecrawl.dev",
        "https://docs.firecrawl.dev",
        "https://firecrawl.dev/pricing"
    ],
    formats=["markdown"],
    webhook={
        "url": "https://your-domain.com/webhook",
        "headers": {
            "Authorization": "Bearer your-secret-token"
        },
        "events": ["page", "completed"]
    }
)

print(f"Batch scrape started: {job.id}")

Webhook Configuration

URL (Required)

The HTTPS endpoint where Firecrawl will send webhook notifications.

webhook={"url": "https://your-domain.com/webhook"}

Webhook URLs must use HTTPS. HTTP endpoints are not supported for security reasons.
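
Because only HTTPS endpoints are accepted, it can be useful to check the URL on your side before starting a job. The following is a minimal sketch; `is_valid_webhook_url` is a hypothetical helper and not part of the Firecrawl SDK. It only checks the scheme and that a host is present, not whether the endpoint is reachable.

```python
from urllib.parse import urlparse


def is_valid_webhook_url(url: str) -> bool:
    """Return True if the URL uses HTTPS and names a host (hypothetical helper)."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and bool(parsed.netloc)


print(is_valid_webhook_url("https://your-domain.com/webhook"))  # True
print(is_valid_webhook_url("http://your-domain.com/webhook"))   # False
```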

Headers (Optional)

Custom headers to include with webhook requests. Commonly used for authentication.

webhook={
    "url": "https://your-domain.com/webhook",
    "headers": {
        "Authorization": "Bearer your-secret-token",
        "X-Custom-Header": "custom-value"
    }
}

Use headers to implement webhook authentication and verify that requests are coming from Firecrawl.

Metadata (Optional)

Custom metadata that will be included in all webhook payloads for the job. Useful for tracking context.

webhook={
    "url": "https://your-domain.com/webhook",
    "metadata": {
        "user_id": "12345",
        "project": "documentation",
        "environment": "production"
    }
}

Events (Optional)

Filter which events trigger webhooks. By default, all events are sent.

webhook={
    "url": "https://your-domain.com/webhook",
    "events": ["completed", "failed"]  # Only send completion/failure events
}

Available events:

started - Job has started
page - A page has been scraped (can be high volume)
completed - Job completed successfully
failed - Job failed

If you only need to know when a job finishes, filter to ["completed", "failed"] to reduce webhook volume.

Webhook Payload

Firecrawl sends a POST request to your webhook URL with a JSON payload. The structure depends on the event type.

Started Event

{
  "event": "crawl.started",
  "id": "123-456-789",
  "metadata": {
    "user_id": "12345",
    "project": "documentation"
  },
  "timestamp": "2024-03-15T10:30:00Z"
}

Page Event

Sent for each page scraped. Contains the same data as the /scrape endpoint response.

{
  "event": "crawl.page",
  "id": "123-456-789",
  "data": {
    "markdown": "# Page Title\n\nContent...",
    "html": "<!DOCTYPE html>...",
    "metadata": {
      "title": "Page Title",
      "sourceURL": "https://docs.firecrawl.dev/page",
      "statusCode": 200
    }
  },
  "metadata": {
    "user_id": "12345",
    "project": "documentation"
  },
  "timestamp": "2024-03-15T10:30:05Z"
}

Completed Event

{
  "event": "crawl.completed",
  "id": "123-456-789",
  "status": "completed",
  "total": 50,
  "completed": 50,
  "creditsUsed": 50,
  "metadata": {
    "user_id": "12345",
    "project": "documentation"
  },
  "timestamp": "2024-03-15T10:35:00Z"
}

Failed Event

{
  "event": "crawl.failed",
  "id": "123-456-789",
  "status": "failed",
  "error": "Request timeout after 30000ms",
  "metadata": {
    "user_id": "12345",
    "project": "documentation"
  },
  "timestamp": "2024-03-15T10:35:00Z"
}
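
All four payloads above share the `event`, `id`, `metadata`, and `timestamp` fields, and event names follow a `<job type>.<event>` pattern. The sketch below splits the event name so that one code path can serve both crawls and batch scrapes. `WebhookEvent` and `parse_webhook_event` are hypothetical names introduced for illustration, not part of the Firecrawl SDK.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class WebhookEvent:
    job_type: str  # "crawl" or "batch_scrape"
    name: str      # "started", "page", "completed", or "failed"
    job_id: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    body: Dict[str, Any] = field(default_factory=dict)  # full original payload


def parse_webhook_event(payload: Dict[str, Any]) -> WebhookEvent:
    """Split an event such as 'crawl.page' into ('crawl', 'page') and keep common fields."""
    job_type, sep, name = str(payload.get("event", "")).partition(".")
    if not job_type or not sep or not name:
        raise ValueError(f"Unexpected event name: {payload.get('event')!r}")
    return WebhookEvent(
        job_type=job_type,
        name=name,
        job_id=str(payload.get("id", "")),
        metadata=payload.get("metadata") or {},
        body=payload,
    )


event = parse_webhook_event({
    "event": "batch_scrape.completed",
    "id": "123-456-789",
    "status": "completed",
    "total": 50,
    "metadata": {"user_id": "12345"},
})
print(event.job_type, event.name, event.body["total"])  # batch_scrape completed 50
```

Event-specific fields such as `data`, `total`, or `error` remain available through `body`, so the parser does not need to know every payload shape in advance.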

Implementing a Webhook Handler

Here’s an example webhook handler implementation:

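import hmac

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/webhook', methods=['POST'])
def webhook():
    # Verify the request (recommended); compare_digest avoids timing leaks
    auth_header = request.headers.get('Authorization', '')
    if not hmac.compare_digest(auth_header.encode(), b'Bearer your-secret-token'):
        return jsonify({'error': 'Unauthorized'}), 401
    
    # Parse the webhook payload and reject bodies that are not a JSON object
    data = request.get_json(silent=True)
    if not isinstance(data, dict):
        return jsonify({'error': 'Invalid JSON payload'}), 400
    event = data.get('event')
    job_id = data.get('id')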
    
    # Handle different event types
    if event == 'crawl.started':
        print(f"Crawl {job_id} started")
    
    elif event == 'crawl.page':
        page_data = data.get('data', {})
        url = page_data.get('metadata', {}).get('sourceURL')
        print(f"Scraped page: {url}")
        # Process the page data...
    
    elif event == 'crawl.completed':
        total = data.get('total')
        credits = data.get('creditsUsed')
        print(f"Crawl {job_id} completed: {total} pages, {credits} credits")
        # Final processing...
    
    elif event == 'crawl.failed':
        error = data.get('error')
        print(f"Crawl {job_id} failed: {error}")
        # Error handling...
    
    # Always return 200 to acknowledge receipt
    return jsonify({'status': 'received'}), 200

if __name__ == '__main__':
    app.run(port=3000)

Best Practices

Return 200 quickly: Your webhook handler should return a 200 status code as quickly as possible. Process the webhook data asynchronously to avoid timeouts.

Implement authentication: Use the headers option to add authentication tokens, and verify them in your webhook handler.

Handle retries: Firecrawl will retry failed webhook deliveries. Make your handler idempotent to safely handle duplicate events.

Filter events: Use the events parameter to only receive the events you need, reducing webhook traffic and processing.

Webhook endpoints must respond within 30 seconds. Long-running operations should be queued for background processing.
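
Since deliveries may be retried, one way to make a handler idempotent is to record a fingerprint of each payload and skip any payload already seen. The sketch below assumes that a retried delivery carries a byte-for-byte equivalent JSON body, which this page does not state explicitly. The in-memory set stands in for a shared store such as a database, and `payload_fingerprint` and `handle_once` are hypothetical names.

```python
import hashlib
import json
from typing import Any, Dict, Set

_seen_fingerprints: Set[str] = set()  # in-memory only; assumption: a shared store in production


def payload_fingerprint(payload: Dict[str, Any]) -> str:
    """Hash the payload with sorted keys so that key order does not matter."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def handle_once(payload: Dict[str, Any]) -> bool:
    """Return True if the payload is new and was processed, False if it is a duplicate."""
    fingerprint = payload_fingerprint(payload)
    if fingerprint in _seen_fingerprints:
        return False
    _seen_fingerprints.add(fingerprint)
    # Process the event here (hypothetical placeholder).
    return True


delivery = {"event": "crawl.completed", "id": "123-456-789", "timestamp": "2024-03-15T10:35:00Z"}
print(handle_once(delivery))  # True
print(handle_once(delivery))  # False
```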

Testing Webhooks Locally

To test webhooks during development, you can use tools like ngrok to expose your local server:

# Start your webhook server locally
python webhook_server.py  # or node webhook_server.js

# In another terminal, start ngrok
ngrok http 3000

# Use the ngrok HTTPS URL in your webhook configuration
# Example: https://abc123.ngrok.io/webhook
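
Before involving ngrok, it may help to send a hand-built delivery straight to the local server. The sketch below builds a request shaped like the `crawl.started` payload shown earlier, using only the standard library. `build_test_delivery` is a hypothetical helper, and the port and token are assumed to match the example handler on this page.

```python
import json
import urllib.request


def build_test_delivery(url: str, token: str) -> urllib.request.Request:
    """Build a POST request that mimics a crawl.started webhook (hypothetical helper)."""
    payload = {
        "event": "crawl.started",
        "id": "123-456-789",
        "metadata": {"user_id": "12345"},
        "timestamp": "2024-03-15T10:30:00Z",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_test_delivery("http://localhost:3000/webhook", "your-secret-token")
print(request.get_method(), request.full_url)

# To actually send it while the local server is running:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status, response.read().decode())
```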

Troubleshooting

Webhooks Not Received

Verify your endpoint is accessible via HTTPS
Check that your server is returning a 200 status code
Review your event filters - you may be filtering out the events
Check your server logs for incoming requests

Authentication Failures

Verify the Authorization header is being sent correctly
Check that your handler is reading the header correctly (HTTP header names are case-insensitive)
Ensure the token matches exactly

High Volume Issues

If you’re receiving too many page events:

Filter events to only ["completed", "failed"]
Implement rate limiting in your webhook handler
Process page events asynchronously using a queue
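
Queue-based processing of page events could look like the sketch below: the webhook handler only enqueues the payload and returns, while a background thread does the work. This is an in-process illustration using the standard library, so queued events are lost if the process restarts; a durable queue would be needed to avoid that. `enqueue_event`, `worker`, and the queue size are assumptions made for the example.

```python
import queue
import threading
from typing import Any, Dict, List

event_queue: queue.Queue = queue.Queue(maxsize=1000)
processed: List[Any] = []  # stands in for real processing results


def worker() -> None:
    """Drain the queue in the background so the HTTP handler can return immediately."""
    while True:
        payload = event_queue.get()
        try:
            url = payload.get("data", {}).get("metadata", {}).get("sourceURL")
            processed.append(url)  # replace with real page processing
        finally:
            event_queue.task_done()


threading.Thread(target=worker, daemon=True).start()


def enqueue_event(payload: Dict[str, Any]) -> bool:
    """Call from the webhook handler; returns False if the queue is full."""
    try:
        event_queue.put_nowait(payload)
        return True
    except queue.Full:
        return False


enqueue_event({
    "event": "crawl.page",
    "data": {"metadata": {"sourceURL": "https://docs.firecrawl.dev/page"}},
})
event_queue.join()  # wait for the worker (only needed in this demo)
print(processed)  # ['https://docs.firecrawl.dev/page']
```

A bounded queue makes overload visible: when `enqueue_event` returns False, the handler can log the drop or respond with an error status instead of letting memory grow without limit.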
