Webhooks allow you to receive real-time notifications about the progress and completion of asynchronous operations like crawls and batch scrapes. Instead of polling for status, Firecrawl will send HTTP POST requests to your specified endpoint.
Webhook Events
Firecrawl sends webhooks for the following events:
| Event | Description | Triggered For |
| --- | --- | --- |
| crawl.started / batch_scrape.started | Job has started processing | Crawl, Batch Scrape |
| crawl.page / batch_scrape.page | A single page has been scraped | Crawl, Batch Scrape |
| crawl.completed / batch_scrape.completed | Job completed successfully | Crawl, Batch Scrape |
| crawl.failed / batch_scrape.failed | Job failed | Crawl, Batch Scrape |
Setting Up Webhooks
For Crawls
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR_API_KEY")
job = app.start_crawl(
    url="https://docs.firecrawl.dev",
    limit=100,
    webhook={
        "url": "https://your-domain.com/webhook",
        "headers": {
            "Authorization": "Bearer your-secret-token"
        },
        "metadata": {
            "user_id": "12345",
            "project": "documentation"
        },
        "events": ["completed", "failed"]  # Optional: filter events
    }
)
print(f"Crawl started: {job.id}")
For Batch Scrapes
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR_API_KEY")
job = app.start_batch_scrape(
    urls=[
        "https://firecrawl.dev",
        "https://docs.firecrawl.dev",
        "https://firecrawl.dev/pricing"
    ],
    formats=["markdown"],
    webhook={
        "url": "https://your-domain.com/webhook",
        "headers": {
            "Authorization": "Bearer your-secret-token"
        },
        "events": ["page", "completed"]
    }
)
print(f"Batch scrape started: {job.id}")
Webhook Configuration
URL (Required)
The HTTPS endpoint where Firecrawl will send webhook notifications.
webhook = {"url": "https://your-domain.com/webhook"}
Webhook URLs must use HTTPS. HTTP endpoints are not supported for security reasons.
Headers (Optional)
Custom headers to include with webhook requests. Commonly used for authentication.
webhook = {
    "url": "https://your-domain.com/webhook",
    "headers": {
        "Authorization": "Bearer your-secret-token",
        "X-Custom-Header": "custom-value"
    }
}
Use headers to implement webhook authentication and verify that requests are coming from Firecrawl.
Metadata (Optional)
Custom metadata that will be included in all webhook payloads for the job. Useful for tracking context.
webhook = {
    "url": "https://your-domain.com/webhook",
    "metadata": {
        "user_id": "12345",
        "project": "documentation",
        "environment": "production"
    }
}
Events (Optional)
Filter which events trigger webhooks. By default, all events are sent.
webhook = {
    "url": "https://your-domain.com/webhook",
    "events": ["completed", "failed"]  # Only send completion/failure events
}
Available events:
started - Job has started
page - A page has been scraped (can be high volume)
completed - Job completed successfully
failed - Job failed
If you only need to know when a job finishes, filter to ["completed", "failed"] to reduce webhook volume.
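The configuration rules above (HTTPS-only URL, optional headers and metadata, and a fixed set of event names) can be checked before a job is started. The following is a minimal sketch, not part of the Firecrawl SDK: `build_webhook_config` and `ALLOWED_EVENTS` are hypothetical names introduced here for illustration, and the checks only mirror what this guide states.

```python
from urllib.parse import urlparse

# Short event names as listed in this guide.
ALLOWED_EVENTS = {"started", "page", "completed", "failed"}


def build_webhook_config(url, headers=None, metadata=None, events=None):
    """Return a webhook dict, raising ValueError on obviously invalid input.

    Hypothetical helper: it only assembles the dict shown in this guide.
    """
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("Webhook URL must be an absolute HTTPS URL")

    config = {"url": url}
    if headers:
        config["headers"] = dict(headers)
    if metadata:
        config["metadata"] = dict(metadata)
    if events is not None:
        unknown = set(events) - ALLOWED_EVENTS
        if unknown:
            raise ValueError(f"Unknown webhook events: {sorted(unknown)}")
        config["events"] = list(events)
    return config


webhook = build_webhook_config(
    "https://your-domain.com/webhook",
    headers={"Authorization": "Bearer your-secret-token"},
    events=["completed", "failed"],
)
```

The resulting `webhook` dict can then be passed as the `webhook` argument in the crawl and batch scrape examples earlier in this guide.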
Webhook Payload
Firecrawl sends a POST request to your webhook URL with a JSON payload. The structure depends on the event type.
Started Event
{
  "event": "crawl.started",
  "id": "123-456-789",
  "metadata": {
    "user_id": "12345",
    "project": "documentation"
  },
  "timestamp": "2024-03-15T10:30:00Z"
}
Page Event
Sent for each page scraped. Contains the same data as the /scrape endpoint response.
{
  "event": "crawl.page",
  "id": "123-456-789",
  "data": {
    "markdown": "# Page Title\n\nContent...",
    "html": "<!DOCTYPE html>...",
    "metadata": {
      "title": "Page Title",
      "sourceURL": "https://docs.firecrawl.dev/page",
      "statusCode": 200
    }
  },
  "metadata": {
    "user_id": "12345",
    "project": "documentation"
  },
  "timestamp": "2024-03-15T10:30:05Z"
}
Completed Event
{
  "event": "crawl.completed",
  "id": "123-456-789",
  "status": "completed",
  "total": 50,
  "completed": 50,
  "creditsUsed": 50,
  "metadata": {
    "user_id": "12345",
    "project": "documentation"
  },
  "timestamp": "2024-03-15T10:35:00Z"
}
Failed Event
{
  "event": "crawl.failed",
  "id": "123-456-789",
  "status": "failed",
  "error": "Request timeout after 30000ms",
  "metadata": {
    "user_id": "12345",
    "project": "documentation"
  },
  "timestamp": "2024-03-15T10:35:00Z"
}
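Because every payload carries an `event` string of the form `<job type>.<event name>`, a handler can split it once and treat crawls and batch scrapes uniformly. The sketch below parses the failed-event example above into a small dataclass. `WebhookEvent` and `parse_webhook_event` are hypothetical names, and the code assumes batch scrape payloads follow the same shape as the crawl examples shown here, which this guide does not state explicitly.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WebhookEvent:
    job_type: str                  # e.g. "crawl" or "batch_scrape"
    name: str                      # "started", "page", "completed", or "failed"
    job_id: str
    metadata: dict = field(default_factory=dict)
    data: Optional[dict] = None    # present on page events
    error: Optional[str] = None    # present on failed events


def parse_webhook_event(body: str) -> WebhookEvent:
    """Parse a raw JSON webhook body into a WebhookEvent."""
    payload = json.loads(body)
    # "crawl.page" -> ("crawl", "page"); split on the last dot only
    job_type, _, name = payload["event"].rpartition(".")
    return WebhookEvent(
        job_type=job_type,
        name=name,
        job_id=payload["id"],
        metadata=payload.get("metadata", {}),
        data=payload.get("data"),
        error=payload.get("error"),
    )


sample = json.dumps({
    "event": "crawl.failed",
    "id": "123-456-789",
    "status": "failed",
    "error": "Request timeout after 30000ms",
    "metadata": {"user_id": "12345", "project": "documentation"},
    "timestamp": "2024-03-15T10:35:00Z",
})
event = parse_webhook_event(sample)
```

With this shape, a handler can branch on `event.name` alone and use `event.job_type` only where crawl and batch scrape handling differ.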
Implementing a Webhook Handler
Here’s an example webhook handler implementation:
Python (Flask)
from flask import Flask, request, jsonify

app = Flask(__name__)


@app.route('/webhook', methods=['POST'])
def webhook():
    # Verify the request (recommended)
    auth_header = request.headers.get('Authorization')
    if auth_header != 'Bearer your-secret-token':
        return jsonify({'error': 'Unauthorized'}), 401

    # Parse the webhook payload
    data = request.json
    event = data.get('event')
    job_id = data.get('id')

    # Handle different event types
    if event == 'crawl.started':
        print(f"Crawl {job_id} started")
    elif event == 'crawl.page':
        page_data = data.get('data', {})
        url = page_data.get('metadata', {}).get('sourceURL')
        print(f"Scraped page: {url}")
        # Process the page data...
    elif event == 'crawl.completed':
        total = data.get('total')
        credits = data.get('creditsUsed')
        print(f"Crawl {job_id} completed: {total} pages, {credits} credits")
        # Final processing...
    elif event == 'crawl.failed':
        error = data.get('error')
        print(f"Crawl {job_id} failed: {error}")
        # Error handling...

    # Always return 200 to acknowledge receipt
    return jsonify({'status': 'received'}), 200


if __name__ == '__main__':
    app.run(port=3000)
Best Practices
Return 200 quickly: Your webhook handler should return a 200 status code as quickly as possible. Process the webhook data asynchronously to avoid timeouts.
Implement authentication: Use the headers option to add authentication tokens, and verify them in your webhook handler.
Handle retries: Firecrawl will retry failed webhook deliveries. Make your handler idempotent to safely handle duplicate events.
Filter events: Use the events parameter to only receive the events you need, reducing webhook traffic and processing.
Webhook endpoints must respond within 30 seconds. Long-running operations should be queued for background processing.
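Two of the practices above, acknowledging quickly and tolerating retried deliveries, can be combined by having the HTTP handler only enqueue the payload while a background thread does the real work. The following is a minimal in-memory sketch using the standard library. The names `accept`, `dedupe_key`, and `worker` are hypothetical, and the deduplication key (job id, event, and page URL) is an assumption about what identifies a delivery; this guide does not document a delivery ID.

```python
import queue
import threading

work_queue = queue.Queue()
_seen = set()
_seen_lock = threading.Lock()


def dedupe_key(payload):
    """Assumed identity of a delivery: job id + event + page URL (None for non-page events)."""
    page_url = (payload.get("data") or {}).get("metadata", {}).get("sourceURL")
    return (payload.get("id"), payload.get("event"), page_url)


def accept(payload):
    """Call from the HTTP handler, then return 200 immediately.

    Returns False when the same delivery was already accepted (a retry).
    """
    key = dedupe_key(payload)
    with _seen_lock:
        if key in _seen:
            return False
        _seen.add(key)
    work_queue.put(payload)
    return True


def worker(handle):
    """Process queued payloads one at a time in the background."""
    while True:
        payload = work_queue.get()
        try:
            handle(payload)
        except Exception as exc:  # keep the worker alive if one payload fails
            print(f"Failed to process {payload.get('id')}: {exc}")
        finally:
            work_queue.task_done()


processed = []  # stand-in for real processing
threading.Thread(target=worker, args=(processed.append,), daemon=True).start()
```

The in-memory set grows without bound and is local to one process, so a deployment with several workers would more likely keep the seen keys in a shared store with an expiry.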
Testing Webhooks Locally
To test webhooks during development, you can use tools like ngrok to expose your local server:
# Start your webhook server locally
python webhook_server.py # or node webhook_server.js
# In another terminal, start ngrok
ngrok http 3000
# Use the ngrok HTTPS URL in your webhook configuration
# Example: https://abc123.ngrok.io/webhook
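Before pointing a real job at the tunnel, it can help to send a sample payload to the local server directly. The sketch below builds such a request with the standard library. `build_test_request` and `send_test_event` are hypothetical helper names, the payload is copied from the completed-event example in this guide, and the `Content-Type: application/json` header is an assumption about what the handler expects.

```python
import json
import urllib.request

SAMPLE_EVENT = {
    "event": "crawl.completed",
    "id": "123-456-789",
    "status": "completed",
    "total": 50,
    "completed": 50,
    "creditsUsed": 50,
    "metadata": {"user_id": "12345", "project": "documentation"},
    "timestamp": "2024-03-15T10:35:00Z",
}


def build_test_request(url, payload, token="your-secret-token"):
    """Build a POST request that imitates a webhook delivery."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def send_test_event(url="http://localhost:3000/webhook", payload=SAMPLE_EVENT):
    """Send the sample event and return (status code, response body)."""
    request = build_test_request(url, payload)
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status, response.read().decode("utf-8")
```

With the local server running on port 3000, calling `send_test_event()` should return a 200 status if the handler accepts the token and payload.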
Troubleshooting
Webhooks Not Received
Verify your endpoint is accessible via HTTPS
Check that your server is returning a 200 status code
Review your event filters - you may be filtering out the events you expect to receive
Check your server logs for incoming requests
Authentication Failures
Verify the Authorization header is being sent correctly
Check that your handler is reading the header correctly (HTTP header names are case-insensitive, and some frameworks normalize them to lowercase)
Ensure the token matches exactly
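The checks above can be folded into one small function that looks up the header regardless of case and compares the token in constant time. This is a sketch under the assumption that the shared secret is sent as a bearer token in the `Authorization` header, as in the examples in this guide; `is_authorized` is a hypothetical name.

```python
import hmac

EXPECTED_AUTH = "Bearer your-secret-token"


def is_authorized(headers, expected=EXPECTED_AUTH):
    """Return True if the Authorization header matches the expected value."""
    # Header names are case-insensitive, so normalize before looking up
    normalized = {name.lower(): value for name, value in headers.items()}
    # Surrounding whitespace is not significant in header values
    received = normalized.get("authorization", "").strip()
    # compare_digest avoids leaking match position through timing
    return hmac.compare_digest(received.encode("utf-8"), expected.encode("utf-8"))
```

Logging whether the header was missing or merely mismatched, without logging the token itself, usually makes these failures quicker to diagnose.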
High Volume Issues
If you’re receiving too many page events:
Filter events to only ["completed", "failed"]
Implement rate limiting in your webhook handler
Process page events asynchronously using a queue
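For the rate limiting suggestion, a token bucket is one common approach: it allows short bursts while capping the sustained rate. The sketch below is a generic standard-library implementation, not something Firecrawl provides. The `TokenBucket` class and the chosen rate and capacity are illustrative assumptions.

```python
import threading
import time


class TokenBucket:
    """Allow up to `rate` events per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate                  # tokens added per second
        self.capacity = capacity          # maximum burst size
        self.tokens = float(capacity)     # start with a full bucket
        self.clock = clock
        self.updated = clock()
        self.lock = threading.Lock()

    def allow(self):
        """Consume one token if available and report whether the event may proceed."""
        with self.lock:
            now = self.clock()
            elapsed = now - self.updated
            # Refill in proportion to elapsed time, never above capacity
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False


# Illustrative limits: 50 page events per second, bursts of up to 100
page_limiter = TokenBucket(rate=50, capacity=100)
```

What the handler does when `page_limiter.allow()` returns False (for example, responding with a non-200 status and relying on retries, or deferring the payload) depends on retry behavior that this guide does not specify in detail.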