Endpoint

POST /v1/extract

Authentication

This endpoint requires authentication using a Bearer token. Include your API key in the Authorization header:
Authorization: Bearer YOUR_API_KEY
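As a minimal sketch of the header described above, the helper below builds the request headers in Python. The function name `auth_headers` and the environment variable `FIRECRAWL_API_KEY` are assumptions made for illustration; they are not part of the API.

```python
import os


def auth_headers(api_key: str) -> dict:
    """Return headers for an authenticated JSON request (hypothetical helper)."""
    if not api_key:
        raise ValueError("an API key is required")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


# FIRECRAWL_API_KEY is an assumed variable name; the placeholder is used if it is unset.
headers = auth_headers(os.environ.get("FIRECRAWL_API_KEY", "YOUR_API_KEY"))
```

Reading the key from the environment keeps it out of source control; any other secret store would work equally well.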

Request Body

urls
array
required
The URLs to extract data from. URLs should be in glob format.
prompt
string
A prompt that guides the extraction process.
schema
object
Schema to define the structure of the extracted data. Must conform to JSON Schema.
When true, the extraction will use web search to find additional data
ignoreSitemap
boolean
default:false
When true, sitemap.xml files will be ignored during website scanning
includeSubdomains
boolean
default:true
When true, subdomains of the provided URLs will also be scanned
showSources
boolean
default:false
When true, the sources used to extract the data will be included in the response under the sources key
scrapeOptions
object
Additional scraping options to apply. See the scrape endpoint for available options.
ignoreInvalidURLs
boolean
default:false
When true, invalid URLs in the urls array are ignored instead of failing the entire request. The extract runs on the remaining valid URLs, and the invalid URLs are returned in the invalidURLs field of the response.
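The sketch below shows one way to assemble a request body from the fields listed above. `build_extract_payload` is a hypothetical helper, and the wildcard URL is only an illustration of what a glob-style URL might look like. Optional fields are left out unless set, so the server-side defaults apply.

```python
import json


def build_extract_payload(urls, prompt=None, schema=None, **options):
    """Build the JSON body for POST /v1/extract (hypothetical helper)."""
    if not urls:
        raise ValueError("urls is required and must not be empty")
    payload = {"urls": list(urls)}
    if prompt is not None:
        payload["prompt"] = prompt
    if schema is not None:
        payload["schema"] = schema
    # Remaining options (e.g. showSources, ignoreInvalidURLs) are passed
    # through unchanged; omitted options fall back to the documented defaults.
    payload.update(options)
    return payload


payload = build_extract_payload(
    ["https://example.com/*"],  # illustrative glob-style URL
    prompt="Extract the main article title and author",
    schema={
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "author": {"type": "string"},
        },
    },
    showSources=True,
    ignoreInvalidURLs=True,
)
print(json.dumps(payload, indent=2))
```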

Response

success
boolean
Indicates whether the extract job was successfully started
id
string
The unique identifier of the extract job. Use this ID to check the job's status.
invalidURLs
array
If ignoreInvalidURLs is true, this is an array containing the invalid URLs that were specified in the request. If there were no invalid URLs, this will be an empty array. If ignoreInvalidURLs is false, this field will be undefined.
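A minimal sketch of reading these response fields follows. `parse_extract_response` is a hypothetical helper, and the sample body, including the `job-123` identifier, is made up for illustration. A missing invalidURLs field is returned as `None`.

```python
import json


def parse_extract_response(body: str):
    """Return (job_id, invalid_urls) from a response body (hypothetical helper)."""
    data = json.loads(body)
    if not data.get("success"):
        raise RuntimeError("extract job was not started")
    # invalidURLs is only present when ignoreInvalidURLs was true in the request.
    return data["id"], data.get("invalidURLs")


# Illustrative response body; the id value is invented.
sample = '{"success": true, "id": "job-123", "invalidURLs": []}'
job_id, invalid_urls = parse_extract_response(sample)
print(job_id, invalid_urls)
```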

Examples

curl -X POST https://api.firecrawl.dev/v1/extract \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -d '{
    "urls": ["https://example.com"],
    "prompt": "Extract the main article title and author",
    "schema": {
      "type": "object",
      "properties": {
        "title": { "type": "string" },
        "author": { "type": "string" }
      }
    }
  }'
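For comparison, here is a rough Python equivalent of the curl call above, using only the standard library. `make_extract_request` is a hypothetical helper. The request is built but not sent, so the sketch runs without network access or a real key.

```python
import json
import urllib.request

EXTRACT_URL = "https://api.firecrawl.dev/v1/extract"


def make_extract_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request (hypothetical helper)."""
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = make_extract_request(
    "YOUR_API_KEY",
    {
        "urls": ["https://example.com"],
        "prompt": "Extract the main article title and author",
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "author": {"type": "string"},
            },
        },
    },
)

# To send the request with a real key:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```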

Error Responses

400
object
Invalid Request - Invalid input data.
{
  "error": "Invalid input data."
}
500
object
Server Error - An unexpected error occurred on the server.
{
  "error": "An unexpected error occurred on the server."
}
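One possible way to surface these errors in client code is sketched below. `ExtractError` and `raise_for_error` are hypothetical names; the sketch assumes only that an error body is a JSON object with an error key, as in the samples above, and it falls back to the raw body otherwise.

```python
import json


class ExtractError(Exception):
    """Raised for 4xx/5xx responses (hypothetical exception type)."""

    def __init__(self, status: int, message: str):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def raise_for_error(status: int, body: str) -> None:
    """Raise ExtractError if the status code indicates a failure."""
    if status < 400:
        return
    try:
        message = json.loads(body).get("error", body)
    except (ValueError, AttributeError):
        # Body was not JSON, or not a JSON object: keep the raw text.
        message = body
    raise ExtractError(status, message)


try:
    raise_for_error(400, '{"error": "Invalid input data."}')
except ExtractError as exc:
    print(exc.status, exc.message)
```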
