The Scrape feature converts any URL into LLM-ready data formats, including markdown, HTML, screenshots, and structured JSON. It handles JavaScript rendering and dynamic content, and it can interact with pages before extracting data.

When to Use Scrape

Use Scrape when you need to:
  • Extract content from a single URL
  • Convert web pages to clean markdown for LLM processing
  • Capture screenshots of pages
  • Extract structured data using a schema
  • Interact with pages (login, click, scroll) before scraping
  • Extract brand identity (colors, fonts, typography)

Basic Usage

from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

# Scrape a website
result = app.scrape(
    'https://firecrawl.dev',
    formats=['markdown', 'html']
)

print(result.markdown)
print(result.html)

Response

{
  "success": true,
  "data": {
    "markdown": "# Firecrawl Docs\n\nTurn websites into LLM-ready data...",
    "html": "<!DOCTYPE html><html>...",
    "metadata": {
      "title": "Quickstart | Firecrawl",
      "description": "Firecrawl allows you to turn entire websites into LLM-ready markdown",
      "sourceURL": "https://docs.firecrawl.dev",
      "statusCode": 200
    }
  }
}
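If you call the REST API directly rather than the SDK, the response arrives as JSON with the shape shown above. A minimal sketch of unpacking it (the sample payload below mirrors the example response; a real call would return this over HTTP):

```python
# Sample payload mirroring the Scrape response shape shown above.
response = {
    "success": True,
    "data": {
        "markdown": "# Firecrawl Docs\n\nTurn websites into LLM-ready data...",
        "html": "<!DOCTYPE html><html>...",
        "metadata": {
            "title": "Quickstart | Firecrawl",
            "description": "Firecrawl allows you to turn entire websites into LLM-ready markdown",
            "sourceURL": "https://docs.firecrawl.dev",
            "statusCode": 200,
        },
    },
}

# Check success before touching the data, then pull out the pieces you need.
if response["success"]:
    data = response["data"]
    title = data["metadata"]["title"]   # page title from the metadata block
    content = data["markdown"]          # the scraped markdown content
    print(title)
    print(content[:40])
```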

Available Formats

You can request multiple formats in a single scrape:
  • markdown - Clean markdown content
  • html - Cleaned HTML
  • rawHtml - Original HTML
  • screenshot - Base64-encoded screenshot
  • links - All links found on the page
  • json - Structured data extraction
  • branding - Brand identity (colors, fonts, typography)

Get a Screenshot

doc = app.scrape("https://firecrawl.dev", formats=["screenshot"])
print(doc.screenshot)  # Base64 encoded image
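Since the screenshot comes back as a Base64 string, it usually needs decoding before use. A minimal sketch of decoding it to a PNG file (the sample string below is a tiny stand-in for `doc.screenshot`, not a real capture):

```python
import base64

# Stand-in for doc.screenshot: a Base64-encoded payload. Here we encode a
# few fake bytes for illustration; a real screenshot would be much larger.
screenshot_b64 = base64.b64encode(b"\x89PNG fake image bytes").decode("ascii")

# Decode back to raw bytes and write them out as an image file.
image_bytes = base64.b64decode(screenshot_b64)
with open("screenshot.png", "wb") as f:
    f.write(image_bytes)
```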

Extract Brand Identity

doc = app.scrape("https://firecrawl.dev", formats=["branding"])
print(doc.branding)  # {"colors": {...}, "fonts": [...], "typography": {...}}

Extract Structured Data

Define a schema and pass it via the json format to extract structured data:
from firecrawl import Firecrawl
from pydantic import BaseModel

app = Firecrawl(api_key="fc-YOUR_API_KEY")

class CompanyInfo(BaseModel):
    company_mission: str
    is_open_source: bool
    is_in_yc: bool

result = app.scrape(
    'https://firecrawl.dev',
    formats=[{"type": "json", "schema": CompanyInfo.model_json_schema()}]
)

print(result.json)
Output:
{
  "company_mission": "Turn websites into LLM-ready data",
  "is_open_source": true,
  "is_in_yc": true
}
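The schema does not have to come from Pydantic; any JSON Schema dict works. A hand-written equivalent of the CompanyInfo model above (the key names mirror what `model_json_schema()` would produce):

```python
# Hand-written JSON Schema equivalent to CompanyInfo.model_json_schema():
# three required fields, one string and two booleans.
company_schema = {
    "type": "object",
    "properties": {
        "company_mission": {"type": "string"},
        "is_open_source": {"type": "boolean"},
        "is_in_yc": {"type": "boolean"},
    },
    "required": ["company_mission", "is_open_source", "is_in_yc"],
}

# Passed the same way as the Pydantic-generated schema:
# formats=[{"type": "json", "schema": company_schema}]
```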

Extract with Prompt (No Schema)

You can also extract data using just a prompt without defining a schema:
result = app.scrape(
    'https://firecrawl.dev',
    formats=[{"type": "json", "prompt": "Extract the company mission"}]
)

Actions: Interact Before Scraping

Perform actions like clicking, typing, scrolling, and waiting before extracting content:
doc = app.scrape(
    url="https://example.com/login",
    formats=["markdown"],
    actions=[
        {"type": "write", "text": "[email protected]"},
        {"type": "press", "key": "Tab"},
        {"type": "write", "text": "password"},
        {"type": "click", "selector": 'button[type="submit"]'},
        {"type": "wait", "milliseconds": 2000},
        {"type": "screenshot"}
    ]
)
Actions are executed sequentially in the order specified. Use the wait action to allow dynamic content to load after interactions.
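Because actions run strictly in order, it can help to build the sequence with a small helper. A hypothetical sketch (`login_actions` is not part of the SDK; the action dicts match the shapes used above):

```python
def login_actions(email: str, password: str, wait_ms: int = 2000) -> list[dict]:
    """Build a sequential action list for a simple email/password login form."""
    return [
        {"type": "write", "text": email},
        {"type": "press", "key": "Tab"},
        {"type": "write", "text": password},
        {"type": "click", "selector": 'button[type="submit"]'},
        {"type": "wait", "milliseconds": wait_ms},  # let the page settle after submit
    ]

# Placeholder credentials for illustration only.
actions = login_actions("user@example.com", "password")
# Passed as: app.scrape(url, formats=["markdown"], actions=actions)
```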

Best Practices

  • Request only the formats you need to minimize API usage
  • Use screenshot format to verify page rendering
  • For structured extraction, define a clear schema for consistent results
  • Use actions to handle pages requiring authentication or interaction
  • The branding format is useful for design research and competitor analysis

Next Steps

  • Learn about Batch Scraping to scrape multiple URLs efficiently
  • Use Crawl to scrape entire websites
  • Try Agent for autonomous data gathering
