Smart Cache Warming

Overview

Adapt’s cache warming system helps your CDN serve cached content. It detects cache misses and re-checks each missed page (up to three times) until the CDN reports a cache hit, so the first visitors after you publish get a fast response from the CDN edge rather than a slow fetch from your origin.

How It Works

Two-Phase Warming Strategy

Phase 1: Initial Request

Adapt first requests each URL once to trigger cache population:

// Initial crawl request
result, err := crawler.WarmURL(ctx, targetURL, findLinks)
if err != nil {
    return err
}

// Check the (normalised) cache status reported in the CDN headers
if result.CacheStatus == "MISS" || result.CacheStatus == "EXPIRED" {
    // Proceed to Phase 2
}

Phase 2: Cache Validation

After detecting a miss, Adapt waits briefly for the CDN to store the response, then checks whether the page is now served from cache:

// Wait a randomised 500-1000ms so the CDN can store the response,
// then check cache status with HEAD requests (up to 3 attempts)
checkDelay := 500 + rand.Intn(501)
for attempt := 1; attempt <= 3; attempt++ {
    time.Sleep(time.Duration(checkDelay) * time.Millisecond)
    if checkCacheStatus(targetURL) == "HIT" {
        // Make final GET request to measure cached performance
        break
    }
    checkDelay += 300 // Linear backoff: wait 300ms longer before each retry
}

Intelligent Cache Detection

Adapt recognises the cache-status headers used by the major CDNs:

Cloudflare (64% market share)

CF-Cache-Status: HIT|MISS|DYNAMIC|BYPASS|EXPIRED|STALE

CloudFront (AWS)

X-Cache: Hit from cloudfront
X-Cache: Miss from cloudfront

Fastly

X-Cache: HIT
X-Cache: HIT, HIT  # Shielding (takes last value)

Akamai

X-Cache-Remote: TCP_HIT
X-Cache-Remote: TCP_MISS from child

Vercel

x-vercel-cache: HIT|MISS|STALE|PRERENDER

Netlify (RFC 9211 format)

Cache-Status: "Netlify Edge"; hit

Cache Status Normalisation

All CDN-specific headers are normalised to standard values:

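import "strings"

// normaliseCacheStatus maps a CDN-specific header value to a standard status
func normaliseCacheStatus(status string) string {
    s := strings.ToUpper(strings.TrimSpace(status))
    // Fastly: "HIT, MISS" → "MISS" (last value)
    if i := strings.LastIndex(s, ","); i >= 0 {
        s = strings.TrimSpace(s[i+1:])
    }
    // Akamai: "TCP_MISS from child" → "MISS FROM CHILD"
    s = strings.TrimPrefix(s, "TCP_")
    // CloudFront: "Miss from cloudfront" → "MISS" (keep the first word)
    if fields := strings.Fields(s); len(fields) > 0 {
        s = fields[0]
    }
    switch s {
    case "HIT", "MISS", "BYPASS", "EXPIRED", "STALE", "DYNAMIC", "REVALIDATED":
        return s // Cloudflare: "MISS" → "MISS"
    }
    return "NONE" // Unrecognised or empty value
}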
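Token-style values like the ones above can be normalised directly, but the RFC 9211 Cache-Status header used by Netlify reports the outcome as parameters: `hit` when the response came from the cache, or `fwd=<reason>` when the request was forwarded. A minimal sketch, assuming the last list member (the cache closest to the client) is the one that matters, consistent with the Fastly "last value" rule, and mapping `fwd=stale` to EXPIRED and `fwd=bypass` to BYPASS; rfc9211Status is a hypothetical helper.

```go
package main

import "strings"

// rfc9211Status reduces an RFC 9211 Cache-Status value to HIT, MISS,
// EXPIRED, BYPASS or NONE using the last list member. Simplified: it does
// not handle commas or semicolons inside quoted cache identifiers.
func rfc9211Status(value string) string {
	members := strings.Split(value, ",")
	params := strings.Split(members[len(members)-1], ";")
	for _, p := range params[1:] { // params[0] is the cache identifier
		key, val, _ := strings.Cut(strings.TrimSpace(p), "=")
		switch {
		case key == "hit" && val != "?0": // bare "hit" means true
			return "HIT"
		case key == "fwd" && val == "bypass":
			return "BYPASS"
		case key == "fwd" && val == "stale":
			return "EXPIRED"
		case key == "fwd":
			return "MISS" // uri-miss, vary-miss, miss, request, ...
		}
	}
	return "NONE"
}
```

For `"Netlify Durable"; hit, "Netlify Edge"; fwd=miss` this returns MISS: the edge forwarded the request, so the edge copy still needs warming even though an inner cache had it.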

Performance Metrics

First vs Second Response Time

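Adapt records response times for both the initial request (typically a MISS served by the origin) and the follow-up request (ideally a HIT served by the CDN):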

{
  "url": "https://example.com/page",
  "response_time": 850,           // Initial MISS (origin)
  "cache_status": "MISS",
  "second_response_time": 45,     // After caching (CDN)
  "second_cache_status": "HIT",
  "cache_check_attempts": [
    {"attempt": 1, "cache_status": "MISS", "delay": 700},
    {"attempt": 2, "cache_status": "HIT",  "delay": 1000}
  ]
}

Performance Timing Breakdown

Adapt also records a per-phase timing breakdown, in milliseconds, for both requests:

{
  "performance": {
    "dns_lookup_time": 12,
    "tcp_connection_time": 35,
    "tls_handshake_time": 78,
    "ttfb": 245,                    // Time to first byte
    "content_transfer_time": 142
  },
  "second_performance": {
    "dns_lookup_time": 0,           // Connection reused: no new DNS/TCP/TLS
    "tcp_connection_time": 0,
    "tls_handshake_time": 0,
    "ttfb": 28,                      // CDN edge response
    "content_transfer_time": 17
  }
}

When Cache Warming Happens

Automatic Warming Triggers

Cache Miss Detected

When CF-Cache-Status: MISS or equivalent is returned

Expired Content

When the CDN reports EXPIRED (for example CF-Cache-Status: EXPIRED), meaning the cached copy is out of date

Skip for Non-Cacheable

No warming for DYNAMIC or BYPASS (saves resources)

Warming Decision Logic

import "strings"

func shouldMakeSecondRequest(cacheStatus string) bool {
    switch strings.ToUpper(cacheStatus) {
    case "MISS", "EXPIRED":
        return true  // Warm these
    case "HIT", "STALE", "REVALIDATED":
        return false // Already cached
    case "BYPASS", "DYNAMIC", "NONE":
        return false // Uncacheable
    }
    return false // Unknown status: skip rather than guess
}

Rate Limiting & Politeness

Configurable Delays

Respect origin servers with built-in rate limiting:

config := crawler.DefaultConfig()
config.RateLimit = 5              // Requests per second (5 is the default)
config.MaxConcurrency = 20        // Parallel requests
config.DefaultTimeout = 30 * time.Second

Robots.txt Compliance

Automatic crawl-delay detection:

// Parse robots.txt for crawl-delay directive
if robotsRules.CrawlDelay > 0 {
    // Update domain crawl delay in database
    domainLimiter.Seed(domain, robotsRules.CrawlDelay)
}

Adaptive Rate Limiting

Dynamic adjustment based on server response:

Fast Servers: Increase concurrency up to job limit
Slow Servers: Reduce rate to prevent overload
Error Threshold: Back off on 429/503 responses

Priority Processing

Homepage First

Critical pages are warmed first:

// Homepage gets highest priority
if path == "/" {
    page.Priority = 1.000
} else {
    page.Priority = 0.1 // Sitemap default
}

Priority Queue

Pending tasks are processed highest priority first, and oldest first within the same priority:

SELECT * FROM tasks 
WHERE status = 'pending'
ORDER BY priority_score DESC, created_at ASC

Browser-Like Behaviour

Adapt sends browser-like headers so its requests resemble real user traffic and are less likely to be blocked:

// Set browser-like headers
request.Headers.Set("Accept", "text/html,application/xhtml+xml,application/xml")
request.Headers.Set("Accept-Language", "en-US,en;q=0.9")
request.Headers.Set("Accept-Encoding", "gzip, deflate, br")
request.Headers.Set("Referer", "https://example.com/")

User Agent

Adapt/1.0 (+https://adapt.beehivebusiness.builders/bot)

This user agent identifies Adapt as a crawler that respects robots.txt and rate limits.
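With the standard library's net/http (rather than the crawler's own request type), the headers and user agent above could be attached as follows; newWarmRequest is a hypothetical helper. It deliberately leaves Accept-Encoding unset: Go's Transport then requests gzip itself and decompresses the response transparently, whereas a hand-set Accept-Encoding (for example one including br) leaves decoding to the caller.

```go
package main

import "net/http"

const userAgent = "Adapt/1.0 (+https://adapt.beehivebusiness.builders/bot)"

// newWarmRequest builds a GET request with browser-like headers and the
// identifiable Adapt user agent. An empty referer omits the header.
func newWarmRequest(url, referer string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", userAgent)
	req.Header.Set("Accept", "text/html,application/xhtml+xml,application/xml")
	req.Header.Set("Accept-Language", "en-US,en;q=0.9")
	if referer != "" {
		req.Header.Set("Referer", referer)
	}
	return req, nil
}
```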

Configuration Options

Job-Level Settings

{
  "domain": "example.com",
  "concurrency": 20,        // Parallel crawl threads
  "find_links": true,       // Discover and warm linked pages
  "max_pages": 1000,        // Limit total pages
  "include_paths": ["/blog/*"],
  "exclude_paths": ["/admin/*"]
}
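How include_paths and exclude_paths patterns are matched is not specified here. A minimal sketch under two assumptions: a trailing `*` matches any remainder of the path (including nested segments), and exclusions take precedence over inclusions; pathAllowed and matchPattern are hypothetical.

```go
package main

import "strings"

// matchPattern treats a trailing "*" as "any remainder"; other patterns
// must match the path exactly.
func matchPattern(pattern, path string) bool {
	if strings.HasSuffix(pattern, "*") {
		return strings.HasPrefix(path, strings.TrimSuffix(pattern, "*"))
	}
	return path == pattern
}

// pathAllowed reports whether a discovered path should be warmed.
func pathAllowed(path string, include, exclude []string) bool {
	for _, p := range exclude {
		if matchPattern(p, path) {
			return false // exclusions always win
		}
	}
	if len(include) == 0 {
		return true // no include list: anything not excluded is allowed
	}
	for _, p := range include {
		if matchPattern(p, path) {
			return true
		}
	}
	return false
}
```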

Domain-Level Overrides

Persistent settings per domain:

Crawl Delay: From robots.txt or manual override
Adaptive Delay: Learned optimal rate
Concurrency Limit: Domain-specific max

Use Cases

Post-Publish Warming

Warm cache immediately after publishing new content so first visitors get fast page loads.

Scheduled Cache Refresh

Regularly warm the cache before entries expire to keep performance consistent.

CDN Cache Purge Recovery

Quickly repopulate the cache after a purge for a deployment or content update.

New Site Launch

Pre-warm the entire site before announcing it, so early visitors get fast page loads.

Performance Impact

Improved Metrics

Time to First Byte

Before: 850ms (origin)
After: 45ms (CDN edge)
Improvement: 95% reduction

Full Page Load

Before: 2.1s (uncached)
After: 0.3s (cached)
Improvement: 86% reduction
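The improvement figures are relative reductions in time, (before - after) / before, rounded to a whole percent: (850 - 45) / 850 ≈ 94.7% and (2.1 - 0.3) / 2.1 ≈ 85.7%. A tiny helper (hypothetical name) that reproduces them:

```go
package main

import "math"

// percentReduction returns how much smaller after is than before, as a
// whole-number percentage. Both values must use the same unit.
func percentReduction(before, after float64) float64 {
	return math.Round((before - after) / before * 100)
}
```

Note this is a reduction in time, not a speed-up factor: 850ms to 45ms is roughly 19 times faster.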

Resource Efficiency

Origin Load: Can drop by 90% or more when most requests are served from cache
Bandwidth Costs: Lower egress from origin servers
Server Capacity: Handle more traffic with same infrastructure

Best Practices

Warm After Publish

Trigger warming via webhook when content is published

Schedule Regular Warming

Keep cache fresh with recurring crawls (12-24 hour intervals)

Prioritise the Homepage

Ensure critical pages are warmed first for best user experience

Monitor Cache Hit Rate

Track cache status metrics to optimise CDN configuration

Performance Monitoring - Track cache effectiveness
Scheduled Crawls - Automate regular cache warming
Broken Link Detection - Ensure cached pages work
