Overview

Adapt’s cache warming system ensures your CDN serves cached content by detecting cache misses and automatically re-requesting pages until they’re cached. This shortens initial page load times after publishing, because first visitors are served from the CDN edge instead of the origin.

How It Works

Two-Phase Warming Strategy

Phase 1: Initial Request

The first request to each URL triggers cache population:
// Initial crawl request
result, err := crawler.WarmURL(ctx, targetURL, findLinks)
if err != nil {
    // Request failed: no cache status to inspect, so skip Phase 2
    return err
}

// Check cache status from CDN headers
if result.CacheStatus == "MISS" || result.CacheStatus == "EXPIRED" {
    // Proceed to Phase 2
}

Phase 2: Cache Validation

After detecting a miss, Adapt waits for the CDN to cache the response, then validates:
// Apply randomised delay (500-1000ms) to allow CDN processing
jitteredDelay := 500 + rand.Intn(501)
time.Sleep(time.Duration(jitteredDelay) * time.Millisecond)

// Check cache status with HEAD requests (up to 3 attempts)
for attempt := 1; attempt <= 3; attempt++ {
    cacheStatus := checkCacheStatus(url)
    if cacheStatus == "HIT" {
        // Make final GET request to measure cached performance
        break
    }
    time.Sleep(time.Duration(checkDelay) * time.Millisecond)
    checkDelay += 300 // Linear backoff: wait 300ms longer before each retry
}

Intelligent Cache Detection

Adapt supports all major CDN cache headers:

Cloudflare (64% market share)

CF-Cache-Status: HIT|MISS|DYNAMIC|BYPASS|EXPIRED|STALE

CloudFront (AWS)

X-Cache: Hit from cloudfront
X-Cache: Miss from cloudfront

Fastly

X-Cache: HIT
X-Cache: HIT, HIT  # Shielding (takes last value)

Akamai

X-Cache-Remote: TCP_HIT
X-Cache-Remote: TCP_MISS from child

Vercel

x-vercel-cache: HIT|MISS|STALE|PRERENDER

Netlify (RFC 9211 format)

Cache-Status: Netlify Edge; hit

Cache Status Normalisation

All CDN-specific headers are normalised to standard values:
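func normaliseCacheStatus(status string) string {
    // Cloudflare: "MISS" → "MISS"
    // CloudFront: "Miss from cloudfront" → "MISS"
    // Akamai: "TCP_MISS from child" → "MISS"
    // Fastly: "HIT, MISS" → "MISS" (last value)
    // (CDN-specific parsing omitted here)

    upper := strings.ToUpper(strings.TrimSpace(status))
    switch upper {
    case "HIT", "MISS", "BYPASS", "EXPIRED", "STALE":
        return upper
    }
    // Other values (e.g. DYNAMIC) pass through upper-cased;
    // this fallback is an assumption, not confirmed behaviour
    return upper
}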
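
The mappings listed in the comments above could be implemented along the following lines. This is a hedged sketch: `normalise` is a hypothetical name, the parsing rules are inferred from the four examples plus the Netlify header shown earlier, and RFC 9211 parameters such as `fwd=miss` are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// normalise maps a raw CDN header value to a short status such as HIT or MISS.
// The rules are inferred from the examples in this section (assumption).
func normalise(raw string) string {
	s := strings.ToUpper(strings.TrimSpace(raw))

	// Fastly shielding: "HIT, MISS" → keep the last value.
	if i := strings.LastIndex(s, ","); i >= 0 {
		s = strings.TrimSpace(s[i+1:])
	}
	// RFC 9211 style: "NETLIFY EDGE; HIT" → keep what follows the first ';'.
	if i := strings.Index(s, ";"); i >= 0 {
		s = strings.TrimSpace(s[i+1:])
	}
	// Akamai: "TCP_MISS FROM CHILD" → drop the "TCP_" prefix.
	s = strings.TrimPrefix(s, "TCP_")
	// CloudFront / Akamai: "MISS FROM CLOUDFRONT" → keep the first word.
	if fields := strings.Fields(s); len(fields) > 0 {
		s = fields[0]
	}
	return s
}

func main() {
	for _, raw := range []string{"MISS", "Miss from cloudfront", "TCP_MISS from child", "HIT, MISS", "Netlify Edge; hit"} {
		fmt.Printf("%q -> %s\n", raw, normalise(raw))
	}
}
```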

Performance Metrics

First vs Second Response Time

Adapt records response times (in milliseconds) for both the initial cache MISS and the subsequent HIT:
{
  "url": "https://example.com/page",
  "response_time": 850,           // Initial MISS (origin)
  "cache_status": "MISS",
  "second_response_time": 45,     // After caching (CDN)
  "second_cache_status": "HIT",
  "cache_check_attempts": [
    {"attempt": 1, "cache_status": "MISS", "delay": 700},
    {"attempt": 2, "cache_status": "HIT",  "delay": 1000}
  ]
}

Performance Timing Breakdown

Detailed timing for each phase:
{
  "performance": {
    "dns_lookup_time": 12,
    "tcp_connection_time": 35,
    "tls_handshake_time": 78,
    "ttfb": 245,                    // Time to first byte
    "content_transfer_time": 142
  },
  "second_performance": {
    "dns_lookup_time": 0,           // Cached connection
    "tcp_connection_time": 0,
    "tls_handshake_time": 0,
    "ttfb": 28,                      // CDN edge response
    "content_transfer_time": 17
  }
}

When Cache Warming Happens

Automatic Warming Triggers

  1. Cache Miss Detected: When CF-Cache-Status: MISS or equivalent is returned
  2. Expired Content: When Cache-Status: EXPIRED indicates stale content
  3. Skip for Non-Cacheable: No warming for DYNAMIC or BYPASS (saves resources)

Warming Decision Logic

func shouldMakeSecondRequest(cacheStatus string) bool {
    switch strings.ToUpper(cacheStatus) {
    case "MISS", "EXPIRED":
        return true  // Warm these
    case "HIT", "STALE", "REVALIDATED":
        return false // Already cached
    case "BYPASS", "DYNAMIC", "NONE":
        return false // Uncacheable
    default:
        return false // Unknown status: skip warming
    }
}

Rate Limiting & Politeness

Configurable Delays

Respect origin servers with built-in rate limiting:
config := crawler.DefaultConfig()
config.RateLimit = 5              // 5 requests/second default
config.MaxConcurrency = 20        // Parallel requests
config.DefaultTimeout = 30 * time.Second

Robots.txt Compliance

Automatic crawl-delay detection:
// Parse robots.txt for crawl-delay directive
if robotsRules.CrawlDelay > 0 {
    // Update domain crawl delay in database
    domainLimiter.Seed(domain, robotsRules.CrawlDelay)
}

Adaptive Rate Limiting

Dynamic adjustment based on server response:
  • Fast Servers: Increase concurrency up to job limit
  • Slow Servers: Reduce rate to prevent overload
  • Error Threshold: Back off on 429/503 responses

Priority Processing

Homepage First

Critical pages are warmed first:
// Homepage gets highest priority
if path == "/" {
    page.Priority = 1.000
} else {
    page.Priority = 0.1 // Sitemap default
}

Priority Queue

Tasks are processed by priority score:
SELECT * FROM tasks 
WHERE status = 'pending'
ORDER BY priority_score DESC, created_at ASC

Browser-Like Behaviour

Mimic real user requests to avoid blocking:
// Set browser-like headers
request.Headers.Set("Accept", "text/html,application/xhtml+xml,application/xml")
request.Headers.Set("Accept-Language", "en-US,en;q=0.9")
request.Headers.Set("Accept-Encoding", "gzip, deflate, br")
request.Headers.Set("Referer", "https://example.com/")

User Agent

Adapt/1.0 (+https://adapt.beehivebusiness.builders/bot)
Identifiable crawler that respects robots.txt and rate limits.
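
With Go's standard `net/http` package, a request carrying the headers and user agent above could be built roughly as follows. `newWarmRequest` is a hypothetical helper; the sketch deliberately leaves `Accept-Encoding` unset, because setting it explicitly in `net/http` turns off the transport's transparent gzip decompression.

```go
package main

import (
	"fmt"
	"net/http"
)

const userAgent = "Adapt/1.0 (+https://adapt.beehivebusiness.builders/bot)"

// newWarmRequest builds a GET request with the browser-like headers shown
// above (hypothetical helper, not part of Adapt's API).
func newWarmRequest(target, referer string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", userAgent)
	req.Header.Set("Accept", "text/html,application/xhtml+xml,application/xml")
	req.Header.Set("Accept-Language", "en-US,en;q=0.9")
	// Accept-Encoding is left unset so the transport requests gzip itself
	// and decompresses the body transparently.
	if referer != "" {
		req.Header.Set("Referer", referer)
	}
	return req, nil
}

func main() {
	req, err := newWarmRequest("https://example.com/page", "https://example.com/")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL, req.UserAgent())
}
```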

Configuration Options

Job-Level Settings

{
  "domain": "example.com",
  "concurrency": 20,        // Parallel crawl threads
  "find_links": true,       // Discover and warm linked pages
  "max_pages": 1000,        // Limit total pages
  "include_paths": ["/blog/*"],
  "exclude_paths": ["/admin/*"]
}

Domain-Level Overrides

Persistent settings per domain:
  • Crawl Delay: From robots.txt or manual override
  • Adaptive Delay: Learned optimal rate
  • Concurrency Limit: Domain-specific max

Use Cases

  • Warm cache immediately after publishing new content so first visitors get fast page loads.
  • Regularly warm cache before expiration to maintain consistent performance.
  • Quickly repopulate cache after purging for deployments or updates.
  • Pre-warm entire site before announcing to ensure great first impressions.

Performance Impact

Improved Metrics

Time to First Byte

  • Before: 850ms (origin)
  • After: 45ms (CDN edge)
  • Improvement: 95% faster

Full Page Load

  • Before: 2.1s (uncached)
  • After: 0.3s (cached)
  • Improvement: 86% faster
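
The improvement percentages above follow from (before - after) / before. A short check of the arithmetic, using a hypothetical `improvement` helper:

```go
package main

import "fmt"

// improvement returns the percentage reduction from before to after.
// Both values must use the same unit, and before must be non-zero.
func improvement(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	fmt.Printf("TTFB: %.0f%%\n", improvement(850, 45))            // prints "TTFB: 95%"
	fmt.Printf("Full page load: %.0f%%\n", improvement(2.1, 0.3)) // prints "Full page load: 86%"
}
```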

Resource Efficiency

  • Origin Load: Reduced by 90%+ with effective caching
  • Bandwidth Costs: Lower egress from origin servers
  • Server Capacity: Handle more traffic with same infrastructure

Best Practices

Warm After Publish

Trigger warming via webhook when content is published

Schedule Regular Warming

Keep cache fresh with recurring crawls (12-24 hour intervals)

Prioritise the Homepage

Ensure critical pages are warmed first for best user experience

Monitor Cache Hit Rate

Track cache status metrics to optimise CDN configuration
