How it works
The prompt cache uses an LRU (Least Recently Used) eviction strategy with:

- In-memory storage - Fast access without network calls
- TTL-based expiration - Entries refresh after a configurable time
- Background refresh - Stale entries update automatically without blocking
- Stale-while-revalidate - Returns cached data immediately while refreshing in background
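Taken together, the behaviors above amount to a small LRU cache with TTL-based staleness. The sketch below is illustrative only; PromptCacheSketch and its parameter names (max_size, ttl_seconds) are assumptions for this example, not the LangSmith client's actual API:

```python
import time
from collections import OrderedDict


class PromptCacheSketch:
    """Toy LRU cache with TTL-based staleness (illustrative only)."""

    def __init__(self, max_size=100, ttl_seconds=300.0):
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds
        self._entries = OrderedDict()  # name -> (value, stored_at)

    def get(self, name, fetch):
        """Return the cached value for name, calling fetch() on a miss.

        A stale hit would still return the cached value immediately
        (stale-while-revalidate); a real client refreshes it in the
        background rather than blocking the caller.
        """
        if self.max_size == 0:
            return fetch()  # caching disabled: always call through
        entry = self._entries.get(name)
        if entry is not None:
            self._entries.move_to_end(name)  # mark as most recently used
            return entry[0]
        value = fetch()
        self._entries[name] = (value, time.monotonic())
        if len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict least recently used
        return value

    def is_stale(self, name):
        """True once an entry's age exceeds the TTL."""
        entry = self._entries.get(name)
        return entry is not None and time.monotonic() - entry[1] > self.ttl_seconds
```

On a repeated lookup the fetch function is not called again, and once the cache is full the least recently used entry is evicted first.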
Basic usage
Prompt caching is enabled by default when using the LangSmith client.

Configuring the cache
Customize cache behavior globally.

Using a custom cache instance
Create and manage your own cache.

Disabling the cache
Set maxSize to 0 to disable caching.
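A zero-capacity cache degenerates to a pass-through: every lookup goes straight to the source. A minimal sketch of that behavior (the function and its arguments are hypothetical, not the SDK's API):

```python
def cached_fetch(cache, max_size, key, fetch):
    """Look up key in cache, falling back to fetch(); max_size=0 disables caching."""
    if max_size == 0:
        return fetch()  # caching disabled: always call through
    if key not in cache:
        cache[key] = fetch()  # miss: populate the cache
    return cache[key]
```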
Offline mode
Use infinite TTL for offline/disconnected environments.

Saving and loading cache
Persist the cache to disk for offline use.

Cache metrics
Monitor cache performance.

Invalidating entries
Manually remove entries from cache.

Background refresh behavior
When a cached entry becomes stale:

- The cached value is returned immediately (no blocking)
- A background task refreshes the entry from the API
- The next request gets the updated value
As a result:

- No added latency when a cache entry goes stale
- Always eventually consistent with the latest prompt version
- Automatic recovery from API errors (stale data is kept on failure)
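The steps above can be sketched with a worker thread that refreshes a stale entry while callers keep receiving the old value. This is a hypothetical illustration of stale-while-revalidate, not the client's actual implementation:

```python
import threading
import time


class SWRCacheSketch:
    """Illustrative stale-while-revalidate wrapper around a fetch function."""

    def __init__(self, fetch, ttl_seconds):
        self._fetch = fetch          # fetch(key) -> value, e.g. an API call
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        self._entries = {}           # key -> (value, stored_at)
        self._refreshing = set()     # keys with an in-flight refresh

    def get(self, key):
        with self._lock:
            entry = self._entries.get(key)
        if entry is None:
            value = self._fetch(key)  # cold miss: must block once
            with self._lock:
                self._entries[key] = (value, time.monotonic())
            return value
        value, stored_at = entry
        if time.monotonic() - stored_at > self._ttl:
            self._refresh_in_background(key)  # stale: serve old, update later
        return value

    def _refresh_in_background(self, key):
        with self._lock:
            if key in self._refreshing:
                return  # a refresh is already in flight
            self._refreshing.add(key)

        def worker():
            try:
                value = self._fetch(key)
                with self._lock:
                    self._entries[key] = (value, time.monotonic())
            except Exception:
                pass  # API error: keep serving the stale value
            finally:
                with self._lock:
                    self._refreshing.discard(key)

        threading.Thread(target=worker, daemon=True).start()
```

The stale read returns without waiting, and the next read after the worker finishes sees the refreshed value.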
Best practices
if client.prompt_cache.metrics.hit_rate < 0.8:
print("Consider increasing max_size or ttl_seconds")
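The hit rate checked above is simply hits divided by total lookups. A self-contained sketch of such a metrics object (the field names here are assumptions, not the SDK's actual metrics API):

```python
from dataclasses import dataclass


@dataclass
class CacheMetricsSketch:
    """Illustrative counters for cache effectiveness."""
    hits: int = 0
    misses: int = 0

    @property
    def hit_rate(self) -> float:
        """Fraction of lookups served from cache; 0.0 before any lookups."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


m = CacheMetricsSketch(hits=80, misses=20)
print(m.hit_rate)  # 0.8
```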
Async clients (Python)
For async Python clients, use AsyncPromptCache.
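The AsyncPromptCache name comes from the text above; everything else in this sketch (the fetch coroutine, parameter and field names) is illustrative rather than the real class:

```python
import asyncio
import time


class AsyncPromptCacheSketch:
    """Toy async TTL cache (illustrative; not the real AsyncPromptCache)."""

    def __init__(self, fetch, ttl_seconds=300.0):
        self._fetch = fetch      # async def fetch(key) -> value
        self._ttl = ttl_seconds
        self._entries = {}       # key -> (value, stored_at)
        self._lock = asyncio.Lock()

    async def get(self, key):
        async with self._lock:
            entry = self._entries.get(key)
            if entry and time.monotonic() - entry[1] <= self._ttl:
                return entry[0]          # fresh hit: no await on the source
        value = await self._fetch(key)   # miss or expired: await the API
        async with self._lock:
            self._entries[key] = (value, time.monotonic())
        return value


async def main():
    async def fetch(key):
        return f"prompt body for {key}"

    cache = AsyncPromptCacheSketch(fetch, ttl_seconds=60)
    first = await cache.get("my-prompt")
    second = await cache.get("my-prompt")  # served from cache
    return first, second


print(asyncio.run(main()))
```

An `asyncio.Lock` guards the entry map so concurrent coroutines do not interleave reads and writes.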