LESSON 04
FINAL LESSON
Software Architecture Basics
Caching and Performance
The fastest code is code that never runs. The fastest query is a query you never make.
11 min read
Caching is storing the result of an expensive operation so you can reuse it instead of recalculating it every time. When a user requests their profile, you query the database once, cache the result in memory, and serve the cached version to the next 1000 requests. Caching trades memory for speed and freshness for performance. Every cache introduces a new problem: when does the cached data become stale, and how do you invalidate it?
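The profile example above can be sketched in a few lines. This is a minimal read-through cache, where `FAKE_DB` and `get_profile` are hypothetical stand-ins for a real database and data-access layer:

```python
# Minimal read-through cache sketch. FAKE_DB simulates a slow database;
# db_calls counts how many times we actually "hit" it.
db_calls = 0
FAKE_DB = {42: {"name": "Ada", "email": "ada@example.com"}}
cache = {}

def get_profile(user_id):
    """Serve from cache; fall back to the (simulated) database on a miss."""
    global db_calls
    if user_id in cache:
        return cache[user_id]      # cache hit: no database work
    db_calls += 1                  # cache miss: query the database once
    profile = FAKE_DB[user_id]
    cache[user_id] = profile       # store the result for future requests
    return profile

for _ in range(1000):
    get_profile(42)

print(db_calls)  # prints 1 — one query served 1000 requests
```

Note what this sketch deliberately omits: there is no invalidation at all, so if the underlying row changes, this cache serves stale data forever. That gap is exactly what the rest of the lesson is about.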
There are cache layers at every level of the stack. The browser caches static assets like images and CSS. A CDN (content delivery network) caches entire pages at edge locations near users. Your application server caches database query results in memory. Your database caches frequently accessed rows. Each layer has different invalidation semantics and different costs. Understanding which layer to cache at determines whether your optimization works or creates bugs.
The old joke goes that the two hardest problems in computer science are cache invalidation, naming things, and off-by-one errors. Cache invalidation is hard because deciding when cached data is no longer valid requires understanding all the ways data can change. If you cache a user's profile and they update their email, you must invalidate the cache. If you forget to invalidate, users see stale data. If you invalidate too aggressively, you lose the performance benefit.
Time-to-live (TTL) is the simplest invalidation strategy. Cached data expires after a fixed duration — 60 seconds, 5 minutes, 1 hour. The cache automatically refreshes when the TTL expires. This works well when you can tolerate stale data for the duration of the TTL. It fails when you need immediate consistency — if a user changes their password, you cannot wait 5 minutes for the cache to expire before enforcing the new password.
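A TTL cache can be sketched as a dictionary that stores an expiry timestamp alongside each value. The class below is an illustrative minimal implementation, not a production cache (real ones, like Redis, also bound memory and evict under pressure):

```python
import time

class TTLCache:
    """Entries expire a fixed number of seconds after being written."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expiry timestamp)

    def set(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self.store[key]   # expired: treat as a cache miss
            return None
        return value

cache = TTLCache(ttl_seconds=0.1)
cache.set("profile:42", {"email": "ada@example.com"})
assert cache.get("profile:42") is not None   # fresh: served from cache
time.sleep(0.15)
assert cache.get("profile:42") is None       # expired: caller must reload
```

The key design point is visible in `get`: expiry is checked lazily on read, so nothing needs a background sweeper, but stale entries also linger in memory until someone asks for them.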
Write-through and write-behind are strategies for keeping caches in sync with the database. Write-through updates both the cache and the database simultaneously — slower writes but guaranteed consistency. Write-behind updates the cache immediately and writes to the database asynchronously — faster writes but risk of data loss if the system crashes before the database write completes. The choice depends on whether you prioritize speed or durability.
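The two strategies differ only in when the database write happens. A minimal sketch, using plain dictionaries as hypothetical stand-ins for the cache and the durable store:

```python
from collections import deque

database = {}          # stand-in for the durable store
cache = {}
write_queue = deque()  # pending database writes for write-behind

def write_through(key, value):
    """Synchronous: both stores are updated before the write is acknowledged."""
    database[key] = value   # slower, but cache and database never diverge
    cache[key] = value

def write_behind(key, value):
    """Asynchronous: cache updated now, database write queued for later."""
    cache[key] = value
    write_queue.append((key, value))   # lost if we crash before flushing

def flush():
    """Drain queued writes to the database (would run in the background)."""
    while write_queue:
        key, value = write_queue.popleft()
        database[key] = value
```

The durability risk is visible in the code: between `write_behind` and `flush`, the new value exists only in memory. A crash in that window loses the write, which is the trade the lesson describes.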
The cache hit ratio measures how often requests are served from the cache versus hitting the underlying data source. A 95% hit ratio means 19 out of 20 requests avoid the database. A 50% hit ratio means caching is barely helping. Low hit ratios indicate either the wrong things are being cached or the cache is too small and evicting frequently accessed data. Monitoring hit ratios is how you know if your caching strategy works.
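Measuring the hit ratio only requires counting hits and misses at the cache boundary. A small instrumented wrapper, sketched for illustration:

```python
class InstrumentedCache:
    """Wrap a dict and count hits vs. misses to report a hit ratio."""
    def __init__(self):
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        value = load(key)          # fall back to the slow source
        self.store[key] = value
        return value

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

cache = InstrumentedCache()
for _ in range(20):
    cache.get("homepage", load=lambda key: "<html>...</html>")

print(cache.hit_ratio())  # prints 0.95 — 1 miss, then 19 hits out of 20
```

In practice you would export these counters to your metrics system rather than print them; production caches like Redis and Memcached expose hit and miss counts directly.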
Thundering herd is the problem that occurs when a popular cache entry expires and suddenly thousands of requests hit the database simultaneously trying to regenerate it. If the database cannot handle the spike, it crashes, making the problem worse. The solution is cache stampede protection: the first request to detect the expired cache locks the key, regenerates the value, and updates the cache while other requests wait for the updated value instead of querying the database.
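Stampede protection can be sketched with a per-key lock: the first request through the lock regenerates the value, and everyone queued behind it finds the refreshed cache instead of hitting the database. This single-process sketch uses threads; a real distributed system would use a shared lock (for example, in Redis) instead:

```python
import threading
import time

cache = {}
locks_guard = threading.Lock()  # protects the per-key lock table
key_locks = {}

def get_or_regenerate(key, regenerate):
    """Only one caller regenerates an expired key; the rest wait and reuse it."""
    if key in cache:
        return cache[key]
    with locks_guard:
        lock = key_locks.setdefault(key, threading.Lock())
    with lock:
        if key in cache:            # another thread already refilled it
            return cache[key]
        value = regenerate()        # exactly one expensive call
        cache[key] = value
        return value

calls = 0
def expensive_query():             # hypothetical slow database query
    global calls
    time.sleep(0.05)
    calls += 1
    return "rendered page"

threads = [threading.Thread(target=get_or_regenerate,
                            args=("popular-page", expensive_query))
           for _ in range(50)]
for t in threads: t.start()
for t in threads: t.join()

print(calls)  # prints 1 — 50 concurrent requests, one database query
```

The double-check inside the lock is essential: without it, every waiting thread would still regenerate the value one after another once the lock freed up, serializing the herd rather than eliminating it.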
Not everything should be cached. Data that changes frequently or is unique per user often costs more to cache than it saves. Caching a user-specific recommendation feed that is regenerated every time they log in provides no benefit — you still have to compute it. Caching a homepage that changes once per day and is viewed by millions provides massive benefit. The rule is: cache what is expensive to compute and accessed frequently by many users.
Cache for performance. Invalidate for correctness. Get invalidation wrong and your cache becomes a liability.
TERMS
Term of focus
Cache
A high-speed storage layer that stores copies of frequently accessed data to reduce latency and load on slower storage systems (databases, APIs, disk). Caches trade memory for speed — reads are fast but writes are complex because you must keep the cache in sync with the source of truth. Effective caching can reduce database load by 90% or more.
Cache invalidation
The process of removing or updating stale data in a cache when the underlying data changes. Invalidation is the hardest part of caching because you must track all the ways data can change and ensure the cache reflects those changes. Incorrect invalidation leads to users seeing outdated information, which can be worse than having no cache at all.
Time-to-live (TTL)
The duration a cached item remains valid before automatically expiring. A TTL of 60 seconds means the cache serves stale data for up to 60 seconds after the source data changes. Short TTLs reduce staleness but increase cache misses. Long TTLs improve performance but increase staleness. Setting the right TTL requires understanding how fresh your data needs to be.
Cache hit ratio
The percentage of requests served from the cache versus requests that miss the cache and hit the underlying data source. A 95% hit ratio means 95% of requests avoid the database. Hit ratio is the primary metric for cache effectiveness. A low hit ratio indicates the cache is undersized, evicting too aggressively, or caching the wrong data.
CDN (content delivery network)
A geographically distributed network of servers that cache static content (images, CSS, JavaScript, videos) close to users. When a user in Tokyo requests your site, the CDN serves assets from a Tokyo server instead of your US origin server, reducing latency from 200ms to 20ms. CDNs handle cache invalidation via versioned URLs or cache purge APIs.
Write-through / write-behind
Two strategies for keeping caches synchronized with the database. Write-through writes to both cache and database simultaneously, ensuring consistency at the cost of slower writes. Write-behind updates the cache immediately and queues the database write for later, improving write performance but risking data loss if the system crashes before the write completes.
Thundering herd
A failure mode where many requests simultaneously attempt to regenerate an expired cache entry, overwhelming the database. Occurs when a popular cache key expires and hundreds of requests hit the database at once instead of being served from cache. Prevented by cache stampede protection — ensuring only one request regenerates the cache while others wait.
BEFORE YOUR NEXT MEETING
— What is our cache hit ratio for our most expensive database queries, and if it is below 80%, do we know why?
— When a user updates their profile, how do we ensure the cached version is invalidated across all cache layers — application, CDN, browser?
— If our most popular cached item expired during peak traffic, would our database survive the thundering herd, or do we have stampede protection in place?
— Can you walk me through what happens when we deploy new code that changes a cached data format — do we have a cache versioning strategy?
— What is the TTL for our most critical cached data, and how did we decide that number — was it measured or guessed?
REALITY CHECK
SOURCES
↗Phil Karlton — "There are only two hard things in Computer Science" (attributed)
↗Cloudflare — "What is a CDN?" (2023)
↗Redis Documentation — "Caching Strategies"
↗Facebook Engineering — "Scaling Memcache at Facebook" (2013)
↗High Scalability — "Cache Stampede Protection"
↗Martin Kleppmann — "Designing Data-Intensive Applications: Chapter 3" (2017)
LESSON 04 OF 04