Under the Hood: The Secrets of Redis Data Structures
Why does Redis use a custom string implementation instead of C's char*? Why are small lists stored differently than large ones? The answer is always: Memory Efficiency and CPU Cache Performance. In this chapter, we go deep into the bytes.
1. Intuition: The Pointer Tax
Why this exists in Redis
In a standard Linked List, every element has a "next" and "prev" pointer. On a 64-bit system, that's 16 bytes of overhead per element just for links. If your data is only 4 bytes (an integer), you're wasting 80% of your memory on pointers.
Real-world problem it solves: Reducing memory fragmentation and pointer overhead to fit more data into expensive RAM.
The Analogy (The Suitcase)
- Standard List: 100 small boxes, each with a long string tied to the next box. Very flexible, but the strings take up more space than the boxes.
- Redis Ziplist: A single suitcase where 100 items are packed tightly against each other. No strings, just items side-by-side.
2. Internal Architecture
How Redis designs this feature
Redis uses Data Evolution (Promotion). It starts with "Small" versions of structures and promotes them to "Large" versions only when data hits a threshold.
Components Involved
- Small (Ziplist/Intset): Packed memory, $O(N)$ search but extremely low RAM. No pointers.
- Large (Quicklist/Hashtable): Pointer-based, $O(1)$ or $O(log N)$ search, higher RAM.
Trade-offs: Memory Locality vs. Speed
CPUs load data in "Cache Lines" (64 bytes). Contiguous structures (Ziplists) fit in cache lines and are faster for small data. Pointer-based structures (Linked Lists) cause "Cache Misses" but are faster for massive datasets ($O(1)$).
3. End-to-End Flow: Adding to a Set
Client → TCP → Event Loop → Command Queue → Parser → Execution → Data Store → Response
-
Client: Sends
SADD myset 100. -
Server: Checks if
mysetexists. If not, it creates a new Set Object. -
Heuristic: Since
100is an integer, it initializes an Intset. -
Insertion: The value is added to a sorted array of 16-bit integers.
-
Growth: If you then send
SADD myset "hello", Redis realizes a sorted array of integers can't hold a string. -
Promotion: It converts the Intset into a Hash Table ($O(1)$ lookup) instantly.
4. Internals Breakdown
SDS (Simple Dynamic Strings)
Unlike C strings which end in \0 (requiring O(N) to find length), SDS stores:
[LEN] [FREE] [DATA] \0
- O(1) Length: Fast
STRLEN. - Binary Safe: Can contain
\0because it relies on theLENfield.
Skiplists (The Sorted Set Magic)
Redis Sorted Sets (ZSets) use a Skiplist. It's a multi-layer linked list that allows $O(log N)$ search without the complex rebalancing logic of a Red-Black tree.
5. Node.js Reimplementation (Hands-on)
Step 1: Simulating SDS with Buffer
class SDS {
constructor(str) {
this.buf = Buffer.from(str);
this.len = this.buf.length;
this.free = 0; // We could pre-allocate for growth
}
get length() { return this.len; }
}Step 2: Implementation of a Sorted Intset
class Intset {
constructor() {
this.data = new Int32Array(0); // Contiguous memory
}
add(val) {
// Step 1: Find insertion point (binary search)
// Step 2: Create NEW Int32Array with +1 size
// Step 3: Copy old data and insert new val
// This is O(N) but 0 pointer overhead!
}
}Step 3: The Encoding Switch
class RedisSet {
constructor() {
this.internal = new Intset();
this.isHash = false;
}
add(val) {
if (!this.isHash && !Number.isInteger(val)) {
this.promoteToHash();
}
this.internal.add(val);
}
}6. Performance, High Concurrency & Backpressure
Data structures (SDS, Skiplist, Ziplist)
- SDS: Pre-calculates length to avoid $O(N)$
strlencalls. - Ziplists: Eliminates pointer overhead for small collections.
- Skiplists: Enables $O(log N)$ range queries in Sorted Sets.
Bottlenecks & Scaling
- Bottleneck: CPU Cache Misses. Pointer-based structures cause the CPU to wait for RAM.
- Scaling: As collections grow, they are "Promoted" from Ziplists to linked structures. This promotion is a blocking $O(N)$ operation that can cause a momentary latency spike.
7. Redis vs. Our Implementation: What we Simplified
- Intsets: Redis uses bit-packing for integers (16, 32, or 64 bits). JavaScript numbers are always 64-bit floats.
- Quicklists: Redis uses a "List of Ziplists" to get the speed of pointers with the density of packed memory. We use standard JS Arrays/Maps.
8. Why Redis is Optimized
Redis is optimized for Memory Locality. By keeping small data together in contiguous memory blocks, it ensures that the CPU spends its time "Thinking" rather than "Waiting" for bytes to arrive from RAM.
- Pointers: Redis (C) manages every byte. In Node.js, even a
BufferorTypedArrayhas some hidden object overhead from the V8 engine. - TypedArrays:
Int32Arrayis the closest we get to Redis' contiguous memory optimizations in JavaScript.
8. Summary & Key Takeaways
- Specialized structures save RAM: Don't use a Hash Table if a Sorted Array will do.
- Locality matters: Contiguous memory is faster for the CPU.
- SDS is safer and faster than C's native strings.
Next Step: You've mastered the internals! Now let's explore Clustering, Replication, and High Availability.