
Under the Hood: The Secrets of Redis Data Structures

Why does Redis use a custom string implementation instead of C's char*? Why are small lists stored differently than large ones? The answer is always: Memory Efficiency and CPU Cache Performance. In this chapter, we go deep into the bytes.


1. Intuition: The Pointer Tax

Why this exists in Redis

In a standard Linked List, every element has a "next" and "prev" pointer. On a 64-bit system, that's 16 bytes of overhead per element just for links. If your data is only 4 bytes (an integer), you're wasting 80% of your memory on pointers.

Real-world problem it solves: Reducing memory fragmentation and pointer overhead to fit more data into expensive RAM.

The Analogy (The Suitcase)

  • Standard List: 100 small boxes, each with a long string tied to the next box. Very flexible, but the strings take up more space than the boxes.
  • Redis Ziplist: A single suitcase where 100 items are packed tightly against each other. No strings, just items side-by-side.

2. Internal Architecture

How Redis designs this feature

Redis uses Data Evolution (Promotion). It starts with "Small" versions of structures and promotes them to "Large" versions only when data hits a threshold.

Components Involved

  • Small (Ziplist/Intset): Packed memory, $O(N)$ search but extremely low RAM. No pointers.
  • Large (Quicklist/Hashtable): Pointer-based, $O(1)$ or $O(\log N)$ search, higher RAM.
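The promotion thresholds are tunable in redis.conf. The option names below are the Redis 7 spellings (earlier versions say "ziplist" where newer ones say "listpack"); the values shown are the usual defaults:

```
# Integer-only sets stay an intset up to this many members
set-max-intset-entries 512

# Small hashes stay packed until either limit is exceeded
hash-max-listpack-entries 128
hash-max-listpack-value 64

# Same idea for sorted sets
zset-max-listpack-entries 128
zset-max-listpack-value 64
```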

Trade-offs: Memory Locality vs. Speed

CPUs load data in "Cache Lines" (typically 64 bytes). Contiguous structures (Ziplists) fit neatly into cache lines and are faster for small data. Pointer-based structures (Linked Lists, Hash Tables) cause "Cache Misses", but their better algorithmic complexity (e.g. $O(1)$ hash lookups) wins on massive datasets.
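The effect is visible even from JavaScript. This is a rough sketch rather than a rigorous benchmark (timings vary by machine and V8 version), but the contiguous traversal typically wins by a wide margin:

```javascript
// Sum one million small integers stored two ways:
// contiguously (cache-friendly) vs. as heap-allocated linked nodes.
const N = 1_000_000;

// Contiguous: one block of memory, sequential access
const packed = new Int32Array(N).map((_, i) => i & 0xff);

// Pointer-based: one object per element, scattered across the heap
let head = null;
for (let i = N - 1; i >= 0; i--) head = { value: i & 0xff, next: head };

console.time("packed");
let sumPacked = 0;
for (let i = 0; i < N; i++) sumPacked += packed[i];
console.timeEnd("packed");

console.time("linked");
let sumLinked = 0;
for (let node = head; node !== null; node = node.next) sumLinked += node.value;
console.timeEnd("linked");

console.log(sumPacked === sumLinked); // prints true -- same data, different layout
```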


3. End-to-End Flow: Adding to a Set

Client → TCP → Event Loop → Command Queue → Parser → Execution → Data Store → Response

  1. Client: Sends SADD myset 100.

  2. Server: Checks if myset exists. If not, it creates a new Set Object.

  3. Heuristic: Since 100 is an integer, it initializes an Intset.

  4. Insertion: The value is added to a sorted array of 16-bit integers (the smallest encoding that fits; Redis upgrades the whole array to 32- or 64-bit slots if a larger value arrives).

  5. Growth: If you then send SADD myset "hello", Redis realizes a sorted array of integers can't hold a string.

  6. Promotion: It converts the Intset into a Hash Table ($O(1)$ lookup). The conversion itself is $O(N)$, since every element must be rehashed.
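You can watch this flow happen with redis-cli and the OBJECT ENCODING command. The exact encoding names depend on the server version: recent Redis reports "listpack" for small non-integer sets, while older versions jump straight to "hashtable":

```
127.0.0.1:6379> SADD myset 100
(integer) 1
127.0.0.1:6379> OBJECT ENCODING myset
"intset"
127.0.0.1:6379> SADD myset "hello"
(integer) 1
127.0.0.1:6379> OBJECT ENCODING myset
"listpack"
```

Once the set outgrows the packed-encoding thresholds, OBJECT ENCODING reports "hashtable".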


4. Internals Breakdown

SDS (Simple Dynamic Strings)

Unlike C strings, which end in \0 (so finding the length is an $O(N)$ scan), SDS stores a header in front of the bytes: [LEN] [FREE] [DATA] \0

  • O(1) Length: STRLEN simply reads the LEN field.
  • Binary Safe: The payload can contain \0 anywhere, because Redis trusts LEN rather than a terminator.

Skiplists (The Sorted Set Magic)

Redis Sorted Sets (ZSets) use a Skiplist: a multi-layer linked list that allows $O(\log N)$ search without the complex rebalancing logic of a Red-Black tree.
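A toy skiplist makes the idea concrete. This sketch covers only the search/insert mechanics; Redis's real zskiplist additionally stores a score per node, a backward pointer, and per-link "span" counts for rank queries:

```javascript
class SkipNode {
  constructor(value, level) {
    this.value = value;
    this.forward = new Array(level + 1).fill(null); // one "next" per layer
  }
}

class SkipList {
  constructor(maxLevel = 16, p = 0.25) {
    this.maxLevel = maxLevel;
    this.p = p;       // chance of adding another layer (Redis also uses 0.25)
    this.level = 0;   // highest layer currently in use
    this.head = new SkipNode(-Infinity, maxLevel);
  }

  randomLevel() {
    let lvl = 0;
    while (Math.random() < this.p && lvl < this.maxLevel) lvl++;
    return lvl;
  }

  insert(value) {
    const update = new Array(this.maxLevel + 1).fill(this.head);
    let node = this.head;
    // Walk down the layers, recording the last node before the insert point
    for (let i = this.level; i >= 0; i--) {
      while (node.forward[i] && node.forward[i].value < value) node = node.forward[i];
      update[i] = node;
    }
    const lvl = this.randomLevel();
    if (lvl > this.level) this.level = lvl;
    const fresh = new SkipNode(value, lvl);
    for (let i = 0; i <= lvl; i++) {   // splice into every layer it reaches
      fresh.forward[i] = update[i].forward[i];
      update[i].forward[i] = fresh;
    }
  }

  has(value) {
    let node = this.head;
    for (let i = this.level; i >= 0; i--) {   // drop a layer when we'd overshoot
      while (node.forward[i] && node.forward[i].value < value) node = node.forward[i];
    }
    node = node.forward[0];
    return node !== null && node.value === value;
  }
}
```

Search starts at the sparsest layer and drops down whenever the next node would overshoot, skipping long runs of the bottom list, which is where the $O(\log N)$ comes from.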


5. Node.js Reimplementation (Hands-on)

Step 1: Simulating SDS with Buffer

class SDS {
  constructor(str) {
    this.buf = Buffer.from(str);
    this.len = this.buf.length;
    this.free = 0; // We could pre-allocate for growth
  }

  get length() { return this.len; }
}

Step 2: Implementation of a Sorted Intset

class Intset {
  constructor() {
    this.data = new Int32Array(0); // Contiguous memory, zero pointers
  }

  add(val) {
    // Step 1: Binary search for the insertion point (array stays sorted)
    let lo = 0, hi = this.data.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.data[mid] < val) lo = mid + 1;
      else hi = mid;
    }
    if (this.data[lo] === val) return; // already a member
    // Step 2: Allocate a new Int32Array one slot larger
    const next = new Int32Array(this.data.length + 1);
    // Step 3: Copy old data around the new value -- O(N), but no pointer overhead
    next.set(this.data.subarray(0, lo), 0);
    next[lo] = val;
    next.set(this.data.subarray(lo), lo + 1);
    this.data = next;
  }
}

Step 3: The Encoding Switch

class RedisSet {
  constructor() {
    this.internal = new Intset();
    this.isHash = false;
  }

  promoteToHash() {
    // Copy every packed integer into a pointer-based structure
    this.internal = new Set(this.internal.data);
    this.isHash = true;
  }

  add(val) {
    if (!this.isHash && !Number.isInteger(val)) {
      this.promoteToHash(); // a string can't live in an Intset
    }
    this.internal.add(val); // both Intset and Set expose add()
  }
}

6. Performance, High Concurrency & Backpressure

Data structures (SDS, Skiplist, Ziplist)

  • SDS: Pre-calculates length to avoid $O(N)$ strlen calls.
  • Ziplists: Eliminate pointer overhead for small collections.
  • Skiplists: Enable $O(\log N)$ range queries in Sorted Sets.

Bottlenecks & Scaling

  • Bottleneck: CPU Cache Misses. Pointer-based structures cause the CPU to wait for RAM.
  • Scaling: As collections grow, they are "Promoted" from Ziplists to linked structures. This promotion is a blocking $O(N)$ operation that can cause a momentary latency spike.

7. Redis vs. Our Implementation: What We Simplified

  • Intsets: Redis bit-packs integers (16, 32, or 64 bits, upgrading as needed). Plain JavaScript numbers are always 64-bit floats.
  • Quicklists: Redis uses a "List of Ziplists" to get the speed of pointers with the density of packed memory. We use standard JS Arrays/Maps.

8. Why Redis is Optimized

Redis is optimized for Memory Locality. By keeping small data together in contiguous memory blocks, it ensures that the CPU spends its time "Thinking" rather than "Waiting" for bytes to arrive from RAM.

  • Pointers: Redis (C) manages every byte. In Node.js, even a Buffer or TypedArray has some hidden object overhead from the V8 engine.
  • TypedArrays: Int32Array is the closest we get to Redis' contiguous memory optimizations in JavaScript.

9. Summary & Key Takeaways

  • Specialized structures save RAM: Don't use a Hash Table if a Sorted Array will do.
  • Locality matters: Contiguous memory is faster for the CPU.
  • SDS is safer and faster than C's native strings.

Next Step: You've mastered the internals! Now let's explore Clustering, Replication, and High Availability.