Pipelining & AOF: The Art of Durability
Redis is an in-memory database, which means that if the power goes out, the data vanishes. To prevent this, Redis uses persistence. In this chapter, we'll implement AOF (Append Only File), a journaling mechanism built on fast sequential writes, and explore how pipelining speeds up our networking by sending many commands per network round trip.
1. Intuition: The "Journal" vs "Snapshot"
Why this exists in Redis
If you have 100GB of data, saving the entire dataset to disk every time a single key changes is impractical: each save would take far too long and would block the server.
The Analogy (The Accountant)
- AOF (Append Only File): An accountant's ledger. Every time a transaction happens, they write a new line at the bottom. It's fast and sequential.
- RDB (Snapshot): A photograph of the entire office at 5:00 PM. It's complete, but it doesn't show what happened between 4:00 and 5:00.
Real-world problem solved: Maximum durability with minimal performance impact on the main execution thread.
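To make the analogy concrete, here is a toy sketch in Node.js. The names `set`, `snapshot`, and `replay` are hypothetical and exist only for this illustration; the journal is an in-memory array rather than a real file.

```javascript
// Toy store with both persistence styles (illustrative only).
const store = new Map();
const journal = []; // AOF-style ledger: one entry per write

function set(key, value) {
  store.set(key, value);
  journal.push(['SET', key, value]); // cheap: one append per write
}

// RDB-style photograph: copies everything, so cost grows with dataset size.
function snapshot() {
  return new Map(store);
}

// Recovery from the ledger: replay every entry in order.
function replay(entries) {
  const rebuilt = new Map();
  for (const [op, key, value] of entries) {
    if (op === 'SET') rebuilt.set(key, value);
  }
  return rebuilt;
}

set('a', '1');
const photo = snapshot(); // taken "at 5:00 PM"
set('b', '2');            // happens after the photograph

console.log(photo.has('b'));           // false
console.log(replay(journal).has('b')); // true
```

The snapshot misses the write that arrived after it was taken, while replaying the journal recovers it.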
2. Internal Architecture
How Redis designs this feature
Redis designs AOF around Sequential I/O. Appending to the end of a file is much faster than jumping around to different locations (Random I/O).
Components Involved
- AOF Buffer: A memory space where commands are queued before being flushed to disk.
- OS Page Cache: The operating system's temporary storage for file writes.
- The Disk: The final, permanent destination.
Trade-offs
- Durability vs. Speed: If you wait for the disk to confirm every write (fsync), Redis becomes slow. If you don't wait, you can lose recent writes if the OS crashes or the power fails (about 1 second of data under the everysec policy).
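The trade-off can be sketched as a small writer that takes a policy name. This is a simplified illustration, not how Redis implements it: `createAofWriter` is a hypothetical helper, the policy names mirror Redis's `always`, `everysec`, and `no` settings, and the `everysec` branch here only syncs when a write arrives at least one second after the previous sync.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: returns an append function that applies the given policy.
function createAofWriter(filePath, policy) {
  const fd = fs.openSync(filePath, 'a');
  let lastSync = Date.now();
  let syncCount = 0;

  function append(buf) {
    fs.writeSync(fd, buf); // data lands in the OS page cache
    if (policy === 'always') {
      fs.fsyncSync(fd);    // wait for the disk on every write
      syncCount++;
    } else if (policy === 'everysec' && Date.now() - lastSync >= 1000) {
      fs.fsyncSync(fd);    // at most one sync per second
      syncCount++;
      lastSync = Date.now();
    }
    // policy === 'no': leave flushing entirely to the OS
  }

  return { append, close: () => fs.closeSync(fd), syncs: () => syncCount };
}

const file = path.join(os.tmpdir(), `aof-policy-${process.pid}.aof`);
const w = createAofWriter(file, 'always');
w.append(Buffer.from('*1\r\n$4\r\nPING\r\n'));
w.append(Buffer.from('*1\r\n$4\r\nPING\r\n'));
console.log(w.syncs()); // 2
w.close();
```

With `always`, every append pays for a disk round trip; with `no`, none does, and durability depends on when the OS decides to flush.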
3. End-to-End Flow (VERY IMPORTANT)
Client → TCP → Event Loop → Command Queue → Parser → Execution → Data Store → Response
- Client: Sends a batch of commands (Pipelining).
- Event Loop: Reads the full TCP buffer.
- Command Engine: Executes all commands in the batch sequentially.
- AOF Logger:
  - Takes the raw RESP representation of each write command in the batch.
  - Appends it to the AOF Buffer.
- Kernel: A write() call copies data from the AOF Buffer to the Page Cache.
- Fsync Policy: Depending on setup (e.g., everysec), Redis calls fsync() so that the OS flushes the Page Cache to the physical disk.
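The first half of this flow can be sketched in a few lines: what a pipelined batch looks like on the wire, and how the write commands in it end up in an AOF buffer. `encodeRESP`, `WRITE_COMMANDS`, and `aofBuffer` are names assumed for this example; the set of write commands is deliberately tiny.

```javascript
// Encode one command as a RESP array of bulk strings.
function encodeRESP(args) {
  let out = `*${args.length}\r\n`;
  for (const arg of args) {
    const s = String(arg);
    out += `$${Buffer.byteLength(s)}\r\n${s}\r\n`;
  }
  return Buffer.from(out);
}

const commands = [
  ['SET', 'user:1', 'ada'],
  ['INCR', 'visits'],
  ['GET', 'user:1'],
];

// A pipelined batch is just the encoded commands sent back to back in one write.
const batch = Buffer.concat(commands.map((cmd) => encodeRESP(cmd)));

// Hypothetical AOF buffer: keep only write commands, flush once per batch.
const WRITE_COMMANDS = new Set(['SET', 'INCR', 'DEL']);
const aofBuffer = [];
for (const cmd of commands) {
  if (WRITE_COMMANDS.has(cmd[0])) aofBuffer.push(encodeRESP(cmd));
}
const pendingAofBytes = Buffer.concat(aofBuffer); // one write() for the whole batch

console.log(JSON.stringify(encodeRESP(['SET', 'user:1', 'ada']).toString()));
console.log(aofBuffer.length); // 2 (GET is a read, so it is not logged)
```

Collecting the batch into one buffer means the kernel sees a single write() per batch instead of one per command.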
4. Internals Breakdown
The fsync() Syscall
When you "write" to a file, the OS often just puts it in RAM (Page Cache). To guarantee it's on the platter/flash, we must call fsync().
AOF Rewriting (Copy-on-Write)
If you run INCR counter 1 million times, the AOF holds 1 million entries. A rewrite scans the current in-memory data (the Map in our implementation) and generates a new, minimal file: SET counter 1000000.
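A rewrite over a plain Map could look like the sketch below. It assumes every value is a string and emits one SET per key; `rewriteAOF` and `encodeRESP` are hypothetical names, and the result is kept in memory here rather than written to a file.

```javascript
// Encode one command as a RESP array of bulk strings.
function encodeRESP(args) {
  let out = `*${args.length}\r\n`;
  for (const arg of args) {
    const s = String(arg);
    out += `$${Buffer.byteLength(s)}\r\n${s}\r\n`;
  }
  return Buffer.from(out);
}

// Emit one SET per live key instead of the full command history.
function rewriteAOF(store) {
  const chunks = [];
  for (const [key, value] of store) {
    chunks.push(encodeRESP(['SET', key, value]));
  }
  return Buffer.concat(chunks);
}

// Simulate one million INCR commands against a single key.
const store = new Map();
for (let i = 0; i < 1000000; i++) {
  const current = Number(store.get('counter') || 0);
  store.set('counter', String(current + 1));
}

const rewritten = rewriteAOF(store);
console.log(JSON.stringify(rewritten.toString()));
// "*3\r\n$3\r\nSET\r\n$7\r\ncounter\r\n$7\r\n1000000\r\n"
```

One million logged increments collapse into a single command, because the rewrite reads the final state rather than the history.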
5. Node.js Reimplementation (Hands-on)
Step 1: Pipelining (The Multi-Command Loop)
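// Unparsed bytes carried over between 'data' events (one buffer per connection)
let pending = Buffer.alloc(0);
socket.on('data', (chunk) => {
  // TCP can split a command across chunks, so parse from the accumulated bytes
  pending = Buffer.concat([pending, chunk]);
  let offset = 0;
  while (offset < pending.length) {
    // Assumption: parseRESP returns null when the command at `offset` is incomplete
    const parsed = parseRESP(pending, offset);
    if (!parsed) break;
    const { command, bytesRead } = parsed;
    const response = execute(command);
    // Log to AOF only if it's a write command, and do it before replying
    if (isWrite(command)) logToAOF(pending.subarray(offset, offset + bytesRead));
    socket.write(response);
    offset += bytesRead;
  }
  // Keep any incomplete trailing command for the next chunk
  pending = pending.subarray(offset);
});
Step 2: Persistence Logic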
const fs = require('fs');
// 'a' opens the file for appending and creates it if it does not exist
const aofFd = fs.openSync('appendonly.aof', 'a');
function logToAOF(respBuffer) {
  // 1. Write to OS Page Cache
  fs.writeSync(aofFd, respBuffer);
  // 2. The fsync policy (example: Every Write - VERY SLOW)
  // fs.fsyncSync(aofFd);
}
Step 3: Background Sync (The "Everysec" Rule)
setInterval(() => {
  // Ask the OS to flush this file's data from RAM to disk without blocking the event loop
  fs.fsync(aofFd, (err) => {
    if (err) console.error('AOF fsync failed:', err);
    else console.log('AOF flushed to disk');
  });
}, 1000);
6. Performance, High Concurrency & Backpressure
High Concurrency Behavior
Redis uses AOF rewriting (BGREWRITEAOF) to keep the log small. The rewrite runs in a child process, so the main thread stays free to serve high-concurrency traffic even while gigabytes of data are being written to disk.
Disk Backpressure & Bottlenecks
- Backpressure: If the disk is slow to fsync, the main thread must block (wait) for the disk to confirm. This is "Disk Backpressure".
- Bottlenecks: Sequential I/O speed. No matter how fast Redis is, it can't move faster than the hardware's ability to persist bits to the platter or flash.
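One way to keep a slow disk from stacking up work is to refuse to start a new fsync while the previous one is still running. The sketch below shows that idea only; it is not a description of Redis's exact logic. `createSyncScheduler` is a hypothetical helper, and its second parameter exists so the fsync call can be swapped out when experimenting.

```javascript
const fs = require('fs');

// Hypothetical everysec-style scheduler that never queues overlapping fsync calls.
function createSyncScheduler(fd, fsyncImpl = fs.fsync) {
  let inFlight = false;
  let skipped = 0;

  function tick() {
    if (inFlight) {
      // The disk is still busy with the previous flush: record it and move on.
      skipped++;
      return false;
    }
    inFlight = true;
    fsyncImpl(fd, (err) => {
      inFlight = false;
      if (err) console.error('AOF fsync failed:', err.message);
    });
    return true;
  }

  return { tick, skipped: () => skipped };
}

// Demonstration with a fake "slow disk" that completes only when we say so.
let finishSlowSync = null;
const slowFsync = (fd, callback) => { finishSlowSync = callback; };
const scheduler = createSyncScheduler(0, slowFsync);

console.log(scheduler.tick()); // true  (first fsync started)
console.log(scheduler.tick()); // false (previous fsync still in flight)
finishSlowSync(null);          // the slow disk finally confirms
console.log(scheduler.tick()); // true  (a new fsync can start)
```

In a server, `scheduler.tick` would be called from a one-second timer. A rising `skipped()` count is a signal that the disk cannot keep up with the configured policy.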
7. Redis vs. Our Implementation: What we Simplified
- Copy-on-Write (CoW): Redis uses fork() to let the OS handle the memory snapshot efficiently. We use fs.writeFileSync(), which blocks the entire server in Node.js.
- Incremental Fsync: Redis can fsync in 32MB chunks to avoid a massive "IO Stall" at the end of a big write.
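The incremental idea can be approximated in Node.js by writing a large buffer in slices and calling fsync every so many bytes. This is a minimal sketch under assumptions: `writeWithIncrementalFsync` is a hypothetical helper, it is still synchronous (so it still blocks the event loop), and the sizes in the usage example are scaled down from 32MB to keep it quick.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write `data` in slices, calling fsync after every `syncEvery` bytes written.
// Returns the total number of fsync calls, including the final one.
function writeWithIncrementalFsync(fd, data, syncEvery, sliceSize = 64 * 1024) {
  let offset = 0;
  let sinceSync = 0;
  let syncs = 0;
  while (offset < data.length) {
    const length = Math.min(sliceSize, data.length - offset);
    const written = fs.writeSync(fd, data, offset, length);
    offset += written;
    sinceSync += written;
    if (sinceSync >= syncEvery) {
      fs.fsyncSync(fd); // flush a bounded amount of dirty data
      syncs++;
      sinceSync = 0;
    }
  }
  fs.fsyncSync(fd); // final flush covers whatever remains
  return syncs + 1;
}

const target = path.join(os.tmpdir(), `aof-incremental-${process.pid}.tmp`);
const fd = fs.openSync(target, 'w');
const payload = Buffer.alloc(1024 * 1024, 'x'); // 1 MiB of data
const totalSyncs = writeWithIncrementalFsync(fd, payload, 256 * 1024);
fs.closeSync(fd);
console.log(`fsync was called ${totalSyncs} times`);
```

Each fsync only has to flush the data written since the previous one, so no single call has to push the whole file to disk at once.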
8. Why Redis is Optimized
Redis is optimized for Crash Recovery. Because the format is sequential and append-only, a sudden power loss can only affect the tail of the log, meaning the writes that were not yet fsynced (how many depends on the fsync policy), never the entire database structure.
- Forking: Redis uses the fork() syscall to create a child process for rewriting. This allows the child to see a consistent "snapshot" of memory (Copy-on-Write) while the parent continues serving clients.
- Non-blocking IO: Under the everysec policy, Redis runs fsync on a background thread, so the call doesn't normally block the execution thread.
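Crash recovery with an append-only file amounts to replaying commands from the start and stopping at the first incomplete one. The sketch below assumes the log contains only SET and DEL commands encoded as RESP arrays of bulk strings; `parseCommand` and `replayAOF` are hypothetical names, and malformed input is treated the same as a truncated tail. Redis has its own handling for truncated files (see its aof-load-truncated option), which this does not attempt to reproduce.

```javascript
// Parse one RESP array of bulk strings starting at `offset`.
// Returns null if the buffer ends (or is malformed) before the command is complete.
function parseCommand(buf, offset) {
  let pos = offset;
  function readLine() {
    const end = buf.indexOf('\r\n', pos);
    if (end === -1) return null;
    const line = buf.toString('utf8', pos, end);
    pos = end + 2;
    return line;
  }
  const header = readLine();
  if (header === null || header[0] !== '*') return null;
  const count = Number(header.slice(1));
  const args = [];
  for (let i = 0; i < count; i++) {
    const lenLine = readLine();
    if (lenLine === null || lenLine[0] !== '$') return null;
    const len = Number(lenLine.slice(1));
    if (pos + len + 2 > buf.length) return null; // payload cut off
    args.push(buf.toString('utf8', pos, pos + len));
    pos += len + 2;
  }
  return { command: args, bytesRead: pos - offset };
}

// Rebuild the store by replaying the log; stop at a truncated tail.
function replayAOF(buf) {
  const store = new Map();
  let offset = 0;
  while (offset < buf.length) {
    const parsed = parseCommand(buf, offset);
    if (!parsed) break; // keep everything replayed so far
    const [name = '', key, value] = parsed.command;
    if (name.toUpperCase() === 'SET') store.set(key, value);
    else if (name.toUpperCase() === 'DEL') store.delete(key);
    offset += parsed.bytesRead;
  }
  return { store, validBytes: offset };
}

// Two complete commands followed by one that was cut off mid-write.
const log = Buffer.from(
  '*3\r\n$3\r\nSET\r\n$1\r\na\r\n$1\r\n1\r\n' +
  '*3\r\n$3\r\nSET\r\n$1\r\nb\r\n$1\r\n2\r\n' +
  '*3\r\n$3\r\nSET\r\n$1\r\nc'
);
const recovered = replayAOF(log);
console.log([...recovered.store.keys()]); // [ 'a', 'b' ]
console.log(recovered.validBytes);        // 54
```

Only the command that was being written when the crash happened is lost; `validBytes` marks where the file could be truncated before appending again.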
9. Summary & Key Takeaways
- Append-only is fast because it uses sequential I/O.
- Pipelining reduces network RTT overhead.
- Durability is a spectrum controlled by the fsync policy.
Next Step: We've mastered persistence. Now let's go deep into memory optimization: Redis Objects, Encodings, and the INFO Command.