Speaking RESP: The Redis Serialization Protocol
In the last chapter, we built an Event Loop that can accept TCP connections. But a connection is just a stream of raw bytes. To make it a database, we need a language. Redis speaks RESP (Redis Serialization Protocol).
1. Intuition: Why not JSON?
Why this exists in Redis
If you send `{"command": "SET", "key": "user", "value": "alice"}`, the server has to:
- Scan for matching brackets.
- Handle escaping.
- Allocate memory for the entire string before parsing.
Real-world problem it solves: Parsing overhead. At 100k requests/sec, even a microsecond of JSON parsing creates a massive bottleneck. RESP is designed to be O(1) to find the end of a field and O(N) to parse the value.
The Analogy
RESP is like a pre-measured recipe. Instead of "some flour", it says "500g: [500 grams of flour]". You don't need to look for where the flour ends; you already have the scale ready.
2. Internal Architecture
How Redis designs this feature
RESP is a binary-safe, human-readable (mostly) protocol that uses a Type Prefix and Length Prefix strategy.
Components Involved
- Prefix Decoders: Map characters like `*` or `$` to types.
- Length Parsers: Extract the subsequent byte count so the parser can skip straight to the payload instead of scanning for delimiters.
Trade-offs
- Pros: Extremely simple to implement, extremely fast to parse. Binary safe (you can send images).
- Cons: Not as compact as binary protocols like Protobuf or MessagePack.
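To make the type-prefix idea concrete, here is a sketch of how the five classic RESP2 wire types look, with a tiny dispatcher (the `typeOf` helper is illustrative, not part of the chapter's parser):

```javascript
// The five RESP2 wire types, keyed by their prefix byte.
// Every frame ends with \r\n; bulk strings and arrays add a length prefix.
const examples = {
  '+': '+OK\r\n',                            // Simple String
  '-': '-ERR unknown command\r\n',           // Error
  ':': ':1000\r\n',                          // Integer
  '$': '$5\r\nhello\r\n',                    // Bulk String (length-prefixed, binary safe)
  '*': '*2\r\n$3\r\nfoo\r\n$3\r\nbar\r\n',   // Array of 2 bulk strings
};

// The first byte of any frame tells the parser which decoder to dispatch to.
function typeOf(frame) {
  return { '+': 'simple', '-': 'error', ':': 'integer', '$': 'bulk', '*': 'array' }[frame[0]];
}
```

This single-byte dispatch is exactly the "Prefix Decoder" component above: one table lookup, no scanning.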
3. End-to-End Flow (VERY IMPORTANT)
Client → TCP → Event Loop → Command Queue → Parser → Execution → Data Store → Response
For a simple SET mykey value command:
- Client: Wraps the command in an array: `*3\r\n$3\r\nSET\r\n$5\r\nmykey\r\n$5\r\nvalue\r\n`.
- TCP: Sends the raw bytes to the server.
- Event Loop: Wakes up and reads the buffer.
- Parser:
  - Reads `*`. Knows an array is coming.
  - Reads `3`. Knows it needs to parse 3 more elements.
  - Reads `$`. Knows a bulk string is coming.
  - Reads `3`. Knows exactly 3 bytes of data follow.
- Execution: The parser emits `['SET', 'mykey', 'value']` to the command engine.
- Response: The engine writes back `+OK\r\n`.
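The client-side wrapping in the first step can be sketched as a small encoder (`encodeCommand` is a hypothetical helper, not part of the chapter's code):

```javascript
// Encode a command (array of strings) into a RESP array of bulk strings.
function encodeCommand(args) {
  let out = `*${args.length}\r\n`;
  for (const arg of args) {
    // Use byte length, not character length, so multi-byte UTF-8 stays binary safe.
    out += `$${Buffer.byteLength(arg)}\r\n${arg}\r\n`;
  }
  return out;
}

// encodeCommand(['SET', 'mykey', 'value'])
// → '*3\r\n$3\r\nSET\r\n$5\r\nmykey\r\n$5\r\nvalue\r\n'
```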
4. Internals Breakdown
Data Structures: The Buffer Pointer
In C, Redis uses a read buffer and an offset pointer. It doesn't "split" strings, which would cause memory allocations. It simply looks at the buffer and says: "the value starts at index X and is Y bytes long."
Memory Behavior
Small simple strings are often allocated on the stack or reused from a pool. Bulk strings are allocated on the heap.
5. Node.js Reimplementation (Hands-on)
Step 1: The Setup
We need to handle the fact that TCP packets can be fragmented: one command may arrive split across two `data` events.
```js
class RESPParser {
  constructor() {
    // Accumulates raw bytes across 'data' events
    this.buffer = Buffer.alloc(0);
  }

  feed(data) {
    this.buffer = Buffer.concat([this.buffer, data]);
  }
}
```

Step 2: Core Logic (Parsing Bulk Strings)
Bulk strings are the bread and butter of Redis.
```js
parseBulkString() {
  // Expected format: $5\r\nhello\r\n
  const firstNewline = this.buffer.indexOf('\r\n');
  if (firstNewline === -1) return null; // Wait for more data

  const length = parseInt(this.buffer.slice(1, firstNewline).toString(), 10);
  const totalLength = firstNewline + 2 + length + 2; // header + payload + trailing \r\n
  if (this.buffer.length < totalLength) return null; // Wait for more data

  const value = this.buffer.slice(firstNewline + 2, firstNewline + 2 + length);
  this.buffer = this.buffer.slice(totalLength); // Consume the parsed bytes
  return value.toString();
}
```

Step 3: Command Pipeline
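The pipeline below calls `parser.parse()`, which we haven't written yet. Here is a minimal self-contained sketch, assuming every command arrives as a RESP array of bulk strings (in the chapter's class this method would live alongside `feed()` and `parseBulkString()`):

```javascript
class RESPParser {
  constructor() {
    this.buffer = Buffer.alloc(0);
  }

  feed(data) {
    this.buffer = Buffer.concat([this.buffer, data]);
  }

  // Returns one complete command (array of strings), or null if more bytes are needed.
  parse() {
    if (this.buffer.length === 0 || this.buffer[0] !== 0x2a) return null; // expect '*'
    const headerEnd = this.buffer.indexOf('\r\n');
    if (headerEnd === -1) return null;

    const count = parseInt(this.buffer.slice(1, headerEnd).toString(), 10);
    let offset = headerEnd + 2;
    const parts = [];

    for (let i = 0; i < count; i++) {
      // Each element: $<len>\r\n<payload>\r\n
      const lenEnd = this.buffer.indexOf('\r\n', offset);
      if (lenEnd === -1) return null;
      const len = parseInt(this.buffer.slice(offset + 1, lenEnd).toString(), 10);
      const payloadEnd = lenEnd + 2 + len;
      if (this.buffer.length < payloadEnd + 2) return null; // wait for more data
      parts.push(this.buffer.slice(lenEnd + 2, payloadEnd).toString());
      offset = payloadEnd + 2;
    }

    this.buffer = this.buffer.slice(offset); // consume the parsed command
    return parts;
  }
}
```

Note how every `return null` leaves the buffer untouched: a fragmented command simply waits for the next `feed()` call.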
```js
function handleConnection(socket) {
  const parser = new RESPParser();
  socket.on('data', (data) => {
    parser.feed(data);
    let command;
    while ((command = parser.parse())) {
      // execute(command)
      socket.write('+OK\r\n');
    }
  });
}
```

6. Performance, High Concurrency & Backpressure
High Concurrency Behavior
RESP is designed to be O(1) for finding the end of a bulk string (using the length prefix). This allows Redis to jump through 10,000 commands in a single TCP packet without scanning every byte.
Backpressure & Bottlenecks
- Backpressure: If a client sends a 512MB bulk string (the protocol's maximum), the parser must buffer the whole thing. If many clients do this at once, the server hits its memory limit.
- Bottlenecks: The primary bottleneck is Memory Bandwidth. Moving large strings from the network buffer to the data store consumes the memory bus.
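One cheap defense against the backpressure problem is to cap how much un-parsed data we are willing to buffer per connection. This is a sketch with a hypothetical `BoundedParser` and a configurable limit, loosely mirroring Redis's 512MB `proto-max-bulk-len` default:

```javascript
const MAX_BUFFER = 512 * 1024 * 1024; // default cap, mirroring Redis's 512MB bulk limit

class BoundedParser {
  constructor(limit = MAX_BUFFER) {
    this.limit = limit;
    this.buffer = Buffer.alloc(0);
  }

  // Throws instead of buffering without bound, so one client can't exhaust memory.
  feed(data) {
    if (this.buffer.length + data.length > this.limit) {
      throw new Error('Protocol error: request too large');
    }
    this.buffer = Buffer.concat([this.buffer, data]);
  }
}
```

On a real server you would catch this error, write a RESP error back, and close the connection.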
7. Redis vs. Our Implementation: What we Simplified
- Zero-Copy Parsing: Redis (C) parses the protocol in-place in the read buffer using pointers. Our Node.js implementation uses `Buffer.slice()` and `Buffer.concat()`, which create new memory objects and trigger Garbage Collection.
- Static Buffers: Redis reuses memory blocks for common responses like `+OK\r\n`. We create new strings for every response.
8. Why Redis is Optimized
Redis uses Length-Prefixing instead of Delimiters (like JSON's `"` or `{`). This means the parser knows exactly how many bytes to read before it even starts, allowing for extremely high-speed memory copies (memcpy).
- Optimization: Redis (C) uses `sscanf` and pointer arithmetic. It's zero-copy where possible. Our Node.js implementation uses `Buffer.concat()` and `indexOf()`, which create new objects and can be slower.
- Error Handling: Redis has very specific error types (e.g., `ERR syntax error`). We often simplify into a catch-all.
9. Summary & Key Takeaways
- Length-Prefixing is Key: It makes parsing predictable and memory-safe.
- Stateful Parsing: Essential for handling real-world TCP volatility.
- Binary Safety: Unlike HTTP, RESP doesn't care if your data contains null bytes or special characters.
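Binary safety is easy to demonstrate: because of the length prefix, a payload full of null bytes and embedded `\r\n` sequences round-trips untouched. A sketch using hypothetical `encodeBulk`/`decodeBulk` helpers:

```javascript
// Length-prefix a payload as a RESP bulk string; works for any bytes.
function encodeBulk(payload) {
  const buf = Buffer.isBuffer(payload) ? payload : Buffer.from(payload);
  return Buffer.concat([
    Buffer.from(`$${buf.length}\r\n`),
    buf,
    Buffer.from('\r\n'),
  ]);
}

// Decode it back: read the declared length, then copy exactly that many bytes.
// The payload's own \r\n bytes are never mistaken for the terminator.
function decodeBulk(frame) {
  const headerEnd = frame.indexOf('\r\n');
  const len = parseInt(frame.slice(1, headerEnd).toString(), 10);
  return frame.slice(headerEnd + 2, headerEnd + 2 + len);
}

// A payload containing null bytes and an embedded \r\n survives the round trip.
const nasty = Buffer.from([0x00, 0x0d, 0x0a, 0xff]);
```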
Next Step: Now that we can speak the protocol, let's implement the Storage Engine: GET, SET, and the Mystery of Key Expiration.