CRC-32 in Hardware
The 4-byte checksum at the end of every Ethernet frame and inside every ZIP archive and PNG chunk, computed by a 32-bit shift register with XOR feedback taps running at line rate in the NIC on your machine right now.
CRC-32 is not magic. It’s a linear feedback shift register — 32 flip-flops in a chain, with certain outputs XOR’d back into the input. The specific XOR positions are defined by the Ethernet polynomial 0x04C11DB7. Every 1-bit in that 32-bit constant is a wire from a flip-flop output back into the chain.
In this post, you’ll build a 4-bit LFSR, understand how the polynomial defines the wiring, and watch the CRC-32 accumulator process bytes one at a time — the same operation your NIC performs at 25 Gbps.
The Shift Register That Never Repeats (Almost)
A linear feedback shift register (LFSR) is a chain of flip-flops where the input to the first stage is the XOR of certain output stages. On each clock tick, all the bits shift one position, and a new bit is computed at the input from the XOR of the “tap” positions.
With the right choice of tap positions (determined by a primitive polynomial over GF(2)), an n-bit LFSR cycles through every possible non-zero state exactly once before repeating. A 4-bit LFSR with taps at positions 0 and 3 visits all 2^4 − 1 = 15 states. A 32-bit LFSR visits 2^32 − 1, which is over 4 billion.
Click Step below to advance the clock. Watch the four LEDs cycle through all 15 non-zero patterns. The circuit never visits the same pattern twice until it wraps around after 15 steps.
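If you'd rather see the cycle in code, here is a minimal sketch of a 4-bit LFSR with taps at positions 0 and 3. The shift direction and bit numbering are assumptions made for this example, so the order of the patterns may differ from the LEDs in the circuit, but the cycle length should not.

```javascript
// Advance a 4-bit Fibonacci-style LFSR by one clock tick.
// Assumption: bit 0 is the stage that receives the feedback bit,
// and the register shifts toward bit 3.
function lfsrStep(state) {
  const feedback = (state & 1) ^ ((state >> 3) & 1); // XOR of taps 0 and 3
  return ((state << 1) | feedback) & 0xF;            // shift, keep 4 bits
}

// Collect every state visited before the register returns to the seed.
function lfsrCycle(seed) {
  const visited = [];
  let state = seed;
  do {
    visited.push(state);
    state = lfsrStep(state);
  } while (state !== seed);
  return visited;
}

const states = lfsrCycle(0b0001);
console.log(states.map(s => s.toString(2).padStart(4, '0')).join(' '));
console.log(states.length, new Set(states).size); // 15 15
```

The all-zero state is excluded because it is a fixed point: XOR of zeros is zero, so a register that starts at 0000 stays there.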
From LFSR to CRC
CRC-32 is a 32-bit LFSR where the “input” is XOR’d with the incoming data before being fed back. Instead of generating a pseudorandom sequence, you’re computing a checksum: the final register state after clocking in all the data bytes is the CRC. The Ethernet polynomial 0xEDB88320 (the reflected form of 0x04C11DB7) specifies exactly which of the 32 flip-flop outputs get XOR’d back — those are the tap positions.
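To make the "input XOR'd into the feedback" idea concrete, here is a sketch of CRC-32 clocked one bit at a time, using the reflected polynomial. The function names are made up for this example, and the 0xFFFFFFFF initial value and final XOR are the standard CRC-32 conventions, assumed here.

```javascript
const POLY_REFLECTED = 0xEDB88320;

// Clock one data bit into the 32-bit register (reflected, LSB-first form).
function crc32ClockBit(reg, bit) {
  const feedback = (reg ^ bit) & 1;     // incoming bit XOR'd with the outgoing bit
  reg >>>= 1;                           // every stage shifts one place
  if (feedback) reg ^= POLY_REFLECTED;  // feedback is applied at the tap positions
  return reg >>> 0;                     // keep the value unsigned
}

function crc32BitSerial(bytes) {
  let reg = 0xFFFFFFFF;                 // standard initial value
  for (const byte of bytes) {
    for (let i = 0; i < 8; i++) {
      reg = crc32ClockBit(reg, (byte >> i) & 1); // least significant bit first
    }
  }
  return (reg ^ 0xFFFFFFFF) >>> 0;      // standard final XOR
}

const message = Array.from("123456789", ch => ch.charCodeAt(0));
console.log(crc32BitSerial(message).toString(16)); // cbf43926
```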
The Polynomial Is the Wiring Diagram
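The CRC-32 polynomial is usually written as 0x04C11DB7 in normal form, or 0xEDB88320 in reflected (bit-reversed) form. Both represent the same mathematical object, a degree-32 polynomial over GF(2):
x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
The leading x^32 term is implicit in both constants, which store only the 32 lower coefficients.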
The key insight: each term x^k corresponds to a wire. Wherever the polynomial has a 1-coefficient, there is an XOR gate connecting flip-flop k to the feedback path. Wherever it has a 0-coefficient, there is no connection.
Look at the reflected polynomial in binary:
0xEDB88320 = 1110 1101 1011 1000 1000 0011 0010 0000
Count the 1 bits: there are 14 tap positions out of 32. Each one is a physical XOR gate in the NIC’s CRC engine. The polynomial is literally the gate-level schematic.
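You can check the relationship between the two constants yourself. The sketch below reverses the bit order of the normal form and lists the exponents with a 1-coefficient, so the taps can be counted directly. The helper names are hypothetical, chosen for this example.

```javascript
// Reverse the bit order of a 32-bit value (normal form <-> reflected form).
function reflect32(value) {
  let out = 0;
  for (let i = 0; i < 32; i++) {
    out = (out << 1) | ((value >>> i) & 1);
  }
  return out >>> 0;
}

// Exponents k whose coefficient is 1 in the normal-form constant.
// The implicit x^32 term is not stored in the constant, so it is not listed.
function exponents(normalPoly) {
  const ks = [];
  for (let k = 31; k >= 0; k--) {
    if ((normalPoly >>> k) & 1) ks.push(k);
  }
  return ks;
}

const NORMAL = 0x04C11DB7;
const reflected = reflect32(NORMAL);
console.log(reflected.toString(16));                   // edb88320
console.log(reflected.toString(2).padStart(32, '0'));  // the wiring, bit by bit
console.log(exponents(NORMAL).join(' '));              // 26 23 22 16 12 11 10 8 7 5 4 2 1 0
console.log(exponents(NORMAL).length);                 // number of 1-coefficients below x^32
```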
In a real NIC running at 25 Gbps, the CRC-32 engine is unrolled to process multiple bits per clock cycle, but the mathematics is identical. The hardware is fixed at tape-out; there are no microcode decisions, no branch predictions, no cache misses. Just wires and XOR gates, computing the polynomial remainder in a fixed number of gate delays.
Why this specific polynomial?
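A CRC’s error-detection guarantees come from the algebraic properties of its generator polynomial. The polynomial 0x04C11DB7 dates from mid-1970s research and was later adopted for Ethernet by IEEE 802.3. It detects all single-bit errors, all double-bit errors within an Ethernet-sized frame, and all burst errors of 32 bits or fewer. It does not catch every odd number of bit errors: that guarantee requires a factor of (x + 1), and this polynomial, which has 15 terms once you count x^32, does not have one. Ethernet frames, ZIP archives, PNG images, and gzip streams all use this exact same polynomial. It’s been protecting your data for roughly 50 years.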
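A small experiment makes the single-bit, double-bit, and burst guarantees tangible. This is only a sketch on one short message, not a proof: it flips every single bit, every pair of bits, and every solid run of 1 to 32 bits, and counts how many corrupted copies still produce the original CRC. The helper names are assumptions for the example.

```javascript
// Plain bitwise CRC-32 (reflected polynomial, standard init and final XOR).
function crc32(bytes) {
  let crc = 0xFFFFFFFF;
  for (const byte of bytes) {
    crc ^= byte;
    for (let i = 0; i < 8; i++) {
      crc = (crc >>> 1) ^ (crc & 1 ? 0xEDB88320 : 0);
    }
  }
  return (crc ^ 0xFFFFFFFF) >>> 0;
}

// Return a copy of `bytes` with the given bit positions inverted.
function flipBits(bytes, positions) {
  const copy = bytes.slice();
  for (const p of positions) copy[p >> 3] ^= 1 << (p & 7);
  return copy;
}

const original = Array.from("123456789", ch => ch.charCodeAt(0));
const good = crc32(original);
const totalBits = original.length * 8;
let undetected = 0;

// Every single-bit error and every double-bit error.
for (let a = 0; a < totalBits; a++) {
  if (crc32(flipBits(original, [a])) === good) undetected++;
  for (let b = a + 1; b < totalBits; b++) {
    if (crc32(flipBits(original, [a, b])) === good) undetected++;
  }
}

// Solid bursts (every bit in the run flipped) of length 1 to 32.
for (let start = 0; start < totalBits; start++) {
  for (let len = 1; len <= 32 && start + len <= totalBits; len++) {
    const run = Array.from({ length: len }, (_, i) => start + i);
    if (crc32(flipBits(original, run)) === good) undetected++;
  }
}

console.log(undetected); // 0
```

Solid runs are only one kind of burst; a full check would also vary the bits between the first and last flipped positions, which is too many patterns to enumerate here.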
The CRC-32 Accumulator
The standard CRC-32 algorithm processes one byte per clock cycle. Each byte is XOR’d into the low 8 bits of the running CRC register, then shifted through 8 rounds of the polynomial feedback. The result becomes the new register state for the next byte.
The circuit below shows one byte-step of CRC-32 with a register accumulating the state. Change the data byte and click Step to process it. The register’s initial value is 0xFF (truncated from the standard 0xFFFFFFFF init for the 8-bit demo). After all bytes of a message are processed, the final CRC state is XOR’d with 0xFFFFFFFF to produce the checksum you’d find in the Ethernet FCS field.
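Here is the same byte-step written out for the full 32-bit register, as a sketch rather than a model of the 8-bit demo circuit. It prints the register after each byte so you can watch the state accumulate; `crc32StepByte` and `hex` are names invented for this example.

```javascript
const POLY = 0xEDB88320; // reflected CRC-32 polynomial

// One byte-step: XOR the byte into the low 8 bits, then 8 feedback rounds.
function crc32StepByte(reg, byte) {
  reg ^= byte & 0xFF;
  for (let round = 0; round < 8; round++) {
    reg = (reg & 1) ? (reg >>> 1) ^ POLY : reg >>> 1;
  }
  return reg >>> 0;
}

// Fixed-width hex so leading zeros stay visible.
const hex = n => (n >>> 0).toString(16).padStart(8, '0');

let reg = 0xFFFFFFFF;                    // standard initial value
for (const ch of "123456789") {
  reg = crc32StepByte(reg, ch.charCodeAt(0));
  console.log(`after '${ch}': ${hex(reg)}`);
}
const fcs = (reg ^ 0xFFFFFFFF) >>> 0;    // final XOR
console.log(`checksum: ${hex(fcs)}`);    // checksum: cbf43926
```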
Why one byte per cycle?
Processing one bit at a time would mean 8 cycles per byte, which is too slow for multi-gigabit links. Modern NICs unroll the LFSR computation: instead of shifting 8 times sequentially, the hardware computes the effect of all 8 polynomial shifts in parallel and produces the result in a single clock cycle. The same trick extends to wider datapaths. At 100 Gbps, a NIC must compute CRC-32 over roughly 12.5 billion bytes per second, so one byte per cycle would need a 12.5 GHz clock; practical designs unroll further and consume many bytes per cycle. The unrolled hardware does exactly that with a fixed amount of combinational logic: no loops, no branches.
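The closest software analogy to unrolling is a lookup table. This sketch precomputes the result of 8 feedback rounds for each of the 256 possible low bytes, then replaces the inner loop with a single lookup. Real hardware uses XOR trees rather than a table, so treat this as an illustration of the idea, not of the circuit.

```javascript
const POLY = 0xEDB88320;

// Precompute the effect of 8 feedback rounds for every possible low byte.
const TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let r = n;
  for (let k = 0; k < 8; k++) {
    r = (r & 1) ? (r >>> 1) ^ POLY : r >>> 1;
  }
  TABLE[n] = r; // Uint32Array stores the value as unsigned 32-bit
}

// One table lookup per byte replaces the 8 sequential shifts.
function crc32Table(bytes) {
  let crc = 0xFFFFFFFF;
  for (const byte of bytes) {
    crc = TABLE[(crc ^ byte) & 0xFF] ^ (crc >>> 8);
  }
  return (crc ^ 0xFFFFFFFF) >>> 0;
}

const input = Array.from("123456789", ch => ch.charCodeAt(0));
console.log(crc32Table(input).toString(16)); // cbf43926
```

The table works because the feedback is linear: the upper 24 bits of the register simply shift down by 8, and everything the low byte contributes over 8 rounds depends only on that byte.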
Verifying Against the Standard
The canonical CRC-32 test vector is the ASCII string "123456789". Every correct CRC-32 implementation must produce 0xCBF43926. This is how you know your hardware matches the spec that governs every Ethernet chip in the world.
You can verify this from the Simten editor using the crc-32 npm package. Paste the following into the editor on any page:
// CRC-32 test vector: "123456789" → 0xCBF43926
import CRC32 from 'crc-32';

// The standard test vector
const result = CRC32.str("123456789");
console.log((result >>> 0).toString(16)); // cbf43926

// Verify individual bytes
const bytes = [49, 50, 51, 52, 53, 54, 55, 56, 57]; // "1" through "9"
console.log(bytes.map(b => String.fromCharCode(b)).join('')); // 123456789
The >>> 0 converts the signed 32-bit JavaScript integer to an unsigned value before converting to hex. CRC-32 is always an unsigned 32-bit number; JavaScript’s bitwise operators (other than >>>) produce signed 32-bit integers, so the coercion is necessary to print it correctly.
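You can see the difference without any library. This sketch forces the check value through a bitwise operator so it becomes a signed 32-bit integer, then prints it with and without the coercion. `toHex32` is a hypothetical helper for this example, not part of the crc-32 package.

```javascript
// The standard check value, pushed through a bitwise OR so it becomes
// a signed 32-bit integer, the form a CRC routine built on ^ and >> returns.
const signed = 0xCBF43926 | 0;
console.log(signed);                       // -873187034
console.log(signed.toString(16));          // -340bc6da
console.log((signed >>> 0).toString(16));  // cbf43926

// Hypothetical helper: fixed-width hex, so small CRCs keep their leading zeros.
const toHex32 = n => (n >>> 0).toString(16).padStart(8, '0');
console.log(toHex32(0x1234));              // 00001234
```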
Where you’ll find CRC-32 in the wild
- Ethernet frames — the 4-byte FCS (Frame Check Sequence) field at the end of every packet
- ZIP archives — stored in the local file header and central directory for each file
- PNG images — each chunk ends with a CRC-32 of the chunk type and data
- SATA — each frame sent over the link ends with a CRC-32 built from this polynomial (NVMe’s end-to-end protection fields use different CRCs)
- gzip — the trailer of every stream holds a CRC-32 of the uncompressed data (zlib-format streams use Adler-32 instead)