Building a CPU from Scratch
From a single NAND gate to a working 6502 processor running C code — every circuit is live and interactive. Click the switches. Watch the signals propagate. Build intuition for how computers actually work.
Starting from Nothing: The NAND Gate
Every computer ever built — from the Apollo Guidance Computer to the M4 chip in your MacBook — can be constructed from a single type of logic gate: NAND.
A NAND gate outputs 0 only when both its inputs are 1. That’s it. From this one building block, we can create every other logic gate, and from those gates, an entire computer.
Let’s start by building the basic gates. Click the switches to toggle inputs and watch the output LED respond.
NOT — The Inverter
Wire both NAND inputs together. When the input is 1, both NAND inputs are 1, so the output is 0. Inversion!
AND Gate
NAND followed by NOT. The double negation cancels out, giving us a gate that outputs 1 only when both inputs are 1.
OR Gate
De Morgan’s theorem in action: NOT each input, then NAND the results. The output is 1 when either input is 1.
XOR — Exclusive OR
The “difference detector” — outputs 1 only when inputs are different. Built from 4 NAND gates. This one is essential for arithmetic.
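If you prefer reading code to clicking switches, the four constructions above can be sketched in a few lines of Python. This is only an illustration, not the code behind the interactive circuits. The function names (`nand`, `not_`, `and_`, `or_`, `xor`) are made up for this example.

```python
def nand(a: int, b: int) -> int:
    """Output 0 only when both inputs are 1."""
    return 0 if (a == 1 and b == 1) else 1

def not_(a: int) -> int:
    # Both NAND inputs wired together.
    return nand(a, a)

def and_(a: int, b: int) -> int:
    # NAND followed by NOT.
    return not_(nand(a, b))

def or_(a: int, b: int) -> int:
    # De Morgan: NOT each input, then NAND the results.
    return nand(not_(a), not_(b))

def xor(a: int, b: int) -> int:
    # The classic 4-NAND construction; n is shared by two of the gates.
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

# Print the truth table: a, b, AND, OR, XOR
for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_(a, b), or_(a, b), xor(a, b))
```

Every function bottoms out in `nand`, which is the whole point: nothing else is needed.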
Composition: Building Arithmetic
Now for the magic trick of digital design: composition. We take the gates we just built and wire them together into bigger circuits. Those bigger circuits become building blocks for even bigger ones.
Let’s build an adder — the circuit that lets a CPU do math.
Half Adder
Adds two single bits. The sum output is the XOR (are the bits different?), and the carry output is the AND (are both bits 1?). Try it: 1+1 = 10 in binary — sum is 0, carry is 1.
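As a rough sketch, here is the half adder in Python. The `^` and `&` operators stand in for the XOR and AND gates, and the name `half_adder` is just a label chosen for this example.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Add two single bits. Returns (sum, carry)."""
    total = a ^ b   # XOR: are the bits different?
    carry = a & b   # AND: are both bits 1?
    return total, carry

# 1 + 1 = 10 in binary: sum is 0, carry is 1
print(half_adder(1, 1))
```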
Full Adder
The real workhorse. A full adder handles three inputs: A, B, and a carry-in from the previous column. Chain 8 of these together and you can add two bytes. Chain 64 and you have, in principle, the adder in a modern CPU, although real designs use faster carry schemes than a simple chain.
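One common way to build a full adder is from two half adders plus an OR gate. The sketch below assumes that construction; the circuit on this page may be wired differently. Python's bitwise operators stand in for the gates, and the function names are made up for this example.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Returns (sum, carry) for two single bits."""
    return a ^ b, a & b

def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Add three single bits. Returns (sum, carry_out)."""
    partial, carry1 = half_adder(a, b)            # add A and B
    total, carry2 = half_adder(partial, carry_in)  # then add the carry-in
    return total, carry1 | carry2                  # either stage can carry

# Check every input combination against ordinary integer addition.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            s, cout = full_adder(a, b, c)
            print(a, b, c, "->", cout, s)
```

The two carries can never both be 1 at once, which is why a plain OR is enough to merge them.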
Multiplexer
A data selector: the sel switch chooses which input (A or B) passes through to the output. Muxes are everywhere in CPUs — they’re how the control unit routes data between components.
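In gate terms, a 2-to-1 mux is two ANDs, a NOT, and an OR. The sketch below assumes sel = 0 picks A and sel = 1 picks B; the interactive circuit may use the opposite convention.

```python
def mux(a: int, b: int, sel: int) -> int:
    """2-to-1 multiplexer. Assumed convention: sel=0 passes A, sel=1 passes B."""
    not_sel = 1 - sel                 # NOT gate on the select line
    return (a & not_sel) | (b & sel)  # only one AND can be "open" at a time

print(mux(1, 0, 0))  # sel=0 selects A
print(mux(1, 0, 1))  # sel=1 selects B
```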
Memory: Teaching Circuits to Remember
Everything so far has been combinational — the outputs depend only on the current inputs. But a computer needs to remember things. To store a bit, we need feedback: a circuit whose output connects back to its own input.
This is where the clock enters the picture. Sequential circuits use a clock signal to synchronize state changes. Click the Tick button to advance the clock by one cycle.
SR Latch
The simplest memory cell: two NOR gates cross-coupled. Toggle S (Set) to store a 1, toggle R (Reset) to clear it. Notice how the output stays after you release the input — that’s memory!
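Feedback is awkward to express as a plain function, because the output depends on itself. One way to sketch it, shown here as an assumption and not necessarily how the page's simulator works, is to re-evaluate the two NOR gates until the outputs stop changing. The class name `SRLatch` is made up for this example.

```python
def nor(a: int, b: int) -> int:
    return 0 if (a or b) else 1

class SRLatch:
    """Two cross-coupled NOR gates, settled by repeated evaluation."""

    def __init__(self) -> None:
        self.q, self.q_bar = 0, 1  # start in the "reset" state

    def update(self, s: int, r: int) -> int:
        # Each gate reads the other gate's previous output.
        # A few passes are enough for this tiny loop to settle.
        for _ in range(4):
            q = nor(r, self.q_bar)
            q_bar = nor(s, q)
            if (q, q_bar) == (self.q, self.q_bar):
                break
            self.q, self.q_bar = q, q_bar
        return self.q

latch = SRLatch()
print(latch.update(s=1, r=0))  # set
print(latch.update(s=0, r=0))  # release S: the 1 is remembered
print(latch.update(s=0, r=1))  # reset
print(latch.update(s=0, r=0))  # release R: the 0 is remembered
```

Setting S and R to 1 at the same time is deliberately left out; that input is invalid for an SR latch.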
D Flip-Flop
The workhorse of digital memory. The D flip-flop captures whatever value is on the D input when the clock ticks, and holds it until the next tick. Set the switch, then click Tick to capture the value.
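Behaviorally, a D flip-flop is tiny. The sketch below models only what it does (capture D on a tick, hold otherwise), not the gates inside it. The class and attribute names are made up for this example.

```python
class DFlipFlop:
    """Behavioral model: Q changes only when tick() is called."""

    def __init__(self) -> None:
        self.d = 0  # the input switch
        self.q = 0  # the stored bit

    def tick(self) -> int:
        self.q = self.d  # capture whatever is on D right now
        return self.q

ff = DFlipFlop()
ff.d = 1
print(ff.q)  # still the old value: no tick yet
ff.tick()
print(ff.q)  # the new value has been captured
```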
4-Bit Register
Four D flip-flops in parallel, sharing a clock. Set some switches, click Tick, and the register captures all four bits at once. This is exactly how CPU registers work — just wider (8, 16, 32, or 64 bits).
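A register is the same behavior repeated once per bit. This is a minimal sketch; the `Register` class and its `width` parameter are made up for this example.

```python
class Register:
    """N flip-flops sharing one clock. Behavioral model only."""

    def __init__(self, width: int = 4) -> None:
        self.q = [0] * width  # one stored bit per flip-flop

    def tick(self, d: list[int]) -> list[int]:
        if len(d) != len(self.q):
            raise ValueError("input width does not match register width")
        self.q = list(d)  # every bit is captured on the same tick
        return self.q

reg = Register(width=4)
switches = [1, 0, 1, 1]
print(reg.q)         # unchanged until the clock ticks
reg.tick(switches)
print(reg.q)         # all four bits captured at once
```

Changing `width` to 8 gives something shaped like one of the 6502's 8-bit registers.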
Putting It Together: A Counter
Now we combine everything. A counter uses flip-flops, NOT gates, XOR gates, and AND gates working together. Bit 0 always toggles. Bit 1 toggles when bit 0 is 1. Bit 2 toggles when bits 0 and 1 are both 1. The AND gates form a carry chain — the same idea as addition.
Click Tick repeatedly or hit Auto to watch it count. The four LEDs show the binary value: 0000, 0001, 0010, 0011, ... up to 1111 (15), then it wraps around.
A program counter in a CPU works just like this — it counts through memory addresses, fetching one instruction at a time. The only difference is width (16 bits for the 6502) and the ability to load a new value (for jumps and branches).
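The toggle rules for the counter translate almost word for word into code. In this sketch, the XOR does the toggling and the running AND is the carry chain. The function name `counter_step` and the least-significant-bit-first list layout are choices made for this example.

```python
def counter_step(bits: list[int]) -> list[int]:
    """Compute the next counter state. bits[0] is the least significant bit."""
    next_bits = []
    toggle = 1  # bit 0 always toggles
    for bit in bits:
        next_bits.append(bit ^ toggle)  # XOR flips the bit when toggle is 1
        toggle = toggle & bit           # AND chain: all lower bits were 1
    return next_bits

state = [0, 0, 0, 0]
for _ in range(17):
    # Print most significant bit first, the way the LEDs read.
    print("".join(str(b) for b in reversed(state)))
    state = counter_step(state)
```

After 1111 the AND chain runs off the top and every bit toggles back to 0, which is the wrap-around.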
Scaling Up: A 4-Bit Adder
We built a full adder that adds three single bits. But a CPU needs to add multi-bit numbers. The trick is chaining: connect the carry-out of each full adder to the carry-in of the next. Four full adders in a row give you a 4-bit adder that can add two numbers in the range 0–15, with a final carry-out for sums that don’t fit in 4 bits.
This is called a ripple-carry adder because the carry “ripples” from the least significant bit to the most significant. The 6502 uses the same idea, just wider: its ALU is 8 bits, and 16-bit address calculations pass through it one byte at a time, with the carry from the low byte feeding the high byte.
Try it: set the A switches (a3–a0) to 0011 (3) and the B switches to 0101 (5). You should see the sum LEDs show 1000 (8).
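The chaining can be sketched as a loop that hands each stage's carry-out to the next stage. The helper names (`ripple_add`, `to_bits`, `from_bits`) are made up for this example, and bit lists are stored least significant bit first.

```python
def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Returns (sum, carry_out) for three single bits."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out

def ripple_add(a_bits: list[int], b_bits: list[int]) -> tuple[list[int], int]:
    """Add two equal-length bit lists (LSB first). Returns (sum_bits, carry_out)."""
    sum_bits = []
    carry = 0
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # carry ripples to the next stage
        sum_bits.append(s)
    return sum_bits, carry

def to_bits(n: int, width: int = 4) -> list[int]:
    return [(n >> i) & 1 for i in range(width)]

def from_bits(bits: list[int]) -> int:
    return sum(bit << i for i, bit in enumerate(bits))

# The example from the text: 0011 (3) + 0101 (5)
sum_bits, carry = ripple_add(to_bits(3), to_bits(5))
print(from_bits(sum_bits), carry)  # 8, with no carry out
```

Try `ripple_add(to_bits(15), to_bits(1))` to see the carry fall off the top: the sum bits read 0000 and the carry-out is 1.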
The ALU: A Calculator Chip
An adder can only add. A real CPU needs to do logic too — AND, OR, XOR. The Arithmetic Logic Unit (ALU) computes all of these in parallel and uses a multiplexer to pick the result based on a control signal.
Below is a 1-bit ALU slice. It has an adder, AND, OR, and XOR gate all wired to the same inputs. The two op switches select which result passes through: 00 = ADD, 01 = AND, 10 = OR, 11 = XOR.
Chain 8 of these slices (with carry linking the adders) and you have the core of an 8-bit ALU like the 6502’s; the real chip layers shifts and a decimal mode on top. The CPU’s control unit just sets the op bits based on which instruction it decoded. Same circuit, different operation: that’s what makes it programmable.
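Here is a sketch of one slice and an 8-slice chain, using the op encoding given above (00 = ADD, 01 = AND, 10 = OR, 11 = XOR). The function names `alu_slice` and `alu8` are made up for this example, and indexing into a tuple stands in for the 4-way multiplexer.

```python
def alu_slice(a: int, b: int, carry_in: int, op1: int, op0: int) -> tuple[int, int]:
    """One bit of ALU. Returns (result, carry_out)."""
    # All four results are computed every time, as in the hardware.
    add = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    candidates = (add, a & b, a | b, a ^ b)  # 00, 01, 10, 11
    # The op bits pick one candidate, like a 4-way mux.
    return candidates[(op1 << 1) | op0], carry_out

def alu8(x: int, y: int, op1: int, op0: int) -> tuple[int, int]:
    """Chain 8 slices, linking each carry_out to the next carry_in."""
    result, carry = 0, 0
    for i in range(8):
        bit, carry = alu_slice((x >> i) & 1, (y >> i) & 1, carry, op1, op0)
        result |= bit << i
    return result, carry

print(alu8(0x0F, 0x35, 0, 0))  # ADD
print(alu8(0x0F, 0x35, 0, 1))  # AND
print(alu8(0x0F, 0x35, 1, 0))  # OR
print(alu8(0x0F, 0x35, 1, 1))  # XOR
```

Note that the adder's carry chain keeps running even when the op bits select AND, OR, or XOR. The result is simply ignored, which is how the hardware behaves too.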
RAM: Read/Write Memory
Registers store a few values. A CPU needs thousands of addressable bytes. That’s RAM — an array of memory cells, each with an address. You put an address on the bus and the data at that address appears on the output.
The key insight: reads are instant (combinational) but writes need a clock tick. Change the addr input and data_out updates immediately. To write: set addr, set data_in, turn we (write-enable) ON, then Tick. Turn we OFF and change the address to read it back.
Try it: write the value 42 to address 1, then write 7 to address 2. Switch between addresses to see both values are remembered. A 6502 system with 2 KB of RAM wired to the CPU’s address bus works the same way, just with 2,048 locations instead of 256.
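The read/write asymmetry is easy to see in a behavioral sketch: reading is a plain lookup, while writing only happens inside `tick` and only when write-enable is on. The `RAM` class and its method names are made up for this example, and the 256-byte default size is taken from the comparison above.

```python
class RAM:
    """Behavioral model: combinational read, clocked write."""

    def __init__(self, size: int = 256) -> None:
        self.cells = [0] * size

    def read(self, addr: int) -> int:
        # data_out follows addr immediately; no clock involved.
        return self.cells[addr]

    def tick(self, addr: int, data_in: int, we: int) -> None:
        # A write happens only on a clock tick with write-enable on.
        if we:
            self.cells[addr] = data_in & 0xFF  # each cell holds one byte

ram = RAM()
ram.tick(addr=1, data_in=42, we=1)
ram.tick(addr=2, data_in=7, we=1)
ram.tick(addr=1, data_in=99, we=0)  # we is off, so nothing is stored
print(ram.read(1), ram.read(2))
```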
The 6502: A Real CPU
Everything we’ve built (gates, adders, registers, counters, and an ALU) is a building block of a real processor. The MOS 6502 (1975) powered the Apple II and, in variant form, the Commodore 64 and NES. It has just 3,510 transistors and an elegant instruction set. Its ALU is wider (8 bits), its program counter longer (16 bits), and it has a control unit that decodes 56 instructions, but the pieces are the same.
Below is a complete 6502 system simulated at the gate level — over 5,500 lines of TypeScript, compiled and running in your browser. It has a CPU, RAM, ROM, and a memory-mapped console output at address $F000.
The ROM is pre-loaded with C programs compiled with cc65, a C compiler targeting the 6502. Click Run to watch the CPU execute real compiled C code, one cycle at a time.
What you just saw is the same process that happens billions of times per second in the device you’re reading this on. A clock ticks. The program counter increments. An instruction is fetched from memory. The control unit decodes it. The ALU computes. Results are stored. Repeat.
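That loop can be sketched as a toy interpreter. To be clear about what this is: it works at the instruction level, not the gate level, and it is not the simulator on this page. It handles only five real 6502 opcodes (CLC, LDA immediate, ADC immediate, STA absolute, BRK), ignores every flag except carry, and treats BRK as "halt". The load address $8000 and the function name `run` are assumptions for this example; the console address $F000 comes from the system described above.

```python
CONSOLE = 0xF000  # memory-mapped console output

def run(program: list[int], origin: int = 0x8000, max_steps: int = 1000):
    """Fetch, decode, execute. Returns (accumulator, bytes sent to the console)."""
    mem = bytearray(0x10000)  # 64 KB address space
    mem[origin:origin + len(program)] = bytes(program)
    pc, a, carry = origin, 0, 0
    console = []
    for _ in range(max_steps):
        opcode = mem[pc]  # fetch
        pc += 1           # the program counter increments
        if opcode == 0x00:    # BRK: treated as halt in this sketch
            break
        elif opcode == 0x18:  # CLC: clear carry
            carry = 0
        elif opcode == 0xA9:  # LDA #imm: load accumulator
            a = mem[pc]
            pc += 1
        elif opcode == 0x69:  # ADC #imm: add with carry (binary mode only)
            total = a + mem[pc] + carry
            pc += 1
            a, carry = total & 0xFF, total >> 8
        elif opcode == 0x8D:  # STA abs: store accumulator (little-endian address)
            addr = mem[pc] | (mem[pc + 1] << 8)
            pc += 2
            if addr == CONSOLE:
                console.append(a)
            else:
                mem[addr] = a
        else:
            raise ValueError(f"opcode ${opcode:02X} is not handled in this sketch")
    return a, console

# CLC; LDA #$03; ADC #$05; STA $F000; BRK
program = [0x18, 0xA9, 0x03, 0x69, 0x05, 0x8D, 0x00, 0xF0, 0x00]
print(run(program))
```

The `if`/`elif` chain plays the role of the control unit, and the one line inside the ADC branch is the ALU.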
The only difference between this 6502 and a modern CPU is scale: more transistors, wider buses, deeper pipelines, more cache. But the fundamentals — NAND gates all the way down — haven’t changed.