Source-Synchronous vs System-Synchronous Interfaces: How Hardware Sends a Clock Alongside the Data

2026-06-02

When two chips exchange parallel data at hundreds of MHz, the question of which clock samples the data becomes the entire design problem. Two philosophies dominate: system-synchronous and source-synchronous.

In a system-synchronous interface, both chips share one global clock distributed by the board. The transmitter launches data on a clock edge, and the receiver samples it on the next edge of the same clock. This is how SDRAM worked in the 1990s at 66–133 MHz. The problem: everything that happens between those two edges has to fit inside a single clock period. That means the transmitter's clock-to-output delay, the data's flight time across the PCB trace (roughly 150 ps/inch), and the receiver's setup time. None of these shrink when the period does. Worse, the clock arrives at the two chips at slightly different times (skew), and any jitter on the clock tree comes straight out of your timing budget. On a typical board, that budget stops closing somewhere around 200 MHz.
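To make the budget concrete, here is a minimal sketch of the setup-time arithmetic for a system-synchronous link. The function name and every delay value are assumptions chosen for illustration, not figures from any datasheet; only the 150 ps/inch flight time comes from the text above.

```python
def system_sync_max_freq_mhz(t_co_ns, t_flight_ns, t_setup_ns, t_skew_ns, t_jitter_ns):
    """Highest clock frequency (MHz) at which the setup budget still closes.

    One clock period has to cover: transmitter clock-to-output delay,
    trace flight time, receiver setup time, worst-case clock skew
    between the two chips, and clock jitter.
    """
    min_period_ns = t_co_ns + t_flight_ns + t_setup_ns + t_skew_ns + t_jitter_ns
    return 1000.0 / min_period_ns


# Illustrative (assumed) numbers, not taken from any real part.
PS_PER_INCH = 150.0                      # flight-time figure quoted in the text
flight_ns = 4 * PS_PER_INCH / 1000.0     # hypothetical 4-inch trace

f_max = system_sync_max_freq_mhz(
    t_co_ns=3.0,
    t_flight_ns=flight_ns,
    t_setup_ns=1.0,
    t_skew_ns=0.3,
    t_jitter_ns=0.1,
)
print(f"budget closes up to about {f_max:.0f} MHz")
```

With these assumed values the fixed delays add up to about 5 ns, so the ceiling lands at roughly 200 MHz. Notice that the clock-to-output delay dominates: a faster board alone would not rescue the interface.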

In a source-synchronous interface, the transmitter sends its own clock (often called a strobe) alongside the data on a parallel, length-matched trace. The receiver uses that strobe, not the system clock, to capture the bits. The genius is that the data and strobe leave the same chip and traverse nearly identical paths, so flight time and most board variation cancel out. You're left only with intra-bus skew (the mismatch between each data lane and its strobe), which is far smaller and can be tuned out with trace length matching or adjustable delays in the receiver.
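The cancellation is easy to see in a toy model. The sketch below is illustrative only: the function names, the per-lane mismatch, and the setup/hold/jitter values are all assumptions. It shows that the data-to-strobe offset at the receiver does not depend on trace length, and that the remaining margin is set by skew and receiver requirements alone.

```python
def arrival_offset_ps(trace_inches, lane_mismatch_ps, ps_per_inch=150.0):
    """Data arrival time minus strobe arrival time at the receiver."""
    common_flight_ps = trace_inches * ps_per_inch   # shared by data and strobe
    strobe_arrival_ps = common_flight_ps
    data_arrival_ps = common_flight_ps + lane_mismatch_ps
    return data_arrival_ps - strobe_arrival_ps      # common flight drops out


def source_sync_margin_ps(bit_time_ps, skew_pp_ps, t_setup_ps, t_hold_ps, jitter_ps):
    """Margin left in one bit time; total flight time never appears."""
    return bit_time_ps - (skew_pp_ps + t_setup_ps + t_hold_ps + jitter_ps)


# Same 25 ps lane mismatch on a short and a long bus (assumed values).
short_bus = arrival_offset_ps(trace_inches=2, lane_mismatch_ps=25.0)
long_bus = arrival_offset_ps(trace_inches=10, lane_mismatch_ps=25.0)

# Hypothetical 400 MT/s link: 2500 ps per bit.
margin = source_sync_margin_ps(
    bit_time_ps=2500.0, skew_pp_ps=200.0, t_setup_ps=400.0, t_hold_ps=400.0, jitter_ps=100.0
)
print(short_bus, long_bus, margin)
```

Both buses see the same offset, which is the whole point: lengthening the board changes latency, not the capture timing.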

Concrete example: DDR SDRAM. During writes, the memory controller drives DQ (data) and DQS (data strobe) to the DRAM, with the DQS edges already centered in each DQ bit, and the DRAM samples DQ on both edges of DQS. During reads, the DRAM drives DQS back to the controller edge-aligned with the returned data. The controller then internally delays DQS by 90° of the clock cycle (using a DLL or a calibrated delay line) to place its edges in the middle of the DQ eye, so that it samples center-aligned. This is why DDR4 can hit 3200 MT/s (a 1600 MHz clock) on a PCB where a system-synchronous bus would run out of margin at a few hundred MHz.
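To put a number on that 90° shift, here is a short sketch of the arithmetic. The function name is hypothetical; the only inputs are the transfer rate and the fact that a DDR bus moves two bits per clock cycle.

```python
def dqs_timing_ps(transfer_rate_mts):
    """Return (clock period, bit time, 90-degree DQS delay) in picoseconds."""
    clock_mhz = transfer_rate_mts / 2     # DDR: two transfers per clock cycle
    period_ps = 1e6 / clock_mhz           # 1 MHz corresponds to 1e6 ps
    bit_time_ps = period_ps / 2           # one bit per clock half-cycle
    quarter_cycle_ps = period_ps / 4      # 90 degrees of the clock
    return period_ps, bit_time_ps, quarter_cycle_ps


period, bit_time, dqs_delay = dqs_timing_ps(3200)   # DDR4-3200
print(period, bit_time, dqs_delay)
```

A quarter of the clock period is exactly half a bit time, which is why a 90° delay drops the strobe edge into the center of the data eye. For DDR4-3200 that works out to a delay of 156.25 ps inside a 312.5 ps bit.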

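Rule of thumb: below ~200 MHz, system-synchronous is simpler and cheaper. Above ~400 MHz, source-synchronous is mandatory. In between, it's a board-design judgment call based on trace length and skew budget. At multiple Gbps per pin, general-purpose chip-to-chip links such as PCIe usually abandon both and go to embedded-clock SerDes. Memory buses are the notable exception: DDR4 stays source-synchronous at 3.2 Gbps per pin, at the price of tight length matching and per-lane training.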
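The two frequency thresholds can be written down as a tiny helper. This is only a sketch of the rule of thumb: the function name and return strings are made up for illustration, and the thresholds are the approximate ones quoted above, not hard limits.

```python
def suggest_clocking(freq_mhz, sys_sync_limit_mhz=200.0, src_sync_floor_mhz=400.0):
    """Map a bus clock frequency to the rule-of-thumb clocking scheme."""
    if freq_mhz < sys_sync_limit_mhz:
        return "system-synchronous"
    if freq_mhz > src_sync_floor_mhz:
        return "source-synchronous"
    # In the gap, trace length and skew budget decide.
    return "judgment call"


for f in (133, 300, 1600):
    print(f, "MHz ->", suggest_clocking(f))
```

In practice the gray zone is where a timing budget for the specific board, rather than a lookup like this, should make the decision.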

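The hidden cost of source-synchronous: you need a per-byte-lane training sequence at link bring-up to calibrate the strobe delays against PVT (process, voltage, temperature) variation. That's why DDR controllers spend time after reset doing write leveling (aligning DQS to the clock at each DRAM) and read training (centering the delayed DQS in the read data eye) before they accept real traffic. Together with the mandatory initialization waits, this typically takes on the order of hundreds of microseconds to milliseconds.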
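The core of read training can be sketched as a delay sweep: try every delay tap, record which ones capture a known pattern correctly, and park the strobe in the middle of the widest passing window. The code below is a simplified illustration, not any controller's actual algorithm; the function name, the tap count, and the simulated pass/fail results are all assumptions.

```python
def center_of_passing_window(results):
    """Return the center tap of the longest run of passing taps, or None.

    results: list of booleans, one per delay tap (True = pattern read back OK).
    """
    best_start, best_len = None, 0
    run_start = None
    for tap, ok in enumerate(results):
        if ok:
            if run_start is None:
                run_start = tap               # a new passing run begins here
            run_len = tap - run_start + 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_start = None                  # run broken by a failing tap
    if best_start is None:
        return None                           # no tap passed: training failed
    return best_start + (best_len - 1) // 2


# Simulated sweep for one byte lane: taps 5..15 of 20 read the pattern correctly.
sweep = [5 <= tap <= 15 for tap in range(20)]
chosen_tap = center_of_passing_window(sweep)
print("chosen delay tap:", chosen_tap)
```

A real controller repeats this per byte lane, and often per bit, because each lane's skew is different. Choosing the window center rather than the first passing tap leaves equal margin on both sides as voltage and temperature drift.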

See it in action: Check out How Computers Work - Oversimplified by Conner Ardman to see this theory applied.
Key Takeaway: Source-synchronous interfaces send a clock alongside the data so flight-time variation cancels out, enabling parallel buses to run far faster than a shared system clock allows.
