2026-05-27
A cosmic ray strikes a flip-flop in a satellite's controller. A single high-energy particle deposits enough charge to flip a stored bit from 0 to 1. In an unprotected design, that corrupted bit propagates downstream, the state machine wanders into an illegal state, and the satellite tumbles. Triple Modular Redundancy (TMR) is how hardware survives this without rebooting.
The idea is brutally simple: instantiate the same logic three times, run all three copies on identical inputs, and feed their outputs into a majority voter. The voter computes Y = (A·B) + (B·C) + (A·C) — output is whatever at least two of the three agree on. If one copy gets corrupted, the other two outvote it and the system marches on.
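The voter equation maps directly onto bitwise operators. The sketch below is a Python behavioral model, not hardware. It treats each copy's output as an integer bit vector, so every bit position is voted independently, the same way a hardware voter sits on every output bit. The 8-bit width and the choice of flipped bit are arbitrary and only for illustration.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: Y = (A AND B) OR (B AND C) OR (A AND C).

    Each argument is a bit vector, and every bit position is voted independently.
    """
    return (a & b) | (b & c) | (a & c)


# Three copies of the same 8-bit register. Copy B suffers a single-event
# upset that flips bit 3.
golden = 0b1010_0110
copy_a = golden
copy_b = golden ^ (1 << 3)   # the corrupted copy
copy_c = golden

voted = majority(copy_a, copy_b, copy_c)
print(f"{voted:08b}", voted == golden)  # 10100110 True
```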
TMR comes in three flavors with different tradeoffs:
Real example: The Mars rovers' RAD750 processors don't use TMR internally (they rely on radiation-hardened silicon and ECC). Designs for the Xilinx Virtex FPGAs on instruments like Curiosity's ChemCam, however, are built with TMR synthesis tools that automatically triplicate user logic. Those designs grow ~3.5× in area (3× logic plus voters), but they survive single-event upsets that would crash unhardened parts within hours of launch.
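The catch is the voter: If your voter is a single AND-OR gate network, it is now the single point of failure, because one upset inside it corrupts the output regardless of how healthy the three copies are. Hardened designs use triplicated voters, each driving its own downstream path. A strike on one voter then corrupts only one path, and the next stage of voting outvotes it, so no single particle strike can corrupt the consensus.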
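A rough behavioral model of triplicated voting follows. It assumes a hypothetical fault model in which a struck voter XORs a fixed mask into its own output. The function name `triplicated_vote` and the two-stage chain are illustrative and are not taken from any particular TMR tool. In real hardware, stage 2 would also have triplicated logic between the voter stages. Here that logic is collapsed to keep the sketch short.

```python
def majority(a: int, b: int, c: int) -> int:
    return (a & b) | (b & c) | (a & c)


def triplicated_vote(outputs, voter_fault=None, flip_mask=0):
    """Return three voted values, one per downstream path.

    outputs: (a, b, c) from the three redundant copies.
    voter_fault: index (0, 1, or 2) of a voter hit by an upset, or None.
    flip_mask: bits the faulty voter flips in its own output (hypothetical fault model).
    """
    a, b, c = outputs
    voted = []
    for i in range(3):
        y = majority(a, b, c)
        if i == voter_fault:
            y ^= flip_mask   # the upset corrupts only this voter's output
        voted.append(y)
    return tuple(voted)


golden = 0x5A
# Stage 1: all copies agree, but voter 1 is struck and flips bit 0.
stage1 = triplicated_vote((golden, golden, golden), voter_fault=1, flip_mask=0x01)
print([hex(v) for v in stage1])  # ['0x5a', '0x5b', '0x5a']

# Stage 2: the next set of voters (no further faults) sees only one bad path
# and restores agreement on all three.
stage2 = triplicated_vote(stage1)
print([hex(v) for v in stage2])  # ['0x5a', '0x5a', '0x5a']

# Contrast: a lone voter hit by the same upset corrupts every downstream consumer.
single = majority(golden, golden, golden) ^ 0x01
print(hex(single))  # 0x5b
```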
Rule of thumb for reliability: If a single module fails with probability p during some interval (say, one hour between repairs), naive TMR fails only if at least two of the three copies fail within that same interval. That happens with probability 3p² − 2p³ ≈ 3p². For p = 10⁻⁶ per hour, TMR drops you to ~3×10⁻¹² per hour, roughly a 300,000-fold improvement (more than five orders of magnitude). But this only holds if failures are independent. Common-mode failures (a shared power rail, a shared clock, a particle striking the voter) destroy the math entirely.
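The arithmetic behind this rule of thumb fits in a small Python sketch. `tmr_failure_prob` computes the exact 2-of-3 failure probability for independent modules. `with_common_mode` is a deliberately crude, hypothetical model. It adds one shared failure event with probability q, assumed independent of the module failures, on top of the TMR term. Its only purpose is to show how quickly that shared event dominates.

```python
def tmr_failure_prob(p: float) -> float:
    """Probability that at least two of three independent modules fail
    within the same interval, given per-module failure probability p."""
    # Exactly two fail: 3 * p^2 * (1 - p). All three fail: p^3.
    return 3 * p**2 * (1 - p) + p**3  # equals 3p^2 - 2p^3


def with_common_mode(p: float, q: float) -> float:
    """Hypothetical model: a shared event with probability q defeats all three
    copies at once. Otherwise, the independent TMR failure term applies."""
    return q + (1 - q) * tmr_failure_prob(p)


p = 1e-6
p_tmr = tmr_failure_prob(p)
print(f"{p_tmr:.3e}")                        # 3.000e-12
print(f"{p / p_tmr:,.0f}x")                  # 333,334x
print(f"{with_common_mode(p, 1e-9):.3e}")    # 1.003e-09, dominated by q
```

Even a one-in-a-billion shared failure per hour swamps the ~3×10⁻¹² independent term by a factor of a few hundred. That is why the common-mode caveat matters more than the 3p² estimate itself.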
Scrubbing: TMR masks errors but doesn't fix the broken copy. Without intervention, a second strike on a different copy leaves two corrupted modules, and wherever their wrong outputs coincide, the voter sides with the error. Production rad-hard systems pair TMR with configuration scrubbing: a background process that periodically reads back FPGA configuration memory, compares it against a golden copy in radiation-hardened ROM, and rewrites any flipped bits.
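Below is a minimal readback-compare-rewrite scrubber. It assumes configuration memory can be modeled as a Python list of integer words, with the golden image as a second list. Real scrubbers operate on configuration frames through the device's configuration interface, and some rely on per-frame error-detecting codes rather than a full golden comparison. The name `scrub_pass` is hypothetical.

```python
from typing import List


def scrub_pass(config: List[int], golden: List[int]) -> List[int]:
    """One readback-compare-rewrite pass over configuration memory.

    config: live configuration words (mutated in place).
    golden: reference image, standing in for the rad-hard ROM copy.
    Returns the indices of the words that were repaired.
    """
    if len(config) != len(golden):
        raise ValueError("config and golden image must be the same length")
    repaired = []
    for i, (live, ref) in enumerate(zip(config, golden)):
        if live != ref:        # an upset flipped one or more bits in this word
            config[i] = ref    # rewrite the word from the golden copy
            repaired.append(i)
    return repaired


golden = [0x0F0F, 0x1234, 0xBEEF, 0x0000]
config = list(golden)
config[2] ^= 1 << 7                  # simulated single-event upset in word 2

print(scrub_pass(config, golden))    # [2]
print(config == golden)              # True
print(scrub_pass(config, golden))    # [] (nothing left to repair)
```

In a deployed system, the pass would run continuously or on a timer. Its period bounds how long a corrupted copy sits unrepaired, which is the interval the 3p² estimate assumes.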
