2026-05-03
In April 1992, Ron Rivest of MIT — the "R" in RSA — published a 21-page document that would become one of the most widely deployed algorithms in computing history, and eventually one of the most publicly broken. RFC 1321 specifies MD5, a message-digest algorithm that takes an arbitrary-length input and produces a fixed 128-bit (16-byte) hash. It was designed as a stronger successor to MD4, which Rivest himself had also created.
The problem it solves is deceptively simple: given any block of data, produce a compact fingerprint such that even a one-bit change in the input produces a wildly different output. This enables verification of file integrity, password storage, digital signatures, and dozens of other applications that need to answer the question "has this data been tampered with?"
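A quick way to see the "one-bit change" property is to hash two inputs that differ in a single bit and count how many of the 128 digest bits change. The sketch below uses Python's standard hashlib; the helper names md5_hex and differing_bits and the sample message are illustrative assumptions, not anything defined in the RFC.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as 32 hex characters (128 bits)."""
    return hashlib.md5(data).hexdigest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many of the 128 digest bits differ between MD5(a) and MD5(b)."""
    da = int.from_bytes(hashlib.md5(a).digest(), "big")
    db = int.from_bytes(hashlib.md5(b).digest(), "big")
    return bin(da ^ db).count("1")


original = b"hello world"
# Flip only the lowest bit of the first byte; everything else is unchanged.
flipped = bytes([original[0] ^ 0x01]) + original[1:]

print(md5_hex(original))
print(md5_hex(flipped))
print(differing_bits(original, flipped), "of 128 digest bits differ")
```

For a well-mixed hash, roughly half of the output bits should change on average, so a count somewhere near 64 is what one would expect here.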
The design is elegant in its simplicity. MD5 processes input in 512-bit (64-byte) blocks through four rounds of 16 operations each, for a total of 64 operations per block. Each round uses a different nonlinear function:
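Round 1: F(B, C, D) = (B AND C) OR (NOT B AND D)
Round 2: G(B, C, D) = (B AND D) OR (C AND NOT D)
Round 3: H(B, C, D) = B XOR C XOR D
Round 4: I(B, C, D) = C XOR (B OR NOT D)
The input is padded with a single 1 bit followed by zeros so that its length in bits is congruent to 448 mod 512, then a 64-bit representation of the original length is appended, which brings the total to a multiple of 512 bits. This Merkle–Damgård construction was standard for the era. The 64 per-operation constants are derived from the sine function, T[i] = floor(2^32 × abs(sin(i+1))) for i = 0 to 63 with the argument in radians, a "nothing up my sleeve" technique meant to assure users that no backdoor was hidden in the constants.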
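The pieces described above (four round functions, the padding rule, and the sine-derived constants) fit together in a few dozen lines. The following is a minimal, unoptimized sketch for study rather than a reference implementation. The names md5_sketch, pad, rotl, SHIFTS, and T are chosen here for illustration; the rotation amounts, initial register values, and message-word ordering are not spelled out in the text above and are taken from RFC 1321.

```python
import math
import struct

MASK = 0xFFFFFFFF  # keep all arithmetic within 32 bits

# Left-rotation amounts per operation (from RFC 1321): four values per round,
# cycled four times to cover the 16 operations of that round.
SHIFTS = ([7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 +
          [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4)

# Sine-derived constants, 0-indexed: T[i] = floor(2^32 * |sin(i + 1)|), radians.
T = [int(abs(math.sin(i + 1)) * 2**32) & MASK for i in range(64)]


def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    x &= MASK
    return ((x << n) | (x >> (32 - n))) & MASK


def pad(message: bytes) -> bytes:
    """Append a 1 bit, zeros up to 448 mod 512 bits, then the 64-bit length."""
    bit_len = (len(message) * 8) & 0xFFFFFFFFFFFFFFFF
    padded = message + b"\x80"                          # the single 1 bit
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)  # 56 bytes = 448 bits
    return padded + struct.pack("<Q", bit_len)          # little-endian length


def md5_sketch(message: bytes) -> str:
    """Return the MD5 digest of message as a hex string."""
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    data = pad(message)
    for off in range(0, len(data), 64):                 # one 512-bit block
        m = struct.unpack("<16I", data[off:off + 64])   # sixteen 32-bit words
        a, b, c, d = a0, b0, c0, d0
        for i in range(64):
            if i < 16:                                  # round 1
                f, g = (b & c) | (~b & d), i
            elif i < 32:                                # round 2
                f, g = (b & d) | (c & ~d), (5 * i + 1) % 16
            elif i < 48:                                # round 3
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:                                       # round 4
                f, g = c ^ (b | ~d), (7 * i) % 16
            f = (f + a + T[i] + m[g]) & MASK
            a, d, c = d, c, b                           # rotate the registers
            b = (b + rotl(f, SHIFTS[i])) & MASK
        # Add this block's result into the running state.
        a0, b0 = (a0 + a) & MASK, (b0 + b) & MASK
        c0, d0 = (c0 + c) & MASK, (d0 + d) & MASK
    return struct.pack("<4I", a0, b0, c0, d0).hex()
```

Comparing md5_sketch(data) against hashlib.md5(data).hexdigest() for a handful of inputs, especially lengths around 55, 56, and 64 bytes where the padding spills into an extra block, is a reasonable sanity check. For real work, the standard library version is the better choice.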
Why MD5 broke, and why it took so long.
An early theoretical weakness surfaced in 1996, when Hans Dobbertin found collisions in MD5's compression function. In 2004, Xiaoyun Wang and her team demonstrated practical collision attacks: they could find two distinct inputs producing the same MD5 hash in about an hour on an IBM p690 cluster, and follow-up work soon brought the cost down to minutes or less on a standard PC. By 2008, researchers used MD5 collisions to forge a rogue CA certificate, demonstrating real-world danger. RFC 6151 (2011) formally updated the security considerations of RFC 1321, stating that MD5 is no longer acceptable where collision resistance is required, such as in digital signatures.
Yet MD5 refuses to die. This is the part that matters to working engineers. You will encounter MD5 in production systems in 2026. It lives in:
The critical distinction engineers must understand: MD5 is broken for collision resistance but remains a fast, serviceable checksum for non-adversarial integrity verification. If you're checking whether a file got corrupted during transfer and nobody is trying to attack you, MD5 is fine. If you're verifying that a file hasn't been maliciously modified, MD5 is dangerously inadequate, because an attacker can craft two different files with the same digest. Use SHA-256 or better.
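In practice the two cases can share the same code path, with only the algorithm name changing. The sketch below streams a file through hashlib so large downloads do not need to fit in memory. The function names file_digest_hex and matches_expected, the chunk size, and the SHA-256 default are assumptions made for illustration.

```python
import hashlib
import hmac


def file_digest_hex(path: str, algorithm: str = "sha256",
                    chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks and return the hex digest."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_expected(path: str, expected_hex: str,
                     algorithm: str = "sha256") -> bool:
    """Compare a file's digest against a published value."""
    actual = file_digest_hex(path, algorithm)
    # compare_digest avoids leaking where the strings first differ.
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

Calling matches_expected(path, published_md5, "md5") is adequate for catching accidental corruption against a legacy checksum file. When the file or the checksum could come from someone hostile, the default "sha256" is the safer choice.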
The RFC itself is a fascinating artifact. It includes a complete C implementation in the appendix — unusual for RFCs — making it perhaps the most copy-pasted reference implementation in history. Rivest's writing is terse, precise, and unburdened by the committee-driven verbosity of later standards. It reads more like a math paper with code attached.
