Claude Shannon's "A Mathematical Theory of Communication": The 1948 Paper That Invented the Bit — and Made the Internet Possible

2026-05-25

In July and October of 1948, a 32-year-old Bell Labs engineer named Claude Elwood Shannon published a two-part paper in the Bell System Technical Journal titled "A Mathematical Theory of Communication." It was not a patent, but it underpinned a remarkable run of Bell Labs patents on pulse-code modulation, error-correcting codes, and noise-resistant signaling secured in the late 1940s and early 1950s (including US 2,605,361, C. Chapin Cutler's 1952 patent on differential PCM). The paper itself did something stranger than any single device patent: it supplied the conceptual machinery on which later communication engineering was built.

Shannon's central move was to throw away meaning. A telegram about a death and a telegram ordering groceries, he argued, were the same engineering problem: symbols passing through a channel corrupted by noise. By stripping semantics, he could treat communication as pure mathematics. He defined a unit, the binary digit or "bit," crediting the name to his Bell Labs colleague John Tukey. He defined entropy as a measure of the average information content of a source, borrowing the formula (and, legend has it on von Neumann's advice, the name) from statistical thermodynamics: H = -Σ p(x) log2 p(x), where p(x) is the probability of symbol x and the base-2 logarithm gives H in bits per symbol.
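To make the entropy formula concrete, here is a minimal Python sketch. The function names entropy_bits and empirical_entropy are illustrative choices, not anything from Shannon's paper, and the second function simply estimates symbol probabilities from character counts in a string.

```python
import math
from collections import Counter


def entropy_bits(probabilities):
    """Shannon entropy H = -sum p(x) log2 p(x), in bits per symbol."""
    # Terms with p == 0 contribute nothing (the limit of p*log p is 0).
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def empirical_entropy(message):
    """Estimate entropy from the symbol frequencies of one message."""
    counts = Counter(message)
    total = len(message)
    return entropy_bits(count / total for count in counts.values())


# A fair coin carries exactly one bit per flip.
print(entropy_bits([0.5, 0.5]))
# A heavily biased coin is more predictable, so each flip carries
# less information (about 0.469 bits).
print(round(entropy_bits([0.9, 0.1]), 3))
# A message with no variety carries no information per symbol.
print(empirical_entropy("aaaa"))
```

The biased coin is the interesting case: the more predictable the source, the fewer bits each symbol is worth, which is exactly what compression later exploits.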

Then came the bombshell: the Noisy Channel Coding Theorem. Shannon proved that every communication channel has a fixed capacity C (in bits per second), and that as long as you transmit below C, you can achieve arbitrarily low error rates by clever encoding, no matter how noisy the channel is. Above C, no coding scheme can make errors arbitrarily rare. Before Shannon, engineers assumed noise inevitably corrupted messages and you simply traded power for clarity. Shannon proved this was wrong. Power and bandwidth set the capacity, but below that capacity reliability was a coding problem, not a power problem.
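For the special case of a band-limited channel with Gaussian noise, the 1948 paper gives the capacity in closed form, now usually called the Shannon-Hartley formula: C = B log2(1 + S/N), with B the bandwidth in hertz and S/N the signal-to-noise power ratio. A rough Python sketch follows. The function name awgn_capacity_bps and the telephone-line numbers are assumptions chosen for illustration.

```python
import math


def awgn_capacity_bps(bandwidth_hz, snr_db):
    """Capacity C = B * log2(1 + S/N) of a band-limited Gaussian channel."""
    snr_linear = 10 ** (snr_db / 10)  # convert decibels to a power ratio
    return bandwidth_hz * math.log2(1 + snr_linear)


# Hypothetical voice-grade line: 3 kHz of bandwidth at 30 dB SNR.
base = awgn_capacity_bps(3000, 30)
# Doubling the transmit power adds about 3 dB to the SNR.
doubled_power = awgn_capacity_bps(3000, 30 + 10 * math.log10(2))

print(f"capacity at 30 dB:     {base:,.0f} bit/s")            # roughly 29.9 kbit/s
print(f"after doubling power:  {doubled_power:,.0f} bit/s")   # roughly 3 kbit/s more
```

Because power sits inside the logarithm, doubling it buys only about one extra bit per second per hertz here. That diminishing return is why coding, rather than brute-force power, became the main lever.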

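He didn't say how to build such codes. He just proved they existed. That gauntlet launched 75 years of research: Hamming codes (1950), Reed-Solomon codes (1960, the codes behind CDs and QR codes), convolutional codes, turbo codes (1993), LDPC codes (devised by Robert Gallager in the early 1960s and revived in the 1990s), and polar codes (2009). Turbo and LDPC codes can come within a fraction of a decibel of the Shannon limit at long block lengths, polar codes provably achieve capacity on certain channels as block length grows, and both LDPC and polar codes are now standard in 5G.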
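The oldest code on that list is small enough to show in full. Below is a sketch of the Hamming (7,4) code in Python, using one common bit layout with parity bits in positions 1, 2, and 4. The function names are illustrative, and the decoder assumes at most one flipped bit per 7-bit block.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as 7 bits: [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(received):
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(received)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # Read as a binary number, the syndrome is the 1-based position
    # of the flipped bit, or 0 if all parity checks pass.
    error_position = s1 + 2 * s2 + 4 * s3
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


message = [1, 0, 1, 1]
codeword = hamming74_encode(message)
corrupted = list(codeword)
corrupted[4] ^= 1  # simulate channel noise flipping one bit
recovered = hamming74_decode(corrupted)
print(codeword, corrupted, recovered)
```

Three extra bits per four data bits is a steep price for correcting a single error, which hints at why it took decades to find codes that get close to capacity.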

The modern relevance is total. Every time your phone pulls a clean signal from a weak cell tower, every Wi-Fi packet that survives a microwave's interference, every Voyager image sent from beyond Neptune by a transmitter of roughly 20 watts and received on Earth as a vanishingly small fraction of a watt, every QR code you scan after coffee spilled on it: all of these work because Shannon proved they could work, and gave engineers the yardstick (C) to measure how close they were getting. The best modern codes, including those used in 5G, can operate within about a decibel of the Shannon limit. There is no "better than Shannon." There is only "closer to Shannon."

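The same paper also formalized data compression. Shannon's source coding theorem proved that any data source has an irreducible entropy below which lossless compression is impossible. Every ZIP file is a practical attempt to approach that bound. Lossy formats such as MP3 and H.265 go below it by discarding detail, and their tradeoff between bit rate and fidelity is governed by rate-distortion theory, which Shannon sketched in the 1948 paper and developed fully in 1959. The whole field of data compression is, essentially, applied Shannon.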
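One way to see the lossless bound in action is to compress a synthetic source whose entropy is known. The sketch below uses Python's standard zlib module (the DEFLATE algorithm used in ZIP files). The source model, a stream of independent symbols that are "a" 90% of the time and "b" 10% of the time, is an assumption chosen for illustration, as are the function names.

```python
import math
import random
import zlib


def entropy_bits(probabilities):
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def compressed_bits_per_symbol(symbols):
    """Bits zlib spends per input symbol at its highest compression level."""
    compressed = zlib.compress(symbols, 9)
    return 8 * len(compressed) / len(symbols)


rng = random.Random(0)  # fixed seed so the run is repeatable
p_b = 0.1               # probability of the rarer symbol
n = 100_000             # one byte per symbol before compression
symbols = bytes(ord("b") if rng.random() < p_b else ord("a") for _ in range(n))

bound = entropy_bits([1 - p_b, p_b])
achieved = compressed_bits_per_symbol(symbols)

print(f"uncompressed:  8.000 bits/symbol")
print(f"zlib achieves: {achieved:.3f} bits/symbol")
print(f"entropy bound: {bound:.3f} bits/symbol")
```

A general-purpose compressor gets far below the raw 8 bits per symbol but should still land above the entropy of the source. An arithmetic coder tuned to the known probabilities would get closer, yet no lossless scheme can beat the bound on average.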

What's surprising is the era. In 1948 the transistor had just been invented (also at Bell Labs, in December 1947, only months before the paper appeared). There was no internet, no digital storage, no cell phones, no satellites. Shannon described their mathematical foundations before any of them existed, sketching the rules of a game whose players hadn't been born.

Key Takeaway: Shannon's 1948 paper didn't invent a device — it invented the limit, proving that every communication channel has a maximum capacity and that clever coding (not more power) is the path to it, a yardstick every modem, Wi-Fi chip, and deep-space probe has been chasing for 75 years.
