Endianness and Data Representation

2026-04-27

Every multi-byte value in memory has a byte order. Little-endian stores the least significant byte at the lowest address; big-endian stores the most significant byte first. x86 and ARM (in its default mode) are little-endian. Network protocols (TCP/IP) are big-endian by convention — hence the term "network byte order."

Consider the 32-bit integer 0xDEADBEEF stored at address 0x1000:

Address         0x1000  0x1001  0x1002  0x1003
Little-endian   EF      BE      AD      DE
Big-endian      DE      AD      BE      EF
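The layout of 0xDEADBEEF can be checked directly on whatever machine you have at hand. The following is a minimal C sketch that copies the integer into a byte array and prints the bytes in address order. The helper names u32_bytes, host_is_little_endian, and dump_u32 are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Copy the in-memory bytes of v into out[0..3], lowest address first. */
void u32_bytes(uint32_t v, unsigned char out[4])
{
    memcpy(out, &v, sizeof v);
}

/* Returns 1 if the least significant byte sits at the lowest address. */
int host_is_little_endian(void)
{
    unsigned char b[4];
    u32_bytes(1u, b);
    return b[0] == 1;
}

/* Print the bytes of v in the order they sit in memory. */
void dump_u32(uint32_t v)
{
    unsigned char b[4];
    u32_bytes(v, b);
    printf("%02X %02X %02X %02X\n", b[0], b[1], b[2], b[3]);
}
```

Calling dump_u32(0xDEADBEEF) should print EF BE AD DE on a little-endian host and DE AD BE EF on a big-endian one.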

This matters every time bytes cross a boundary: network sockets, file formats, shared memory between heterogeneous systems, or reading a binary dump. A classic bug is writing a struct to a file on an x86 machine and reading it on a big-endian MIPS device — every multi-byte field is silently scrambled.

Practical tools: POSIX gives you htonl(), htons(), ntohl(), ntohs() for converting between host and network byte order. On Linux, <endian.h> provides htobe32(), le16toh(), and friends for explicit conversion. Modern code should use these rather than hand-rolled bit shifts: with optimization enabled they typically compile to a single instruction (BSWAP for 32-bit values on x86), and they are no-ops when source and target order match.
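As a rough illustration of the POSIX functions, here is a sketch that writes and reads 16- and 32-bit values in network byte order. The function names put_u32_net, get_u32_net, put_u16_net, and get_u16_net are hypothetical helpers, not part of any library; only htonl(), htons(), ntohl(), and ntohs() come from POSIX.

```c
#include <arpa/inet.h>  /* htonl(), htons(), ntohl(), ntohs() */
#include <stdint.h>
#include <string.h>

/* Write v into out[0..3] in network (big-endian) byte order. */
void put_u32_net(unsigned char out[4], uint32_t v)
{
    uint32_t be = htonl(v);          /* no-op on a big-endian host */
    memcpy(out, &be, sizeof be);
}

/* Read a network-order 32-bit value from in[0..3] into host order. */
uint32_t get_u32_net(const unsigned char in[4])
{
    uint32_t be;
    memcpy(&be, in, sizeof be);
    return ntohl(be);
}

/* Same pair for 16-bit values. */
void put_u16_net(unsigned char out[2], uint16_t v)
{
    uint16_t be = htons(v);
    memcpy(out, &be, sizeof be);
}

uint16_t get_u16_net(const unsigned char in[2])
{
    uint16_t be;
    memcpy(&be, in, sizeof be);
    return ntohs(be);
}
```

The byte buffer always ends up most significant byte first, whatever the host order, so the same code is meant to work unchanged on both kinds of machine.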

Real-world example: The PNG file format stores chunk lengths as 4-byte big-endian integers. If you're writing a minimal PNG parser on x86, reading the length field directly with *(uint32_t*)ptr gives you a byte-swapped value. You need be32toh() or equivalent. Get this wrong and a 1024-byte chunk (0x00000400) becomes 0x00040000 — 262,144 bytes — and your parser reads past the buffer.
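A minimal sketch of reading that length field safely might look like the following. It uses ntohl(), which does the same job as be32toh() for a 32-bit big-endian field, and memcpy instead of a pointer cast so that alignment and aliasing are not a concern. The function names png_chunk_length and png_chunk_fits are invented for this example; the bounds check relies on the PNG chunk layout of 4-byte length, 4-byte type, data, and 4-byte CRC.

```c
#include <arpa/inet.h>  /* ntohl() */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Read the 4-byte big-endian length at the start of a PNG chunk.
 * Returns 0 on success, -1 if fewer than 4 bytes are available.
 */
int png_chunk_length(const unsigned char *p, size_t avail, uint32_t *len_out)
{
    uint32_t raw;
    if (avail < 4)
        return -1;
    memcpy(&raw, p, sizeof raw);  /* no *(uint32_t*)p cast */
    *len_out = ntohl(raw);        /* big-endian to host order */
    return 0;
}

/*
 * A whole chunk is length (4) + type (4) + data (len) + CRC (4).
 * Returns 1 if a chunk with this data length fits in avail bytes.
 */
int png_chunk_fits(size_t avail, uint32_t len)
{
    return avail >= 12 && len <= avail - 12;
}
```

Checking png_chunk_fits() before touching the chunk data is what turns a mis-decoded length into a rejected file rather than an out-of-bounds read.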

Rule of thumb: Every multi-byte integer you serialize to a wire format or file is one byte-order decision you have to get right. For a struct with 5 multi-byte fields, that's 5 potential swap points. Miss one and you have a bug that only manifests on a different architecture, or never, until someone ports your code.
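To make the swap points concrete, here is a sketch that serializes a small record field by field. The struct reading and its fields are hypothetical, chosen only to show that each multi-byte field gets its own conversion while a single-byte field needs none. The wire format is assumed to be big-endian with no padding.

```c
#include <arpa/inet.h>  /* htonl(), htons(), ntohl(), ntohs() */
#include <stdint.h>
#include <string.h>

/* Hypothetical record, for illustration only. */
struct reading {
    uint8_t  version;    /* single byte: no byte order to decide */
    uint16_t sensor_id;  /* swap point 1 */
    uint32_t timestamp;  /* swap point 2 */
    uint32_t value;      /* swap point 3 */
};

/* Packed wire size: 1 + 2 + 4 + 4 bytes, no padding. */
enum { READING_WIRE_SIZE = 11 };

void reading_encode(unsigned char out[READING_WIRE_SIZE],
                    const struct reading *r)
{
    uint16_t id  = htons(r->sensor_id);
    uint32_t ts  = htonl(r->timestamp);
    uint32_t val = htonl(r->value);

    out[0] = r->version;
    memcpy(out + 1, &id,  sizeof id);
    memcpy(out + 3, &ts,  sizeof ts);
    memcpy(out + 7, &val, sizeof val);
}

void reading_decode(const unsigned char in[READING_WIRE_SIZE],
                    struct reading *r)
{
    uint16_t id;
    uint32_t ts;
    uint32_t val;

    memcpy(&id,  in + 1, sizeof id);
    memcpy(&ts,  in + 3, sizeof ts);
    memcpy(&val, in + 7, sizeof val);

    r->version   = in[0];
    r->sensor_id = ntohs(id);
    r->timestamp = ntohl(ts);
    r->value     = ntohl(val);
}
```

Encoding field by field also sidesteps compiler padding, which a raw write of the whole struct would carry into the file.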

Beyond integers: Floating-point values also have byte order (IEEE 754 doesn't mandate endianness). In practice, float endianness matches integer endianness on nearly all modern hardware. Some protocol specs, such as certain industrial SCADA formats, use mixed endianness, for example big-endian floats consumed on little-endian systems. That calls for a manual byte swap: memcpy the float's bytes into a uint32_t intermediate, swap the integer, and memcpy the result back. Don't do it by pointer casting, which is undefined behavior under the strict aliasing rules.
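The memcpy route for floats can be sketched as follows. This assumes float is a 32-bit IEEE 754 single and that the host's float byte order matches its integer byte order, which holds on nearly all modern hardware but is not guaranteed by the C standard. The names get_f32_be and put_f32_be are illustrative only.

```c
#include <arpa/inet.h>  /* htonl(), ntohl() */
#include <stdint.h>
#include <string.h>

/*
 * Decode a big-endian IEEE 754 single from in[0..3].
 * Assumption: sizeof(float) == 4 and float byte order matches
 * integer byte order on this host.
 */
float get_f32_be(const unsigned char in[4])
{
    uint32_t bits;
    float f;
    memcpy(&bits, in, sizeof bits);  /* raw wire bytes into an integer */
    bits = ntohl(bits);              /* big-endian to host order */
    memcpy(&f, &bits, sizeof f);     /* reinterpret without aliasing UB */
    return f;
}

/* Encode f into out[0..3] as a big-endian IEEE 754 single. */
void put_f32_be(unsigned char out[4], float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    bits = htonl(bits);
    memcpy(out, &bits, sizeof bits);
}
```

Compilers generally optimize these fixed-size memcpy calls into plain register moves, so the well-defined version should cost nothing over the cast.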

One defensive pattern: define your serialization format as explicitly big- or little-endian, and always convert at the boundary. Never store raw host-order values in files or on the wire. This makes your code portable by construction rather than by testing.

See it in action: Check out Endianness Explained by Aaron Yoo to see this theory applied.
Key Takeaway: Always convert byte order explicitly at serialization boundaries — relying on your host's native endianness is a latent portability bug.
