memcmp on Structs: The Padding Bytes That Lie About Equality
2026-06-06
This program deduplicates records. Two records are "equal" if their type and id match. Easy: just memcmp the whole struct. We build two records the same way and expect count_unique to return 1.
#include <stdio.h>
#include <string.h>
#include <stdbool.h>

typedef struct {
    char type;  // 1 byte
    int id;     // 4 bytes (3 bytes of padding sit before it)
} Record;

static bool records_equal(const Record *a, const Record *b) {
    return memcmp(a, b, sizeof(Record)) == 0;
}

static int count_unique(const Record *r, int n) {
    int unique = 0;
    for (int i = 0; i < n; i++) {
        bool seen = false;
        for (int j = 0; j < i; j++)
            if (records_equal(&r[i], &r[j])) { seen = true; break; }
        if (!seen) unique++;
    }
    return unique;
}

int main(void) {
    Record a = { 'X', 42 };
    Record b;  // uninitialized
    b.type = 'X';
    b.id = 42;
    Record list[2] = { a, b };
    printf("%d\n", count_unique(list, 2));  // expected: 1
    return 0;
}
On most mainstream platforms, where int is 4 bytes and must be 4-byte aligned, Record is 8 bytes, not 5. The compiler inserts 3 bytes of padding between type and id so that id sits on a 4-byte boundary. Those padding bytes are part of sizeof(Record), and memcmp compares every byte.
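If you want to see what your own compiler decided, a small probe like the following can help. It is only a sketch: record_interior_padding and print_record_layout are hypothetical helper names, and the numbers they report depend on your target's ABI.

```c
#include <stddef.h>  /* offsetof, size_t */
#include <stdio.h>

typedef struct {
    char type;
    int  id;
} Record;

/* Number of padding bytes between the end of `type` and the start of `id`. */
static size_t record_interior_padding(void) {
    return offsetof(Record, id) - (offsetof(Record, type) + sizeof(char));
}

/* Hypothetical helper: prints the layout the current compiler chose. */
static void print_record_layout(void) {
    printf("sizeof(Record)       = %zu\n", sizeof(Record));
    printf("offsetof(Record, id) = %zu\n", offsetof(Record, id));
    printf("interior padding     = %zu\n", record_interior_padding());
}
```

Calling print_record_layout() from main is enough to check whether the "8 bytes, 3 of them padding" picture holds on your platform.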
Look at how a and b are constructed:
Record a = { 'X', 42 }; is an initializer list. Most compilers zero the padding bytes here, but the C standard does not promise it: its zero-padding guarantee covers static-storage objects, not a fully initialized automatic variable like this one.
Record b; b.type = 'X'; b.id = 42; is declared without an initializer, then assigned field by field. The padding bytes are never touched and hold whatever junk was on the stack.

So on a typical little-endian machine a reads as 58 00 00 00 2A 00 00 00 and b reads as 58 ?? ?? ?? 2A 00 00 00. The visible fields are identical; the invisible padding isn't. memcmp returns non-zero, the dedup logic thinks they're different, and count_unique returns 2.

Worse, it's non-deterministic. Compile with -O2, change unrelated code, or run on a different machine, and the bug can vanish, only to return when a customer hits it in production.
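Because the stack junk is unpredictable, the failure is hard to reproduce on demand. One way to make it deterministic, sketched below, is to write known values into the padding through an unsigned char pointer after the fields are set. The helper names (poke_padding, make_record, equal_bytes, equal_fields) are made up for this illustration and are not part of the original program.

```c
#include <stdbool.h>
#include <stddef.h>  /* offsetof, size_t */
#include <string.h>

typedef struct {
    char type;
    int  id;
} Record;

/* Overwrite every byte between `type` and `id` with `fill`.
   Accessing any object through unsigned char * is allowed. */
static void poke_padding(Record *r, unsigned char fill) {
    unsigned char *bytes = (unsigned char *)r;
    for (size_t i = offsetof(Record, type) + sizeof r->type;
         i < offsetof(Record, id); i++)
        bytes[i] = fill;
}

/* Byte-wise comparison: sees the padding. */
static bool equal_bytes(const Record *a, const Record *b) {
    return memcmp(a, b, sizeof(Record)) == 0;
}

/* Field-wise comparison: sees only the declared members. */
static bool equal_fields(const Record *a, const Record *b) {
    return a->type == b->type && a->id == b->id;
}

/* Builds a record, then forces its interior padding to `fill`. */
static void make_record(Record *out, char type, int id, unsigned char fill) {
    memset(out, 0, sizeof *out);  /* start from all-zero bytes */
    out->type = type;
    out->id   = id;
    poke_padding(out, fill);      /* last write, so the padding value is known */
}
```

Build one record with fill 0x00 and another with fill 0xAA: on a platform where Record actually has interior padding, equal_fields should report them equal while equal_bytes reports them different, which is the dedup bug in miniature.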
The same trap bites you when you try to hash a struct with SHA256(&rec, sizeof rec), write it to a binary file, or send it over the network. Padding bytes are a phantom field you didn't declare.
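For hashing, files, and the wire, one common way out is to encode the fields into a buffer whose layout you choose yourself, and feed that buffer to the hash or the socket instead of the raw struct. The sketch below assumes a 5-byte little-endian format and an id that fits in 32 bits; record_pack, records_equal_packed, and RECORD_PACKED_SIZE are hypothetical names picked for this example.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    char type;
    int  id;
} Record;

/* Assumed encoding: 1 byte of type followed by 4 bytes of id. */
enum { RECORD_PACKED_SIZE = 5 };

/* Writes a padding-free, little-endian encoding of *r into out. */
static void record_pack(const Record *r, unsigned char out[RECORD_PACKED_SIZE]) {
    uint32_t id = (uint32_t)r->id;  /* assumes id fits in 32 bits */
    out[0] = (unsigned char)r->type;
    out[1] = (unsigned char)(id & 0xFF);
    out[2] = (unsigned char)((id >> 8) & 0xFF);
    out[3] = (unsigned char)((id >> 16) & 0xFF);
    out[4] = (unsigned char)((id >> 24) & 0xFF);
}

/* Byte-wise equality on the packed form, so padding never takes part. */
static bool records_equal_packed(const Record *a, const Record *b) {
    unsigned char pa[RECORD_PACKED_SIZE], pb[RECORD_PACKED_SIZE];
    record_pack(a, pa);
    record_pack(b, pb);
    return memcmp(pa, pb, RECORD_PACKED_SIZE) == 0;
}
```

The packed buffer is what you would hash or write out. As a side effect it also pins down byte order, which the raw struct leaves to the host CPU.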
Two options. Best: compare field-by-field — the compiler only looks at what you declared:
static bool records_equal(const Record *a, const Record *b) {
    return a->type == b->type && a->id == b->id;
}
If you really need a generic byte-wise compare (e.g., for hashing many struct types), zero the whole struct before assigning fields:
Record b;
memset(&b, 0, sizeof b); // wipe the padding too
b.type = 'X';
b.id = 42;
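Note that Record b = {0}; is the idiomatic shorthand and usually zeroes the padding in practice, but the standard does not clearly promise zeroed padding for automatic objects initialized this way, so memset states the intent more reliably. Even then you are leaning on compiler behavior: the standard lets a store to any member leave the padding bytes with unspecified values, and copying via = or passing by value is not required to preserve padding bytes, because the compiler can copy field-wise. So even Record c = a; followed by memcmp(&a, &c, sizeof a) is technically not portable.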
Rule: treat sizeof(struct) as "the bytes the ABI owns," not "the bytes you wrote." If your equality, hash, or serialization touches all of them, you've coupled correctness to layout decisions the compiler makes behind your back.
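If byte-wise comparison really is a requirement, one way to take the layout decision back from the compiler is to declare the padding as a real member and have the build fail if any hidden padding remains. This is a sketch of that idea, not part of the original program: RecordExplicit, its reserved field, and explicit_equal are hypothetical, and the size check assumes the layout described above.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical variant of Record with the padding declared as a member. */
typedef struct {
    char type;
    char reserved[3];  /* must be set to zero by whoever fills the struct */
    int  id;
} RecordExplicit;

/* Fails to compile if the compiler still inserts padding somewhere. */
_Static_assert(sizeof(RecordExplicit) == sizeof(char) + 3 + sizeof(int),
               "RecordExplicit has hidden padding; do not memcmp it");

/* Safe only because the assertion above rules out undeclared bytes. */
static bool explicit_equal(const RecordExplicit *a, const RecordExplicit *b) {
    return memcmp(a, b, sizeof *a) == 0;
}
```

The trade-off is that reserved is now a field you have to initialize everywhere, but forgetting it is an ordinary bug in code you can see, not a layout detail you can't.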
memcmp on structs compares the padding bytes too — and those bytes are uninitialized garbage unless you zero them, so identical-looking records can compare unequal at random.
