Why Does the Linux Neighbor Timer Expire ~10s Later Than Expected Even with a Fixed Reachable Time?

2026-05-07

Stack Overflow: View Question

Tags: c, timer, linux-kernel, arp, neighbours

Score: 4 | Views: 122

The asker patched neigh_rand_reach_time() on a 5.10 kernel to remove the random jitter (normally a uniform draw between 0.5x and 1.5x of base_reachable_time). Their goal: deterministic ARP/ND aging. They observe that even after their fix, the entry's effective lifetime is consistently ~10 seconds longer than the configured reachable time. Why?
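For orientation, the jitter being removed can be modeled in a few lines. This is a minimal Python sketch of the draw described above, not the kernel's C code. The function names are hypothetical, the patched variant is only an assumption about what the asker's change amounts to, and 30000 ms is the usual stock value of base_reachable_time_ms.

```python
import random


def rand_reach_time(base_ms, rng=random):
    """Model of the stock behaviour: a uniform draw in [base/2, 3*base/2)."""
    if base_ms <= 0:
        return 0
    # randrange(base_ms) is uniform on [0, base_ms); shifting by base/2
    # centres the result on base_ms.
    return rng.randrange(base_ms) + base_ms // 2


def fixed_reach_time(base_ms):
    """Assumed effect of the asker's patch: no jitter, just the base value."""
    return base_ms
```

With a 30000 ms base, the stock model yields values from 15000 up to (but not including) 45000 ms, while the patched model always returns 30000 ms. Neither touches the later DELAY and PROBE stages, which is where the overshoot comes from.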

Why this is interesting. It looks like a timer accuracy bug, but it almost certainly isn't. The Linux neighbor state machine is a multi-stage FSM, and "reachable time" only governs the first transition. What looks like a single timer is actually a chain of transitions, each with its own configurable delay. Without measuring which state transition you're observing, the math will always come out wrong.

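The state machine. A reachable entry doesn't simply expire. It walks through REACHABLE -> STALE -> DELAY -> PROBE -> FAILED:

REACHABLE: lasts reachable_time since the last reachability confirmation.

STALE: has no timer of its own; the entry waits here until traffic is sent to the neighbor.

DELAY: lasts delay_first_probe_time (default 5 s), giving upper layers a chance to confirm reachability.

PROBE: sends up to ucast_solicit (default 3) unicast probes, retrans_time_ms (default 1000 ms) apart, then moves to FAILED if nothing answers.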

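Add it up with the default sysctls: 5 s in DELAY (delay_first_probe_time) + about 3 s in PROBE (ucast_solicit = 3 probes, retrans_time_ms = 1000 ms apart) ≈ 8 s tacked onto whatever reachable_time the asker chose, and more if the entry sits in STALE before traffic moves it into DELAY. That is in line with the ~10 s overshoot they observe.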
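The sum above can be laid out as a timeline. The following is a simplified Python model, not kernel code: the NeighParms class, the timeline and overshoot_s functions, and the stale_wait_s parameter are all hypothetical names, and the defaults are the stock sysctl values. It assumes PROBE lasts exactly ucast_solicit retransmit intervals and ignores timer slack.

```python
from dataclasses import dataclass


@dataclass
class NeighParms:
    reachable_time_s: float = 30.0          # effective reachable time, jitter removed
    delay_first_probe_time_s: float = 5.0   # time spent in DELAY
    retrans_time_ms: int = 1000             # gap between probes in PROBE
    ucast_solicit: int = 3                  # unicast probes before giving up


def timeline(p, stale_wait_s=0.0):
    """Return (state, entry_time_s) pairs; t=0 is the last confirmation."""
    t = 0.0
    steps = [("REACHABLE", t)]
    t += p.reachable_time_s
    steps.append(("STALE", t))
    t += stale_wait_s                        # STALE ends only when traffic is sent
    steps.append(("DELAY", t))
    t += p.delay_first_probe_time_s
    steps.append(("PROBE", t))
    t += p.ucast_solicit * p.retrans_time_ms / 1000.0
    steps.append(("FAILED", t))
    return steps


def overshoot_s(p, stale_wait_s=0.0):
    """Seconds between the end of REACHABLE and the entry reaching FAILED."""
    return timeline(p, stale_wait_s)[-1][1] - p.reachable_time_s
```

With the defaults, overshoot_s(NeighParms()) is 8.0, and a nonzero stale_wait_s adds to it directly, which is one way an observed overshoot can land nearer 10 s than 8 s.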

How to verify. Instrument with trace-cmd:

trace-cmd record -e neigh:neigh_update -e neigh:neigh_event_send_done

or watch ip -s neigh show while flipping /proc/sys/net/ipv4/neigh/ethX/{delay_first_probe_time,retrans_time_ms,ucast_solicit}. If reducing delay_first_probe_time shrinks the overshoot 1:1, the diagnosis is confirmed.
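To compare a measurement against the values currently in effect, the same arithmetic can be run on the live sysctls. This is a rough Python sketch under stated assumptions: the function names are hypothetical, the root parameter exists only so the code can be pointed at a different directory, and the formula ignores app_solicit, multicast re-probes, and any time spent in STALE.

```python
from pathlib import Path


def read_neigh_sysctls(dev, root="/proc/sys/net/ipv4/neigh"):
    """Read the three per-device knobs named above as integers."""
    base = Path(root) / dev
    names = ("delay_first_probe_time", "retrans_time_ms", "ucast_solicit")
    return {name: int((base / name).read_text().strip()) for name in names}


def predicted_overshoot_s(vals):
    """DELAY duration plus PROBE duration, in seconds (simplified model)."""
    probe_s = vals["ucast_solicit"] * vals["retrans_time_ms"] / 1000.0
    return vals["delay_first_probe_time"] + probe_s
```

For example, predicted_overshoot_s(read_neigh_sysctls("eth0")) gives the overshoot this model expects for a device named eth0 (substitute the real interface name). If lowering delay_first_probe_time by N seconds lowers both the prediction and the measured overshoot by about N, the FSM explanation holds.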

Gotchas.

The challenge: The "extra 10 seconds" isn't timer drift — it's the rest of the neighbor FSM (DELAY + PROBE) that the asker didn't realize was part of the lifetime they were measuring.
