2026-05-07
The asker patched neigh_rand_reach_time() on a 5.10 kernel to remove the random jitter (normally a uniform draw between 0.5x and 1.5x base_reachable_time). Their goal: deterministic ARP/ND aging. They observe that even with the patch, an entry's effective lifetime is consistently ~10 seconds longer than the configured reachable time. Why?
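For reference, the stock jitter fits in a few lines. This is a userspace Python sketch, not kernel code. It assumes the 5.10 body is essentially `base ? (prandom_u32() % base) + (base >> 1) : 0`, with values in jiffies. It uses `random.randrange` in place of `prandom_u32() % base`, which ignores the kernel's slight modulo bias. `patched_rand_reach_time` is a hypothetical stand-in for the asker's fix.

```python
import random


def neigh_rand_reach_time(base, rng=random):
    """Model of the stock jitter: a uniform integer in [base/2, 3*base/2).

    `base` is base_reachable_time in jiffies; 0 is passed through unchanged,
    as in the kernel expression this sketch is modeled on.
    """
    if not base:
        return 0
    # randrange(base) stands in for prandom_u32() % base; (base >> 1) adds the 0.5x floor.
    return rng.randrange(base) + (base >> 1)


def patched_rand_reach_time(base):
    """Hypothetical patched variant: no jitter, reachable_time == base."""
    return base


if __name__ == "__main__":
    rng = random.Random(0)            # seeded only so repeated runs match
    base = 30 * 250                   # 30 s at an assumed HZ=250 -> 7500 jiffies
    samples = [neigh_rand_reach_time(base, rng) for _ in range(10_000)]
    print("stock:  ", min(samples), "..", max(samples))
    print("patched:", patched_rand_reach_time(base))
```

The patched variant removes the spread, but it only controls when the first transition fires, not when the entry finally fails.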
Why this is interesting. It looks like a timer-accuracy bug, but it almost certainly isn't. The Linux neighbor subsystem is a multi-stage FSM, and "reachable time" governs only the first transition. What looks like a single timer is actually a chain of transitions, each with its own configurable delay. Unless you measure which transition you're actually observing, the arithmetic will never come out right.
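The state machine. A reachable entry doesn't simply expire; it walks through:
- NUD_REACHABLE → NUD_STALE after reachable_time elapses since the last confirmation (no probes sent yet; the entry is still usable). If the entry was used within the last delay_first_probe_time when this timer fires, neigh_timer_handler() moves it straight to NUD_DELAY instead.
- NUD_STALE → NUD_DELAY when the entry is next used to transmit a packet.
- NUD_DELAY waits delay_first_probe_time (default 5s) for a confirmation from upper layers (e.g., a TCP ACK).
- NUD_DELAY → NUD_PROBE, which sends ucast_probes (default 3) unicast solicitations spaced retrans_time (default 1s) apart.
- If no probe is answered, the entry moves to NUD_FAILED and is eventually garbage-collected (subject to gc_staletime, default 60s).
Add it up: 5s (DELAY) + 3 × 1s (PROBE retransmits) = 8s, plus however long the entry idled in STALE before traffic pushed it into DELAY. That ~8–10s gets tacked onto whatever reachable_time the asker chose, which matches their observation almost exactly.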
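The arithmetic above can be sketched as a timeline. This Python model simplifies under these assumptions:
- The peer never answers.
- app_solicit and mcast_resolicit are at their default of 0, so only ucast_probes count.
- Timer slack and jiffy rounding are ignored.

The names `NeighParams`, `failure_timeline`, `overshoot` and `stale_idle` are hypothetical. `stale_idle` is how long the entry sits in STALE before the next transmit. With `stale_idle=0` the model follows the continuous-traffic path, where the timer handler goes from REACHABLE directly to DELAY.

```python
from dataclasses import dataclass


@dataclass
class NeighParams:
    """Defaults in seconds; reachable_time assumes the 30 s base_reachable_time default."""
    reachable_time: float = 30.0
    delay_first_probe_time: float = 5.0
    retrans_time: float = 1.0
    ucast_probes: int = 3


def failure_timeline(p, stale_idle=0.0):
    """Return (time, state) pairs for an entry confirmed at t=0 whose peer goes silent."""
    t = 0.0
    events = [(t, "REACHABLE")]
    t += p.reachable_time
    if stale_idle > 0:
        # Idle entry: parked in STALE until the next transmit moves it to DELAY.
        events.append((t, "STALE"))
        t += stale_idle
    events.append((t, "DELAY"))
    t += p.delay_first_probe_time
    # Entering PROBE sends the first probe; each retrans_time tick sends the next,
    # and the tick after the last unanswered probe declares FAILED.
    events.append((t, "PROBE"))
    t += p.ucast_probes * p.retrans_time
    events.append((t, "FAILED"))
    return events


def overshoot(p, stale_idle=0.0):
    """Seconds between reachable_time expiring and the entry reaching FAILED."""
    return failure_timeline(p, stale_idle)[-1][0] - p.reachable_time


if __name__ == "__main__":
    params = NeighParams()
    for when, state in failure_timeline(params):
        print(f"{when:6.1f}s  {state}")
    print("overshoot:", overshoot(params), "s")
```

With the defaults the entry reaches FAILED at 38s, an 8s overshoot. Passing `stale_idle=2.0` pushes it to 10s, which brackets the asker's observation.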
How to verify. Instrument with trace-cmd:
trace-cmd record -e neigh:neigh_update -e neigh:neigh_timer_handler -e neigh:neigh_event_send_done
or watch ip -s neigh show while adjusting /proc/sys/net/ipv4/neigh/ethX/{delay_first_probe_time,retrans_time_ms,ucast_solicit} (for ND, the same knobs live under /proc/sys/net/ipv6/neigh/ethX/). If reducing delay_first_probe_time shrinks the overshoot 1:1, the diagnosis is confirmed.
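To check the 1:1 relationship numerically, fit a line to overshoot measurements taken at several delay_first_probe_time settings. A slope near 1 supports the diagnosis. This is a minimal Python sketch. The function names and the 0.1 tolerance are hypothetical choices. The inputs are whatever overshoots you record from the trace or from `ip -s neigh` output.

```python
def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired measurements")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("delay_first_probe_time must vary across runs")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def delay_explains_overshoot(delays_s, overshoots_s, tol=0.1):
    """True if overshoot tracks delay_first_probe_time with a slope within tol of 1."""
    return abs(least_squares_slope(delays_s, overshoots_s) - 1.0) <= tol
```

A slope near 0 would mean the extra time comes from somewhere other than the DELAY stage. In that case, vary retrans_time_ms or ucast_solicit next.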
Gotchas.
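- neigh_periodic_work() reschedules itself every base_reachable_time/2 and, about every 300s, re-randomizes reachable_time for each neigh_parms in the table by calling neigh_rand_reach_time(). A patch inside that function covers this path, but a patch at a single call site would not, so verify that every path setting reachable_time uses your fixed value.
- Values under /proc/sys/net/ipv4/neigh/<dev>/ take precedence over those under /proc/sys/net/ipv4/neigh/default/. A write to default/ is copied only to devices where that knob hasn't been set explicitly, so on a running interface set the per-device value rather than relying on default/.
- reachable_time is stored in jiffies, so at HZ=250 it is quantized to 4ms ticks. That error is irrelevant here, but worth knowing if the asker later chases millisecond-level precision.
- If entries that never age are acceptable, NUD_PERMANENT via ip neigh replace ... nud permanent sidesteps the FSM entirely and is far less invasive than a kernel patch.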
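The jiffies gotcha is easy to put numbers on. This sketch assumes the `_ms` sysctls (e.g. base_reachable_time_ms) convert with msecs_to_jiffies(), which rounds up to a whole tick. It models that as a ceiling division and ignores the kernel's overflow clamping. HZ is a parameter because it depends on the kernel config. The helper names mirror kernel helpers but are Python stand-ins.

```python
def msecs_to_jiffies(ms, hz):
    """Round a millisecond value up to whole ticks (ceiling of ms * hz / 1000)."""
    return -(-ms * hz // 1000)


def jiffies_to_msecs(j, hz):
    """Convert ticks back to milliseconds (exact when 1000 is a multiple of hz)."""
    return j * 1000 // hz


def quantization_error_ms(ms, hz):
    """How far the stored value lands above the requested one, in ms."""
    return jiffies_to_msecs(msecs_to_jiffies(ms, hz), hz) - ms


if __name__ == "__main__":
    for ms in (30000, 1000, 1001, 1003):
        j = msecs_to_jiffies(ms, 250)
        print(ms, "ms ->", j, "jiffies ->", jiffies_to_msecs(j, 250), "ms")
```

At HZ=250 the stored value is never more than 3ms above the request. That is three orders of magnitude below the ~8–10s the FSM adds.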