Stack Overflow Unanswered: Why Does the Linux Neighbor Timer Expire ~10s Later Than Expected Even with a Fixed Reachable Time?

2026-05-07

Tags: c, timer, linux-kernel, arp, neighbours

Score: 4 | Views: 122

The asker patched neigh_rand_reach_time() on a 5.10 kernel to remove the random jitter (normally a uniform draw between 0.5x and 1.5x of base_reachable_time). Their goal: deterministic ARP/ND aging. They observe that even after their fix, the entry's effective lifetime is consistently ~10 seconds longer than the configured reachable time. Why?
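To make the jitter concrete, here is a minimal Python model of the draw described above. It assumes the stock function computes base/2 plus a uniform integer in [0, base), and it works in milliseconds rather than jiffies for readability. The names rand_reach_time and fixed_reach_time are illustrative only; the second is a guess at what the asker's patch amounts to, not their actual code.

```python
import random


def rand_reach_time(base_ms: int, rng: random.Random) -> int:
    """Model of the stock jitter: base/2 plus a uniform draw in [0, base)."""
    if base_ms <= 0:
        return 0
    return base_ms // 2 + rng.randrange(base_ms)


def fixed_reach_time(base_ms: int) -> int:
    """Hypothetical patched variant: return the base value with no jitter."""
    return base_ms


rng = random.Random(0)  # seeded only so repeated runs give the same samples
samples = [rand_reach_time(30_000, rng) for _ in range(1000)]
print("stock range seen (ms):", min(samples), max(samples))
print("patched value (ms):", fixed_reach_time(30_000))
```

With a 30s base, every stock sample falls between 15s and 45s, which is the spread the asker wanted to eliminate.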

Why this is interesting. It looks like a timer accuracy bug, but it almost certainly isn't. The Linux neighbor state machine is a multi-stage FSM, and "reachable time" only governs the first transition. What looks like a single timer is actually a chain of transitions, each with its own configurable delay. Without measuring which state transition you're observing, the math will always come out wrong.

The state machine. A reachable entry doesn't simply expire — it walks through:

NUD_REACHABLE → NUD_STALE after reachable_time elapses (no probes sent yet, entry is still usable).
On the next outbound packet that hits the entry, NUD_STALE → NUD_DELAY.
NUD_DELAY waits delay_first_probe_time (default 5s) for a confirmation from upper layers (e.g., TCP ACK).
If no confirmation arrives, NUD_DELAY → NUD_PROBE, which sends ucast_solicit unicast probes (default 3; called UCAST_PROBES in the kernel source) spaced retrans_time apart (default 1s).
Only after the probes fail does the entry hit NUD_FAILED and get garbage-collected (subject to gc_stale_time, default 60s).

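Add it up: 5s (DELAY) + 3 × 1s (PROBE retransmits) = 8s, plus however long the entry sits in NUD_STALE before the next outbound packet moves it to NUD_DELAY. That puts the total at roughly 8–10s on top of whatever reachable_time the asker chose, which matches their observation almost exactly.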
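The arithmetic can be sketched as a small timeline model. This is a simplification under stated assumptions: it uses the default values quoted above, treats the time spent in NUD_STALE as a free parameter (stale_idle_s, a name made up for this sketch), and ignores any additional probe budgets that other sysctls may add. NeighKnobs, timeline, and overshoot_s are illustrative names, not kernel or iproute2 APIs.

```python
from dataclasses import dataclass


@dataclass
class NeighKnobs:
    """Defaults as quoted in the text; all names here are illustrative."""
    reachable_time_s: float = 30.0
    delay_first_probe_time_s: float = 5.0
    retrans_time_s: float = 1.0
    ucast_solicit: int = 3


def timeline(k: NeighKnobs, stale_idle_s: float = 0.0):
    """Return (time, state) pairs for an entry that never gets confirmed."""
    t = 0.0
    events = [(t, "NUD_REACHABLE")]
    t += k.reachable_time_s
    events.append((t, "NUD_STALE"))
    t += stale_idle_s  # wait until an outbound packet touches the entry
    events.append((t, "NUD_DELAY"))
    t += k.delay_first_probe_time_s
    events.append((t, "NUD_PROBE"))
    t += k.ucast_solicit * k.retrans_time_s  # one retrans interval per probe
    events.append((t, "NUD_FAILED"))
    return events


def overshoot_s(k: NeighKnobs, stale_idle_s: float = 0.0) -> float:
    """Seconds between the end of reachable_time and NUD_FAILED."""
    return timeline(k, stale_idle_s)[-1][0] - k.reachable_time_s


for when, state in timeline(NeighKnobs(), stale_idle_s=2.0):
    print(f"{when:6.1f}s  {state}")
print("overshoot:", overshoot_s(NeighKnobs(), stale_idle_s=2.0), "s")
```

With the defaults and no idle time in NUD_STALE the model gives an 8s overshoot; two seconds of idle time before the next outbound packet brings it to 10s.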

How to verify. Instrument with trace-cmd:

trace-cmd record -e neigh:neigh_update -e neigh:neigh_event_send_done

or watch ip -s neigh show while flipping /proc/sys/net/ipv4/neigh/ethX/{delay_first_probe_time,retrans_time_ms,ucast_solicit}. If reducing delay_first_probe_time shrinks the overshoot 1:1, the diagnosis is confirmed.
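To compare a measured overshoot against what the knobs predict, a small helper can read the three files named above and apply the same DELAY + PROBE sum. This is a sketch that assumes delay_first_probe_time is expressed in seconds and retrans_time_ms in milliseconds; read_knobs and predicted_overshoot_s are hypothetical names, and the directory is passed in so nothing is hard-coded.

```python
from pathlib import Path


def read_knobs(neigh_dir):
    """Read the three knobs from a per-device neigh sysctl directory."""
    d = Path(neigh_dir)

    def read_int(name):
        return int((d / name).read_text().strip())

    return {
        "delay_first_probe_time_s": read_int("delay_first_probe_time"),
        "retrans_time_s": read_int("retrans_time_ms") / 1000.0,
        "ucast_solicit": read_int("ucast_solicit"),
    }


def predicted_overshoot_s(knobs):
    """DELAY wait plus one retransmit interval per unicast probe."""
    return (knobs["delay_first_probe_time_s"]
            + knobs["ucast_solicit"] * knobs["retrans_time_s"])
```

For example, predicted_overshoot_s(read_knobs("/proc/sys/net/ipv4/neigh/eth0")) gives the expected figure for an interface named eth0 (a placeholder). If the measured overshoot tracks this number as the knobs change, the FSM explanation holds.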

Gotchas.

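neigh_periodic_work() runs roughly every base_reachable_time/2 and, once 300 seconds have passed since the last draw, re-randomizes reachable_time for every neigh_parms in the table by calling neigh_rand_reach_time(). The jitter is therefore applied per neigh_parms (in practice per interface), not per entry. Patching neigh_rand_reach_time() should cover this path too, but verify that the stored reachable_time really holds your fixed value afterwards.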
Per-interface knobs in /proc/sys/net/ipv4/neigh/<dev>/ take precedence over the default/ ones; changing only default/ may leave an already-configured interface on its old value, so set the per-device knob explicitly.
reachable_time is stored in jiffies; at HZ=250 one jiffy is 4ms, so rounding to whole ticks costs at most about 4ms. That is irrelevant here, but worth knowing if the asker later chases millisecond-level precision.
If the goal is "pin entries forever," NUD_PERMANENT via ip neigh replace ... nud permanent sidesteps the FSM entirely and is far less invasive than kernel patches.
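The tick-rounding point in the list above can be checked with a few lines of arithmetic. The sketch below models a round-up conversion from milliseconds to jiffies for HZ values that divide 1000 evenly; it is an approximation for illustration, not a copy of the kernel's conversion code, and the function names merely mirror the concept.

```python
def msecs_to_jiffies(ms: int, hz: int = 250) -> int:
    """Round a millisecond value up to whole ticks (hz must divide 1000)."""
    if 1000 % hz != 0:
        raise ValueError("this sketch only handles HZ values that divide 1000")
    ms_per_jiffy = 1000 // hz
    return (ms + ms_per_jiffy - 1) // ms_per_jiffy


def jiffies_to_msecs(jiffies: int, hz: int = 250) -> int:
    """Convert whole ticks back to milliseconds."""
    return jiffies * (1000 // hz)


def rounding_error_ms(ms: int, hz: int = 250) -> int:
    """Extra milliseconds introduced by storing the value in whole ticks."""
    return jiffies_to_msecs(msecs_to_jiffies(ms, hz), hz) - ms


for ms in (30_000, 30_001, 30_002, 30_003):
    print(ms, "ms ->", msecs_to_jiffies(ms), "jiffies, error", rounding_error_ms(ms), "ms")
```

At HZ=250 the error never reaches a full 4ms tick, three orders of magnitude below the overshoot being investigated.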

The challenge: The "extra 10 seconds" isn't timer drift — it's the rest of the neighbor FSM (DELAY + PROBE) that the asker didn't realize was part of the lifetime they were measuring.