Hacker News Deep Cuts: Show HN: Avoiding "if" makes Quicksort faster

Show HN: Avoiding "if" makes Quicksort faster

2026-05-07

Link: https://easylang.online/blog/qsort

Quicksort is one of the most studied algorithms in computer science — taught in every undergraduate curriculum, implemented in every standard library, and the subject of countless optimization papers. So when someone claims they can make it meaningfully faster by removing a single language construct, that's worth a closer look.

The premise here is delightfully concrete: branches are expensive on modern CPUs. When the branch predictor guesses wrong, the pipeline stalls and dozens of cycles evaporate. In a partitioning loop where the comparison outcome is essentially random (especially on already-disordered data), the predictor has no signal to latch onto, and the mispredict rate creeps toward 50%. That's catastrophic for a hot loop.

The fix is branchless programming: replace the conditional swap with arithmetic that produces the same result regardless of which value is larger. Something like:

Compute a mask from the comparison (0 or all-ones)
Use the mask to conditionally select values via XOR or arithmetic
Let the CPU execute both "branches" speculatively without ever branching

This pattern shows up in cryptographic code (where it's mandatory for constant-time operations to avoid timing side-channels), in SIMD-heavy numerical kernels, and increasingly in mainstream sorts — Andrei Alexandrescu's talks on this topic and the BlockQuicksort paper from 2016 are the canonical references. Modern pdqsort implementations in Rust and C++ use similar techniques.

What makes this particular post interesting is the venue — easylang is a teaching language, which means the author is demonstrating these techniques in an accessible context rather than buried in C++ template metaprogramming or hand-tuned assembly. For developers who've heard "branches are slow" but never seen the transformation applied to an algorithm they already understand, this is a great gateway.

The deeper lesson worth internalizing: algorithmic complexity is not the only thing that matters. Two O(n log n) sorts can differ by 2-3x in real-world performance based purely on how friendly they are to the CPU's branch predictor and cache hierarchy. The textbook view of algorithms — counting comparisons and swaps — was calibrated for hardware that no longer exists.

For anyone writing performance-sensitive code, this reframes optimization away from "find a better algorithm" toward "make the existing algorithm friendlier to the silicon."

Why it deserves more upvotes: A clean, accessible demonstration that branch prediction matters as much as Big-O — applied to the one algorithm every programmer thinks they already understand.

All newsletters