Integer Overflow in Average Calculation

2026-04-23

You're implementing binary search and a colleague flags a bug in production: the search occasionally returns wrong results or crashes, but only on very large arrays. You stare at the code. It looks textbook-perfect.

// Returns the index of `target` in sorted array `arr`,
// or -1 if not found.
int binary_search(int arr[], int size, int target) {
    int low = 0;
    int high = size - 1;

    while (low <= high) {
        int mid = (low + high) / 2;

        if (arr[mid] == target) {
            return mid;
        } else if (arr[mid] < target) {
            low = mid + 1;
        } else {
            high = mid - 1;
        }
    }

    return -1;
}

This is the version you'll find in most algorithms textbooks — including Bentley's Programming Pearls, where he famously noted that most professional programmers get binary search wrong. The irony? Even his published version contained this bug, and it went unnoticed for twenty years.

Try to spot it before scrolling down. The logic is correct. The off-by-one cases are handled. The loop terminates. And yet, this code has a bug that has shipped in the JDK, in countless production systems, and probably in code you've written yourself.

The Bug

The problem is on this line:

int mid = (low + high) / 2;

When low + high exceeds INT_MAX (2,147,483,647 for 32-bit integers), the addition overflows. In C, signed integer overflow is undefined behavior — the compiler is free to do anything, including returning a negative mid value that indexes out of bounds. In Java, signed overflow wraps around predictably, so mid becomes negative, throwing an ArrayIndexOutOfBoundsException.

This only triggers when searching arrays larger than about one billion elements — which is why it lurked undetected for decades. But in the era of large datasets and 64-bit addressing, arrays of that size are increasingly common.

Let's trace it. Suppose low = 1,500,000,000 and high = 2,000,000,000:

The fix is deceptively simple. Instead of adding low and high (which can overflow), compute the distance between them (which cannot, since high >= low is always true inside the loop) and add half of that to low:

int binary_search(int arr[], int size, int target) {
    int low = 0;
    int high = size - 1;

    while (low <= high) {
        int mid = low + (high - low) / 2;  // overflow-safe

        if (arr[mid] == target) {
            return mid;
        } else if (arr[mid] < target) {
            low = mid + 1;
        } else {
            high = mid - 1;
        }
    }

    return -1;
}

Since high - low is always non-negative within the loop, and both values fit in an int, the subtraction and subsequent division can never overflow.

An alternative in languages with unsigned right shift is mid = (low + high) >>> 1, which treats the overflowed bit pattern as unsigned before shifting — this is the fix Joshua Bloch applied to the JDK in 2006. But the low + (high - low) / 2 form is portable, readable, and works in every language.

This bug is a reminder that arithmetic on indices is not just arithmetic on abstract integers — it's arithmetic on bounded machine words, and the bounds matter.

Key Takeaway: Never compute a midpoint as (low + high) / 2 — use low + (high - low) / 2 to avoid integer overflow on large inputs.

All newsletters