Go's WaitGroup.Add Inside the Goroutine: The Wait That Doesn't Wait

2026-05-21

This function is supposed to process every item concurrently and return all results once every worker has finished. It compiles, it passes the unit tests on your laptop, and it ships:

func processItems(items []Item) []Result {
    var wg sync.WaitGroup
    var mu sync.Mutex
    var results []Result

    for _, item := range items {
        item := item
        go func() {
            wg.Add(1)
            defer wg.Done()
            r := transform(item)
            mu.Lock()
            results = append(results, r)
            mu.Unlock()
        }()
    }
    wg.Wait()
    return results
}

Then it runs on a CPU-quota'd container in CI. Sometimes you get all the results. Sometimes you get half. Sometimes you get zero. Usually there is no panic and no error, just silently dropped work and a smiling green build.

The Bug

wg.Add(1) is inside the goroutine. The parent loop spawns goroutines and then races directly to wg.Wait(). If the Go scheduler hasn't yet run any of those new goroutines (easy on a single-vCPU container, or when the spawning goroutine simply keeps running on its OS thread), the counter is still zero when Wait() is reached. Wait returns instantly, processItems returns an empty slice, and the workers run afterwards, appending to results after the caller has already moved on. If only some workers have started, Wait covers just those and the caller gets a partial slice. It is also worse than data loss: the late appends are a data race with the unsynchronized read in return results, and they may write into a backing array shared with the slice the caller now holds.

The sync documentation states the rule directly: "calls with a positive delta that occur when the counter is zero must happen before a Wait." The trap is that the buggy version reads beautifully: Add and Done bracketing the work feels symmetric and self-contained. But "happen before" is an ordering guarantee in the Go memory model sense, not a matter of where the call sits lexically. An Add made in the parent before the go statement is ordered ahead of the parent's Wait by plain program order. An Add inside a freshly spawned goroutine has no such ordering relative to that Wait.

This race is timing-dependent, so it almost always passes on a developer machine with eight idle cores. It surfaces under contention, under CPU quotas, or once the input slice gets large enough that the scheduler delays a few worker starts.
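
The scheduling gap can be observed without triggering the bug itself. What follows is a minimal sketch, not part of the original function: countStarted is a hypothetical helper that counts its workers correctly (Add in the parent) and uses an atomic counter to record how many goroutines had actually begun running when the launch loop finished. That snapshot is taken at the point where the buggy version calls Wait.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countStarted (hypothetical helper) launches n goroutines and reports how
// many had begun running when the launch loop ended (early) and how many had
// run after a correctly counted Wait (final).
func countStarted(n int) (early, final int32) {
	var wg sync.WaitGroup
	var started int32

	wg.Add(n) // counted in the parent, so the Wait below is reliable
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			atomic.AddInt32(&started, 1) // mark this worker as having begun
		}()
	}

	// Snapshot at the point where the buggy version would call Wait.
	early = atomic.LoadInt32(&started)

	wg.Wait()
	final = atomic.LoadInt32(&started)
	return early, final
}

func main() {
	early, final := countStarted(1000)
	// early varies from run to run; final is always 1000.
	fmt.Printf("started when the loop ended: %d, after Wait: %d\n", early, final)
}
```

The first number depends on the machine and the load. Whenever it is lower than n, some workers had not yet started, and in the buggy version those are the workers whose Add(1) had not run when Wait checked the counter.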

The Fix

Increment the counter in the same goroutine that calls Wait — before go, not inside it:

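func processItems(items []Item) []Result {
    var wg sync.WaitGroup
    results := make([]Result, len(items)) // one pre-allocated slot per item

    wg.Add(len(items)) // count every worker up front, in the goroutine that calls Wait
    for i, item := range items {
        i, item := i, item // per-iteration copies; redundant from Go 1.22 on
        go func() {
            defer wg.Done()
            results[i] = transform(item) // each worker writes only its own index
        }()
    }
    wg.Wait()
    return results
}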

Two wins bundled in: Add happens once, sequenced before any goroutine launches, so Wait can never observe a premature zero; and each worker owns a distinct slot in a pre-sized slice, so the mutex disappears. If you genuinely need append, just hoist wg.Add(1) onto the line above go func() inside the loop — what matters is that the increment runs in the parent goroutine.
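
For the append variant, a minimal sketch might look like the following. The names here are assumptions for illustration: collectSquares stands in for processItems, and squaring an int stands in for transform. The only point is the placement of wg.Add(1) on the line before the go statement.

```go
package main

import (
	"fmt"
	"sync"
)

// collectSquares is a hypothetical stand-in for processItems: it squares each
// input concurrently and gathers the results with append.
func collectSquares(inputs []int) []int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	var results []int

	for _, n := range inputs {
		n := n    // per-iteration copy; redundant from Go 1.22 on
		wg.Add(1) // runs in the parent goroutine, before the worker is launched
		go func() {
			defer wg.Done()
			sq := n * n
			mu.Lock()
			results = append(results, sq) // append still needs the mutex
			mu.Unlock()
		}()
	}

	wg.Wait() // the counter already reflects every launched worker
	return results
}

func main() {
	out := collectSquares([]int{1, 2, 3, 4})
	// The order of out varies between runs; its length does not.
	fmt.Println(len(out))
}
```

The trade-off against the pre-sized slice is that results arrive in completion order rather than input order, and every append contends on the mutex.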

Since Go 1.25, go vet includes a waitgroup analyzer that reports misplaced calls to sync.WaitGroup.Add like this one, and go test -race will often catch the post-Wait writes. Wire both into CI and this family of bugs becomes far less likely to ship.

Key Takeaway: wg.Add must happen before wg.Wait. The simplest way to guarantee that is to call it in the goroutine that launches the workers and calls Wait, never inside the goroutine it is counting.
