Daily Software Engineering: The Idempotent Consumer Pattern: Surviving Duplicate Messages Without Corrupting State

2026-06-05

Almost every message broker worth using delivers at-least-once by default. That means duplicates. A consumer crashes after processing but before acking. A network blip triggers a redelivery. A producer retries on a flaky connection. If your handler isn't built for it, you'll double-charge a customer, double-ship an order, or double-credit an account.

The Idempotent Consumer Pattern makes processing the same message N times produce the same result as processing it once. The recipe is boring on purpose: every message carries a stable unique ID, the consumer records which IDs it has processed, and it checks that record before doing the work.

The three flavors:

Dedup table. Insert the message ID into a processed_messages table inside the same transaction as the business write. On duplicate, the unique constraint fires and you skip. Simplest, most durable.
Natural idempotency. Structure the operation so repetition is harmless. UPDATE orders SET status='shipped' WHERE id=X is naturally idempotent. UPDATE balance SET amount = amount - 10 is not.
Versioned writes. Include a version or event sequence number. Only apply the write if the incoming version is greater than the stored one. Useful for state-replacement workloads.
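
Versioned writes are the flavor that is easiest to get subtly wrong, so here is a minimal sketch using Python's built-in sqlite3 module. The orders table, its columns, and the apply_versioned helper are hypothetical names chosen for illustration. The whole trick is the "version < ?" guard in the WHERE clause: a duplicate or stale event matches zero rows and changes nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per order, carrying the last applied version.
conn.execute(
    "CREATE TABLE orders ("
    "id TEXT PRIMARY KEY, status TEXT NOT NULL, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO orders VALUES ('o-1', 'created', 1)")
conn.commit()


def apply_versioned(conn, order_id, status, version):
    """Apply the write only if `version` is newer than the stored one.

    Returns True if the row changed, False if the event was a duplicate
    or arrived out of order.
    """
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE orders SET status = ?, version = ? "
            "WHERE id = ? AND version < ?",
            (status, version, order_id, version),
        )
    return cur.rowcount == 1


results = [
    apply_versioned(conn, "o-1", "paid", 2),     # new version: applied
    apply_versioned(conn, "o-1", "paid", 2),     # duplicate: skipped
    apply_versioned(conn, "o-1", "shipped", 3),  # new version: applied
    apply_versioned(conn, "o-1", "paid", 2),     # stale replay: skipped
]
final = conn.execute(
    "SELECT status, version FROM orders WHERE id = 'o-1'"
).fetchone()
```

Because the comparison and the write happen in a single UPDATE statement, there is no window between "check the version" and "apply the change" for a concurrent consumer to slip through.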

Concrete example. A payments service consumes a charge_requested event from Kafka. Naive handler: call Stripe, write to payments, ack (in Kafka terms, commit the offset). If the consumer crashes after the Stripe call but before the ack, Kafka redelivers the message and you charge the customer twice. Fix it with a dedup table:

Begin transaction
INSERT INTO processed_events (event_id) VALUES ($1). On a unique-constraint violation the event is a duplicate: roll back, ack, and skip the remaining steps
Call Stripe with the event_id as the idempotency key (Stripe will also dedupe)
Insert payment row, commit, ack

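Note the layered defense. If the commit succeeds but the broker ack fails, the dedup row catches the redelivery before Stripe is called at all. If the consumer crashes after Stripe accepts the charge but before the transaction commits, the dedup row rolls back with everything else and the redelivered message reaches Stripe a second time. That is the case where Stripe's idempotency key prevents a second charge.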
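
The steps above can be sketched end to end with sqlite3 and a stand-in for the payment provider. Everything here is an assumption for illustration: FakeGateway is a hypothetical in-memory class that mimics a provider deduplicating on an idempotency key (it is not the Stripe client), and the crash_before_commit flag exists only to simulate a crash between the charge and the commit.

```python
import sqlite3


class FakeGateway:
    """Hypothetical provider stand-in that dedupes on an idempotency key."""

    def __init__(self):
        self.charges = {}

    def charge(self, idempotency_key, amount_cents):
        # A repeated key returns the original charge instead of a new one.
        if idempotency_key not in self.charges:
            self.charges[idempotency_key] = {
                "id": f"ch_{len(self.charges) + 1}",
                "amount_cents": amount_cents,
            }
        return self.charges[idempotency_key]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE payments ("
    "event_id TEXT NOT NULL, charge_id TEXT NOT NULL, "
    "amount_cents INTEGER NOT NULL)"
)


def handle(conn, gateway, event, crash_before_commit=False):
    """Return True if processed, False if skipped as a duplicate."""
    try:
        with conn:  # one transaction: dedup row and payment row together
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)",
                (event["id"],),
            )
            charge = gateway.charge(event["id"], event["amount_cents"])
            if crash_before_commit:
                raise RuntimeError("simulated crash after charge")
            conn.execute(
                "INSERT INTO payments VALUES (?, ?, ?)",
                (event["id"], charge["id"], charge["amount_cents"]),
            )
    except sqlite3.IntegrityError:
        return False  # primary key violation: already processed
    return True


gateway = FakeGateway()
event = {"id": "evt-1", "amount_cents": 1000}

try:
    handle(conn, gateway, event, crash_before_commit=True)
except RuntimeError:
    pass  # the transaction rolled back, so no dedup row was recorded

first = handle(conn, gateway, event)   # redelivery after the crash
second = handle(conn, gateway, event)  # plain duplicate
payment_rows = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
charge_count = len(gateway.charges)
```

After one simulated crash and two redeliveries, the gateway holds a single charge and the payments table a single row. The dedup table handled the plain duplicate; the idempotency key handled the crash.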

Rule of thumb for the dedup table. Retention should cover your broker's maximum redelivery window plus a generous safety margin, typically 7 days for Kafka and 30 days for SQS-backed systems with DLQs. Put a unique index on event_id, partition by day, and prune the tail with a scheduled job. Storage cost: ~50 bytes per row × peak throughput × retention. At 1k msg/sec for 7 days, that is about 605 million rows, or roughly 30 GB. Cheap insurance.
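
For anyone who wants to plug in their own numbers, the storage estimate is a single multiplication. The function name below is made up for this sketch, and the 50-byte row size is the article's rough figure, not a measurement; real row size depends on your ID format and index overhead.

```python
SECONDS_PER_DAY = 86_400


def dedup_table_bytes(bytes_per_row, msgs_per_sec, retention_days):
    """Rough upper bound: one row per message for the whole retention window."""
    rows = msgs_per_sec * SECONDS_PER_DAY * retention_days
    return bytes_per_row * rows


estimate = dedup_table_bytes(bytes_per_row=50, msgs_per_sec=1_000, retention_days=7)
estimate_gb = estimate / 1e9  # decimal gigabytes
```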

Common traps: using a non-stable ID (like a hash of payload that changes when a producer adds a field), checking the dedup table outside the business transaction (race condition), and forgetting that side effects to external systems aren't covered by your local transaction — that's where you need the downstream service's own idempotency key.
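
The first trap is easy to demonstrate. In this sketch the event shape and field names are hypothetical: the same logical event is serialized twice, once before and once after the producer starts including a currency field. A payload hash treats the two as different messages; a producer-assigned event_id does not.

```python
import hashlib
import json


def payload_hash(event):
    """Derive an ID from the payload contents (the fragile approach)."""
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The same logical event, before and after the producer adds a field.
v1 = {"event_id": "evt-42", "order_id": "o-1", "amount_cents": 1000}
v2 = {**v1, "currency": "USD"}

hash_changed = payload_hash(v1) != payload_hash(v2)  # dedup would miss this
id_stable = v1["event_id"] == v2["event_id"]         # dedup would catch this
```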

Key Takeaway: At-least-once delivery is the default; idempotent consumers — via dedup tables, natural idempotency, or versioned writes — are how you survive it without corrupting state.
