CPU Work and GPU Work

2026-05-03

Link: https://www.talhoffman.com/2026/05/03/cpu-work-and-gpu-work/

HN Discussion: 2 points, 0 comments

Tal Hoffman's blog has a track record of producing clear, visual explanations of graphics programming and GPU architecture concepts. This post tackles one of the most fundamental — yet frequently misunderstood — distinctions in modern computing: what actually belongs on the CPU versus the GPU, and how work flows between them.

This matters more now than it has in years. The explosion of GPU computing driven by machine learning has pulled an enormous number of developers into GPU programming who didn't come up through the traditional graphics pipeline. Many of them carry CPU-centric mental models that lead to real performance pitfalls.

A good treatment of CPU vs. GPU work decomposition covers the asymmetry between the two processors: CPUs are optimized for low-latency sequential work, with deep caches and branch prediction, while GPUs are optimized for high-throughput parallel work, with massive register files and SIMD execution. The practical upshot is that choosing where work runs isn't just about "is this parallelizable?"; it also depends on data locality, synchronization costs, and pipeline bubbles.
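To make the "not just parallelizable" point concrete, here is a minimal back-of-the-envelope sketch. It is not taken from Hoffman's post: the `Workload` fields, the 16 GB/s bus figure, and the 10-microsecond launch overhead are all hypothetical placeholders, and the model ignores overlap between copies and compute.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """One batch of work. All fields are hypothetical inputs to the model."""
    bytes_in: float      # bytes copied from host memory to the GPU
    bytes_out: float     # bytes copied back to host memory
    cpu_seconds: float   # time to process the batch entirely on the CPU
    gpu_seconds: float   # GPU kernel time alone, excluding copies


def offload_seconds(w: Workload,
                    bus_bytes_per_s: float = 16e9,
                    launch_overhead_s: float = 10e-6) -> float:
    """Estimated wall time when the batch makes a round trip through the GPU."""
    transfer = (w.bytes_in + w.bytes_out) / bus_bytes_per_s
    return launch_overhead_s + transfer + w.gpu_seconds


def should_offload(w: Workload, **link) -> bool:
    """True when the GPU round trip is estimated to beat staying on the CPU."""
    return offload_seconds(w, **link) < w.cpu_seconds


if __name__ == "__main__":
    # Made-up numbers: a tiny batch and a large one.
    tiny = Workload(bytes_in=4096, bytes_out=4096, cpu_seconds=5e-6, gpu_seconds=1e-6)
    big = Workload(bytes_in=64e6, bytes_out=64e6, cpu_seconds=0.2, gpu_seconds=0.004)
    for name, w in (("tiny", tiny), ("big", big)):
        print(name, "offload" if should_offload(w) else "stay on CPU")
```

Under these assumed numbers the tiny batch stays on the CPU even though its kernel is five times faster on the GPU, because launch overhead and transfer time dominate. The large batch is worth offloading despite moving 128 MB across the bus.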

For graphics programmers, this is bread and butter: culling, draw call submission, and command buffer recording on the CPU, versus vertex shading, rasterization, and fragment work on the GPU. But the same principles apply directly to ML inference pipelines, compute shaders, and even database query engines that offload to GPUs. Understanding the boundary helps you design systems where both processors stay saturated rather than blocking on each other.
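The cost of "blocking on each other" can be estimated with a toy two-stage pipeline model. This is an illustrative sketch, not something from the post: it assumes every frame costs the same fixed CPU and GPU time (in whole milliseconds) and that the CPU may record the next frame while the GPU executes the current one. The function names are hypothetical.

```python
def serial_time(n_frames: int, cpu_ms: int, gpu_ms: int) -> int:
    """Each frame waits for the CPU, then the GPU, before the next one starts."""
    return n_frames * (cpu_ms + gpu_ms)


def pipelined_time(n_frames: int, cpu_ms: int, gpu_ms: int) -> int:
    """The CPU prepares frame i+1 while the GPU works on frame i."""
    if n_frames == 0:
        return 0
    # The first frame fills the pipeline; after that the slower stage sets the pace.
    return cpu_ms + gpu_ms + (n_frames - 1) * max(cpu_ms, gpu_ms)


def gpu_utilization(n_frames: int, gpu_ms: int, total_ms: int) -> float:
    """Fraction of the total wall time during which the GPU is busy."""
    return n_frames * gpu_ms / total_ms if total_ms else 0.0


if __name__ == "__main__":
    n, cpu_ms, gpu_ms = 100, 4, 6  # made-up per-frame costs
    for label, total in (("serial", serial_time(n, cpu_ms, gpu_ms)),
                         ("pipelined", pipelined_time(n, cpu_ms, gpu_ms))):
        print(f"{label}: {total} ms, GPU busy {gpu_utilization(n, gpu_ms, total):.0%}")
```

With these assumed costs, the serial schedule takes 1000 ms and leaves the GPU idle 40% of the time, while the overlapped schedule takes 604 ms and keeps the GPU almost fully busy. Real pipelines have variable frame costs and bounded queues, so treat the model as an upper bound on what overlap can buy.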

Hoffman's writing tends to be concise and diagram-heavy, which makes these architectural concepts accessible without dumbing them down. For anyone writing renderers, ML inference code, or heterogeneous compute pipelines, having a clean mental model of this CPU/GPU division pays dividends in every optimization decision you make.

Why it deserves more upvotes: A clear explanation of CPU vs. GPU work partitioning is evergreen knowledge that applies far beyond graphics, from ML inference to compute pipelines, and getting this boundary wrong is one of the most common sources of performance bottlenecks in heterogeneous systems.
