2026-05-07
The page cache is usually your friend, but sometimes it's overhead you don't want. O_DIRECT asks the kernel to skip the cache and DMA straight between userspace buffers and the storage device. The data path becomes app buffer → device instead of app buffer → page cache → device. The flag only works on filesystems that support it; the others reject it at open() with EINVAL.
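A minimal sketch of requesting direct I/O from C, assuming Linux and glibc. open_direct is a hypothetical helper name. Falling back to buffered I/O when the filesystem rejects O_DIRECT is one design choice among several; a database that depends on bypassing the cache might prefer to fail hard instead.

```c
#define _GNU_SOURCE            /* glibc exposes O_DIRECT only with this */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>            /* close(), for the caller */

/* Hypothetical helper: open `path` with O_DIRECT if the filesystem allows
 * it. Filesystems without direct-I/O support reject the flag at open()
 * with EINVAL; this sketch then retries as ordinary buffered I/O and
 * reports which mode it got through *direct. */
int open_direct(const char *path, int flags, mode_t mode, bool *direct)
{
    assert(path != NULL && direct != NULL);
    int fd = open(path, flags | O_DIRECT, mode);
    if (fd >= 0) {
        *direct = true;            /* reads/writes now bypass the page cache */
        return fd;
    }
    if (errno != EINVAL)
        return -1;                 /* a real error (ENOENT, EACCES, ...) */
    *direct = false;               /* fall back: data goes via the page cache */
    return open(path, flags, mode);
}
```

A caller that relies on the cache being bypassed should check `direct` after the call and refuse to continue when it is false, rather than silently running buffered.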
Why bypass the cache? Three legitimate reasons:
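The brutal alignment rules. O_DIRECT enforces three constraints, and violating any returns EINVAL: the userspace buffer address, the file offset, and the transfer length must each be aligned. The buffer must meet the memory alignment; the offset and the length must be multiples of the offset alignment, typically the device's logical block size (512 bytes or 4 KiB).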
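The three constraints (buffer address, file offset, transfer length) can be checked before a request is issued. This is a sketch assuming power-of-two alignments; dio_request_aligned is a hypothetical name, and in real code the two alignment values would come from the kernel or a conservative 4096.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* x is a multiple of align; valid only when align is a power of two. */
static bool is_multiple(uint64_t x, uint64_t align)
{
    return (x & (align - 1)) == 0;
}

/* Would this request pass the O_DIRECT alignment rules?
 * mem_align:  required alignment of the buffer address.
 * off_align:  required alignment of the file offset and the length. */
bool dio_request_aligned(const void *buf, off_t offset, size_t len,
                         size_t mem_align, size_t off_align)
{
    assert(mem_align != 0 && (mem_align & (mem_align - 1)) == 0);
    assert(off_align != 0 && (off_align & (off_align - 1)) == 0);
    return is_multiple((uintptr_t)buf, mem_align)            /* buffer address */
        && offset >= 0
        && is_multiple((uint64_t)offset, off_align)          /* file offset */
        && is_multiple((uint64_t)len, off_align);            /* transfer length */
}
```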
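You can't just malloc(4096): glibc only guarantees 16-byte alignment on 64-bit targets, far short of a block boundary. Use posix_memalign(&buf, 4096, size) or aligned_alloc(4096, size), keeping size a multiple of 4096 (C11 requires this for aligned_alloc). Query the exact requirements with statx() and STATX_DIOALIGN (Linux 6.1+), which fills in stx_dio_mem_align and stx_dio_offset_align; both are 0 when the file does not support direct I/O.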
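A sketch of both steps, assuming Linux with glibc 2.28 or later for the statx() wrapper. alloc_dio_buffer and query_dio_align are hypothetical names, not a standard API; the #ifdef keeps the code compiling against kernel headers older than 6.1, where STATX_DIOALIGN does not exist.

```c
#define _GNU_SOURCE             /* statx() wrapper, glibc >= 2.28 */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>              /* AT_FDCWD */
#include <stdint.h>
#include <stdlib.h>
#include <sys/stat.h>           /* statx(), struct statx, STATX_* */

/* Allocate a buffer for O_DIRECT transfers: `size` is rounded up to a
 * multiple of `align` (a power of two). Free the result with free(). */
void *alloc_dio_buffer(size_t align, size_t size)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    if (align < sizeof(void *))
        align = sizeof(void *);          /* posix_memalign's minimum */
    size = (size + align - 1) / align * align;
    void *buf = NULL;
    int rc = posix_memalign(&buf, align, size);  /* returns an errno value */
    if (rc != 0) {
        errno = rc;
        return NULL;
    }
    return buf;
}

/* Ask the kernel for the direct-I/O alignment of `path`.
 * Returns 1 if it reported usable values, 0 if it reported that direct
 * I/O is unsupported (both zero), -1 if unknown (old headers, error, or
 * the filesystem did not fill in the field). */
int query_dio_align(const char *path, unsigned *mem_align, unsigned *off_align)
{
#ifdef STATX_DIOALIGN
    struct statx stx;
    if (statx(AT_FDCWD, path, 0, STATX_DIOALIGN, &stx) != 0)
        return -1;
    if (!(stx.stx_mask & STATX_DIOALIGN))
        return -1;
    *mem_align = stx.stx_dio_mem_align;
    *off_align = stx.stx_dio_offset_align;
    return (*mem_align != 0 && *off_align != 0) ? 1 : 0;
#else
    (void)path; (void)mem_align; (void)off_align;
    return -1;                           /* headers predate Linux 6.1 */
#endif
}
```

When query_dio_align returns -1, using 4096 for both alignments is a common conservative assumption; when it returns 0, direct I/O on that file is not available at all.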
Concrete example: PostgreSQL. For decades Postgres relied on the page cache and used fsync() for durability. PostgreSQL 16 (2023) added debug_io_direct, which opens data and/or WAL files with O_DIRECT, but it is documented as a developer option rather than a production setting; PostgreSQL 18 added asynchronous I/O through io_method, with io_uring as one of the choices. The case for direct I/O is that the shared_buffers pool already caches pages. On a 256 GB server with 64 GB of shared_buffers, double-caching in the page cache can waste ~60 GB of RAM that could otherwise hold more index data.
Rule of thumb for sizing. Direct I/O wins when your application's own cache would achieve a higher hit rate with that memory than the page cache would. If your DB buffer pool is more than 25% of RAM and its eviction policy tracks access patterns better than plain LRU (most do: PostgreSQL uses clock-sweep, InnoDB a midpoint-insertion LRU), O_DIRECT pays off. Below that threshold, let the kernel do the caching.
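Encoding the heuristic makes its two inputs explicit. This is a sketch only: the 25% cut-off is the rule of thumb stated here, not a measured constant, and direct_io_likely_pays_off is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The sizing rule of thumb, encoded literally. pool_beats_lru is a
 * judgement call: "the buffer pool's eviction policy tracks this
 * workload better than the kernel's LRU-style page cache". */
bool direct_io_likely_pays_off(uint64_t pool_bytes, uint64_t ram_bytes,
                               bool pool_beats_lru)
{
    assert(pool_bytes <= ram_bytes);     /* a pool larger than RAM would swap */
    if (!pool_beats_lru || ram_bytes == 0)
        return false;
    /* "pool > 25% of RAM"; integer form, no floating point, no overflow */
    return pool_bytes > ram_bytes / 4;
}
```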
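Gotcha: O_DIRECT does not imply O_SYNC. A completed write may still sit in the drive's volatile write cache, and the metadata needed to read it back (a new file size, freshly allocated extents) may not be on disk yet. You still need fdatasync() for durability: direct I/O bypasses the kernel's cache, not the drive's.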
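A sketch of the write-then-flush pattern, assuming Linux/glibc, a filesystem that accepts O_DIRECT, and a 4096-byte alignment hard-coded for brevity (real code would query it). write_block_durably is a hypothetical name.

```c
#define _GNU_SOURCE                 /* O_DIRECT */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLK 4096u   /* assumed alignment; query statx(STATX_DIOALIGN) in real code */

/* Write `len` bytes (padded to one aligned block) at offset 0 with O_DIRECT,
 * then fdatasync() so the data and the metadata needed to read it back
 * reach stable storage, not just the drive's volatile write cache.
 * Returns 0 on success, -1 with errno set on failure. */
int write_block_durably(const char *path, const void *data, size_t len)
{
    assert(len <= BLK);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd < 0)
        return -1;                  /* EINVAL: filesystem lacks O_DIRECT */

    void *buf = NULL;
    int rc = posix_memalign(&buf, BLK, BLK);
    if (rc != 0) { close(fd); errno = rc; return -1; }
    memset(buf, 0, BLK);            /* zero-pad: the length must stay BLK */
    memcpy(buf, data, len);

    ssize_t n = pwrite(fd, buf, BLK, 0);    /* bypasses the page cache */
    int saved = errno;
    free(buf);
    if (n != (ssize_t)BLK) {
        close(fd);
        errno = (n < 0) ? saved : EIO;      /* treat a short write as EIO */
        return -1;
    }

    if (fdatasync(fd) != 0) {               /* flush the drive's cache too */
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}
```

Opening with O_DIRECT | O_DSYNC instead makes each write return only once its data is durable, trading an explicit fdatasync() call for a flush on every write.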
