2026-05-31
The Task State Segment (TSS) dates back to the 80286 (1982) and took its 32-bit form with the 80386 in 1985. It was designed for hardware task switching: on a task switch the CPU saved all registers into the outgoing task's TSS, loaded the new TSS selector, and restored the incoming task's state from it. Long mode threw all of that away; hardware task switching doesn't exist in 64-bit mode. Yet any x86-64 kernel that runs user-mode code still has to load a valid TSS, and getting it wrong typically means a triple fault on the first interrupt that needs a stack switch.
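What survived is the part the kernel actually needs: stack pointers for privilege transitions. The 64-bit TSS is 104 bytes and contains three things that matter: RSP0 through RSP2, the stack pointers the CPU loads when an interrupt or exception raises the privilege level to ring 0, 1, or 2; IST1 through IST7, the Interrupt Stack Table, seven stack pointers that an IDT entry can select unconditionally; and the I/O permission bitmap base offset.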
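To make the 104-byte layout concrete, here is a minimal sketch of the 64-bit TSS as a C struct. The type name `tss64`, the field names, and the helper `tss64_init` are illustrative assumptions, not Linux's definitions, and `__attribute__((packed))` assumes GCC or Clang. The offsets follow the architectural layout (RSP0 at byte 4, IST1 at byte 36, I/O map base at byte 102).

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof */
#include <stdint.h>
#include <string.h>   /* memset */

/* 64-bit TSS layout. Type and field names are illustrative, not Linux's. */
struct tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];      /* RSP0..RSP2: stacks for entry into ring 0..2 */
    uint64_t reserved1;
    uint64_t ist[7];      /* IST1..IST7 (ist[0] holds IST1) */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iomap_base;  /* offset of the I/O permission bitmap */
} __attribute__((packed));

static_assert(sizeof(struct tss64) == 104, "64-bit TSS is 104 bytes");
static_assert(offsetof(struct tss64, rsp) == 4, "RSP0 at offset 4");
static_assert(offsetof(struct tss64, ist) == 36, "IST1 at offset 36");
static_assert(offsetof(struct tss64, iomap_base) == 102, "I/O map base at offset 102");

/* Hypothetical helper: zero the TSS, set RSP0, and point the I/O bitmap
 * base at the end of the structure. Assuming the TSS descriptor limit is
 * sizeof(struct tss64) - 1, that means "no I/O bitmap present". */
static inline void tss64_init(struct tss64 *tss, uint64_t rsp0)
{
    memset(tss, 0, sizeof(*tss));
    tss->rsp[0] = rsp0;
    tss->iomap_base = (uint16_t)sizeof(*tss);
}
```

A real kernel would additionally install a TSS descriptor in the GDT and execute `ltr`; that part is omitted here.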
The IST is the critical feature. Consider a double fault (#DF, vector 8). If the kernel stack is corrupted or unmapped, delivering the exception on that stack means pushing a trap frame onto bad memory, which faults again. A fault while delivering #DF is a triple fault, and the CPU resets. Linux assigns #DF its own IST entry pointing to a dedicated stack, so the double fault handler runs on known-good memory.
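The link between a vector and its IST stack is a 3-bit field in that vector's IDT gate: a non-zero value tells the CPU to load RSP from the corresponding IST slot in the TSS, regardless of the current privilege level. The sketch below shows the 16-byte gate format and a setter. The names `idt_gate`, `idt_set_gate`, and `GATE_INTERRUPT_DPL0` are hypothetical, and `__attribute__((packed))` assumes GCC or Clang.

```c
#include <assert.h>   /* static_assert */
#include <stdint.h>

/* 64-bit IDT gate descriptor (16 bytes). Names are illustrative. */
struct idt_gate {
    uint16_t offset_low;   /* handler address bits 0..15 */
    uint16_t selector;     /* kernel code segment selector */
    uint8_t  ist;          /* bits 0..2: IST index, 0 = do not use the IST */
    uint8_t  type_attr;    /* present bit, DPL, gate type */
    uint16_t offset_mid;   /* handler address bits 16..31 */
    uint32_t offset_high;  /* handler address bits 32..63 */
    uint32_t reserved;
} __attribute__((packed));

static_assert(sizeof(struct idt_gate) == 16, "long-mode IDT gates are 16 bytes");

#define GATE_INTERRUPT_DPL0 0x8E  /* present, DPL 0, 64-bit interrupt gate */

/* Hypothetical setter: ist_index 1..7 selects IST1..IST7; 0 keeps the
 * normal stack-switch rules (RSP0 on a privilege change, else no switch). */
static inline void idt_set_gate(struct idt_gate *g, uint64_t handler,
                                uint16_t selector, uint8_t ist_index)
{
    g->offset_low  = (uint16_t)(handler & 0xFFFF);
    g->selector    = selector;
    g->ist         = ist_index & 0x7;
    g->type_attr   = GATE_INTERRUPT_DPL0;
    g->offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF);
    g->offset_high = (uint32_t)(handler >> 32);
    g->reserved    = 0;
}
```

Under these assumptions, a call such as `idt_set_gate(&idt[8], df_handler_addr, kernel_cs, 1)` would route vector 8 to whatever stack IST1 points at; the IST index chosen for #DF here is arbitrary.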
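The same applies to NMI (non-maskable interrupt) and #MC (machine check). These can arrive at any instruction boundary, including right after SYSCALL enters the kernel. SYSCALL does not switch stacks, so for a few instructions the CPU runs at ring 0 while RSP still holds the user-space value. Without IST, an NMI in that window would involve no privilege change, hence no RSP0 load, and the CPU would push the trap frame onto a stack pointer that user space controls.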
Rule of thumb: Linux uses 4 IST stacks per CPU (#DF, NMI, #MC, #DB), each EXCEPTION_STKSZ bytes, which is one 4 KB page in a default build. On a 256-core machine that's 256 × 4 × 4 KB = 4 MB of always-resident kernel memory just for "if everything else breaks, we still have a stack."
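The footprint arithmetic is a single product of CPU count, stacks per CPU, and stack size. A small sketch with a hypothetical helper, `ist_footprint_bytes`, makes the inputs explicit, since the stack size in particular depends on kernel version and configuration. The 4096-byte page size is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: total bytes reserved for IST stacks across a machine.
 * Assumes every CPU has the same number of stacks and every stack is a
 * non-zero whole number of 4096-byte pages. */
static inline uint64_t ist_footprint_bytes(uint64_t cpus,
                                           uint64_t stacks_per_cpu,
                                           uint64_t stack_bytes)
{
    assert(stack_bytes != 0 && stack_bytes % 4096 == 0);
    return cpus * stacks_per_cpu * stack_bytes;
}
```

Doubling any one input doubles the result, which is why a configuration that enlarges the exception stacks shows up directly in per-CPU memory cost.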
Real-world example: The Meltdown mitigation, kernel page-table isolation (KPTI), complicated this. With KPTI, a task's regular kernel stack isn't mapped in the user-space page table. On an interrupt from user mode the CPU loads RSP0 from the TSS and pushes the trap frame before the entry code has had a chance to switch CR3, so the TSS itself and whatever RSP0 points to must live in the kernel's "user-visible" minimal mapping. Linux therefore points RSP0 at a tiny per-CPU trampoline stack and moves to the real kernel stack only after switching CR3. That's why cpu_entry_area exists in arch/x86/mm/cpu_entry_area.c: a per-CPU, page-aligned region containing the TSS and trampoline stacks, mapped in both page tables.
