Daily Low-Level Programming: The SYSRET Vulnerability: Why Returning from a Syscall Can Crash Your Kernel

2026-06-01

When your process calls read(), the CPU executes SYSCALL to enter the kernel, and the kernel uses SYSRET to return. These instructions are faster than the old INT 0x80 path because they skip the IDT lookup and the automatic stack switch; the kernel switches stacks itself. But SYSRET on Intel has a sharp edge that took years to find and still trips up hypervisor authors.

The mechanism: SYSRET loads RIP from RCX and RFLAGS from R11 and drops CPL from 0 to 3 in a single instruction. The whole point is atomicity: there is no window between "still in kernel mode" and "RIP points to userspace" in which an interrupt could arrive. What SYSRET does not restore is RSP. The kernel has to load the user's stack pointer itself before executing SYSRET, so for that last instruction the CPU is still in ring 0 but already running on a stack the user chose.

The bug: On Intel CPUs, if RCX contains a non-canonical address (with 48-bit virtual addresses on x86-64, bits 48-63 do not all match bit 47), SYSRET raises a #GP fault. But here's the trap: the fault is raised before CPL has dropped to 3. The fault handler runs in ring 0, but on the user's RSP and with the user's GS base, because the kernel restored both before executing SYSRET. AMD designed SYSRET to fault in user mode; Intel designed it to fault in kernel mode. Same mnemonic, different semantics.
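To make "non-canonical" concrete, here is a minimal sketch of the definition in C. It assumes 48-bit virtual addresses (4-level paging); `VA_BITS` and `is_canonical` are names made up for this example, not taken from any kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48 implemented virtual-address bits (4-level paging). */
#define VA_BITS 48
static_assert(VA_BITS > 1 && VA_BITS < 64, "need upper bits left over to check");

/* Hypothetical helper: an address is canonical when bits 63..(VA_BITS-1)
 * are all zeros or all ones, i.e. the upper bits repeat bit 47. */
static bool is_canonical(uint64_t addr)
{
    uint64_t upper = addr >> (VA_BITS - 1);                      /* bits 63..47 */
    uint64_t all_ones = (UINT64_C(1) << (64 - VA_BITS + 1)) - 1; /* 17 one bits */
    return upper == 0 || upper == all_ones;
}
```

Under that assumption, 0x00007fffffffffff and 0xffff800000000000 are the edges of the two canonical halves, and everything between them is the non-canonical hole.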

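The exploit (CVE-2012-0217): A user process arranges (via ptrace or a signal frame) for its saved return RIP, which the kernel loads into RCX, to be 0x800000000000, the lowest non-canonical address. When the kernel does SYSRET, it faults in ring 0 with attacker-controlled RSP. Because the fault is taken at the same privilege level, the CPU does not switch stacks: it pushes the exception frame at that RSP, which gives the attacker a write to a kernel address of their choosing and, from there, code execution with kernel privileges. FreeBSD, Xen, and Windows all shipped this bug; Linux had fixed the same flaw in 2006 (CVE-2006-0744). The Xen advisory credits Rafal Wojtczuk with the discovery.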

The fix: Before SYSRET, the kernel must verify RCX is canonical. Linux's entry_SYSCALL_64 checks: if the saved RIP isn't canonical, fall back to IRET instead of SYSRET. IRET loads RIP, RSP, and the privilege level together from a frame on the kernel stack, so if it faults, the fault is still taken on a kernel stack. The cost is a few instructions and one branch on the syscall return path: measurable but unavoidable.
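The decision described above can be modeled in a few lines of C. This is a sketch of the logic only, assuming 48-bit virtual addresses; the struct, enum, and function names are hypothetical, and a real return path checks more state than the return address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; this models the decision, not kernel code. */
enum return_path { RETURN_VIA_SYSRET, RETURN_VIA_IRET };

struct saved_user_regs {
    uint64_t rip;   /* user return address; would be loaded into RCX */
    uint64_t rsp;   /* user stack pointer; loaded before SYSRET runs */
};

/* Assumption: 48-bit virtual addresses, so bits 63..47 must all match. */
static int rip_is_canonical(uint64_t rip)
{
    uint64_t upper = rip >> 47;
    return upper == 0 || upper == 0x1ffff;
}

/* Take the fast path only when SYSRET cannot fault on the return address. */
static enum return_path choose_return_path(const struct saved_user_regs *regs)
{
    if (!rip_is_canonical(regs->rip))
        return RETURN_VIA_IRET;     /* slow path; a fault stays on a kernel stack */
    return RETURN_VIA_SYSRET;
}

/* Usage example: an ordinary return address versus the exploit's value. */
static void example_return_paths(void)
{
    struct saved_user_regs normal = { .rip = 0x0000000000401000ULL, .rsp = 0x00007ffffffde000ULL };
    struct saved_user_regs hostile = { .rip = 0x0000800000000000ULL, .rsp = 0xffff888000000000ULL };

    assert(choose_return_path(&normal) == RETURN_VIA_SYSRET);
    assert(choose_return_path(&hostile) == RETURN_VIA_IRET);
}
```

Note that the check looks only at the return address. The attacker-chosen RSP in the second case is harmless once the kernel refuses to execute SYSRET with that RIP.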

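Rule of thumb: With 48-bit virtual addresses, the canonical address check is three instructions: shl $16, %rcx; sar $16, %rcx; cmp %rcx, saved_rcx. The shift pair throws away bits 48-63 and rebuilds them as copies of bit 47. If the result differs from the saved value, the original upper bits were not a sign extension of bit 47, and SYSRET would fault dangerously. Three instructions to avoid a privilege escalation.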
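The same shift-and-compare can be written in C to see what each instruction contributes. This is a sketch assuming 48-bit virtual addresses and a compiler that shifts signed values arithmetically (GCC and Clang do; the C standard leaves it implementation-defined). The function name `sysret_target_ok` is invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: signed right shift is arithmetic on this compiler. */
static_assert((-1 >> 1) == -1, "this sketch needs arithmetic right shift");

/* Mirrors: shl $16, %rcx ; sar $16, %rcx ; cmp %rcx, saved_rcx */
static int sysret_target_ok(uint64_t saved_rcx)
{
    /* shl $16: discard bits 63..48, moving bit 47 into the sign position. */
    uint64_t shifted = saved_rcx << 16;

    /* sar $16: shift back, refilling bits 63..48 with copies of bit 47.
     * The unsigned-to-signed cast wraps on GCC and Clang. */
    uint64_t extended = (uint64_t)((int64_t)shifted >> 16);

    /* cmp: equal only if the original upper bits already matched bit 47. */
    return extended == saved_rcx;
}
```

For the exploit's value 0x800000000000, the round trip produces 0xffff800000000000, which does not match, so the check rejects it.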

Real-world fallout: Xen patched this in XSA-7. The lesson echoes through Meltdown and L1TF: vendor manuals describe instructions in isolation, but the fault delivery semantics around them are where the privilege boundary actually lives.

Key Takeaway: SYSRET on Intel faults in ring 0 with the user's stack, so the kernel must validate the return address is canonical before issuing the instruction — or fall back to IRET.
