RFC 2152: UTF-7, A Mail-Safe Transformation Format of Unicode

2026-05-09

RFC: RFC 2152

Published: 1997

Authors: David Goldsmith, Mark Davis

UTF-7 is the encoding nobody asks for and almost everybody has touched. It's the strange cousin of UTF-8: a way to smuggle Unicode through systems that only tolerate 7-bit ASCII. RFC 2152 supersedes RFC 1642 and finalizes the spec that, despite being deprecated for general use, still lurks in places you can't escape — most notably IMAP folder names.

The problem. In the mid-1990s, vast swaths of the email infrastructure could carry only 7-bit data. SMTP servers, gateways, and especially the IMAP command channel could choke on any byte with the high bit set. UTF-8, designed by Pike and Thompson on a placemat in 1992, solved the file-system case beautifully but was 8-bit by nature. If you wanted to send Japanese, Greek, or Cyrillic through a gateway that stripped the high bit, UTF-8 turned to mush. UTF-7 was the workaround: encode all of Unicode using only the printable ASCII subset.

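The mechanism. UTF-7 splits text into two regions:

Direct characters. ASCII letters, digits, and a small set of punctuation are written as themselves.

Shifted sequences. Everything else is converted to big-endian UTF-16, encoded in base64 without padding, and placed between a + and (normally) a closing -.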

The + character is itself the shift-in escape, so a literal + must be encoded as +-. The result is fully 7-bit, adds only modest overhead to mostly-ASCII text, and (unlike Quoted-Printable or MIME encoded-words) requires no per-line wrapping rules.
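The mechanism can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the names utf7_encode and DIRECT are invented for this example, only RFC 2152's "Set D" characters plus whitespace are passed through unchanged, and every shifted run is closed with an explicit -, which the RFC allows but does not always require.

```python
import base64

# Assumption for this sketch: pass through only RFC 2152 "Set D"
# (letters, digits, ' ( ) , - . / : ?) plus space, tab, CR and LF.
DIRECT = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "'(),-./:? \t\r\n"
)


def utf7_encode(text: str) -> str:
    """Encode text as UTF-7 (hypothetical helper, illustration only)."""
    out = []
    pending = []  # characters waiting to be emitted as one shifted run

    def flush() -> None:
        # Emit the pending run as + <base64 of UTF-16BE, no padding> -
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii").rstrip("=")
            out.append("+" + b64 + "-")
            pending.clear()

    for ch in text:
        if ch == "+":
            flush()
            out.append("+-")  # a literal plus sign is written as +-
        elif ch in DIRECT:
            flush()
            out.append(ch)  # direct characters represent themselves
        else:
            pending.append(ch)  # everything else goes into a shifted run
    flush()
    return "".join(out)
```

With this sketch, utf7_encode("Hi Mom -☺-") returns "Hi Mom -+Jjo--" and utf7_encode("a+b") returns "a+-b". Python's built-in utf-7 codec can decode the output, which is a convenient cross-check.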

Why it's quirky. UTF-7 is the only common Unicode encoding where a single character can legitimately be represented multiple ways. A can be a literal A or shifted as +AEE-. This ambiguity turned out to be a security disaster: in the 2000s, Internet Explorer would auto-detect UTF-7 from page content and treat +ADw-script+AD4- as <script>, slipping past XSS filters that looked only for literal angle brackets. The fallout is why modern browsers refuse to auto-detect UTF-7 and the WHATWG Encoding Standard declines to define it at all.
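The ambiguity is easy to reproduce with Python's built-in utf-7 codec. The sketch below is only an illustration: contains_angle_brackets is a hypothetical stand-in for a naive byte-level filter, and the payload is a made-up example rather than anything taken from a real attack.

```python
def contains_angle_brackets(raw: bytes) -> bool:
    """Hypothetical naive filter: looks only for literal < and > bytes."""
    return b"<" in raw or b">" in raw


# Two different byte sequences that decode to the same character.
literal_a = b"A".decode("utf-7")
shifted_a = b"+AEE-".decode("utf-7")

# A payload with no angle-bracket bytes anywhere in it.
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"
decoded = payload.decode("utf-7")

print(literal_a == shifted_a)            # True
print(contains_angle_brackets(payload))  # False
print(decoded)                           # <script>alert(1)</script>
```

The filter sees nothing suspicious in the bytes, yet anything that later interprets those bytes as UTF-7 sees a script tag. That gap between "what was checked" and "what was decoded" is the whole problem.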

Where it lives today. If you've ever named an IMAP folder Sent Items in Japanese or stored emoji in a mailbox name, you've used modified UTF-7 as defined by RFC 3501 (IMAP4rev1). It's UTF-7 with a few changes: & replaces + as the shift character (since + is common in folder names), the base64 alphabet swaps / for , (because / is a popular IMAP hierarchy delimiter), and printable ASCII must always be written as itself, which removes the multiple-representation problem. So your 受信箱 folder travels the wire as &U9dP4Xux-. Nearly every IMAP client and server still implements this in 2026.
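A minimal Python sketch of the IMAP variant follows. It is an illustration under assumptions rather than a production codec: the function names imap_utf7_encode and imap_utf7_decode are invented here, the decoder assumes well-formed input, and neither function validates anything beyond what the example needs.

```python
import base64


def imap_utf7_encode(name: str) -> str:
    """Encode a mailbox name as modified UTF-7 (hypothetical helper)."""
    out = []
    pending = []  # non-ASCII characters waiting to be emitted together

    def flush() -> None:
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii").rstrip("=")
            # Modified base64: "," replaces "/"; "&" and "-" delimit the run.
            out.append("&" + b64.replace("/", ",") + "-")
            pending.clear()

    for ch in name:
        if ch == "&":
            flush()
            out.append("&-")  # a literal ampersand is written as &-
        elif 0x20 <= ord(ch) <= 0x7E:
            flush()
            out.append(ch)  # printable ASCII always represents itself
        else:
            pending.append(ch)
    flush()
    return "".join(out)


def imap_utf7_decode(wire: str) -> str:
    """Decode modified UTF-7; assumes every & run is closed by a -."""
    out = []
    i = 0
    while i < len(wire):
        if wire[i] != "&":
            out.append(wire[i])
            i += 1
            continue
        end = wire.index("-", i)
        chunk = wire[i + 1:end]
        if not chunk:
            out.append("&")  # "&-" is the escaped ampersand
        else:
            b64 = chunk.replace(",", "/")
            b64 += "=" * (-len(b64) % 4)  # restore the stripped padding
            out.append(base64.b64decode(b64).decode("utf-16-be"))
        i = end + 1
    return "".join(out)
```

Using the example from RFC 3501 itself, imap_utf7_encode("~peter/mail/台北/日本語") returns "~peter/mail/&U,BTFw-/&ZeVnLIqe-", and imap_utf7_decode reverses it. Note the comma in the first encoded segment, which is where standard base64 would have produced a slash.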

Design lesson. RFC 2152 is a monument to transitional protocol design: engineering that exists solely because the substrate (7-bit-only SMTP) couldn't be upgraded fast enough. Once the substrate caught up (8BITMIME, UTF-8 everywhere), the workaround should have died. Instead, it got embedded in IMAP and became immortal. The takeaway for working engineers: every encoding hack you ship as "temporary" will outlive your career, so design it to fail safe, not just to work.

Why it matters: UTF-7 is a deprecated, security-cursed encoding that nonetheless powers every non-ASCII IMAP folder name in the world — a reminder that "temporary" protocol workarounds become permanent infrastructure.
