RFC 4733: RTP Payload for DTMF Digits, Telephony Tones, and Telephony Signals

2026-06-09

RFC: RFC 4733

Published: December 2006

Authors: H. Schulzrinne, T. Taylor

Every time you've punched digits into a phone tree from your cell phone or a softphone — "press 1 for billing" — there's a non-trivial chance RFC 4733 carried those digits across the network. It is the unsung workhorse of modern telephony signaling, and it exists because the obvious approach (just send the tones as audio) is catastrophically broken once codecs get involved.

The problem. DTMF tones are pairs of sine waves (e.g., "5" is 770 Hz + 1336 Hz). On a clean PSTN circuit running G.711, they survive intact and the receiving switch can decode them with a Goertzel filter. But VoIP increasingly uses low-bitrate codecs (G.729, Opus at low rates, AMR) that are tuned for speech. Their speech models distort pure tones, often beyond what a DTMF detector can recognize. Worse, packet loss concealment, comfort noise generation, and silence suppression can corrupt or drop a tone burst entirely. An IVR that misses a digit, or hears one twice, because a packet got lost is a customer support disaster.
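To make the in-band approach concrete, here is a minimal Python sketch of synthesizing a DTMF tone and decoding it with the Goertzel recurrence. The function names, the 8 kHz sample rate, and the 50 ms tone length are illustrative assumptions, and the detector simply picks the strongest row and column frequency; real detectors add energy thresholds, twist checks, and timing rules.

```python
import math

# Standard DTMF row (low) and column (high) frequencies in Hz.
LOW_FREQS = (697, 770, 852, 941)
HIGH_FREQS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(digit):
    """Return the (low, high) frequency pair for one keypad digit."""
    for row, keys in enumerate(KEYPAD):
        if len(digit) == 1 and digit in keys:
            return LOW_FREQS[row], HIGH_FREQS[keys.index(digit)]
    raise ValueError(f"not a DTMF digit: {digit!r}")


def dtmf_tone(digit, sample_rate=8000, duration_s=0.05):
    """Synthesize the two-sine tone for one digit as a list of floats."""
    f_low, f_high = dtmf_frequencies(digit)
    n = int(sample_rate * duration_s)
    return [
        0.5 * math.sin(2 * math.pi * f_low * i / sample_rate)
        + 0.5 * math.sin(2 * math.pi * f_high * i / sample_rate)
        for i in range(n)
    ]


def goertzel_power(samples, freq, sample_rate=8000):
    """Estimate signal power at `freq` with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_digit(samples, sample_rate=8000):
    """Pick the strongest row and column frequency (no thresholds)."""
    row = max(range(4), key=lambda r: goertzel_power(samples, LOW_FREQS[r], sample_rate))
    col = max(range(4), key=lambda c: goertzel_power(samples, HIGH_FREQS[c], sample_rate))
    return KEYPAD[row][col]
```

On clean samples, `detect_digit(dtmf_tone("5"))` recovers the digit. The point of the paragraph above is that this stops being reliable once a speech codec or a concealment algorithm has reshaped the waveform between the two calls.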

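The design. RFC 4733 (which obsoleted RFC 2833) takes the tones out of the audio while keeping them in the RTP stream. Instead of synthesizing a tone in the audio stream, the endpoint sends a tiny RTP packet using a separate telephone-event payload type, negotiated in SDP. The payload is four bytes: an 8-bit event code, an end (E) bit, a reserved (R) bit, a 6-bit volume, and a 16-bit duration.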
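A minimal Python sketch of packing and parsing that four-byte payload, assuming the RFC 4733 field order (event, E bit, R bit, volume, duration, all in network byte order). The class name `TelephoneEvent` and its methods are hypothetical, chosen for illustration; RTP header handling is left out.

```python
import struct
from dataclasses import dataclass

# Event codes 0-15 map to the DTMF keypad in this order.
DTMF_EVENTS = "0123456789*#ABCD"


@dataclass(frozen=True)
class TelephoneEvent:
    event: int      # 8-bit event code
    end: bool       # E bit: set on the packet(s) that close the event
    volume: int     # 6-bit tone level, 0..63, read as -dBm0
    duration: int   # 16-bit duration in RTP timestamp units

    def pack(self) -> bytes:
        """Serialize to the 4-byte payload (R bit is always sent as 0)."""
        if not (0 <= self.event <= 255 and 0 <= self.volume <= 63
                and 0 <= self.duration <= 0xFFFF):
            raise ValueError("field out of range")
        flags = (0x80 if self.end else 0) | self.volume
        return struct.pack("!BBH", self.event, flags, self.duration)

    @classmethod
    def unpack(cls, payload: bytes) -> "TelephoneEvent":
        """Parse the first 4 bytes of an RTP payload; the R bit is ignored."""
        event, flags, duration = struct.unpack("!BBH", payload[:4])
        return cls(event, bool(flags & 0x80), flags & 0x3F, duration)
```

For example, the closing packet for a "5" held for 100 ms at an 8 kHz clock (800 timestamp units) at -10 dBm0 would be `TelephoneEvent(5, True, 10, 800).pack()`, which yields the bytes `05 8a 03 20`.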

Reliability via redundancy. Here's the clever part. Because RTP usually runs on UDP and a lost digit is unacceptable, the RFC recommends sending the final packet of each event (the one with the E bit set) three times in total. The receiver deduplicates by inspecting the RTP timestamp: every packet for a given event carries the timestamp of the event's start, so multiple packets describing "the 5 that began at timestamp T" collapse into one logical key press. The duration field grows in each interim packet and stays fixed across the repeated end packets, so a receiver can also track long key holds as they happen.
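The send and deduplicate logic can be sketched in Python as follows. This is a simplified illustration, not a complete implementation: the names `event_packets` and `EventReceiver` are hypothetical, the 160-unit step assumes 20 ms packets at an 8 kHz clock, and the receiver ignores cases a real one must handle (all end packets lost, timestamp wraparound, bounding the memory used for seen keys).

```python
def event_packets(start_ts, total_duration, step=160, end_repeats=3):
    """Yield (timestamp, end_bit, duration) tuples for one event.

    Interim packets report a growing duration. The final packet is
    yielded `end_repeats` times with the E bit set and the same duration.
    Every packet carries the timestamp of the event's start.
    """
    duration = step
    while duration < total_duration:
        yield (start_ts, False, duration)
        duration += step
    for _ in range(end_repeats):
        yield (start_ts, True, total_duration)


class EventReceiver:
    """Collapse packets that share (ssrc, timestamp) into one key press."""

    def __init__(self):
        self.completed = []      # finished events as (event, duration)
        self._seen_end = set()   # (ssrc, timestamp) keys already closed

    def on_packet(self, ssrc, timestamp, event, end_bit, duration):
        key = (ssrc, timestamp)
        if key in self._seen_end:
            return  # duplicate of an end packet we already handled
        if end_bit:
            self._seen_end.add(key)
            self.completed.append((event, duration))
```

With these defaults, a 100 ms key press (800 units) produces four interim packets and three end packets. If any one of the three end packets arrives, the receiver records exactly one completed event.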

Why it survived. The alternatives all lose. SIP INFO messages (another way to send digits) travel over the signaling path, which may be a different route with different latency — you get key presses arriving out of order with the audio they were meant to accompany. In-band tones, as noted, die in low-bitrate codecs. RFC 4733 events are multiplexed into the RTP stream itself, sharing sequence numbers and timestamps with the audio, so ordering and timing are preserved relative to speech.

Quirks worth knowing. Interop bugs around RFC 4733 are legendary. SBCs (Session Border Controllers) routinely translate between in-band DTMF, RFC 4733, and SIP INFO because no two carriers agree. The duration field is in RTP timestamp units, not milliseconds: at an 8 kHz clock rate, 8000 units = 1 second, which trips up implementers who hardcode 1000. And the telephone-event registry goes well beyond keypad digits. RFC 4733 itself defines only the DTMF events 0 through 15, but companion documents add more: RFC 4734 covers modem, fax, and text telephony signals, and RFC 5244 covers channel-oriented trunk signaling such as MF tones and wink, which makes the payload format relevant to legacy trunk gateway work as well.
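The unit conversion is simple arithmetic, but it is worth writing down because the clock rate comes from the negotiated SDP (for example `telephone-event/8000`) rather than being a fixed constant. A small Python sketch, with hypothetical helper names and an assumed default of 8000 Hz:

```python
def duration_to_ms(duration_units: int, clock_rate: int = 8000) -> float:
    """Convert a duration field value (RTP timestamp units) to milliseconds."""
    return duration_units * 1000 / clock_rate


def ms_to_duration(ms: float, clock_rate: int = 8000) -> int:
    """Convert milliseconds to timestamp units, checking the 16-bit limit."""
    units = round(ms * clock_rate / 1000)
    if not 0 <= units <= 0xFFFF:
        raise ValueError("duration does not fit in the 16-bit field")
    return units
```

At 8 kHz, `duration_to_ms(800)` is 100.0 ms, and the largest value the 16-bit field can hold (65535 units) corresponds to about 8.19 seconds.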

Why it matters: RFC 4733 is the reason your touch-tones still reach the IVR after your voice has been crushed through a 12 kbps codec — a tiny, redundant out-of-band signaling channel quietly making modern VoIP usable.
