RFC Deep Dive: RFC 4103: RTP Payload for Text Conversation

2026-05-01

Published: 2005

Authors: Gunnar Hellström, Paul Jones

You type a message in a chat app and hit send. Your recipient reads the whole thing at once. But imagine a phone call where every keystroke you type appears on the other person's screen in real time, character by character, as you think and compose — no send button required. That is Real-Time Text (RTT), and RFC 4103 is the protocol that makes it work over IP networks.

The problem it solves: For deaf and hard-of-hearing people, text-based communication over telephone networks has been essential since the 1960s, when TTY/TDD devices first appeared. These devices transmitted characters one at a time over analog phone lines, enabling conversational text with the same immediacy as voice. When telephony moved to IP (VoIP, SIP, IMS), there was no equivalent mechanism. Store-and-forward messaging like SMS or IM wasn't a substitute — it broke the real-time, conversational flow that TTY users depended on. RFC 4103 bridged that gap.

How it works: The RFC defines an RTP payload format for carrying ITU-T T.140 text — a UTF-8-based protocol for multimedia conversational text. The key design decisions are:

RTP as transport: Rather than inventing a new transport, the authors piggybacked on RTP, the same protocol that carries voice and video in VoIP calls. This was a deliberate choice: RTT can be negotiated alongside audio and video in SIP sessions using standard SDP, be managed by the same signaling infrastructure, and benefit from RTP's existing timestamp and sequencing mechanisms.
Redundancy for reliability: Unlike voice, where a dropped packet means a brief audio glitch, a dropped text packet means missing characters, which can change the meaning entirely. RFC 4103 addresses this with a redundancy scheme (per RFC 2198): each packet carries not only the new text but also copies of the text from the most recent earlier transmissions, with two redundant generations as the recommended default. If a packet is lost, the next one fills in the gap. No retransmission delay, no TCP-style head-of-line blocking.
300ms buffering interval: Characters are buffered for up to 300 milliseconds before transmission. This batches a few keystrokes per packet to reduce overhead while keeping latency perceptually "real-time." The spec explicitly warns against buffering longer — the point is conversational immediacy.
T.140 features: The payload supports not just character insertion but also backspace, allowing the receiver to see corrections as they happen. It's UTF-8 native, so it handles any language from the start — no ASCII-only limitations like old TTY.
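
The redundancy and backspace points above can be made concrete with a minimal Python sketch. It packs T.140 text into an RFC 2198 redundant payload (4-byte headers for redundant blocks, a 1-byte header for the primary block, then the data oldest first), unpacks it again, and applies received text to a display string. The function names pack_red, unpack_red, and apply_t140 are hypothetical, payload type 98 is an assumed dynamic value that would normally be negotiated in SDP, and the RTP header itself is left out. Timestamp offsets are in milliseconds because the t140 clock rate is 1000 Hz.

```python
import struct

T140_PT = 98  # assumed dynamic payload type for text/t140; negotiated in SDP in practice


def pack_red(generations, pt=T140_PT):
    """Build an RFC 2198 payload from (timestamp_offset, text) pairs.

    `generations` is ordered oldest first; the last entry is the new
    (primary) text, and its offset is ignored.
    """
    *redundant, (_, primary_text) = generations
    headers, body = bytearray(), bytearray()
    for offset, text in redundant:
        data = text.encode("utf-8")
        if not (0 <= offset < 1 << 14 and len(data) < 1 << 10):
            raise ValueError("offset or block length does not fit the header")
        # F=1 | 7-bit payload type | 14-bit timestamp offset | 10-bit block length
        word = (1 << 31) | (pt << 24) | (offset << 10) | len(data)
        headers += struct.pack("!I", word)
        body += data
    headers.append(pt)  # final header: F=0 | 7-bit payload type
    body += primary_text.encode("utf-8")
    return bytes(headers + body)


def unpack_red(payload):
    """Split an RFC 2198 payload into (timestamp_offset, text) pairs, oldest first."""
    blocks, pos = [], 0
    while payload[pos] & 0x80:  # F bit set: this is a redundant-block header
        (word,) = struct.unpack_from("!I", payload, pos)
        blocks.append(((word >> 10) & 0x3FFF, word & 0x3FF))
        pos += 4
    pos += 1  # skip the one-byte primary header
    out = []
    for offset, length in blocks:
        out.append((offset, payload[pos:pos + length].decode("utf-8")))
        pos += length
    out.append((0, payload[pos:].decode("utf-8")))  # primary block takes the rest
    return out


def apply_t140(display, text):
    """Apply received text to the display; U+0008 (backspace) erases one character."""
    for ch in text:
        display = display[:-1] if ch == "\b" else display + ch
    return display


# Third packet of a conversation: two redundant generations plus the new text.
payload = pack_red([(600, "He"), (300, "llo"), (0, " w")])
print(len(payload))            # 16
print(unpack_red(payload))     # [(600, 'He'), (300, 'llo'), (0, ' w')]
print(apply_t140("", "helo\blo"))  # hello
```

If the packet that first carried "llo" had been lost, a receiver could still recover it from the redundant block in this packet. A real receiver would also use RTP sequence numbers to decide which generations it has already displayed, which this sketch does not attempt.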

Why it matters today: RFC 4103 is far from a dead spec. It is mandated in multiple regulatory frameworks:

The U.S. FCC requires RTT support in IP-based telecommunications as the successor to TTY, and all major carriers have deployed it.
The European Electronic Communications Code requires real-time text capability in emergency communications.
Text-to-911: RTT is a key technology enabling deaf users to contact emergency services with the same immediacy as a voice call — something SMS-to-911 cannot provide because of its store-and-forward delay.
Android and iOS both implement RTT natively in their phone dialers, built on top of RFC 4103 carried within IMS (IP Multimedia Subsystem) sessions.

If you've ever seen the "RTT" option in your phone's call settings and wondered what it was, this is the answer. It's also a fascinating case study in protocol design for accessibility: the authors had to balance packet efficiency, real-time latency requirements, and reliability in a way that voice codecs never worry about, because every single character matters.

The spec is also a reminder that "real-time" means different things in different contexts. For voice, 20ms packet intervals are standard. For text, 300ms is perfectly fine — human typing speed is the bottleneck, not the network. Matching protocol parameters to human factors rather than raw capability is an underrated design skill.
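
The gap between those two intervals is easy to quantify. The short Python sketch below compares packet rates at 20 ms and 300 ms and estimates how many characters land in each text packet. The typing rate of 10 characters per second is an illustrative assumption for a fast typist, not a figure from the RFC, and the function names are hypothetical.

```python
def packets_per_second(interval_ms):
    """Packets sent per second when one packet goes out every interval_ms."""
    return 1000 / interval_ms


def chars_per_packet(chars_per_second, interval_ms):
    """Average characters collected during one buffering interval."""
    return chars_per_second * interval_ms / 1000


TYPING_RATE = 10  # assumed characters per second, illustrative only

print(packets_per_second(20))              # 50.0 (voice)
print(round(packets_per_second(300), 2))   # 3.33 (text)
print(chars_per_packet(TYPING_RATE, 300))  # 3.0
```

Under these assumptions a text stream sends roughly one fifteenth as many packets as a voice stream, and each packet still carries only a handful of new characters.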

Why it matters: RFC 4103 is the protocol that lets deaf and hard-of-hearing users make real-time text "phone calls" over IP networks. It is mandated by regulators, built into the Android and iOS phone dialers, and a masterclass in designing protocols around human accessibility needs rather than pure engineering metrics.
