Rust's &str Slicing Trap: The Truncation That Panics on Café

2026-06-07

This helper truncates a string for display, appending an ellipsis if it was too long. It passes every test in the suite — until a user named Привет signs up.

fn ellipsis(s: &str, max_len: usize) -> String {
    if s.len() <= max_len {
        s.to_string()
    } else {
        format!("{}…", &s[..max_len])
    }
}

fn main() {
    println!("{}", ellipsis("Hello, world!", 8));   // "Hello, w…"
    println!("{}", ellipsis("café society", 5));    // "café…" ("café" is exactly 5 bytes)
    println!("{}", ellipsis("Привет, мир!", 5));    // boom
}

The third call panics:

thread 'main' panicked at 'byte index 5 is not a char boundary;
  it is inside 'и' (bytes 4..6) of `Привет, мир!`'

The Bug

In Rust, a &str is a borrowed view of UTF-8 encoded bytes. Both s.len() and the range slice &s[..n] measure in bytes, not characters. ASCII fools you: every ASCII character is one byte, so &"Hello"[..3] is "Hel" and life is good.

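The moment a multi-byte codepoint shows up, the abstraction leaks. In UTF-8, a codepoint takes between one and four bytes:

- ASCII (U+0000 to U+007F): 1 byte
- Accented Latin, Greek, Cyrillic, Hebrew, Arabic (U+0080 to U+07FF): 2 bytes
- Most CJK and the rest of the Basic Multilingual Plane (U+0800 to U+FFFF): 3 bytes
- Most emoji and other supplementary-plane codepoints (U+10000 to U+10FFFF): 4 bytes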

So "Привет, мир!" is 21 bytes long, not 12. When you ask for &s[..5], Rust lands mid-codepoint, between the two bytes of 'и' (bytes 4..6), and refuses to hand you broken UTF-8. It panics rather than silently produce an invalid &str. That guarantee is good; getting bitten by it in production is not.
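To see the byte layout for yourself, here is a minimal sketch using only the standard library. The helper names byte_layout and print_layout are made up for this illustration; the calls they wrap (char_indices, len_utf8, chars().count()) are ordinary std methods.

```rust
/// For each character in `s`: (starting byte offset, the char, its UTF-8 width).
fn byte_layout(s: &str) -> Vec<(usize, char, usize)> {
    s.char_indices().map(|(i, c)| (i, c, c.len_utf8())).collect()
}

/// Prints the byte length, the codepoint count, and the byte range of each char.
fn print_layout(s: &str) {
    println!("{s:?}: {} bytes, {} chars", s.len(), s.chars().count());
    for (offset, c, width) in byte_layout(s) {
        println!("  bytes {offset}..{}: {c:?}", offset + width);
    }
}
```

Calling print_layout("Привет, мир!") reports 21 bytes and 12 chars, and shows 'и' occupying bytes 4..6, which is why offset 5 is not a valid place to cut.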

Worse, this bug is invisible in unit tests that only feed ASCII. It hides until a real user's name, a customer's address, or a translated UI string trips it — usually in front of an audience.
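One way to flush this out before users do is to sweep every byte budget over a few non-ASCII samples. The sketch below is an assumption about how such a check might look, not part of the original test suite: ellipsis_naive is the helper from the top of the article under a new name, and panicking_cases is a hypothetical harness built on std::panic::catch_unwind.

```rust
use std::panic;

// The naive helper from the top of the article, renamed for this sketch.
fn ellipsis_naive(s: &str, max_len: usize) -> String {
    if s.len() <= max_len {
        s.to_string()
    } else {
        format!("{}…", &s[..max_len])
    }
}

/// Calls `f` with every byte budget from 0 to `s.len()` for each sample and
/// returns the (sample, max_len) pairs that made it panic.
fn panicking_cases(
    f: fn(&str, usize) -> String,
    samples: &[&'static str],
) -> Vec<(&'static str, usize)> {
    let mut failures = Vec::new();
    for &s in samples {
        for max_len in 0..=s.len() {
            // catch_unwind turns a panic into an Err so the sweep can continue.
            if panic::catch_unwind(move || f(s, max_len)).is_err() {
                failures.push((s, max_len));
            }
        }
    }
    failures
}
```

With an ASCII-only sample the returned list is empty. Add "café society" (with a precomposed é) and the sweep reports a panic at max_len 4, the offset that falls inside 'é'. Each caught panic still prints its message to stderr, and the approach assumes the default unwinding panic strategy.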

The Fix

Walk back to the nearest valid char boundary before slicing:

fn ellipsis(s: &str, max_len: usize) -> String {
    if s.len() <= max_len {
        return s.to_string();
    }
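    // Start at the byte budget and back up until we sit on a char boundary.
    // Offset 0 is always a boundary, so the subtraction can never underflow.
    let mut end = max_len;
    while !s.is_char_boundary(end) {
        end -= 1;
    }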
    format!("{}…", &s[..end])
}

Or, if you actually want a character count rather than a byte count (almost always what UI code wants), iterate codepoints:

fn ellipsis(s: &str, max_chars: usize) -> String {
    let mut it = s.chars();
    let head: String = it.by_ref().take(max_chars).collect();
    if it.next().is_some() {
        format!("{head}…")
    } else {
        head
    }
}
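If you would rather not build an intermediate String for the head, a variation is to look up the byte offset of the first character past the limit and slice there. This is a sketch of an alternative, not a replacement for the version above; the name ellipsis_by_offset is hypothetical.

```rust
/// Truncates `s` to at most `max_chars` codepoints, appending an ellipsis
/// only when something was actually cut off.
fn ellipsis_by_offset(s: &str, max_chars: usize) -> String {
    // nth(max_chars) is the first char *past* the limit, if there is one.
    // Its byte offset comes from char_indices, so it is always a char boundary.
    match s.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => format!("{}…", &s[..byte_idx]),
        None => s.to_string(),
    }
}
```

The slice here is safe because the offset comes from char_indices rather than from arithmetic on a byte count. Both versions cut at the same place; this one just does a single pass and a single allocation.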

One more layer of nuance: even chars() isn't the whole story. A user-perceived character — a grapheme cluster — can be several codepoints (e.g., "é" as 'e' + U+0301, or family emoji built from ZWJ sequences). Splitting between them won't panic, but it will render garbage. For correct human-facing truncation, reach for the unicode-segmentation crate and iterate graphemes(true).
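The standard library alone is enough to show the problem, even though fixing it needs the crate. In this sketch, take_chars and the two constants are names invented for the illustration; the strings are written with escapes so the codepoints are unambiguous.

```rust
/// Keeps the first `n` codepoints of `s`. This never panics, but it can cut
/// through the middle of a grapheme cluster.
fn take_chars(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}

/// "café" spelled with a combining accent: 'e' followed by U+0301.
const DECOMPOSED_CAFE: &str = "cafe\u{301}";

/// Man, ZWJ, woman, ZWJ, girl: one family emoji on screen, five codepoints.
const FAMILY: &str = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}";
```

take_chars(DECOMPOSED_CAFE, 4) returns plain "cafe" with the accent silently dropped, and take_chars(FAMILY, 1) returns a lone man emoji instead of the family. Neither call panics, which is exactly why this class of bug is easy to miss.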

The deeper lesson: &str looks like a string and slices like an array, but it is neither. It's a window onto bytes that happen to be valid UTF-8, and range slicing checks that invariant at runtime and panics when an offset breaks it. Treat any byte offset into a &str the way you'd treat a raw pointer: fine when you've proven the offset is valid, dangerous when you've assumed it.
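When an offset has not been proven valid, std also offers a non-panicking route: str::get returns an Option instead of panicking. The wrapper below is a hypothetical sketch (the name try_prefix is an assumption) showing how the failure becomes a value the caller has to handle.

```rust
/// Returns the first `max_len` bytes of `s` as a `&str`, or `None` when
/// `max_len` is past the end or falls inside a multi-byte codepoint.
fn try_prefix(s: &str, max_len: usize) -> Option<&str> {
    // `get` performs the same boundary check as `&s[..max_len]`,
    // but reports failure as `None` rather than a panic.
    s.get(..max_len)
}
```

try_prefix("Привет, мир!", 5) is None, while try_prefix("Привет, мир!", 4) is Some("Пр"). Whether to fall back to a shorter prefix or surface the error is a design choice left to the caller.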

Key Takeaway: In Rust, str.len() and &s[..n] count bytes, not characters — slice on a non-boundary inside a multi-byte codepoint and you panic, so use is_char_boundary, chars(), or grapheme iteration for any string a human will type.
