&str Slicing Trap: The Truncation That Panics on Café
2026-06-07
This helper truncates a string for display, appending an ellipsis if it was too long. It passes every test in the suite — until a user named Привет signs up.
fn ellipsis(s: &str, max_len: usize) -> String {
    // s.len() is the length in bytes, not in characters.
    if s.len() <= max_len {
        s.to_string()
    } else {
        // Range indexing uses byte offsets and panics off a char boundary.
        format!("{}…", &s[..max_len])
    }
}

fn main() {
    println!("{}", ellipsis("Hello, world!", 8)); // "Hello, w…"
    println!("{}", ellipsis("café society", 5)); // "café…" ('é' ends exactly at byte 5)
    println!("{}", ellipsis("Привет, мир!", 5)); // boom
}
The third call panics:
thread 'main' panicked at 'byte index 5 is not a char boundary;
it is inside 'и' (bytes 4..6) of `Привет, мир!`'
In Rust, a &str is a view into UTF-8 encoded bytes, and both s.len() and the range index &s[..n] count bytes, not characters. ASCII hides this: every ASCII character is exactly one byte, so &"Hello"[..3] is "Hel" and life is good.
The moment a multi-byte codepoint shows up, the abstraction leaks. In UTF-8:
'é' = 2 bytes (C3 A9)
'П', 'р', 'и', 'в', 'е', 'т' = 2 bytes each
So "Привет, мир!" is 21 bytes long, even though it has only 12 characters. When you ask for &s[..5], Rust lands mid-codepoint, between the two bytes of 'и' (bytes 4..6), and refuses to hand you broken UTF-8. It panics rather than silently produce an invalid &str. That guarantee is good; getting bitten by it in production is not.
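To make the byte arithmetic concrete, here is a small sketch using only the standard library. The helper names byte_and_char_len and char_starts are hypothetical, chosen for illustration.

```rust
/// Returns (length in bytes, number of chars) for a string.
/// `byte_and_char_len` is an illustrative name, not a std API.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Lists every char together with the byte offset where it starts.
/// These offsets (plus s.len()) are the only valid slice positions.
fn char_starts(s: &str) -> Vec<(usize, char)> {
    s.char_indices().collect()
}
```

For "Привет, мир!" the first helper reports 21 bytes and 12 chars, and the second shows 'и' starting at byte 4, which is why byte 5 is not a legal place to cut.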
Worse, this bug is invisible in unit tests that only feed ASCII. It hides until a real user's name, a customer's address, or a translated UI string trips it — usually in front of an audience.
Walk back to the nearest valid char boundary before slicing:
fn ellipsis(s: &str, max_len: usize) -> String {
    if s.len() <= max_len {
        return s.to_string();
    }
    let mut end = max_len;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}…", &s[..end])
}
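Since ASCII-only tests cannot catch this class of bug, one option is to sweep every byte budget over a non-ASCII input. The sketch below repeats the boundary-walking approach under the hypothetical name ellipsis_bytes so the block stands alone; sweep is likewise an illustrative helper.

```rust
/// Byte-budget truncation that backs up to the nearest char boundary.
/// Renamed `ellipsis_bytes` here only to keep this sketch self-contained.
fn ellipsis_bytes(s: &str, max_len: usize) -> String {
    if s.len() <= max_len {
        return s.to_string();
    }
    let mut end = max_len;
    // Offset 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}…", &s[..end])
}

/// Tries every budget from 0 to s.len() + 1. Any panic would abort the sweep;
/// otherwise it checks that the kept text is a prefix of `s` within budget.
fn sweep(s: &str) -> bool {
    (0..=s.len() + 1).all(|max_len| {
        let out = ellipsis_bytes(s, max_len);
        let kept = out.strip_suffix('…').unwrap_or(out.as_str());
        kept.len() <= max_len && s.starts_with(kept)
    })
}
```

Note that the budget covers only the kept prefix: the appended "…" adds three more bytes, so the returned String can be longer than max_len.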
Or, if you actually want a character count rather than a byte count (almost always what UI code wants), iterate codepoints:
fn ellipsis(s: &str, max_chars: usize) -> String {
    let mut it = s.chars();
    let head: String = it.by_ref().take(max_chars).collect();
    if it.next().is_some() {
        format!("{head}…")
    } else {
        head
    }
}
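A possible variant of the character-count approach, sketched here as an assumption rather than a recommendation, uses char_indices() to find the byte offset of the first character past the limit. That offset is always a char boundary, so slicing there is safe, and returning a Cow avoids allocating when nothing is cut. The name ellipsis_chars_cow is hypothetical.

```rust
use std::borrow::Cow;

/// Truncates to at most `max_chars` code points, borrowing when the
/// input already fits. `ellipsis_chars_cow` is an illustrative name.
fn ellipsis_chars_cow(s: &str, max_chars: usize) -> Cow<'_, str> {
    match s.char_indices().nth(max_chars) {
        // A char exists at position `max_chars`, so the string is too long.
        // `idx` is where that char starts, which is a valid boundary.
        Some((idx, _)) => Cow::Owned(format!("{}…", &s[..idx])),
        // Fewer than or exactly `max_chars` chars: return the input as-is.
        None => Cow::Borrowed(s),
    }
}
```

Whether the saved allocation matters depends on the caller; for most display code the plain String version is just as reasonable.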
One more layer of nuance: even chars() isn't the whole story. A user-perceived character — a grapheme cluster — can be several codepoints (e.g., "é" as 'e' + U+0301, or family emoji built from ZWJ sequences). Splitting between them won't panic, but it will render garbage. For correct human-facing truncation, reach for the unicode-segmentation crate and iterate graphemes(true).
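The grapheme problem can be reproduced with the standard library alone. This sketch does not use unicode-segmentation; it only shows where code-point truncation goes wrong. take_chars and the two constants are hypothetical names for illustration.

```rust
/// Keeps the first `max_chars` code points, as a chars()-based truncation does.
fn take_chars(s: &str, max_chars: usize) -> String {
    s.chars().take(max_chars).collect()
}

/// "café" spelled with a plain 'e' followed by U+0301 (combining acute accent):
/// four user-perceived characters, but five code points.
const DECOMPOSED_CAFE: &str = "cafe\u{301}";

/// Man + ZWJ + woman + ZWJ + girl: one family emoji made of five code points.
const FAMILY: &str = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}";
```

Taking four code points from DECOMPOSED_CAFE yields "cafe" with the accent silently dropped, and taking one from FAMILY leaves a lone man emoji. Nothing panics, but the result is not what the user typed.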
The deeper lesson: &str looks like a string and indexes like an array, but it is neither. It's a window onto bytes that are guaranteed to be valid UTF-8, and range indexing upholds that guarantee at runtime by panicking. Treat any indexing into &str the way you'd treat a raw pointer: fine when you've proven the offset is valid, dangerous when you've assumed it.
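When the offset has not been proven valid, the standard library's str::get offers a checked alternative: it returns None instead of panicking. A minimal sketch, with prefix_checked as a hypothetical wrapper name:

```rust
/// Returns the first `n` bytes of `s` as a &str, or None when `n` is
/// out of range or falls inside a multi-byte code point.
fn prefix_checked(s: &str, n: usize) -> Option<&str> {
    s.get(..n)
}
```

This moves the decision to the caller, who can fall back to a shorter prefix or to the whole string rather than crash.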
str.len() and &s[..n] count bytes, not characters — slice on a non-boundary inside a multi-byte codepoint and you panic, so use is_char_boundary, chars(), or grapheme iteration for any string a human will type.
