2026-05-16
You inherit a 4GB CSV. Or a SQLite file from a vendor. Or a Parquet dump nobody documented. Or a JSON Lines stream from an API you've never used. The default reaction is to write a one-off script, fight pandas in a notebook, or open it in Excel and watch it crash. VisiData (vd) by Saul Pwanson is a single terminal program that opens all of them, and dozens more formats, with the same keystrokes, in a spreadsheet-like grid.
It uses vim-style keybindings, is written in Python, and runs over SSH on a server with no display. Install: pip install visidata or brew install visidata. Some formats need an extra Python package; Parquet, for example, needs pyarrow.
Open anything:
vd access.log.gz # auto-decompresses
vd users.csv orders.parquet # multiple sheets, switch with Shift-S
vd data.db # SQLite file: browses tables, dive in with Enter
vd https://example.com/x.csv # streams over HTTP
vd -f json events.ndjson # force a parser
Now the killer keystrokes. Cursor is on a column:
Shift-F: frequency table of the current column. Cardinality, counts, percentages. One keystroke replaces sort | uniq -c | sort -rn.
[ / ]: sort ascending / descending.
|: select rows matching a regex; " opens a new sheet of just those rows.
=: add a derived column from a Python expression. price * qty, urlparse(url).netloc, whatever.
Shift-I: describe sheet. Per-column nulls, distinct values, min/max/mean.
g / z prefixes: g broadens a command (all rows, all columns, or all selected rows, depending on the command); z narrows it (often to the current cell or row).
Ctrl-S: save to any supported format, chosen by the file extension you type. Open CSV, save Parquet. Open SQLite, save JSON.
The feature that justifies installing it for one-off use: everything you do is logged as a replayable script. Press Shift-D to view the command log, save it as a .vd file (Ctrl-D saves the log directly), then later:
vd -p clean_orders.vd -b -o orders_cleaned.parquet raw_orders.csv
That runs the exact session you recorded — every sort, filter, derived column — in batch mode (-b) with no TUI, output to Parquet. Your "I explored the data interactively" turns into a reproducible pipeline without ever writing a script. This is the thing pandas notebooks pretend to be.
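If the replay needs to run from a scheduled job or a larger Python program, a thin wrapper is usually enough. The sketch below only assembles the flags already shown (-p, -b, -o). The function names replay_command and run_replay are made up for illustration, and it assumes vd is on PATH.

```python
import subprocess
from pathlib import Path


def replay_command(cmdlog: str, source: str, output: str) -> list[str]:
    """Build the argv for a headless replay: vd -p LOG -b -o OUT SOURCE."""
    return ["vd", "-p", cmdlog, "-b", "-o", output, source]


def run_replay(cmdlog: str, source: str, output: str) -> Path:
    """Run the replay; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(replay_command(cmdlog, source, output), check=True)
    return Path(output)
```

Calling run_replay("clean_orders.vd", "raw_orders.csv", "orders_cleaned.parquet") would issue the same command as above. Keeping the argv builder separate makes it possible to log or test the command without launching vd.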
A real workflow: investigating a slow query log shipped as JSON Lines.
vd slow.ndjson
# columns appear: query, duration_ms, db, user, ts
# cursor on duration_ms, press ] to sort desc
# Shift-F on db → frequency table, see which DB dominates
# = on query column: bool(re.search(r'\bJOIN\b.*\bJOIN\b', query, re.I))
# | on that new column, regex True, to select queries with multiple joins
# " to open just those rows as a new sheet
# Ctrl-S → save as bad_queries.csv
# Shift-D → save the .vd, hand it to a coworker
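To make clear what each keystroke in that session computes, here is a rough plain-Python equivalent. The four records are invented sample data shaped like the columns above, and the regex is an assumption chosen to tolerate ON clauses between the two JOINs. It is a sketch of the logic, not of how VisiData implements it.

```python
import re
from collections import Counter

# Hypothetical records with the same columns as the walkthrough.
rows = [
    {"query": "SELECT * FROM a JOIN b ON a.id = b.a_id JOIN c ON b.id = c.b_id",
     "duration_ms": 4200, "db": "orders"},
    {"query": "SELECT * FROM users WHERE id = 7",
     "duration_ms": 35, "db": "auth"},
    {"query": "SELECT * FROM a JOIN b ON a.id = b.a_id",
     "duration_ms": 900, "db": "orders"},
    {"query": "select 1 from x join y on x.k = y.k join z on y.k = z.k",
     "duration_ms": 1500, "db": "reports"},
]

# ] on duration_ms: sort descending.
by_duration = sorted(rows, key=lambda r: r["duration_ms"], reverse=True)

# Shift-F on db: count per value, most common first, with percentages.
counts = Counter(r["db"] for r in rows)
freq = [(db, n, 100 * n / len(rows)) for db, n in counts.most_common()]

# = then | then ": derive a boolean per row, keep only the rows where it is true.
MULTI_JOIN = re.compile(r"\bJOIN\b.*\bJOIN\b", re.IGNORECASE | re.DOTALL)
multi_join = [r for r in by_duration if MULTI_JOIN.search(r["query"])]
```

Each step is a line or two here, but every new question means another edit and rerun, which is the loop the keystrokes remove.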
Compare to the alternatives: jq is great until you want to sort by one field and group by another. xsv and Miller are fast but batch-only — you can't see the data. Pandas in Jupyter requires writing code for every question. VisiData closes the loop: explore by keystroke, freeze the exploration as a script when you find something worth keeping.
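Bonus: replaying a log without -b, as in vd --play tutorial.vd, runs the recorded session on screen so you can watch each step happen. Inside the app, Ctrl-H opens the man page and z Ctrl-H opens a searchable sheet of the commands and keybindings available on the current sheet.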
