2026-05-20
HN Discussion: 1 point, 0 comments
This is exactly the kind of post that should be lighting up the HN front page, and instead it got buried with a single point and zero comments. GitHub's code search is one of the most quietly impressive pieces of search infrastructure in production anywhere: it indexes over 100 terabytes of source code across tens of millions of repositories, supports regex queries, respects symbol semantics, and returns results fast enough to feel like grep on your laptop. That isn't something you build with Elasticsearch and a prayer.
The post almost certainly digs into Blackbird, GitHub's custom Rust-based search engine that replaced their earlier Elasticsearch-backed system. The interesting bits a technical reader would want to see covered:
The reason this matters beyond GitHub: code search is one of the few large-scale search workloads where the corpus has structure the search engine can exploit. Every team building developer tools, from AI coding assistants doing retrieval over codebases to internal "Sourcegraph for our monorepo" tools, is solving a smaller-scale version of this problem. GitHub published how they actually did it at scale, in detail, and almost nobody noticed.
Posts like this also age well. Architecture writeups from Google, Dropbox, and Facebook from a decade ago are still cited; this is in that same lineage.
