Human Judgment as a Specification

2026-06-09

Link: https://blog.brownplt.org/2026/06/09/pick.html

HN Discussion: 1 point, 0 comments

The Brown University Programming Languages group publishes some of the most thoughtful work on how humans actually interact with formal systems — from Pyret to research on notional machines and the cognitive load of type errors. A new post from them about human judgment as a specification is exactly the kind of quietly important idea that gets buried under the daily churn of model releases and framework drama.

The framing is provocative: in classical software engineering, a specification is a precise, machine-checkable statement of what a program should do. But increasingly — especially in the LLM era — we're writing systems whose correctness criterion is human judgment. A summarizer is "correct" when humans find the summary good. A code assistant is "correct" when the developer accepts the suggestion. There's no oracle, no reference implementation, no decidable predicate.
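To make the contrast concrete, here is a minimal sketch in Python. It is not taken from the Brown PLT post. The function names `is_sorted` and `acceptance_rate` are hypothetical, chosen only to illustrate the difference between a decidable predicate and a judgment-based criterion.

```python
from typing import Sequence


def is_sorted(xs: Sequence[int]) -> bool:
    """Classical spec: a decidable predicate over the program's output."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def acceptance_rate(verdicts: Sequence[bool]) -> float:
    """Judgment-based 'spec' (illustrative): the fraction of human raters
    who accepted an output. It yields a statistic, not a yes/no answer."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    return sum(verdicts) / len(verdicts)


# The sorting spec gives the same answer every time it is checked.
sorted_ok = is_sorted([1, 2, 2, 5])

# Hypothetical verdicts from four raters on one summary.
rate = acceptance_rate([True, True, False, True])
```

The first function either holds or fails for a given output. The second depends on who was asked and when, and any threshold for "good enough" is a choice made by whoever runs the evaluation.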

Why this matters for a technical audience:

The URL pattern (pick.html) hints that the post may frame this around a concrete example, perhaps a "pick the best option" interaction, which is the abstraction that many pairwise-preference and LLM-as-judge eval pipelines rest on. If so, it likely interrogates whether that abstraction is sound: when humans "pick," are they specifying something stable, or generating noise that we're laundering through statistics?
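One simple way to probe the "stable or noise" question is a test-retest check: ask the same rater to pick between the same pairs twice and count how often the picks match. The sketch below assumes that setup. The function `retest_agreement` and the session data are made up for illustration and do not come from the post.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # two candidate outputs shown side by side


def retest_agreement(first: Dict[Pair, str], second: Dict[Pair, str]) -> float:
    """Fraction of pairs judged in both sessions where the same option was picked."""
    shared = first.keys() & second.keys()
    if not shared:
        raise ValueError("no pairs were judged in both sessions")
    same = sum(1 for pair in shared if first[pair] == second[pair])
    return same / len(shared)


# Hypothetical picks by one rater in two separate sessions.
session_1 = {("A", "B"): "A", ("A", "C"): "C", ("B", "C"): "C", ("A", "D"): "A"}
session_2 = {("A", "B"): "A", ("A", "C"): "A", ("B", "C"): "C", ("A", "D"): "A"}

agreement = retest_agreement(session_1, session_2)
```

For two-way choices, two independent coin flips would agree about half the time, so an agreement rate near 0.5 suggests the picks carry little stable signal. A high rate does not prove the picks encode a coherent specification, only that they are repeatable.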

This is the kind of post that ten years from now will look obvious (of course we needed a theory of specification for systems with no formal spec), but right now it's a research group's blog post with a single upvote. The PL theory crowd has been quietly building the conceptual scaffolding the AI engineering world is going to need, and Brown PLT is one of the few groups doing it without hype.

Why it deserves more upvotes: A PL-theory take on one of the hardest open problems in AI engineering: what does "correct" mean when the spec lives in someone's head?
