Hacker News Deep Cuts: Human Judgment as a Specification

2026-06-09

Link: https://blog.brownplt.org/2026/06/09/pick.html

The Brown University Programming Languages group publishes some of the most thoughtful work on how humans actually interact with formal systems — from Pyret to research on notional machines and the cognitive load of type errors. A new post from them about human judgment as a specification is exactly the kind of quietly important idea that gets buried under the daily churn of model releases and framework drama.

The framing is provocative: in classical software engineering, a specification is a precise, machine-checkable statement of what a program should do. But increasingly — especially in the LLM era — we're writing systems whose correctness criterion is human judgment. A summarizer is "correct" when humans find the summary good. A code assistant is "correct" when the developer accepts the suggestion. There's no oracle, no reference implementation, no decidable predicate.

Why this matters for a technical audience:

Testing strategy collapses. If your spec is "a human would approve," then property-based testing, fuzzing, and formal verification all need rethinking. What does a unit test look like when the assertion is fuzzy?
Regression detection becomes statistical. You can't say "build N+1 broke feature X" with certainty — only that approval rates shifted, possibly within noise.
The PL community has tools to offer. Brown PLT in particular has spent decades thinking about how to make implicit human expectations explicit. Their angle is likely more rigorous than the typical "vibes-based evals" discourse.
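To make the regression-detection point concrete, here is a minimal sketch of how "approval rates shifted, possibly within noise" could be checked. It is not taken from the Brown PLT post. It assumes each build is scored by independent human approve/reject judgments and uses a standard two-proportion z-test. The function name approval_shift and the counts in the usage note are hypothetical.

```python
import math


def approval_shift(approved_a, total_a, approved_b, total_b):
    """Two-sided two-proportion z-test on approval rates.

    Returns (difference in approval rate, z statistic, p-value).
    Assumes each judgment is an independent approve/reject decision,
    which real rating data may well violate.
    """
    rate_a = approved_a / total_a
    rate_b = approved_b / total_b
    # Pooled rate under the null hypothesis that both builds are equally good.
    pooled = (approved_a + approved_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        # Every judgment was identical across both builds: no evidence of a shift.
        return rate_b - rate_a, 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_b - rate_a, z, p_value
```

With hypothetical counts of 160 approvals out of 200 for build N and 150 out of 200 for build N+1, the five-point drop does not clear a conventional 0.05 threshold, so the honest report is "possibly noise" rather than "build N+1 broke something." A drop to 120 out of 200 would clear it easily.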

The URL slug (pick.html) hints that the post may be framed around a concrete example, perhaps a "pick the best option" interaction, which is exactly the abstraction many LLM-as-judge eval pipelines rest on. If so, it likely interrogates whether that abstraction is sound: when humans "pick," are they specifying something stable, or generating noise that we're laundering through statistics?
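One way to probe the "stable or noise" question, offered as an illustration rather than anything the post is known to contain, is to measure how often two raters (or the same rater on two occasions) pick the same option, corrected for chance agreement. The sketch below computes Cohen's kappa. The function name and the example picks are hypothetical.

```python
from collections import Counter


def cohens_kappa(picks_a, picks_b):
    """Chance-corrected agreement between two sequences of picks.

    1.0 means perfect agreement, 0.0 means agreement no better than
    chance given how often each rater chose each option.
    """
    if len(picks_a) != len(picks_b) or not picks_a:
        raise ValueError("need two non-empty pick lists of equal length")
    n = len(picks_a)
    # Fraction of items on which both raters picked the same option.
    observed = sum(a == b for a, b in zip(picks_a, picks_b)) / n
    freq_a = Counter(picks_a)
    freq_b = Counter(picks_b)
    # Agreement expected if each rater picked independently at their own base rates.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters always chose the same single option; kappa is undefined,
        # so treat it as full agreement by convention in this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)
```

A kappa near zero on repeated picks would suggest the "specification" being aggregated is closer to noise than to a stable preference, whatever the headline approval rate says.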

This is the kind of post that will look obvious ten years from now (of course we needed a theory of specification for systems with no formal spec), but right now it is a submission with a single upvote, linking to a research group's blog. The PL theory crowd has been quietly building the conceptual scaffolding the AI engineering world is going to need, and Brown PLT is one of the few groups doing it without hype.

Why it deserves more upvotes: A rigorous PL-theory take on the hardest open problem in AI engineering — what does "correct" mean when the spec lives in someone's head?
