ArXiv Paper Digest: Automating Low-Risk Code Review at Meta: RADAR, Risk Calibration, and Review Efficiency

2026-05-29

Authors: Chris Adams, Arjun Singh Banga, Parveen Bansal, Souvik Bhattacharya

Meta has a problem that's becoming everyone's problem: AI is writing code faster than humans can review it. The paper opens with a striking stat — at Meta, the lines of code per human-landed change grew 105.9% year over year, and per-developer change volume jumped 51%. Agentic AI accounts for over 80% of that growth. Meanwhile, the share of changes getting timely human review is falling. Reviewers are drowning.

The authors' bet: not every code change deserves the same scrutiny. A one-line config tweak is not a database migration. So they built RADAR — a system that predicts how risky a proposed change is, and then automatically approves the low-risk stuff so humans can focus on what actually matters.

Here's the interesting part — the calibration question. It's easy to build a model that says "this looks safe." It's much harder to build one whose confidence you can trust. If RADAR says a change has a 2% chance of causing a bug, that needs to actually mean 2% in the real world, not 20%. The paper walks through how they tune the risk thresholds so the system is conservative enough to be safe but liberal enough to actually save reviewer time.
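To make the calibration idea concrete, here is a minimal sketch and not the paper's method. It assumes you have a history of landed changes, each with a predicted risk score and a record of whether it later caused a bug. The names `Change`, `calibration_bins`, and `pick_threshold` are hypothetical and chosen only for illustration. The first function compares predicted risk to observed bug rate per bucket. The second finds the largest cutoff whose auto-approved set stays under a bug-rate budget you pick.

```python
from dataclasses import dataclass


@dataclass
class Change:
    risk: float        # predicted probability of causing a bug (hypothetical field)
    caused_bug: bool   # outcome observed after landing (hypothetical field)


def calibration_bins(changes, n_bins=5):
    """Bucket changes by predicted risk; return (mean predicted, observed rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for c in changes:
        idx = min(int(c.risk * n_bins), n_bins - 1)  # risk == 1.0 goes in the last bin
        bins[idx].append(c)
    rows = []
    for b in bins:
        if b:  # skip empty buckets
            mean_pred = sum(c.risk for c in b) / len(b)
            observed = sum(c.caused_bug for c in b) / len(b)
            rows.append((mean_pred, observed, len(b)))
    return rows


def pick_threshold(changes, max_bug_rate):
    """Largest cutoff t such that changes with risk <= t have bug rate <= max_bug_rate.

    Returns None if no cutoff satisfies the budget.
    """
    ordered = sorted(changes, key=lambda c: c.risk)
    best, bugs = None, 0
    for i, c in enumerate(ordered, start=1):
        bugs += c.caused_bug
        # Only evaluate at the end of a run of equal scores, so ties are never split.
        last_of_tie = i == len(ordered) or ordered[i].risk != c.risk
        if last_of_tie and bugs / i <= max_bug_rate:
            best = c.risk
    return best


history = [
    Change(0.01, False), Change(0.02, False), Change(0.03, False),
    Change(0.04, True), Change(0.50, True), Change(0.90, True),
]
threshold = pick_threshold(history, max_bug_rate=0.10)
```

In a well-calibrated model, the first two numbers in each row from `calibration_bins` sit close together. A tighter `max_bug_rate` pushes the cutoff down, which is the safety-versus-savings trade-off described above: fewer changes get auto-approved, but fewer bad ones slip through.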

Three questions structure the work:

Feasibility: Can you reliably tell low-risk changes apart from risky ones at all?
Calibration: Can the model's confidence scores be trusted enough to act on automatically?
Impact: When deployed to real engineers, does it actually reduce review latency without introducing bugs?

This matters because it's one of the first public looks at how a hyperscaler is restructuring code review around AI-generated code at scale. Most discussion of AI coding assistants focuses on the generation side: how good is the code Copilot writes? But the bottleneck is shifting downstream. If agentic AI accounts for more than 80% of the growth in code volume, a review process designed for human-paced output simply breaks. You either automate parts of review, or you ship unreviewed code, or you slow down, and "slow down" isn't on the table for most companies.

The honest framing here is also refreshing: the paper doesn't claim to automate all review, just the low-risk slice. That's the right place to start. It preserves human judgment for changes that warrant it while clawing back capacity from the rubber-stamp queue.

Why it matters: As AI-written code outpaces human review capacity, the next frontier isn't better code generation — it's risk-calibrated automation of the review process itself.
