Veritas: A Semantically Grounded Agentic Framework for Memory Corruption Vulnerability Detection in Binaries

2026-05-15

Authors: Xinran Zheng, Alfredo Pesoli, Marco Valleri, Suman Jana

ArXiv: 2605.15097v1

PDF: Download PDF

Imagine you're a security researcher handed a compiled program — a binary blob of machine code — and asked to find dangerous bugs like buffer overflows or use-after-free errors that attackers love to exploit. You don't have the original source code. All the helpful names, comments, and type information that programmers wrote are gone. What's left looks like assembly soup. This is the everyday reality for people who audit closed-source software, firmware, and malware.

The problem: Traditional tools for finding memory corruption bugs in binaries are either fast-but-shallow (pattern matching that misses real bugs and flags lots of false ones) or deep-but-brittle (symbolic execution that explodes computationally on real programs). Recent attempts to throw a large language model at the problem help with "understanding" what code does, but LLMs hallucinate confidently, miss subtle interprocedural bugs (where the flaw spans multiple functions), and can't tell whether a suspicious-looking pattern is actually reachable at runtime.

What Veritas does: The authors built an agentic framework — meaning a coordinated team of LLM-driven components that each handle a piece of the job — anchored by two ideas that keep it from drifting into fantasy:

The agentic part matters because memory bugs aren't local. A vulnerability often involves one function allocating an object, another using it, and a third freeing it under some weird condition. Veritas's agents propagate information across these boundaries, mimicking how a human reverse engineer hops around in IDA or Ghidra building a mental model.

The key insight: LLMs are good at pattern recognition and bad at proof. The right design isn't "LLM replaces the analyzer" — it's "LLM steers a classical analyzer toward the suspicious parts, and the classical analyzer keeps the LLM honest." Veritas treats the LLM as a strong but unreliable collaborator and pairs it with the kinds of evidence (slices, runtime triggers) that don't lie.

Why it matters: As more critical infrastructure runs on closed-source binaries and firmware, scalable binary vulnerability detection that combines LLM reasoning with grounded program analysis could meaningfully shift the offense-defense balance toward defenders.

All newsletters