Static binary analysis is a key tool to assess the security of thirdparty binaries and legacy programs. Most forms of binary analysis rely on the availability of two key pieces of information: the program's control-flow graph and function boundaries. However, current tools struggle to provide accurate and precise results, in particular when dealing with hand-written assembly functions and non-trivial control-flow transfer instructions, such as tail calls. In addition, most of the existing solutions are ad-hoc, rely on handcoded heuristics, and are tied to a specific architecture. In this paper we highlight the challenges faced by an architecture agnostic static binary analysis framework to provide accurate information about a program's CFG and function boundaries without employing debugging information or symbols.We propose a set of analyses to address predicate instructions, noreturn functions, tail calls, and context-dependent CFG. REV.NG, our binary analysis framework based on QEMU and LLVM, handles all the 17 architectures supported by QEMU and produces a compilable LLVM IR. We implement our described analyses on top of LLVM IR. In an extensive evaluation, we test our tool on binaries compiled for MIPS, ARM, and x86-64 using GCC and clang and compare them to the industry's state of the art tool, IDA Pro, and two well-known academic tools, BAP/ByteWeight and angr. In all cases, the quality of the CFG and function boundaries produced by REV.NG is comparable to or improves over the alternatives.

REV.NG: A unified binary analysis framework to recover CFGs and function boundaries

DI FEDERICO, ALESSANDRO;AGOSTA, GIOVANNI
2017-01-01

Abstract

Static binary analysis is a key tool to assess the security of thirdparty binaries and legacy programs. Most forms of binary analysis rely on the availability of two key pieces of information: the program's control-flow graph and function boundaries. However, current tools struggle to provide accurate and precise results, in particular when dealing with hand-written assembly functions and non-trivial control-flow transfer instructions, such as tail calls. In addition, most of the existing solutions are ad-hoc, rely on handcoded heuristics, and are tied to a specific architecture. In this paper we highlight the challenges faced by an architecture agnostic static binary analysis framework to provide accurate information about a program's CFG and function boundaries without employing debugging information or symbols.We propose a set of analyses to address predicate instructions, noreturn functions, tail calls, and context-dependent CFG. REV.NG, our binary analysis framework based on QEMU and LLVM, handles all the 17 architectures supported by QEMU and produces a compilable LLVM IR. We implement our described analyses on top of LLVM IR. In an extensive evaluation, we test our tool on binaries compiled for MIPS, ARM, and x86-64 using GCC and clang and compare them to the industry's state of the art tool, IDA Pro, and two well-known academic tools, BAP/ByteWeight and angr. In all cases, the quality of the CFG and function boundaries produced by REV.NG is comparable to or improves over the alternatives.
2017
ACM International Conference Proceeding Series
9781450352338
Control-flow graph recovery; Function boundary detection; Static binary analysis; Human-Computer Interaction; Computer Networks and Communications; 1707; Software
File in questo prodotto:
File Dimensione Formato  
p131-difederico.pdf

Accesso riservato

Descrizione: Main article
: Publisher’s version
Dimensione 216.26 kB
Formato Adobe PDF
216.26 kB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1026811
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 36
  • ???jsp.display-item.citation.isi??? 25
social impact