RE.PUBLIC@POLIMI pubblicazioni di ricerca del Politecnico di Milano

Static binary analysis techniques are widely used to reconstruct the behavior and discover vulnerabilities in software when source code is not available. To avoid errors due to mis-interpreting data as machine instructions (or vice-versa), disassemblers and static analysis tools must precisely infer the boundaries between code and data. However, this information is often not readily available. Worse, compilers may embed small chunks of data inside the code section. Most state of the art approaches to separate code and data are rooted on recursive traversal disassembly, with severe limitations when dealing with indirect control instructions. We propose ELISA, a technique to separate code from data and ease the static analysis of executable files. ELISA leverages supervised sequential learning techniques to locate the code section(s) boundaries of header-less binary files, and to predict the instruction boundaries inside the identified code section. As a preliminary step, if the Instruction Set Architecture (ISA) of the binary is unknown, ELISA leverages a logistic regression model to identify the correct ISA from the file content. We provide a comprehensive evaluation on a dataset of executables compiled for different ISAs, and we show that our method is capable to identify code sections with a byte-level accuracy (F1 score) ranging from 98.13% to over 99.9% depending on the ISA. Fine-grained separation of code from embedded data on x86, x86-64 and ARM executables is accomplished with an accuracy of over 99.9%.

ELISA: ELiciting ISA of Raw Binaries for Fine-grained Code and Data Separation

DE NICOLAO, PIETRO;Marcello Pogliani;Mario Polino;Michele Carminati;Davide Quarta;Stefano Zanero

2018-01-01

Abstract

Static binary analysis techniques are widely used to reconstruct the behavior and discover vulnerabilities in software when source code is not available. To avoid errors due to mis-interpreting data as machine instructions (or vice-versa), disassemblers and static analysis tools must precisely infer the boundaries between code and data. However, this information is often not readily available. Worse, compilers may embed small chunks of data inside the code section. Most state of the art approaches to separate code and data are rooted on recursive traversal disassembly, with severe limitations when dealing with indirect control instructions. We propose ELISA, a technique to separate code from data and ease the static analysis of executable files. ELISA leverages supervised sequential learning techniques to locate the code section(s) boundaries of header-less binary files, and to predict the instruction boundaries inside the identified code section. As a preliminary step, if the Instruction Set Architecture (ISA) of the binary is unknown, ELISA leverages a logistic regression model to identify the correct ISA from the file content. We provide a comprehensive evaluation on a dataset of executables compiled for different ISAs, and we show that our method is capable to identify code sections with a byte-level accuracy (F1 score) ranging from 98.13% to over 99.9% depending on the ISA. Fine-grained separation of code from embedded data on x86, x86-64 and ARM executables is accomplished with an accuracy of over 99.9%.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno di pubblicazione
	
			2018
		
	Titolo del libro
	
			15th Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA)
		
	Titolo della collana
	
			LECTURE NOTES IN COMPUTER SCIENCE
		
	ISBN (International Standard Book Number)
	
			9783319934105
9783319934112
		
	Appare nelle tipologie:
	
			04.1 Contributo in Atti di convegno

File in questo prodotto:

File	Dimensione	Formato
elisa-paper.pdf accesso aperto : Post-Print (DRAFT o Author’s Accepted Manuscript-AAM) Dimensione 986.91 kB Formato Adobe PDF Visualizza/Apri	986.91 kB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1053282

Citazioni

ND

7

4

social impact