RE.PUBLIC@POLIMI: Research Publications of Politecnico di Milano

Apícula: Static Detection of API Calls in Generic Streams of Bytes

D’Onghia, Mario; Salvadore, Matteo; Nespoli, Benedetto Maria; Carminati, Michele; Polino, Mario; Zanero, Stefano

2022-01-01

Abstract

API functions often require specifically crafted inputs and may return output that is usually processed by the code immediately following their invocation. In this work, we claim that, for some APIs, these two stages are both frequently similar across different binaries and sufficiently unique to be fingerprinted. We build upon this intuition and present Apícula, a static analysis tool for identifying API calls in generic streams of bytes, such as memory dumps, network traffic, or object code files. In a nutshell, Apícula leverages the control flow graph of a binary to generate a set of fingerprints for every basic block that ends with a call instruction. Those sets are then compared against a database of pre-computed fingerprints to establish whether any known API is being invoked. Because it applies to unstructured byte streams, Apícula can complement reverse engineering carried out over memory dumps collected after a cyber-incident. Moreover, it can enable behavioral analysis in a fully static way, by identifying sequences of API calls even in non-executable binaries. We provide a series of experiments that are instrumental (1) in demonstrating that the same fingerprints computed for specific APIs can be observed across different binaries and (2) in identifying a subset of the Windows APIs whose usage can be detected by Apícula with sufficient precision and sensitivity, focusing in particular on malicious binaries. Furthermore, we illustrate two techniques that can be used to validate other fingerprint databases, should one want to detect APIs belonging to libraries other than those considered in this work. In particular, we show that fingerprints associated with different APIs are remarkably dissimilar and can therefore be employed to distinguish between APIs.
More specifically, we find that fingerprint sets associated with different APIs have an average Jaccard index of 0.000125; in comparison, the average Jaccard index between fingerprint sets associated with the same API is 0.29 for binaries compiled with the same optimization level and 0.07 for binaries compiled with different optimization levels. Moreover, we show that we can build fingerprint databases that are sufficiently comprehensive to identify specific APIs in unseen binaries. More precisely, we identify 228 different APIs among the Windows APIs (including the C run-time libraries) whose usage can be detected by Apícula with sensitivity greater than 80% and a false discovery rate lower than 5%.
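
The abstract reports its results in terms of three quantities: the Jaccard index between fingerprint sets, sensitivity, and false discovery rate. The sketch below shows only how these standard metrics are computed. It is not the authors' implementation: the function names, the string fingerprints, and the counts are hypothetical placeholders chosen for illustration, and treating an empty union or an empty denominator as 0.0 is an assumed convention.

```python
def jaccard_index(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| of two fingerprint sets.

    Returns 0.0 when both sets are empty (an assumed convention).
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of actual API calls that were detected: TP / (TP + FN)."""
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


def false_discovery_rate(true_pos: int, false_pos: int) -> float:
    """Fraction of reported detections that were wrong: FP / (TP + FP)."""
    total = true_pos + false_pos
    return false_pos / total if total else 0.0


# Hypothetical fingerprint sets for the same API seen in two binaries.
# Real fingerprints would be derived from basic blocks ending in a call.
fingerprints_a = {"f1", "f2", "f3", "f4"}
fingerprints_b = {"f3", "f4", "f5"}
similarity = jaccard_index(fingerprints_a, fingerprints_b)  # 2 shared / 5 total

# Hypothetical detection counts for one API, checked against the
# thresholds quoted in the abstract (sensitivity > 80%, FDR < 5%).
sens = sensitivity(true_pos=85, false_neg=15)
fdr = false_discovery_rate(true_pos=85, false_pos=4)
meets_thresholds = sens > 0.80 and fdr < 0.05
```

With these toy inputs the two sets share two of five distinct fingerprints, and the example API would pass both thresholds. Values near zero, such as the 0.000125 reported for different APIs, mean that two fingerprint sets share almost nothing.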

Publication year: 2022
Journal title: COMPUTERS & SECURITY
Keywords: Application Programming Interface Detection, Binary Analysis, Static Analysis, Code Fingerprinting, Malware Analysis
Appears in types: 01.1 Journal Article

Files in this item:

File	Size	Format
apicula_final.pdf (open access; description: main article, pre-print/pre-refereeing)	978.7 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1216855
