Retrieval of files without the support of file system structures is arguably essential for digital forensics. Files are typically stored as sequences of data blocks, which have to be reconstructed in the retrieval process. This is commonly performed, among other approaches, through file carving, in general detecting the original block sequences by means of signatures of known headers and footers of files. Of course, this creates challenges with fragmented files, where blocks belonging to different files may be interleaved. Ways to classify file blocks into file types relying on their content may provide a support to achieve a successful reconstruction. We propose to classify file blocks using Support Vector Machines (SVMs), and we do so by studying in-depth the impact of an appropriate selection of the features used in the classification process. We analyze several potential features and test their performance over a large and representative collection of file blocks and file types. We find out that SVM classifiers can achieve a good accuracy and that a specific type of features (based on byte frequency distribution) performs well across almost all of the examined file types.

File Block Classification by Support Vector Machines

SPORTIELLO, LUIGI;ZANERO, STEFANO
2011-01-01

Abstract

Retrieval of files without the support of file system structures is arguably essential for digital forensics. Files are typically stored as sequences of data blocks, which have to be reconstructed in the retrieval process. This is commonly performed, among other approaches, through file carving, in general detecting the original block sequences by means of signatures of known headers and footers of files. Of course, this creates challenges with fragmented files, where blocks belonging to different files may be interleaved. Ways to classify file blocks into file types relying on their content may provide a support to achieve a successful reconstruction. We propose to classify file blocks using Support Vector Machines (SVMs), and we do so by studying in-depth the impact of an appropriate selection of the features used in the classification process. We analyze several potential features and test their performance over a large and representative collection of file blocks and file types. We find out that SVM classifiers can achieve a good accuracy and that a specific type of features (based on byte frequency distribution) performs well across almost all of the examined file types.
2011 Sixth International Conference on Availability, Reliability and Security (ARES)
9780769544854
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/590885
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 25
  • ???jsp.display-item.citation.isi??? ND
social impact