Obfuscated and encrypted protocols hinder traffic classification by classical techniques such as port analysis or deep packet inspection. Therefore, there is growing interest for classification algorithms based on statistical analysis of the length of the first packets of flows. Most classifiers proposed in literature are based on machine learning techniques and consider each flow independently of previous source activity (per-flow analysis). In this paper, we propose to use specific per-source information to improve classification accuracy: the sequence of starting times of flows generated by single sources may be analyzed along time to estimate peculiar statistical parameters, in our case the exponent α of the power law f -α that approximates the PSD of their counting process. In our method, this measurement is used to train a classifier in addition to the lengths of the first packets of the flows. In our experiments, considering this additional per-source information yielded the same accuracy as using only per-flow data, but observing fewer packets in each flow and thus allowing a quicker response. For the proposed classifier, we report performance evaluation results obtained on sets of Internet traffic traces collected in three sites.

Using per-Source Measurements to Improve Performance of Internet Traffic Classification

BREGNI, STEFANO;LUCERNA, DIEGO;ROTTONDI, CRISTINA EMMA MARGHERITA;VERTICALE, GIACOMO
2010-01-01

Abstract

Obfuscated and encrypted protocols hinder traffic classification by classical techniques such as port analysis or deep packet inspection. Therefore, there is growing interest for classification algorithms based on statistical analysis of the length of the first packets of flows. Most classifiers proposed in literature are based on machine learning techniques and consider each flow independently of previous source activity (per-flow analysis). In this paper, we propose to use specific per-source information to improve classification accuracy: the sequence of starting times of flows generated by single sources may be analyzed along time to estimate peculiar statistical parameters, in our case the exponent α of the power law f -α that approximates the PSD of their counting process. In our method, this measurement is used to train a classifier in addition to the lengths of the first packets of the flows. In our experiments, considering this additional per-source information yielded the same accuracy as using only per-flow data, but observing fewer packets in each flow and thus allowing a quicker response. For the proposed classifier, we report performance evaluation results obtained on sets of Internet traffic traces collected in three sites.
2010
Communications (LATINCOM), 2010 IEEE Latin-American Conference on
9781424471713
Communication system traffic; Internet; longrange dependence; traffic measurement (communication).
File in questo prodotto:
File Dimensione Formato  
Latincom2010_persource.pdf

Accesso riservato

: Post-Print (DRAFT o Author’s Accepted Manuscript-AAM)
Dimensione 245.58 kB
Formato Adobe PDF
245.58 kB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/580202
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 5
  • ???jsp.display-item.citation.isi??? ND
social impact