Using per-Host Measurements for Fast Internet Traffic Classification

Lucerna, Diego; Rottondi, Cristina Emma Margherita; Verticale, Giacomo

Accurate classification of Internet traffic is of fundamental importance for network management applications such as security monitoring, accounting, and Quality-of-Service (QoS) provisioning, and for providing operators with useful information for network planning. Classical port-based and payload-based classification techniques are becoming less effective because of the increasing presence of protocol obfuscation and payload encryption in today’s Internet traffic. Therefore, there is growing interest in classification algorithms that look only at the IP and transport packet headers, along with other information that is difficult to obfuscate, such as packet lengths and interarrival times. Several recent papers have identified machine learning as a viable approach for designing a classifier capable of dealing with the wide variety of protocols and implementations. In the real-time scenario, a traffic flow has to be classified by looking only at its first packets. In this context, measuring the activity of Internet hosts can provide useful information about the applications that are generating the traffic coming from a given host. In particular, we assume that the sequence of TCP connection requests (or, for UDP traffic, the sequence of new flows) generated by a given host using a given transport protocol towards a given transport port can be modeled as a random process with a power spectral density decaying according to a power law. Computing the power-law exponent for a given host/port pair requires some computational effort, but the resulting value is already available at the beginning of each flow and therefore adds no delay to classification. In this paper, we show that using this information makes it possible to achieve good classification accuracy by looking at very few packets, thereby yielding a very quick response.
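As a rough illustration of the per-host measurement described above, the following Python sketch estimates a power-law exponent from the timestamps of connection requests observed for one host/port pair. The abstract does not state which estimator the authors use, so everything here is an assumption made for illustration: the events are binned into a count series, the power spectral density is approximated with a naive periodogram, and the exponent alpha in S(f) ~ f^(-alpha) is taken from a least-squares fit in log-log coordinates. The function names (`count_series`, `periodogram`, `power_law_exponent`) and the bin width are hypothetical.

```python
import cmath
import math


def count_series(arrival_times, bin_width, n_bins):
    """Count connection-request events per time bin (times in seconds from 0)."""
    counts = [0] * n_bins
    for t in arrival_times:
        idx = int(t // bin_width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def periodogram(x):
    """Naive O(n^2) periodogram of the mean-removed series (positive frequencies only)."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        acc = sum(c * cmath.exp(-2j * math.pi * k * i / n)
                  for i, c in enumerate(centered))
        freqs.append(k / n)          # frequency in cycles per bin
        power.append(abs(acc) ** 2 / n)
    return freqs, power


def power_law_exponent(freqs, power):
    """Fit log(power) against log(freq); return alpha such that S(f) ~ f^(-alpha)."""
    # Assumption: zero-power bins are skipped because their logarithm is undefined.
    pts = [(math.log(f), math.log(p)) for f, p in zip(freqs, power) if p > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positive spectral points")
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return -sxy / sxx                # negate the slope to obtain the decay exponent
```

In a classifier along these lines, the exponent would be recomputed in the background for each host/port pair and simply looked up when a new flow starts, which is consistent with the claim that the feature adds no delay. A production version would likely replace the naive periodogram with an FFT and a smoothed spectral estimate.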