Flow records, that summarize the characteristics of traffic flows, represent a practical and powerful way to monitor a network. While they already offer significant compression compared to full packet captures, their sheer volume remains daunting, especially for large Internet Service Providers (ISPs). In this paper, we investigate several lossy compression techniques to further reduce storage requirements while preserving the utility of flow records for key tasks, such as predicting the domain name of contacted servers. Our study evaluates scalar quantization, Principal Component Analysis (PCA), and vector quantization, applied to a real-world dataset from an operational campus network. Results reveal that scalar quantization provides the best tradeoff between compression and accuracy. PCA can preserve predictive accuracy but hampers subsequent entropic compression, and while vector quantization shows promise, it struggles with scalability due to the high-dimensional nature of the data. These findings result in practical strategies for optimizing flow record storage in large-scale monitoring scenarios.

Handling Large-Scale Network Flow Records: A Comparative Study on Lossy Compression

Trevisan, Martino;Palmese, Fabio;Baccichet, Giovanni;Redondi, Alessandro
2025-01-01

Abstract

Flow records, that summarize the characteristics of traffic flows, represent a practical and powerful way to monitor a network. While they already offer significant compression compared to full packet captures, their sheer volume remains daunting, especially for large Internet Service Providers (ISPs). In this paper, we investigate several lossy compression techniques to further reduce storage requirements while preserving the utility of flow records for key tasks, such as predicting the domain name of contacted servers. Our study evaluates scalar quantization, Principal Component Analysis (PCA), and vector quantization, applied to a real-world dataset from an operational campus network. Results reveal that scalar quantization provides the best tradeoff between compression and accuracy. PCA can preserve predictive accuracy but hampers subsequent entropic compression, and while vector quantization shows promise, it struggles with scalability due to the high-dimensional nature of the data. These findings result in practical strategies for optimizing flow record storage in large-scale monitoring scenarios.
2025
Proceedings of IEEE/IFIP Network Operations and Management Symposium 2025, NOMS 2025
Big Data
Lossy Com-pression
Network Flow Records
Passive Monitoring
File in questo prodotto:
File Dimensione Formato  
Handling_Large-Scale_Network_Flow_Records_A_Comparative_Study_on_Lossy_Compression.pdf

accesso aperto

: Publisher’s version
Dimensione 339.76 kB
Formato Adobe PDF
339.76 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1298579
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 1
  • ???jsp.display-item.citation.isi??? ND
social impact