Support driven opportunistic aggregation for generalized itemset extraction

GARZA, PAOLO
2010-01-01

Abstract

Association rule extraction is a widely used exploratory technique that has been applied in different contexts (e.g., biological data, medical images). However, association rule extraction driven by support and confidence constraints entails either (i) generating a huge number of rules, which are difficult to analyze, or (ii) pruning rare itemsets, even if their hidden knowledge might be relevant. To address these issues, this paper presents a novel frequent itemset mining algorithm, called GenIO (GENeralized Itemset DiscOverer), which analyzes correlations among data by means of generalized itemsets. Generalized itemsets provide a powerful tool to efficiently extract hidden knowledge that previous approaches discard. The proposed technique exploits a user-provided taxonomy to drive the pruning phase of the extraction process. Instead of extracting itemsets for all levels of the taxonomy and post-pruning them, the GenIO algorithm performs a support-driven opportunistic aggregation of itemsets: generalized itemsets are extracted only if itemsets at a lower level in the taxonomy are below the support threshold. Experiments performed in the network traffic domain show the efficiency and effectiveness of the proposed algorithm.
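The idea of generalizing only where low-level itemsets fall below the support threshold can be illustrated with a minimal Python sketch. This is not the GenIO algorithm itself, whose details the abstract does not give. It is a simplified one-level illustration under stated assumptions: the taxonomy is a flat child-to-parent dictionary, itemsets have a fixed size, and all items of an infrequent itemset are generalized together. The function names, the `parent` mapping, and the sample network traffic records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def support_counts(transactions, size):
    """Count how many transactions contain each itemset of the given size."""
    counts = Counter()
    for t in transactions:
        for itemset in combinations(sorted(set(t)), size):
            counts[itemset] += 1
    return counts


def opportunistic_itemsets(transactions, parent, min_support, size=2):
    """Return (frequent low-level itemsets, frequent generalized itemsets).

    Assumption: `parent` maps an item to its ancestor one level up in the
    taxonomy; items without an entry are left unchanged.
    """
    counts = support_counts(transactions, size)
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    infrequent = [s for s, c in counts.items() if c < min_support]

    # Generalized candidates come only from infrequent low-level itemsets.
    candidates = set()
    for s in infrequent:
        g = tuple(sorted({parent.get(i, i) for i in s}))
        if len(g) == size and g != s:
            candidates.add(g)

    # A generalized itemset is supported by every transaction whose
    # generalized form contains it.
    gen_transactions = [{parent.get(i, i) for i in t} for t in transactions]
    generalized = {}
    for g in candidates:
        c = sum(1 for t in gen_transactions if set(g) <= t)
        if c >= min_support:
            generalized[g] = c
    return frequent, generalized


# Hypothetical network traffic records: (source address, destination port).
transactions = [
    {"ip:10.0.0.1", "port:80"},
    {"ip:10.0.0.1", "port:80"},
    {"ip:10.0.0.2", "port:443"},
    {"ip:10.0.0.3", "port:8080"},
]
# Hypothetical taxonomy: addresses roll up to a subnet, ports to a class.
parent = {
    "ip:10.0.0.1": "net:10.0.0.0/24",
    "ip:10.0.0.2": "net:10.0.0.0/24",
    "ip:10.0.0.3": "net:10.0.0.0/24",
    "port:80": "port:web",
    "port:443": "port:web",
    "port:8080": "port:web",
}

frequent, generalized = opportunistic_itemsets(transactions, parent, min_support=2)
print(frequent)
print(generalized)
```

In this toy data the pair for address 10.0.0.1 and port 80 is frequent on its own, while the two rare pairs are aggregated into a single subnet-level itemset instead of being pruned. If every low-level itemset met the threshold, no generalized itemset would be produced.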
2010
Intelligent Systems (IS), 2010 5th IEEE International Conference
9781424451630
INF

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/580111