A novel pruning algorithm for mining long and maximum length frequent itemsets

Lessanibahri S.; Gastaldi L.
2020

Abstract

Frequent itemset mining is one of the most popular data mining techniques. Its application is, however, hindered by high computational cost on many real-world datasets, especially at low support thresholds. Moreover, the number of frequent itemsets discovered is often overwhelming. In some real-world applications it is sufficient to find a smaller subset of the frequent itemsets, such as those of maximum length. In this paper, we present a pruning algorithm, called LengthSort, that effectively reduces the search space and improves the efficiency of mining maximum length frequent itemsets. LengthSort prunes both items and transactions before constructing a Frequent Pattern tree structure. Our experiments on several datasets show that the proposed pruning techniques reduce the time needed to discover the maximum length frequent itemsets. The proposed pruning algorithm can also be applied to efficiently discover frequent itemsets that are longer than a user-specified threshold.
Association rules mining; Data mining; Long frequent itemsets; Maximum length frequent itemsets
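
To illustrate why pruning items and transactions can shrink the search space before any tree is built, the following is a minimal sketch of generic length-based pruning. It is not the LengthSort algorithm from the paper, whose details are not given in the abstract. The function name `prune_for_min_length` and its parameters are hypothetical, and "longer than a threshold" is read here as "at least `min_length` items". The sketch relies on two facts: an item below the support threshold cannot belong to any frequent itemset, and a transaction with fewer than `min_length` frequent items cannot contain any itemset of that length. Dropping a transaction lowers item supports, so the two steps are repeated until nothing changes.

```python
from collections import Counter


def prune_for_min_length(transactions, min_support, min_length):
    """Iteratively drop infrequent items and too-short transactions.

    Illustrative only: the name, parameters, and procedure are assumptions,
    not the LengthSort algorithm described in the paper.
    """
    current = [frozenset(t) for t in transactions]
    while True:
        # Count how many remaining transactions contain each item.
        support = Counter(item for t in current for item in t)
        frequent = {item for item, count in support.items() if count >= min_support}
        # Keep only frequent items, then drop transactions that are too
        # short to contain any itemset of the required length.
        pruned = [t & frequent for t in current]
        pruned = [t for t in pruned if len(t) >= min_length]
        if pruned == current:
            return pruned
        current = pruned


transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"c", "d"}, {"a", "d"}]
remaining = prune_for_min_length(transactions, min_support=2, min_length=3)
```

In this toy input only the two transactions containing all of a, b, and c survive, so a subsequent mining step (for example, building a Frequent Pattern tree) would work on a much smaller dataset without losing any frequent itemset of length three or more.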
Files in this record:
Paper 4R Research Gate.pdf
Description: Post-print (draft or Author's Accepted Manuscript, AAM)
Size: 822.8 kB
Format: Adobe PDF
Embargoed until 15/03/2022

Use this identifier to cite or link to this document: http://hdl.handle.net/11311/1133599