A novel pruning algorithm for mining long and maximum length frequent itemsets
Lessanibahri S.; Gastaldi L.
2020-01-01
Abstract
Frequent itemset mining is one of the most popular data mining techniques today. Its application is, however, hindered by high computational cost on many real-world datasets, especially at low support thresholds. Moreover, the number of frequent itemsets discovered is often overwhelming. In some real-world applications, it is sufficient to find a smaller subset of frequent itemsets, such as the frequent itemsets with a maximum length. In this paper, we present a pruning algorithm, called LengthSort, that reduces the search space effectively and improves the efficiency of mining frequent itemsets with a maximum length. LengthSort prunes both the items and the transactions before constructing a Frequent Pattern tree structure. Our experiments on several datasets show that the proposed pruning techniques reduce the time needed to discover the frequent itemsets with a maximum length. The proposed pruning algorithm can also be applied to efficiently discover frequent itemsets that are longer than a user-specified threshold.

File | Description | Size | Format |
---|---|---|---|
Paper 4R Research Gate.pdf | Post-print (draft or Author's Accepted Manuscript, AAM); open access from 16 March 2022 | 822.8 kB | Adobe PDF |
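The abstract states that items and transactions are pruned before the Frequent Pattern tree is built, but this record does not give the details of LengthSort. The sketch below is therefore not the authors' algorithm. It only illustrates a generic length-based pruning idea under stated assumptions: a transaction with fewer than `min_length` items cannot contain an itemset of that length, and an item that occurs in fewer than `min_support` of the surviving transactions cannot belong to a frequent itemset of that length. The function name `prune_for_min_length`, its parameters, and the sample data are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List


def prune_for_min_length(
    transactions: Iterable[Iterable[str]],
    min_support: int,
    min_length: int,
) -> List[FrozenSet[str]]:
    """Illustrative pruning only (NOT the LengthSort algorithm itself).

    Repeatedly drops transactions and items that cannot contribute to a
    frequent itemset containing at least `min_length` items, until nothing
    more can be removed.
    """
    db = [frozenset(t) for t in transactions]
    while True:
        # A transaction shorter than min_length cannot contain such an itemset.
        db = [t for t in db if len(t) >= min_length]
        # Count item support over the surviving transactions only.
        support = Counter(item for t in db for item in t)
        keep = {item for item, count in support.items() if count >= min_support}
        # Remove infrequent items; this may shorten some transactions.
        pruned = [t & keep for t in db]
        if pruned == db:
            return db  # fixpoint: no item was removed in this pass
        db = pruned


# Hypothetical toy database: six transactions over items a to f.
sample = [
    ["a", "b", "c", "d"],
    ["a", "b", "c"],
    ["a", "b", "c", "e"],
    ["a", "d"],
    ["b", "e", "f"],
    ["d", "e"],
]
result = prune_for_min_length(sample, min_support=3, min_length=3)
print(result)
```

On this toy input only three transactions survive, each reduced to the items a, b, and c, so a subsequent tree-based miner would work on a much smaller database. Whether LengthSort applies its pruning iteratively in this way is an assumption of the sketch.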
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.