Discovering higher level correlations from XML data

GARZA, PAOLO
2012

Abstract

Association rule extraction is a widely used exploratory technique for identifying hidden correlations among data. Generalized association rule mining, originally introduced in the context of market basket analysis, exploits a taxonomy to drive the mining activity, with the aim of discovering associations between data items at any level of the taxonomy. Since XML has become a standard for representing and exchanging information, extracting association rules from XML data is increasingly appealing, as it allows hidden and interesting patterns to be identified. Many approaches have been devoted to effectively mining association rules from both transactional and XML data. However, traditional association rule mining algorithms are sometimes ineffective at mining valuable knowledge because the mined information is excessively detailed. Furthermore, the extracted rules may be too numerous to be effectively exploited in decision-making processes. This chapter proposes the XML-GERMI framework, which supports XML data analysis by automatically extracting generalized association rules (i.e., higher level correlations) from XML data. The proposed approach extends the concept of multiple-level association rules. To drive the generalization phase of the extraction process, a taxonomy is exploited to aggregate features at different granularity levels. Experiments performed on both real and synthetic datasets show the adaptability and the effectiveness of the proposed framework in discovering higher level correlations from XML data.
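
To make the idea of taxonomy-driven generalization concrete, the following is a minimal Python sketch. It is not the XML-GERMI algorithm, whose details are not given in this abstract. It only illustrates the general idea of extending each transaction with the taxonomy ancestors of its items before counting support. The XML layout (`order` and `item` elements), the taxonomy, and all function names are hypothetical and chosen for illustration only.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

# Hypothetical XML data: each <order> is one transaction, each <item> one data item.
XML = """
<orders>
  <order><item>espresso</item><item>croissant</item></order>
  <order><item>latte</item><item>croissant</item></order>
  <order><item>espresso</item><item>muffin</item></order>
  <order><item>green_tea</item><item>muffin</item></order>
</orders>
"""

# Hypothetical taxonomy, stored as child -> parent.
TAXONOMY = {
    "espresso": "coffee", "latte": "coffee", "green_tea": "tea",
    "coffee": "beverage", "tea": "beverage",
    "croissant": "pastry", "muffin": "pastry",
}


def ancestors(item):
    """Return all ancestors of an item, from its parent up to the taxonomy root."""
    result = []
    while item in TAXONOMY:
        item = TAXONOMY[item]
        result.append(item)
    return result


def extended_transactions(xml_text):
    """Yield each transaction as a set of its items plus all their ancestors."""
    root = ET.fromstring(xml_text)
    for order in root.iter("order"):
        items = {el.text for el in order.iter("item")}
        extended = set(items)
        for it in items:
            extended.update(ancestors(it))
        yield extended


def pair_support(transactions):
    """Compute the support of every item pair, skipping item/ancestor pairs."""
    transactions = list(transactions)
    counts = Counter()
    for t in transactions:
        for a, b in combinations(sorted(t), 2):
            if a in ancestors(b) or b in ancestors(a):
                continue  # a pair such as (coffee, espresso) carries no information
            counts[(a, b)] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items()}


support = pair_support(extended_transactions(XML))
print(support[("croissant", "espresso")])  # leaf-level pair
print(support[("coffee", "pastry")])       # generalized pair
```

In this toy data, the leaf-level pair (croissant, espresso) appears in one of four transactions, while the generalized pair (coffee, pastry) appears in three. This is the effect the abstract describes: correlations that are too sparse at the most detailed level can emerge at a higher level of the taxonomy.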
XML Data Mining: Models, Methods, and Applications
9781613503560
INF
Files in this item:
File: XMLDataMining2012_Chapter13.pdf
Description: Post-Print (DRAFT or Author's Accepted Manuscript-AAM)
Access: Restricted
Size: 4.1 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/11311/580110