Discovering higher level correlations from XML data

Cagliero, Luca; Cerquitelli, Tania; Garza, Paolo

doi:10.4018/978-1-61350-356-0.ch013

Association rule extraction is a widely used exploratory technique for identifying hidden correlations among data. The problem of generalized association rule mining, originally introduced in the context of market basket analysis, exploits a taxonomy to drive the mining activity, with the aim of discovering associations between data items at any level of the taxonomy. Since XML has become a standard for representing and exchanging information, extracting association rules from XML data is increasingly appealing, because it allows hidden and interesting patterns among data to be identified. Many approaches have been proposed to mine association rules effectively from both transactional and XML data. However, traditional association rule mining algorithms are sometimes ineffective at mining valuable knowledge because the extracted information is too detailed. Furthermore, the number of extracted rules may be too large to be exploited effectively in decision-making processes. This chapter proposes the XML-GERMI framework, which supports XML data analysis by automatically extracting generalized association rules (i.e., higher-level correlations) from XML data. The proposed approach extends the concept of multiple-level association rules. To drive the generalization phase of the extraction process, a taxonomy is exploited to aggregate features at different granularity levels. Experiments performed on both real and synthetic datasets show the adaptability and effectiveness of the proposed framework in discovering higher-level correlations from XML data.
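To make the role of the taxonomy concrete, the following is a minimal sketch, not the XML-GERMI algorithm itself. It assumes a hypothetical XML layout in which each `<record>` is one transaction and each `<item>` child is one item, plus a hypothetical two-level taxonomy. It uses the classic "extended transaction" idea (each transaction is augmented with the ancestors of its items) and brute-forces rules between two items only. The names `load_transactions`, `ancestors`, `is_redundant`, `generalized_rules`, `XML_DATA`, `TAXONOMY`, and both thresholds are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

# Hypothetical XML data: each <record> is treated as one transaction.
XML_DATA = """
<records>
  <record><item>Rome</item><item>Jazz</item></record>
  <record><item>Milan</item><item>Rock</item></record>
  <record><item>Paris</item><item>Jazz</item></record>
  <record><item>Lyon</item><item>Rock</item></record>
</records>
"""

# Hypothetical taxonomy: maps each item to its parent (more general) concept.
TAXONOMY = {
    "Rome": "Italy", "Milan": "Italy",
    "Paris": "France", "Lyon": "France",
    "Jazz": "Music", "Rock": "Music",
}


def load_transactions(xml_text):
    """Turn every <record> into a frozenset of its non-empty <item> texts."""
    root = ET.fromstring(xml_text.strip())
    return [
        frozenset((el.text or "").strip() for el in rec.findall("item") if (el.text or "").strip())
        for rec in root.findall("record")
    ]


def ancestors(item, taxonomy):
    """Walk up the taxonomy and return all ancestors of an item, nearest first."""
    result = []
    while item in taxonomy:
        item = taxonomy[item]
        result.append(item)
    return result


def is_redundant(itemset, taxonomy):
    """An itemset holding both an item and one of its ancestors is trivially
    frequent, so it carries no information and is skipped."""
    return any(a in itemset for item in itemset for a in ancestors(item, taxonomy))


def generalized_rules(transactions, taxonomy, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules between two items,
    where items may come from any level of the taxonomy."""
    # Extend each transaction with the ancestors of its items.
    extended = [t | {a for i in t for a in ancestors(i, taxonomy)} for t in transactions]
    if not extended:
        return []
    n = len(extended)

    def support(itemset):
        return sum(1 for t in extended if itemset <= t) / n

    rules = []
    for a, b in combinations(sorted(set().union(*extended)), 2):
        pair = frozenset((a, b))
        if is_redundant(pair, taxonomy):
            continue
        sup = support(pair)
        if sup < min_support:
            continue
        # Try both rule directions; confidence = sup(lhs, rhs) / sup(lhs).
        for lhs, rhs in ((a, b), (b, a)):
            conf = sup / support(frozenset([lhs]))
            if conf >= min_confidence:
                rules.append((lhs, rhs, sup, conf))
    return rules


if __name__ == "__main__":
    for lhs, rhs, sup, conf in generalized_rules(load_transactions(XML_DATA), TAXONOMY, 0.5, 0.6):
        print(f"{lhs} -> {rhs} (support={sup:.2f}, confidence={conf:.2f})")
    # France -> Music (support=0.50, confidence=1.00)
    # Italy -> Music (support=0.50, confidence=1.00)
```

With these toy thresholds, every leaf-level pair (e.g., Rome and Jazz) reaches a support of only 0.25, so mining without the taxonomy returns no rules. Aggregating cities into countries and genres into `Music` yields rules that clear the threshold. This illustrates, under the stated assumptions, why generalization can recover knowledge that an overly detailed item representation hides.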