On the Impact of Low-Quality Activity Labels in Predictive Process Monitoring
Comuzzi, Marco;Salamov, Musa;Cappiello, Cinzia;Pernici, Barbara
2025-01-01
Abstract
While event log data quality is recognized as a crucial concern in process mining, the impact of event log errors on different types of process mining tasks has remained largely unexplored. This paper aims to fill this gap by analyzing how various errors affect analysis results. In particular, we assess whether, and to what extent, different types of errors that degrade the quality of activity labels affect the performance of predictive process monitoring models. We consider the three main tasks of next activity, outcome, and remaining time prediction, using publicly available and simulated event logs. The results of the experiments are used to extract preliminary insights into the design of data preparation pipelines for predictive process monitoring.


