Bayesian Meta-Analysis of Software Defect Prediction With Machine Learning
Damian Andrew Tamburri
2023-01-01
Abstract
Machine learning is widely used to predict defect-prone software components, facilitating testing and improving application quality. In a recent meta-analysis on binary classification for software defect prediction, so-called researcher bias, i.e., the effect of the group that conducts the study, was shown to play a critical role; that analysis, however, relied on null hypothesis significance testing alone. Since null hypothesis testing is based on the p-value, which is not the desired probability that the null hypothesis is true, it suffers from several important drawbacks. This article presents a Bayesian analysis of the same dataset, which overcomes the pitfalls of the null hypothesis testing approach and relaxes the assumptions of the methods used in the previous study. While the Bayesian analysis identifies the choice of software metrics as the most influential factor for a classifier's performance, researcher bias remains the second most important factor: precautions against researcher bias are therefore still critical in software defect prediction studies. To confirm this finding, we further analyze the data with more advanced Bayesian modeling, which identifies (1) the classifiers with better performance, (2) the datasets whose instances are harder to predict, and (3) the metrics that impact a classifier's performance.
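
To make the kind of analysis described above concrete, the following is a minimal sketch, written with PyMC, of a hierarchical Bayesian regression of classifier performance on study-level factors. It is illustrative only and not the model used in the article; the file name, column names, and the choice of MCC as the performance measure are assumptions.

# Minimal sketch (illustrative, not the article's actual model): a hierarchical
# Bayesian regression of classifier performance on study-level factors.
# The CSV file, column names, and the MCC measure are hypothetical assumptions.
import pandas as pd
import pymc as pm

df = pd.read_csv("defect_prediction_results.csv")  # hypothetical per-study results
metric_idx, metric_levels = pd.factorize(df["metric_family"])  # e.g., code vs. process metrics
group_idx, group_levels = pd.factorize(df["research_group"])   # group that ran the study

with pm.Model() as model:
    # Partial pooling: each factor contributes varying intercepts whose spread
    # (sigma_metric, sigma_group) reflects how much variance that factor explains.
    sigma_metric = pm.HalfNormal("sigma_metric", sigma=1.0)
    sigma_group = pm.HalfNormal("sigma_group", sigma=1.0)
    a_metric = pm.Normal("a_metric", mu=0.0, sigma=sigma_metric, shape=len(metric_levels))
    a_group = pm.Normal("a_group", mu=0.0, sigma=sigma_group, shape=len(group_levels))
    intercept = pm.Normal("intercept", mu=0.0, sigma=1.0)
    noise = pm.HalfNormal("noise", sigma=1.0)
    mu = intercept + a_metric[metric_idx] + a_group[group_idx]
    pm.Normal("performance", mu=mu, sigma=noise, observed=df["mcc"].values)
    idata = pm.sample(draws=2000, tune=1000, random_seed=42)

# Comparing the posteriors of sigma_metric and sigma_group indicates which factor
# (software metrics vs. researcher group) accounts for more of the performance variance.

In a sketch like this, factor importance is read off the group-level standard deviations rather than from p-values, which is one way a Bayesian variance-components model can rank metrics against researcher bias while avoiding the assumptions of null hypothesis testing.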


