TY - JOUR
T1 - Investigating associative classification for software fault prediction
T2 - An experimental perspective
AU - Ma, Baojun
AU - Zhang, Huaping
AU - Chen, Guoqing
AU - Zhao, Yanping
AU - Baesens, Bart
PY - 2014/2
Y1 - 2014/2
N2 - Software development is recurrently found to be troubled by considerable delays and budget overruns, and several solutions have been proposed in response, software fault prediction being a prime example. Drawing upon machine learning techniques, software fault prediction tries to identify up front the software modules that are most likely to contain faults, thereby streamlining testing efforts and improving overall software quality. When deploying fault prediction models in a production environment, both prediction performance and model comprehensibility are typically taken into consideration, although the latter is commonly overlooked in the academic literature. Many classification methods have been suggested for fault prediction, yet associative classification methods remain uninvestigated in this context. This paper proposes an associative classification (AC)-based fault prediction method, building upon the CBA2 algorithm. In an empirical comparison on 12 real-world datasets, the AC-based classifier is shown to achieve predictive performance competitive with that of models induced by five other tree/rule-based classification techniques. In addition, our findings highlight the comprehensibility of the AC-based models, which achieve similar prediction performance. Furthermore, the possibilities of cross-project prediction are investigated, strengthening earlier findings on the feasibility of such an approach when insufficient data on the target project is available.
AB - Software development is recurrently found to be troubled by considerable delays and budget overruns, and several solutions have been proposed in response, software fault prediction being a prime example. Drawing upon machine learning techniques, software fault prediction tries to identify up front the software modules that are most likely to contain faults, thereby streamlining testing efforts and improving overall software quality. When deploying fault prediction models in a production environment, both prediction performance and model comprehensibility are typically taken into consideration, although the latter is commonly overlooked in the academic literature. Many classification methods have been suggested for fault prediction, yet associative classification methods remain uninvestigated in this context. This paper proposes an associative classification (AC)-based fault prediction method, building upon the CBA2 algorithm. In an empirical comparison on 12 real-world datasets, the AC-based classifier is shown to achieve predictive performance competitive with that of models induced by five other tree/rule-based classification techniques. In addition, our findings highlight the comprehensibility of the AC-based models, which achieve similar prediction performance. Furthermore, the possibilities of cross-project prediction are investigated, strengthening earlier findings on the feasibility of such an approach when insufficient data on the target project is available.
KW - Software fault prediction
KW - associative classification
KW - comprehensibility
KW - cross project validation
KW - prediction performance
UR - http://www.scopus.com/inward/record.url?scp=84902316977&partnerID=8YFLogxK
U2 - 10.1142/S021819401450003X
DO - 10.1142/S021819401450003X
M3 - Article
AN - SCOPUS:84902316977
SN - 0218-1940
VL - 24
SP - 61
EP - 90
JO - International Journal of Software Engineering and Knowledge Engineering
JF - International Journal of Software Engineering and Knowledge Engineering
IS - 1
ER -