Investigating associative classification for software fault prediction: An experimental perspective

Baojun Ma; Huaping Zhang; Guoqing Chen; Yanping Zhao; Bart Baesens

doi:10.1142/S021819401450003X

Baojun Ma, Huaping Zhang, Guoqing Chen^*, Yanping Zhao, Bart Baesens

^*Corresponding author for this work

School of Computer Science and Technology

Research output: Contribution to journal › Article › peer-review

24 Citations (Scopus)

Abstract

It is a recurrent finding that software development is often troubled by considerable delays as well as budget overruns and several solutions have been proposed in answer to this observation, software fault prediction being a prime example. Drawing upon machine learning techniques, software fault prediction tries to identify upfront software modules that are most likely to contain faults, thereby streamlining testing efforts and improving overall software quality. When deploying fault prediction models in a production environment, both prediction performance and model comprehensibility are typically taken into consideration, although the latter is commonly overlooked in the academic literature. Many classification methods have been suggested to conduct fault prediction; yet associative classification methods remain uninvestigated in this context. This paper proposes an associative classification (AC)-based fault prediction method, building upon the CBA2 algorithm. In an empirical comparison on 12 real-world datasets, the AC-based classifier is shown to achieve a predictive performance competitive to those of models induced by five other tree/rule-based classification techniques. In addition, our findings also highlight the comprehensibility of the AC-based models, while achieving similar prediction performance. Furthermore, the possibilities of cross project prediction are investigated, strengthening earlier findings on the feasibility of such approach when insufficient data on the target project is available.

Original language	English
Pages (from-to)	61-90
Number of pages	30
Journal	International Journal of Software Engineering and Knowledge Engineering
Volume	24
Issue number	1
DOIs	https://doi.org/10.1142/S021819401450003X
Publication status	Published - Feb 2014

Keywords

Software fault prediction
associative classification
comprehensibility
cross project validation
prediction performance

Access to Document

10.1142/S021819401450003X

Cite this

Ma, B., Zhang, H., Chen, G., Zhao, Y., & Baesens, B. (2014). Investigating associative classification for software fault prediction: An experimental perspective. International Journal of Software Engineering and Knowledge Engineering, 24(1), 61-90. https://doi.org/10.1142/S021819401450003X

@article{9e60abeeb0d444f09bddee7909f31584,
title = "Investigating associative classification for software fault prediction: An experimental perspective",
abstract = "It is a recurrent finding that software development is often troubled by considerable delays as well as budget overruns and several solutions have been proposed in answer to this observation, software fault prediction being a prime example. Drawing upon machine learning techniques, software fault prediction tries to identify upfront software modules that are most likely to contain faults, thereby streamlining testing efforts and improving overall software quality. When deploying fault prediction models in a production environment, both prediction performance and model comprehensibility are typically taken into consideration, although the latter is commonly overlooked in the academic literature. Many classification methods have been suggested to conduct fault prediction; yet associative classification methods remain uninvestigated in this context. This paper proposes an associative classification (AC)-based fault prediction method, building upon the CBA2 algorithm. In an empirical comparison on 12 real-world datasets, the AC-based classifier is shown to achieve a predictive performance competitive to those of models induced by five other tree/rule-based classification techniques. In addition, our findings also highlight the comprehensibility of the AC-based models, while achieving similar prediction performance. Furthermore, the possibilities of cross project prediction are investigated, strengthening earlier findings on the feasibility of such approach when insufficient data on the target project is available.",

keywords = "Software fault prediction, associative classification, comprehensibility, cross project validation, prediction performance",
author = "Baojun Ma and Huaping Zhang and Guoqing Chen and Yanping Zhao and Bart Baesens",
year = "2014",
month = feb,
doi = "10.1142/S021819401450003X",
language = "English",
volume = "24",
pages = "61--90",
journal = "International Journal of Software Engineering and Knowledge Engineering",
issn = "0218-1940",
publisher = "World Scientific Publishing Co. Pte Ltd",
number = "1",
}

TY - JOUR
T1 - Investigating associative classification for software fault prediction
T2 - An experimental perspective
AU - Ma, Baojun
AU - Zhang, Huaping
AU - Chen, Guoqing
AU - Zhao, Yanping
AU - Baesens, Bart
PY - 2014/2
Y1 - 2014/2
N2 - It is a recurrent finding that software development is often troubled by considerable delays as well as budget overruns and several solutions have been proposed in answer to this observation, software fault prediction being a prime example. Drawing upon machine learning techniques, software fault prediction tries to identify upfront software modules that are most likely to contain faults, thereby streamlining testing efforts and improving overall software quality. When deploying fault prediction models in a production environment, both prediction performance and model comprehensibility are typically taken into consideration, although the latter is commonly overlooked in the academic literature. Many classification methods have been suggested to conduct fault prediction; yet associative classification methods remain uninvestigated in this context. This paper proposes an associative classification (AC)-based fault prediction method, building upon the CBA2 algorithm. In an empirical comparison on 12 real-world datasets, the AC-based classifier is shown to achieve a predictive performance competitive to those of models induced by five other tree/rule-based classification techniques. In addition, our findings also highlight the comprehensibility of the AC-based models, while achieving similar prediction performance. Furthermore, the possibilities of cross project prediction are investigated, strengthening earlier findings on the feasibility of such approach when insufficient data on the target project is available.

KW - Software fault prediction
KW - associative classification
KW - comprehensibility
KW - cross project validation
KW - prediction performance
UR - http://www.scopus.com/inward/record.url?scp=84902316977&partnerID=8YFLogxK
U2 - 10.1142/S021819401450003X
DO - 10.1142/S021819401450003X
M3 - Article
AN - SCOPUS:84902316977
SN - 0218-1940
VL - 24
SP - 61
EP - 90
JO - International Journal of Software Engineering and Knowledge Engineering
JF - International Journal of Software Engineering and Knowledge Engineering
IS - 1
ER -
