TY - JOUR
T1 - Robust Boundary-Enhanced GMM-SMOTE Software Defect Detection Method
AU - Luo, Senlin
AU - Su, Xia
AU - Pan, Limin
N1 - Publisher Copyright:
© 2021, Editorial Department of Transaction of Beijing Institute of Technology. All rights reserved.
PY - 2021/3
Y1 - 2021/3
N2 - Software defects are bugs that can disrupt the normal operation of a system or software, and detecting and locating them is costly. Automatic defect detection models based on software data have become an important tool for defect discovery. However, accurately labeled defective samples are rare, and the rates of missing labels and mislabeling are high, which causes existing data-balancing methods to amplify noise and blur classification boundaries. To solve this problem, a robust boundary-enhanced GMM-SMOTE software defect detection method is proposed. The method uses Gaussian mixture clustering to divide the software data set into multiple clusters, selects reliable samples based on the intra-cluster class ratio, and identifies boundary samples based on posterior probability; these steps guide a weighted data-balancing procedure, and a software defect detection model is then built on the balanced data. Experimental results on multiple public NASA data sets show that GMM-SMOTE balances the data while suppressing noise and strengthening class boundaries, and that it effectively improves software defect detection performance, giving it considerable practical value.
AB - Software defects are bugs that can disrupt the normal operation of a system or software, and detecting and locating them is costly. Automatic defect detection models based on software data have become an important tool for defect discovery. However, accurately labeled defective samples are rare, and the rates of missing labels and mislabeling are high, which causes existing data-balancing methods to amplify noise and blur classification boundaries. To solve this problem, a robust boundary-enhanced GMM-SMOTE software defect detection method is proposed. The method uses Gaussian mixture clustering to divide the software data set into multiple clusters, selects reliable samples based on the intra-cluster class ratio, and identifies boundary samples based on posterior probability; these steps guide a weighted data-balancing procedure, and a software defect detection model is then built on the balanced data. Experimental results on multiple public NASA data sets show that GMM-SMOTE balances the data while suppressing noise and strengthening class boundaries, and that it effectively improves software defect detection performance, giving it considerable practical value.
KW - Data imbalance
KW - Gaussian mixture model
KW - Oversampling
KW - Software defect detection
UR - http://www.scopus.com/inward/record.url?scp=85105623056&partnerID=8YFLogxK
U2 - 10.15918/j.tbit1001-0645.2019.312
DO - 10.15918/j.tbit1001-0645.2019.312
M3 - Article
AN - SCOPUS:85105623056
SN - 1001-0645
VL - 41
SP - 303
EP - 310
JO - Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology
JF - Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology
IS - 3
ER -