SPBC: A self-paced learning model for bug classification from historical repositories of open-source software

Hufsa Mohsin; Chongyang Shi

doi:10.1016/j.eswa.2020.113808

Hufsa Mohsin, Chongyang Shi^*

^*Corresponding author for this work

School of Computer Science and Technology

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

10 Citations (Scopus)

Abstract

One of the areas most in need of improvement in the field of automated bug fixing, localization and triaging systems is that of effective categorization, as this would reduce the time, cost and effort required to locate, assign and fix the bug. The existing approaches depend upon the textual similarity of the bug description and category in a given reported bug; accordingly, the challenges of unstructured bugs, technical terms, versatile ways of reporting the same bug, the diverse nature and sizes of datasets etc. are often overlooked. Consequently, this limits the classifier performance to a specific type of dataset, resulting in classification inefficiency. To this end, we propose a novel Self-Paced Bug Classifier (SPBC) that is capable of locating the target categories from the bug description of the historical data, maintained by multiple open-source software packages (Bugzilla, Mantis, Redmine). The proposed model introduces a self-paced back-traceable algorithm, controlled by a self-paced regularizer, which classifies textually independent bug descriptions with weighted data-independent tokens (the easy samples). Later on, the regularizer sets comparatively hard samples for textually dependent classification by capturing intra-class and inter-class discrimination features from bug descriptions, based on the weighted similarities of words; this is done with the help of a Key Feature Identification Matrix (KFIM), a Non-Independent and Identically Distributed (NIID) matrix. Easy-to-hard self-paced learning, integrated with textually dependent and independent classification, makes SPBC capable of simultaneously enhancing the effectiveness and robustness of intelligent systems through a substantial increase in precision (5–15% on average).
The main advantage of SPBC is that it targets the spatial relationship between the data and the system, which makes it an apt learner of data and allows it to maintain sample insertion into the classifier at a controlled pace. Additionally, it maintains stability, which is not affected by the dataset's dimensionality and traits. As is evidenced by the experimental results on four different datasets from open-source projects, our model outperforms the baseline and state-of-the-art methods through a single-stroke solution with improved accuracy and stable performance (average 95% precision and 4% decrease in kappa); hence, it is significant for improving intelligent bug fixing and triaging systems.
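
The easy-to-hard scheme described in the abstract can be illustrated with a generic self-paced learning loop. The following is a minimal sketch of the standard hard-threshold self-paced regularizer applied to a toy one-dimensional logistic model; it is not SPBC itself and does not implement KFIM or the back-traceable algorithm. The function names and the parameters `lam` (pace threshold) and `growth` (pace increase per round) are assumptions chosen for illustration.

```python
import math


def log_loss(w, b, x, y):
    """Logistic loss of a 1-D linear model on one sample (y is 0 or 1)."""
    margin = (w * x + b) * (1 if y == 1 else -1)
    return math.log(1.0 + math.exp(-margin))


def self_paced_weights(losses, lam):
    """Hard self-paced regularizer: a sample is 'easy' (weight 1) if its loss is below lam."""
    return [1 if loss < lam else 0 for loss in losses]


def train_self_paced(xs, ys, w=1.0, b=0.0, lam=0.2, growth=1.5,
                     rounds=5, lr=0.5, steps=200):
    """Alternate between selecting easy samples and fitting the model on them.

    Returns the final parameters and the number of samples admitted per round.
    """
    history = []
    for _ in range(rounds):
        # Step 1: with the model fixed, pick the samples that are currently easy.
        losses = [log_loss(w, b, x, y) for x, y in zip(xs, ys)]
        v = self_paced_weights(losses, lam)
        history.append(sum(v))
        n = max(sum(v), 1)
        # Step 2: with the selection fixed, run gradient descent on selected samples only.
        for _ in range(steps):
            gw = gb = 0.0
            for x, y, vi in zip(xs, ys, v):
                if vi:
                    p = 1.0 / (1.0 + math.exp(-(w * x + b)))
                    gw += (p - y) * x
                    gb += p - y
            w -= lr * gw / n
            b -= lr * gb / n
        # Step 3: raise the pace so that harder samples are admitted next round.
        lam *= growth
    return w, b, history


if __name__ == "__main__":
    xs = [-3.0, -2.0, -1.0, -0.2, 0.2, 1.0, 2.0, 3.0]
    ys = [0, 0, 0, 0, 1, 1, 1, 1]
    w, b, history = train_self_paced(xs, ys)
    print("samples admitted per round:", history)
```

In this toy setting the samples far from the decision boundary are admitted first, and those near the boundary enter only once the threshold has grown, which mirrors the controlled-pace sample insertion the abstract describes.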

Original language	English
Article number	113808
Journal	Expert Systems with Applications
Volume	167
DOIs	https://doi.org/10.1016/j.eswa.2020.113808
Publication status	Published - 1 Apr 2021

Keywords

Bug classification
Bug report analysis
Bug triaging
Defect localization
Self-paced learning

Access to Document

10.1016/j.eswa.2020.113808

Cite this

Mohsin, H., & Shi, C. (2021). SPBC: A self-paced learning model for bug classification from historical repositories of open-source software. Expert Systems with Applications, 167, Article 113808. https://doi.org/10.1016/j.eswa.2020.113808

@article{4f0b2db44d5f4b88b2fa6b0be2632625,

title = "SPBC: A self-paced learning model for bug classification from historical repositories of open-source software",

abstract = "One of the areas most in need of improvement in the field of automated bug fixing, localization and triaging systems is that of effective categorization, as this would reduce the time, cost and effort required to locate, assign and fix the bug. The existing approaches depend upon the textual similarity of the bug description and category in a given reported bug; accordingly, the challenges of unstructured bugs, technical terms, versatile ways of reporting the same bug, the diverse nature and sizes of datasets etc. are often overlooked. Consequently, this limits the classifier performance to a specific type of dataset, resulting in classification inefficiency. To this end, we propose a novel Self-Paced Bug Classifier (SPBC) that is capable of locating the target categories from the bug description of the historical data, maintained by multiple open-source software packages (Bugzilla, Mantis, Redmine). The proposed model introduces a self-paced back-traceable algorithm, controlled by a self-paced regularizer, which classifies textually independent bug descriptions with weighted data-independent tokens (the easy samples). Later on, the regularizer sets comparatively hard samples for textually dependent classification by capturing intra-class and inter-class discrimination features from bug descriptions, based on the weighted similarities of words; this is done with the help of a Key Feature Identification Matrix (KFIM), a Non-Independent and Identically Distributed (NIID) matrix. Easy-to-hard self-paced learning, integrated with textually dependent and independent classification, makes SPBC capable of simultaneously enhancing the effectiveness and robustness of intelligent systems through a substantial increase in precision (5–15% on average).
The main advantage of SPBC is that it targets the spatial relationship between the data and the system, which makes it an apt learner of data and allows it to maintain sample insertion into the classifier at a controlled pace. Additionally, it maintains stability, which is not affected by the dataset's dimensionality and traits. As is evidenced by the experimental results on four different datasets from open-source projects, our model outperforms the baseline and state-of-the-art methods through a single-stroke solution with improved accuracy and stable performance (average 95% precision and 4% decrease in kappa); hence, it is significant for improving intelligent bug fixing and triaging systems.",

keywords = "Bug classification, Bug report analysis, Bug triaging, Defect localization, Self-paced learning",

author = "Hufsa Mohsin and Chongyang Shi",

note = "Publisher Copyright: {\textcopyright} 2020 Elsevier Ltd",

year = "2021",

month = apr,

day = "1",

doi = "10.1016/j.eswa.2020.113808",

language = "English",

volume = "167",

journal = "Expert Systems with Applications",

issn = "0957-4174",

publisher = "Elsevier Ltd.",

}

TY - JOUR

T1 - SPBC

T2 - A self-paced learning model for bug classification from historical repositories of open-source software

AU - Mohsin, Hufsa

AU - Shi, Chongyang

PY - 2021/4/1

Y1 - 2021/4/1

N2 - One of the areas most in need of improvement in the field of automated bug fixing, localization and triaging systems is that of effective categorization, as this would reduce the time, cost and effort required to locate, assign and fix the bug. The existing approaches depend upon the textual similarity of the bug description and category in a given reported bug; accordingly, the challenges of unstructured bugs, technical terms, versatile ways of reporting the same bug, the diverse nature and sizes of datasets etc. are often overlooked. Consequently, this limits the classifier performance to a specific type of dataset, resulting in classification inefficiency. To this end, we propose a novel Self-Paced Bug Classifier (SPBC) that is capable of locating the target categories from the bug description of the historical data, maintained by multiple open-source software packages (Bugzilla, Mantis, Redmine). The proposed model introduces a self-paced back-traceable algorithm, controlled by a self-paced regularizer, which classifies textually independent bug descriptions with weighted data-independent tokens (the easy samples). Later on, the regularizer sets comparatively hard samples for textually dependent classification by capturing intra-class and inter-class discrimination features from bug descriptions, based on the weighted similarities of words; this is done with the help of a Key Feature Identification Matrix (KFIM), a Non-Independent and Identically Distributed (NIID) matrix. Easy-to-hard self-paced learning, integrated with textually dependent and independent classification, makes SPBC capable of simultaneously enhancing the effectiveness and robustness of intelligent systems through a substantial increase in precision (5–15% on average).
The main advantage of SPBC is that it targets the spatial relationship between the data and the system, which makes it an apt learner of data and allows it to maintain sample insertion into the classifier at a controlled pace. Additionally, it maintains stability, which is not affected by the dataset's dimensionality and traits. As is evidenced by the experimental results on four different datasets from open-source projects, our model outperforms the baseline and state-of-the-art methods through a single-stroke solution with improved accuracy and stable performance (average 95% precision and 4% decrease in kappa); hence, it is significant for improving intelligent bug fixing and triaging systems.

AB - One of the areas most in need of improvement in the field of automated bug fixing, localization and triaging systems is that of effective categorization, as this would reduce the time, cost and effort required to locate, assign and fix the bug. The existing approaches depend upon the textual similarity of the bug description and category in a given reported bug; accordingly, the challenges of unstructured bugs, technical terms, versatile ways of reporting the same bug, the diverse nature and sizes of datasets etc. are often overlooked. Consequently, this limits the classifier performance to a specific type of dataset, resulting in classification inefficiency. To this end, we propose a novel Self-Paced Bug Classifier (SPBC) that is capable of locating the target categories from the bug description of the historical data, maintained by multiple open-source software packages (Bugzilla, Mantis, Redmine). The proposed model introduces a self-paced back-traceable algorithm, controlled by a self-paced regularizer, which classifies textually independent bug descriptions with weighted data-independent tokens (the easy samples). Later on, the regularizer sets comparatively hard samples for textually dependent classification by capturing intra-class and inter-class discrimination features from bug descriptions, based on the weighted similarities of words; this is done with the help of a Key Feature Identification Matrix (KFIM), a Non-Independent and Identically Distributed (NIID) matrix. Easy-to-hard self-paced learning, integrated with textually dependent and independent classification, makes SPBC capable of simultaneously enhancing the effectiveness and robustness of intelligent systems through a substantial increase in precision (5–15% on average).
The main advantage of SPBC is that it targets the spatial relationship between the data and the system, which makes it an apt learner of data and allows it to maintain sample insertion into the classifier at a controlled pace. Additionally, it maintains stability, which is not affected by the dataset's dimensionality and traits. As is evidenced by the experimental results on four different datasets from open-source projects, our model outperforms the baseline and state-of-the-art methods through a single-stroke solution with improved accuracy and stable performance (average 95% precision and 4% decrease in kappa); hence, it is significant for improving intelligent bug fixing and triaging systems.

KW - Bug classification

KW - Bug report analysis

KW - Bug triaging

KW - Defect localization

KW - Self-paced learning

UR - http://www.scopus.com/inward/record.url?scp=85092937208&partnerID=8YFLogxK

U2 - 10.1016/j.eswa.2020.113808

DO - 10.1016/j.eswa.2020.113808

M3 - Article

AN - SCOPUS:85092937208

SN - 0957-4174

VL - 167

JO - Expert Systems with Applications

JF - Expert Systems with Applications

M1 - 113808

ER -