TY - GEN
T1 - Exploiting multi-aspect interactions for god class detection with dataset fine-tuning
AU - Ren, Shaojun
AU - Shi, Chongyang
AU - Zhao, Shuxin
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021/7
Y1 - 2021/7
AB - A god class is a class that takes on too many responsibilities, handling tasks that would more appropriately be split across multiple classes. God classes seriously harm the maintainability and understandability of software. To eliminate god classes, we first need to identify them. Researchers have proposed traditional methods based on code metrics and deep learning methods based on code metrics and text information to detect god classes. However, the relationships within and between code metrics and text information are often ignored; moreover, deep learning methods require large amounts of reliable data, while authentic god class datasets are scarce. To address these problems, we propose a novel god class detection method based on multi-aspect interactions and dataset fine-tuning. First, we use the proposed model to extract multi-aspect interaction information, which comprises three parts: (i) interactions among code metrics; (ii) interactions within the text; and (iii) interactions between text and code metrics. In this way, we not only make use of code metrics and text information but also fully exploit the multi-aspect interaction information. Second, we train on large-scale synthetic datasets to obtain a pre-trained model, then fine-tune the pre-trained model parameters on high-quality authentic datasets. This pre-training and fine-tuning scheme addresses both the low reliability of synthetic datasets and the scarcity of authentic datasets. Finally, evaluation results on open-source applications suggest that the proposed approach improves on the state of the art.
KW - Code smells
KW - Feature interactions
KW - Fine-tuning
KW - God class
KW - Pre-training
UR - http://www.scopus.com/inward/record.url?scp=85115854216&partnerID=8YFLogxK
U2 - 10.1109/COMPSAC51774.2021.00119
DO - 10.1109/COMPSAC51774.2021.00119
M3 - Conference contribution
AN - SCOPUS:85115854216
T3 - Proceedings - 2021 IEEE 45th Annual Computers, Software, and Applications Conference, COMPSAC 2021
SP - 864
EP - 873
BT - Proceedings - 2021 IEEE 45th Annual Computers, Software, and Applications Conference, COMPSAC 2021
A2 - Chan, W. K.
A2 - Claycomb, Bill
A2 - Takakura, Hiroki
A2 - Yang, Ji-Jiang
A2 - Teranishi, Yuuichi
A2 - Towey, Dave
A2 - Segura, Sergio
A2 - Shahriar, Hossain
A2 - Reisman, Sorel
A2 - Ahamed, Sheikh Iqbal
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 45th IEEE Annual Computers, Software, and Applications Conference, COMPSAC 2021
Y2 - 12 July 2021 through 16 July 2021
ER -