TY - GEN
T1 - COLARE
T2 - 31st IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024
AU - Zeng, Qunhong
AU - Zhang, Yuxia
AU - Sun, Zeyu
AU - Guo, Yujie
AU - Liu, Hui
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Commit classification for maintenance activities is of critical importance for both industry and academia. State-of-the-art approaches either treat code changes as plain text or rely on manually identified features. Directly applying the most advanced model of code change representation to commit classification faces two limitations: (1) coarse-grained diff comparison neglects the distance of modified code lines; (2) missing key context information of hunk modification and file categories. This study proposes a novel classification model, COLARE, which compares code changes at the hunk level, takes fine-grained features based on categories of changed files, and aggregates with the representation of commit messages. The evaluation results show that our model outperforms state-of-the-art techniques by 7.24% and 7.35% in accuracy and macro F1 score, respectively. We also manually labeled a multi-language dataset and evaluated our approach. The results further confirm that our approach achieves the best performance over three baselines, including ChatGPT (3.5). The evaluation of the ablation study demonstrates the effectiveness of the major components in our technique.
AB - Commit classification for maintenance activities is of critical importance for both industry and academia. State-of-the-art approaches either treat code changes as plain text or rely on manually identified features. Directly applying the most advanced model of code change representation to commit classification faces two limitations: (1) coarse-grained diff comparison neglects the distance of modified code lines; (2) missing key context information of hunk modification and file categories. This study proposes a novel classification model, COLARE, which compares code changes at the hunk level, takes fine-grained features based on categories of changed files, and aggregates with the representation of commit messages. The evaluation results show that our model outperforms state-of-the-art techniques by 7.24% and 7.35% in accuracy and macro F1 score, respectively. We also manually labeled a multi-language dataset and evaluated our approach. The results further confirm that our approach achieves the best performance over three baselines, including ChatGPT (3.5). The evaluation of the ablation study demonstrates the effectiveness of the major components in our technique.
KW - Commit Classification
KW - Fine-grained Code Change Representation
KW - Maintenance Activities
UR - http://www.scopus.com/inward/record.url?scp=85199776987&partnerID=8YFLogxK
U2 - 10.1109/SANER60148.2024.00082
DO - 10.1109/SANER60148.2024.00082
M3 - Conference contribution
AN - SCOPUS:85199776987
T3 - Proceedings - 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024
SP - 752
EP - 763
BT - Proceedings - 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 12 March 2024 through 15 March 2024
ER -