COLARE: Commit Classification via Fine-grained Context-aware Representation of Code Changes

Qunhong Zeng, Yuxia Zhang*, Zeyu Sun, Yujie Guo, Hui Liu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Commit classification for maintenance activities is of critical importance for both industry and academia. State-of-the-art approaches either treat code changes as plain text or rely on manually identified features. Directly applying the most advanced model of code change representation to commit classification faces two limitations: (1) coarse-grained diff comparison neglects the distance between modified code lines; (2) key context information about hunk modifications and file categories is missing. This study proposes a novel classification model, COLARE, which compares code changes at the hunk level, extracts fine-grained features based on the categories of changed files, and aggregates them with the representation of commit messages. The evaluation results show that our model outperforms state-of-the-art techniques by 7.24% and 7.35% in accuracy and macro F1 score, respectively. We also manually labeled a multi-language dataset and evaluated our approach on it. The results further confirm that our approach achieves the best performance over three baselines, including ChatGPT (3.5). The ablation study demonstrates the effectiveness of the major components of our technique.

Original language: English
Title of host publication: Proceedings - 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 752-763
Number of pages: 12
ISBN (Electronic): 9798350330663
Publication status: Published - 2024
Event: 31st IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024 - Rovaniemi, Finland
Duration: 12 Mar 2024 to 15 Mar 2024

Publication series

Name: Proceedings - 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024

Conference

Conference: 31st IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024
Country/Territory: Finland
City: Rovaniemi
Period: 12/03/24 to 15/03/24

Keywords

  • Commit Classification
  • Fine-grained Code Change Representation
  • Maintenance Activities
