COLARE: Commit Classification via Fine-grained Context-aware Representation of Code Changes

Qunhong Zeng; Yuxia Zhang; Zeyu Sun; Yujie Guo; Hui Liu

doi:10.1109/SANER60148.2024.00082

Corresponding author: Yuxia Zhang

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Commit classification for maintenance activities is of critical importance for both industry and academia. State-of-the-art approaches either treat code changes as plain text or rely on manually identified features. Directly applying the most advanced model of code change representation to commit classification faces two limitations: (1) coarse-grained diff comparison neglects the distance of modified code lines; (2) key context information about hunk modifications and file categories is missing. This study proposes a novel classification model, COLARE, which compares code changes at the hunk level, takes fine-grained features based on categories of changed files, and aggregates them with the representation of commit messages. The evaluation results show that our model outperforms state-of-the-art techniques by 7.24% and 7.35% in accuracy and macro F1 score, respectively. We also manually labeled a multi-language dataset and evaluated our approach on it. The results further confirm that our approach achieves the best performance over three baselines, including ChatGPT (3.5). The evaluation of the ablation study demonstrates the effectiveness of the major components in our technique.

Original language	English
Title of host publication	Proceedings - 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	752-763
Number of pages	12
ISBN (Electronic)	9798350330663
DOI	https://doi.org/10.1109/SANER60148.2024.00082
Publication status	Published - 2024
Event	31st IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024 - Rovaniemi, Finland. Duration: 12 Mar 2024 → 15 Mar 2024

Publication series

Name	Proceedings - 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024

Conference

Conference	31st IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024
Country/Territory	Finland
City	Rovaniemi
Period	12/03/24 → 15/03/24

Cite this

Zeng, Q., Zhang, Y., Sun, Z., Guo, Y., & Liu, H. (2024). COLARE: Commit Classification via Fine-grained Context-aware Representation of Code Changes. In Proceedings - 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024 (pp. 752-763). (Proceedings - 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/SANER60148.2024.00082

Zeng, Qunhong; Zhang, Yuxia; Sun, Zeyu et al. / COLARE: Commit Classification via Fine-grained Context-aware Representation of Code Changes. Proceedings - 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024. Institute of Electrical and Electronics Engineers Inc., 2024. pp. 752-763 (Proceedings - 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024).

@inproceedings{abf7ca24dff143abbfdcc3be8d689911,

title = "COLARE: Commit Classification via Fine-grained Context-aware Representation of Code Changes",

abstract = "Commit classification for maintenance activities is of critical importance for both industry and academia. State-of-the-art approaches either treat code changes as plain text or rely on manually identified features. Directly applying the most advanced model of code change representation to commit classification faces two limitations: (1) coarse-grained diff comparison neglects the distance of modified code lines; (2) key context information about hunk modifications and file categories is missing. This study proposes a novel classification model, COLARE, which compares code changes at the hunk level, takes fine-grained features based on categories of changed files, and aggregates them with the representation of commit messages. The evaluation results show that our model outperforms state-of-the-art techniques by 7.24\% and 7.35\% in accuracy and macro F1 score, respectively. We also manually labeled a multi-language dataset and evaluated our approach on it. The results further confirm that our approach achieves the best performance over three baselines, including ChatGPT (3.5). The evaluation of the ablation study demonstrates the effectiveness of the major components in our technique.",

keywords = "Commit Classification, Fine-grained Code Change Representation, Maintenance Activities",

author = "Qunhong Zeng and Yuxia Zhang and Zeyu Sun and Yujie Guo and Hui Liu",

note = "Publisher Copyright: {\textcopyright} 2024 IEEE.; 31st IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024 ; Conference date: 12-03-2024 Through 15-03-2024",

year = "2024",

doi = "10.1109/SANER60148.2024.00082",

language = "English",

series = "Proceedings - 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "752--763",

booktitle = "Proceedings - 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024",

address = "United States",

}

Zeng, Q, Zhang, Y, Sun, Z, Guo, Y & Liu, H 2024, COLARE: Commit Classification via Fine-grained Context-aware Representation of Code Changes. in Proceedings - 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024. Proceedings - 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024, Institute of Electrical and Electronics Engineers Inc., pp. 752-763, 31st IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024, Rovaniemi, Finland, 12/03/24. https://doi.org/10.1109/SANER60148.2024.00082

COLARE: Commit Classification via Fine-grained Context-aware Representation of Code Changes. / Zeng, Qunhong; Zhang, Yuxia; Sun, Zeyu et al.
Proceedings - 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024. Institute of Electrical and Electronics Engineers Inc., 2024. pp. 752-763 (Proceedings - 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024).

TY - GEN

T1 - COLARE: Commit Classification via Fine-grained Context-aware Representation of Code Changes

T2 - 31st IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024

AU - Zeng, Qunhong

AU - Zhang, Yuxia

AU - Sun, Zeyu

AU - Guo, Yujie

AU - Liu, Hui

PY - 2024

Y1 - 2024

N2 - Commit classification for maintenance activities is of critical importance for both industry and academia. State-of-the-art approaches either treat code changes as plain text or rely on manually identified features. Directly applying the most advanced model of code change representation to commit classification faces two limitations: (1) coarse-grained diff comparison neglects the distance of modified code lines; (2) key context information about hunk modifications and file categories is missing. This study proposes a novel classification model, COLARE, which compares code changes at the hunk level, takes fine-grained features based on categories of changed files, and aggregates them with the representation of commit messages. The evaluation results show that our model outperforms state-of-the-art techniques by 7.24% and 7.35% in accuracy and macro F1 score, respectively. We also manually labeled a multi-language dataset and evaluated our approach on it. The results further confirm that our approach achieves the best performance over three baselines, including ChatGPT (3.5). The evaluation of the ablation study demonstrates the effectiveness of the major components in our technique.

AB - Commit classification for maintenance activities is of critical importance for both industry and academia. State-of-the-art approaches either treat code changes as plain text or rely on manually identified features. Directly applying the most advanced model of code change representation to commit classification faces two limitations: (1) coarse-grained diff comparison neglects the distance of modified code lines; (2) key context information about hunk modifications and file categories is missing. This study proposes a novel classification model, COLARE, which compares code changes at the hunk level, takes fine-grained features based on categories of changed files, and aggregates them with the representation of commit messages. The evaluation results show that our model outperforms state-of-the-art techniques by 7.24% and 7.35% in accuracy and macro F1 score, respectively. We also manually labeled a multi-language dataset and evaluated our approach on it. The results further confirm that our approach achieves the best performance over three baselines, including ChatGPT (3.5). The evaluation of the ablation study demonstrates the effectiveness of the major components in our technique.

KW - Commit Classification

KW - Fine-grained Code Change Representation

KW - Maintenance Activities

UR - http://www.scopus.com/inward/record.url?scp=85199776987&partnerID=8YFLogxK

U2 - 10.1109/SANER60148.2024.00082

DO - 10.1109/SANER60148.2024.00082

M3 - Conference contribution

AN - SCOPUS:85199776987

T3 - Proceedings - 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024

SP - 752

EP - 763

BT - Proceedings - 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 12 March 2024 through 15 March 2024

ER -

Zeng Q, Zhang Y, Sun Z, Guo Y, Liu H. COLARE: Commit Classification via Fine-grained Context-aware Representation of Code Changes. In Proceedings - 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024. Institute of Electrical and Electronics Engineers Inc. 2024. p. 752-763. (Proceedings - 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024). doi: 10.1109/SANER60148.2024.00082
