TY - GEN
T1 - What Makes a Good Commit Message?
AU - Tian, Yingchen
AU - Zhang, Yuxia
AU - Stol, Klaas-Jan
AU - Jiang, Lin
AU - Liu, Hui
N1 - Publisher Copyright: © 2022 ACM.
PY - 2022
Y1 - 2022
N2 - A key issue in collaborative software development is communication among developers. One modality of communication is a commit message, in which developers describe the changes they make in a repository. As such, commit messages serve as an 'audit trail' by which developers can understand how the source code of a project has changed, and why. Hence, the quality of commit messages affects the effectiveness of communication among developers. Commit messages are often of poor quality as developers lack time and motivation to craft a good message. Several automatic approaches have been proposed to generate commit messages. However, these are based on uncurated datasets including considerable proportions of poorly phrased commit messages. In this multi-method study, we first define what constitutes a 'good' commit message, and then establish what proportion of commit messages lack information using a sample of almost 1,600 messages from five highly active open source projects. We find that an average of circa 44% of messages could be improved, suggesting the use of uncurated datasets may be a major threat when commit message generators are trained with such data. We also observe that prior work has not considered semantics of commit messages, and there is surprisingly little guidance available for writing good commit messages. To that end, we develop a taxonomy based on recurring patterns in commit messages' expressions. Finally, we investigate whether 'good' commit messages can be automatically identified; such automation could prompt developers to write better commit messages.
KW - Commit-based software development
KW - commit message quality
KW - open collaboration
UR - http://www.scopus.com/inward/record.url?scp=85133508423&partnerID=8YFLogxK
U2 - 10.1145/3510003.3510205
DO - 10.1145/3510003.3510205
M3 - Conference contribution
AN - SCOPUS:85133508423
T3 - Proceedings - International Conference on Software Engineering
SP - 2389
EP - 2401
BT - Proceedings - 2022 ACM/IEEE 44th International Conference on Software Engineering, ICSE 2022
PB - IEEE Computer Society
T2 - 44th ACM/IEEE International Conference on Software Engineering, ICSE 2022
Y2 - 22 May 2022 through 27 May 2022
ER -