TY - GEN
T1 - Which abbreviations should be expanded?
AU - Jiang, Yanjie
AU - Liu, Hui
AU - Zhang, Yuxia
AU - Niu, Nan
AU - Zhao, Yuhai
AU - Zhang, Lu
N1 - Publisher Copyright:
© 2021 Owner/Author.
PY - 2021/8/20
Y1 - 2021/8/20
N2 - Abbreviations are common in source code. Properly designed abbreviations may significantly facilitate typing, typesetting, and reading of lengthy source code. However, abbreviations, if used improperly, may also significantly reduce the readability and maintainability of source code. Although a few automated approaches have been proposed to suggest full terms for given abbreviations, to the best of our knowledge, there is no automated approaches to suggest whether abbreviations are used properly, i.e., whether they should be replaced with corresponding full terms. Notably, it is often challenging for inexperienced developers and maintainers to make such decisions. To this end, in this paper, we propose an automated approach to assisting developers and maintainers in making the decisions. The rationale of the approach is that abbreviations should not be expanded if the expansion would result in unacceptably lengthy identifiers or if developers/maintainers can easily figure out the meaning (full terms) of the abbreviations based on their domain knowledge or contexts of the abbreviations. From a corpus of programs, we leverage data mining techniques to discover common abbreviations that are frequently employed by various developers in similar contexts. The key of the data mining is to turn the problem of mining common abbreviations into the maximal clique problem that has been extensively studied. We suggest to not expand given abbreviation if it matches at least one of the discovered common abbreviations. From the same corpus, we also calculate the probability distribution for the length of different types of identifier, e.g., variable names and method names. The probability distribution specifies how likely an identifier of type T is composed of exactly n characters. Our heuristic is to not expand the abbreviation if the probability of its enclosing identifier would be reduced by the expansion. Finally, we also suggest to not expand the abbreviation if its full terms are contained in surrounding contexts of the abbreviation, i.e., tokens on the same source code line. Other abbreviations that do not receive suggestions from the proposed approach are expected to be replaced with their full terms. Our evaluation results on 1,818 abbreviations from five open-source applications suggest that the proposed approach is accurate with a high accuracy of 95%.
AB - Abbreviations are common in source code. Properly designed abbreviations may significantly facilitate typing, typesetting, and reading of lengthy source code. However, abbreviations, if used improperly, may also significantly reduce the readability and maintainability of source code. Although a few automated approaches have been proposed to suggest full terms for given abbreviations, to the best of our knowledge, there is no automated approaches to suggest whether abbreviations are used properly, i.e., whether they should be replaced with corresponding full terms. Notably, it is often challenging for inexperienced developers and maintainers to make such decisions. To this end, in this paper, we propose an automated approach to assisting developers and maintainers in making the decisions. The rationale of the approach is that abbreviations should not be expanded if the expansion would result in unacceptably lengthy identifiers or if developers/maintainers can easily figure out the meaning (full terms) of the abbreviations based on their domain knowledge or contexts of the abbreviations. From a corpus of programs, we leverage data mining techniques to discover common abbreviations that are frequently employed by various developers in similar contexts. The key of the data mining is to turn the problem of mining common abbreviations into the maximal clique problem that has been extensively studied. We suggest to not expand given abbreviation if it matches at least one of the discovered common abbreviations. From the same corpus, we also calculate the probability distribution for the length of different types of identifier, e.g., variable names and method names. The probability distribution specifies how likely an identifier of type T is composed of exactly n characters. Our heuristic is to not expand the abbreviation if the probability of its enclosing identifier would be reduced by the expansion. Finally, we also suggest to not expand the abbreviation if its full terms are contained in surrounding contexts of the abbreviation, i.e., tokens on the same source code line. Other abbreviations that do not receive suggestions from the proposed approach are expected to be replaced with their full terms. Our evaluation results on 1,818 abbreviations from five open-source applications suggest that the proposed approach is accurate with a high accuracy of 95%.
KW - Abbreviation
KW - Cliques
KW - Data Mining
KW - Expansion
KW - Software Quality
UR - http://www.scopus.com/inward/record.url?scp=85116287997&partnerID=8YFLogxK
U2 - 10.1145/3468264.3468616
DO - 10.1145/3468264.3468616
M3 - Conference contribution
AN - SCOPUS:85116287997
T3 - ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
SP - 578
EP - 589
BT - ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
A2 - Spinellis, Diomidis
PB - Association for Computing Machinery, Inc
T2 - 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021
Y2 - 23 August 2021 through 28 August 2021
ER -