TY - GEN
T1 - Nomen est omen
T2 - 2016 IEEE/ACM 38th IEEE International Conference on Software Engineering, ICSE 2016
AU - Liu, Hui
AU - Liu, Qiurong
AU - Staicu, Cristian Alexandru
AU - Pradel, Michael
AU - Luo, Yue
N1 - Publisher Copyright: © 2016 ACM.
PY - 2016/5/14
Y1 - 2016/5/14
AB - Programmer-provided identifier names convey information about the semantics of a program. This information can complement traditional program analyses in various software engineering tasks, such as bug finding, code completion, and documentation. Even though identifier names appear to be a rich source of information, little is known about their properties and their potential usefulness. This paper presents an empirical study of the lexical similarity between arguments and parameters of methods, which is one prominent situation where names can provide otherwise missing information. The study involves 60 real-world Java programs. We find that, for most arguments, the similarity is either very high or very low, and that short and generic names often cause low similarities. Furthermore, we show that inferring a set of low-similarity parameter names from one set of programs allows for pruning such names in another set of programs. Finally, the study shows that many arguments are more similar to the corresponding parameter than any alternative argument available in the call site's scope. As applications of our findings, we present an anomaly detection technique that identifies 144 renaming opportunities and incorrect arguments in 14 programs, and a code recommendation system that suggests correct arguments with a precision of 83%.
KW - Empirical study
KW - Identifier names
KW - Method arguments
KW - Name-based program analysis
KW - Static analysis
UR - https://www.scopus.com/pages/publications/84971482960
U2 - 10.1145/2884781.2884841
DO - 10.1145/2884781.2884841
M3 - Conference contribution
AN - SCOPUS:84971482960
T3 - Proceedings - International Conference on Software Engineering
SP - 1063
EP - 1073
BT - Proceedings - 2016 IEEE/ACM 38th IEEE International Conference on Software Engineering Companion, ICSE 2016
PB - IEEE Computer Society
Y2 - 14 May 2016 through 22 May 2016
ER -