TY - GEN
T1 - Semantic relation based expansion of abbreviations
AU - Jiang, Yanjie
AU - Liu, Hui
AU - Zhang, Lu
N1 - Publisher Copyright:
© 2019 ACM.
PY - 2019/8/12
Y1 - 2019/8/12
N2 - Identifiers account for 70% of source code in terms of characters, and thus the quality of such identifiers is critical for program comprehension and software maintenance. For various reasons, however, many identifiers contain abbreviations, which reduces the readability and maintainability of source code. To this end, a number of approaches have been proposed to expand abbreviations in identifiers. However, such approaches are either inaccurate or confined to specific identifiers. To this end, in this paper we propose a generic and accurate approach to expand identifier abbreviations. The key insight of the approach is that abbreviations in the name of software entity e have great chance to find their full terms in names of software entities that are semantically related to e. Consequently, the proposed approach builds a knowledge graph to represent such entities and their relationships with e, and searches the graph for full terms. The optimal searching strategy for the graph could be learned automatically from a corpus of manually expanded abbreviations. We evaluate the proposed approach on nine well known open-source projects. Results of our k-fold evaluation suggest that the proposed approach improves the state of the art. It improves precision significantly from 29% to 85%, and recall from 29% to 77%. Evaluation results also suggest that the proposed generic approach is even better than the state-of-the-art parameter-specific approach in expanding parameter abbreviations, improving F1 score significantly from 75% to 87%.
AB - Identifiers account for 70% of source code in terms of characters, and thus the quality of such identifiers is critical for program comprehension and software maintenance. For various reasons, however, many identifiers contain abbreviations, which reduces the readability and maintainability of source code. To this end, a number of approaches have been proposed to expand abbreviations in identifiers. However, such approaches are either inaccurate or confined to specific identifiers. To this end, in this paper we propose a generic and accurate approach to expand identifier abbreviations. The key insight of the approach is that abbreviations in the name of software entity e have great chance to find their full terms in names of software entities that are semantically related to e. Consequently, the proposed approach builds a knowledge graph to represent such entities and their relationships with e, and searches the graph for full terms. The optimal searching strategy for the graph could be learned automatically from a corpus of manually expanded abbreviations. We evaluate the proposed approach on nine well known open-source projects. Results of our k-fold evaluation suggest that the proposed approach improves the state of the art. It improves precision significantly from 29% to 85%, and recall from 29% to 77%. Evaluation results also suggest that the proposed generic approach is even better than the state-of-the-art parameter-specific approach in expanding parameter abbreviations, improving F1 score significantly from 75% to 87%.
KW - Abbreviation
KW - Expansion
KW - Knowledge Graph
KW - Software Quality
UR - http://www.scopus.com/inward/record.url?scp=85071906275&partnerID=8YFLogxK
U2 - 10.1145/3338906.3338929
DO - 10.1145/3338906.3338929
M3 - Conference contribution
AN - SCOPUS:85071906275
T3 - ESEC/FSE 2019 - Proceedings of the 2019 27th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
SP - 131
EP - 141
BT - ESEC/FSE 2019 - Proceedings of the 2019 27th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
A2 - Apel, Sven
A2 - Dumas, Marlon
A2 - Russo, Alessandra
A2 - Pfahl, Dietmar
PB - Association for Computing Machinery, Inc
T2 - 27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2019
Y2 - 26 August 2019 through 30 August 2019
ER -