TY - JOUR
T1 - Automated Expansion of Abbreviations Based on Semantic Relation and Transfer Expansion
AU - Jiang, Yanjie
AU - Liu, Hui
AU - Jin, Jiahao
AU - Zhang, Lu
N1 - Publisher Copyright:
© 1976-2012 IEEE.
PY - 2022/2/1
Y1 - 2022/2/1
N2 - Although the negative impact of abbreviations in source code is well-recognized, abbreviations are common for various reasons. To this end, a number of approaches have been proposed to expand abbreviations in identifiers. However, such approaches are either inaccurate or confined to specific identifiers. To this end, in this paper, we propose a generic and accurate approach to expand identifier abbreviations by leveraging both semantic relation and transfer expansion. One of the key insights of the approach is that abbreviations in the name of software entity e have a great chance to find their full terms in names of software entities that are semantically related to e. Consequently, the proposed approach builds a knowledge graph to represent such entities and their relationships with e and searches the graph for full terms. Another key insight is that literally identical abbreviations within the same application are likely (but not necessary) to have identical expansions, and thus the semantics-based expansion in one place may be transferred to other places. To investigate when abbreviation expansion could be transferred safely, we conduct a case study on three open-source applications. The results suggest that a significant part (75 percent) of expansions could be transferred among lexically identical abbreviations within the same application. However, the risk of transfer varies according to various factors, e.g., length of abbreviations, the physical distance between abbreviations, and semantic relations between abbreviations. Based on these findings, we design nine heuristics for transfer expansion and propose a learning-based approach to prioritize both transfer heuristics and semantic-based expansion heuristics. Evaluation results on nine open-source applications suggest that the proposed approach significantly improves the state of the art, improving recall from 29 to 89 percent and precision from 39 to 92 percent.
AB - Although the negative impact of abbreviations in source code is well-recognized, abbreviations are common for various reasons. To this end, a number of approaches have been proposed to expand abbreviations in identifiers. However, such approaches are either inaccurate or confined to specific identifiers. To this end, in this paper, we propose a generic and accurate approach to expand identifier abbreviations by leveraging both semantic relation and transfer expansion. One of the key insights of the approach is that abbreviations in the name of software entity e have a great chance to find their full terms in names of software entities that are semantically related to e. Consequently, the proposed approach builds a knowledge graph to represent such entities and their relationships with e and searches the graph for full terms. Another key insight is that literally identical abbreviations within the same application are likely (but not necessary) to have identical expansions, and thus the semantics-based expansion in one place may be transferred to other places. To investigate when abbreviation expansion could be transferred safely, we conduct a case study on three open-source applications. The results suggest that a significant part (75 percent) of expansions could be transferred among lexically identical abbreviations within the same application. However, the risk of transfer varies according to various factors, e.g., length of abbreviations, the physical distance between abbreviations, and semantic relations between abbreviations. Based on these findings, we design nine heuristics for transfer expansion and propose a learning-based approach to prioritize both transfer heuristics and semantic-based expansion heuristics. Evaluation results on nine open-source applications suggest that the proposed approach significantly improves the state of the art, improving recall from 29 to 89 percent and precision from 39 to 92 percent.
KW - Abbreviation
KW - Context
KW - Expansion
KW - Quality
KW - Transfer
UR - http://www.scopus.com/inward/record.url?scp=85085757408&partnerID=8YFLogxK
U2 - 10.1109/TSE.2020.2995736
DO - 10.1109/TSE.2020.2995736
M3 - Article
AN - SCOPUS:85085757408
SN - 0098-5589
VL - 48
SP - 519
EP - 537
JO - IEEE Transactions on Software Engineering
JF - IEEE Transactions on Software Engineering
IS - 2
ER -