Automated Expansion of Abbreviations Based on Semantic Relation and Transfer Expansion

Yanjie Jiang, Hui Liu*, Jiahao Jin, Lu Zhang

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

9 citations (Scopus)

Abstract

Although the negative impact of abbreviations in source code is well recognized, abbreviations remain common for various reasons. Consequently, a number of approaches have been proposed to expand abbreviations in identifiers. However, such approaches are either inaccurate or confined to specific identifiers. To address this, we propose a generic and accurate approach to expand identifier abbreviations by leveraging both semantic relation and transfer expansion. One key insight of the approach is that abbreviations in the name of a software entity e have a great chance of finding their full terms in the names of software entities that are semantically related to e. Consequently, the proposed approach builds a knowledge graph to represent such entities and their relationships with e, and searches the graph for full terms. Another key insight is that literally identical abbreviations within the same application are likely (but not necessarily) to have identical expansions, and thus a semantics-based expansion in one place may be transferred to other places. To investigate when abbreviation expansion can be transferred safely, we conduct a case study on three open-source applications. The results suggest that a large share (75 percent) of expansions can be transferred among lexically identical abbreviations within the same application. However, the risk of transfer varies with several factors, e.g., the length of the abbreviations, the physical distance between them, and the semantic relations between them. Based on these findings, we design nine heuristics for transfer expansion and propose a learning-based approach to prioritize both the transfer heuristics and the semantics-based expansion heuristics. Evaluation results on nine open-source applications suggest that the proposed approach significantly improves on the state of the art, raising recall from 29 to 89 percent and precision from 39 to 92 percent.
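The two insights in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the paper uses a knowledge graph and a learned ranking over heuristics, whereas this sketch assumes a flat list of related entity names, a simple first-letter-plus-subsequence match, and a per-application dictionary for transfer. All function names (`split_terms`, `expand_from_related`, `expand_with_transfer`) are hypothetical.

```python
import re


def split_terms(identifier):
    """Split a camelCase or snake_case identifier into lowercase terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [p.lower() for p in parts]


def is_subsequence(abbr, term):
    """True if the characters of abbr appear in term in the same order."""
    it = iter(term)
    return all(ch in it for ch in abbr)


def expand_from_related(abbr, related_names):
    """Look for a full term in the names of semantically related entities.

    Assumed matching rule (illustrative only): same first letter, and the
    abbreviation is a subsequence of a strictly longer term.
    """
    abbr = abbr.lower()
    candidates = []
    for name in related_names:
        for term in split_terms(name):
            if (len(term) > len(abbr) and term[0] == abbr[0]
                    and is_subsequence(abbr, term)):
                # Prefix matches rank ahead of general subsequence matches;
                # shorter terms rank ahead of longer ones.
                rank = 0 if term.startswith(abbr) else 1
                candidates.append((rank, len(term), term))
    return min(candidates)[2] if candidates else None


def expand_with_transfer(abbr, related_names, known):
    """Try related names first; otherwise reuse an expansion already found
    for the same abbreviation elsewhere in the application (transfer)."""
    term = expand_from_related(abbr, related_names)
    if term is None:
        term = known.get(abbr.lower())
    else:
        known.setdefault(abbr.lower(), term)
    return term


known_expansions = {}
first = expand_with_transfer("msg", ["sendMessage", "messageQueue"], known_expansions)
second = expand_with_transfer("msg", ["retryCount"], known_expansions)
```

In this sketch the second call has no matching term among its related names, so it falls back to the expansion recorded by the first call. An unconditional fallback like this ignores the transfer risks the abstract mentions (abbreviation length, distance, semantic relations), which is what the paper's heuristics are designed to handle.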

Original language: English
Pages (from-to): 519-537
Number of pages: 19
Journal: IEEE Transactions on Software Engineering
Volume: 48
Issue: 2
Publication status: Published - 1 Feb 2022
