Automated Expansion of Abbreviations Based on Semantic Relation and Transfer Expansion

Yanjie Jiang, Hui Liu*, Jiahao Jin, Lu Zhang

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

9 Citations (Scopus)

Abstract

Although the negative impact of abbreviations in source code is well-recognized, abbreviations are common for various reasons. To this end, a number of approaches have been proposed to expand abbreviations in identifiers. However, such approaches are either inaccurate or confined to specific identifiers. To this end, in this paper, we propose a generic and accurate approach to expand identifier abbreviations by leveraging both semantic relation and transfer expansion. One of the key insights of the approach is that abbreviations in the name of software entity e have a great chance to find their full terms in names of software entities that are semantically related to e. Consequently, the proposed approach builds a knowledge graph to represent such entities and their relationships with e and searches the graph for full terms. Another key insight is that literally identical abbreviations within the same application are likely (but not necessary) to have identical expansions, and thus the semantics-based expansion in one place may be transferred to other places. To investigate when abbreviation expansion could be transferred safely, we conduct a case study on three open-source applications. The results suggest that a significant part (75 percent) of expansions could be transferred among lexically identical abbreviations within the same application. However, the risk of transfer varies according to various factors, e.g., length of abbreviations, the physical distance between abbreviations, and semantic relations between abbreviations. Based on these findings, we design nine heuristics for transfer expansion and propose a learning-based approach to prioritize both transfer heuristics and semantic-based expansion heuristics. Evaluation results on nine open-source applications suggest that the proposed approach significantly improves the state of the art, improving recall from 29 to 89 percent and precision from 39 to 92 percent.

Original languageEnglish
Pages (from-to)519-537
Number of pages19
JournalIEEE Transactions on Software Engineering
Volume48
Issue number2
DOIs
Publication statusPublished - 1 Feb 2022

Keywords

  • Abbreviation
  • Context
  • Expansion
  • Quality
  • Transfer

Fingerprint

Dive into the research topics of 'Automated Expansion of Abbreviations Based on Semantic Relation and Transfer Expansion'. Together they form a unique fingerprint.

Cite this