Which abbreviations should be expanded?

Yanjie Jiang, Hui Liu*, Yuxia Zhang, Nan Niu, Yuhai Zhao, Lu Zhang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

10 Citations (Scopus)

Abstract

Abbreviations are common in source code. Properly designed abbreviations may significantly facilitate typing, typesetting, and reading of lengthy source code. However, abbreviations, if used improperly, may also significantly reduce the readability and maintainability of source code. Although a few automated approaches have been proposed to suggest full terms for given abbreviations, to the best of our knowledge, there is no automated approaches to suggest whether abbreviations are used properly, i.e., whether they should be replaced with corresponding full terms. Notably, it is often challenging for inexperienced developers and maintainers to make such decisions. To this end, in this paper, we propose an automated approach to assisting developers and maintainers in making the decisions. The rationale of the approach is that abbreviations should not be expanded if the expansion would result in unacceptably lengthy identifiers or if developers/maintainers can easily figure out the meaning (full terms) of the abbreviations based on their domain knowledge or contexts of the abbreviations. From a corpus of programs, we leverage data mining techniques to discover common abbreviations that are frequently employed by various developers in similar contexts. The key of the data mining is to turn the problem of mining common abbreviations into the maximal clique problem that has been extensively studied. We suggest to not expand given abbreviation if it matches at least one of the discovered common abbreviations. From the same corpus, we also calculate the probability distribution for the length of different types of identifier, e.g., variable names and method names. The probability distribution specifies how likely an identifier of type T is composed of exactly n characters. Our heuristic is to not expand the abbreviation if the probability of its enclosing identifier would be reduced by the expansion. Finally, we also suggest to not expand the abbreviation if its full terms are contained in surrounding contexts of the abbreviation, i.e., tokens on the same source code line. Other abbreviations that do not receive suggestions from the proposed approach are expected to be replaced with their full terms. Our evaluation results on 1,818 abbreviations from five open-source applications suggest that the proposed approach is accurate with a high accuracy of 95%.

Original languageEnglish
Title of host publicationESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
EditorsDiomidis Spinellis
PublisherAssociation for Computing Machinery, Inc
Pages578-589
Number of pages12
ISBN (Electronic)9781450385626
DOIs
Publication statusPublished - 20 Aug 2021
Event29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021 - Virtual, Online, Greece
Duration: 23 Aug 202128 Aug 2021

Publication series

NameESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Conference

Conference29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021
Country/TerritoryGreece
CityVirtual, Online
Period23/08/2128/08/21

Keywords

  • Abbreviation
  • Cliques
  • Data Mining
  • Expansion
  • Software Quality

Fingerprint

Dive into the research topics of 'Which abbreviations should be expanded?'. Together they form a unique fingerprint.

Cite this