Which abbreviations should be expanded?

Yanjie Jiang; Hui Liu; Yuxia Zhang; Nan Niu; Yuhai Zhao; Lu Zhang

doi:10.1145/3468264.3468616

Yanjie Jiang, Hui Liu*, Yuxia Zhang, Nan Niu, Yuhai Zhao, Lu Zhang

*Corresponding author for this work

School of Computer Science and Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

10 Citations (Scopus)

Abstract

Abbreviations are common in source code. Properly designed abbreviations may significantly facilitate typing, typesetting, and reading of lengthy source code. However, abbreviations, if used improperly, may also significantly reduce the readability and maintainability of source code. Although a few automated approaches have been proposed to suggest full terms for given abbreviations, to the best of our knowledge, there are no automated approaches that suggest whether abbreviations are used properly, i.e., whether they should be replaced with their corresponding full terms. Notably, it is often challenging for inexperienced developers and maintainers to make such decisions. To this end, in this paper, we propose an automated approach to assist developers and maintainers in making these decisions. The rationale of the approach is that abbreviations should not be expanded if the expansion would result in unacceptably lengthy identifiers or if developers/maintainers can easily figure out the meaning (full terms) of the abbreviations based on their domain knowledge or the contexts of the abbreviations. From a corpus of programs, we leverage data mining techniques to discover common abbreviations that are frequently employed by various developers in similar contexts. The key to the data mining is to turn the problem of mining common abbreviations into the extensively studied maximal clique problem. We suggest not expanding a given abbreviation if it matches at least one of the discovered common abbreviations. From the same corpus, we also calculate the probability distribution of the length of different types of identifiers, e.g., variable names and method names. The probability distribution specifies how likely it is that an identifier of type T is composed of exactly n characters. Our heuristic is to not expand the abbreviation if the probability of its enclosing identifier would be reduced by the expansion. Finally, we also suggest not expanding the abbreviation if its full terms are contained in the surrounding context of the abbreviation, i.e., tokens on the same source code line. Other abbreviations that do not receive suggestions from the proposed approach are expected to be replaced with their full terms. Our evaluation on 1,818 abbreviations from five open-source applications suggests that the proposed approach achieves a high accuracy of 95%.
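
The three heuristics in the abstract can be read as a simple decision procedure. The Python sketch below is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the set of common abbreviations is assumed to be mined already (the maximal-clique mining step is not shown), each abbreviation is assumed to have a single full term, and expansion is modeled as a plain string replacement.

```python
from collections import Counter


def length_distribution(identifiers):
    """Empirical probability that an identifier has exactly n characters.

    `identifiers` is assumed to hold names of a single type
    (e.g. only variable names or only method names).
    """
    counts = Counter(len(name) for name in identifiers)
    total = sum(counts.values())
    return {n: c / total for n, c in counts.items()}


def should_keep_abbreviation(identifier, abbreviation, full_term,
                             line_tokens, common_abbreviations, length_probs):
    """Return True if the abbreviation should NOT be expanded.

    All parameters are hypothetical names chosen for this sketch.
    """
    # Heuristic 1: the abbreviation matches a common abbreviation
    # (assumed to be mined from the corpus beforehand).
    if abbreviation.lower() in common_abbreviations:
        return True

    # Heuristic 2: expansion would make the enclosing identifier's
    # length less probable than its current length.
    expanded = identifier.replace(abbreviation, full_term, 1)
    before = length_probs.get(len(identifier), 0.0)
    after = length_probs.get(len(expanded), 0.0)
    if after < before:
        return True

    # Heuristic 3: the full term already appears among the tokens
    # on the same source code line.
    if full_term.lower() in {token.lower() for token in line_tokens}:
        return True

    # No heuristic applies, so the abbreviation is a candidate for expansion.
    return False
```

For example, with `length_probs = {6: 0.3, 8: 0.1}`, expanding `Val` to `Value` in `maxVal` moves the identifier from a length with probability 0.3 to one with probability 0.1, so the sketch keeps the abbreviation.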

Original language	English
Title of host publication	ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
Editors	Diomidis Spinellis
Publisher	Association for Computing Machinery, Inc
Pages	578-589
Number of pages	12
ISBN (Electronic)	9781450385626
DOIs	https://doi.org/10.1145/3468264.3468616
Publication status	Published - 20 Aug 2021
Event	29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021 - Virtual, Online, Greece
Duration	23 Aug 2021 → 28 Aug 2021

Publication series

Name	ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Conference

Conference	29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021
Country/Territory	Greece
City	Virtual, Online
Period	23/08/21 → 28/08/21

Keywords

Abbreviation
Cliques
Data Mining
Expansion
Software Quality

Cite this

Jiang, Y., Liu, H., Zhang, Y., Niu, N., Zhao, Y., & Zhang, L. (2021). Which abbreviations should be expanded? In D. Spinellis (Ed.), ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering (pp. 578-589). (ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering). Association for Computing Machinery, Inc. https://doi.org/10.1145/3468264.3468616

Jiang, Yanjie ; Liu, Hui ; Zhang, Yuxia et al. / Which abbreviations should be expanded? ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering. editor / Diomidis Spinellis. Association for Computing Machinery, Inc, 2021. pp. 578-589 (ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering).

@inproceedings{cbfd728f641e40c5a03b1f43fc29960e,

title = "Which abbreviations should be expanded?",

abstract = "Abbreviations are common in source code. Properly designed abbreviations may significantly facilitate typing, typesetting, and reading of lengthy source code. However, abbreviations, if used improperly, may also significantly reduce the readability and maintainability of source code. Although a few automated approaches have been proposed to suggest full terms for given abbreviations, to the best of our knowledge, there are no automated approaches that suggest whether abbreviations are used properly, i.e., whether they should be replaced with their corresponding full terms. Notably, it is often challenging for inexperienced developers and maintainers to make such decisions. To this end, in this paper, we propose an automated approach to assist developers and maintainers in making these decisions. The rationale of the approach is that abbreviations should not be expanded if the expansion would result in unacceptably lengthy identifiers or if developers/maintainers can easily figure out the meaning (full terms) of the abbreviations based on their domain knowledge or the contexts of the abbreviations. From a corpus of programs, we leverage data mining techniques to discover common abbreviations that are frequently employed by various developers in similar contexts. The key to the data mining is to turn the problem of mining common abbreviations into the extensively studied maximal clique problem. We suggest not expanding a given abbreviation if it matches at least one of the discovered common abbreviations. From the same corpus, we also calculate the probability distribution of the length of different types of identifiers, e.g., variable names and method names. The probability distribution specifies how likely it is that an identifier of type T is composed of exactly n characters. Our heuristic is to not expand the abbreviation if the probability of its enclosing identifier would be reduced by the expansion. Finally, we also suggest not expanding the abbreviation if its full terms are contained in the surrounding context of the abbreviation, i.e., tokens on the same source code line. Other abbreviations that do not receive suggestions from the proposed approach are expected to be replaced with their full terms. Our evaluation on 1,818 abbreviations from five open-source applications suggests that the proposed approach achieves a high accuracy of 95%.",

keywords = "Abbreviation, Cliques, Data Mining, Expansion, Software Quality",

author = "Yanjie Jiang and Hui Liu and Yuxia Zhang and Nan Niu and Yuhai Zhao and Lu Zhang",

note = "Publisher Copyright: {\textcopyright} 2021 Owner/Author.; 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021 ; Conference date: 23-08-2021 Through 28-08-2021",

year = "2021",

month = aug,

day = "20",

doi = "10.1145/3468264.3468616",

language = "English",

series = "ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering",

publisher = "Association for Computing Machinery, Inc",

pages = "578--589",

editor = "Diomidis Spinellis",

booktitle = "ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering",

}

Jiang, Y, Liu, H, Zhang, Y, Niu, N, Zhao, Y & Zhang, L 2021, Which abbreviations should be expanded? in D Spinellis (ed.), ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Association for Computing Machinery, Inc, pp. 578-589, 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021, Virtual, Online, Greece, 23/08/21. https://doi.org/10.1145/3468264.3468616

Which abbreviations should be expanded? / Jiang, Yanjie; Liu, Hui; Zhang, Yuxia et al.
ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ed. / Diomidis Spinellis. Association for Computing Machinery, Inc, 2021. p. 578-589 (ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering).
