Which abbreviations should be expanded?

Yanjie Jiang; Hui Liu; Yuxia Zhang; Nan Niu; Yuhai Zhao; Lu Zhang

doi:10.1145/3468264.3468616

Yanjie Jiang, Hui Liu^*, Yuxia Zhang, Nan Niu, Yuhai Zhao, Lu Zhang

^*Corresponding author for this work

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

10 Citations (Scopus)

Abstract

Abbreviations are common in source code. Properly designed abbreviations may significantly facilitate typing, typesetting, and reading of lengthy source code. However, abbreviations, if used improperly, may also significantly reduce the readability and maintainability of source code. Although a few automated approaches have been proposed to suggest full terms for given abbreviations, to the best of our knowledge, there are no automated approaches to suggest whether abbreviations are used properly, i.e., whether they should be replaced with corresponding full terms. Notably, it is often challenging for inexperienced developers and maintainers to make such decisions. To this end, in this paper, we propose an automated approach to assist developers and maintainers in making these decisions. The rationale of the approach is that abbreviations should not be expanded if the expansion would result in unacceptably lengthy identifiers or if developers/maintainers can easily figure out the meaning (full terms) of the abbreviations based on their domain knowledge or the contexts of the abbreviations. From a corpus of programs, we leverage data mining techniques to discover common abbreviations that are frequently employed by various developers in similar contexts. The key to the data mining is to turn the problem of mining common abbreviations into the maximal clique problem, which has been extensively studied. We suggest not expanding a given abbreviation if it matches at least one of the discovered common abbreviations. From the same corpus, we also calculate the probability distribution for the length of different types of identifiers, e.g., variable names and method names. The probability distribution specifies how likely an identifier of type T is to be composed of exactly n characters. Our heuristic is to not expand the abbreviation if the probability of its enclosing identifier would be reduced by the expansion. Finally, we also suggest not expanding the abbreviation if its full terms are contained in the surrounding context of the abbreviation, i.e., tokens on the same source code line. Other abbreviations that do not receive suggestions from the proposed approach are expected to be replaced with their full terms. Our evaluation results on 1,818 abbreviations from five open-source applications suggest that the proposed approach achieves a high accuracy of 95%.
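
The three "do not expand" heuristics described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the exact-match token comparison, and the toy corpus are assumptions made here for illustration, and the set of common abbreviations is simply passed in rather than mined via maximal cliques.

```python
from collections import Counter


def length_distribution(identifiers):
    """Estimate P(length == n) from identifier names of a single type."""
    counts = Counter(len(name) for name in identifiers)
    total = sum(counts.values())
    return {n: c / total for n, c in counts.items()}


def should_expand(abbr, full_term, identifier, line_tokens,
                  common_abbrs, length_probs):
    """Return True if none of the three heuristics advises against expansion.

    All parameters are hypothetical: `common_abbrs` stands in for the mined
    common abbreviations, `length_probs` for the length distribution of the
    identifier's type, and `line_tokens` for tokens on the same source line.
    """
    # Heuristic 1: keep abbreviations that match a common abbreviation.
    if abbr.lower() in common_abbrs:
        return False
    # Heuristic 2: keep the abbreviation if expanding it would make the
    # enclosing identifier's length less probable.
    expanded = identifier.replace(abbr, full_term, 1)
    before = length_probs.get(len(identifier), 0.0)
    after = length_probs.get(len(expanded), 0.0)
    if after < before:
        return False
    # Heuristic 3: keep the abbreviation if its full term already appears
    # among the tokens on the same line.
    if full_term.lower() in {token.lower() for token in line_tokens}:
        return False
    return True


# Toy corpus of variable names (made up for this example).
probs = length_distribution(
    ["i", "idx", "count", "total", "buffer", "maxCount", "userName"])
print(should_expand("cnt", "count", "cnt", ["int", "cnt", "=", "0"],
                    set(), probs))
```

In this sketch an abbreviation is flagged for expansion only when no heuristic fires, mirroring the abstract's statement that abbreviations receiving no suggestion are expected to be replaced with their full terms.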

Original language	English
Title of host publication	ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
Editor	Diomidis Spinellis
Publisher	Association for Computing Machinery, Inc
Pages	578-589
Number of pages	12
ISBN (Electronic)	9781450385626
DOI	https://doi.org/10.1145/3468264.3468616
Publication status	Published - 20 Aug 2021
Event	29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021 - Virtual, Online, Greece. Duration: 23 Aug 2021 → 28 Aug 2021

Publication series

Name	ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Conference

Conference	29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021
Country/Territory	Greece
City	Virtual, Online
Period	23/08/21 → 28/08/21


Cite this

Jiang, Y., Liu, H., Zhang, Y., Niu, N., Zhao, Y., & Zhang, L. (2021). Which abbreviations should be expanded? In D. Spinellis (Ed.), ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering (pp. 578-589). (ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering). Association for Computing Machinery, Inc. https://doi.org/10.1145/3468264.3468616

Jiang, Yanjie; Liu, Hui; Zhang, Yuxia et al. / Which abbreviations should be expanded? ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ed. / Diomidis Spinellis. Association for Computing Machinery, Inc, 2021. pp. 578-589 (ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering).

@inproceedings{cbfd728f641e40c5a03b1f43fc29960e,

title = "Which abbreviations should be expanded?",

abstract = "Abbreviations are common in source code. Properly designed abbreviations may significantly facilitate typing, typesetting, and reading of lengthy source code. However, abbreviations, if used improperly, may also significantly reduce the readability and maintainability of source code. Although a few automated approaches have been proposed to suggest full terms for given abbreviations, to the best of our knowledge, there are no automated approaches to suggest whether abbreviations are used properly, i.e., whether they should be replaced with corresponding full terms. Notably, it is often challenging for inexperienced developers and maintainers to make such decisions. To this end, in this paper, we propose an automated approach to assist developers and maintainers in making these decisions. The rationale of the approach is that abbreviations should not be expanded if the expansion would result in unacceptably lengthy identifiers or if developers/maintainers can easily figure out the meaning (full terms) of the abbreviations based on their domain knowledge or the contexts of the abbreviations. From a corpus of programs, we leverage data mining techniques to discover common abbreviations that are frequently employed by various developers in similar contexts. The key to the data mining is to turn the problem of mining common abbreviations into the maximal clique problem, which has been extensively studied. We suggest not expanding a given abbreviation if it matches at least one of the discovered common abbreviations. From the same corpus, we also calculate the probability distribution for the length of different types of identifiers, e.g., variable names and method names. The probability distribution specifies how likely an identifier of type T is to be composed of exactly n characters. Our heuristic is to not expand the abbreviation if the probability of its enclosing identifier would be reduced by the expansion. Finally, we also suggest not expanding the abbreviation if its full terms are contained in the surrounding context of the abbreviation, i.e., tokens on the same source code line. Other abbreviations that do not receive suggestions from the proposed approach are expected to be replaced with their full terms. Our evaluation results on 1,818 abbreviations from five open-source applications suggest that the proposed approach achieves a high accuracy of 95%.",

keywords = "Abbreviation, Cliques, Data Mining, Expansion, Software Quality",

author = "Yanjie Jiang and Hui Liu and Yuxia Zhang and Nan Niu and Yuhai Zhao and Lu Zhang",

note = "Publisher Copyright: {\textcopyright} 2021 Owner/Author.; 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021 ; Conference date: 23-08-2021 Through 28-08-2021",

year = "2021",

month = aug,

day = "20",

doi = "10.1145/3468264.3468616",

language = "English",

series = "ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering",

publisher = "Association for Computing Machinery, Inc",

pages = "578--589",

editor = "Diomidis Spinellis",

booktitle = "ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering",

}

Jiang, Y, Liu, H, Zhang, Y, Niu, N, Zhao, Y & Zhang, L 2021, Which abbreviations should be expanded? in D Spinellis (ed.), ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Association for Computing Machinery, Inc, pp. 578-589, 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021, Virtual, Online, Greece, 23/08/21. https://doi.org/10.1145/3468264.3468616

Which abbreviations should be expanded? / Jiang, Yanjie; Liu, Hui; Zhang, Yuxia et al.
ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ed. / Diomidis Spinellis. Association for Computing Machinery, Inc, 2021. pp. 578-589 (ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering).