Machine learning based recommendation of method names: How far are we

Lin Jiang; Hui Liu; He Jiang

doi:10.1109/ASE.2019.00062

Lin Jiang, Hui Liu^*, He Jiang

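^*Corresponding author for this work

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

50 Citations (Scopus)

Abstract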

High-quality method names are critical for the readability and maintainability of programs. However, constructing concise and consistent method names is often challenging, especially for inexperienced developers. To this end, advanced machine learning techniques have recently been leveraged to recommend method names automatically for given method bodies/implementations. Recent large-scale evaluations also suggest that such approaches are accurate. However, little is known about where and why such approaches work or don't work. To figure out the state of the art as well as the rationale for the success/failure, in this paper we conduct an empirical study on the state-of-the-art approach code2vec. We assess code2vec on a new dataset with more realistic settings. Our evaluation results suggest that although switching to a new dataset does not significantly influence the performance, more realistic settings do significantly reduce the performance of code2vec. Further analysis of the successfully recommended method names also reveals the following findings: 1) around half (48.3%) of the accepted recommendations are made on getter/setter methods; 2) a large portion (19.2%) of the successfully recommended method names could be copied from the given bodies. To further validate its usefulness, we ask developers to manually score the difficulty of naming methods they developed. Code2vec is then applied to these manually scored methods to evaluate how often it works when it is needed. Our evaluation results suggest that code2vec rarely works when it is really needed. Finally, to intuitively reveal the state of the art and to investigate the possibility of designing simple and straightforward alternative approaches, we propose a heuristics-based approach to recommending method names. Evaluation results on a large-scale dataset suggest that this simple heuristics-based approach significantly outperforms the state-of-the-art machine learning based approach, improving precision and recall by 65.25% and 22.45%, respectively. The comparison suggests that machine learning based recommendation of method names may still have a long way to go.
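As an illustration of the precision and recall figures mentioned in the abstract, the following is a minimal sketch that assumes the sub-token metric commonly used for method name recommendation (names are split into words and compared case-insensitively). The abstract does not spell out the metric, so the splitting rule and the function names `split_subtokens` and `subtoken_precision_recall` are assumptions for illustration, not the authors' implementation.

```python
import re
from collections import Counter


def split_subtokens(name):
    """Split a camelCase or snake_case method name into lowercase sub-tokens."""
    # Acronym runs (e.g. "HTTP"), capitalized or lowercase words, and digit runs.
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def subtoken_precision_recall(predicted, actual):
    """Return (precision, recall) of the predicted name against the actual name.

    Assumed metric: multiset overlap of sub-tokens, ignoring case and order.
    """
    pred = Counter(split_subtokens(predicted))
    gold = Counter(split_subtokens(actual))
    overlap = sum((pred & gold).values())  # sub-tokens present in both names
    precision = overlap / sum(pred.values()) if pred else 0.0
    recall = overlap / sum(gold.values()) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    print(subtoken_precision_recall("getName", "getFileName"))
```

Under this assumed metric, recommending `getName` for a method actually named `getFileName` scores a precision of 1.0 (both predicted sub-tokens are correct) and a recall of 2/3 (the sub-token `file` is missed).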

Original language	English
Title of host publication	Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	602-614
Number of pages	13
ISBN (Electronic)	9781728125084
DOI	https://doi.org/10.1109/ASE.2019.00062
Publication status	Published - Nov 2019
Event	34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019 - San Diego, United States; Duration: 10 Nov 2019 → 15 Nov 2019

Publication series

Name	Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019

Conference

Conference	34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019
Country/Territory	United States
City	San Diego
Period	10/11/19 → 15/11/19

Cite this

Jiang, L., Liu, H., & Jiang, H. (2019). Machine learning based recommendation of method names: How far are we. In Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019 (pp. 602-614). Article 8952208 (Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ASE.2019.00062

Jiang, Lin ; Liu, Hui ; Jiang, He. / Machine learning based recommendation of method names : How far are we. Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 602-614 (Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019).

@inproceedings{c32629a255f1463bbe55a894d97e9d1d,

title = "Machine learning based recommendation of method names: How far are we",

keywords = "Code Recommendation, Machine Learning",

author = "Lin Jiang and Hui Liu and He Jiang",

note = "Publisher Copyright: {\textcopyright} 2019 IEEE.; 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019 ; Conference date: 10-11-2019 Through 15-11-2019",

year = "2019",

month = nov,

doi = "10.1109/ASE.2019.00062",

language = "English",

series = "Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "602--614",

booktitle = "Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019",

address = "United States",

}

Jiang, L, Liu, H & Jiang, H 2019, Machine learning based recommendation of method names: How far are we. In Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019., 8952208, Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019, Institute of Electrical and Electronics Engineers Inc., pp. 602-614, 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019, San Diego, United States, 10/11/19. https://doi.org/10.1109/ASE.2019.00062

Machine learning based recommendation of method names: How far are we. / Jiang, Lin; Liu, Hui; Jiang, He.
Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 602-614 8952208 (Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019).

TY - GEN

T1 - Machine learning based recommendation of method names

T2 - 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019

AU - Jiang, Lin

AU - Liu, Hui

AU - Jiang, He

PY - 2019/11

Y1 - 2019/11

KW - Code Recommendation

KW - Machine Learning

UR - http://www.scopus.com/inward/record.url?scp=85078888420&partnerID=8YFLogxK

U2 - 10.1109/ASE.2019.00062

DO - 10.1109/ASE.2019.00062

M3 - Conference contribution

AN - SCOPUS:85078888420

T3 - Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019

SP - 602

EP - 614

BT - Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 10 November 2019 through 15 November 2019

ER -

Jiang L, Liu H, Jiang H. Machine learning based recommendation of method names: How far are we. In Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019. Institute of Electrical and Electronics Engineers Inc. 2019. pp. 602-614. 8952208. (Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019). doi: 10.1109/ASE.2019.00062
