Improving code readability classification using convolutional neural networks

Qing Mi; Jacky Keung; Yan Xiao; Solomon Mensah; Yujin Gao

doi:10.1016/j.infsof.2018.07.006

Qing Mi*, Jacky Keung, Yan Xiao, Solomon Mensah, Yujin Gao

*Corresponding author for this work

College of Computing

City University of Hong Kong

Research output: Contribution to journal › Article › Peer-review

33 Citations (Scopus)

Abstract

Context: Code readability classification (which refers to classification of a piece of source code as either readable or unreadable) has attracted increasing concern in academia and industry. To construct accurate classification models, previous studies depended mainly upon handcrafted features. However, the manual feature engineering process is usually labor-intensive and can capture only partial information about the source code, which is likely to limit the model performance.

Objective: To improve code readability classification, we propose the use of Convolutional Neural Networks (ConvNets).

Method: We first introduce a representation strategy (with different granularities) to transform source codes into integer matrices as the input to ConvNets. We then propose DeepCRM, a deep learning-based model for code readability classification. DeepCRM consists of three separate ConvNets with identical architectures that are trained on data preprocessed in different ways. We evaluate our approach against five state-of-the-art code readability models.

Results: The experimental results show that DeepCRM can outperform previous approaches. The improvement in accuracy ranges from 2.4% to 17.2%.

Conclusions: By eliminating the need for manual feature engineering, DeepCRM provides a relatively improved performance, confirming the efficacy of deep learning techniques in the task of code readability classification.
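The Method paragraph describes turning source code into integer matrices for ConvNet input and combining three separately trained ConvNets. The sketch below illustrates one plausible character-level encoding and one simple way to fuse several model outputs. It assumes one matrix row per source line, each character mapped to its Unicode code point, zero padding to a fixed rows × cols shape, and mean-probability fusion with a 0.5 threshold. The function names, matrix dimensions, padding value, and fusion rule are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def code_to_char_matrix(source: str, rows: int = 50, cols: int = 80,
                        pad: int = 0) -> List[List[int]]:
    """Hypothetical character-level encoding: one matrix row per source line,
    each character replaced by its Unicode code point, then truncated or
    padded with `pad` to a fixed rows x cols shape (dimensions are assumptions)."""
    matrix: List[List[int]] = []
    for line in source.splitlines()[:rows]:
        codes = [ord(ch) for ch in line[:cols]]  # truncate long lines to `cols`
        codes.extend([pad] * (cols - len(codes)))  # right-pad short lines
        matrix.append(codes)
    while len(matrix) < rows:  # fill missing lines with all-`pad` rows
        matrix.append([pad] * cols)
    return matrix


def ensemble_readable(probabilities: Sequence[float], threshold: float = 0.5) -> bool:
    """Hypothetical fusion rule: average the 'readable' probabilities produced by
    several independently trained classifiers and compare with a threshold.
    The paper's actual combination strategy may differ."""
    if not probabilities:
        raise ValueError("need at least one model output")
    return sum(probabilities) / len(probabilities) >= threshold
```

For example, `code_to_char_matrix("int x;\n  return x;", rows=3, cols=6)` yields a 3 × 6 matrix whose first row is `[105, 110, 116, 32, 120, 59]` and whose last row is all zeros. Note that the assumed padding value 0 coincides with the NUL code point; a real pipeline might reserve a distinct value.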

Original language	English
Pages (from-to)	60-71
Number of pages	12
Journal	Information and Software Technology
Volume	104
DOI	https://doi.org/10.1016/j.infsof.2018.07.006
Publication status	Published - Dec 2018


Cite this

Mi, Q., Keung, J., Xiao, Y., Mensah, S., & Gao, Y. (2018). Improving code readability classification using convolutional neural networks. Information and Software Technology, 104, 60-71. https://doi.org/10.1016/j.infsof.2018.07.006

@article{6b19bb5950de4633bd5d57450df7351d,

title = "Improving code readability classification using convolutional neural networks",

abstract = "Context: Code readability classification (which refers to classification of a piece of source code as either readable or unreadable) has attracted increasing concern in academia and industry. To construct accurate classification models, previous studies depended mainly upon handcrafted features. However, the manual feature engineering process is usually labor-intensive and can capture only partial information about the source code, which is likely to limit the model performance. Objective: To improve code readability classification, we propose the use of Convolutional Neural Networks (ConvNets). Method: We first introduce a representation strategy (with different granularities) to transform source codes into integer matrices as the input to ConvNets. We then propose DeepCRM, a deep learning-based model for code readability classification. DeepCRM consists of three separate ConvNets with identical architectures that are trained on data preprocessed in different ways. We evaluate our approach against five state-of-the-art code readability models. Results: The experimental results show that DeepCRM can outperform previous approaches. The improvement in accuracy ranges from 2.4% to 17.2%. Conclusions: By eliminating the need for manual feature engineering, DeepCRM provides a relatively improved performance, confirming the efficacy of deep learning techniques in the task of code readability classification.",

keywords = "Code readability, Convolutional Neural Network, Deep learning, Empirical software engineering, Open source software, Program comprehension",

author = "Qing Mi and Jacky Keung and Yan Xiao and Solomon Mensah and Yujin Gao",

note = "Publisher Copyright: {\textcopyright} 2018 Elsevier B.V.",

year = "2018",

month = dec,

doi = "10.1016/j.infsof.2018.07.006",

language = "English",

volume = "104",

pages = "60--71",

journal = "Information and Software Technology",

issn = "0950-5849",

publisher = "Elsevier B.V.",

}

TY  - JOUR
T1  - Improving code readability classification using convolutional neural networks
AU  - Mi, Qing
AU  - Keung, Jacky
AU  - Xiao, Yan
AU  - Mensah, Solomon
AU  - Gao, Yujin
PY  - 2018/12
Y1  - 2018/12
N2  - Context: Code readability classification (which refers to classification of a piece of source code as either readable or unreadable) has attracted increasing concern in academia and industry. To construct accurate classification models, previous studies depended mainly upon handcrafted features. However, the manual feature engineering process is usually labor-intensive and can capture only partial information about the source code, which is likely to limit the model performance. Objective: To improve code readability classification, we propose the use of Convolutional Neural Networks (ConvNets). Method: We first introduce a representation strategy (with different granularities) to transform source codes into integer matrices as the input to ConvNets. We then propose DeepCRM, a deep learning-based model for code readability classification. DeepCRM consists of three separate ConvNets with identical architectures that are trained on data preprocessed in different ways. We evaluate our approach against five state-of-the-art code readability models. Results: The experimental results show that DeepCRM can outperform previous approaches. The improvement in accuracy ranges from 2.4% to 17.2%. Conclusions: By eliminating the need for manual feature engineering, DeepCRM provides a relatively improved performance, confirming the efficacy of deep learning techniques in the task of code readability classification.
AB  - Context: Code readability classification (which refers to classification of a piece of source code as either readable or unreadable) has attracted increasing concern in academia and industry. To construct accurate classification models, previous studies depended mainly upon handcrafted features. However, the manual feature engineering process is usually labor-intensive and can capture only partial information about the source code, which is likely to limit the model performance. Objective: To improve code readability classification, we propose the use of Convolutional Neural Networks (ConvNets). Method: We first introduce a representation strategy (with different granularities) to transform source codes into integer matrices as the input to ConvNets. We then propose DeepCRM, a deep learning-based model for code readability classification. DeepCRM consists of three separate ConvNets with identical architectures that are trained on data preprocessed in different ways. We evaluate our approach against five state-of-the-art code readability models. Results: The experimental results show that DeepCRM can outperform previous approaches. The improvement in accuracy ranges from 2.4% to 17.2%. Conclusions: By eliminating the need for manual feature engineering, DeepCRM provides a relatively improved performance, confirming the efficacy of deep learning techniques in the task of code readability classification.
KW  - Code readability
KW  - Convolutional Neural Network
KW  - Deep learning
KW  - Empirical software engineering
KW  - Open source software
KW  - Program comprehension
UR  - http://www.scopus.com/inward/record.url?scp=85049901654&partnerID=8YFLogxK
U2  - 10.1016/j.infsof.2018.07.006
DO  - 10.1016/j.infsof.2018.07.006
M3  - Article
AN  - SCOPUS:85049901654
SN  - 0950-5849
VL  - 104
SP  - 60
EP  - 71
JO  - Information and Software Technology
JF  - Information and Software Technology
ER  - 
