TY - JOUR
T1 - Improving code readability classification using convolutional neural networks
AU - Mi, Qing
AU - Keung, Jacky
AU - Xiao, Yan
AU - Mensah, Solomon
AU - Gao, Yujin
N1 - Publisher Copyright: © 2018 Elsevier B.V.
PY - 2018/12
Y1 - 2018/12
AB - Context: Code readability classification (which refers to classification of a piece of source code as either readable or unreadable) has attracted increasing concern in academia and industry. To construct accurate classification models, previous studies depended mainly upon handcrafted features. However, the manual feature engineering process is usually labor-intensive and can capture only partial information about the source code, which is likely to limit the model performance. Objective: To improve code readability classification, we propose the use of Convolutional Neural Networks (ConvNets). Method: We first introduce a representation strategy (with different granularities) to transform source codes into integer matrices as the input to ConvNets. We then propose DeepCRM, a deep learning-based model for code readability classification. DeepCRM consists of three separate ConvNets with identical architectures that are trained on data preprocessed in different ways. We evaluate our approach against five state-of-the-art code readability models. Results: The experimental results show that DeepCRM can outperform previous approaches. The improvement in accuracy ranges from 2.4% to 17.2%. Conclusions: By eliminating the need for manual feature engineering, DeepCRM provides a relatively improved performance, confirming the efficacy of deep learning techniques in the task of code readability classification.
KW - Code readability
KW - Convolutional Neural Network
KW - Deep learning
KW - Empirical software engineering
KW - Open source software
KW - Program comprehension
UR - http://www.scopus.com/inward/record.url?scp=85049901654&partnerID=8YFLogxK
U2 - 10.1016/j.infsof.2018.07.006
DO - 10.1016/j.infsof.2018.07.006
M3 - Article
AN - SCOPUS:85049901654
SN - 0950-5849
VL - 104
SP - 60
EP - 71
JO - Information and Software Technology
JF - Information and Software Technology
ER -