Deep learning based code smell detection

Hui Liu; Jiahao Jin; Zhifeng Xu; Yanzhen Zou; Yifan Bu; Lu Zhang

doi:10.1109/TSE.2019.2936376

Corresponding author: Hui Liu

School of Computer Science and Technology

Research output: Contribution to journal › Article › peer-review

78 Citations (Scopus)

Abstract

Code smells are structures in the source code that suggest the possibility of refactorings. Consequently, developers may identify refactoring opportunities by detecting code smells. However, manual identification of code smells is challenging and tedious. To this end, a number of approaches have been proposed to identify code smells automatically or semi-automatically. Most of such approaches rely on manually designed heuristics to map manually selected source code metrics into predictions. However, it is challenging to manually select the best features. It is also difficult to manually construct the optimal heuristics. To this end, in this paper we propose a deep learning based novel approach to detecting code smells. The key insight is that deep neural networks and advanced deep learning techniques could automatically select features of source code for code smell detection, and could automatically build the complex mapping between such features and predictions. A big challenge for deep learning based smell detection is that deep learning often requires a large number of labeled training data (to tune a large number of parameters within the employed deep neural network) whereas existing datasets for code smell detection are rather small. To this end, we propose an automatic approach to generating labeled training data for the neural network based classifier, which does not require any human intervention. As an initial try, we apply the proposed approach to four common and well-known code smells, i.e., feature envy, long method, large class, and misplaced class. Evaluation results on open-source applications suggest that the proposed approach significantly improves the state-of-the-art.

Original language	English
Pages (from-to)	1811-1837
Number of pages	27
Journal	IEEE Transactions on Software Engineering
Volume	47
Issue number	9
DOIs	https://doi.org/10.1109/TSE.2019.2936376
Publication status	Published - 1 Sept 2021

Keywords

Software refactoring
code smells
deep learning
identification
quality

Cite this

Liu, H., Jin, J., Xu, Z., Zou, Y., Bu, Y., & Zhang, L. (2021). Deep learning based code smell detection. IEEE Transactions on Software Engineering, 47(9), 1811-1837. https://doi.org/10.1109/TSE.2019.2936376

@article{8579f41ce78148d2a9fa206626a4e7bf,

title = "Deep learning based code smell detection",

abstract = "Code smells are structures in the source code that suggest the possibility of refactorings. Consequently, developers may identify refactoring opportunities by detecting code smells. However, manual identification of code smells is challenging and tedious. To this end, a number of approaches have been proposed to identify code smells automatically or semi-automatically. Most of such approaches rely on manually designed heuristics to map manually selected source code metrics into predictions. However, it is challenging to manually select the best features. It is also difficult to manually construct the optimal heuristics. To this end, in this paper we propose a deep learning based novel approach to detecting code smells. The key insight is that deep neural networks and advanced deep learning techniques could automatically select features of source code for code smell detection, and could automatically build the complex mapping between such features and predictions. A big challenge for deep learning based smell detection is that deep learning often requires a large number of labeled training data (to tune a large number of parameters within the employed deep neural network) whereas existing datasets for code smell detection are rather small. To this end, we propose an automatic approach to generating labeled training data for the neural network based classifier, which does not require any human intervention. As an initial try, we apply the proposed approach to four common and well-known code smells, i.e., feature envy, long method, large class, and misplaced class. Evaluation results on open-source applications suggest that the proposed approach significantly improves the state-of-the-art.",

keywords = "Software refactoring, code smells, deep learning, identification, quality",

author = "Hui Liu and Jiahao Jin and Zhifeng Xu and Yanzhen Zou and Yifan Bu and Lu Zhang",

note = "Publisher Copyright: {\textcopyright} 1976-2012 IEEE.",

year = "2021",

month = sep,

day = "1",

doi = "10.1109/TSE.2019.2936376",

language = "English",

volume = "47",

pages = "1811--1837",

journal = "IEEE Transactions on Software Engineering",

issn = "0098-5589",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "9",

}

TY - JOUR

T1 - Deep learning based code smell detection

AU - Liu, Hui

AU - Jin, Jiahao

AU - Xu, Zhifeng

AU - Zou, Yanzhen

AU - Bu, Yifan

AU - Zhang, Lu

PY - 2021/9/1

Y1 - 2021/9/1

AB - Code smells are structures in the source code that suggest the possibility of refactorings. Consequently, developers may identify refactoring opportunities by detecting code smells. However, manual identification of code smells is challenging and tedious. To this end, a number of approaches have been proposed to identify code smells automatically or semi-automatically. Most of such approaches rely on manually designed heuristics to map manually selected source code metrics into predictions. However, it is challenging to manually select the best features. It is also difficult to manually construct the optimal heuristics. To this end, in this paper we propose a deep learning based novel approach to detecting code smells. The key insight is that deep neural networks and advanced deep learning techniques could automatically select features of source code for code smell detection, and could automatically build the complex mapping between such features and predictions. A big challenge for deep learning based smell detection is that deep learning often requires a large number of labeled training data (to tune a large number of parameters within the employed deep neural network) whereas existing datasets for code smell detection are rather small. To this end, we propose an automatic approach to generating labeled training data for the neural network based classifier, which does not require any human intervention. As an initial try, we apply the proposed approach to four common and well-known code smells, i.e., feature envy, long method, large class, and misplaced class. Evaluation results on open-source applications suggest that the proposed approach significantly improves the state-of-the-art.

KW - Software refactoring

KW - code smells

KW - deep learning

KW - identification

KW - quality

UR - http://www.scopus.com/inward/record.url?scp=85071552621&partnerID=8YFLogxK

U2 - 10.1109/TSE.2019.2936376

DO - 10.1109/TSE.2019.2936376

M3 - Article

AN - SCOPUS:85071552621

SN - 0098-5589

VL - 47

SP - 1811

EP - 1837

JO - IEEE Transactions on Software Engineering

JF - IEEE Transactions on Software Engineering

IS - 9

ER -
