TY - GEN
T1 - Deep Learning Based Identification of Suspicious Return Statements
AU - Li, Guangjie
AU - Liu, Hui
AU - Jin, Jiahao
AU - Umer, Qasim
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/2
Y1 - 2020/2
AB - Identifiers in source code are composed of natural language terms. Such terms, and the phrases built from them, convey rich semantics that can be exploited for program analysis and comprehension. To this end, we propose a deep learning based approach, called MLDetector, that identifies suspicious return statements by leveraging the semantics conveyed by the natural language phrases used as identifiers in source code. We design a dedicated deep neural network to tell whether a given return statement matches its corresponding method signature. The rationale is that both the method signature and the return value should explicitly specify the output of the method, and thus a significant mismatch between them may indicate a suspicious return statement. To address the lack of negative training data, i.e., incorrect return statements, we generate such data automatically by transforming real-world correct return statements. To feed code into the neural network, we convert it into vectors with Word2Vec, an unsupervised, neural network based learning algorithm. We evaluate the proposed approach in two parts. In the first part, we evaluate it on 500 open-source applications with automatically generated labeled training data. Results suggest that the precision of the proposed approach varies from 83% to 90%. In the second part, we conduct a case study on 100 real-world applications. Evaluation results suggest that 42 out of 65 real-world incorrect return statements are detected (with a precision of 59%).
KW - Bug Detection
KW - Code Quality
KW - Deep Learning
KW - Identification
KW - Program Analysis
KW - Return Value
UR - http://www.scopus.com/inward/record.url?scp=85083556267&partnerID=8YFLogxK
U2 - 10.1109/SANER48275.2020.9054826
DO - 10.1109/SANER48275.2020.9054826
M3 - Conference contribution
AN - SCOPUS:85083556267
T3 - SANER 2020 - Proceedings of the 2020 IEEE 27th International Conference on Software Analysis, Evolution, and Reengineering
SP - 480
EP - 491
BT - SANER 2020 - Proceedings of the 2020 IEEE 27th International Conference on Software Analysis, Evolution, and Reengineering
A2 - Kontogiannis, Kostas
A2 - Khomh, Foutse
A2 - Chatzigeorgiou, Alexander
A2 - Fokaefs, Marios-Eleftherios
A2 - Zhou, Minghui
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 27th IEEE International Conference on Software Analysis, Evolution, and Reengineering, SANER 2020
Y2 - 18 February 2020 through 21 February 2020
ER -