Deep Learning Based Program Generation From Requirements Text: Are We There Yet?

Hui Liu; Mingzhu Shen; Jiaqi Zhu; Nan Niu; Ge Li; Lu Zhang

doi:10.1109/TSE.2020.3018481

Hui Liu^*, Mingzhu Shen, Jiaqi Zhu, Nan Niu, Ge Li, Lu Zhang

^*Corresponding author for this work

School of Computer Science

Research output: Contribution to journal › Article › peer-review

24 Citations (Scopus)

Abstract

To release developers from time-consuming software development, many approaches have been proposed to generate source code automatically according to software requirements. With significant advances in deep learning and natural language processing, deep learning-based approaches have been proposed to generate source code from natural language descriptions. The key insight is that, given a large corpus of software requirements and their corresponding implementations, advanced deep learning techniques may learn how to translate software requirements into source code that fulfills such requirements. Although such approaches are reported to be highly accurate, they are evaluated on datasets that are rather small, lack diversity, and differ significantly from real-world software requirements. To this end, we build a large-scale dataset that is composed of longer requirements as well as validated implementations. We evaluate the state-of-the-art approaches on this new dataset, and the results suggest that their performance on our dataset is significantly lower than that on existing datasets with respect to the common metric, i.e., BLEU. Evaluation results also suggest that the generated programs often contain syntactic and semantic errors, and none of them can pass even a single predefined test case. Further analysis reveals that the state-of-the-art approaches learn little from software requirements, and most of the successfully generated statements are popular statements in the training programs. Based on this finding, we propose a popularity-based approach that always generates the most popular statements in training programs regardless of the input (software requirements). Evaluation results suggest that none of the state-of-the-art approaches can outperform this simple statistics-based approach. In conclusion, deep learning-based program generation requires significant improvement in the future, and our dataset may serve as a basis for future research in this direction.

Original language	English
Pages (from-to)	1268-1289
Number of pages	22
Journal	IEEE Transactions on Software Engineering
Volume	48
Issue number	4
DOI	https://doi.org/10.1109/TSE.2020.3018481
Publication status	Published - 1 Apr 2022

Cite this

@article{cd4636970d6a408690820436c46386b9,

title = "Deep Learning Based Program Generation From Requirements Text: Are We There Yet?",

abstract = "To release developers from time-consuming software development, many approaches have been proposed to generate source code automatically according to software requirements. With significant advances in deep learning and natural language processing, deep learning-based approaches have been proposed to generate source code from natural language descriptions. The key insight is that, given a large corpus of software requirements and their corresponding implementations, advanced deep learning techniques may learn how to translate software requirements into source code that fulfills such requirements. Although such approaches are reported to be highly accurate, they are evaluated on datasets that are rather small, lack diversity, and differ significantly from real-world software requirements. To this end, we build a large-scale dataset that is composed of longer requirements as well as validated implementations. We evaluate the state-of-the-art approaches on this new dataset, and the results suggest that their performance on our dataset is significantly lower than that on existing datasets with respect to the common metric, i.e., BLEU. Evaluation results also suggest that the generated programs often contain syntactic and semantic errors, and none of them can pass even a single predefined test case. Further analysis reveals that the state-of-the-art approaches learn little from software requirements, and most of the successfully generated statements are popular statements in the training programs. Based on this finding, we propose a popularity-based approach that always generates the most popular statements in training programs regardless of the input (software requirements). Evaluation results suggest that none of the state-of-the-art approaches can outperform this simple statistics-based approach. In conclusion, deep learning-based program generation requires significant improvement in the future, and our dataset may serve as a basis for future research in this direction.",

keywords = "Software requirements, code generation, data set, deep learning",

author = "Hui Liu and Mingzhu Shen and Jiaqi Zhu and Nan Niu and Ge Li and Lu Zhang",

note = "Publisher Copyright: {\textcopyright} 1976-2012 IEEE.",

year = "2022",

month = apr,

day = "1",

doi = "10.1109/TSE.2020.3018481",

language = "English",

volume = "48",

pages = "1268--1289",

journal = "IEEE Transactions on Software Engineering",

issn = "0098-5589",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "4",

}

TY - JOUR

T1 - Deep Learning Based Program Generation From Requirements Text

T2 - Are We There Yet?

AU - Liu, Hui

AU - Shen, Mingzhu

AU - Zhu, Jiaqi

AU - Niu, Nan

AU - Li, Ge

AU - Zhang, Lu

PY - 2022/4/1

Y1 - 2022/4/1

N2 - To release developers from time-consuming software development, many approaches have been proposed to generate source code automatically according to software requirements. With significant advances in deep learning and natural language processing, deep learning-based approaches have been proposed to generate source code from natural language descriptions. The key insight is that, given a large corpus of software requirements and their corresponding implementations, advanced deep learning techniques may learn how to translate software requirements into source code that fulfills such requirements. Although such approaches are reported to be highly accurate, they are evaluated on datasets that are rather small, lack diversity, and differ significantly from real-world software requirements. To this end, we build a large-scale dataset that is composed of longer requirements as well as validated implementations. We evaluate the state-of-the-art approaches on this new dataset, and the results suggest that their performance on our dataset is significantly lower than that on existing datasets with respect to the common metric, i.e., BLEU. Evaluation results also suggest that the generated programs often contain syntactic and semantic errors, and none of them can pass even a single predefined test case. Further analysis reveals that the state-of-the-art approaches learn little from software requirements, and most of the successfully generated statements are popular statements in the training programs. Based on this finding, we propose a popularity-based approach that always generates the most popular statements in training programs regardless of the input (software requirements). Evaluation results suggest that none of the state-of-the-art approaches can outperform this simple statistics-based approach. In conclusion, deep learning-based program generation requires significant improvement in the future, and our dataset may serve as a basis for future research in this direction.

KW - Software requirements

KW - code generation

KW - data set

KW - deep learning

UR - http://www.scopus.com/inward/record.url?scp=85128812370&partnerID=8YFLogxK

U2 - 10.1109/TSE.2020.3018481

DO - 10.1109/TSE.2020.3018481

M3 - Article

AN - SCOPUS:85128812370

SN - 0098-5589

VL - 48

SP - 1268

EP - 1289

JO - IEEE Transactions on Software Engineering

JF - IEEE Transactions on Software Engineering

IS - 4

ER -
