TY - JOUR
T1 - Defect Detection in Deep Learning Model Compilers for LLM-Generated Computation Graphs
AU - Pan, Limin
AU - Zhao, Zhiyang
AU - Shao, Siyuan
AU - Luo, Senlin
AU - Zhang, Haoran
N1 - Publisher Copyright:
© 2025, Beijing Institute of Technology. All rights reserved.
PY - 2025
Y1 - 2025
N2 - Defects in deep learning model compilers risk model inference crashes, compromising deployment security and usability. Current defect detection methods suffer from inadequate code-line coverage and limited diversity in detectable defect types. Existing approaches rely on local operator constraints for detection, failing to trigger defects caused by multi-operator interactions, while semantic-preserving mutation strategies restrict the operator types in computation graph nodes, resulting in insufficient code-line coverage and significantly reducing defect detection rates. In this paper, a defect detection method was proposed, which employs multi-round prompting of LLMs to construct test cases. Prompts were created to guide LLMs in generating computation graphs, after which common operators were masked and substituted with rare ones. The graphs were iteratively updated to produce diverse test cases. Experimental results on multiple deep learning model compilers demonstrate that the proposed method significantly improves code coverage and defect detection rates compared to baseline approaches, exhibiting high reliability and practical value.
AB - Defects in deep learning model compilers risk model inference crashes, compromising deployment security and usability. Current defect detection methods suffer from inadequate code-line coverage and limited diversity in detectable defect types. Existing approaches rely on local operator constraints for detection, failing to trigger defects caused by multi-operator interactions, while semantic-preserving mutation strategies restrict the operator types in computation graph nodes, resulting in insufficient code-line coverage and significantly reducing defect detection rates. In this paper, a defect detection method was proposed, which employs multi-round prompting of LLMs to construct test cases. Prompts were created to guide LLMs in generating computation graphs, after which common operators were masked and substituted with rare ones. The graphs were iteratively updated to produce diverse test cases. Experimental results on multiple deep learning model compilers demonstrate that the proposed method significantly improves code coverage and defect detection rates compared to baseline approaches, exhibiting high reliability and practical value.
KW - deep learning model compiler
KW - defect detection
KW - fuzz testing
KW - large language model
UR - https://www.scopus.com/pages/publications/105021047019
U2 - 10.15918/j.tbit1001-0645.2025.071
DO - 10.15918/j.tbit1001-0645.2025.071
M3 - Article
AN - SCOPUS:105021047019
SN - 1001-0645
VL - 45
SP - 1204
EP - 1212
JO - Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology
JF - Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology
IS - 11
ER -