TY - JOUR
T1 - HGE-BVHD
T2 - Heterogeneous graph embedding scheme of complex structure functions for binary vulnerability homology discrimination
AU - Xing, Jiyuan
AU - Luo, Senlin
AU - Pan, Limin
AU - Hao, Jingwei
AU - Guan, Yingdan
AU - Wu, Zhouting
N1 - Publisher Copyright:
© 2023 Elsevier Ltd
PY - 2024/3/15
Y1 - 2024/3/15
AB - Homologous vulnerability detection is an important aspect of computer security. Its key problems include discriminating structurally complex functions, supporting cross-architecture programs, and distinguishing false positives. Non-homologous functions with similar control flow graph structures are easily misjudged, which decreases discrimination accuracy. The vectors generated by instruction-embedding models contain architectural features, which increases the distance between homologous function vectors and leads to misclassification. In this paper, we propose a novel heterogeneous graph embedding (HGE) method for binary vulnerability homology discrimination (BVHD). HGE aggregates basic block features to generate function representations, applying different transformations according to control flow and data flow; this improves the discrimination of non-homologous functions and increases discrimination accuracy. A novel multi-architecture instruction-embedding model is proposed to abstract common semantic features and weaken the interference of architectural features, avoiding misclassification. The experimental results show that the proposed method achieves state-of-the-art results in homologous function discrimination, and the improvement is significant for structurally complex functions.
KW - Binary code
KW - Heterogeneous graph embedding
KW - Homology vulnerability discrimination
KW - Multi-architecture instruction embedding
UR - http://www.scopus.com/inward/record.url?scp=85173619109&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2023.121835
DO - 10.1016/j.eswa.2023.121835
M3 - Review article
AN - SCOPUS:85173619109
SN - 0957-4174
VL - 238
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 121835
ER -