TY - JOUR
T1 - MalCAFF
T2 - A Cross Attention-Based Feature Fusion Framework for Malware Classification
AU - Guo, Wenjie
AU - Hu, Jingjing
AU - Wang, Yong
AU - Fu, Yifeng
AU - Xue, Jingfeng
AU - Zheng, Baokun
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2026
Y1 - 2026
N2 - Malware detection faces growing challenges due to sophisticated obfuscation techniques that undermine the robustness of single-modal approaches relying solely on static code analysis or dynamic behavioral profiling. To address this issue, we propose MalCAFF, a cross-attention-based framework for fine-grained fusion of static assembly semantics and dynamic API behaviors. Static features are refined through program slicing to preserve critical semantics, while dynamic behaviors are represented by API Semantic Block Sequences (ABS), which aggregate API calls into parameter-aware, semantically enriched units aligned with static functions. A Cross Attention-based Feature Enhancement (CAFE) module then achieves bidirectional semantic complementation across modalities. Furthermore, contrastive pre-training mitigates inter-modal distributional discrepancies and enhances generalization. Extensive experiments on the VirusShare dataset demonstrate that MalCAFF outperforms state-of-the-art methods.
AB - Malware detection faces growing challenges due to sophisticated obfuscation techniques that undermine the robustness of single-modal approaches relying solely on static code analysis or dynamic behavioral profiling. To address this issue, we propose MalCAFF, a cross-attention-based framework for fine-grained fusion of static assembly semantics and dynamic API behaviors. Static features are refined through program slicing to preserve critical semantics, while dynamic behaviors are represented by API Semantic Block Sequences (ABS), which aggregate API calls into parameter-aware, semantically enriched units aligned with static functions. A Cross Attention-based Feature Enhancement (CAFE) module then achieves bidirectional semantic complementation across modalities. Furthermore, contrastive pre-training mitigates inter-modal distributional discrepancies and enhances generalization. Extensive experiments on the VirusShare dataset demonstrate that MalCAFF outperforms state-of-the-art methods.
KW - Malware detection
KW - cross-attention mechanism
KW - malware classification
KW - multi-modal feature fusion
UR - https://www.scopus.com/pages/publications/105020705634
U2 - 10.1109/TNSE.2025.3627147
DO - 10.1109/TNSE.2025.3627147
M3 - Article
AN - SCOPUS:105020705634
SN - 2327-4697
VL - 13
SP - 4294
EP - 4311
JO - IEEE Transactions on Network Science and Engineering
JF - IEEE Transactions on Network Science and Engineering
ER -