TY - JOUR
T1 - Early-Stage Detection of Encrypted Malware Traffic via Multi-flow Temporal Graph Learning
AU - Jia, Jizhe
AU - Zhao, Yi
AU - Shen, Meng
AU - Cui, Susu
AU - Wang, Jing
AU - Zhao, Bufan
AU - Wang, Wei
AU - Zhu, Liehuang
N1 - Publisher Copyright: © 2005-2012 IEEE.
PY - 2026
Y1 - 2026
N2 - Malware widely adopts network traffic encryption techniques to conceal malicious activities. Recent research has demonstrated the effectiveness of machine learning (ML)-, deep learning (DL)-, and pre-training-based malware traffic detection methods. However, a vast majority of these methods rely on the collected complete traffic during the malware attack. While certain methods can operate on partial traffic, their detection accuracy often significantly decreases when the available data is restricted to the extreme early stage, where information is most sparse. In this paper, we propose DawnGuard, an effective early-stage encrypted malware traffic detection framework through multi-flow temporal graph learning. Specifically, based on the temporal packet density distribution analysis, DawnGuard innovatively proposes a self-adjusting data augmentation strategy for early-stage malware traffic, which can force the model to focus on the early-stage interaction phase with more distinguishable properties. Meanwhile, considering that temporal-topological correlations among multiple flows can provide more distinguishable properties in a malware attack, we further develop a temporal graph learning framework to extract features, which can form Multi-Flow Graph Features (MGF). By utilizing MGF, DawnGuard implements a Vision Transformer-based detection mechanism, enabling accurate and precise encrypted malware traffic detection with early-stage traffic by capturing both local and global contextual relationships. Extensive experiments with two real-world datasets demonstrate that DawnGuard outperforms the state-of-the-art (SOTA) methods in three typical scenarios: varying early-stage time windows, imbalanced data, and unseen malware detection. Particularly, DawnGuard achieves an average F1 of 95.11%, 8.7% higher than the SOTA method, by only utilizing the first 20% loading ratio of complete traffic.
AB - Malware widely adopts network traffic encryption techniques to conceal malicious activities. Recent research has demonstrated the effectiveness of machine learning (ML)-, deep learning (DL)-, and pre-training-based malware traffic detection methods. However, a vast majority of these methods rely on the collected complete traffic during the malware attack. While certain methods can operate on partial traffic, their detection accuracy often significantly decreases when the available data is restricted to the extreme early stage, where information is most sparse. In this paper, we propose DawnGuard, an effective early-stage encrypted malware traffic detection framework through multi-flow temporal graph learning. Specifically, based on the temporal packet density distribution analysis, DawnGuard innovatively proposes a self-adjusting data augmentation strategy for early-stage malware traffic, which can force the model to focus on the early-stage interaction phase with more distinguishable properties. Meanwhile, considering that temporal-topological correlations among multiple flows can provide more distinguishable properties in a malware attack, we further develop a temporal graph learning framework to extract features, which can form Multi-Flow Graph Features (MGF). By utilizing MGF, DawnGuard implements a Vision Transformer-based detection mechanism, enabling accurate and precise encrypted malware traffic detection with early-stage traffic by capturing both local and global contextual relationships. Extensive experiments with two real-world datasets demonstrate that DawnGuard outperforms the state-of-the-art (SOTA) methods in three typical scenarios: varying early-stage time windows, imbalanced data, and unseen malware detection. Particularly, DawnGuard achieves an average F1 of 95.11%, 8.7% higher than the SOTA method, by only utilizing the first 20% loading ratio of complete traffic.
KW - Malware traffic detection
KW - encrypted traffic analysis
KW - graph learning
UR - https://www.scopus.com/pages/publications/105036436824
U2 - 10.1109/TIFS.2026.3685079
DO - 10.1109/TIFS.2026.3685079
M3 - Article
AN - SCOPUS:105036436824
SN - 1556-6013
JO - IEEE Transactions on Information Forensics and Security
JF - IEEE Transactions on Information Forensics and Security
ER -