TY - GEN
T1 - Two-way feature-aligned and attention-rectified adversarial training
AU - Zhang, Haitao
AU - Jia, Fan
AU - Zhang, Quanxin
AU - Han, Yahong
AU - Kuang, Xiaohui
AU - Tan, Yu An
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/7
Y1 - 2020/7
N2 - Adversarial training increases robustness by augmenting training data with adversarial examples. However, vanilla adversarial training may overfit to certain adversarial attacks. Small perturbations in images introduce errors that are gradually amplified as they are forwarded through the model, ultimately leading to wrong classifications. Moreover, small perturbations also distract the classifier's attention away from significant features that are relevant to the true label. In this paper, we propose a novel two-way feature-aligned and attention-rectified adversarial training (FAAR) to improve adversarial training (AT). FAAR utilizes two-way feature alignment and attention rectification to mitigate the problems mentioned above. FAAR effectively suppresses perturbations in low-level, high-level, and global features by moving the features of perturbed images towards those of clean images with two-way feature alignment. It also leads the model to focus more on useful features that are correlated with the true label by rectifying gradient-weighted attention. In addition, feature alignment activates attention rectification by reducing perturbations in high-level features. Our proposed method FAAR surpasses other existing AT methods in three aspects. First, it pushes the model to remain invariant when dealing with different adversarial attacks and different magnitudes of perturbation. Second, it can be applied to any convolutional neural network. Third, the training process is end-to-end. In experiments, FAAR shows promising defense performance on CIFAR-10 and ImageNet.
AB - Adversarial training increases robustness by augmenting training data with adversarial examples. However, vanilla adversarial training may overfit to certain adversarial attacks. Small perturbations in images introduce errors that are gradually amplified as they are forwarded through the model, ultimately leading to wrong classifications. Moreover, small perturbations also distract the classifier's attention away from significant features that are relevant to the true label. In this paper, we propose a novel two-way feature-aligned and attention-rectified adversarial training (FAAR) to improve adversarial training (AT). FAAR utilizes two-way feature alignment and attention rectification to mitigate the problems mentioned above. FAAR effectively suppresses perturbations in low-level, high-level, and global features by moving the features of perturbed images towards those of clean images with two-way feature alignment. It also leads the model to focus more on useful features that are correlated with the true label by rectifying gradient-weighted attention. In addition, feature alignment activates attention rectification by reducing perturbations in high-level features. Our proposed method FAAR surpasses other existing AT methods in three aspects. First, it pushes the model to remain invariant when dealing with different adversarial attacks and different magnitudes of perturbation. Second, it can be applied to any convolutional neural network. Third, the training process is end-to-end. In experiments, FAAR shows promising defense performance on CIFAR-10 and ImageNet.
KW - Adversarial training
KW - Attention rectification
KW - Feature alignment
UR - http://www.scopus.com/inward/record.url?scp=85090381059&partnerID=8YFLogxK
U2 - 10.1109/ICME46284.2020.9102777
DO - 10.1109/ICME46284.2020.9102777
M3 - Conference contribution
AN - SCOPUS:85090381059
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
BT - 2020 IEEE International Conference on Multimedia and Expo, ICME 2020
PB - IEEE Computer Society
T2 - 2020 IEEE International Conference on Multimedia and Expo, ICME 2020
Y2 - 6 July 2020 through 10 July 2020
ER -