TY - GEN
T1 - Action-Manipulation Attack and Defense to X-Armed Bandits
AU - Luo, Zhi
AU - Li, Youqi
AU - Chen, Lixing
AU - Xu, Zichuan
AU - Zhou, Pan
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - As a continuous variant of Multi-armed bandits (MAB), X-armed bandits have enriched many applications of online machine learning like personalized recommendation system. However, the attack and defense to the X-armed bandits remain largely unexplored, though the MAB has proved to be vulnerable. In this paper, we aim to bridge this gap and investigate the robustness analysis for the X-armed bandits. Specifically, we consider action-manipulation attack, which is practical but harder than the existing reward-manipulation attack. We propose an attack algorithm based on a lower bound tree (LBT), which can continuously hijack the learner's action by perturbing X-armed bandits' high confidence tree (HCT) construction. As a result, the nodes including the arm targeted by the attacker is selected frequently with a sublinear attack cost. To defend against the LBT attack, we propose a robust version of the HCT algorithm, called RoHCT. We theoretically analyze that the regret of RoHCT is related to the upper bound of the total cost Q and still sublinear to total number of rounds T. We carry out experiments to evaluate the effectiveness of LBT and RoHCT.
AB - As a continuous variant of Multi-armed bandits (MAB), X-armed bandits have enriched many applications of online machine learning like personalized recommendation system. However, the attack and defense to the X-armed bandits remain largely unexplored, though the MAB has proved to be vulnerable. In this paper, we aim to bridge this gap and investigate the robustness analysis for the X-armed bandits. Specifically, we consider action-manipulation attack, which is practical but harder than the existing reward-manipulation attack. We propose an attack algorithm based on a lower bound tree (LBT), which can continuously hijack the learner's action by perturbing X-armed bandits' high confidence tree (HCT) construction. As a result, the nodes including the arm targeted by the attacker is selected frequently with a sublinear attack cost. To defend against the LBT attack, we propose a robust version of the HCT algorithm, called RoHCT. We theoretically analyze that the regret of RoHCT is related to the upper bound of the total cost Q and still sublinear to total number of rounds T. We carry out experiments to evaluate the effectiveness of LBT and RoHCT.
KW - action-manipulation attack
KW - defense
KW - robustness
KW - χ-armed bandits
UR - http://www.scopus.com/inward/record.url?scp=85151747359&partnerID=8YFLogxK
U2 - 10.1109/TrustCom56396.2022.00153
DO - 10.1109/TrustCom56396.2022.00153
M3 - Conference contribution
AN - SCOPUS:85151747359
T3 - Proceedings - 2022 IEEE 21st International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2022
SP - 1115
EP - 1122
BT - Proceedings - 2022 IEEE 21st International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 21st IEEE International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2022
Y2 - 9 December 2022 through 11 December 2022
ER -