基于视觉-文本损失的开放词汇检测大模型对抗样本生成方法

Translated title of the contribution: Adversarial example generation method for open-vocabulary detection large models based on visually-textual fusion loss

Hao Shi, Shu Wang, Jianhong Han, Zhaoyi Luo, Yupei Wang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Recently, open-vocabulary detection (OVD) has become a research focus in computer vision owing to its ability to recognize objects from unknown categories. YOLO-World, a representative approach in this domain, offers powerful real-time detection capabilities; however, security issues stemming from the vulnerabilities of deep neural networks cannot be overlooked. Against this backdrop, a white-box adversarial example generation method targeting the YOLO-World algorithm was proposed, providing insights into identifying and quantifying vulnerabilities in large models. The method used gradient information produced during backpropagation through the YOLO-World network to optimize predefined perturbations, which were then added to original examples to form adversarial examples. First, confidence scores and bounding-box information from the model outputs served as the basis for preliminary optimization, yielding adversarial examples with a certain level of attack effectiveness. The attack was then strengthened by a visually-textual fusion loss designed according to the RepVL-PAN structure of the YOLO-World model, increasing the destructiveness of the adversarial examples against the model. Finally, a perturbation-magnitude loss was integrated to constrain the total amount of perturbation, generating adversarial examples with limited disturbance. The adversarial examples generated by this method achieved attack objectives such as confidence reduction and bounding-box displacement according to practical needs. Experimental results demonstrated that the proposed method significantly impaired the YOLO-World model, with mean average precision (mAP) dropping below 5% on the LVIS dataset.
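The optimization loop the abstract describes (gradient-based refinement of an additive perturbation under a composite loss, with the total perturbation constrained) follows the standard projected-gradient pattern. A minimal sketch, assuming a PGD-style update with an L∞ budget; the stand-in detector, the `attack_objective` function, and all step sizes are hypothetical illustrations, not the paper's actual YOLO-World losses or hyperparameters:

```python
import torch
import torch.nn as nn

def generate_adversarial(model, x, attack_objective,
                         eps=8 / 255, alpha=2 / 255, steps=10):
    """PGD-style white-box attack: ascend the attack objective w.r.t. an
    additive perturbation, projected onto an L-infinity ball of radius eps.
    (The paper's exact optimizer, step sizes, and loss weighting are not
    stated in the abstract; these defaults are illustrative.)"""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = attack_objective(model(x + delta))
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += alpha * grad.sign()               # gradient ascent step
            delta.clamp_(-eps, eps)                    # perturbation budget
            delta.copy_((x + delta).clamp(0, 1) - x)   # keep pixels valid
    return (x + delta).detach()

# Hypothetical stand-in for a detector head producing confidence logits;
# the real method backpropagates through YOLO-World's outputs instead.
torch.manual_seed(0)
detector = nn.Sequential(nn.Conv2d(3, 4, 3, padding=1), nn.Flatten())
clean = torch.rand(1, 3, 32, 32)

# Confidence-suppression objective: maximizing it pushes confidences down.
# A full composite loss would also include the visually-textual fusion
# term and the perturbation-magnitude term described above.
objective = lambda logits: -logits.sigmoid().sum()

adv = generate_adversarial(detector, clean, objective)
```

Bounding-box displacement could be targeted the same way, by swapping in an objective over the model's box regression outputs rather than its confidence scores.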

Original language: Chinese (Simplified)
Pages (from-to): 1222-1230
Number of pages: 9
Journal: Journal of Graphics
Volume: 45
Issue number: 6
DOI: 10.11996/JG.j.2095-302X.2024061222
Publication status: Published - Dec 2024


Cite this

Shi, H., Wang, S., Han, J., Luo, Z., & Wang, Y. (2024). 基于视觉-文本损失的开放词汇检测大模型对抗样本生成方法. Journal of Graphics, 45(6), 1222-1230. https://doi.org/10.11996/JG.j.2095-302X.2024061222