RPViT: Vision Transformer Based on Region Proposal

Jing Ge, Qianxiang Wang, Jiahui Tong*, Guangyu Gao

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Vision Transformers continually absorb characteristics of convolutional neural networks to address their shortcomings in translation invariance and scale invariance. However, dividing the image with a simple grid often destroys the position and scale features in the image at the very beginning of the network. In this paper, we propose a vision transformer based on region proposal (RPViT), which obtains these inductive biases in a simple way. Specifically, RPViT achieves locality and scale invariance by extracting local regions with a traditional region proposal algorithm and rescaling objects of different scales to the same scale with bilinear interpolation. In addition, to enable the network to fully utilize and encode diverse candidate objects, a multi-class token approach based on orthogonalization is proposed and applied. Experiments on ImageNet demonstrate that RPViT outperforms baseline transformers and related work.
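The two ideas named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that candidate boxes have already been produced by some region proposal algorithm and are passed in as hypothetical `(x0, y0, x1, y1)` pixel tuples. The function names `bilinear_resize`, `regions_to_tokens`, and `orthogonality_penalty`, the 16 x 16 target size, and the Gram-matrix form of the penalty are all illustrative assumptions; the abstract does not state how the orthogonalization is formulated.

```python
import numpy as np


def bilinear_resize(patch: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Resize an (H, W, C) array to (out_h, out_w, C) by bilinear interpolation."""
    patch = patch.astype(np.float64)
    in_h, in_w = patch.shape[:2]
    # Sampling positions in input coordinates (corner-aligned grid).
    ys = np.linspace(0.0, in_h - 1, out_h)
    xs = np.linspace(0.0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None, None]  # vertical weights, shape (out_h, 1, 1)
    wx = (xs - x0)[None, :, None]  # horizontal weights, shape (1, out_w, 1)
    # Interpolate along x on the two neighbouring rows, then along y.
    top = patch[y0][:, x0] * (1 - wx) + patch[y0][:, x1] * wx
    bottom = patch[y1][:, x0] * (1 - wx) + patch[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def regions_to_tokens(image: np.ndarray, boxes, size: int = 16) -> np.ndarray:
    """Crop each (x0, y0, x1, y1) box, rescale it to size x size, and flatten it."""
    tokens = []
    for bx0, by0, bx1, by1 in boxes:
        crop = image[by0:by1, bx0:bx1]
        tokens.append(bilinear_resize(crop, size, size).reshape(-1))
    return np.stack(tokens)


def orthogonality_penalty(class_tokens: np.ndarray) -> float:
    """Squared Frobenius distance between the row-normalized Gram matrix and I.

    Assumed formulation for illustration only, not taken from the paper.
    """
    normed = class_tokens / np.linalg.norm(class_tokens, axis=1, keepdims=True)
    gram = normed @ normed.T
    return float(np.sum((gram - np.eye(len(class_tokens))) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((64, 64, 3))
    boxes = [(0, 0, 32, 32), (10, 5, 60, 25)]  # hypothetical proposals
    tokens = regions_to_tokens(image, boxes)
    print(tokens.shape)
    print(orthogonality_penalty(rng.random((4, 8))))
```

Because every region is mapped to the same spatial size, boxes of different scales and aspect ratios yield token vectors of identical length, which is what lets them be fed to a standard transformer encoder. The penalty is zero exactly when the class tokens are mutually orthogonal.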

Original language: English
Title of host publication: ICIGP 2022 - Proceedings of the 2022 5th International Conference on Image and Graphics Processing
Publisher: Association for Computing Machinery
Pages: 220-225
Number of pages: 6
ISBN (electronic): 9781450395465
DOI
Publication status: Published - 7 Jan 2022
Event: 5th International Conference on Image and Graphics Processing, ICIGP 2022 - Virtual, Online, China
Duration: 7 Jan 2022 to 9 Jan 2022

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 5th International Conference on Image and Graphics Processing, ICIGP 2022
Country/Territory: China
Virtual, Online
Period: 7/01/22 to 9/01/22
