TY - GEN
T1 - Local Representation is Not Enough
T2 - 30th International Joint Conference on Artificial Intelligence, IJCAI 2021
AU - Wang, Zihao
AU - Li, Xueyi
AU - Li, Zhen
N1 - Publisher Copyright:
© 2021 International Joint Conferences on Artificial Intelligence. All rights reserved.
PY - 2021
Y1 - 2021
N2 - Significant progress has been witnessed for the descriptor and detector of local features, but there still exist several challenging and intractable limitations, such as insufficient localization accuracy and non-discriminative description, especially in repetitive- or blank-texture regions, which haven't be well addressed. The coarse feature representation and limited receptive field are considered as the main issues for these limitations. To address these issues, we propose a novel Soft Point-Wise Transformer for Descriptor and Detector, simultaneously mining long-range intrinsic and cross-scale dependencies of local features. Furthermore, our model leverages the distinct transformers based on the soft point-wise attention, substantially decreasing the memory and computation complexity, especially for high-resolution feature maps. In addition, multi-level decoder is constructed to guarantee the high detection accuracy and discriminative description. Extensive experiments demonstrate that our model outperforms the existing state-of-the-art methods on the image matching and visual localization benchmarks.
AB - Significant progress has been witnessed for the descriptor and detector of local features, but there still exist several challenging and intractable limitations, such as insufficient localization accuracy and non-discriminative description, especially in repetitive- or blank-texture regions, which haven't be well addressed. The coarse feature representation and limited receptive field are considered as the main issues for these limitations. To address these issues, we propose a novel Soft Point-Wise Transformer for Descriptor and Detector, simultaneously mining long-range intrinsic and cross-scale dependencies of local features. Furthermore, our model leverages the distinct transformers based on the soft point-wise attention, substantially decreasing the memory and computation complexity, especially for high-resolution feature maps. In addition, multi-level decoder is constructed to guarantee the high detection accuracy and discriminative description. Extensive experiments demonstrate that our model outperforms the existing state-of-the-art methods on the image matching and visual localization benchmarks.
UR - http://www.scopus.com/inward/record.url?scp=85125462895&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85125462895
T3 - IJCAI International Joint Conference on Artificial Intelligence
SP - 1150
EP - 1156
BT - Proceedings of the 30th International Joint Conference on Artificial Intelligence, IJCAI 2021
A2 - Zhou, Zhi-Hua
PB - International Joint Conferences on Artificial Intelligence
Y2 - 19 August 2021 through 27 August 2021
ER -