Abstract
In task environments full of repetitive textures, state-of-the-art description and detection methods for local features suffer greatly from 'pseudo-negatives,' which introduce inconsistent optimization objectives during training. To address this problem, this article develops GCLFeat, a self-supervised graph-based contrastive learning framework for training local feature models. The proposed approach alleviates pseudo-negatives from three aspects: 1) designing a graph neural network (GNN) that mines local transformational invariance across different views and global textural knowledge within individual images; 2) generating dense correspondence annotations from a diverse natural-image dataset with a self-supervised paradigm; and 3) adopting a keypoints-aware sampling strategy to compute the loss across the whole dataset. The experimental results show that the unsupervised framework outperforms state-of-the-art supervised baselines on diverse downstream benchmarks, including image matching, 3-D reconstruction, and visual localization. The code will be made publicly available at https://github.com/RealZihaoWang/GCLFeat.
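To make the idea of keypoints-aware contrastive sampling concrete, the following is a minimal, hypothetical PyTorch sketch of an InfoNCE-style loss computed only at detected keypoint locations. The function name, tensor shapes, and temperature value are illustrative assumptions and do not reproduce GCLFeat's actual implementation.

```python
# Illustrative sketch only: a keypoints-aware InfoNCE-style contrastive loss
# for dense local descriptors. All names and shapes are assumptions.
import torch
import torch.nn.functional as F

def keypoint_contrastive_loss(desc_a, desc_b, kp_idx, temperature=0.07):
    """desc_a, desc_b: (N, D) descriptors from two views, row-aligned so that
    row i in both tensors describes the same physical point (e.g., via
    self-supervised dense correspondences).
    kp_idx: (K,) indices of detected keypoints; restricting the loss to these
    locations avoids drawing negatives from repetitive-texture positions that
    would otherwise act as pseudo-negatives."""
    anchors = F.normalize(desc_a[kp_idx], dim=-1)      # (K, D)
    positives = F.normalize(desc_b[kp_idx], dim=-1)    # (K, D)

    # Similarity of every anchor against every candidate; the diagonal holds
    # the true correspondences, off-diagonal entries serve as negatives.
    logits = anchors @ positives.t() / temperature      # (K, K)
    targets = torch.arange(anchors.size(0), device=anchors.device)
    return F.cross_entropy(logits, targets)
```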
| Original language | English |
|---|---|
| Pages (from-to) | 4839-4851 |
| Number of pages | 13 |
| Journal | IEEE Transactions on Neural Networks and Learning Systems |
| Volume | 35 |
| Issue number | 4 |
| DOIs | |
| Publication status | Published - 1 Apr 2024 |
Keywords
- Descriptor
- detector
- graph neural network (GNN)
- image matching
- local features
- self-supervised