TY - JOUR
T1 - Inter-Scale Similarity Guided Cost Aggregation for Stereo Matching
AU - Li, Pengxiang
AU - Yao, Chengtang
AU - Jia, Yunde
AU - Wu, Yuwei
N1 - Publisher Copyright:
© 1991-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Stereo matching aims to estimate 3D geometry by computing disparity from a rectified image pair. Most deep learning based stereo matching methods aggregate multi-scale cost volumes computed by downsampling and achieve good performance. However, their effectiveness in fine-grained areas is limited by significant detail loss during downsampling and the use of fixed weights in upsampling. In this paper, we propose an inter-scale similarity-guided cost aggregation method that dynamically upsamples the cost volumes according to the content of images for stereo matching. The method consists of two modules: inter-scale similarity measurement and stereo-content-aware cost aggregation. Specifically, we use inter-scale similarity measurement to generate similarity guidance from feature maps in adjacent scales. The guidance, generated from both reference and target images, is then used to aggregate the cost volumes from low-resolution to high-resolution via stereo-content-aware cost aggregation. We further split the 3D aggregation into 1D disparity and 2D spatial aggregation to reduce the computational cost. Experimental results on various benchmarks (e.g., SceneFlow, KITTI, Middlebury and ETH3D-two-view) show that our method achieves consistent performance gain on multiple models (e.g., PSM-Net, HSM-Net, CF-Net, FastAcv, and FactAcvPlus). The code can be found at https://github.com/Pengxiang-Li/issga-stereo.
AB - Stereo matching aims to estimate 3D geometry by computing disparity from a rectified image pair. Most deep learning based stereo matching methods aggregate multi-scale cost volumes computed by downsampling and achieve good performance. However, their effectiveness in fine-grained areas is limited by significant detail loss during downsampling and the use of fixed weights in upsampling. In this paper, we propose an inter-scale similarity-guided cost aggregation method that dynamically upsamples the cost volumes according to the content of images for stereo matching. The method consists of two modules: inter-scale similarity measurement and stereo-content-aware cost aggregation. Specifically, we use inter-scale similarity measurement to generate similarity guidance from feature maps in adjacent scales. The guidance, generated from both reference and target images, is then used to aggregate the cost volumes from low-resolution to high-resolution via stereo-content-aware cost aggregation. We further split the 3D aggregation into 1D disparity and 2D spatial aggregation to reduce the computational cost. Experimental results on various benchmarks (e.g., SceneFlow, KITTI, Middlebury and ETH3D-two-view) show that our method achieves consistent performance gain on multiple models (e.g., PSM-Net, HSM-Net, CF-Net, FastAcv, and FactAcvPlus). The code can be found at https://github.com/Pengxiang-Li/issga-stereo.
KW - content-aware upsampling
KW - cost aggregation
KW - Stereo matching
UR - http://www.scopus.com/inward/record.url?scp=85203454226&partnerID=8YFLogxK
U2 - 10.1109/TCSVT.2024.3453965
DO - 10.1109/TCSVT.2024.3453965
M3 - Article
AN - SCOPUS:85203454226
SN - 1051-8215
VL - 35
SP - 134
EP - 147
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
IS - 1
ER -