TY - JOUR
T1 - Novel view synthesis with wide-baseline stereo pairs based on local–global information
AU - Song, Kai
AU - Zhang, Lei
N1 - Publisher Copyright:
© 2024
PY - 2025/2
Y1 - 2025/2
N2 - Novel view synthesis generates images from new viewpoints using multiple images of a scene captured from known viewpoints. Using wide-baseline stereo image pairs for novel view synthesis allows scenes to be rendered from varied perspectives with only two images, significantly reducing image acquisition and storage costs and improving 3D scene reconstruction efficiency. However, the large geometric difference and severe occlusion between a pair of wide-baseline stereo images often cause artifacts and holes in the novel view images. To address these issues, we propose a method that integrates both local and global information for synthesizing novel view images from wide-baseline stereo image pairs. Initially, our method aggregates the cost volume with local information using a Convolutional Neural Network (CNN) and employs a Transformer to capture global features. This process optimizes disparity prediction, improving the depth prediction and reconstruction quality of 3D scene representations built from wide-baseline stereo image pairs. Subsequently, our method uses a CNN to capture local semantic information and a Transformer to model long-range contextual dependencies, generating high-quality novel view images. Extensive experiments demonstrate that our method effectively reduces artifacts and holes, thereby enhancing the synthesis quality of novel views from wide-baseline stereo image pairs.
AB - Novel view synthesis generates images from new viewpoints using multiple images of a scene captured from known viewpoints. Using wide-baseline stereo image pairs for novel view synthesis allows scenes to be rendered from varied perspectives with only two images, significantly reducing image acquisition and storage costs and improving 3D scene reconstruction efficiency. However, the large geometric difference and severe occlusion between a pair of wide-baseline stereo images often cause artifacts and holes in the novel view images. To address these issues, we propose a method that integrates both local and global information for synthesizing novel view images from wide-baseline stereo image pairs. Initially, our method aggregates the cost volume with local information using a Convolutional Neural Network (CNN) and employs a Transformer to capture global features. This process optimizes disparity prediction, improving the depth prediction and reconstruction quality of 3D scene representations built from wide-baseline stereo image pairs. Subsequently, our method uses a CNN to capture local semantic information and a Transformer to model long-range contextual dependencies, generating high-quality novel view images. Extensive experiments demonstrate that our method effectively reduces artifacts and holes, thereby enhancing the synthesis quality of novel views from wide-baseline stereo image pairs.
KW - Depth prediction
KW - Novel view synthesis
KW - Warping
KW - Wide-baseline stereo image pair
UR - http://www.scopus.com/inward/record.url?scp=85210665985&partnerID=8YFLogxK
U2 - 10.1016/j.cag.2024.104139
DO - 10.1016/j.cag.2024.104139
M3 - Article
AN - SCOPUS:85210665985
SN - 0097-8493
VL - 126
JO - Computers and Graphics (Pergamon)
JF - Computers and Graphics (Pergamon)
M1 - 104139
ER -