TY - JOUR
T1 - Weakly-Supervised Single-view Dense 3D Point Cloud Reconstruction via Differentiable Renderer
AU - Jin, Peng
AU - Liu, Shaoli
AU - Liu, Jianhua
AU - Huang, Hao
AU - Yang, Linlin
AU - Weinmann, Michael
AU - Klein, Reinhard
N1 - Publisher Copyright:
© 2021, The Author(s).
PY - 2021/12
Y1 - 2021/12
N2 - In recent years, addressing ill-posed problems by leveraging prior knowledge contained in databases through learning techniques has gained much attention. In this paper, we focus on complete three-dimensional (3D) point cloud reconstruction from a single red-green-blue (RGB) image, a task that cannot be approached using classical reconstruction techniques. For this purpose, we use an encoder-decoder framework to encode the RGB information in latent space and to predict the 3D structure of the considered object from different viewpoints. The individual predictions are combined to yield a common representation that is used in a module combining camera pose estimation and rendering, thereby achieving differentiability with respect to the imaging process and the camera pose, and enabling optimization of the two-dimensional prediction error for novel viewpoints. Thus, our method allows end-to-end training and does not require supervision based on additional ground-truth (GT) mask annotations or ground-truth camera pose annotations. Our evaluation on synthetic and real-world data demonstrates the robustness of our approach to appearance changes and self-occlusions, as it outperforms current state-of-the-art methods in terms of accuracy, density, and model completeness.
AB - In recent years, addressing ill-posed problems by leveraging prior knowledge contained in databases through learning techniques has gained much attention. In this paper, we focus on complete three-dimensional (3D) point cloud reconstruction from a single red-green-blue (RGB) image, a task that cannot be approached using classical reconstruction techniques. For this purpose, we use an encoder-decoder framework to encode the RGB information in latent space and to predict the 3D structure of the considered object from different viewpoints. The individual predictions are combined to yield a common representation that is used in a module combining camera pose estimation and rendering, thereby achieving differentiability with respect to the imaging process and the camera pose, and enabling optimization of the two-dimensional prediction error for novel viewpoints. Thus, our method allows end-to-end training and does not require supervision based on additional ground-truth (GT) mask annotations or ground-truth camera pose annotations. Our evaluation on synthetic and real-world data demonstrates the robustness of our approach to appearance changes and self-occlusions, as it outperforms current state-of-the-art methods in terms of accuracy, density, and model completeness.
KW - Differentiable renderer
KW - Neural networks
KW - Point cloud reconstruction
KW - Single-view configuration
UR - http://www.scopus.com/inward/record.url?scp=85116006228&partnerID=8YFLogxK
U2 - 10.1186/s10033-021-00615-x
DO - 10.1186/s10033-021-00615-x
M3 - Article
AN - SCOPUS:85116006228
SN - 1000-9345
VL - 34
JO - Chinese Journal of Mechanical Engineering (English Edition)
JF - Chinese Journal of Mechanical Engineering (English Edition)
IS - 1
M1 - 93
ER -