TY - JOUR
T1 - Towards a Weakly Supervised Framework for 3D Point Cloud Object Detection and Annotation
AU - Meng, Qinghao
AU - Wang, Wenguan
AU - Zhou, Tianfei
AU - Shen, Jianbing
AU - Jia, Yunde
AU - Van Gool, Luc
N1 - Publisher Copyright:
© 1979-2012 IEEE.
PY - 2022/8/1
Y1 - 2022/8/1
N2 - It is quite laborious and costly to manually label LiDAR point cloud data for training high-quality 3D object detectors. This work proposes a weakly supervised framework which allows learning 3D detection from a few weakly annotated examples. This is achieved by a two-stage architecture design. Stage-1 learns to generate cylindrical object proposals under inaccurate and inexact supervision, obtained by our proposed BEV center-click annotation strategy, where only the horizontal object centers are click-annotated in bird's view scenes. Stage-2 learns to predict cuboids and confidence scores in a coarse-to-fine, cascade manner, under incomplete supervision, i.e., only a small portion of object cuboids are precisely annotated. With the KITTI dataset, using only 500 weakly annotated scenes and 534 precisely labeled vehicle instances, our method achieves 86-97 percent of the performance of current top-leading, fully supervised detectors (which require 3,712 exhaustively annotated scenes with 15,654 instances). More importantly, with our elaborately designed network architecture, our trained model can be applied as a 3D object annotator, supporting both automatic and active (human-in-the-loop) working modes. The annotations generated by our model can be used to train 3D object detectors, achieving over 95 percent of their original performance (with manually labeled training data). Our experiments also show our model's potential in boosting performance when given more training data. The above designs make our approach highly practical and open up opportunities for learning 3D detection at reduced annotation cost.
AB - It is quite laborious and costly to manually label LiDAR point cloud data for training high-quality 3D object detectors. This work proposes a weakly supervised framework which allows learning 3D detection from a few weakly annotated examples. This is achieved by a two-stage architecture design. Stage-1 learns to generate cylindrical object proposals under inaccurate and inexact supervision, obtained by our proposed BEV center-click annotation strategy, where only the horizontal object centers are click-annotated in bird's view scenes. Stage-2 learns to predict cuboids and confidence scores in a coarse-to-fine, cascade manner, under incomplete supervision, i.e., only a small portion of object cuboids are precisely annotated. With the KITTI dataset, using only 500 weakly annotated scenes and 534 precisely labeled vehicle instances, our method achieves 86-97 percent of the performance of current top-leading, fully supervised detectors (which require 3,712 exhaustively annotated scenes with 15,654 instances). More importantly, with our elaborately designed network architecture, our trained model can be applied as a 3D object annotator, supporting both automatic and active (human-in-the-loop) working modes. The annotations generated by our model can be used to train 3D object detectors, achieving over 95 percent of their original performance (with manually labeled training data). Our experiments also show our model's potential in boosting performance when given more training data. The above designs make our approach highly practical and open up opportunities for learning 3D detection at reduced annotation cost.
KW - 3D annotation
KW - 3D object detection
KW - Autonomous driving
KW - Cascade inference
KW - Weakly supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85102266069&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2021.3063611
DO - 10.1109/TPAMI.2021.3063611
M3 - Article
C2 - 33656990
AN - SCOPUS:85102266069
SN - 0162-8828
VL - 44
SP - 4454
EP - 4468
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 8
ER -