TY - JOUR
T1 - Towards a Weakly Supervised Framework for 3D Point Cloud Object Detection and Annotation
AU - Meng, Qinghao
AU - Wang, Wenguan
AU - Zhou, Tianfei
AU - Shen, Jianbing
AU - Jia, Yunde
AU - Van Gool, Luc
N1 - Publisher Copyright:
© 1979-2012 IEEE.
PY - 2022/8/1
Y1 - 2022/8/1
N2 - It is quite laborious and costly to manually label LiDAR point cloud data for training high-quality 3D object detectors. This work proposes a weakly supervised framework which allows learning 3D detection from a few weakly annotated examples. This is achieved by a two-stage architecture design. Stage-1 learns to generate cylindrical object proposals under inaccurate and inexact supervision, obtained by our proposed BEV center-click annotation strategy, where only the horizontal object centers are click-annotated in bird's view scenes. Stage-2 learns to predict cuboids and confidence scores in a coarse-to-fine, cascade manner, under incomplete supervision, i.e., only a small portion of object cuboids are precisely annotated. With the KITTI dataset, using only 500 weakly annotated scenes and 534 precisely labeled vehicle instances, our method achieves 86-97 percent of the performance of current top-leading, fully supervised detectors (which require 3,712 exhaustively annotated scenes with 15,654 instances). More importantly, with our elaborately designed network architecture, our trained model can be applied as a 3D object annotator, supporting both automatic and active (human-in-the-loop) working modes. The annotations generated by our model can be used to train 3D object detectors, achieving over 95 percent of their original performance (with manually labeled training data). Our experiments also show our model's potential in boosting performance when given more training data. The above designs make our approach highly practical and open up opportunities for learning 3D detection at reduced annotation cost.
AB - It is quite laborious and costly to manually label LiDAR point cloud data for training high-quality 3D object detectors. This work proposes a weakly supervised framework which allows learning 3D detection from a few weakly annotated examples. This is achieved by a two-stage architecture design. Stage-1 learns to generate cylindrical object proposals under inaccurate and inexact supervision, obtained by our proposed BEV center-click annotation strategy, where only the horizontal object centers are click-annotated in bird's view scenes. Stage-2 learns to predict cuboids and confidence scores in a coarse-to-fine, cascade manner, under incomplete supervision, i.e., only a small portion of object cuboids are precisely annotated. With the KITTI dataset, using only 500 weakly annotated scenes and 534 precisely labeled vehicle instances, our method achieves 86-97 percent of the performance of current top-leading, fully supervised detectors (which require 3,712 exhaustively annotated scenes with 15,654 instances). More importantly, with our elaborately designed network architecture, our trained model can be applied as a 3D object annotator, supporting both automatic and active (human-in-the-loop) working modes. The annotations generated by our model can be used to train 3D object detectors, achieving over 95 percent of their original performance (with manually labeled training data). Our experiments also show our model's potential in boosting performance when given more training data. The above designs make our approach highly practical and open up opportunities for learning 3D detection at reduced annotation cost.
KW - 3D annotation
KW - 3D object detection
KW - Autonomous driving
KW - Cascade inference
KW - Weakly supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85102266069&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2021.3063611
DO - 10.1109/TPAMI.2021.3063611
M3 - Article
C2 - 33656990
AN - SCOPUS:85102266069
SN - 0162-8828
VL - 44
SP - 4454
EP - 4468
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 8
ER -