Mix-Teaching: A Simple, Unified and Effective Semi-Supervised Learning Framework for Monocular 3D Object Detection

Lei Yang, Xinyu Zhang*, Jun Li, Li Wang, Minghan Zhu, Chuang Zhang, Huaping Liu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)

Abstract

Semi-supervised learning (SSL) has promising potential for improving model performance using both labelled and unlabelled data. Since recovering 3D information from 2D images is an ill-posed problem, the current state-of-the-art methods for monocular 3D object detection (Mono3D) have relatively low precision and recall, making semi-supervised learning for Mono3D tasks challenging and understudied. In this work, we propose a unified and effective semi-supervised learning framework called Mix-Teaching that can be applied to most monocular 3D object detectors. Based on the idea of decomposition and recombination, unlabelled samples are first decomposed into collections of image patches with high-quality predictions and collections of background images containing no objects. The recombination operation then produces mixed images containing dense instances with high-quality pseudo-labels, on which the student model is trained. In addition, we propose an uncertainty-based filter to distinguish high-quality pseudo-labels from noisy predictions during the decomposition process. On the KITTI and nuScenes benchmarks, Mix-Teaching consistently improves MonoFlex and GUPNet by significant margins under various labelling ratios. Our method improves AP3D by around 6.34% over GUPNet on the validation set when using only 10% of the labelled data. Using the full training set and the additional 38K raw images from KITTI, it further improves MonoFlex by 4.65% absolute AP3D for car detection, reaching 18.54% AP3D, which ranks first among all monocular-based methods on the KITTI test leaderboard.
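The decomposition and recombination idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `decompose` and `recombine`, the `max_uncertainty` threshold, the rule that an image counts as background when the teacher predicts no boxes, and the choice to paste each patch back at its original pixel location are all assumptions made for illustration.

```python
import numpy as np

def decompose(image, boxes, uncertainties, max_uncertainty=0.2):
    """Keep only teacher predictions whose uncertainty passes the filter.

    image: H x W x C array; boxes: list of integer (x1, y1, x2, y2) boxes;
    uncertainties: one score per box (lower is better, assumed convention).
    Returns (patches, is_background).
    """
    patches = []
    for (x1, y1, x2, y2), u in zip(boxes, uncertainties):
        if u <= max_uncertainty:  # hypothetical uncertainty-based filter
            patches.append({"pixels": image[y1:y2, x1:x2].copy(),
                            "box": (x1, y1, x2, y2)})
    # Assumption: an image with no predicted objects serves as background.
    is_background = len(boxes) == 0
    return patches, is_background

def recombine(background, patches):
    """Paste confident patches onto a background image of the same size.

    Returns the mixed image and the pseudo-label boxes it now contains.
    """
    mixed = background.copy()
    pseudo_labels = []
    for p in patches:
        x1, y1, x2, y2 = p["box"]
        mixed[y1:y2, x1:x2] = p["pixels"]  # same location as in the source image
        pseudo_labels.append(p["box"])
    return mixed, pseudo_labels
```

In this sketch the student model would be trained on `mixed` with `pseudo_labels` as targets. A real pipeline would also need to carry the 3D box parameters with each patch and handle overlapping patches, both of which are omitted here.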

Original language: English
Pages (from-to): 6832-6844
Number of pages: 13
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 33
Issue number: 11
DOIs
Publication status: Published - 1 Nov 2023
Externally published: Yes

Keywords

  • 3D object detection
  • Semi-supervised learning
  • Autonomous driving
