TY - JOUR
T1 - Diffusion-Based Reconstruction of 3-D Occupancy Maps From 4-D Radar Tensors
AU - Yang, Fan
AU - Li, Xueyuan
AU - Du, Minggang
AU - Jiang, Yutong
AU - Qiao, Fandong
AU - Niu, Zhi
N1 - Publisher Copyright: © 2026 IEEE. All rights reserved.
PY - 2026/5/1
Y1 - 2026/5/1
N2 - The 4-D radar sensing provides rich measurements across Doppler, range, azimuth, and elevation dimensions, offering strong resilience in adverse environmental conditions. However, reconstructing accurate 3-D occupancy maps from 4-D radar tensors (4DRT) remains challenging due to low spatial resolution, nonuniform sampling, and artifacts caused by sidelobes and multipath reflections. This article proposes a diffusion-based framework for 3-D occupancy reconstruction that directly leverages the encoded structure of 4DRT data. The pipeline consists of three components: a feature-preserving dimensionality reduction module that produces Doppler-aware sparse descriptors from raw 4DRT; a hierarchical pillar-based representation that encodes vertical geometry using normalized height segments within a compact and structured spatial format; and a conditional diffusion model that iteratively denoises latent occupancy predictions. To encode the sparse and irregular 4DRT inputs, we design a hybrid condition encoder that combines convolutional layers with self-attention to extract both local and global contextual features. These are injected into a U-Net-based denoising network via cross-attention to generate dense and spatially consistent 3-D occupancy volumes. Extensive experiments on the Coloradar benchmark and real-world driving data demonstrate that the proposed method consistently outperforms the state-of-the-art baseline, improving intersection over union (IoU) from at most 3.3% to over 30% across diverse scenes, while reducing the Chamfer distance (CD) by more than 4× and maintaining a single-frame inference latency of approximately 50 ms.
AB - The 4-D radar sensing provides rich measurements across Doppler, range, azimuth, and elevation dimensions, offering strong resilience in adverse environmental conditions. However, reconstructing accurate 3-D occupancy maps from 4-D radar tensors (4DRT) remains challenging due to low spatial resolution, nonuniform sampling, and artifacts caused by sidelobes and multipath reflections. This article proposes a diffusion-based framework for 3-D occupancy reconstruction that directly leverages the encoded structure of 4DRT data. The pipeline consists of three components: a feature-preserving dimensionality reduction module that produces Doppler-aware sparse descriptors from raw 4DRT; a hierarchical pillar-based representation that encodes vertical geometry using normalized height segments within a compact and structured spatial format; and a conditional diffusion model that iteratively denoises latent occupancy predictions. To encode the sparse and irregular 4DRT inputs, we design a hybrid condition encoder that combines convolutional layers with self-attention to extract both local and global contextual features. These are injected into a U-Net-based denoising network via cross-attention to generate dense and spatially consistent 3-D occupancy volumes. Extensive experiments on the Coloradar benchmark and real-world driving data demonstrate that the proposed method consistently outperforms the state-of-the-art baseline, improving intersection over union (IoU) from at most 3.3% to over 30% across diverse scenes, while reducing the Chamfer distance (CD) by more than 4× and maintaining a single-frame inference latency of approximately 50 ms.
KW - 3-D occupancy mapping
KW - 4-D radar tensor (4DRT)
KW - autonomous driving
KW - diffusion models
KW - robotics perception
UR - https://www.scopus.com/pages/publications/105029382889
U2 - 10.1109/JIOT.2026.3661161
DO - 10.1109/JIOT.2026.3661161
M3 - Article
AN - SCOPUS:105029382889
SN - 2327-4662
VL - 13
SP - 19125
EP - 19140
JO - IEEE Internet of Things Journal
JF - IEEE Internet of Things Journal
IS - 9
ER -