TY - JOUR
T1 - Real-Time Multi-Modal Active Vision for Object Detection on UAVs Equipped with Limited Field of View LiDAR and Camera
AU - Shi, Chuanbeibei
AU - Lai, Ganghua
AU - Yu, Yushu
AU - Bellone, Mauro
AU - Lippiello, Vincenzo
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023/10/1
Y1 - 2023/10/1
N2 - This letter addresses the challenging problem of multi-modal active vision for object detection on unmanned aerial vehicles (UAVs) equipped with a monocular camera and a limited Field of View (FoV) LiDAR. The point cloud acquired from the low-cost LiDAR is first converted into a 3-channel tensor via motion compensation, accumulation, projection, and up-sampling. The resulting 3-channel point cloud tensor and the RGB image are fused into a 6-channel tensor using an early fusion strategy for object detection based on a Gaussian YOLO network structure. To cope with limited onboard computational resources and improve real-time performance, the velocity information of the UAV is further fused with the detection results using an extended Kalman filter (EKF). A perception-aware model predictive control (MPC) scheme is designed to achieve active vision on our UAV. In our performance evaluation, the proposed pre-processing step reduces the running time of comparable methods from the literature by a factor of 10 while maintaining acceptable detection performance. Furthermore, our fusion architecture reaches 94.6 mAP on the test set, outperforming the individual sensor networks by roughly 5%. We also describe an implementation of the overall algorithm on a UAV platform and validate it in real-world experiments.
AB - This letter addresses the challenging problem of multi-modal active vision for object detection on unmanned aerial vehicles (UAVs) equipped with a monocular camera and a limited Field of View (FoV) LiDAR. The point cloud acquired from the low-cost LiDAR is first converted into a 3-channel tensor via motion compensation, accumulation, projection, and up-sampling. The resulting 3-channel point cloud tensor and the RGB image are fused into a 6-channel tensor using an early fusion strategy for object detection based on a Gaussian YOLO network structure. To cope with limited onboard computational resources and improve real-time performance, the velocity information of the UAV is further fused with the detection results using an extended Kalman filter (EKF). A perception-aware model predictive control (MPC) scheme is designed to achieve active vision on our UAV. In our performance evaluation, the proposed pre-processing step reduces the running time of comparable methods from the literature by a factor of 10 while maintaining acceptable detection performance. Furthermore, our fusion architecture reaches 94.6 mAP on the test set, outperforming the individual sensor networks by roughly 5%. We also describe an implementation of the overall algorithm on a UAV platform and validate it in real-world experiments.
KW - Aerial systems: applications
KW - perception-action coupling
KW - sensor fusion
UR - http://www.scopus.com/inward/record.url?scp=85170560250&partnerID=8YFLogxK
U2 - 10.1109/LRA.2023.3309575
DO - 10.1109/LRA.2023.3309575
M3 - Article
AN - SCOPUS:85170560250
SN - 2377-3766
VL - 8
SP - 6571
EP - 6578
JO - IEEE Robotics and Automation Letters
JF - IEEE Robotics and Automation Letters
IS - 10
ER -