Efficient Multispectral Object Detection with attentive feature aggregation leveraging zero-shot implicit illumination guidance

Zhongxia Xiong, Ziying Yao, Xuan Liu, Wenyao Zhao, Jie Cao, Xinkai Wu*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

With visible imagery and thermal sensing data, multispectral object detection facilitates around-the-clock perception for applications such as autonomous driving. Infrared input serves as auxiliary data for cross-modality feature aggregation, a common approach demonstrated to be successful by numerous previous studies. Nevertheless, despite the inclusion of complex and time-consuming modules in many existing methods, effective information fusion remains a formidable challenge due to severe spatiotemporal misalignment and modality imbalance between visible and thermal images. Thus, this paper intends to lift both the accuracy and speed for RGB-infrared perception. To this end, an illumination-guided attentive feature aggregation model (EMOD) is introduced to achieve Efficient Multispectral Object Detection. Firstly, EMOD employs feature fusion with a local-to-nonlocal cross-modality attention mechanism, which not only mitigates pixel-wise positional variation but also captures context-level complementary information. Furthermore, to address the modality imbalance issue, a signal indicating illumination conditions is implicitly embedded into the aggregation module to guide attentive computation. Unlike previous works, this signal is more potent and practical as it functions by denoting regional lighting conditions and without requiring additional training labels. Comprehensive experiments are conducted on three widely used datasets, including KAIST, CVC-14 and FLIR. Without bells and whistles, EMOD surpasses state-of-the-art approaches in terms of both effectiveness and efficiency. For example, it achieves a 5.96 MR score on KAIST while maintaining a speed of 28 FPS on a low-cost GPU.
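The abstract describes the illumination signal only at a high level. As a rough intuition for how a label-free, region-level lighting cue could steer RGB-thermal fusion, here is a minimal sketch. It is not the EMOD architecture: EMOD embeds the signal implicitly inside an attention module, whereas this sketch assumes an explicit mean-brightness proxy and a simple linear blend. The function names `regional_illumination` and `fuse`, the grid partition, and the toy inputs are all hypothetical.

```python
# Illustrative only: NOT the EMOD implementation. All names are hypothetical.

def regional_illumination(gray, grid):
    """Return mean brightness in [0, 1] for each cell of a grid x grid partition.

    `gray` is a 2-D list of pixel intensities in [0, 255]. No training labels
    are needed: the cue is computed directly from the visible image.
    Assumes `grid` does not exceed the image height or width.
    """
    h, w = len(gray), len(gray[0])
    cells = []
    for gy in range(grid):
        y0, y1 = gy * h // grid, (gy + 1) * h // grid
        row = []
        for gx in range(grid):
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / (255.0 * len(block)))
        cells.append(row)
    return cells


def fuse(rgb_feat, thermal_feat, illum):
    """Blend per-region features: bright regions lean on RGB, dark ones on thermal."""
    return [
        [w * r + (1.0 - w) * t for r, t, w in zip(r_row, t_row, w_row)]
        for r_row, t_row, w_row in zip(rgb_feat, thermal_feat, illum)
    ]


# Toy 4x4 image: left half fully lit, right half fully dark.
gray = [[255, 255, 0, 0] for _ in range(4)]
illum = regional_illumination(gray, 2)
fused = fuse([[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]], illum)
```

In this toy case the lit regions keep the RGB feature value and the dark regions fall back to the thermal one, which is the kind of modality rebalancing the abstract attributes to the illumination guidance.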

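Original language: English
Article number: 102939
Journal: Information Fusion
Volume: 118
DOI: https://doi.org/10.1016/j.inffus.2025.102939
Publication status: Published - Jun 2025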

Cite this

Xiong, Z., Yao, Z., Liu, X., Zhao, W., Cao, J., & Wu, X. (2025). Efficient Multispectral Object Detection with attentive feature aggregation leveraging zero-shot implicit illumination guidance. Information Fusion, 118, Article 102939. https://doi.org/10.1016/j.inffus.2025.102939