TY - JOUR
T1 - CSMR: A Multi-Modal Registered Dataset for Complex Scenarios
AU - Li, Chenrui
AU - Gao, Kun
AU - Hu, Zibo
AU - Yang, Zhijia
AU - Cai, Mingfeng
AU - Cheng, Haobo
AU - Zhu, Zhenyu
N1 - Publisher Copyright:
© 2025 by the authors.
PY - 2025/3
Y1 - 2025/3
AB - Complex scenarios pose challenges for computer vision tasks, including image fusion, object detection, and image-to-image translation. On the one hand, complex scenarios involve fluctuating weather and lighting conditions, under which even images of the same scene can appear markedly different. On the other hand, the abundance of textural detail in such images introduces considerable interference that can conceal useful information. An effective solution to these problems is to exploit the complementary details present in multi-modal images, such as visible-light and infrared images: visible-light images contain rich textural information, while infrared images capture temperature information. In this study, we propose a multi-modal registered dataset for complex scenarios under various environmental conditions, targeting security surveillance and the monitoring of low-slow-small targets. The dataset contains 30,819 images, in which the targets are labeled in three classes (“person”, “car”, and “drone”) with YOLO-format bounding boxes. We compared our dataset with those used in the literature for computer vision tasks, including image fusion, object detection, and image-to-image translation. The results showed that introducing complementary information through image fusion can compensate for details missing from the original images, and they also revealed the limitations of performing visual tasks on single-modal images of complex scenarios.
KW - image fusion
KW - image-to-image translation
KW - infrared and visible dataset
KW - object detection
UR - http://www.scopus.com/inward/record.url?scp=86000513490&partnerID=8YFLogxK
U2 - 10.3390/rs17050844
DO - 10.3390/rs17050844
M3 - Article
AN - SCOPUS:86000513490
SN - 2072-4292
VL - 17
JO - Remote Sensing
JF - Remote Sensing
IS - 5
M1 - 844
ER -