Lightweight multi-level feature difference fusion network for RGB-D-T salient object detection

Kechen Song; Han Wang; Ying Zhao; Liming Huang; Hongwen Dong; Yunhui Yan

doi:10.1016/j.jksuci.2023.101702

Corresponding author: Yunhui Yan

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

In recent years, bimodal salient object detection (SOD) has developed rapidly. Because multimodal methods are robust in extreme situations such as background similarity and illumination variation, researchers have begun to focus on RGB-Depth-Thermal salient object detection (RGB-D-T SOD). However, most existing bimodal methods incur high computational costs to achieve accurate predictions, and the problem is even more severe for three-modal methods, which limits their applicability. To address this, we propose the first lightweight multi-level feature difference fusion network (MFDF) for real-time RGB-D-T SOD. Since the depth modality contains less useful information, we design an asymmetric three-stream encoder based on MobileNetV2. Because high-level and low-level features differ in semantics and detail, applying the same module to every level indiscriminately would introduce a large number of redundant parameters. Instead, in the encoding stage, we introduce a cross-modal enhancement module (CME) to fuse low-level features and a cross-modal fusion module (CMF) to fuse high-level features. To further reduce redundant parameters, we design a low-level feature decoding module (LFD) and a multi-scale high-level feature fusion module (MHFF). Extensive experiments show that MFDF compares favorably with 17 state-of-the-art methods. In terms of efficiency, MFDF runs at 124 FPS on 320 × 320 images and has only 8.9 M parameters.
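The abstract names the modules but does not describe their internals. The following is a minimal, hypothetical Python sketch of the level-aware assignment it describes, in which low-level features go to CME and high-level features go to CMF rather than through one shared module. The five-level split (levels 1-2 treated as low-level, 3-5 as high-level), the pairing of CME with LFD and CMF with MHFF, and the helper names `assign_modules` and `ms_per_frame` are assumptions for illustration, not the authors' implementation.

```python
"""Hypothetical sketch of MFDF's level-aware module assignment.

Assumptions (not stated in the abstract): five encoder levels, with levels
1-2 treated as low-level and levels 3-5 as high-level. Module names come from
the abstract; their internals are not modeled here.
"""
from typing import Dict, Tuple

# Assumed pairing: (encoder fusion module, decoder-side module) per feature type.
LOW_LEVEL_MODULES: Tuple[str, str] = ("CME", "LFD")
HIGH_LEVEL_MODULES: Tuple[str, str] = ("CMF", "MHFF")


def assign_modules(num_levels: int = 5, num_low: int = 2) -> Dict[int, Tuple[str, str]]:
    """Map each encoder level (1-indexed) to its (fusion, decoding) module pair."""
    if not 0 < num_low < num_levels:
        raise ValueError("need at least one low-level and one high-level stage")
    plan: Dict[int, Tuple[str, str]] = {}
    for level in range(1, num_levels + 1):
        # Low levels carry detail, high levels carry semantics, so each group
        # gets its own specialised modules instead of one shared design.
        plan[level] = LOW_LEVEL_MODULES if level <= num_low else HIGH_LEVEL_MODULES
    return plan


def ms_per_frame(fps: float) -> float:
    """Convert throughput in frames per second to average milliseconds per frame."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


if __name__ == "__main__":
    for level, (fuse, decode) in assign_modules().items():
        print(f"level {level}: fuse with {fuse}, decode with {decode}")
    print(f"124 FPS ~ {ms_per_frame(124):.2f} ms per 320x320 frame")
```

Running the script prints the module plan for each level and converts the reported 124 FPS into an average of roughly 8.06 ms per 320 × 320 frame (1000 / 124); that conversion assumes frames are processed one at a time rather than in batches.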

Original language	English
Article number	101702
Journal	Journal of King Saud University - Computer and Information Sciences
Volume	35
Issue number	8
DOIs	https://doi.org/10.1016/j.jksuci.2023.101702
Publication status	Published - Sept 2023
Externally published	Yes

Keywords

Cross-modal feature fusion
Lightweight network
RGB-depth-thermal images
Salient object detection


Cite this

Song, K., Wang, H., Zhao, Y., Huang, L., Dong, H., & Yan, Y. (2023). Lightweight multi-level feature difference fusion network for RGB-D-T salient object detection. Journal of King Saud University - Computer and Information Sciences, 35(8), Article 101702. https://doi.org/10.1016/j.jksuci.2023.101702

@article{a9fd4452b6794c65a890fa54337a3d51,

title = "Lightweight multi-level feature difference fusion network for {RGB-D-T} salient object detection",

abstract = "In recent years, bimodal salient object detection has developed rapidly. In view of the advanced performance of their robustness to extreme situations such as background similarity and illumination variation, researchers began to focus on RGB-Depth-Thermal salient object detection (RGB-D-T SOD). However, most existing bimodal methods usually need expensive computational costs to complete accurate prediction, and this situation is even more serious for three-modal methods, which undoubtedly limits their applicability. To solve this problem, we are the first to propose a lightweight multi-level feature difference fusion network (MFDF) for real-time RGB-D-T SOD. In view of the depth modality contains less useful information, we design an asymmetric three-stream encoder based on MobileNetV2. Due to the differences in semantics and details between high and low level features, using the same module without discrimination will lead to a large number of redundant parameters. On the contrary, in the coding stage, we introduce a cross-modal enhancement module (CME) and a cross-modal fusion module (CMF) to fuse low-level and high-level features respectively. In order to reduce redundant parameters, we design a low-level feature decoding module (LFD) and a multi-scale high-level feature fusion module (MHFF). A great deal of experiments proves that the proposed MFDF has more advantages than the 17 state-of-the-art methods. On the efficiency side, MFDF has a faster speed (124 FPS when the image size is 320 × 320) and much fewer parameters (8.9 M).",

keywords = "Cross-modal feature fusion, Lightweight network, RGB-depth-thermal images, Salient object detection",

author = "Kechen Song and Han Wang and Ying Zhao and Liming Huang and Hongwen Dong and Yunhui Yan",

note = "Publisher Copyright: {\textcopyright} 2023 The Author(s)",

year = "2023",

month = sep,

doi = "10.1016/j.jksuci.2023.101702",

language = "English",

volume = "35",

journal = "Journal of King Saud University - Computer and Information Sciences",

issn = "1319-1578",

publisher = "King Saud bin Abdulaziz University",

number = "8",

}

TY  - JOUR
T1  - Lightweight multi-level feature difference fusion network for RGB-D-T salient object detection
AU  - Song, Kechen
AU  - Wang, Han
AU  - Zhao, Ying
AU  - Huang, Liming
AU  - Dong, Hongwen
AU  - Yan, Yunhui
PY  - 2023/9
Y1  - 2023/9
N2  - In recent years, bimodal salient object detection has developed rapidly. In view of the advanced performance of their robustness to extreme situations such as background similarity and illumination variation, researchers began to focus on RGB-Depth-Thermal salient object detection (RGB-D-T SOD). However, most existing bimodal methods usually need expensive computational costs to complete accurate prediction, and this situation is even more serious for three-modal methods, which undoubtedly limits their applicability. To solve this problem, we are the first to propose a lightweight multi-level feature difference fusion network (MFDF) for real-time RGB-D-T SOD. In view of the depth modality contains less useful information, we design an asymmetric three-stream encoder based on MobileNetV2. Due to the differences in semantics and details between high and low level features, using the same module without discrimination will lead to a large number of redundant parameters. On the contrary, in the coding stage, we introduce a cross-modal enhancement module (CME) and a cross-modal fusion module (CMF) to fuse low-level and high-level features respectively. In order to reduce redundant parameters, we design a low-level feature decoding module (LFD) and a multi-scale high-level feature fusion module (MHFF). A great deal of experiments proves that the proposed MFDF has more advantages than the 17 state-of-the-art methods. On the efficiency side, MFDF has a faster speed (124 FPS when the image size is 320 × 320) and much fewer parameters (8.9 M).
AB  - In recent years, bimodal salient object detection has developed rapidly. In view of the advanced performance of their robustness to extreme situations such as background similarity and illumination variation, researchers began to focus on RGB-Depth-Thermal salient object detection (RGB-D-T SOD). However, most existing bimodal methods usually need expensive computational costs to complete accurate prediction, and this situation is even more serious for three-modal methods, which undoubtedly limits their applicability. To solve this problem, we are the first to propose a lightweight multi-level feature difference fusion network (MFDF) for real-time RGB-D-T SOD. In view of the depth modality contains less useful information, we design an asymmetric three-stream encoder based on MobileNetV2. Due to the differences in semantics and details between high and low level features, using the same module without discrimination will lead to a large number of redundant parameters. On the contrary, in the coding stage, we introduce a cross-modal enhancement module (CME) and a cross-modal fusion module (CMF) to fuse low-level and high-level features respectively. In order to reduce redundant parameters, we design a low-level feature decoding module (LFD) and a multi-scale high-level feature fusion module (MHFF). A great deal of experiments proves that the proposed MFDF has more advantages than the 17 state-of-the-art methods. On the efficiency side, MFDF has a faster speed (124 FPS when the image size is 320 × 320) and much fewer parameters (8.9 M).
KW  - Cross-modal feature fusion
KW  - Lightweight network
KW  - RGB-depth-thermal images
KW  - Salient object detection
UR  - http://www.scopus.com/inward/record.url?scp=85168609397&partnerID=8YFLogxK
U2  - 10.1016/j.jksuci.2023.101702
DO  - 10.1016/j.jksuci.2023.101702
M3  - Article
AN  - SCOPUS:85168609397
SN  - 1319-1578
VL  - 35
JO  - Journal of King Saud University - Computer and Information Sciences
JF  - Journal of King Saud University - Computer and Information Sciences
IS  - 8
M1  - 101702
ER  - 
