Self-Enhanced Mixed Attention Network for Three-Modal Images Few-Shot Semantic Segmentation

Kechen Song, Yiming Zhang, Yanqi Bao*, Ying Zhao, Yunhui Yan

*Corresponding author for this work

Research output: Journal article, peer-reviewed

3 citations (Scopus)

Abstract

Image segmentation is an important computer vision technique that is widely used across many tasks. In extreme cases, however, insufficient illumination can severely degrade model performance, so a growing number of fully supervised methods take multi-modal images as input. Large, densely annotated datasets are difficult to obtain, whereas few-shot methods can still achieve satisfactory results with only a few pixel-annotated samples. We therefore propose a few-shot semantic segmentation method for Visible-Depth-Thermal (three-modal) images. It exploits both the homogeneous information shared by the three modalities and the complementary information between them, which improves few-shot segmentation performance. We construct a novel indoor dataset, VDT-2048-5i, for the three-modal few-shot semantic segmentation task, and propose a Self-Enhanced Mixed Attention Network (SEMANet) consisting of a Self-Enhanced (SE) module and a Mixed Attention (MA) module. The SE module amplifies the differences between different kinds of features and strengthens the weak connections among foreground features. The MA module fuses the three-modal features into a better feature representation. Compared with the most advanced previous methods, our model improves mIoU by 3.8% and 3.3% in the 1-shot and 5-shot settings, respectively, achieving state-of-the-art performance. In future work, we will address failure cases by obtaining more discriminative and robust feature representations, and explore achieving high performance with fewer parameters and lower computational cost.

Original language: English
Article number: 6612
Journal: Sensors
Volume: 23
Issue: 14
Publication status: Published - July 2023
Externally published
