Learning Fine-Grained Information with Capsule-Wise Attention for Salient Object Detection

Sanyuan Zhao; Zongzheng Wen; Qi Qi; Kin Man Lam; Jianbing Shen

doi:10.1109/TMM.2023.3234436

School of Computer Science

Research output: Contribution to journal › Article › Peer-review

5 citations (Scopus)

Abstract

With the popularity of convolutional neural networks for salient object detection (SOD), performance has improved significantly. However, how to integrate crucial features for modeling salient objects needs further exploration. In this work, we propose an effective feature selection scheme for this task. First, we provide a Simplified Atrous Spatial Pyramid Pooling (SASPP) module to make the multi-scale features lightweight. Operating on the SASPP features, we design a pixel-level local feature selection scheme named Multi-Scale Capsule-wise Attention (MSCA), which aggregates features across multiple scales by dynamic routing and helps the network generate fine-grained prediction maps. In addition, we exploit holistic features with the Spatial-wise Attention and Channel-wise Attention (SA/CA) mechanisms, which adaptively extract spatial or channel information. We also propose a Multi-crossed Layer Connections (MLC) structure in the upsampling stage to fuse features not only from different levels but also from different scales. The salient object prediction is performed in a coarse-to-fine manner. Comprehensive experiments on five benchmark datasets show that our method achieves the best performance compared with existing state-of-the-art approaches.
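The abstract states that MSCA aggregates multi-scale features "by dynamic routing" but gives no formulation. As a point of reference only, the sketch below implements the generic routing-by-agreement procedure of Sabour et al. (2017) in NumPy; it is not the authors' MSCA module. Treating the per-scale feature vectors at one pixel as input capsules, the hypothetical transform tensor `W`, the capsule sizes, and the choice of three routing iterations are all illustrative assumptions.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-8):
    """Capsule non-linearity: keeps the direction of s, maps its length into [0, 1)."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def softmax(x, axis=-1):
    x = x - np.max(x, axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def dynamic_routing(u_hat, num_iters=3):
    """Generic routing-by-agreement (Sabour et al., 2017), not the MSCA formulation.

    u_hat: array of shape (num_in, num_out, dim); u_hat[i, j] is input
    capsule i's prediction for output capsule j.
    Returns (v, c): output capsules (num_out, dim) and the coupling
    coefficients (num_in, num_out) used in the final iteration.
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                # routing logits, start uniform
    for _ in range(num_iters):
        c = softmax(b, axis=1)                     # each input splits its vote over outputs
        s = np.einsum("ij,ijd->jd", c, u_hat)      # coupling-weighted sum of predictions
        v = squash(s)                              # output capsules
        b = b + np.einsum("ijd,jd->ij", u_hat, v)  # raise logits where prediction agrees with v
    return v, c


# Toy usage (hypothetical sizes): 3 scale-specific 8-d features at one pixel
# act as input capsules routed to 2 output capsules of dimension 4.
rng = np.random.default_rng(0)
scale_feats = rng.standard_normal((3, 8))
W = rng.standard_normal((3, 2, 8, 4)) * 0.1          # hypothetical per-(scale, output) transforms
u_hat = np.einsum("id,ijde->ije", scale_feats, W)    # predictions, shape (3, 2, 4)
v, c = dynamic_routing(u_hat)
print(v.shape, c.shape)  # (2, 4) (3, 2)
```

Each input capsule distributes its vote over the output capsules through `softmax(b, axis=1)`, and the logits grow wherever a prediction agrees (large dot product) with the current output, so mutually consistent scales come to dominate the aggregated vector. How MSCA turns this agreement into pixel-level attention may well differ from this generic version.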

Original language	English
Pages (from-to)	1-14
Number of pages	14
Journal	IEEE Transactions on Multimedia
DOI	https://doi.org/10.1109/TMM.2023.3234436
Publication status	Accepted/In press - 2023

Cite this

@article{68943ff4ca0241b6a37bfb8fba6491b5,

title = "Learning Fine-Grained Information with Capsule-Wise Attention for Salient Object Detection",

abstract = "With the popularity of convolutional neural networks for salient object detection (SOD), performance has improved significantly. However, how to integrate crucial features for modeling salient objects needs further exploration. In this work, we propose an effective feature selection scheme for this task. First, we provide a Simplified Atrous Spatial Pyramid Pooling (SASPP) module to make the multi-scale features lightweight. Operating on the SASPP features, we design a pixel-level local feature selection scheme named Multi-Scale Capsule-wise Attention (MSCA), which aggregates features across multiple scales by dynamic routing and helps the network generate fine-grained prediction maps. In addition, we exploit holistic features with the Spatial-wise Attention and Channel-wise Attention (SA/CA) mechanisms, which adaptively extract spatial or channel information. We also propose a Multi-crossed Layer Connections (MLC) structure in the upsampling stage to fuse features not only from different levels but also from different scales. The salient object prediction is performed in a coarse-to-fine manner. Comprehensive experiments on five benchmark datasets show that our method achieves the best performance compared with existing state-of-the-art approaches.",

keywords = "Capsule-wise attention, Context modeling, Feature extraction, Fuses, Object detection, Predictive models, Task analysis, Visualization, feature attention, salient object detection",

author = "Sanyuan Zhao and Zongzheng Wen and Qi Qi and Lam, {Kin Man} and Jianbing Shen",

note = "Publisher Copyright: IEEE",

year = "2023",

doi = "10.1109/TMM.2023.3234436",

language = "English",

pages = "1--14",

journal = "IEEE Transactions on Multimedia",

issn = "1520-9210",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY  - JOUR
T1  - Learning Fine-Grained Information with Capsule-Wise Attention for Salient Object Detection
AU  - Zhao, Sanyuan
AU  - Wen, Zongzheng
AU  - Qi, Qi
AU  - Lam, Kin Man
AU  - Shen, Jianbing
N1  - Publisher Copyright: IEEE
PY  - 2023
Y1  - 2023
N2  - With the popularity of convolutional neural networks for salient object detection (SOD), performance has improved significantly. However, how to integrate crucial features for modeling salient objects needs further exploration. In this work, we propose an effective feature selection scheme for this task. First, we provide a Simplified Atrous Spatial Pyramid Pooling (SASPP) module to make the multi-scale features lightweight. Operating on the SASPP features, we design a pixel-level local feature selection scheme named Multi-Scale Capsule-wise Attention (MSCA), which aggregates features across multiple scales by dynamic routing and helps the network generate fine-grained prediction maps. In addition, we exploit holistic features with the Spatial-wise Attention and Channel-wise Attention (SA/CA) mechanisms, which adaptively extract spatial or channel information. We also propose a Multi-crossed Layer Connections (MLC) structure in the upsampling stage to fuse features not only from different levels but also from different scales. The salient object prediction is performed in a coarse-to-fine manner. Comprehensive experiments on five benchmark datasets show that our method achieves the best performance compared with existing state-of-the-art approaches.
AB  - With the popularity of convolutional neural networks for salient object detection (SOD), performance has improved significantly. However, how to integrate crucial features for modeling salient objects needs further exploration. In this work, we propose an effective feature selection scheme for this task. First, we provide a Simplified Atrous Spatial Pyramid Pooling (SASPP) module to make the multi-scale features lightweight. Operating on the SASPP features, we design a pixel-level local feature selection scheme named Multi-Scale Capsule-wise Attention (MSCA), which aggregates features across multiple scales by dynamic routing and helps the network generate fine-grained prediction maps. In addition, we exploit holistic features with the Spatial-wise Attention and Channel-wise Attention (SA/CA) mechanisms, which adaptively extract spatial or channel information. We also propose a Multi-crossed Layer Connections (MLC) structure in the upsampling stage to fuse features not only from different levels but also from different scales. The salient object prediction is performed in a coarse-to-fine manner. Comprehensive experiments on five benchmark datasets show that our method achieves the best performance compared with existing state-of-the-art approaches.
KW  - Capsule-wise attention
KW  - Context modeling
KW  - Feature extraction
KW  - Fuses
KW  - Object detection
KW  - Predictive models
KW  - Task analysis
KW  - Visualization
KW  - feature attention
KW  - salient object detection
UR  - http://www.scopus.com/inward/record.url?scp=85147229353&partnerID=8YFLogxK
U2  - 10.1109/TMM.2023.3234436
DO  - 10.1109/TMM.2023.3234436
M3  - Article
AN  - SCOPUS:85147229353
SN  - 1520-9210
SP  - 1
EP  - 14
JO  - IEEE Transactions on Multimedia
JF  - IEEE Transactions on Multimedia
ER  - 
