TY - JOUR
T1 - CoGANet: Co-Guided Attention Network for Salient Object Detection
AU - Zhao, Yufei
AU - Song, Yong
AU - Li, Guoqi
AU - Huang, Yi
AU - Bai, Yashuo
AU - Zhou, Ya
AU - Hao, Qun
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022/8/1
Y1 - 2022/8/1
N2 - Recent salient object detection methods are mainly based on Convolutional Neural Networks (CNNs). Most of them adopt a U-shaped architecture to extract and fuse multi-scale features. Coarser-level semantic information is progressively transmitted to finer-level layers through repeated upsampling operations, so the coarsest-level features are gradually diluted and the salient object boundary becomes blurred. On the other hand, hand-crafted features have the advantage of being purposive and easy to compute; among them, the edge density feature, with its rich edge information, may help sharpen the salient object boundary. In this paper, we propose a Co-Guided Attention Network (CoGANet). Built on the Feature Pyramid Network (FPN), our model implements a co-guided attention mechanism between the image itself and its edge density feature. In the bottom-up pathway of the FPN, two streams work separately, taking the original image and its edge density feature as inputs and each producing five feature maps. The last feature map in each stream then generates a set of attention maps through a Multi-scale Spatial Attention Module (MSAM). In the top-down pathway, the attention maps of one stream are delivered directly to each stage of the other stream, where they are fused with the feature maps by an Attention-based Feature Fusion Module (AFFM). Finally, an accurate saliency map is produced by fusing the finest-level outputs of the two streams. Experimental results on five benchmark datasets demonstrate that our model is superior to 13 state-of-the-art methods in terms of four evaluation metrics.
KW - Convolutional Neural Networks
KW - Hand-crafted feature
KW - Salient object detection
KW - Spatial attention
UR - http://www.scopus.com/inward/record.url?scp=85135207093&partnerID=8YFLogxK
U2 - 10.1109/JPHOT.2022.3192014
DO - 10.1109/JPHOT.2022.3192014
M3 - Article
AN - SCOPUS:85135207093
SN - 1943-0655
VL - 14
JO - IEEE Photonics Journal
JF - IEEE Photonics Journal
IS - 4
M1 - 7842812
ER -