TY - JOUR
T1 - Interactive Image Segmentation Fusing Two-Stage Features and Transformer Encoding
AU - Feng, Jun
AU - Zhang, Tian
AU - Shi, Yichen
AU - Wang, Hui
AU - Hu, Jingjing
N1 - Publisher Copyright:
© 2024 Institute of Computing Technology. All rights reserved.
PY - 2024/6
Y1 - 2024/6
N2 - To segment the foreground objects that users are interested in quickly and accurately, and to obtain high-quality, low-cost segmentation annotations, an interactive image segmentation algorithm based on two-stage feature fusion and a Transformer encoder is proposed. First, a lightweight Transformer backbone network extracts multi-scale feature encodings from the input image, making better use of context information. Then, subjective prior knowledge is introduced through click interaction, and the interactive features are fused into the Transformer network through a primary stage and an enhancement stage in turn. Finally, atrous convolution, an attention mechanism, and a multi-layer perceptron are combined to decode the feature maps produced by the backbone network. Experimental results show that the mNoC@90% values of the proposed algorithm on the GrabCut, Berkeley, and DAVIS datasets reach 2.18, 4.04, and 7.39 respectively, outperforming the comparison algorithms, while its time and space complexity is lower than that of f-BRS-B. The proposed algorithm is also robust to perturbations in interactive click position and click type. These results show that the algorithm can segment the objects users are interested in quickly, accurately, and stably, improving the user interaction experience.
AB - To segment the foreground objects that users are interested in quickly and accurately, and to obtain high-quality, low-cost segmentation annotations, an interactive image segmentation algorithm based on two-stage feature fusion and a Transformer encoder is proposed. First, a lightweight Transformer backbone network extracts multi-scale feature encodings from the input image, making better use of context information. Then, subjective prior knowledge is introduced through click interaction, and the interactive features are fused into the Transformer network through a primary stage and an enhancement stage in turn. Finally, atrous convolution, an attention mechanism, and a multi-layer perceptron are combined to decode the feature maps produced by the backbone network. Experimental results show that the mNoC@90% values of the proposed algorithm on the GrabCut, Berkeley, and DAVIS datasets reach 2.18, 4.04, and 7.39 respectively, outperforming the comparison algorithms, while its time and space complexity is lower than that of f-BRS-B. The proposed algorithm is also robust to perturbations in interactive click position and click type. These results show that the algorithm can segment the objects users are interested in quickly, accurately, and stably, improving the user interaction experience.
KW - Transformer encoder
KW - deep learning
KW - interactive feature fusion
KW - interactive image segmentation
KW - lightweight network
UR - http://www.scopus.com/inward/record.url?scp=85202821911&partnerID=8YFLogxK
U2 - 10.3724/SP.J.1089.2024.19922
DO - 10.3724/SP.J.1089.2024.19922
M3 - Article
AN - SCOPUS:85202821911
SN - 1003-9775
VL - 36
SP - 831
EP - 843
JO - Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics
JF - Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics
IS - 6
ER -