Structure-Guided Image Inpainting Based on Multi-Scale Attention Pyramid Network

Jun Gong; Senlin Luo; Wenxin Yu; Liang Nie

doi:10.3390/app14188325

Jun Gong, Senlin Luo, Wenxin Yu, Liang Nie^*

^*Corresponding author for this work

Southwest University of Science and Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Current single-view image inpainting methods often suffer from low image information utilization and suboptimal repair outcomes. To address these challenges, this paper introduces a novel image inpainting framework that leverages a structure-guided multi-scale attention pyramid network. This network consists of a structural repair network and a multi-scale attention pyramid semantic repair network. The structural repair component utilizes a dual-branch U-Net network for robust structure prediction under strong constraints. The predicted structural view then serves as auxiliary information for the semantic repair network. This latter network exploits the pyramid structure to extract multi-scale features of the image, which are further refined through an attention feature fusion module. Additionally, a separable gated convolution strategy is employed during feature extraction to minimize the impact of invalid information from missing areas, thereby enhancing the restoration quality. Experiments conducted on standard datasets such as Paris Street View and CelebA demonstrate the superiority of our approach over existing methods through quantitative and qualitative comparisons. Further ablation studies, by incrementally integrating proposed mechanisms into a baseline model, substantiate the effectiveness of our multi-view restoration strategy, separable gated convolution, and multi-scale attention feature fusion.
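
The gating idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common gated-convolution formulation, in which a feature response is multiplied element-wise by a sigmoid gate computed from the same input, so that locations dominated by missing pixels can be suppressed. The abstract does not specify the paper's separable variant, so this is not the authors' implementation; the function names, the single-channel input, and the kernels are all hypothetical.

```python
import math


def conv2d_same(image, kernel):
    """Zero-padded 'same' 2D cross-correlation on a list-of-lists image."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    y, x = i + di - ph, j + dj - pw
                    # Positions outside the image count as zero padding.
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_conv2d(image, feature_kernel, gate_kernel, gate_bias=0.0):
    """Hypothetical gated convolution: features scaled by a soft gate in (0, 1)."""
    features = conv2d_same(image, feature_kernel)
    gate_logits = conv2d_same(image, gate_kernel)
    return [
        [f * sigmoid(g + gate_bias) for f, g in zip(f_row, g_row)]
        for f_row, g_row in zip(features, gate_logits)
    ]


if __name__ == "__main__":
    identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    zeros = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
    image = [[1.0, 2.0], [3.0, 4.0]]
    # A zero gate kernel gives a gate of 0.5 everywhere, halving each feature.
    print(gated_conv2d(image, identity, zeros))
```

In a trained network both kernels would be learned, and the gate would ideally approach zero inside the masked region; the hand-picked kernels here only make the arithmetic easy to follow.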

Original language	English
Article number	8325
Journal	Applied Sciences (Switzerland)
Volume	14
Issue number	18
DOI	https://doi.org/10.3390/app14188325
Publication status	Published - Sep 2024

Cite this

@article{94a0d24fa192488884a130020c189641,

title = "Structure-Guided Image Inpainting Based on Multi-Scale Attention Pyramid Network",

abstract = "Current single-view image inpainting methods often suffer from low image information utilization and suboptimal repair outcomes. To address these challenges, this paper introduces a novel image inpainting framework that leverages a structure-guided multi-scale attention pyramid network. This network consists of a structural repair network and a multi-scale attention pyramid semantic repair network. The structural repair component utilizes a dual-branch U-Net network for robust structure prediction under strong constraints. The predicted structural view then serves as auxiliary information for the semantic repair network. This latter network exploits the pyramid structure to extract multi-scale features of the image, which are further refined through an attention feature fusion module. Additionally, a separable gated convolution strategy is employed during feature extraction to minimize the impact of invalid information from missing areas, thereby enhancing the restoration quality. Experiments conducted on standard datasets such as Paris Street View and CelebA demonstrate the superiority of our approach over existing methods through quantitative and qualitative comparisons. Further ablation studies, by incrementally integrating proposed mechanisms into a baseline model, substantiate the effectiveness of our multi-view restoration strategy, separable gated convolution, and multi-scale attention feature fusion.",

keywords = "image inpainting, image processing, multi-scale attention pyramid, multi-view inpainting",

author = "Jun Gong and Senlin Luo and Wenxin Yu and Liang Nie",

note = "Publisher Copyright: {\textcopyright} 2024 by the authors.",

year = "2024",

month = sep,

doi = "10.3390/app14188325",

language = "English",

volume = "14",

journal = "Applied Sciences (Switzerland)",

issn = "2076-3417",

publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",

number = "18",

}

TY - JOUR

T1 - Structure-Guided Image Inpainting Based on Multi-Scale Attention Pyramid Network

AU - Gong, Jun

AU - Luo, Senlin

AU - Yu, Wenxin

AU - Nie, Liang

PY - 2024/9

Y1 - 2024/9

N2 - Current single-view image inpainting methods often suffer from low image information utilization and suboptimal repair outcomes. To address these challenges, this paper introduces a novel image inpainting framework that leverages a structure-guided multi-scale attention pyramid network. This network consists of a structural repair network and a multi-scale attention pyramid semantic repair network. The structural repair component utilizes a dual-branch U-Net network for robust structure prediction under strong constraints. The predicted structural view then serves as auxiliary information for the semantic repair network. This latter network exploits the pyramid structure to extract multi-scale features of the image, which are further refined through an attention feature fusion module. Additionally, a separable gated convolution strategy is employed during feature extraction to minimize the impact of invalid information from missing areas, thereby enhancing the restoration quality. Experiments conducted on standard datasets such as Paris Street View and CelebA demonstrate the superiority of our approach over existing methods through quantitative and qualitative comparisons. Further ablation studies, by incrementally integrating proposed mechanisms into a baseline model, substantiate the effectiveness of our multi-view restoration strategy, separable gated convolution, and multi-scale attention feature fusion.

AB - Current single-view image inpainting methods often suffer from low image information utilization and suboptimal repair outcomes. To address these challenges, this paper introduces a novel image inpainting framework that leverages a structure-guided multi-scale attention pyramid network. This network consists of a structural repair network and a multi-scale attention pyramid semantic repair network. The structural repair component utilizes a dual-branch U-Net network for robust structure prediction under strong constraints. The predicted structural view then serves as auxiliary information for the semantic repair network. This latter network exploits the pyramid structure to extract multi-scale features of the image, which are further refined through an attention feature fusion module. Additionally, a separable gated convolution strategy is employed during feature extraction to minimize the impact of invalid information from missing areas, thereby enhancing the restoration quality. Experiments conducted on standard datasets such as Paris Street View and CelebA demonstrate the superiority of our approach over existing methods through quantitative and qualitative comparisons. Further ablation studies, by incrementally integrating proposed mechanisms into a baseline model, substantiate the effectiveness of our multi-view restoration strategy, separable gated convolution, and multi-scale attention feature fusion.

KW - image inpainting

KW - image processing

KW - multi-scale attention pyramid

KW - multi-view inpainting

UR - http://www.scopus.com/inward/record.url?scp=85205301098&partnerID=8YFLogxK

U2 - 10.3390/app14188325

DO - 10.3390/app14188325

M3 - Article

AN - SCOPUS:85205301098

SN - 2076-3417

VL - 14

JO - Applied Sciences (Switzerland)

JF - Applied Sciences (Switzerland)

IS - 18

M1 - 8325

ER -
