Extracting effective image attributes with refined universal detection

Qiang Yu; Xinyu Xiao; Chunxia Zhang; Lifei Song; Chunhong Pan

doi:10.3390/s21010095

Qiang Yu, Xinyu Xiao^*, Chunxia Zhang, Lifei Song, Chunhong Pan

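^*Corresponding author for this work

School of Computer Science

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract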

Recently, image attributes containing high-level semantic information have been widely used in computer vision tasks, including visual recognition and image captioning. Existing attribute extraction methods map visual concepts to the probabilities of frequently-used words by directly using Convolutional Neural Networks (CNNs). Typically, two main problems exist in those methods. First, words of different parts of speech (POSs) are handled in the same way, but non-nominal words can hardly be mapped to visual regions through CNNs only. Second, synonymous nominal words are treated as independent and different words, in which similarities are ignored. In this paper, a novel Refined Universal Detection (RUDet) method is proposed to solve these two problems. Specifically, a Refinement (RF) module is designed to extract refined attributes of non-nominal words based on the attributes of nominal words and visual features. In addition, a Word Tree (WT) module is constructed to integrate synonymous nouns, which ensures that similar words hold similar and more accurate probabilities. Moreover, a Feature Enhancement (FE) module is adopted to enhance the ability to mine different visual concepts in different scales. Experiments conducted on the large-scale Microsoft (MS) COCO dataset illustrate the effectiveness of our proposed method.

Original language	English
Article number	95
Pages (from-to)	1-16
Number of pages	16
Journal	Sensors
Volume	21
Issue number	1
DOI	https://doi.org/10.3390/s21010095
Publication status	Published - 1 Jan 2021

Cite this

@article{8d2b36bb6dd9437eac4597975074a6bb,
  title = "Extracting effective image attributes with refined universal detection",
  abstract = "Recently, image attributes containing high-level semantic information have been widely used in computer vision tasks, including visual recognition and image captioning. Existing attribute extraction methods map visual concepts to the probabilities of frequently-used words by directly using Convolutional Neural Networks (CNNs). Typically, two main problems exist in those methods. First, words of different parts of speech (POSs) are handled in the same way, but non-nominal words can hardly be mapped to visual regions through CNNs only. Second, synonymous nominal words are treated as independent and different words, in which similarities are ignored. In this paper, a novel Refined Universal Detection (RUDet) method is proposed to solve these two problems. Specifically, a Refinement (RF) module is designed to extract refined attributes of non-nominal words based on the attributes of nominal words and visual features. In addition, a Word Tree (WT) module is constructed to integrate synonymous nouns, which ensures that similar words hold similar and more accurate probabilities. Moreover, a Feature Enhancement (FE) module is adopted to enhance the ability to mine different visual concepts in different scales. Experiments conducted on the large-scale Microsoft (MS) COCO dataset illustrate the effectiveness of our proposed method.",
  keywords = "Attribute extraction, Image captioning, Refined universal detection, Word tree",
  author = "Qiang Yu and Xinyu Xiao and Chunxia Zhang and Lifei Song and Chunhong Pan",
  note = "Publisher Copyright: {\textcopyright} 2020 by the authors. Licensee MDPI, Basel, Switzerland.",
  year = "2021",
  month = jan,
  day = "1",
  doi = "10.3390/s21010095",
  language = "English",
  volume = "21",
  pages = "1--16",
  journal = "Sensors",
  issn = "1424-8220",
  publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",
  number = "1",
}

TY  - JOUR
T1  - Extracting effective image attributes with refined universal detection
AU  - Yu, Qiang
AU  - Xiao, Xinyu
AU  - Zhang, Chunxia
AU  - Song, Lifei
AU  - Pan, Chunhong
PY  - 2021/1/1
Y1  - 2021/1/1
N2  - Recently, image attributes containing high-level semantic information have been widely used in computer vision tasks, including visual recognition and image captioning. Existing attribute extraction methods map visual concepts to the probabilities of frequently-used words by directly using Convolutional Neural Networks (CNNs). Typically, two main problems exist in those methods. First, words of different parts of speech (POSs) are handled in the same way, but non-nominal words can hardly be mapped to visual regions through CNNs only. Second, synonymous nominal words are treated as independent and different words, in which similarities are ignored. In this paper, a novel Refined Universal Detection (RUDet) method is proposed to solve these two problems. Specifically, a Refinement (RF) module is designed to extract refined attributes of non-nominal words based on the attributes of nominal words and visual features. In addition, a Word Tree (WT) module is constructed to integrate synonymous nouns, which ensures that similar words hold similar and more accurate probabilities. Moreover, a Feature Enhancement (FE) module is adopted to enhance the ability to mine different visual concepts in different scales. Experiments conducted on the large-scale Microsoft (MS) COCO dataset illustrate the effectiveness of our proposed method.
AB  - Recently, image attributes containing high-level semantic information have been widely used in computer vision tasks, including visual recognition and image captioning. Existing attribute extraction methods map visual concepts to the probabilities of frequently-used words by directly using Convolutional Neural Networks (CNNs). Typically, two main problems exist in those methods. First, words of different parts of speech (POSs) are handled in the same way, but non-nominal words can hardly be mapped to visual regions through CNNs only. Second, synonymous nominal words are treated as independent and different words, in which similarities are ignored. In this paper, a novel Refined Universal Detection (RUDet) method is proposed to solve these two problems. Specifically, a Refinement (RF) module is designed to extract refined attributes of non-nominal words based on the attributes of nominal words and visual features. In addition, a Word Tree (WT) module is constructed to integrate synonymous nouns, which ensures that similar words hold similar and more accurate probabilities. Moreover, a Feature Enhancement (FE) module is adopted to enhance the ability to mine different visual concepts in different scales. Experiments conducted on the large-scale Microsoft (MS) COCO dataset illustrate the effectiveness of our proposed method.
KW  - Attribute extraction
KW  - Image captioning
KW  - Refined universal detection
KW  - Word tree
UR  - http://www.scopus.com/inward/record.url?scp=85098737759&partnerID=8YFLogxK
U2  - 10.3390/s21010095
DO  - 10.3390/s21010095
M3  - Article
C2  - 33375715
AN  - SCOPUS:85098737759
SN  - 1424-8220
VL  - 21
SP  - 1
EP  - 16
JO  - Sensors
JF  - Sensors
IS  - 1
M1  - 95
ER  - 
