Attentional Kernel Encoding Networks for Fine-Grained Visual Categorization

Yutao Hu; Yandan Yang; Jun Zhang; Xianbin Cao; Xiantong Zhen

doi:10.1109/TCSVT.2020.2978115

Yutao Hu, Yandan Yang, Jun Zhang, Xianbin Cao^*, Xiantong Zhen

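^*Corresponding author for this work

School of Information and Electronics

Research output: Contribution to journal › Article › peer-review

26 Citations (Scopus)

Abstract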

Fine-grained visual categorization aims to recognize objects from different subordinate categories, which is a challenging task due to subtle visual differences between images. It is highly desirable to identify discriminative regions while achieving a highly non-linear, compact representation for fine-grained visual categorization. However, existing methods either rely on manually defined part-based annotations to indicate the distinctive regions or operate on longitudinal vectors to capture the non-linear information, which may lose important spatial layout information. In this paper, we propose the Attentional Kernel Encoding Networks (AKEN) for fine-grained visual categorization. Specifically, the AKEN aggregates feature maps from the last convolutional layer of ConvNets to obtain a holistic feature representation. By Fourier embedding, it encodes features from both the longitudinal and transverse directions, which largely retains the spatial layout information. Moreover, we incorporate a Cascaded Attention (Cas-Attention) module to highlight local regions that distinguish among subordinate categories, enabling the AKEN to extract the most discriminative features. Working in conjunction with the attention mechanism, the proposed AKEN combines the strengths of ConvNets and kernels for non-linear feature learning, which can establish discriminative and descriptive feature representations for fine-grained image categorization. Experiments on three benchmark datasets show that the proposed AKEN delivers highly competitive performance, surpassing most existing methods and achieving state-of-the-art results.
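The Fourier embedding step mentioned in the abstract can be illustrated with a generic random Fourier feature map that approximates an RBF kernel. This is only a minimal sketch and not the authors' implementation: the function names (`rff_params`, `rff_embed`, `encode_feature_map`), the RBF kernel choice, the average pooling, and the reading of "longitudinal" as per-location channel vectors and "transverse" as per-channel spatial vectors are all assumptions made for illustration. The Cas-Attention module is omitted.

```python
import numpy as np


def rff_params(dim, n_features, gamma, rng):
    """Sample random Fourier parameters for the RBF kernel exp(-gamma * ||x - y||^2)."""
    omega = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(dim, n_features))
    bias = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return omega, bias


def rff_embed(x, omega, bias):
    """Embed rows of x (n, dim) so that inner products approximate the RBF kernel."""
    n_features = omega.shape[1]
    return np.sqrt(2.0 / n_features) * np.cos(x @ omega + bias)


def encode_feature_map(fmap, n_features=128, gamma=0.5, seed=0):
    """Encode a (C, H, W) feature map along two directions (hypothetical layout).

    Assumed reading: 'longitudinal' vectors are the C-dimensional channel
    vectors at each spatial location; 'transverse' vectors are the flattened
    H*W spatial maps of each channel.
    """
    rng = np.random.default_rng(seed)
    c, h, w = fmap.shape
    flat = fmap.reshape(c, h * w)

    # Longitudinal: H*W vectors of dimension C, embedded and average-pooled.
    omega_l, bias_l = rff_params(c, n_features, gamma, rng)
    longitudinal = rff_embed(flat.T, omega_l, bias_l).mean(axis=0)

    # Transverse: C vectors of dimension H*W, embedded and average-pooled.
    omega_t, bias_t = rff_params(h * w, n_features, gamma, rng)
    transverse = rff_embed(flat, omega_t, bias_t).mean(axis=0)

    # Concatenate both encodings into a single descriptor of length 2 * n_features.
    return np.concatenate([longitudinal, transverse])
```

Calling `encode_feature_map` on a `(C, H, W)` array returns a vector of length `2 * n_features`. In a real network the embedding parameters would more likely be learned end to end than sampled once, which this sketch does not attempt.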

Original language	English
Article number	9023386
Pages (from-to)	301-314
Number of pages	14
Journal	IEEE Transactions on Circuits and Systems for Video Technology
Volume	31
Issue number	1
DOI	https://doi.org/10.1109/TCSVT.2020.2978115
Publication status	Published - Jan 2021

Cite this

Hu, Y., Yang, Y., Zhang, J., Cao, X., & Zhen, X. (2021). Attentional Kernel Encoding Networks for Fine-Grained Visual Categorization. IEEE Transactions on Circuits and Systems for Video Technology, 31(1), 301-314. Article 9023386. https://doi.org/10.1109/TCSVT.2020.2978115

@article{f9b86aeea7ae490985162e0d3a4261e0,
title = "Attentional Kernel Encoding Networks for Fine-Grained Visual Categorization",
abstract = "Fine-grained visual categorization aims to recognize objects from different subordinate categories, which is a challenging task due to subtle visual differences between images. It is highly desirable to identify discriminative regions while achieving a highly non-linear, compact representation for fine-grained visual categorization. However, existing methods either rely on manually defined part-based annotations to indicate the distinctive regions or operate on longitudinal vectors to capture the non-linear information, which may lose important spatial layout information. In this paper, we propose the Attentional Kernel Encoding Networks (AKEN) for fine-grained visual categorization. Specifically, the AKEN aggregates feature maps from the last convolutional layer of ConvNets to obtain a holistic feature representation. By Fourier embedding, it encodes features from both the longitudinal and transverse directions, which largely retains the spatial layout information. Moreover, we incorporate a Cascaded Attention (Cas-Attention) module to highlight local regions that distinguish among subordinate categories, enabling the AKEN to extract the most discriminative features. Working in conjunction with the attention mechanism, the proposed AKEN combines the strengths of ConvNets and kernels for non-linear feature learning, which can establish discriminative and descriptive feature representations for fine-grained image categorization. Experiments on three benchmark datasets show that the proposed AKEN delivers highly competitive performance, surpassing most existing methods and achieving state-of-the-art results.",
keywords = "Fine-grained visual categorization, Kernel encoding, attention",
author = "Yutao Hu and Yandan Yang and Jun Zhang and Xianbin Cao and Xiantong Zhen",
note = "Publisher Copyright: {\textcopyright} 1991-2012 IEEE.",
year = "2021",
month = jan,
doi = "10.1109/TCSVT.2020.2978115",
language = "English",
volume = "31",
pages = "301--314",
journal = "IEEE Transactions on Circuits and Systems for Video Technology",
issn = "1051-8215",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "1",
}

TY  - JOUR
T1  - Attentional Kernel Encoding Networks for Fine-Grained Visual Categorization
AU  - Hu, Yutao
AU  - Yang, Yandan
AU  - Zhang, Jun
AU  - Cao, Xianbin
AU  - Zhen, Xiantong
PY  - 2021/1
Y1  - 2021/1
AB  - Fine-grained visual categorization aims to recognize objects from different subordinate categories, which is a challenging task due to subtle visual differences between images. It is highly desirable to identify discriminative regions while achieving a highly non-linear, compact representation for fine-grained visual categorization. However, existing methods either rely on manually defined part-based annotations to indicate the distinctive regions or operate on longitudinal vectors to capture the non-linear information, which may lose important spatial layout information. In this paper, we propose the Attentional Kernel Encoding Networks (AKEN) for fine-grained visual categorization. Specifically, the AKEN aggregates feature maps from the last convolutional layer of ConvNets to obtain a holistic feature representation. By Fourier embedding, it encodes features from both the longitudinal and transverse directions, which largely retains the spatial layout information. Moreover, we incorporate a Cascaded Attention (Cas-Attention) module to highlight local regions that distinguish among subordinate categories, enabling the AKEN to extract the most discriminative features. Working in conjunction with the attention mechanism, the proposed AKEN combines the strengths of ConvNets and kernels for non-linear feature learning, which can establish discriminative and descriptive feature representations for fine-grained image categorization. Experiments on three benchmark datasets show that the proposed AKEN delivers highly competitive performance, surpassing most existing methods and achieving state-of-the-art results.
KW  - Fine-grained visual categorization
KW  - Kernel encoding
KW  - attention
UR  - http://www.scopus.com/inward/record.url?scp=85099402346&partnerID=8YFLogxK
U2  - 10.1109/TCSVT.2020.2978115
DO  - 10.1109/TCSVT.2020.2978115
M3  - Article
AN  - SCOPUS:85099402346
SN  - 1051-8215
VL  - 31
SP  - 301
EP  - 314
JO  - IEEE Transactions on Circuits and Systems for Video Technology
JF  - IEEE Transactions on Circuits and Systems for Video Technology
IS  - 1
M1  - 9023386
ER  -
