Hierarchical Motion Excitation Network for Few-Shot Video Recognition

Bing Wang; Xiaohua Wang; Shiwei Ren; Weijiang Wang; Yueting Shi

doi:10.3390/electronics12051090

Corresponding author: Yueting Shi

School of Integrated Circuits and Electronics

Beijing Institute of Technology

Research output: Contribution to journal › Article › Peer-review

1 Citation (Scopus)

Abstract

Most existing deep learning algorithms rely on supervised learning and therefore on a tremendous number of manually labeled samples. In most domains, however, samples are scarce or labeling is too costly, so it is impractical to provide a network with numerous labeled training samples. In this paper, a few-shot video classification network termed the Hierarchical Motion Excitation Network (HME-Net) is proposed from the perspective of accumulated feature-level motion information. An HME module composed of Motion Excitation (ME) and Interval Frame Motion Excitation (IFME) is designed to extract feature-level motion patterns from adjacent frames and from interval frames. The HME module discovers and enhances the motion-sensitive information in the original features. The accumulative time window is expanded hierarchically to four frames, which enlarges the temporal receptive field. Extensive experiments show that HME-Net consistently outperforms existing few-shot video classification models. On the UCF101 and HMDB51 datasets, the method establishes a new state of the art for five-way three-shot and five-way five-shot video recognition.
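The abstract does not give the layer-level design of ME and IFME, so the following NumPy sketch is only one plausible reading, not the authors' code. It assumes a squeeze-and-excitation style gate driven by temporal feature differences: reduce channels, subtract transformed features across time, pool spatially, expand channels back, apply a sigmoid, and add the gated features residually. The function names, the plain weight matrices standing in for learned 1x1 convolutions, `stride=2` for the interval-frame variant, and the ME-then-IFME stacking order are all hypothetical. The sketch shows how stacking an adjacent-frame gate and an interval-frame gate can make each output frame depend on four consecutive input frames.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def motion_excitation(x, w_reduce, w_expand, stride=1):
    """Hypothetical ME-style gate on features x of shape (T, C, H, W).

    stride=1 differences adjacent frames (ME-like); stride=2 differences
    frames two steps apart (IFME-like). Weight matrices stand in for
    learned 1x1 convolutions (an assumption).
    """
    T = x.shape[0]
    # 1x1 "convolution": per-pixel channel reduction C -> C_r.
    r = np.einsum("rc,tchw->trhw", w_reduce, x)
    # Feature-level motion: frame t+stride minus frame t.
    # The last `stride` frames have no successor and get zero motion.
    diff = np.zeros_like(r)
    diff[: T - stride] = r[stride:] - r[: T - stride]
    # Squeeze space, expand channels back to C, squash to (0, 1).
    attn = sigmoid(diff.mean(axis=(2, 3)) @ w_expand.T)  # shape (T, C)
    # Residual excitation: keep x, amplify motion-sensitive channels.
    return x + x * attn[:, :, None, None]


def hierarchical_motion_excitation(x, params):
    """Adjacent-frame gate followed by an interval-frame gate (assumed order)."""
    y = motion_excitation(x, params["me_reduce"], params["me_expand"], stride=1)
    return motion_excitation(y, params["ifme_reduce"], params["ifme_expand"], stride=2)


rng = np.random.default_rng(0)
T, C, C_r, H, W = 8, 8, 4, 3, 3
params = {
    "me_reduce": 0.5 * rng.standard_normal((C_r, C)),
    "me_expand": 0.5 * rng.standard_normal((C, C_r)),
    "ifme_reduce": 0.5 * rng.standard_normal((C_r, C)),
    "ifme_expand": 0.5 * rng.standard_normal((C, C_r)),
}
x = rng.standard_normal((T, C, H, W))
base = hierarchical_motion_excitation(x, params)

# Perturb only input frame 3 and record which output frames react.
x_pert = x.copy()
x_pert[3] += 1.0
out = hierarchical_motion_excitation(x_pert, params)
changed = np.abs(out - base).max(axis=(1, 2, 3)) > 1e-12
print(np.flatnonzero(changed))  # → [0 1 2 3]
```

Perturbing input frame 3 changes output frames 0 to 3 and nothing later: the stride-1 gate lets frame t see frame t+1, and the stride-2 gate stacked on top lets it also see frames t+2 and t+3. This is one way, under the assumptions above, that a four-frame accumulated window can arise from two two-frame operations.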

Original language	English
Article number	1090
Journal	Electronics (Switzerland)
Volume	12
Issue	5
DOI	https://doi.org/10.3390/electronics12051090
Publication status	Published - Mar 2023

Cite this

@article{d66cda2b3998456795e64a28b2aec448,

title = "Hierarchical Motion Excitation Network for Few-Shot Video Recognition",

abstract = "Most of the existing deep learning algorithms are supervised learning and rely on a tremendous number of manually labeled samples. However, in most domains, due to the scarcity of samples or the excessive cost of labeling, it would be impracticable to provide numerous labeled training samples to the network. In this paper, a few-shot video classification network termed Hierarchical Motion Excitation Network (HME-Net) is proposed from the perspective of accumulated feature-level motion information. An HME module composed of Motion Excitation (ME) and Interval Frame Motion Excitation (IFME) is designed to extract feature-level motion patterns from adjacent frames and interval frames. The HME module can discover and enhance the feature-level motion-sensitive information in the original features. The accumulative time window is expanded to four frames in a hierarchical manner, which achieves the purpose of increasing the receptive field. After extensive experimentation, HME-Net is demonstrated to be able to consistently outperform the existing few-shot video classification models. On the UCF101 and HMDB51 datasets, our method is established as a new state-of-the-art technique for the few-shot settings of five-way three-shot and five-way five-shot video recognition.",

keywords = "few-shot learning, meta-learning, motion information, video recognition",

author = "Bing Wang and Xiaohua Wang and Shiwei Ren and Weijiang Wang and Yueting Shi",

note = "Publisher Copyright: {\textcopyright} 2023 by the authors.",

year = "2023",

month = mar,

doi = "10.3390/electronics12051090",

language = "English",

volume = "12",

journal = "Electronics (Switzerland)",

issn = "2079-9292",

publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",

number = "5",

}

TY - JOUR

T1 - Hierarchical Motion Excitation Network for Few-Shot Video Recognition

AU - Wang, Bing

AU - Wang, Xiaohua

AU - Ren, Shiwei

AU - Wang, Weijiang

AU - Shi, Yueting

PY - 2023/3

Y1 - 2023/3

N2 - Most of the existing deep learning algorithms are supervised learning and rely on a tremendous number of manually labeled samples. However, in most domains, due to the scarcity of samples or the excessive cost of labeling, it would be impracticable to provide numerous labeled training samples to the network. In this paper, a few-shot video classification network termed Hierarchical Motion Excitation Network (HME-Net) is proposed from the perspective of accumulated feature-level motion information. An HME module composed of Motion Excitation (ME) and Interval Frame Motion Excitation (IFME) is designed to extract feature-level motion patterns from adjacent frames and interval frames. The HME module can discover and enhance the feature-level motion-sensitive information in the original features. The accumulative time window is expanded to four frames in a hierarchical manner, which achieves the purpose of increasing the receptive field. After extensive experimentation, HME-Net is demonstrated to be able to consistently outperform the existing few-shot video classification models. On the UCF101 and HMDB51 datasets, our method is established as a new state-of-the-art technique for the few-shot settings of five-way three-shot and five-way five-shot video recognition.

AB - Most of the existing deep learning algorithms are supervised learning and rely on a tremendous number of manually labeled samples. However, in most domains, due to the scarcity of samples or the excessive cost of labeling, it would be impracticable to provide numerous labeled training samples to the network. In this paper, a few-shot video classification network termed Hierarchical Motion Excitation Network (HME-Net) is proposed from the perspective of accumulated feature-level motion information. An HME module composed of Motion Excitation (ME) and Interval Frame Motion Excitation (IFME) is designed to extract feature-level motion patterns from adjacent frames and interval frames. The HME module can discover and enhance the feature-level motion-sensitive information in the original features. The accumulative time window is expanded to four frames in a hierarchical manner, which achieves the purpose of increasing the receptive field. After extensive experimentation, HME-Net is demonstrated to be able to consistently outperform the existing few-shot video classification models. On the UCF101 and HMDB51 datasets, our method is established as a new state-of-the-art technique for the few-shot settings of five-way three-shot and five-way five-shot video recognition.

KW - few-shot learning

KW - meta-learning

KW - motion information

KW - video recognition

UR - http://www.scopus.com/inward/record.url?scp=85149971606&partnerID=8YFLogxK

U2 - 10.3390/electronics12051090

DO - 10.3390/electronics12051090

M3 - Article

AN - SCOPUS:85149971606

SN - 2079-9292

VL - 12

JO - Electronics (Switzerland)

JF - Electronics (Switzerland)

IS - 5

M1 - 1090

ER -
