Abstract
Most existing deep learning algorithms are supervised and rely on a very large number of manually labeled samples. In most domains, however, samples are scarce or labeling is prohibitively expensive, so providing the network with numerous labeled training samples is impractical. This paper proposes a few-shot video classification network, the Hierarchical Motion Excitation Network (HME-Net), which exploits accumulated feature-level motion information. Its HME module, composed of Motion Excitation (ME) and Interval Frame Motion Excitation (IFME), extracts feature-level motion patterns from adjacent frames and interval frames, and discovers and enhances the motion-sensitive information in the original features. The accumulative time window is expanded hierarchically to four frames, which enlarges the receptive field. Extensive experiments show that HME-Net consistently outperforms existing few-shot video classification models. On the UCF101 and HMDB51 datasets, the method sets a new state of the art in the five-way three-shot and five-way five-shot video recognition settings.
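For readers unfamiliar with the "five-way three-shot" terminology, the following is a minimal sketch of how one few-shot episode is commonly assembled: N classes are drawn, and each contributes K labeled support videos plus some held-out query videos. It is not taken from the paper. The function name `sample_episode`, the `n_query` parameter, and the toy label list are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way=5, k_shot=3, n_query=1, seed=None):
    """Sample one N-way K-shot episode from a list of per-video class labels.

    Returns (support, query), each a list of (video_index, class_label) pairs.
    Illustrative helper only; n_query is an assumption, not a value from the paper.
    """
    rng = random.Random(seed)

    # Group video indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Keep only classes with enough videos for support and query sets.
    needed = k_shot + n_query
    eligible = sorted(c for c, idxs in by_class.items() if len(idxs) >= needed)
    if len(eligible) < n_way:
        raise ValueError("not enough classes with sufficient videos")

    support, query = [], []
    for cls in rng.sample(eligible, n_way):
        picked = rng.sample(by_class[cls], needed)
        # The first K picks are labeled support videos; the rest are queries.
        support += [(i, cls) for i in picked[:k_shot]]
        query += [(i, cls) for i in picked[k_shot:]]
    return support, query


# Toy data: 80 videos spread evenly over 8 hypothetical classes.
labels = [f"class_{i % 8}" for i in range(80)]
support, query = sample_episode(labels, n_way=5, k_shot=3, n_query=2, seed=0)
```

With `n_way=5` and `k_shot=3`, the support set holds 15 labeled videos. A five-way five-shot episode is obtained by passing `k_shot=5`, giving 25.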
Original language | English |
---|---|
Article number | 1090 |
Journal | Electronics (Switzerland) |
Volume | 12 |
Issue number | 5 |
DOIs | |
Publication status | Published - Mar 2023 |
Keywords
- few-shot learning
- meta-learning
- motion information
- video recognition