TY - GEN
T1 - An Action Recognition Algorithm Based on Two-Stream Deep Learning for Metaverse Applications
AU - Liu, Jiayue
AU - Mao, Tianqi
AU - Huang, Yicheng
AU - He, Dongxuan
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Action recognition algorithms have gained significant attention in recent years and are indispensable for a plethora of cutting-edge applications such as extended reality and the Metaverse. These services often pose stringent requirements on immediate sensing and cognition of the surroundings, which necessitates immediate classification of captured actions (e.g., video data) that classical signal processing methods can hardly attain. In this paper, we introduce a residual artificial neural network with a two-stream structure to further improve the accuracy of action recognition. Specifically, two residual networks (ResNet101) are trained separately, one on spatial RGB image streams and the other on optical flow streams. The two-stream network outputs are then fed into a fusion classifier, in which the information extracted by the spatial network and the temporal network jointly determines the classification result. Moreover, in the training process, hyper-parameter settings and optimizer selection are tuned numerically to achieve optimal performance. Finally, the recognition accuracy of the proposed algorithm is compared with other widely employed counterparts, with the UCF101 data set utilized for training and testing. Simulations validate that the network achieves higher recognition accuracy than traditional algorithms, and that the two-stream method is superior to its single-network counterpart.
AB - Action recognition algorithms have gained significant attention in recent years and are indispensable for a plethora of cutting-edge applications such as extended reality and the Metaverse. These services often pose stringent requirements on immediate sensing and cognition of the surroundings, which necessitates immediate classification of captured actions (e.g., video data) that classical signal processing methods can hardly attain. In this paper, we introduce a residual artificial neural network with a two-stream structure to further improve the accuracy of action recognition. Specifically, two residual networks (ResNet101) are trained separately, one on spatial RGB image streams and the other on optical flow streams. The two-stream network outputs are then fed into a fusion classifier, in which the information extracted by the spatial network and the temporal network jointly determines the classification result. Moreover, in the training process, hyper-parameter settings and optimizer selection are tuned numerically to achieve optimal performance. Finally, the recognition accuracy of the proposed algorithm is compared with other widely employed counterparts, with the UCF101 data set utilized for training and testing. Simulations validate that the network achieves higher recognition accuracy than traditional algorithms, and that the two-stream method is superior to its single-network counterpart.
KW - Action recognition
KW - deep learning
KW - residual network
KW - ResNet101
KW - two-stream method
UR - http://www.scopus.com/inward/record.url?scp=85199991491&partnerID=8YFLogxK
U2 - 10.1109/IWCMC61514.2024.10592362
DO - 10.1109/IWCMC61514.2024.10592362
M3 - Conference contribution
AN - SCOPUS:85199991491
T3 - 20th International Wireless Communications and Mobile Computing Conference, IWCMC 2024
SP - 639
EP - 642
BT - 20th International Wireless Communications and Mobile Computing Conference, IWCMC 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 20th IEEE International Wireless Communications and Mobile Computing Conference, IWCMC 2024
Y2 - 27 May 2024 through 31 May 2024
ER -