TY - GEN
T1 - Surgical workflow recognition using two-stream mixed convolution network
AU - Ding, Yuan
AU - Fan, Jingfan
AU - Pang, Kun
AU - Li, Heng
AU - Fu, Tianyu
AU - Song, Hong
AU - Chen, Lingfeng
AU - Yang, Jian
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/4
Y1 - 2020/4
N2 - Surgical workflow recognition is a prerequisite for the automatic indexing of surgical video databases and the optimization of real-time operating scheduling, an important part of the modern operating room (OR). In this paper, we propose a surgical phase recognition method based on a two-stream mixed convolutional network (TsMCNet) to automatically recognize the surgical workflow. TsMCNet optimizes the visual and temporal features learned from surgical videos by integrating 2D and 3D convolutional neural networks (CNNs) to form a spatio-temporally complementary architecture. Specifically, the temporal branch (3D CNN) is responsible for learning the spatio-temporal features among adjacent frames, whereas the parallel visual branch (2D CNN) focuses on capturing the deep visual features of each frame. Extensive experiments on a public surgical video dataset (MICCAI 2016 Workflow Challenge) demonstrated the outstanding performance of our proposed method, which exceeds that of state-of-the-art methods (86.2% accuracy and 83.0% F1 score).
AB - Surgical workflow recognition is a prerequisite for the automatic indexing of surgical video databases and the optimization of real-time operating scheduling, an important part of the modern operating room (OR). In this paper, we propose a surgical phase recognition method based on a two-stream mixed convolutional network (TsMCNet) to automatically recognize the surgical workflow. TsMCNet optimizes the visual and temporal features learned from surgical videos by integrating 2D and 3D convolutional neural networks (CNNs) to form a spatio-temporally complementary architecture. Specifically, the temporal branch (3D CNN) is responsible for learning the spatio-temporal features among adjacent frames, whereas the parallel visual branch (2D CNN) focuses on capturing the deep visual features of each frame. Extensive experiments on a public surgical video dataset (MICCAI 2016 Workflow Challenge) demonstrated the outstanding performance of our proposed method, which exceeds that of state-of-the-art methods (86.2% accuracy and 83.0% F1 score).
KW - Convolutional neural network
KW - Spatio-temporal features
KW - Surgical video analysis
KW - Temporal information
UR - http://www.scopus.com/inward/record.url?scp=85088642870&partnerID=8YFLogxK
U2 - 10.1109/AEMCSE50948.2020.00064
DO - 10.1109/AEMCSE50948.2020.00064
M3 - Conference contribution
AN - SCOPUS:85088642870
T3 - Proceedings - 2020 3rd International Conference on Advanced Electronic Materials, Computers and Software Engineering, AEMCSE 2020
SP - 264
EP - 269
BT - Proceedings - 2020 3rd International Conference on Advanced Electronic Materials, Computers and Software Engineering, AEMCSE 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 3rd International Conference on Advanced Electronic Materials, Computers and Software Engineering, AEMCSE 2020
Y2 - 24 April 2020 through 26 April 2020
ER -