Surgical workflow recognition using two-stream mixed convolution network
Abstract—Surgical workflow recognition is a prerequisite for automatic indexing of surgical video databases and for optimizing real-time operating room (OR) scheduling, both of which are important to the modern OR. In this paper, we propose a surgical phase recognition method based on a two-stream mixed convolutional network (TsMCNet) to automatically recognize surgical workflow. TsMCNet optimizes the visual and temporal features learned from surgical videos by integrating 2D and 3D convolutional neural networks (CNNs) into a spatio-temporal complementary architecture. Specifically, the temporal branch (3D CNN) learns spatio-temporal features across adjacent frames, whereas the parallel visual branch (2D CNN) captures deep visual features of each individual frame. Extensive experiments on a public surgical video dataset (MICCAI 2016 Workflow Challenge) show that the proposed method outperforms state-of-the-art methods, reaching 86.2% accuracy and an 83.0% F1 score.
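As an illustration of the two reported metrics, the following minimal sketch computes frame-level accuracy and F1 score for phase predictions. It rests on assumptions not stated in the abstract: F1 is macro-averaged over the phases present in the annotation, and each frame carries exactly one phase label. The function names and the toy label sequences are hypothetical and are not taken from the MICCAI 2016 Workflow Challenge data.

```python
def accuracy(y_true, y_pred):
    """Fraction of frames whose predicted phase matches the annotated phase."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-phase F1 scores (assumed averaging scheme)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equally long")
    scores = []
    for phase in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == phase and p == phase for t, p in pairs)
        fp = sum(t != phase and p == phase for t, p in pairs)
        fn = sum(t == phase and p != phase for t, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is positive because
        # every phase in this loop occurs at least once in y_true.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


# Hypothetical per-frame phase labels for one short video (three phases).
toy_true = [0, 0, 0, 1, 1, 2, 2, 2]
toy_pred = [0, 0, 1, 1, 1, 2, 2, 0]

toy_accuracy = accuracy(toy_true, toy_pred)
toy_f1 = macro_f1(toy_true, toy_pred)
```

Macro averaging weights every phase equally regardless of its duration, so short phases influence the score as much as long ones; a different averaging scheme would yield different values on the same predictions.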