Multi-level feature aggregation network for instrument identification of endoscopic images

Yakui Chu; Xilin Yang; Heng Li; Danni Ai; Yuan Ding; Jingfan Fan; Hong Song; Jian Yang

doi:10.1088/1361-6560/ab8dda

Yakui Chu*, Xilin Yang*, Heng Li, Danni Ai, Yuan Ding, Jingfan Fan, Hong Song, Jian Yang

*Corresponding author for this work

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

12 Citations (Scopus)

Abstract

Identification of surgical instruments is crucial in understanding surgical scenarios and providing an assistive process in endoscopic image-guided surgery. This study proposes a novel multilevel feature-aggregated deep convolutional neural network (MLFA-Net) for identifying surgical instruments in endoscopic images. First, a global feature augmentation layer is created on the top layer of the backbone to improve the localization ability of object identification by boosting the high-level semantic information to the feature flow network. Second, a modified interaction path of cross-channel features is proposed to increase the nonlinear combination of features in the same level and improve the efficiency of information propagation. Third, a multiview fusion branch of features is built to aggregate the location-sensitive information of the same level in different views, increase the information diversity of features, and enhance the localization ability of objects. By utilizing the latent information, the proposed network of multilevel feature aggregation can accomplish multitask instrument identification with a single network. Three tasks are handled by the proposed network, including object detection, which classifies the type of instrument and locates its border; mask segmentation, which detects the instrument shape; and pose estimation, which detects the keypoint of instrument parts. The experiments are performed on laparoscopic images from MICCAI 2017 Endoscopic Vision Challenge, and the mean average precision (AP) and average recall (AR) are utilized to quantify the segmentation and pose estimation results. For the bounding box regression, the AP and AR are 79.1% and 63.2%, respectively, while the AP and AR of mask segmentation are 78.1% and 62.1%, and the AP and AR of the pose estimation achieve 67.1% and 55.7%, respectively. 
The experiments demonstrate that our method efficiently improves the recognition accuracy of the instrument in endoscopic images, and outperforms the other state-of-the-art methods.

Original language	English
Article number	165004
Journal	Physics in Medicine and Biology
Volume	65
Issue number	16
DOIs	https://doi.org/10.1088/1361-6560/ab8dda
Publication status	Published - 21 Aug 2020

Keywords

convolutional neural networks
endoscopic image
instrument identification

Cite this

Chu, Y., Yang, X., Li, H., Ai, D., Ding, Y., Fan, J., Song, H., & Yang, J. (2020). Multi-level feature aggregation network for instrument identification of endoscopic images. Physics in Medicine and Biology, 65(16), Article 165004. https://doi.org/10.1088/1361-6560/ab8dda

@article{4c71acc7387b477e8bee8beabf4caf9e,

title = "Multi-level feature aggregation network for instrument identification of endoscopic images",

abstract = "Identification of surgical instruments is crucial in understanding surgical scenarios and providing an assistive process in endoscopic image-guided surgery. This study proposes a novel multilevel feature-aggregated deep convolutional neural network (MLFA-Net) for identifying surgical instruments in endoscopic images. First, a global feature augmentation layer is created on the top layer of the backbone to improve the localization ability of object identification by boosting the high-level semantic information to the feature flow network. Second, a modified interaction path of cross-channel features is proposed to increase the nonlinear combination of features in the same level and improve the efficiency of information propagation. Third, a multiview fusion branch of features is built to aggregate the location-sensitive information of the same level in different views, increase the information diversity of features, and enhance the localization ability of objects. By utilizing the latent information, the proposed network of multilevel feature aggregation can accomplish multitask instrument identification with a single network. Three tasks are handled by the proposed network, including object detection, which classifies the type of instrument and locates its border; mask segmentation, which detects the instrument shape; and pose estimation, which detects the keypoint of instrument parts. The experiments are performed on laparoscopic images from MICCAI 2017 Endoscopic Vision Challenge, and the mean average precision (AP) and average recall (AR) are utilized to quantify the segmentation and pose estimation results. For the bounding box regression, the AP and AR are 79.1% and 63.2%, respectively, while the AP and AR of mask segmentation are 78.1% and 62.1%, and the AP and AR of the pose estimation achieve 67.1% and 55.7%, respectively. 
The experiments demonstrate that our method efficiently improves the recognition accuracy of the instrument in endoscopic images, and outperforms the other state-of-the-art methods.",

keywords = "convolutional neural networks, endoscopic image, instrument identification",

author = "Yakui Chu and Xilin Yang and Heng Li and Danni Ai and Yuan Ding and Jingfan Fan and Hong Song and Jian Yang",

note = "Publisher Copyright: {\textcopyright} 2020 Institute of Physics and Engineering in Medicine.",

year = "2020",

month = aug,

day = "21",

doi = "10.1088/1361-6560/ab8dda",

language = "English",

volume = "65",

journal = "Physics in Medicine and Biology",

issn = "0031-9155",

publisher = "IOP Publishing Ltd.",

number = "16",

}

TY - JOUR

T1 - Multi-level feature aggregation network for instrument identification of endoscopic images

AU - Chu, Yakui

AU - Yang, Xilin

AU - Li, Heng

AU - Ai, Danni

AU - Ding, Yuan

AU - Fan, Jingfan

AU - Song, Hong

AU - Yang, Jian

PY - 2020/8/21

Y1 - 2020/8/21

N2 - Identification of surgical instruments is crucial in understanding surgical scenarios and providing an assistive process in endoscopic image-guided surgery. This study proposes a novel multilevel feature-aggregated deep convolutional neural network (MLFA-Net) for identifying surgical instruments in endoscopic images. First, a global feature augmentation layer is created on the top layer of the backbone to improve the localization ability of object identification by boosting the high-level semantic information to the feature flow network. Second, a modified interaction path of cross-channel features is proposed to increase the nonlinear combination of features in the same level and improve the efficiency of information propagation. Third, a multiview fusion branch of features is built to aggregate the location-sensitive information of the same level in different views, increase the information diversity of features, and enhance the localization ability of objects. By utilizing the latent information, the proposed network of multilevel feature aggregation can accomplish multitask instrument identification with a single network. Three tasks are handled by the proposed network, including object detection, which classifies the type of instrument and locates its border; mask segmentation, which detects the instrument shape; and pose estimation, which detects the keypoint of instrument parts. The experiments are performed on laparoscopic images from MICCAI 2017 Endoscopic Vision Challenge, and the mean average precision (AP) and average recall (AR) are utilized to quantify the segmentation and pose estimation results. For the bounding box regression, the AP and AR are 79.1% and 63.2%, respectively, while the AP and AR of mask segmentation are 78.1% and 62.1%, and the AP and AR of the pose estimation achieve 67.1% and 55.7%, respectively. 
The experiments demonstrate that our method efficiently improves the recognition accuracy of the instrument in endoscopic images, and outperforms the other state-of-the-art methods.

AB - Identification of surgical instruments is crucial in understanding surgical scenarios and providing an assistive process in endoscopic image-guided surgery. This study proposes a novel multilevel feature-aggregated deep convolutional neural network (MLFA-Net) for identifying surgical instruments in endoscopic images. First, a global feature augmentation layer is created on the top layer of the backbone to improve the localization ability of object identification by boosting the high-level semantic information to the feature flow network. Second, a modified interaction path of cross-channel features is proposed to increase the nonlinear combination of features in the same level and improve the efficiency of information propagation. Third, a multiview fusion branch of features is built to aggregate the location-sensitive information of the same level in different views, increase the information diversity of features, and enhance the localization ability of objects. By utilizing the latent information, the proposed network of multilevel feature aggregation can accomplish multitask instrument identification with a single network. Three tasks are handled by the proposed network, including object detection, which classifies the type of instrument and locates its border; mask segmentation, which detects the instrument shape; and pose estimation, which detects the keypoint of instrument parts. The experiments are performed on laparoscopic images from MICCAI 2017 Endoscopic Vision Challenge, and the mean average precision (AP) and average recall (AR) are utilized to quantify the segmentation and pose estimation results. For the bounding box regression, the AP and AR are 79.1% and 63.2%, respectively, while the AP and AR of mask segmentation are 78.1% and 62.1%, and the AP and AR of the pose estimation achieve 67.1% and 55.7%, respectively. 
The experiments demonstrate that our method efficiently improves the recognition accuracy of the instrument in endoscopic images, and outperforms the other state-of-the-art methods.

KW - convolutional neural networks

KW - endoscopic image

KW - instrument identification

UR - http://www.scopus.com/inward/record.url?scp=85091070781&partnerID=8YFLogxK

U2 - 10.1088/1361-6560/ab8dda

DO - 10.1088/1361-6560/ab8dda

M3 - Article

C2 - 32344381

AN - SCOPUS:85091070781

SN - 0031-9155

VL - 65

JO - Physics in Medicine and Biology

JF - Physics in Medicine and Biology

IS - 16

M1 - 165004

ER -
