Multi-level feature aggregation network for instrument identification of endoscopic images

Yakui Chu; Xilin Yang; Heng Li; Danni Ai; Yuan Ding; Jingfan Fan; Hong Song; Jian Yang

doi:10.1088/1361-6560/ab8dda

Yakui Chu^*, Xilin Yang^*, Heng Li, Danni Ai, Yuan Ding, Jingfan Fan, Hong Song, Jian Yang

^*Corresponding author for this work

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

11 Citations (Scopus)

Abstract

Identification of surgical instruments is crucial in understanding surgical scenarios and providing an assistive process in endoscopic image-guided surgery. This study proposes a novel multilevel feature-aggregated deep convolutional neural network (MLFA-Net) for identifying surgical instruments in endoscopic images. First, a global feature augmentation layer is created on the top layer of the backbone to improve the localization ability of object identification by boosting the high-level semantic information to the feature flow network. Second, a modified interaction path of cross-channel features is proposed to increase the nonlinear combination of features in the same level and improve the efficiency of information propagation. Third, a multiview fusion branch of features is built to aggregate the location-sensitive information of the same level in different views, increase the information diversity of features, and enhance the localization ability of objects. By utilizing the latent information, the proposed network of multilevel feature aggregation can accomplish multitask instrument identification with a single network. Three tasks are handled by the proposed network, including object detection, which classifies the type of instrument and locates its border; mask segmentation, which detects the instrument shape; and pose estimation, which detects the keypoint of instrument parts. The experiments are performed on laparoscopic images from MICCAI 2017 Endoscopic Vision Challenge, and the mean average precision (AP) and average recall (AR) are utilized to quantify the segmentation and pose estimation results. For the bounding box regression, the AP and AR are 79.1% and 63.2%, respectively, while the AP and AR of mask segmentation are 78.1% and 62.1%, and the AP and AR of the pose estimation achieve 67.1% and 55.7%, respectively. The experiments demonstrate that our method efficiently improves the recognition accuracy of the instrument in endoscopic images, and outperforms the other state-of-the-art methods.
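The abstract reports AP and AR for bounding boxes but does not spell out the evaluation protocol. As a rough illustration only, the sketch below computes precision and recall at a single intersection-over-union (IoU) threshold with greedy matching. This is a simplified stand-in, not the authors' evaluation code: the function names `iou` and `precision_recall`, the `(x1, y1, x2, y2)` box layout, and the 0.5 threshold are all assumptions made for the example.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    preds: List[Tuple[Box, float]],  # (box, confidence score)
    gts: List[Box],
    thr: float = 0.5,  # hypothetical IoU threshold, not taken from the paper
) -> Tuple[float, float]:
    """Greedily match predictions (highest score first) to ground-truth boxes."""
    matched = set()  # indices of ground-truth boxes already claimed
    true_pos = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, thr
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.add(best_j)
            true_pos += 1
    precision = true_pos / len(preds) if preds else 0.0
    recall = true_pos / len(gts) if gts else 0.0
    return precision, recall


if __name__ == "__main__":
    gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds = [((0, 0, 10, 10), 0.9), ((50, 50, 60, 60), 0.8)]
    print(precision_recall(preds, gts))
```

Benchmark-style AP and AR are usually obtained by sweeping the score cutoff (and often several IoU thresholds) and averaging, so the single-threshold numbers here are only a building block for that computation.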

Original language	English
Article number	165004
Journal	Physics in Medicine and Biology
Volume	65
Issue number	16
DOI	https://doi.org/10.1088/1361-6560/ab8dda
Publication status	Published - 21 Aug 2020

Cite this

@article{4c71acc7387b477e8bee8beabf4caf9e,
title = "Multi-level feature aggregation network for instrument identification of endoscopic images",

abstract = "Identification of surgical instruments is crucial in understanding surgical scenarios and providing an assistive process in endoscopic image-guided surgery. This study proposes a novel multilevel feature-aggregated deep convolutional neural network (MLFA-Net) for identifying surgical instruments in endoscopic images. First, a global feature augmentation layer is created on the top layer of the backbone to improve the localization ability of object identification by boosting the high-level semantic information to the feature flow network. Second, a modified interaction path of cross-channel features is proposed to increase the nonlinear combination of features in the same level and improve the efficiency of information propagation. Third, a multiview fusion branch of features is built to aggregate the location-sensitive information of the same level in different views, increase the information diversity of features, and enhance the localization ability of objects. By utilizing the latent information, the proposed network of multilevel feature aggregation can accomplish multitask instrument identification with a single network. Three tasks are handled by the proposed network, including object detection, which classifies the type of instrument and locates its border; mask segmentation, which detects the instrument shape; and pose estimation, which detects the keypoint of instrument parts. The experiments are performed on laparoscopic images from MICCAI 2017 Endoscopic Vision Challenge, and the mean average precision (AP) and average recall (AR) are utilized to quantify the segmentation and pose estimation results. For the bounding box regression, the AP and AR are 79.1% and 63.2%, respectively, while the AP and AR of mask segmentation are 78.1% and 62.1%, and the AP and AR of the pose estimation achieve 67.1% and 55.7%, respectively. 
The experiments demonstrate that our method efficiently improves the recognition accuracy of the instrument in endoscopic images, and outperforms the other state-of-the-art methods.",

keywords = "convolutional neural networks, endoscopic image, instrument identification",
author = "Yakui Chu and Xilin Yang and Heng Li and Danni Ai and Yuan Ding and Jingfan Fan and Hong Song and Jian Yang",
note = "Publisher Copyright: {\textcopyright} 2020 Institute of Physics and Engineering in Medicine.",
year = "2020",
month = aug,
day = "21",
doi = "10.1088/1361-6560/ab8dda",
language = "English",
volume = "65",
journal = "Physics in Medicine and Biology",
issn = "0031-9155",
publisher = "IOP Publishing Ltd.",
number = "16",
}

TY - JOUR
T1 - Multi-level feature aggregation network for instrument identification of endoscopic images
AU - Chu, Yakui
AU - Yang, Xilin
AU - Li, Heng
AU - Ai, Danni
AU - Ding, Yuan
AU - Fan, Jingfan
AU - Song, Hong
AU - Yang, Jian
PY - 2020/8/21
Y1 - 2020/8/21

N2 - Identification of surgical instruments is crucial in understanding surgical scenarios and providing an assistive process in endoscopic image-guided surgery. This study proposes a novel multilevel feature-aggregated deep convolutional neural network (MLFA-Net) for identifying surgical instruments in endoscopic images. First, a global feature augmentation layer is created on the top layer of the backbone to improve the localization ability of object identification by boosting the high-level semantic information to the feature flow network. Second, a modified interaction path of cross-channel features is proposed to increase the nonlinear combination of features in the same level and improve the efficiency of information propagation. Third, a multiview fusion branch of features is built to aggregate the location-sensitive information of the same level in different views, increase the information diversity of features, and enhance the localization ability of objects. By utilizing the latent information, the proposed network of multilevel feature aggregation can accomplish multitask instrument identification with a single network. Three tasks are handled by the proposed network, including object detection, which classifies the type of instrument and locates its border; mask segmentation, which detects the instrument shape; and pose estimation, which detects the keypoint of instrument parts. The experiments are performed on laparoscopic images from MICCAI 2017 Endoscopic Vision Challenge, and the mean average precision (AP) and average recall (AR) are utilized to quantify the segmentation and pose estimation results. For the bounding box regression, the AP and AR are 79.1% and 63.2%, respectively, while the AP and AR of mask segmentation are 78.1% and 62.1%, and the AP and AR of the pose estimation achieve 67.1% and 55.7%, respectively. 
The experiments demonstrate that our method efficiently improves the recognition accuracy of the instrument in endoscopic images, and outperforms the other state-of-the-art methods.

AB - Identification of surgical instruments is crucial in understanding surgical scenarios and providing an assistive process in endoscopic image-guided surgery. This study proposes a novel multilevel feature-aggregated deep convolutional neural network (MLFA-Net) for identifying surgical instruments in endoscopic images. First, a global feature augmentation layer is created on the top layer of the backbone to improve the localization ability of object identification by boosting the high-level semantic information to the feature flow network. Second, a modified interaction path of cross-channel features is proposed to increase the nonlinear combination of features in the same level and improve the efficiency of information propagation. Third, a multiview fusion branch of features is built to aggregate the location-sensitive information of the same level in different views, increase the information diversity of features, and enhance the localization ability of objects. By utilizing the latent information, the proposed network of multilevel feature aggregation can accomplish multitask instrument identification with a single network. Three tasks are handled by the proposed network, including object detection, which classifies the type of instrument and locates its border; mask segmentation, which detects the instrument shape; and pose estimation, which detects the keypoint of instrument parts. The experiments are performed on laparoscopic images from MICCAI 2017 Endoscopic Vision Challenge, and the mean average precision (AP) and average recall (AR) are utilized to quantify the segmentation and pose estimation results. For the bounding box regression, the AP and AR are 79.1% and 63.2%, respectively, while the AP and AR of mask segmentation are 78.1% and 62.1%, and the AP and AR of the pose estimation achieve 67.1% and 55.7%, respectively. 
The experiments demonstrate that our method efficiently improves the recognition accuracy of the instrument in endoscopic images, and outperforms the other state-of-the-art methods.

KW - convolutional neural networks
KW - endoscopic image
KW - instrument identification
UR - http://www.scopus.com/inward/record.url?scp=85091070781&partnerID=8YFLogxK
U2 - 10.1088/1361-6560/ab8dda
DO - 10.1088/1361-6560/ab8dda
M3 - Article
C2 - 32344381
AN - SCOPUS:85091070781
SN - 0031-9155
VL - 65
JO - Physics in Medicine and Biology
JF - Physics in Medicine and Biology
IS - 16
M1 - 165004
ER -
