Dual‐stage semantic segmentation of endoscopic surgical instruments

IF 3.2 · Zone 2 (Medicine) · JCR Q1 (Radiology, Nuclear Medicine & Medical Imaging) · Medical Physics · Pub Date: 2024-09-10 · DOI: 10.1002/mp.17397

Wenxin Chen, Kaifeng Wang, Xinya Song, Dongsheng Xie, Xue Li, Mobarakol Islam, Changsheng Li, Xingguang Duan



Abstract

Background: Endoscopic instrument segmentation is essential for ensuring the safety of robotic‐assisted spinal endoscopic surgeries. However, due to the narrow operative region, intricate surrounding tissues, and limited visibility, achieving instrument segmentation within the endoscopic view remains challenging.

Purpose: This work aims to devise a method to segment surgical instruments in endoscopic video. By designing an endoscopic image classification model, features of preceding and subsequent video frames are extracted to achieve continuous and precise segmentation of instruments in endoscopic videos.

Methods: Deep learning techniques serve as the algorithmic core for constructing the convolutional neural network proposed in this study. The method comprises two stages: image classification and instrument segmentation. MobileViT is employed for image classification, extracting key features of different instruments and generating classification results. DeepLabv3+ is used for instrument segmentation. Training on each instrument separately yields a corresponding set of model parameters. Lastly, a flag caching mechanism and a blur detection module are designed to make effective use of the image features in consecutive frames. By loading the instrument-specific parameters into the segmentation model, better segmentation of surgical instruments can be achieved in endoscopic videos.

Results: The classification and segmentation models are evaluated on an endoscopic image dataset. In the dataset used for instrument segmentation, the training set consists of 7456 images, the validation set of 829 images, and the test set of 921 images. In the dataset used for image classification, the training set consists of 2400 images and the validation set of 600 images. The image classification model achieves an accuracy of 70% on the validation set. For the segmentation model, experiments are conducted on two common surgical instruments, and the mean Intersection over Union (mIoU) exceeds 98%. Furthermore, the proposed video segmentation method is tested on videos collected during surgeries, validating the effectiveness of the flag caching mechanism and blur detection module.

Conclusions: Experimental results on the dataset demonstrate that the dual‐stage video processing method excels at instrument segmentation under endoscopic conditions. This advancement is significant for enhancing the intelligence level of robotic‐assisted spinal endoscopic surgeries.
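The abstract reports segmentation quality as mean Intersection over Union (mIoU). As a minimal sketch of how that metric is commonly computed, the snippet below scores one predicted label mask against its ground truth. The function name `mean_iou`, the two-class labeling (0 = background, 1 = instrument), and the choice to average over every class present in either mask are assumptions made for illustration; the abstract does not state how the authors aggregate the metric.

```python
from typing import Sequence


def mean_iou(pred: Sequence[Sequence[int]],
             target: Sequence[Sequence[int]],
             num_classes: int) -> float:
    """Mean IoU over the classes that occur in the prediction or the ground truth."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel agrees: it is in both the intersection and the union of class p.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel disagrees: it enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    # Skip classes that never occur, so they do not distort the average.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy 2x3 masks (hypothetical labels: 0 = background, 1 = instrument).
pred = [[0, 1, 1],
        [0, 0, 1]]
target = [[0, 1, 1],
          [0, 1, 1]]
score = mean_iou(pred, target, num_classes=2)
```

In this toy case one instrument pixel is missed, giving a per-class IoU of 2/3 for background and 3/4 for the instrument. Whether background is included in the average, and whether the mean is taken per image or over the whole test set, changes the reported number, so a figure such as "exceeds 98%" is only comparable across papers that use the same convention.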




Source journal: Medical Physics (Medicine: Nuclear Medicine)

CiteScore: 6.80

Self-citation rate: 15.80%

Articles published: 660

Review time: 1.7 months

Journal description: Medical Physics publishes original, high-impact physics, imaging science, and engineering research that advances patient diagnosis and therapy through contributions in (1) basic science developments with high potential for clinical translation, (2) clinical applications of cutting-edge engineering and physics innovations, and (3) broadly applicable and innovative clinical physics developments. Medical Physics is a journal of global scope and reach. By publishing in Medical Physics, your research will reach an international, multidisciplinary audience including practicing medical physicists as well as physics- and engineering-based translational scientists. We work closely with authors of promising articles to improve their quality.
