LACOSTE: Exploiting stereo and temporal contexts for surgical instrument segmentation

IF 10.7; Zone 1 (Medicine); Q1 Computer Science, Artificial Intelligence; Medical Image Analysis; Pub Date: 2024-11-12; DOI: 10.1016/j.media.2024.103387

Qiyuan Wang, Shang Zhao, Zikang Xu, S. Kevin Zhou

Medical Image Analysis, vol. 99, Article 103387 (journal article). Full text: https://www.sciencedirect.com/science/article/pii/S1361841524003128

Citations: 0

Abstract

Surgical instrument segmentation is instrumental to minimally invasive surgeries and related applications. Most previous methods formulate this task as single-frame instance segmentation and ignore the inherent temporal and stereo attributes of a surgical video. As a result, these methods are less robust to appearance variation caused by temporal motion and view change. In this work, we propose a novel LACOSTE model that exploits Location-Agnostic COntexts in Stereo and TEmporal images for improved surgical instrument segmentation. With a query-based segmentation model as the core, we design three performance-enhancing modules. First, we design a disparity-guided feature propagation module to explicitly enhance depth-aware features. To generalize even to monocular video, we apply a pseudo-stereo scheme to generate complementary right images. Second, we propose a stereo-temporal set classifier, which aggregates stereo-temporal contexts in a universal way to make a consolidated prediction and mitigate transient failures. Finally, we propose a location-agnostic classifier to decouple the location bias from mask prediction and enhance the feature semantics. We extensively validate our approach on three public surgical video datasets: two benchmarks from the EndoVis Challenges and GraSP, a real radical prostatectomy surgery dataset. Experimental results demonstrate the promising performance of our method, which consistently achieves results comparable to or better than previous state-of-the-art approaches.
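The abstract does not say how the stereo-temporal set classifier combines its inputs, so the following is only a minimal sketch of the general idea of a consolidated prediction: class scores for one instrument, observed across several frames and views, are averaged before the label is chosen. Plain score averaging is a stand-in assumption, not the paper's module, and the function name `consolidate_set_prediction` and the input layout are hypothetical.

```python
def consolidate_set_prediction(per_view_scores):
    """Average class scores over a set of observations of one instrument.

    per_view_scores: list of equal-length score lists, one per (frame, view)
    observation. This layout is an illustrative assumption.
    Returns (best_class_index, mean_scores).
    """
    if not per_view_scores:
        raise ValueError("need at least one observation")
    num_classes = len(per_view_scores[0])
    if any(len(s) != num_classes for s in per_view_scores):
        raise ValueError("all observations must score the same classes")
    count = len(per_view_scores)
    # Mean score per class across the whole stereo-temporal set.
    mean_scores = [
        sum(s[c] for s in per_view_scores) / count for c in range(num_classes)
    ]
    best = max(range(num_classes), key=mean_scores.__getitem__)
    return best, mean_scores


# Three observations of the same instrument; the second one, taken alone,
# would pick class 1 (a transient failure in this toy example).
scores = [
    [0.8, 0.1, 0.1],  # left view, previous frame
    [0.2, 0.7, 0.1],  # left view, current frame
    [0.7, 0.2, 0.1],  # right view, current frame
]
label, mean_scores = consolidate_set_prediction(scores)
```

In this toy input the set-level label is class 0 even though one observation alone favors class 1, which is the kind of transient failure a consolidated prediction is meant to absorb.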


Source journal

Medical Image Analysis (Engineering & Technology: Engineering, Biomedical)

CiteScore: 22.10

Self-citation rate: 6.40%

Articles published: 309

Review time: 6.6 months

About the journal: Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.