LACOSTE: Exploiting stereo and temporal contexts for surgical instrument segmentation

Impact factor: 10.7; Zone 1 (Medicine); JCR Q1: Computer Science, Artificial Intelligence; Journal: Medical Image Analysis; Pub Date: 2024-11-12; DOI: 10.1016/j.media.2024.103387
Qiyuan Wang, Shang Zhao, Zikang Xu, S. Kevin Zhou
Volume 99, Article 103387. Publication type: Journal Article. Source: https://www.sciencedirect.com/science/article/pii/S1361841524003128
Citations: 0

Abstract

Surgical instrument segmentation is instrumental to minimally invasive surgeries and related applications. Most previous methods formulate this task as single-frame instance segmentation and ignore the natural temporal and stereo attributes of a surgical video. As a result, these methods are less robust to appearance variation caused by temporal motion and view change. In this work, we propose a novel LACOSTE model that exploits Location-Agnostic COntexts in Stereo and TEmporal images for improved surgical instrument segmentation. With a query-based segmentation model as the core, we design three performance-enhancing modules. First, we design a disparity-guided feature propagation module to explicitly enhance depth-aware features. To generalize to monocular video as well, we apply a pseudo-stereo scheme to generate complementary right images. Second, we propose a stereo-temporal set classifier, which aggregates stereo-temporal contexts in a universal way to make a consolidated prediction and mitigate transient failures. Finally, we propose a location-agnostic classifier to decouple location bias from mask prediction and enhance feature semantics. We extensively validate our approach on three public surgical video datasets: two benchmarks from the EndoVis Challenges and GraSP, a real radical prostatectomy surgery dataset. Experimental results demonstrate the promising performance of our method, which consistently achieves results comparable or favorable to those of previous state-of-the-art approaches.
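
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea behind consolidating predictions over a stereo-temporal set. It assumes that the same instrument has already been associated across left/right views and neighboring frames, and it swaps in plain probability averaging in place of the paper's learned set classifier. The function name `consolidate_set_prediction`, the class labels, and the scores are all hypothetical.

```python
from typing import Sequence


def consolidate_set_prediction(class_probs: Sequence[Sequence[float]]) -> int:
    """Average per-observation class probabilities and return the winning class index.

    Assumption: simple averaging stands in for a learned set classifier.
    """
    if not class_probs:
        raise ValueError("need at least one observation")
    num_classes = len(class_probs[0])
    if any(len(p) != num_classes for p in class_probs):
        raise ValueError("all observations must score the same classes")
    # Mean probability of each class over the whole stereo-temporal set.
    mean = [
        sum(p[c] for p in class_probs) / len(class_probs)
        for c in range(num_classes)
    ]
    return max(range(num_classes), key=lambda c: mean[c])


# Hypothetical scores for one instrument over (left, right) views of three frames.
# Illustrative classes: 0 = forceps, 1 = scissors.
observations = [
    [0.8, 0.2],  # frame t-1, left view
    [0.7, 0.3],  # frame t-1, right view
    [0.3, 0.7],  # frame t, left view: a transient misclassification
    [0.6, 0.4],  # frame t, right view
    [0.9, 0.1],  # frame t+1, left view
    [0.8, 0.2],  # frame t+1, right view
]

# Classifying the failing observation alone picks the wrong class.
single_frame = max(range(2), key=lambda c: observations[2][c])
# Pooling the whole set recovers the class supported by the other observations.
consolidated = consolidate_set_prediction(observations)
```

In this toy set, the single failing observation is outweighed by the other five, which is the kind of transient failure that a set-level prediction is meant to mitigate.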
Source journal
Medical Image Analysis (Engineering and Technology - Engineering: Biomedical)
CiteScore: 22.10
Self-citation rate: 6.40%
Articles published: 309
Review time: 6.6 months
Journal description: Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.
Latest articles from this journal
LACOSTE: Exploiting stereo and temporal contexts for surgical instrument segmentation
Enhancing chest X-ray datasets with privacy-preserving large language models and multi-type annotations: A data-driven approach for improved classification
IGUANe: A 3D generalizable CycleGAN for multicenter harmonization of brain MR images
Large-scale multi-center CT and MRI segmentation of pancreas with deep learning
Multi-task learning with cross-task consistency for improved depth estimation in colonoscopy