DECA-Net: Dual encoder and cross-attention fusion network for surgical instrument segmentation
Sixin Liang, Jianzhou Zhang, Ang Bian, Jiaying You
Pattern Recognition Letters, Volume 185, Pages 130-136 (published 2024-07-31)
DOI: 10.1016/j.patrec.2024.07.019
URL: https://www.sciencedirect.com/science/article/pii/S0167865524002228
Citations: 0
Abstract
Minimally invasive surgery is now widely used to reduce surgical risks, and automatic, accurate instrument segmentation from endoscope videos is crucial for computer-assisted surgical guidance. However, despite the rapid development of CNN-based surgical instrument segmentation methods, challenges such as motion blur and varying illumination can still cause erroneous segmentation. In this work, we propose a novel dual encoder and cross-attention network (DECA-Net) that overcomes these limitations through enhanced context representation and instrument-relevant feature fusion. Our approach introduces a CNN- and Transformer-based dual encoder unit that extracts local features and global context information, strengthening the model's robustness under varying illumination conditions. An attention fusion module then combines the local features with the global context and selects instrument-related boundary features. To bridge the semantic gap between encoder and decoder, we propose a parallel dual cross-attention (DCA) block that captures channel and spatial dependencies across multi-scale encoder features to build an enhanced skip connection. Experimental results show that the proposed method achieves state-of-the-art performance on the Endovis2017 and Kvasir-instrument datasets.
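As a rough illustration of the dual-encoder idea described above, the following is a minimal PyTorch sketch of a single encoder stage: a CNN branch for local features, a Transformer branch for global context, and an attention gate that fuses the two. The module names (ConvBranch, TransformerBranch, AttentionFusion, DualEncoderBlock), channel sizes, and the gating formulation are illustrative assumptions only; the abstract does not include the authors' actual DECA-Net implementation.

```python
# Illustrative sketch of a dual-encoder stage with attention-based fusion.
# Module names, channel sizes, and the gating formula are assumptions,
# not the authors' DECA-Net code.
import torch
import torch.nn as nn


class ConvBranch(nn.Module):
    """CNN branch: extracts local features."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)


class TransformerBranch(nn.Module):
    """Transformer branch: models global context via self-attention over spatial tokens."""
    def __init__(self, in_ch: int, out_ch: int, num_heads: int = 4):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.attn = nn.MultiheadAttention(out_ch, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(out_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.proj(x)                           # (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)      # (B, H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + attn_out)      # residual + layer norm
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class AttentionFusion(nn.Module):
    """Channel-attention gate that blends local and global features."""
    def __init__(self, ch: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * ch, ch, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, local_feat: torch.Tensor, global_feat: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([local_feat, global_feat], dim=1))  # (B, C, 1, 1)
        return w * local_feat + (1.0 - w) * global_feat


class DualEncoderBlock(nn.Module):
    """One encoder stage: run both branches on the same input, then fuse."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.cnn = ConvBranch(in_ch, out_ch)
        self.transformer = TransformerBranch(in_ch, out_ch)
        self.fuse = AttentionFusion(out_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(self.cnn(x), self.transformer(x))


if __name__ == "__main__":
    block = DualEncoderBlock(in_ch=3, out_ch=32)
    dummy = torch.randn(1, 3, 64, 64)   # small dummy endoscopic frame
    print(block(dummy).shape)           # torch.Size([1, 32, 64, 64])
```

In this sketch the gate produces per-channel weights, so the block can lean on the CNN branch where local texture is informative and on the Transformer branch where global context (e.g., under uneven illumination) matters more; the paper's actual fusion and DCA skip-connection designs may differ.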
Journal overview:
Pattern Recognition Letters aims at the rapid publication of concise articles of broad interest in pattern recognition.
Subject areas include all the current fields of interest represented by the Technical Committees of the International Association of Pattern Recognition, and other developing themes involving learning and recognition.