A Dual-Branch Fusion Network for Surgical Instrument Segmentation
Lei Yang; Chenxu Zhai; Hongyong Wang; Yanhong Liu; Guibin Bian
IEEE Transactions on Medical Robotics and Bionics, published 2024-09-23. DOI: 10.1109/TMRB.2024.3464748
https://ieeexplore.ieee.org/document/10689273/
Abstract
Surgical robots have become integral to contemporary surgical procedures, and precise segmentation of surgical instruments is a crucial prerequisite for their stable operation. However, numerous factors continue to affect segmentation outcomes, including intricate surgical environments, varying viewpoints, low contrast between surgical instruments and their surroundings, divergent instrument sizes and shapes, and imbalanced categories. In this paper, a novel dual-branch fusion network, designated DBF-Net, is presented, which integrates convolutional neural network (CNN) and Transformer architectures for automatic segmentation of surgical instruments. To address the deficiencies in feature extraction capacity of pure CNN or Transformer architectures, a dual-path encoding unit is introduced to effectively represent both local detail features and global context. Meanwhile, to enhance the fusion of features extracted from the two paths, a CNN-Transformer fusion (CTF) module is proposed to efficiently merge features from the CNN and Transformer structures, contributing to the effective representation of both local detail and global contextual features. Further refinement is pursued through a multi-scale feature aggregation (MFAG) module and a local feature enhancement (LFE) module, which refine local contextual features at each layer. In addition, an attention-guided enhancement (AGE) module is incorporated to refine local feature maps. Finally, a multi-scale global feature representation (MGFR) module is introduced to extract and aggregate multi-scale features, and a progressive fusion module (PFM) aggregates full-scale features from the decoder. Experimental results demonstrate the superior segmentation performance of the proposed network compared with other state-of-the-art (SOTA) surgical instrument segmentation models, validating the efficacy of the proposed architecture in advancing surgical instrument segmentation.
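The paper does not publish an implementation, but the core idea of the dual-path encoding unit, a CNN branch for local detail and a Transformer branch for global context, merged by a fusion module, can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration: the branch designs, channel widths, and the squeeze-and-excitation-style gate standing in for the CTF module are hypothetical choices, not the authors' architecture.

```python
# Hypothetical sketch of a dual-path (CNN + Transformer) encoding unit with a
# fusion step, loosely following the DBF-Net description. All module names,
# shapes, and design choices are assumptions; the paper provides no code.
import torch
import torch.nn as nn


class CNNBranch(nn.Module):
    """Local-detail path: two 3x3 convolutions with BatchNorm and ReLU."""
    def __init__(self, channels: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)


class TransformerBranch(nn.Module):
    """Global-context path: flatten the feature map into tokens, apply
    self-attention, and reshape back to a spatial map."""
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)      # (B, H*W, C)
        tokens = self.norm(tokens)
        attended, _ = self.attn(tokens, tokens, tokens)
        return attended.transpose(1, 2).reshape(b, c, h, w)


class FusionBlock(nn.Module):
    """Assumed stand-in for the CTF module: concatenate both paths, project
    with a 1x1 convolution, then re-weight channels with a sigmoid gate."""
    def __init__(self, channels: int):
        super().__init__()
        self.merge = nn.Conv2d(2 * channels, channels, 1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, local_feat: torch.Tensor, global_feat: torch.Tensor) -> torch.Tensor:
        fused = self.merge(torch.cat([local_feat, global_feat], dim=1))
        return fused * self.gate(fused)


class DualPathUnit(nn.Module):
    """One dual-path encoding stage: run both branches on the same input,
    then fuse their outputs."""
    def __init__(self, channels: int):
        super().__init__()
        self.cnn = CNNBranch(channels)
        self.transformer = TransformerBranch(channels)
        self.fuse = FusionBlock(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(self.cnn(x), self.transformer(x))


if __name__ == "__main__":
    unit = DualPathUnit(channels=64)
    out = unit(torch.randn(1, 64, 32, 32))
    print(out.shape)  # torch.Size([1, 64, 32, 32])
```

One practical note on this sketch: full self-attention over H*W tokens scales quadratically with spatial resolution, so a real encoder of this kind would likely apply the Transformer branch only at downsampled stages or use a windowed attention variant; the paper's abstract does not specify which strategy DBF-Net adopts.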