Accurate tissue segmentation of the infant brain in magnetic resonance (MR) images is crucial for charting early brain development and identifying biomarkers. Due to ongoing myelination and maturation, in the isointense phase (6-9 months of age) the gray and white matter of the infant brain exhibit similar intensity levels in MR images, posing significant challenges for tissue segmentation. In contrast, in the adult-like phase around 12 months of age, MR images show high tissue contrast and can be easily segmented. In this paper, we propose to effectively exploit adult-like phase images to achieve robust multi-view isointense infant brain segmentation. Specifically, in one direction, we transfer adult-like phase images to the isointense view, where they have tissue contrast similar to isointense phase images, and use the transferred images to train an isointense-view segmentation network. In the other direction, we transfer isointense phase images to the adult-like view, where they have enhanced tissue contrast, for training a segmentation network in the adult-like view. The segmentation networks of the different views form a multi-path architecture that performs multi-view learning to further boost segmentation performance. Since anatomy-preserving style transfer is key to the downstream segmentation task, we develop a Disentangled Cycle-consistent Adversarial Network (DCAN) with strong regularization terms to accurately transfer realistic tissue contrast between isointense and adult-like phase images while maintaining their structural consistency. Experiments on both the NDAR and iSeg-2019 datasets demonstrate the significantly superior performance of our method over state-of-the-art methods.
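DCAN's full objective is not given in the abstract above; as a rough illustration, the cycle-consistency term that such cross-phase transfer networks regularize with can be sketched as follows. The generators here are identity placeholders, not the actual DCAN generators, and the "images" are toy arrays:

```python
import numpy as np

def l1(a, b):
    """Mean absolute error, the usual cycle-consistency penalty."""
    return float(np.mean(np.abs(a - b)))

# Toy "images" from the isointense phase (A) and adult-like phase (B).
rng = np.random.default_rng(0)
x_a = rng.random((8, 8))
x_b = rng.random((8, 8))

# Placeholder generators; anatomy-preserving transfer means the
# round trip A -> B -> A should reconstruct the input.
def g_ab(x):  # transfer an A-view image to the B view (hypothetical)
    return x

def g_ba(x):  # transfer a B-view image to the A view (hypothetical)
    return x

cycle_loss = l1(g_ba(g_ab(x_a)), x_a) + l1(g_ab(g_ba(x_b)), x_b)
print(cycle_loss)  # 0.0 for perfectly anatomy-preserving (identity) generators
```

In a real disentangled setup, each generator would swap a style code while keeping a content (anatomy) code fixed; the cycle term above is what keeps that swap structure-preserving.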
Transferring Adult-like Phase Images for Robust Multi-view Isointense Infant Brain Segmentation
Huabing Liu, Jiawei Huang, Dengqiang Jia, Qian Wang, Jun Xu, Dinggang Shen
Pub Date: 2024-07-18 | DOI: 10.1109/TMI.2024.3430348 | IEEE Transactions on Medical Imaging
In fully supervised learning-based medical image classification, the robustness of a trained model is influenced by its exposure to the range of candidate disease classes. Generalized Zero Shot Learning (GZSL) aims to correctly predict both seen and novel unseen classes. Current GZSL approaches have focused mostly on the single-label case; however, it is common for chest X-rays to be labelled with multiple disease classes. We propose a novel multi-modal multi-label GZSL approach that leverages feature disentanglement and multi-modal information to synthesize features of unseen classes. Disease labels are processed through a pre-trained BioBert model to obtain text embeddings that are used to create a dictionary encoding similarity among different labels. We then use disentangled features and graph aggregation to learn a second dictionary of inter-label similarities. A subsequent clustering step helps to identify representative vectors for each class. The multi-modal multi-label dictionaries and the class representative vectors guide the feature synthesis step, the most important component of our pipeline, which generates realistic multi-label disease samples of seen and unseen classes. Our method is benchmarked against multiple competing methods on the publicly available NIH and CheXpert chest X-ray datasets and outperforms all of them.
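The abstract does not specify how the text-embedding similarity dictionary is built; a minimal sketch, assuming pairwise cosine similarity over per-label embeddings (the toy embeddings below stand in for BioBert vectors), could look like:

```python
import numpy as np

def label_similarity_dictionary(label_embeddings):
    """Pairwise cosine similarities among label text embeddings.

    label_embeddings: (num_labels, dim) array, e.g. one text embedding
    per disease label (hypothetical toy inputs here, not BioBert output).
    Returns a (num_labels, num_labels) similarity matrix.
    """
    norms = np.linalg.norm(label_embeddings, axis=1, keepdims=True)
    unit = label_embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T

# Three toy label embeddings: the third lies "between" the first two.
emb = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])
sim = label_similarity_dictionary(emb)
```

Such a matrix can then act as a dictionary lookup of inter-label similarity when synthesizing multi-label samples.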
Multi-Label Generalized Zero Shot Chest Xray Classification By Combining Image-Text Information With Feature Disentanglement
Dwarikanath Mahapatra, Antonio Jimeno Yepes, Behzad Bozorgtabar, Sudipta Roy, Zongyuan Ge, Mauricio Reyes
Pub Date: 2024-07-17 | DOI: 10.1109/TMI.2024.3429471
Pub Date: 2024-07-16 | DOI: 10.1109/TMI.2024.3429148
Chi Wen, Mang Ye, He Li, Ting Chen, Xuan Xiao
Existing deep learning methods have achieved remarkable results in diagnosing retinal diseases, showcasing the potential of advanced AI in ophthalmology. However, the black-box nature of these methods obscures the decision-making process, compromising their trustworthiness and acceptability. Inspired by concept-based approaches and recognizing the intrinsic correlation between retinal lesions and diseases, we regard retinal lesions as concepts and propose an inherently interpretable framework designed to enhance both the performance and explainability of diagnostic models. Leveraging the transformer architecture, known for its proficiency in capturing long-range dependencies, our model can effectively identify lesion features. By integrating with image-level annotations, it achieves the alignment of lesion concepts with human cognition under the guidance of a retinal foundation model. Furthermore, to attain interpretability without losing lesion-specific information, our method employs a classifier built on a cross-attention mechanism for disease diagnosis and explanation, where explanations are grounded in the contributions of human-understandable lesion concepts and their visual localization. Notably, due to the structure and inherent interpretability of our model, clinicians can implement concept-level interventions to correct diagnostic errors by simply adjusting erroneous lesion predictions. Experiments conducted on four fundus image datasets demonstrate that our method achieves favorable performance against state-of-the-art methods while providing faithful explanations and enabling concept-level interventions. Our code is publicly available at https://github.com/Sorades/CLAT.
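The concept-level intervention described above can be illustrated with a concept-bottleneck-style sketch: disease logits are a weighted sum of lesion-concept scores, so overriding one erroneous concept prediction directly updates the diagnosis. The concept names and weights below are purely hypothetical, not the model's actual parameters:

```python
import numpy as np

# Hypothetical lesion-concept scores predicted by the model,
# e.g. [hemorrhage, exudate, microaneurysm]; names are illustrative.
concepts = np.array([0.9, 0.1, 0.8])

# Hypothetical per-concept contribution weights for two diseases.
W = np.array([[1.0, -0.5],
              [0.2,  0.9],
              [0.7,  0.1]])

def diagnose(concept_scores, weights):
    """Disease logits as a sum of lesion-concept contributions."""
    return concept_scores @ weights

before = diagnose(concepts, W)
# Concept-level intervention: a clinician overrides one erroneous
# lesion prediction, and the diagnosis updates accordingly.
concepts[1] = 0.95
after = diagnose(concepts, W)
```

Because every logit decomposes into per-concept contributions, the same structure also yields the explanation: each concept's term is its stated contribution to the diagnosis.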
Concept-based Lesion Aware Transformer for Interpretable Retinal Disease Diagnosis
Pub Date: 2024-07-16 | DOI: 10.1109/TMI.2024.3424978
Jun Li, Tongkun Su, Baoliang Zhao, Faqin Lv, Qiong Wang, Nassir Navab, Ying Hu, Zhongliang Jiang
Automatic report generation has arisen as a significant research area in computer-aided diagnosis, aiming to alleviate the burden on clinicians by generating reports automatically based on medical images. In this work, we propose a novel framework for automatic ultrasound report generation, leveraging a combination of unsupervised and supervised learning methods to aid the report generation process. Our framework incorporates unsupervised learning methods to extract potential knowledge from ultrasound text reports, serving as prior information to guide the model in aligning visual and textual features, thereby addressing the challenge of feature discrepancy. Additionally, we design a global semantic comparison mechanism to enhance the performance of generating more comprehensive and accurate medical reports. To enable the implementation of ultrasound report generation, we constructed three large-scale ultrasound image-text datasets from different organs for training and validation purposes. Extensive evaluations against other state-of-the-art approaches demonstrate its superior performance across all three datasets. Code and dataset are available at this link.
Ultrasound Report Generation with Cross-Modality Feature Alignment via Unsupervised Guidance
Pub Date: 2024-07-16 | DOI: 10.1109/TMI.2024.3429403
Jiaxuan Liu, Haitao Li, Bolun Zeng, Huixiang Wang, Ron Kikinis, Leo Joskowicz, Xiaojun Chen
Computer-assisted preoperative planning of pelvic fracture reduction surgery has the potential to increase the accuracy of the surgery and to reduce complications. However, the diversity of pelvic fractures and the disturbance of small fracture fragments pose a great challenge to reliable automatic preoperative planning. In this paper, we present a comprehensive and automatic preoperative planning pipeline for pelvic fracture surgery, which includes pelvic fracture labeling, reduction planning of the fracture, and customized screw implantation. First, automatic bone fracture labeling is performed based on the separation of the fracture sections. Then, fracture reduction planning is performed based on automatic extraction and pairing of the fracture surfaces. Finally, screw implantation is planned using the adjoint fracture surfaces. The proposed pipeline was tested on different types of pelvic fractures in 14 clinical cases. Our method achieved a translational accuracy of 2.56 mm and a rotational accuracy of 3.31° in reduction planning. For fixation planning, a clinical acceptance rate of 86.7% was achieved. The results demonstrate the feasibility of the clinical application of our method. Our method has shown accuracy and reliability for complex multi-body bone fractures, which may provide effective clinical preoperative guidance and may improve the accuracy of pelvic fracture reduction surgery.
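The abstract does not detail how paired fracture surfaces are aligned; a standard least-squares rigid alignment (the Kabsch algorithm), commonly used as a building block in such reduction-planning pipelines, can be sketched as follows. The point sets below are toy values, not fracture-surface data:

```python
import numpy as np

def kabsch(P, Q):
    """Least-squares rigid transform (R, t) mapping point set P onto Q.

    A generic building block for aligning paired surfaces; the paper's
    actual pairing and reduction algorithm is not given in the abstract.
    """
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = Q.mean(axis=0) - R @ P.mean(axis=0)
    return R, t

# Recover a known 30-degree rotation about z plus a translation.
theta = np.pi / 6
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([2.56, 0.0, 1.0])
P = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]],
             dtype=float)
Q = P @ R_true.T + t_true
R_est, t_est = kabsch(P, Q)
```

Comparing an estimated (R, t) against a ground-truth pose is also how translational and rotational planning errors, like the 2.56 mm / 3.31° figures above, are typically computed.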
An end-to-end geometry-based pipeline for automatic preoperative surgical planning of pelvic fracture reduction and fixation
Pub Date: 2024-07-16 | DOI: 10.1109/TMI.2024.3424976
Lei Mou, Qifeng Yan, Jinghui Lin, Yifan Zhao, Yonghuai Liu, Shaodong Ma, Jiong Zhang, Wenhao Lv, Tao Zhou, Alejandro F Frangi, Yitian Zhao
Time-of-flight magnetic resonance angiography (TOF-MRA) is the least invasive and ionizing radiation-free approach for cerebrovascular imaging, but variations in imaging artifacts across different clinical centers and imaging vendors result in inter-site and inter-vendor heterogeneity, making its accurate and robust cerebrovascular segmentation challenging. Moreover, the limited availability and quality of annotated data pose further challenges for segmentation methods to generalize well to unseen datasets. In this paper, we construct the largest and most diverse TOF-MRA dataset (COSTA) from 8 individual imaging centers, with all the volumes manually annotated. Then we propose a novel network for cerebrovascular segmentation, namely CESAR, with the ability to tackle feature granularity and image style heterogeneity issues. Specifically, a coarse-to-fine architecture is implemented to refine cerebrovascular segmentation in an iterative manner. An automatic feature selection module is proposed to selectively fuse global long-range dependencies and local contextual information of cerebrovascular structures. A style self-consistency loss is then introduced to explicitly align diverse styles of TOF-MRA images to a standardized one. Extensive experimental results on the COSTA dataset demonstrate the effectiveness of our CESAR network against state-of-the-art methods. We have made 6 subsets of COSTA with the source code online available, in order to promote relevant research in the community.
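The paper's style representation is not specified in the abstract; a simplified, AdaIN-style reading of a "style self-consistency" penalty, treating style as per-image intensity statistics and penalizing deviation from a standardized reference, might look like this (all inputs are toy arrays):

```python
import numpy as np

def style_stats(img):
    """Summarize an image's 'style' as its intensity mean and std
    (an AdaIN-style simplification, assumed for illustration)."""
    return np.array([img.mean(), img.std()])

def style_consistency_loss(images, reference):
    """Mean squared deviation of each image's intensity statistics
    from the standardized reference style."""
    ref = style_stats(reference)
    return float(np.mean([np.sum((style_stats(im) - ref) ** 2)
                          for im in images]))

rng = np.random.default_rng(1)
reference = rng.random((16, 16))
aligned = [reference.copy(), reference.copy()]   # already standardized
shifted = [reference + 5.0]  # same structure, shifted intensity "style"
```

Images already matching the reference style incur zero penalty, while intensity-shifted ones are pushed back toward the standardized appearance.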
COSTA: A Multi-center TOF-MRA Dataset and A Style Self-Consistency Network for Cerebrovascular Segmentation
Pub Date: 2024-07-11 | DOI: 10.1109/TMI.2024.3426953
Hongqiu Wang, Guang Yang, Shichen Zhang, Jing Qin, Yike Guo, Bo Xu, Yueming Jin, Lei Zhu
Surgical instrument segmentation is fundamentally important for facilitating cognitive intelligence in robot-assisted surgery. Although existing methods have achieved accurate instrument segmentation results, they simultaneously generate segmentation masks of all instruments, which lack the capability to specify a target object and allow an interactive experience. This paper focuses on a novel and essential task in robotic surgery, i.e., Referring Surgical Video Instrument Segmentation (RSVIS), which aims to automatically identify and segment the target surgical instruments from each video frame, referred by a given language expression. This interactive feature offers enhanced user engagement and customized experiences, greatly benefiting the development of the next generation of surgical education systems. To achieve this, this paper constructs two surgery video datasets to promote the RSVIS research. Then, we devise a novel Video-Instrument Synergistic Network (VIS-Net) to learn both video-level and instrument-level knowledge to boost performance, while previous work only utilized video-level information. Meanwhile, we design a Graph-based Relation-aware Module (GRM) to model the correlation between multi-modal information (i.e., textual description and video frame) to facilitate the extraction of instrument-level information. Extensive experimental results on two RSVIS datasets exhibit that the VIS-Net can significantly outperform existing state-of-the-art referring segmentation methods. We will release our code and dataset for future research (Git).
Video-Instrument Synergistic Network for Referring Video Instrument Segmentation in Robotic Surgery
Pub Date: 2024-07-09 | DOI: 10.1109/TMI.2024.3425533
Linqin Cai, Haodu Fang, Nuoying Xu, Bo Ren
Medical Visual Question Answering (VQA-Med) is a challenging task that involves answering clinical questions related to medical images. However, most current VQA-Med methods ignore the causal correlation between specific lesion or abnormality features and answers, while also failing to provide accurate explanations for their decisions. To explore the interpretability of VQA-Med, this paper proposes a novel CCIS-MVQA model for VQA-Med based on a counterfactual causal-effect intervention strategy. This model consists of a modified ResNet for image feature extraction, a GloVe decoder for question feature extraction, a bilinear attention network for vision-language feature fusion, and an interpretability generator for producing the interpretability and prediction results. The proposed CCIS-MVQA introduces a layer-wise relevance propagation method to automatically generate counterfactual samples. Additionally, CCIS-MVQA applies counterfactual causal reasoning throughout the training phase to enhance interpretability and generalization. Extensive experiments on three benchmark datasets show that the proposed CCIS-MVQA model outperforms the state-of-the-art methods. Extensive visualization results are provided to analyze the interpretability and performance of CCIS-MVQA.
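Layer-wise relevance propagation is mentioned above without detail; its core per-layer rule can be sketched for a single linear layer. This is the generic LRP-epsilon rule, not necessarily the exact variant CCIS-MVQA uses, with toy weights:

```python
import numpy as np

def lrp_linear(x, W, relevance_out, eps=1e-9):
    """LRP-epsilon rule for a linear layer y = x @ W.

    Redistributes each output's relevance to the inputs in proportion
    to the contributions z_ij = x_i * W_ij, approximately conserving
    total relevance across the layer.
    """
    z = x[:, None] * W                      # (in, out) contribution matrix
    denom = z.sum(axis=0)
    denom = denom + eps * np.where(denom >= 0, 1.0, -1.0)  # stabilizer
    return (z * (relevance_out / denom)).sum(axis=1)

x = np.array([1.0, 2.0])
W = np.array([[1.0, 0.0],
              [1.0, 1.0]])
relevance_out = x @ W                       # seed relevance at the output
relevance_in = lrp_linear(x, W, relevance_out)
```

Once per-input relevance is known, the least relevant inputs can be masked or perturbed to construct counterfactual samples, which is the role LRP plays in the pipeline described above.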
{"title":"Counterfactual Causal-Effect Intervention for Interpretable Medical Visual Question Answering.","authors":"Linqin Cai, Haodu Fang, Nuoying Xu, Bo Ren","doi":"10.1109/TMI.2024.3425533","DOIUrl":"https://doi.org/10.1109/TMI.2024.3425533","url":null,"abstract":"<p><p>Medical Visual Question Answering (VQA-Med) is a challenging task that involves answering clinical questions related to medical images. However, most current VQA-Med methods ignore the causal correlation between specific lesion or abnormality features and answers, while also failing to provide accurate explanations for their decisions. To explore the interpretability of VQA-Med, this paper proposes a novel CCIS-MVQA model for VQA-Med based on a counterfactual causal-effect intervention strategy. This model consists of the modified ResNet for image feature extraction, a GloVe decoder for question feature extraction, a bilinear attention network for vision and language feature fusion, and an interpretability generator for producing the interpretability and prediction results. The proposed CCIS-MVQA introduces a layer-wise relevance propagation method to automatically generate counterfactual samples. Additionally, CCIS-MVQA applies counterfactual causal reasoning throughout the training phase to enhance interpretability and generalization. Extensive experiments on three benchmark datasets show that the proposed CCIS-MVQA model outperforms the state-of-the-art methods. 
Comprehensive visualization results are presented to analyze the interpretability and performance of CCIS-MVQA.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141565413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
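The intervention step described in this abstract — using layer-wise relevance propagation (LRP) to locate the most answer-relevant image regions, masking them to form a counterfactual sample, and comparing factual against counterfactual predictions — can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names, the top-decile masking threshold, and the use of a simple probability difference as the causal-effect estimate are all assumptions made here for clarity.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over answer logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def make_counterfactual(image, relevance, q=0.9):
    """Zero out the pixels that LRP marks as most relevant (top 1-q fraction),
    producing a counterfactual image with the decisive evidence removed."""
    thresh = np.quantile(relevance, q)
    cf = image.copy()
    cf[relevance >= thresh] = 0.0
    return cf

def causal_effect(factual_logits, counterfactual_logits):
    """Per-answer effect of the masked evidence: factual minus counterfactual
    answer probabilities (a simple total-effect estimate)."""
    return softmax(factual_logits) - softmax(counterfactual_logits)
```

During training, such an effect signal could penalize answers whose probability barely changes when their supporting image evidence is removed, which is the spirit of a counterfactual causal-effect intervention.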
Pub Date : 2024-07-08DOI: 10.1109/TMI.2024.3424505
Ke Zhang, Yan Yang, Jun Yu, Jianping Fan, Hanliang Jiang, Qingming Huang, Weidong Han
The potential of automated radiology report generation in alleviating the time-consuming tasks of radiologists is increasingly being recognized in medical practice. Existing report generation methods have evolved from using image-level features to the latest approach of utilizing anatomical regions, significantly enhancing interpretability. However, directly and simplistically using region features for report generation compromises the capability of relation reasoning and overlooks the common attributes potentially shared across regions. To address these limitations, we propose a novel region-based Attribute Prototype-guided Iterative Scene Graph generation framework (AP-ISG) for report generation, utilizing scene graph generation as an auxiliary task to further enhance interpretability and relational reasoning capability. The core components of AP-ISG are the Iterative Scene Graph Generation (ISGG) module and the Attribute Prototype-guided Learning (APL) module. Specifically, ISGG employs an autoregressive scheme for structural edge reasoning and a contextualization mechanism for relational reasoning. APL enhances intra-prototype matching and reduces inter-prototype semantic overlap in the visual space to fully model the potential attribute commonalities among regions. Extensive experiments on the MIMIC-CXR dataset with Chest ImaGenome annotations demonstrate the superiority of AP-ISG across multiple metrics.
{"title":"Attribute Prototype-guided Iterative Scene Graph for Explainable Radiology Report Generation.","authors":"Ke Zhang, Yan Yang, Jun Yu, Jianping Fan, Hanliang Jiang, Qingming Huang, Weidong Han","doi":"10.1109/TMI.2024.3424505","DOIUrl":"https://doi.org/10.1109/TMI.2024.3424505","url":null,"abstract":"<p><p>The potential of automated radiology report generation in alleviating the time-consuming tasks of radiologists is increasingly being recognized in medical practice. Existing report generation methods have evolved from using image-level features to the latest approach of utilizing anatomical regions, significantly enhancing interpretability. However, directly and simplistically using region features for report generation compromises the capability of relation reasoning and overlooks the common attributes potentially shared across regions. To address these limitations, we propose a novel region-based Attribute Prototype-guided Iterative Scene Graph generation framework (AP-ISG) for report generation, utilizing scene graph generation as an auxiliary task to further enhance interpretability and relational reasoning capability. The core components of AP-ISG are the Iterative Scene Graph Generation (ISGG) module and the Attribute Prototype-guided Learning (APL) module. Specifically, ISGG employs an autoregressive scheme for structural edge reasoning and a contextualization mechanism for relational reasoning. APL enhances intra-prototype matching and reduces inter-prototype semantic overlap in the visual space to fully model the potential attribute commonalities among regions. 
Extensive experiments on the MIMIC-CXR dataset with Chest ImaGenome annotations demonstrate the superiority of AP-ISG across multiple metrics.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141560572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
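The attribute-prototype idea in APL — matching each region feature to its nearest attribute prototype while keeping the prototypes themselves semantically separated — can be sketched as follows. This is a minimal NumPy sketch under assumed names: the cosine-similarity assignment and the mean off-diagonal overlap penalty are illustrative choices, not the paper's exact losses.

```python
import numpy as np

def prototype_assign(features, prototypes):
    """Assign each region feature to its most similar attribute prototype
    by cosine similarity; returns (assignments, similarity matrix)."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sim = f @ p.T
    return sim.argmax(axis=1), sim

def prototype_overlap(prototypes):
    """Mean absolute off-diagonal cosine similarity between prototypes;
    driving this down reduces inter-prototype semantic overlap."""
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    gram = p @ p.T
    k = len(prototypes)
    off = gram - np.diag(np.diag(gram))
    return np.abs(off).sum() / (k * (k - 1))
```

In a full model the assignment similarities would feed an intra-prototype matching loss, while the overlap score would act as a regularizer pushing prototypes toward distinct attribute semantics.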
Medical image analysis poses significant challenges due to the limited availability of clinical data, which is crucial for training accurate models. This limitation is further compounded by the specialized and labor-intensive nature of the data annotation process. For example, despite the popularity of computed tomography angiography (CTA) in diagnosing atherosclerosis with an abundance of annotated datasets, magnetic resonance (MR) images stand out with better visualization for soft plaque and vessel wall characterization. However, the higher cost and limited accessibility of MR, as well as the time-consuming nature of manual labeling, contribute to fewer annotated datasets. To address these issues, we formulate a multi-modal transfer learning network, named MT-Net, designed to learn from unpaired CTA and sparsely-annotated MR data. Additionally, we harness the Segment Anything Model (SAM) to synthesize additional MR annotations, enriching the training process. Specifically, our method first segments vessel lumen regions, followed by precise characterization of the carotid artery vessel walls, thereby ensuring both segmentation accuracy and clinical relevance. Validation of our method involved rigorous experimentation on publicly available datasets from the COSMOS and CARE-II challenges, demonstrating its superior performance compared to existing state-of-the-art techniques.
{"title":"Carotid Vessel Wall Segmentation Through Domain Aligner, Topological Learning, and Segment Anything Model for Sparse Annotation in MR Images.","authors":"Xibao Li, Xi Ouyang, Jiadong Zhang, Zhongxiang Ding, Yuyao Zhang, Zhong Xue, Feng Shi, Dinggang Shen","doi":"10.1109/TMI.2024.3424884","DOIUrl":"https://doi.org/10.1109/TMI.2024.3424884","url":null,"abstract":"<p><p>Medical image analysis poses significant challenges due to the limited availability of clinical data, which is crucial for training accurate models. This limitation is further compounded by the specialized and labor-intensive nature of the data annotation process. For example, despite the popularity of computed tomography angiography (CTA) in diagnosing atherosclerosis with an abundance of annotated datasets, magnetic resonance (MR) images stand out with better visualization for soft plaque and vessel wall characterization. However, the higher cost and limited accessibility of MR, as well as the time-consuming nature of manual labeling, contribute to fewer annotated datasets. To address these issues, we formulate a multi-modal transfer learning network, named MT-Net, designed to learn from unpaired CTA and sparsely-annotated MR data. Additionally, we harness the Segment Anything Model (SAM) to synthesize additional MR annotations, enriching the training process. Specifically, our method first segments vessel lumen regions, followed by precise characterization of the carotid artery vessel walls, thereby ensuring both segmentation accuracy and clinical relevance. 
Validation of our method involved rigorous experimentation on publicly available datasets from the COSMOS and CARE-II challenges, demonstrating its superior performance compared to existing state-of-the-art techniques.</p>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141560573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
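The annotation-enrichment step this abstract describes — keeping the sparse manual labels wherever they exist and filling the remaining voxels with SAM-derived pseudo-labels — can be sketched as follows. A minimal sketch, assuming labels are stored as integer arrays with -1 marking unlabeled voxels; the function name and sentinel value are illustrative, not taken from the MT-Net implementation.

```python
import numpy as np

def merge_annotations(sparse_label, sam_mask, unlabeled=-1):
    """Combine a sparsely annotated label map with SAM pseudo-labels:
    manual labels are kept wherever present; voxels still marked as
    unlabeled fall back to the SAM-derived mask."""
    merged = sparse_label.copy()
    missing = sparse_label == unlabeled
    merged[missing] = sam_mask[missing]
    return merged
```

In training, such merged maps would serve as dense supervision, optionally down-weighting the pseudo-labeled voxels in the loss to reflect their lower reliability relative to expert annotations.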