Bridging multi-level gaps: Bidirectional reciprocal cycle framework for text-guided label-efficient segmentation in echocardiography

Journal: Medical Image Analysis (IF 10.7; Zone 1, Medicine; JCR Q1, Computer Science, Artificial Intelligence) Pub Date: 2025-03-07 DOI: 10.1016/j.media.2025.103536
Zhenxuan Zhang, Heye Zhang, Tieyong Zeng, Guang Yang, Zhenquan Shi, Zhifan Gao
Medical Image Analysis, volume 102, Article 103536 (journal article). https://www.sciencedirect.com/science/article/pii/S1361841525000830

Abstract

Text-guided visual understanding is a promising approach to downstream task learning in echocardiography. It can reduce reliance on large labeled datasets and ease the learning of clinical tasks, because text can embed highly condensed clinical information into predictions for visual tasks. Methods based on contrastive language-image pretraining (CLIP) extract image-text features through a contrastive pretraining process over a sequence of matched texts and images, then adapt the pretrained network parameters to improve downstream task performance with text guidance. However, these methods still face a multi-level gap between image and text, which stems mainly from spatial-level, contextual-level, and domain-level gaps. This gap makes it difficult to handle medical image–text pairs and dense prediction tasks. Therefore, we propose a bidirectional reciprocal cycle (BRC) framework to bridge the multi-level gaps. First, BRC constructs pyramid reciprocal alignments of embedded global and local image–text feature representations, which matches complex medical expertise with the corresponding phenomena. Second, BRC enforces consistency between the forward inference and the reverse mapping (i.e., the text → feature mapping is consistent with the feature → text or feature → image mapping), which enforces perception of the contextual relationship between input data and features. Third, BRC adapts to the specific downstream segmentation task by embedding complex text information to directly guide the task through a cross-modal attention mechanism. Compared with 22 existing methods, BRC achieves state-of-the-art performance on segmentation tasks (DSC = 95.2%). Extensive experiments on 11,048 patients show that our method significantly improves accuracy and reduces reliance on labeled data (DSC increased from 81.5% to 86.6% with text assistance at a 1% labeled proportion).
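The results above are reported as DSC, which we read here as the Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), for a predicted mask A and a reference mask B. The sketch below is a minimal illustration of that standard metric in plain Python and is not taken from the paper. The function name `dice_coefficient`, the nested-list mask format, and the convention of returning 1.0 when both masks are empty are assumptions made for the example.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[Sequence[int]],
                     target: Sequence[Sequence[int]]) -> float:
    """Dice similarity coefficient between two binary masks of equal shape.

    Both masks are nested sequences of 0/1 values (hypothetical format
    chosen for this illustration).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of rows")

    intersection = 0  # pixels that are foreground in both masks
    pred_sum = 0      # foreground pixels in the prediction
    target_sum = 0    # foreground pixels in the reference

    for pred_row, target_row in zip(pred, target):
        if len(pred_row) != len(target_row):
            raise ValueError("mask rows must have the same length")
        for p, t in zip(pred_row, target_row):
            p_fg = 1 if p else 0
            t_fg = 1 if t else 0
            intersection += p_fg * t_fg
            pred_sum += p_fg
            target_sum += t_fg

    total = pred_sum + target_sum
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


# Toy 2x3 masks: 2 overlapping foreground pixels, 3 foreground pixels each.
pred_mask = [[0, 1, 1],
             [0, 1, 0]]
true_mask = [[0, 1, 0],
             [0, 1, 1]]
score = dice_coefficient(pred_mask, true_mask)
print(f"DSC = {score:.3f}")  # 2 * 2 / (3 + 3)
```

Under this definition, a reported DSC of 95.2% would correspond to a return value of 0.952 averaged over the evaluated cases; how the paper aggregates per-case scores is not stated in the abstract.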
Source journal
Medical Image Analysis (Engineering & Technology: Engineering, Biomedical)
CiteScore: 22.10
Self-citation rate: 6.40%
Articles published: 309
Review time: 6.6 months
About the journal: Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.