SAMCT: Segment Any CT Allowing Labor-Free Task-Indicator Prompts

Xian Lin, Yangyang Xiang, Zhehao Wang, Kwang-Ting Cheng, Zengqiang Yan, Li Yu
IEEE Transactions on Medical Imaging, vol. 44, no. 3, pp. 1386-1399. Publication date: 2024-11-07. DOI: 10.1109/TMI.2024.3493456. IEEE Xplore: https://ieeexplore.ieee.org/document/10746534/
Citations: 0

Abstract

Segment anything model (SAM), a foundation model with superior versatility and generalization across diverse segmentation tasks, has attracted widespread attention in medical imaging. However, SAM has been shown to suffer severe performance degradation on medical images, owing to the lack of medical knowledge in its training data and its limited local feature encoding. Although several SAM-based models have been proposed to adapt SAM to medical imaging, they still suffer from insufficient feature extraction and rely heavily on high-quality prompts. In this paper, we propose SAMCT, a powerful foundation model that allows labor-free prompts, and train it on a large CT dataset collected from public datasets, consisting of 1.1M CT images and 5M masks. Specifically, building on SAM, SAMCT adds a U-shaped CNN image encoder, a cross-branch interaction module, and a task-indicator prompt encoder. The U-shaped CNN image encoder runs in parallel with SAM's ViT image encoder to supplement local features. The cross-branch interaction module strengthens the feature representations of both the CNN and ViT image encoders by exchanging global context and local features between the two branches. The task-indicator prompt encoder is a plug-and-play component that encodes task-related indicators into prompt embeddings without manual effort. As a result, SAMCT can operate fully automatically in addition to supporting SAM's semi-automatic interactive mode. Extensive experiments show that SAMCT outperforms state-of-the-art task-specific models and SAM-based medical foundation models on various tasks. The code, data, and model checkpoints are available at https://github.com/xianlin7/SAMCT.
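The abstract states only that the task-indicator prompt encoder "encodes task-related indicators into prompt embeddings". One common way to build such a component is a learnable embedding table indexed by a task ID, and the NumPy sketch below illustrates that idea under stated assumptions. It is not taken from the SAMCT code release: the class and function names (`TaskIndicatorPromptEncoder`, `combine_prompts`), the number of tokens per task, and the concatenation with manual prompts are hypothetical, and the 256-dimensional width is chosen only to match the prompt-embedding size of the original SAM. The table is randomly initialised rather than trained.

```python
from typing import Optional

import numpy as np


class TaskIndicatorPromptEncoder:
    """Hypothetical sketch of a task-indicator prompt encoder (not SAMCT's code).

    Each task indicator (e.g. one organ or lesion type) owns `tokens_per_task`
    embedding vectors of width `embed_dim`. The output follows the
    (batch, n_tokens, embed_dim) layout of SAM's sparse prompt embeddings.
    """

    def __init__(self, num_tasks: int, tokens_per_task: int = 2,
                 embed_dim: int = 256, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Stand-in for learned parameters: small random values, never trained here.
        self.table = rng.normal(
            0.0, 0.02, size=(num_tasks, tokens_per_task, embed_dim)
        ).astype(np.float32)
        self.num_tasks = num_tasks

    def __call__(self, task_ids) -> np.ndarray:
        ids = np.asarray(task_ids, dtype=np.int64)
        if ids.ndim != 1 or ids.size == 0:
            raise ValueError("expected a non-empty 1-D sequence of task ids")
        if ids.min() < 0 or ids.max() >= self.num_tasks:
            raise ValueError("task id out of range")
        # Integer-array indexing gathers one (tokens_per_task, embed_dim) block per sample.
        return self.table[ids]


def combine_prompts(task_tokens: np.ndarray,
                    manual_tokens: Optional[np.ndarray] = None) -> np.ndarray:
    """Append optional manual (point/box) prompt tokens after the task tokens."""
    if manual_tokens is None:
        return task_tokens  # automatic mode: no human-provided prompts
    return np.concatenate([task_tokens, manual_tokens], axis=1)


if __name__ == "__main__":
    encoder = TaskIndicatorPromptEncoder(num_tasks=10)
    auto = encoder([3, 7])                            # two slices, two task indicators
    clicks = np.zeros((2, 1, 256), dtype=np.float32)  # placeholder point embeddings
    mixed = combine_prompts(auto, clicks)
    print(auto.shape, mixed.shape)  # (2, 2, 256) (2, 3, 256)
```

Returning the task tokens alone mirrors the fully automatic mode described in the abstract, while concatenating them with point or box tokens mirrors SAM's interactive mode; whether SAMCT actually mixes the two prompt types this way is an assumption of this sketch.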