SAMCT: Segment Any CT Allowing Labor-Free Task-Indicator Prompts

IEEE Transactions on Medical Imaging, vol. 44, no. 3, pp. 1386-1399. Pub Date: 2024-11-07. DOI: 10.1109/TMI.2024.3493456

Xian Lin, Yangyang Xiang, Zhehao Wang, Kwang-Ting Cheng, Zengqiang Yan, Li Yu


Citations: 0

Abstract

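Segment anything model (SAM), a foundation model with superior versatility and generalization across diverse segmentation tasks, has attracted widespread attention in medical imaging. However, SAM has been shown to suffer severe performance degradation due to a lack of medical knowledge in training and insufficient local feature encoding. Although several SAM-based models have been proposed to adapt SAM to medical imaging, they still suffer from insufficient feature extraction and rely heavily on high-quality prompts. In this paper, we propose SAMCT, a powerful foundation model that allows labor-free prompts, and train it on a large CT dataset collected from public datasets, comprising 1.1M CT images and 5M masks. Specifically, SAMCT builds on SAM and adds a U-shaped CNN image encoder, a cross-branch interaction module, and a task-indicator prompt encoder. The U-shaped CNN image encoder works in parallel with the ViT image encoder in SAM to supplement local features. Cross-branch interaction enhances the feature representation of both the CNN and ViT image encoders by exchanging global perception and local features between them. The task-indicator prompt encoder is a plug-and-play component that encodes task-related indicators into prompt embeddings without manual effort. As a result, SAMCT can work fully automatically in addition to supporting SAM's semi-automatic interactive mode. Extensive experiments demonstrate that SAMCT outperforms state-of-the-art task-specific and SAM-based medical foundation models on various tasks. The code, data, and model checkpoints are available at https://github.com/xianlin7/SAMCT.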
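The abstract describes the task-indicator prompt encoder only at a high level. As a rough illustration of how a task indicator can stand in for manual prompts, here is a minimal pure-Python sketch, assuming the indicator is an integer task ID looked up in a table of learnable embeddings. The class `TaskIndicatorPromptEncoder`, the `tokens_per_task` parameter, and the choice to append manually supplied prompt embeddings after the indicator tokens are hypothetical; this is not the authors' implementation, which is available in the repository linked above.

```python
import random
from typing import Dict, List, Optional

Vector = List[float]


class TaskIndicatorPromptEncoder:
    """Hypothetical sketch of a task-indicator prompt encoder.

    Each task indicator (assumed here to be an integer ID, e.g. one per
    target organ or lesion type) owns a few embedding vectors that play the
    role that point or box embeddings play in SAM's prompt encoder.
    """

    def __init__(self, num_tasks: int, tokens_per_task: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.dim = dim
        # In a trained model these would be learnable parameters; here they
        # are small random placeholders drawn from N(0, 0.02^2).
        self.table: Dict[int, List[Vector]] = {
            task: [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(tokens_per_task)]
            for task in range(num_tasks)
        }

    def encode(self, task_id: int, manual_prompts: Optional[List[Vector]] = None) -> List[Vector]:
        """Return prompt tokens for one task, optionally followed by
        already-embedded manual prompts (e.g. clicks)."""
        if task_id not in self.table:
            raise KeyError(f"unknown task indicator: {task_id}")
        # Copy so callers cannot mutate the stored embeddings.
        tokens = [list(vec) for vec in self.table[task_id]]
        for prompt in manual_prompts or []:
            if len(prompt) != self.dim:
                raise ValueError(f"expected a {self.dim}-dim prompt, got {len(prompt)}")
            tokens.append(list(prompt))
        return tokens


# Usage: automatic mode (indicator only) vs. semi-automatic mode (indicator + click).
encoder = TaskIndicatorPromptEncoder(num_tasks=3, tokens_per_task=2, dim=4)
automatic = encoder.encode(task_id=1)
semi_automatic = encoder.encode(task_id=1, manual_prompts=[[0.1, 0.2, 0.3, 0.4]])
print(len(automatic), len(semi_automatic))  # 2 3
```

Keeping indicator tokens and manual prompt embeddings in one token list is one simple way to let a single mask decoder serve both the automatic and semi-automatic modes the abstract mentions; the actual SAMCT design may combine them differently.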


