
Medical image analysis: Latest Publications

Enhancing chest X-ray datasets with privacy-preserving large language models and multi-type annotations: A data-driven approach for improved classification
IF 10.7 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-10 | DOI: 10.1016/j.media.2024.103383
Ricardo Bigolin Lanfredi, Pritam Mukherjee, Ronald M. Summers
In chest X-ray (CXR) image analysis, rule-based systems are usually employed to extract labels from reports for dataset releases. However, there is still room for improvement in label quality. These labelers typically output only presence labels, sometimes with binary uncertainty indicators, which limits their usefulness. Supervised deep learning models have also been developed for report labeling but lack adaptability, similar to rule-based systems. In this work, we present MAPLEZ (Medical report Annotations with Privacy-preserving Large language model using Expeditious Zero shot answers), a novel approach leveraging a locally executable Large Language Model (LLM) to extract and enhance findings labels on CXR reports. MAPLEZ extracts not only binary labels indicating the presence or absence of a finding but also the location, severity, and radiologists’ uncertainty about the finding. Over eight abnormalities from five test sets, we show that our method can extract these annotations with an increase of 3.6 percentage points (pp) in macro F1 score for categorical presence annotations and more than 20 pp increase in F1 score for the location annotations over competing labelers. Additionally, using the combination of improved annotations and multi-type annotations in classification supervision in a dataset of limited-resolution CXRs, we demonstrate substantial advancements in proof-of-concept classification quality, with an increase of 1.1 pp in AUROC over models trained with annotations from the best alternative approach. We share code and annotations.
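The core mechanism here is zero-shot prompting of a locally hosted LLM, which avoids sending protected reports to an external service. The sketch below illustrates that idea only under assumptions: the model name, prompt wording, and answer schema are placeholders, not the prompts released with MAPLEZ.

    # Illustrative zero-shot extraction of multi-type labels from a report with a local LLM.
    # The model name and prompt format are placeholders, not the MAPLEZ prompts.
    import json
    from transformers import pipeline

    MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder: any locally hosted instruction-tuned LLM
    generator = pipeline("text-generation", model=MODEL_NAME, device_map="auto")

    def extract_annotations(report: str, finding: str) -> dict:
        prompt = (
            f"Report:\n{report}\n\n"
            f"Is '{finding}' mentioned as present, absent, or uncertain? "
            f"Reply with JSON of the form "
            f'{{"presence": "...", "location": "...", "severity": "..."}}.\nAnswer: '
        )
        out = generator(prompt, max_new_tokens=64, do_sample=False)[0]["generated_text"]
        answer = out[len(prompt):].strip()
        try:
            return json.loads(answer)  # structured answer when the model complies
        except json.JSONDecodeError:
            # crude fallback when the model answers in free text
            return {"presence": answer.split()[0].lower() if answer else "uncertain"}

    labels = extract_annotations("Mild bibasilar atelectasis. No pneumothorax.", "atelectasis")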
Citations: 0
IGUANe: A 3D generalizable CycleGAN for multicenter harmonization of brain MR images
IF 10.7 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-09 | DOI: 10.1016/j.media.2024.103388
Vincent Roca, Grégory Kuchcinski, Jean-Pierre Pruvo, Dorian Manouvriez, Renaud Lopes, the Australian Imaging Biomarkers and Lifestyle flagship study of ageing, the Alzheimer’s Disease Neuroimage Initiative
In MRI studies, the aggregation of imaging data from multiple acquisition sites enhances sample size but may introduce site-related variabilities that hinder consistency in subsequent analyses. Deep learning methods for image translation have emerged as a solution for harmonizing MR images across sites. In this study, we introduce IGUANe (Image Generation with Unified Adversarial Networks), an original 3D model that leverages the strengths of domain translation and straightforward application of style transfer methods for multicenter brain MR image harmonization. IGUANe extends CycleGAN by integrating an arbitrary number of domains for training through a many-to-one architecture. The framework based on domain pairs enables the implementation of sampling strategies that prevent confusion between site-related and biological variabilities. During inference, the model can be applied to any image, even from an unknown acquisition site, making it a universal generator for harmonization. Trained on a dataset comprising T1-weighted images from 11 different scanners, IGUANe was evaluated on data from unseen sites. The assessments included the transformation of MR images with traveling subjects, the preservation of pairwise distances between MR images within domains, the evolution of volumetric patterns related to age and Alzheimer’s disease (AD), and the performance in age regression and patient classification tasks. Comparisons with other harmonization and normalization methods suggest that IGUANe better preserves individual information in MR images and is more suitable for maintaining and reinforcing variabilities related to age and AD. Future studies may further assess IGUANe in other multicenter contexts, either using the same model or retraining it for applications to different image modalities. Codes and the trained IGUANe model are available at https://github.com/RocaVincent/iguane_harmonization.git.
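The many-to-one design is the key departure from a standard CycleGAN: a single universal generator maps an image from any acquisition site to a reference domain, while site-specific generators map back so that a cycle-consistency loss can still be formed. The sketch below is an assumption-based illustration of that wiring; the toy generator, absence of discriminators, and loss form are simplifications, not the released IGUANe code.

    # Minimal sketch of a many-to-one cycle-consistency setup (illustrative only).
    import torch
    import torch.nn as nn

    class ConvGenerator(nn.Module):
        """Placeholder 3D generator; the real model uses a full 3D CycleGAN generator."""
        def __init__(self, ch=8):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv3d(1, ch, 3, padding=1), nn.ReLU(),
                nn.Conv3d(ch, 1, 3, padding=1),
            )
        def forward(self, x):
            return self.net(x)

    n_sites = 11
    G = ConvGenerator()  # universal generator: any site -> reference domain
    F = nn.ModuleList([ConvGenerator() for _ in range(n_sites)])  # reference -> each site

    def cycle_loss(x_site: torch.Tensor, site_idx: int) -> torch.Tensor:
        """Reconstruction term of the many-to-one cycle: site -> reference -> same site."""
        x_ref = G(x_site)
        x_back = F[site_idx](x_ref)
        return torch.mean(torch.abs(x_back - x_site))  # L1 cycle-consistency

    loss = cycle_loss(torch.randn(1, 1, 32, 32, 32), site_idx=3)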
Citations: 0
Large-scale multi-center CT and MRI segmentation of pancreas with deep learning
IF 10.7 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-08 | DOI: 10.1016/j.media.2024.103382
Zheyuan Zhang, Elif Keles, Gorkem Durak, Yavuz Taktak, Onkar Susladkar, Vandan Gorade, Debesh Jha, Asli C. Ormeci, Alpay Medetalibeyoglu, Lanhong Yao, Bin Wang, Ilkin Sevgi Isler, Linkai Peng, Hongyi Pan, Camila Lopes Vendrami, Amir Bourhani, Yury Velichko, Boqing Gong, Concetto Spampinato, Ayis Pyrros, Ulas Bagci
Automated volumetric segmentation of the pancreas on cross-sectional imaging is needed for diagnosis and follow-up of pancreatic diseases. While CT-based pancreatic segmentation is more established, MRI-based segmentation methods are understudied, largely due to a lack of publicly available datasets, benchmarking research efforts, and domain-specific deep learning methods. In this retrospective study, we collected a large dataset (767 scans from 499 participants) of T1-weighted (T1W) and T2-weighted (T2W) abdominal MRI series from five centers between March 2004 and November 2022. We also collected CT scans of 1,350 patients from publicly available sources for benchmarking purposes. We introduced a new pancreas segmentation method, called PanSegNet, combining the strengths of nnUNet and a Transformer network with a new linear attention module enabling volumetric computation. We tested PanSegNet’s accuracy in cross-modality (a total of 2,117 scans) and cross-center settings with Dice and Hausdorff distance (HD95) evaluation metrics. We used Cohen’s kappa statistics for intra- and inter-rater agreement evaluation and paired t-tests for volume and Dice comparisons, respectively. For segmentation accuracy, we achieved Dice coefficients of 88.3% (±7.2%, at case level) with CT, 85.0% (±7.9%) with T1W MRI, and 86.3% (±6.4%) with T2W MRI. There was a high correlation for pancreas volume prediction, with R² of 0.91, 0.84, and 0.85 for CT, T1W, and T2W, respectively. We found moderate inter-observer (0.624 and 0.638 for T1W and T2W MRI, respectively) and high intra-observer agreement scores. All MRI data are made available at https://osf.io/kysnj/. Our source code is available at https://github.com/NUBagciLab/PaNSegNet.
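The linear attention mentioned above is what keeps a Transformer branch affordable on 3D volumes: instead of the quadratic softmax(QKᵀ)V, attention is factored as φ(Q)(φ(K)ᵀV) so the cost grows linearly with the number of voxels. The block below is a generic PyTorch sketch of that technique; layer sizes and the choice of softmax feature maps are assumptions, not the actual PanSegNet module.

    # Illustrative linear-attention block for volumetric feature maps (sketch only).
    import torch
    import torch.nn as nn

    class LinearAttention3D(nn.Module):
        def __init__(self, dim: int, heads: int = 4):
            super().__init__()
            self.heads = heads
            self.to_qkv = nn.Conv3d(dim, dim * 3, kernel_size=1, bias=False)
            self.proj = nn.Conv3d(dim, dim, kernel_size=1)

        def forward(self, x):  # x: (B, C, D, H, W)
            b, c, d, h, w = x.shape
            q, k, v = self.to_qkv(x).chunk(3, dim=1)
            # flatten voxels and split heads: (B, heads, head_dim, N)
            def split(t):
                return t.reshape(b, self.heads, c // self.heads, d * h * w)
            q, k, v = map(split, (q, k, v))
            q, k = q.softmax(dim=-2), k.softmax(dim=-1)        # feature-map "kernels"
            context = torch.einsum("bhcn,bhen->bhce", k, v)    # (head_dim, head_dim) per head
            out = torch.einsum("bhce,bhcn->bhen", context, q)  # cost linear in N voxels
            out = out.reshape(b, c, d, h, w)
            return self.proj(out)

    block = LinearAttention3D(dim=32)
    y = block(torch.randn(1, 32, 16, 32, 32))  # output has the same shape as the input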
Citations: 0
Multi-task learning with cross-task consistency for improved depth estimation in colonoscopy
IF 10.7 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-04 | DOI: 10.1016/j.media.2024.103379
Pedro Esteban Chavarrias Solano, Andrew Bulpitt, Venkataraman Subramanian, Sharib Ali
Colonoscopy screening is the gold standard procedure for assessing abnormalities in the colon and rectum, such as ulcers and cancerous polyps. Measuring the abnormal mucosal area and its 3D reconstruction can help quantify the surveyed area and objectively evaluate disease burden. However, due to the complex topology of these organs and variable physical conditions (for example, lighting, large homogeneous texture, and image modality), estimating distance from the camera (aka depth) is highly challenging. Moreover, most colonoscopic video acquisition is monocular, making depth estimation a non-trivial problem. While methods in computer vision for depth estimation have been proposed and advanced on natural scene datasets, the efficacy of these techniques has not been widely quantified on colonoscopy datasets. As the colonic mucosa has several low-texture regions that are not well pronounced, learning representations from an auxiliary task can improve salient feature extraction, allowing estimation of accurate camera depths. In this work, we propose a novel multi-task learning (MTL) approach with a shared encoder and two decoders, namely a surface normal decoder and a depth estimator decoder. Our depth estimator incorporates attention mechanisms to enhance global context awareness. We leverage the surface normal prediction to improve geometric feature extraction. Also, we apply a cross-task consistency loss between the two geometrically related tasks, surface normal and camera depth. We demonstrate an improvement of 15.75% on relative error and 10.7% improvement on δ1.25 accuracy over the most accurate baseline state-of-the-art Big-to-Small (BTS) approach. All experiments are conducted on the recently released C3VD dataset, and thus, we provide a first benchmark of state-of-the-art methods on this dataset.
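A concrete way to read the cross-task consistency loss is: surface normals can be derived from the spatial gradients of the predicted depth map, and any disagreement with the normals predicted by the dedicated decoder is penalized. The sketch below illustrates one such formulation under simplifying assumptions (unit camera intrinsics, finite-difference normals); it is not the paper's exact loss.

    # Sketch of a depth/normal cross-task consistency term (illustrative only).
    import torch
    import torch.nn.functional as F

    def normals_from_depth(depth: torch.Tensor) -> torch.Tensor:
        """depth: (B, 1, H, W) -> unit normals (B, 3, H, W) from finite differences."""
        dzdx = depth[:, :, :, 1:] - depth[:, :, :, :-1]
        dzdy = depth[:, :, 1:, :] - depth[:, :, :-1, :]
        dzdx = F.pad(dzdx, (0, 1, 0, 0))  # pad width so the spatial size is kept
        dzdy = F.pad(dzdy, (0, 0, 0, 1))  # pad height
        ones = torch.ones_like(depth)
        n = torch.cat([-dzdx, -dzdy, ones], dim=1)
        return F.normalize(n, dim=1)

    def cross_task_consistency(pred_depth, pred_normals):
        """1 - cosine similarity between normals implied by depth and predicted normals."""
        implied = normals_from_depth(pred_depth)
        cos = (implied * F.normalize(pred_normals, dim=1)).sum(dim=1)
        return (1.0 - cos).mean()

    loss = cross_task_consistency(torch.rand(2, 1, 64, 64), torch.randn(2, 3, 64, 64))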
Citations: 0
Semantics and instance interactive learning for labeling and segmentation of vertebrae in CT images
IF 10.7 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-01 | DOI: 10.1016/j.media.2024.103380
Yixiao Mao, Qianjin Feng, Yu Zhang, Zhenyuan Ning
Automatically labeling and segmenting vertebrae in 3D CT images is a complex multi-task problem. Current methods progressively conduct vertebra labeling and semantic segmentation, which typically include two separate models and may ignore feature interaction among different tasks. Although instance segmentation approaches with multi-channel prediction have been proposed to alleviate such issues, their utilization of semantic information remains insufficient. Additionally, another challenge for an accurate model is how to effectively distinguish similar adjacent vertebrae and model their sequential attribute. In this paper, we propose a Semantics and Instance Interactive Learning (SIIL) paradigm for synchronous labeling and segmentation of vertebrae in CT images. SIIL models semantic feature learning and instance feature learning, in which the former extracts spinal semantics and the latter distinguishes vertebral instances. Interactive learning involves semantic features to improve the separability of vertebral instances and instance features to help learn position and contour information, during which a Morphological Instance Localization Learning (MILL) module is introduced to align semantic and instance features and facilitate their interaction. Furthermore, an Ordinal Contrastive Prototype Learning (OCPL) module is devised to differentiate adjacent vertebrae with high similarity (via cross-image contrastive learning) and simultaneously model their sequential attribute (via a temporal unit). Extensive experiments on several datasets demonstrate that our method significantly outperforms other approaches in labeling and segmenting vertebrae. Our code is available at https://github.com/YuZhang-SMU/Vertebrae-Labeling-Segmentation.
Citations: 0
Beyond strong labels: Weakly-supervised learning based on Gaussian pseudo labels for the segmentation of ellipse-like vascular structures in non-contrast CTs
IF 10.7 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-30 | DOI: 10.1016/j.media.2024.103378
Qixiang Ma, Adrien Kaladji, Huazhong Shu, Guanyu Yang, Antoine Lucas, Pascal Haigron
Deep learning-based automated segmentation of vascular structures in preoperative CT angiography (CTA) images contributes to computer-assisted diagnosis and interventions. While CTA is the common standard, non-contrast CT imaging has the advantage of avoiding complications associated with contrast agents. However, the challenges of labor-intensive labeling and high labeling variability due to the ambiguity of vascular boundaries hinder conventional strong-label-based, fully-supervised learning in non-contrast CTs. This paper introduces a novel weakly-supervised framework using the elliptical topology nature of vascular structures in CT slices. It includes an efficient annotation process based on our proposed standards, an approach of generating 2D Gaussian heatmaps serving as pseudo labels, and a training process through a combination of voxel reconstruction loss and distribution loss with the pseudo labels. We assess the effectiveness of the proposed method on one local and two public datasets comprising non-contrast CT scans, particularly focusing on the abdominal aorta. On the local dataset, our weakly-supervised learning approach based on pseudo labels outperforms strong-label-based fully-supervised learning (by 1.54% in Dice score on average), reducing labeling time by around 82.0%. The efficiency in generating pseudo labels allows the inclusion of label-agnostic external data in the training set, leading to an additional improvement in performance (2.74% in Dice score on average) with a reduction of 66.3% in labeling time, where the labeling time remains considerably less than that of strong labels. On the public dataset, the pseudo labels achieve an overall improvement of 1.95% in Dice score for 2D models, with a reduction of 68% in the Hausdorff distance for the 3D model.
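The pseudo-label generation step can be pictured as follows: each vessel annotation on a slice is an ellipse (center, semi-axes, orientation), and it is converted into a 2D Gaussian heatmap whose covariance follows the ellipse. The sketch below illustrates that conversion; the annotation format and the covariance scaling are assumptions rather than the paper's exact recipe.

    # Sketch: turn an elliptical vessel annotation into a 2D Gaussian heatmap pseudo label.
    import numpy as np

    def gaussian_pseudo_label(shape, center, axes, angle_deg):
        """shape: (H, W); center: (cx, cy); axes: (a, b) semi-axes in pixels; angle in degrees."""
        h, w = shape
        ys, xs = np.mgrid[0:h, 0:w]
        pts = np.stack([xs - center[0], ys - center[1]], axis=-1).astype(float)
        theta = np.deg2rad(angle_deg)
        rot = np.array([[np.cos(theta), -np.sin(theta)],
                        [np.sin(theta),  np.cos(theta)]])
        # covariance aligned with the ellipse: semi-axes act as standard deviations
        cov = rot @ np.diag([axes[0] ** 2, axes[1] ** 2]) @ rot.T
        inv = np.linalg.inv(cov)
        m = np.einsum("hwi,ij,hwj->hw", pts, inv, pts)  # squared Mahalanobis distance
        return np.exp(-0.5 * m)                         # peak value 1 at the ellipse center

    heatmap = gaussian_pseudo_label((128, 128), center=(64, 70), axes=(10, 6), angle_deg=30)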
Citations: 0
A cross-attention-based deep learning approach for predicting functional stroke outcomes using 4D CTP imaging and clinical metadata
IF 10.7 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-30 | DOI: 10.1016/j.media.2024.103381
Kimberly Amador, Noah Pinel, Anthony J. Winder, Jens Fiehler, Matthias Wilms, Nils D. Forkert
Acute ischemic stroke (AIS) remains a global health challenge, leading to long-term functional disabilities without timely intervention. Spatio-temporal (4D) Computed Tomography Perfusion (CTP) imaging is crucial for diagnosing and treating AIS due to its ability to rapidly assess the ischemic core and penumbra. Although traditionally used to assess acute tissue status in clinical settings, 4D CTP has also been explored in research for predicting stroke tissue outcomes. However, its potential for predicting functional outcomes, especially in combination with clinical metadata, remains unexplored. Thus, this work aims to develop and evaluate a novel multimodal deep learning model for predicting functional outcomes (specifically, 90-day modified Rankin Scale) in AIS patients by combining 4D CTP and clinical metadata. To achieve this, an intermediate fusion strategy with a cross-attention mechanism is introduced to enable a selective focus on the most relevant features and patterns from both modalities. Evaluated on a dataset comprising 70 AIS patients who underwent endovascular mechanical thrombectomy, the proposed model achieves an accuracy (ACC) of 0.77, outperforming conventional late fusion strategies (ACC = 0.73) and unimodal models based on either 4D CTP (ACC = 0.61) or clinical metadata (ACC = 0.71). The results demonstrate the superior capability of the proposed model to leverage complex inter-modal relationships, emphasizing the value of advanced multimodal fusion techniques for predicting functional stroke outcomes.
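The intermediate fusion works by letting one modality attend to the other before a prediction head is applied. The sketch below shows a minimal cross-attention fusion layer in which image-derived tokens query clinical-metadata tokens; the token counts, embedding sizes, and binary output head are assumptions for illustration, not the paper's architecture.

    # Minimal sketch of cross-attention fusion between imaging tokens and clinical metadata.
    import torch
    import torch.nn as nn

    class CrossAttentionFusion(nn.Module):
        def __init__(self, dim: int = 128, heads: int = 4, n_meta: int = 8):
            super().__init__()
            self.meta_embed = nn.Linear(1, dim)   # one token per clinical variable
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.head = nn.Linear(dim, 1)         # e.g. favorable vs. unfavorable outcome

        def forward(self, img_tokens, metadata):
            # img_tokens: (B, N, dim) from the imaging encoder; metadata: (B, n_meta)
            meta_tokens = self.meta_embed(metadata.unsqueeze(-1))  # (B, n_meta, dim)
            fused, _ = self.attn(query=img_tokens, key=meta_tokens, value=meta_tokens)
            pooled = fused.mean(dim=1)                             # average over image tokens
            return torch.sigmoid(self.head(pooled))

    model = CrossAttentionFusion()
    prob = model(torch.randn(2, 16, 128), torch.randn(2, 8))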
Citations: 0
Clinical knowledge-guided hybrid classification network for automatic periodontal disease diagnosis in X-ray image
IF 10.7 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-24 | DOI: 10.1016/j.media.2024.103376
Lanzhuju Mei, Ke Deng, Zhiming Cui, Yu Fang, Yuan Li, Hongchang Lai, Maurizio S. Tonetti, Dinggang Shen
Accurate classification of periodontal disease through panoramic X-ray images carries immense clinical importance for effective diagnosis and treatment. Recent methodologies attempt to classify periodontal diseases from X-ray images by estimating bone loss within these images, supervised by manual radiographic annotations for segmentation or keypoint detection. However, these annotations often lack consistency with the clinical gold standard of probing measurements, potentially causing measurement inaccuracy and leading to unstable classifications. Additionally, the diagnosis of periodontal disease necessitates exceptional sensitivity. To address these challenges, we introduce HC-Net, an innovative hybrid classification framework devised for accurately classifying periodontal disease from X-ray images. This framework comprises three main components: tooth-level classification, patient-level classification, and a learnable adaptive noisy-OR gate. In the tooth-level classification, we initially employ instance segmentation to individually identify each tooth, followed by tooth-level periodontal disease classification. For patient-level classification, we utilize a multi-task strategy to concurrently learn patient-level classification and a Classification Activation Map (CAM) that signifies the confidence of local lesion areas within the panoramic X-ray image. Finally, our adaptive noisy-OR gate produces a hybrid classification by amalgamating predictions from both levels. In particular, we incorporate clinical knowledge into the workflows used by professional dentists, targeting the high sensitivity required in periodontal disease diagnosis. Extensive empirical testing on a dataset amassed from real-world clinics demonstrates that our proposed HC-Net achieves unparalleled performance in periodontal disease classification, exhibiting substantial potential for practical application.
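The learnable noisy-OR gate can be understood as a soft OR over tooth-level probabilities combined with the patient-level probability: a single confidently diseased tooth should be enough to raise the patient-level prediction. The sketch below illustrates one plausible parameterization; the learnable attenuation weights are an assumption, not the exact HC-Net gate.

    # Sketch of a learnable noisy-OR gate over tooth- and patient-level probabilities.
    import torch
    import torch.nn as nn

    class NoisyORGate(nn.Module):
        def __init__(self):
            super().__init__()
            self.tooth_weight = nn.Parameter(torch.tensor(1.0))    # learnable attenuation
            self.patient_weight = nn.Parameter(torch.tensor(1.0))

        def forward(self, tooth_probs, patient_prob):
            # tooth_probs: (B, T) per-tooth disease probabilities; patient_prob: (B,)
            w_t = torch.sigmoid(self.tooth_weight)
            w_p = torch.sigmoid(self.patient_weight)
            # P(disease) = 1 - prod_i (1 - w * p_i): any confident positive raises the output
            none_from_teeth = torch.prod(1.0 - w_t * tooth_probs, dim=1)
            none_from_patient = 1.0 - w_p * patient_prob
            return 1.0 - none_from_teeth * none_from_patient

    gate = NoisyORGate()
    p = gate(torch.rand(2, 28), torch.rand(2))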
Citations: 0
DACG: Dual Attention and Context Guidance model for radiology report generation
IF 10.7 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-23 | DOI: 10.1016/j.media.2024.103377
Wangyu Lang, Zhi Liu, Yijia Zhang
Medical images are an essential basis for radiologists to write radiology reports and greatly help subsequent clinical treatment. The task of automatic radiology report generation aims to alleviate the burden of report writing for clinical doctors and has recently received increasing attention, becoming an important research hotspot. However, the medical field faces severe issues of visual and textual data bias and long text generation. First, abnormal areas in radiological images account for only a small portion, and most radiological reports only involve descriptions of normal findings. Second, there are still significant challenges in generating longer and more accurate descriptive texts for radiology report generation tasks. In this paper, we propose a new Dual Attention and Context Guidance (DACG) model to alleviate visual and textual data bias and promote the generation of long texts. We use a Dual Attention Module, including a Position Attention Block and a Channel Attention Block, to extract finer position and channel features from medical images, enhancing the image feature extraction ability of the encoder. We use the Context Guidance Module to integrate contextual information into the decoder and supervise the generation of long texts. The experimental results show that our proposed model achieves state-of-the-art performance on the most commonly used IU X-ray and MIMIC-CXR datasets. Further analysis also proves that our model can improve reporting through more accurate anomaly detection and more detailed descriptions. The source code is available at https://github.com/LangWY/DACG.
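The Dual Attention Module follows the familiar position/channel attention pattern: one block computes pixel-to-pixel affinities over spatial positions, the other channel-to-channel affinities, and both reweight the encoder features. The sketch below is a generic illustration of those two blocks; layer sizes and the residual wiring are assumptions, not the exact DACG modules.

    # Illustrative position- and channel-attention blocks (sketch only).
    import torch
    import torch.nn as nn

    class PositionAttention(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.q = nn.Conv2d(dim, dim // 8, 1)
            self.k = nn.Conv2d(dim, dim // 8, 1)
            self.v = nn.Conv2d(dim, dim, 1)

        def forward(self, x):  # x: (B, C, H, W)
            b, c, h, w = x.shape
            q = self.q(x).flatten(2).transpose(1, 2)             # (B, HW, C//8)
            k = self.k(x).flatten(2)                              # (B, C//8, HW)
            attn = torch.softmax(q @ k, dim=-1)                   # pixel-to-pixel affinities
            v = self.v(x).flatten(2).transpose(1, 2)              # (B, HW, C)
            return x + (attn @ v).transpose(1, 2).reshape(b, c, h, w)

    class ChannelAttention(nn.Module):
        def forward(self, x):  # channel-to-channel affinities
            b, c, h, w = x.shape
            f = x.flatten(2)                                      # (B, C, HW)
            attn = torch.softmax(f @ f.transpose(1, 2), dim=-1)   # (B, C, C)
            return x + (attn @ f).reshape(b, c, h, w)

    feat = torch.randn(1, 64, 14, 14)
    out = ChannelAttention()(PositionAttention(64)(feat))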
Citations: 0
Simulation-free prediction of atrial fibrillation inducibility with the fibrotic kernel signature
IF 10.7 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-22 | DOI: 10.1016/j.media.2024.103375
Tomás Banduc, Luca Azzolin, Martin Manninger, Daniel Scherr, Gernot Plank, Simone Pezzuto, Francisco Sahli Costabal
Computational models of atrial fibrillation (AF) can help improve success rates of interventions, such as ablation. However, evaluating the efficacy of different treatments requires performing multiple costly simulations by pacing at different points and checking whether AF has been induced or not, hindering the clinical application of these models. In this work, we propose a classification method that can predict AF inducibility in patient-specific cardiac models without running additional simulations. Our methodology does not require re-training when changing atrial anatomy or fibrotic patterns. To achieve this, we develop a set of features given by a variant of the heat kernel signature that incorporates fibrotic pattern information and fiber orientations: the fibrotic kernel signature (FKS). The FKS is faster to compute than a single AF simulation, and when paired with machine learning classifiers, it can predict AF inducibility in the entire domain. To learn the relationship between the FKS and AF inducibility, we performed 2371 AF simulations comprising 6 different anatomies and various fibrotic patterns, which we split into training and testing sets. We obtain a median F1 score of 85.2% on the test set, and we can predict the overall inducibility with a mean absolute error of 2.76 percentage points, which is lower than alternative methods. We think our method can significantly speed up the calculation of AF inducibility, which is crucial to optimize therapies for AF within clinical timelines. An example of the FKS for an open source model is provided in https://github.com/tbanduc/FKS_AtrialModel_Ferrer.git.
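The underlying heat kernel signature assigns each mesh vertex the multi-scale feature HKS(x, t) = Σᵢ exp(−λᵢ t) φᵢ(x)², computed from the eigenpairs (λᵢ, φᵢ) of a Laplacian on the atrial surface; the FKS additionally weights this by fibrosis and fiber orientation, which is not reproduced here. The sketch below computes a plain HKS on a toy mesh graph, with a graph Laplacian standing in for a cotangent Laplacian as an assumption.

    # Sketch of a plain heat kernel signature on a surface mesh graph (illustrative only).
    import numpy as np
    from scipy.sparse import csgraph
    from scipy.linalg import eigh

    def heat_kernel_signature(adjacency: np.ndarray, times: np.ndarray, k: int = 50) -> np.ndarray:
        """adjacency: (N, N) mesh connectivity; returns an (N, len(times)) HKS feature matrix."""
        lap = csgraph.laplacian(adjacency, normed=True)
        k = min(k, adjacency.shape[0])
        evals, evecs = eigh(lap, subset_by_index=[0, k - 1])   # k smallest eigenpairs
        decay = np.exp(-np.outer(times, evals))                # (T, k): exp(-lambda_i * t)
        return (evecs ** 2) @ decay.T                          # sum_i exp(-lambda_i t) phi_i(x)^2

    # toy 4-vertex cycle graph evaluated at three diffusion times
    adj = np.array([[0, 1, 0, 1],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [1, 0, 1, 0]], dtype=float)
    hks = heat_kernel_signature(adj, times=np.array([0.1, 1.0, 10.0]))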
Citations: 0