Pub Date: 2024-11-10 | DOI: 10.1016/j.media.2024.103383
Ricardo Bigolin Lanfredi, Pritam Mukherjee, Ronald M. Summers
In chest X-ray (CXR) image analysis, rule-based systems are usually employed to extract labels from reports for dataset releases. However, there is still room for improvement in label quality. These labelers typically output only presence labels, sometimes with binary uncertainty indicators, which limits their usefulness. Supervised deep learning models have also been developed for report labeling but lack adaptability, similar to rule-based systems. In this work, we present MAPLEZ (Medical report Annotations with Privacy-preserving Large language model using Expeditious Zero shot answers), a novel approach leveraging a locally executable Large Language Model (LLM) to extract and enhance findings labels on CXR reports. MAPLEZ extracts not only binary labels indicating the presence or absence of a finding but also the location, severity, and radiologists’ uncertainty about the finding. Over eight abnormalities from five test sets, we show that our method can extract these annotations with an increase of 3.6 percentage points (pp) in macro F1 score for categorical presence annotations and more than 20 pp increase in F1 score for the location annotations over competing labelers. Additionally, using the combination of improved annotations and multi-type annotations in classification supervision in a dataset of limited-resolution CXRs, we demonstrate substantial advancements in proof-of-concept classification quality, with an increase of 1.1 pp in AUROC over models trained with annotations from the best alternative approach. We share code and annotations.
Title: Enhancing chest X-ray datasets with privacy-preserving large language models and multi-type annotations: A data-driven approach for improved classification. Medical Image Analysis, vol. 99, Article 103383.
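As a rough illustration of the zero-shot extraction idea described in the abstract above, the sketch below queries a locally run LLM once per finding and maps free-text answers to categorical presence labels. The prompt wording, finding list, `run_llm` interface, and the keyword-matching `fake_llm` stub are all illustrative assumptions, not the authors' actual MAPLEZ prompts or model.

```python
from typing import Callable

FINDINGS = ["atelectasis", "pneumothorax", "pleural effusion"]

def presence_prompt(report: str, finding: str) -> str:
    # Hypothetical zero-shot question; MAPLEZ's real prompts differ.
    return (
        f'Report: "{report}"\n'
        f'Question: does the report indicate the presence of {finding}? '
        'Answer with one word: yes, no, or maybe.'
    )

def parse_presence(answer: str) -> str:
    """Map a free-text LLM answer to a categorical presence label."""
    a = answer.strip().lower()
    if a.startswith("yes"):
        return "present"
    if a.startswith("no"):
        return "absent"
    return "uncertain"

def label_report(report: str, run_llm: Callable[[str], str]) -> dict:
    """Query the (locally executed) LLM once per finding."""
    return {f: parse_presence(run_llm(presence_prompt(report, f))) for f in FINDINGS}

def fake_llm(prompt: str) -> str:
    # Crude keyword heuristic standing in for a real local model.
    report = prompt.split('"')[1].lower()
    for f in FINDINGS:
        if f"presence of {f}" in prompt:
            return "yes" if f in report else "no"
    return "maybe"

labels = label_report("Small left pleural effusion, stable atelectasis.", fake_llm)
```

In the real system the stub would be replaced by a call to the locally hosted LLM, and further prompts would extract location, severity, and uncertainty in the same per-finding fashion.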
Pub Date: 2024-11-09 | DOI: 10.1016/j.media.2024.103388
Vincent Roca , Grégory Kuchcinski , Jean-Pierre Pruvo , Dorian Manouvriez , Renaud Lopes , the Australian Imaging Biomarkers and Lifestyle flagship study of ageing , the Alzheimer’s Disease Neuroimage Initiative
In MRI studies, the aggregation of imaging data from multiple acquisition sites enhances sample size but may introduce site-related variabilities that hinder consistency in subsequent analyses. Deep learning methods for image translation have emerged as a solution for harmonizing MR images across sites. In this study, we introduce IGUANe (Image Generation with Unified Adversarial Networks), an original 3D model that leverages the strengths of domain translation and straightforward application of style transfer methods for multicenter brain MR image harmonization. IGUANe extends CycleGAN by integrating an arbitrary number of domains for training through a many-to-one architecture. The framework based on domain pairs enables the implementation of sampling strategies that prevent confusion between site-related and biological variabilities. During inference, the model can be applied to any image, even from an unknown acquisition site, making it a universal generator for harmonization. Trained on a dataset comprising T1-weighted images from 11 different scanners, IGUANe was evaluated on data from unseen sites. The assessments included the transformation of MR images with traveling subjects, the preservation of pairwise distances between MR images within domains, the evolution of volumetric patterns related to age and Alzheimer’s disease (AD), and the performance in age regression and patient classification tasks. Comparisons with other harmonization and normalization methods suggest that IGUANe better preserves individual information in MR images and is more suitable for maintaining and reinforcing variabilities related to age and AD. Future studies may further assess IGUANe in other multicenter contexts, either using the same model or retraining it for applications to different image modalities. Codes and the trained IGUANe model are available at https://github.com/RocaVincent/iguane_harmonization.git.
Title: IGUANe: A 3D generalizable CycleGAN for multicenter harmonization of brain MR images. Medical Image Analysis, vol. 99, Article 103388.
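The many-to-one harmonization idea can be caricatured with affine site effects: site-specific maps F push a reference-space image into each site's intensity distribution, while a single site-agnostic harmonizer G pulls any of them back. Here G is plain intensity standardization toward fixed reference statistics, standing in for the trained universal generator; the gains, offsets, and reference statistics are made up for illustration.

```python
import numpy as np

REF_MEAN, REF_STD = 100.0, 15.0
site_effects = {"siteA": (1.3, 10.0), "siteB": (0.8, -5.0)}  # made-up (gain, offset)

def F(x: np.ndarray, site: str) -> np.ndarray:
    """Push a reference-space image into a site-specific intensity style."""
    g, b = site_effects[site]
    return g * x + b

def G(y: np.ndarray) -> np.ndarray:
    """Site-agnostic 'harmonizer': standardize intensities toward reference stats."""
    return (y - y.mean()) / y.std() * REF_STD + REF_MEAN

rng = np.random.default_rng(0)
x_ref = rng.normal(REF_MEAN, REF_STD, size=(16, 16))

harm_a = G(F(x_ref, "siteA"))  # site effect removed without knowing the site
harm_b = G(F(x_ref, "siteB"))
```

For purely affine site effects this toy G inverts them exactly up to global statistics; the appeal of the learned universal generator is that it handles non-affine, spatially varying effects the same way, for any input site.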
Pub Date: 2024-11-08 | DOI: 10.1016/j.media.2024.103382
Zheyuan Zhang , Elif Keles , Gorkem Durak , Yavuz Taktak , Onkar Susladkar , Vandan Gorade , Debesh Jha , Asli C. Ormeci , Alpay Medetalibeyoglu , Lanhong Yao , Bin Wang , Ilkin Sevgi Isler , Linkai Peng , Hongyi Pan , Camila Lopes Vendrami , Amir Bourhani , Yury Velichko , Boqing Gong , Concetto Spampinato , Ayis Pyrros , Ulas Bagci
Automated volumetric segmentation of the pancreas on cross-sectional imaging is needed for diagnosis and follow-up of pancreatic diseases. While CT-based pancreatic segmentation is more established, MRI-based segmentation methods are understudied, largely due to a lack of publicly available datasets, benchmarking research efforts, and domain-specific deep learning methods. In this retrospective study, we collected a large dataset (767 scans from 499 participants) of T1-weighted (T1W) and T2-weighted (T2W) abdominal MRI series from five centers between March 2004 and November 2022. We also collected CT scans of 1,350 patients from publicly available sources for benchmarking purposes. We introduced a new pancreas segmentation method, called PanSegNet, combining the strengths of nnUNet and a Transformer network with a new linear attention module enabling volumetric computation. We tested PanSegNet’s accuracy in cross-modality (a total of 2,117 scans) and cross-center settings with Dice and Hausdorff distance (HD95) evaluation metrics. We used Cohen’s kappa statistics for intra- and inter-rater agreement evaluation and paired t-tests for volume and Dice comparisons, respectively. For segmentation accuracy, we achieved Dice coefficients of 88.3% (±7.2%, at case level) with CT, 85.0% (±7.9%) with T1W MRI, and 86.3% (±6.4%) with T2W MRI. There was a high correlation for pancreas volume prediction, with R² of 0.91, 0.84, and 0.85 for CT, T1W, and T2W, respectively. We found moderate inter-observer (0.624 and 0.638 for T1W and T2W MRI, respectively) and high intra-observer agreement scores. All MRI data are made available at https://osf.io/kysnj/. Our source code is available at https://github.com/NUBagciLab/PaNSegNet.
Title: Large-scale multi-center CT and MRI segmentation of pancreas with deep learning. Medical Image Analysis, vol. 99, Article 103382.
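The case-level Dice coefficient used above to evaluate PanSegNet is a standard overlap metric; a minimal sketch on made-up binary masks:

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    """Dice coefficient between two binary masks: 2|A∩B| / (|A|+|B|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return float(2.0 * inter / (pred.sum() + gt.sum() + eps))

# Toy 4x4 "pancreas" masks: ground truth has 4 voxels, prediction 6, overlap 4.
gt = np.zeros((4, 4), dtype=bool); gt[1:3, 1:3] = True
pred = np.zeros((4, 4), dtype=bool); pred[1:3, 1:4] = True
score = dice(pred, gt)  # 2*4 / (6+4) = 0.8
```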
Pub Date: 2024-11-04 | DOI: 10.1016/j.media.2024.103379
Pedro Esteban Chavarrias Solano , Andrew Bulpitt , Venkataraman Subramanian , Sharib Ali
Colonoscopy screening is the gold standard procedure for assessing abnormalities in the colon and rectum, such as ulcers and cancerous polyps. Measuring the abnormal mucosal area and its 3D reconstruction can help quantify the surveyed area and objectively evaluate disease burden. However, due to the complex topology of these organs and variable physical conditions (for example, lighting, large homogeneous texture, and image modality), estimating distance from the camera (aka depth) is highly challenging. Moreover, most colonoscopic video acquisition is monocular, making depth estimation a non-trivial problem. While computer vision methods for depth estimation have been proposed and advanced on natural scene datasets, their efficacy has not been widely quantified on colonoscopy datasets. As the colonic mucosa has several low-texture regions that are not well pronounced, learning representations from an auxiliary task can improve salient feature extraction, allowing estimation of accurate camera depths. In this work, we propose a novel multi-task learning (MTL) approach with a shared encoder and two decoders, namely a surface normal decoder and a depth estimator decoder. Our depth estimator incorporates attention mechanisms to enhance global context awareness. We leverage the surface normal prediction to improve geometric feature extraction. We also apply a cross-task consistency loss between the two geometrically related tasks, surface normal and camera depth. We demonstrate an improvement of 15.75% in relative error and 10.7% in δ1.25 accuracy over the most accurate baseline state-of-the-art Big-to-Small (BTS) approach. All experiments are conducted on the recently released C3VD dataset, and thus we provide a first benchmark of state-of-the-art methods on this dataset.
Title: Multi-task learning with cross-task consistency for improved depth estimation in colonoscopy. Medical Image Analysis, vol. 99, Article 103379.
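One way to picture a cross-task consistency term between the two decoders: normals derived from the predicted depth map's gradients should agree with the normal decoder's output. The finite-difference normal construction and the cosine penalty below are generic choices for illustration; the paper's exact formulation may differ.

```python
import numpy as np

def normals_from_depth(depth: np.ndarray) -> np.ndarray:
    """Unit surface normals derived from depth-map finite differences."""
    dzdy, dzdx = np.gradient(depth)
    n = np.stack([-dzdx, -dzdy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def consistency_loss(depth: np.ndarray, pred_normals: np.ndarray) -> float:
    """Mean (1 - cosine similarity) between depth-derived and predicted normals."""
    cos = np.sum(normals_from_depth(depth) * pred_normals, axis=-1)
    return float(np.mean(1.0 - cos))

_, xx = np.mgrid[0:16, 0:16].astype(float)
depth = 0.1 * xx                                         # plane tilted along x
consistent = normals_from_depth(depth)                   # agrees with the depth
flat = np.zeros(depth.shape + (3,)); flat[..., 2] = 1.0  # fronto-parallel guess

loss_consistent = consistency_loss(depth, consistent)    # ~0
loss_flat = consistency_loss(depth, flat)                # penalized
```

During training this scalar would be added to the per-task supervision losses, coupling the two decoders through their shared geometry.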
Pub Date: 2024-11-01 | DOI: 10.1016/j.media.2024.103380
Yixiao Mao , Qianjin Feng , Yu Zhang , Zhenyuan Ning
Automatically labeling and segmenting vertebrae in 3D CT images constitute a complex multi-task problem. Current methods conduct vertebra labeling and semantic segmentation sequentially, typically with two separate models, and may ignore feature interaction among the different tasks. Although instance segmentation approaches with multi-channel prediction have been proposed to alleviate such issues, their utilization of semantic information remains insufficient. An additional challenge for an accurate model is how to effectively distinguish similar adjacent vertebrae and model their sequential attributes. In this paper, we propose a Semantics and Instance Interactive Learning (SIIL) paradigm for synchronous labeling and segmentation of vertebrae in CT images. SIIL models semantic feature learning and instance feature learning, in which the former extracts spinal semantics and the latter distinguishes vertebral instances. Interactive learning involves semantic features that improve the separability of vertebral instances and instance features that help learn position and contour information, during which a Morphological Instance Localization Learning (MILL) module is introduced to align semantic and instance features and facilitate their interaction. Furthermore, an Ordinal Contrastive Prototype Learning (OCPL) module is devised to differentiate adjacent vertebrae with high similarity (via cross-image contrastive learning) and simultaneously model their sequential attributes (via a temporal unit). Extensive experiments on several datasets demonstrate that our method significantly outperforms other approaches in labeling and segmenting vertebrae. Our code is available at https://github.com/YuZhang-SMU/Vertebrae-Labeling-Segmentation.
Title: Semantics and instance interactive learning for labeling and segmentation of vertebrae in CT images. Medical Image Analysis, vol. 99, Article 103380.
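The cross-image contrast behind a module like OCPL can be sketched with a generic InfoNCE loss: the prototype of a given vertebra in one scan should match the same vertebra's prototype from another scan better than the prototypes of its (highly similar) neighbors. The feature vectors and temperature below are made up, and this is a generic contrastive formulation, not the authors' exact OCPL objective.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def info_nce(anchor, positive, negatives, tau=0.1) -> float:
    """InfoNCE: anchor should match the positive against the negatives."""
    sims = np.array([cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives])
    logits = sims / tau
    logits -= logits.max()  # numerical stability
    return float(-np.log(np.exp(logits[0]) / np.exp(logits).sum()))

# Made-up prototype features for vertebra L3 in two scans and its neighbors.
p1_L3 = np.array([1.0, 0.1, 0.0])
p2_L3 = np.array([0.9, 0.2, 0.0])   # same vertebra, other scan
p2_L2 = np.array([0.1, 1.0, 0.0])   # adjacent vertebra
p2_L4 = np.array([0.0, 0.2, 1.0])   # adjacent vertebra

loss_same = info_nce(p1_L3, p2_L3, [p2_L2, p2_L4])      # correct pairing: low
loss_adjacent = info_nce(p1_L3, p2_L2, [p2_L3, p2_L4])  # wrong pairing: high
```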
Pub Date: 2024-10-30 | DOI: 10.1016/j.media.2024.103378
Qixiang Ma , Adrien Kaladji , Huazhong Shu , Guanyu Yang , Antoine Lucas , Pascal Haigron
Deep learning-based automated segmentation of vascular structures in preoperative CT angiography (CTA) images contributes to computer-assisted diagnosis and interventions. While CTA is the common standard, non-contrast CT imaging has the advantage of avoiding complications associated with contrast agents. However, the challenges of labor-intensive labeling and high labeling variability due to the ambiguity of vascular boundaries hinder conventional strong-label-based, fully-supervised learning in non-contrast CTs. This paper introduces a novel weakly-supervised framework that exploits the elliptical topology of vascular structures in CT slices. It includes an efficient annotation process based on our proposed standards, a method for generating 2D Gaussian heatmaps that serve as pseudo labels, and a training process combining a voxel reconstruction loss and a distribution loss over the pseudo labels. We assess the effectiveness of the proposed method on one local and two public datasets comprising non-contrast CT scans, focusing particularly on the abdominal aorta. On the local dataset, our weakly-supervised learning approach based on pseudo labels outperforms strong-label-based fully-supervised learning (by 1.54% in Dice score on average), reducing labeling time by around 82.0%. The efficiency of generating pseudo labels allows the inclusion of label-agnostic external data in the training set, leading to an additional improvement in performance (2.74% in Dice score on average) with a 66.3% reduction in labeling time, which remains considerably less than that of strong labels. On the public datasets, the pseudo labels achieve an overall improvement of 1.95% in Dice score for 2D models and a 68% reduction in the Hausdorff distance for the 3D model.
Title: Beyond strong labels: Weakly-supervised learning based on Gaussian pseudo labels for the segmentation of ellipse-like vascular structures in non-contrast CTs. Medical Image Analysis, vol. 99, Article 103378.
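The Gaussian-heatmap pseudo label idea can be sketched directly: a weak annotation of a vessel cross-section (a center and rough semi-axes) is turned into a soft 2D target instead of a hand-drawn mask. The center, semi-axes, and the sigma-from-semi-axis scaling below are arbitrary illustrative choices, not the paper's annotation standard.

```python
import numpy as np

def gaussian_heatmap(shape, center, sigmas) -> np.ndarray:
    """2D anisotropic Gaussian peaking at `center`, one value per pixel."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]].astype(float)
    (cy, cx), (sy, sx) = center, sigmas
    return np.exp(-((yy - cy) ** 2 / (2 * sy**2) + (xx - cx) ** 2 / (2 * sx**2)))

# Hypothetical aorta cross-section in a 64x64 slice: center (32, 40),
# semi-axes ~10 and ~8 px; halving them to get sigmas is an arbitrary choice.
heatmap = gaussian_heatmap((64, 64), center=(32, 40), sigmas=(5.0, 4.0))
```

A distribution loss can then compare the network's predicted response map against this soft target, sidestepping pixel-exact boundary labels entirely.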
Pub Date: 2024-10-30 | DOI: 10.1016/j.media.2024.103381
Kimberly Amador , Noah Pinel , Anthony J. Winder , Jens Fiehler , Matthias Wilms , Nils D. Forkert
Acute ischemic stroke (AIS) remains a global health challenge, leading to long-term functional disabilities without timely intervention. Spatio-temporal (4D) Computed Tomography Perfusion (CTP) imaging is crucial for diagnosing and treating AIS due to its ability to rapidly assess the ischemic core and penumbra. Although traditionally used to assess acute tissue status in clinical settings, 4D CTP has also been explored in research for predicting stroke tissue outcomes. However, its potential for predicting functional outcomes, especially in combination with clinical metadata, remains unexplored. Thus, this work aims to develop and evaluate a novel multimodal deep learning model for predicting functional outcomes (specifically, 90-day modified Rankin Scale) in AIS patients by combining 4D CTP and clinical metadata. To achieve this, an intermediate fusion strategy with a cross-attention mechanism is introduced to enable a selective focus on the most relevant features and patterns from both modalities. Evaluated on a dataset comprising 70 AIS patients who underwent endovascular mechanical thrombectomy, the proposed model achieves an accuracy (ACC) of 0.77, outperforming conventional late fusion strategies (ACC = 0.73) and unimodal models based on either 4D CTP (ACC = 0.61) or clinical metadata (ACC = 0.71). The results demonstrate the superior capability of the proposed model to leverage complex inter-modal relationships, emphasizing the value of advanced multimodal fusion techniques for predicting functional stroke outcomes.
Title: A cross-attention-based deep learning approach for predicting functional stroke outcomes using 4D CTP imaging and clinical metadata. Medical Image Analysis, vol. 99, Article 103381.
Pub Date: 2024-10-24 | DOI: 10.1016/j.media.2024.103376
Lanzhuju Mei , Ke Deng , Zhiming Cui , Yu Fang , Yuan Li , Hongchang Lai , Maurizio S. Tonetti , Dinggang Shen
Accurate classification of periodontal disease from panoramic X-ray images carries immense clinical importance for effective diagnosis and treatment. Recent methodologies attempt to classify periodontal diseases from X-ray images by estimating bone loss within these images, supervised by manual radiographic annotations for segmentation or keypoint detection. However, these annotations often lack consistency with the clinical gold standard of probing measurements, potentially causing measurement inaccuracy and leading to unstable classifications. Additionally, the diagnosis of periodontal disease demands exceptionally high sensitivity. To address these challenges, we introduce HC-Net, an innovative hybrid classification framework devised for accurately classifying periodontal disease from X-ray images. The framework comprises three main components: tooth-level classification, patient-level classification, and a learnable adaptive noisy-OR gate. For tooth-level classification, we first employ instance segmentation to identify each tooth individually, followed by tooth-level periodontal disease classification. For patient-level classification, we utilize a multi-task strategy to concurrently learn the patient-level classification and a Classification Activation Map (CAM) that signifies the confidence of local lesion areas within the panoramic X-ray image. Finally, our adaptive noisy-OR gate produces a hybrid classification by combining the predictions from both levels. Notably, we incorporate clinical knowledge from the workflows of professional dentists, specifically to handle the high sensitivity required in periodontal disease diagnosis. Extensive empirical testing on a dataset collected from real-world clinics demonstrates that the proposed HC-Net achieves state-of-the-art performance in periodontal disease classification, exhibiting substantial potential for practical application.
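The noisy-OR gate named in the abstract has a standard probabilistic form: the patient is predicted positive unless every evidence source independently fails to fire. A minimal stdlib sketch is below; the `weights` and `leak` parameters stand in for the learnable parts of an adaptive gate and are illustrative assumptions, not the paper's exact formulation:

```python
from math import prod

def noisy_or(tooth_probs, patient_prob, leak=0.0, weights=None):
    """Noisy-OR fusion of per-tooth and patient-level disease probabilities.
    P(positive) = 1 - (1 - leak) * prod_i (1 - w_i * p_i), so any single
    high-confidence source is enough to push the hybrid prediction up."""
    probs = list(tooth_probs) + [patient_prob]
    if weights is None:
        weights = [1.0] * len(probs)  # a learnable gate would fit these
    p_negative = (1.0 - leak) * prod(1.0 - w * p for w, p in zip(weights, probs))
    return 1.0 - p_negative

# one clearly diseased tooth dominates the hybrid prediction
assert noisy_or([0.05, 0.10, 0.90], patient_prob=0.30) > 0.9
# all-healthy teeth plus a low patient-level score stay low
assert noisy_or([0.01, 0.02], patient_prob=0.05) < 0.1
```

This OR-like behavior matches the sensitivity requirement the abstract emphasizes: missing a single diseased tooth is costlier than a false alarm.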
Clinical knowledge-guided hybrid classification network for automatic periodontal disease diagnosis in X-ray image. Lanzhuju Mei, Ke Deng, Zhiming Cui, Yu Fang, Yuan Li, Hongchang Lai, Maurizio S. Tonetti, Dinggang Shen. Medical Image Analysis, vol. 99, Article 103376. DOI: 10.1016/j.media.2024.103376.
Pub Date: 2024-10-23 | DOI: 10.1016/j.media.2024.103377
Wangyu Lang, Zhi Liu, Yijia Zhang
Medical images are an essential basis for radiologists to write radiology reports and greatly aid subsequent clinical treatment. Automatic radiology report generation aims to alleviate the burden on clinicians of writing reports and has received increasing attention in recent years, becoming an important research hotspot. However, the medical domain poses severe issues of visual and textual data bias and long-text generation. First, abnormal areas in radiological images account for only a small portion of the image, and most radiology reports consist mainly of descriptions of normal findings. Second, generating longer and more accurate descriptive texts remains a significant challenge for report generation. In this paper, we propose a new Dual Attention and Context Guidance (DACG) model to alleviate visual and textual data bias and promote the generation of long texts. We use a Dual Attention Module, comprising a Position Attention Block and a Channel Attention Block, to extract finer position and channel features from medical images, enhancing the encoder's image feature extraction ability. We use a Context Guidance Module to integrate contextual information into the decoder and supervise the generation of long texts. Experimental results show that the proposed model achieves state-of-the-art performance on the most commonly used IU X-ray and MIMIC-CXR datasets. Further analysis also shows that our model improves reporting through more accurate anomaly detection and more detailed descriptions. The source code is available at https://github.com/LangWY/DACG.
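The abstract names Position and Channel Attention Blocks but does not detail them; they resemble the dual-attention design popularized for segmentation, where self-attention is computed once over spatial positions and once over channels. The NumPy sketch below illustrates that idea under those assumptions; the residual connections and the additive fusion of the two paths are conventional choices, not confirmed by the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(feats):
    """Self-attention over spatial positions: each of the N positions
    aggregates features from every other position. feats: (N, C)."""
    attn = softmax(feats @ feats.T, axis=-1)   # (N, N) position affinities
    return feats + attn @ feats                # residual connection

def channel_attention(feats):
    """Self-attention over channels: each of the C channel maps
    aggregates the other channel maps. feats: (N, C)."""
    attn = softmax(feats.T @ feats, axis=-1)   # (C, C) channel affinities
    return feats + (attn @ feats.T).T          # residual connection

rng = np.random.default_rng(0)
f = rng.normal(size=(49, 8))   # e.g. a flattened 7x7 feature map, 8 channels
out = position_attention(f) + channel_attention(f)  # fuse the two paths
assert out.shape == f.shape
```

The point of the dual design is that position attention captures where related structures lie in the image, while channel attention captures which feature maps co-activate for a finding.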
DACG: Dual Attention and Context Guidance model for radiology report generation. Wangyu Lang, Zhi Liu, Yijia Zhang. Medical Image Analysis, vol. 99, Article 103377. DOI: 10.1016/j.media.2024.103377. Source code: https://github.com/LangWY/DACG.
Pub Date: 2024-10-22 | DOI: 10.1016/j.media.2024.103375
Tomás Banduc , Luca Azzolin , Martin Manninger , Daniel Scherr , Gernot Plank , Simone Pezzuto , Francisco Sahli Costabal
Computational models of atrial fibrillation (AF) can help improve the success rates of interventions such as ablation. However, evaluating the efficacy of different treatments requires performing multiple costly simulations, pacing at different points and checking whether AF has been induced, which hinders the clinical application of these models. In this work, we propose a classification method that can predict AF inducibility in patient-specific cardiac models without running additional simulations. Our methodology does not require re-training when the atrial anatomy or fibrotic pattern changes. To achieve this, we develop a set of features given by a variant of the heat kernel signature that incorporates fibrotic pattern information and fiber orientations: the fibrotic kernel signature (FKS). The FKS is faster to compute than a single AF simulation, and when paired with machine learning classifiers, it can predict AF inducibility over the entire domain. To learn the relationship between the FKS and AF inducibility, we performed 2371 AF simulations comprising 6 different anatomies and various fibrotic patterns, which we split into training and test sets. We obtain a median F1 score of 85.2% on the test set and can predict the overall inducibility with a mean absolute error of 2.76 percentage points, lower than alternative methods. We believe our method can significantly speed up the calculation of AF inducibility, which is crucial to optimizing AF therapies within clinical timelines. An example of the FKS for an open-source model is provided at https://github.com/tbanduc/FKS_AtrialModel_Ferrer.git.
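The FKS is described as a variant of the heat kernel signature (HKS). The plain HKS at a point x and diffusion time t is HKS(x, t) = Σᵢ exp(−λᵢ t) φᵢ(x)², computed from eigenpairs (λᵢ, φᵢ) of a mesh or graph Laplacian; the fibrotic variant would presumably build that Laplacian with fibrosis- and fiber-aware weights. A minimal NumPy sketch of the base HKS on a toy graph (an illustrative assumption, not the authors' atrial implementation):

```python
import numpy as np

def heat_kernel_signature(L, times):
    """HKS(x, t) = sum_i exp(-lambda_i * t) * phi_i(x)^2 from the
    eigendecomposition of a symmetric (mesh or graph) Laplacian L.
    Returns one descriptor vector per node, shape (n_nodes, n_times)."""
    lam, phi = np.linalg.eigh(L)   # eigenvalues ascending, eigenvectors in columns
    return (phi ** 2) @ np.exp(-np.outer(lam, times))

# toy 4-node path graph Laplacian: degree matrix minus adjacency
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
hks = heat_kernel_signature(L, times=np.array([0.1, 1.0, 10.0]))
assert hks.shape == (4, 3)
# the path graph is symmetric under reversal, so endpoint nodes share a
# signature, as do the two middle nodes
assert np.allclose(hks[0], hks[3]) and np.allclose(hks[1], hks[2])
```

Because the signature depends only on one eigendecomposition, it is cheap relative to a full biophysical AF simulation, which is the source of the speed-up the abstract claims.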
Simulation-free prediction of atrial fibrillation inducibility with the fibrotic kernel signature. Tomás Banduc, Luca Azzolin, Martin Manninger, Daniel Scherr, Gernot Plank, Simone Pezzuto, Francisco Sahli Costabal. Medical Image Analysis, vol. 99, Article 103375. DOI: 10.1016/j.media.2024.103375. Example code: https://github.com/tbanduc/FKS_AtrialModel_Ferrer.git.