
Latest articles in the Journal of imaging informatics in medicine

Edge-Aware Dual-Branch CNN Architecture for Alzheimer's Disease Diagnosis.
Pub Date : 2026-01-27 DOI: 10.1007/s10278-025-01836-5
Man Li, Mei Choo Ang, Musatafa Abbas Abbood Albadr, Jun Kit Chaw, JianBang Liu, Kok Weng Ng, Wei Hong

The rapid development of machine learning (ML) and deep learning (DL) has greatly advanced Alzheimer's disease (AD) diagnosis. However, existing models struggle to capture weak structural features in the marginal regions of brain MRI images, leading to limited diagnostic accuracy. To address this challenge, we introduce a Dual-Branch Convolutional Neural Network (DBCNN) equipped with a Learnable Edge Detection Module designed to jointly learn global semantic representations and fine-grained edge cues within a unified framework. Experimental results on two public datasets demonstrate that DBCNN significantly improves classification accuracy, surpassing 98%. Notably, on the OASIS dataset, it achieved an average accuracy of 99.71%, demonstrating strong generalization and robustness. This high diagnostic performance indicates that the model can assist clinicians in the early detection of Alzheimer's disease, reduce subjectivity in manual image interpretation, and enhance diagnostic consistency. Overall, the proposed approach provides a promising pathway toward intelligent, interpretable, and computationally efficient solutions for MRI-based diagnosis, offering strong potential to support early clinical decision-making.
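The article does not include code; purely as an illustrative sketch, a dual-branch design with a learnable, Sobel-initialized edge branch might look like the following PyTorch module. The layer sizes, the Sobel initialization, and the fusion by concatenation are assumptions made for the sketch, not details taken from the paper.

```python
import torch
import torch.nn as nn

class LearnableEdgeModule(nn.Module):
    """Edge branch: a 3x3 conv initialized with Sobel kernels but left trainable."""
    def __init__(self, in_channels: int = 1, out_channels: int = 8):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False)
        sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        sobel_y = sobel_x.t()
        with torch.no_grad():
            for i in range(out_channels):
                kernel = sobel_x if i % 2 == 0 else sobel_y
                self.conv.weight[i] = kernel.repeat(in_channels, 1, 1)

    def forward(self, x):
        return torch.relu(self.conv(x))

class DualBranchCNN(nn.Module):
    """Global semantic branch plus edge branch, fused before the classifier."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.semantic = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.edge = nn.Sequential(
            LearnableEdgeModule(1, 8),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32 + 16, num_classes)

    def forward(self, x):
        s = self.semantic(x).flatten(1)   # global semantic features
        e = self.edge(x).flatten(1)       # fine-grained edge cues
        return self.classifier(torch.cat([s, e], dim=1))

if __name__ == "__main__":
    logits = DualBranchCNN(num_classes=2)(torch.randn(4, 1, 128, 128))
    print(logits.shape)  # torch.Size([4, 2])
```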

Citations: 0
Lung Cancer Classification Using Effective Fusion Network Integrating Transformers and Controllable Convolutional Encoders-Decoders.
Pub Date : 2026-01-27 DOI: 10.1007/s10278-025-01830-x
Evgin Goceri

In this work, a new fusion network was developed and applied to lung cancer classification. It incorporates a transformer-based module, a convolutional module with encoders, and another convolutional module with decoders. Each module is strategically placed and extracts features at different scales, enabling the network to capture enriched feature information at both global and local levels. A novel hybrid loss function was also employed to reduce both pixel- and image-based differences while enhancing region-wise consistency. The model's effectiveness was evaluated by classifying lung cancer subtypes from computed tomography scans, a highly challenging task due to factors such as high interclass similarity and the presence of nontumor features. Moreover, recent methods used for lung cancer classification were applied to identical datasets and evaluated using identical metrics to ensure fair comparative assessments. The results demonstrate the superiority of the proposed approach in lung cancer subtype classification, achieving higher accuracy (96.59%), recall (96.68%), precision (96.90%), and F1-score (96.65%) compared to recent methods.
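The exact form of the hybrid loss is not given in the abstract; the sketch below only illustrates the general idea of weighting pixel-level, image-level, and region-wise terms alongside a classification term. The specific terms, weights, and pooling-based region comparison are assumptions, not the authors' formulation.

```python
import torch
import torch.nn.functional as F

def hybrid_loss(recon, target, logits, labels,
                w_pixel=1.0, w_image=0.5, w_region=0.5, w_cls=1.0, patch=8):
    """Weighted sum of pixel-, image-, and region-level terms plus a classification loss."""
    # Pixel-level difference between reconstruction and input.
    pixel = F.l1_loss(recon, target)
    # Image-level difference on a global statistic (per-image mean intensity).
    image = F.mse_loss(recon.mean(dim=(1, 2, 3)), target.mean(dim=(1, 2, 3)))
    # Region-wise consistency: compare average-pooled patches.
    region = F.mse_loss(F.avg_pool2d(recon, patch), F.avg_pool2d(target, patch))
    # Lung cancer subtype classification term.
    cls = F.cross_entropy(logits, labels)
    return w_pixel * pixel + w_image * image + w_region * region + w_cls * cls

if __name__ == "__main__":
    recon, target = torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64)
    logits, labels = torch.randn(2, 3), torch.tensor([0, 2])
    print(hybrid_loss(recon, target, logits, labels).item())
```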

Citations: 0
Generating Training Data for Ureter Segmentation Using Dual-Energy CT Two-Material Decomposition.
Pub Date : 2026-01-26 DOI: 10.1007/s10278-026-01847-w
Dae Chul Jung, Jungwook Lee, Seungsoo Lee, Sung Il Jung, Myoung Seok Lee, Min Hoan Moon

This study aimed to evaluate the utility of dual-energy CT (DECT)-based two-material decomposition in facilitating the generation of training data for ureter segmentation. This retrospective two-center study included 180 patients who underwent DECT urography between April and July 2020, including 150 from Institution 1 and 30 from Institution 2. Virtual unenhanced (VUE) images were generated from the late excretory phase (LEP) images using a two-material decomposition technique. Ground truth segmentation masks were created by segmenting contrast-filled ureteral regions on LEP images and were then paired with the corresponding VUE images. These VUE images and their corresponding ground truth masks were used to construct training, validation, and test datasets. A deep learning-based segmentation model was developed using the nnU-Net framework. Its performance was evaluated using the Dice coefficient, precision, and recall. In the internal test dataset, the model achieved excellent performance, with a median Dice coefficient of 0.89 (95% CI 0.88-0.90), precision of 0.90 (95% CI 0.88-0.92), and recall of 0.88 (95% CI 0.86-0.91). In contrast, the external validation dataset yielded limited performance, with a median Dice coefficient of 0.43 (95% CI 0.31-0.61) and recall of 0.28 (95% CI 0.18-0.45), while precision remained high at 0.95 (95% CI 0.93-0.96). There were statistically significant differences in all metrics between the internal and external datasets (P < 0.01). DECT-based two-material decomposition is a feasible method for generating training data for ureter segmentation. Although external validation performance was limited, this approach shows promise for ureter segmentation on non-contrast CT scans.
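For reference, the reported metrics are standard overlap measures for binary masks; a minimal NumPy sketch of Dice, precision, and recall (not the authors' code) is shown below.

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-8):
    """Dice, precision, and recall for binary segmentation masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    dice = 2 * tp / (2 * tp + fp + fn + eps)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return dice, precision, recall

if __name__ == "__main__":
    truth = np.zeros((64, 64), dtype=np.uint8); truth[20:40, 20:40] = 1
    pred = np.zeros_like(truth); pred[22:42, 20:40] = 1
    print(segmentation_metrics(pred, truth))
```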

Citations: 0
IPATH: A Large-Scale Pathology Image-Text Dataset from Instagram for Vision-Language Model Training.
Pub Date : 2026-01-23 DOI: 10.1007/s10278-025-01820-z
S Mirhosseini, T Rai, P Diaz-Santana, R La Ragione, N Bacon, K Wells

Recent advancements in artificial intelligence (AI) have revealed important patterns in pathology images imperceptible to human observers that can improve diagnostic accuracy and decision support systems. However, progress has been limited due to the lack of publicly available medical images. To address this scarcity, we explore Instagram as a novel source of pathology images with expert annotations. We curated the IPATH dataset from Instagram, comprising 45,609 pathology image-text pairs rigorously filtered and curated for domain quality using classifiers, large language models, and manual filtering. To demonstrate the value of this dataset, we developed a multimodal AI model called IP-CLIP by fine-tuning a pretrained CLIP model using the IPATH dataset. We evaluated IP-CLIP on seven external histopathology datasets using zero shot classification and linear probing, where it consistently outperformed the original CLIP model. Furthermore, IP-CLIP matched or exceeded several recent state-of-the-art pathology vision-language models, despite being trained on a substantially smaller dataset. We also assessed image-text alignment on a 5k held-out IPATH subset using image-text retrieval, where IP-CLIP surpassed CLIP and other specialized models. These results demonstrate the effectiveness of the IPATH dataset and highlight the potential of leveraging social media data to develop AI models for medical image classification and enhance diagnostic accuracy.
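The zero-shot evaluation described here follows the usual CLIP recipe of comparing normalized image and class-prompt embeddings; the sketch below shows only that similarity-and-softmax step, with random tensors standing in for embeddings from an IP-CLIP-style encoder pair. It is not the authors' pipeline.

```python
import torch
import torch.nn.functional as F

def zero_shot_classify(image_embeds: torch.Tensor,
                       class_text_embeds: torch.Tensor,
                       temperature: float = 0.01) -> torch.Tensor:
    """CLIP-style zero-shot classification: cosine similarity between
    L2-normalized image embeddings and one text embedding per class."""
    img = F.normalize(image_embeds, dim=-1)
    txt = F.normalize(class_text_embeds, dim=-1)
    logits = img @ txt.t() / temperature   # (n_images, n_classes)
    return logits.softmax(dim=-1)

if __name__ == "__main__":
    # Stand-ins for embeddings produced by a CLIP image/text encoder pair.
    torch.manual_seed(0)
    image_embeds = torch.randn(4, 512)          # 4 histopathology patches
    class_text_embeds = torch.randn(3, 512)     # e.g. prompts for 3 tissue classes
    probs = zero_shot_classify(image_embeds, class_text_embeds)
    print(probs.argmax(dim=-1))
```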

Citations: 0
Large Language Models in Radiologic Numerical Tasks: A Thorough Evaluation and Error Analysis.
Pub Date : 2026-01-21 DOI: 10.1007/s10278-025-01824-9
Ali Nowroozi, Masha Bondarenko, Adrian Serapio, Tician Schnitzler, Sukhmanjit S Brar, Jae Ho Sohn

The purpose of this study was to investigate the performance of LLMs in radiology numerical tasks and perform a comprehensive error analysis. We defined six tasks: extracting (1) minimum T-score from DEXA report, (2) maximum common bile duct (CBD) diameter from ultrasound report, and (3) maximum lung nodule size from CT report, and judging (1) presence of a highly hypermetabolic region on a PET report, (2) whether a patient is osteoporotic based on a DEXA report, and (3) whether a patient has a dilated CBD based on an ultrasound report. Reports were extracted from the MIMIC III and our institution's databases, and the ground truths were extracted manually. The models used were Llama 3.1 8b, DeepSeek R1 distilled Llama 8b, OpenAI o1-mini, and OpenAI GPT-5-mini. We manually reviewed all incorrect outputs and performed a comprehensive error analysis. In extraction tasks, while Llama showed relatively variable results (ranging 86%-98.7%) across tasks, other models performed consistently well (accuracies > 95%). In judgment tasks, the lowest accuracies of Llama, DeepSeek distilled Llama, o1-mini, and GPT-5-mini were 62.0%, 91.7%, 91.7%, and 99.0%, respectively, while o1-mini and GPT-5-mini did reach 100% performance in detecting osteoporosis. We found no mathematical errors in the outputs of o1-mini and GPT-5-mini. Answer-only output format significantly reduced performance in Llama and DeepSeek distilled Llama but not in o1-mini or GPT-5-mini. To conclude, reinforcement learning (RL) reasoning models perform consistently well in radiology numerical tasks and show no mathematical errors. Simpler non-RL reasoning models may also achieve acceptable performance depending on the task.
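As an illustration of how such extraction tasks can be scored, the sketch below parses a number from an answer string and computes accuracy and mean absolute error against manually extracted ground truth. The regex-based parsing and the tolerance parameter are assumptions for the example, not the authors' protocol.

```python
import re
from statistics import mean

def parse_number(text: str):
    """Pull the first numeric value out of a model's answer string."""
    match = re.search(r"-?\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None

def score_extraction(outputs, ground_truth, tolerance=0.0):
    """Accuracy (within tolerance) and mean absolute error over parsed values."""
    errors, correct = [], 0
    for out, truth in zip(outputs, ground_truth):
        value = parse_number(out)
        if value is None:
            continue
        errors.append(abs(value - truth))
        correct += abs(value - truth) <= tolerance
    accuracy = correct / len(ground_truth)
    mae = mean(errors) if errors else float("nan")
    return accuracy, mae

if __name__ == "__main__":
    model_outputs = ["The minimum T-score is -2.7", "-1.9", "CBD diameter: 7 mm"]
    truth = [-2.7, -2.0, 7.0]
    print(score_extraction(model_outputs, truth))
```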

Citations: 0
Hemorrhage Segmentation in Fundus Images Using the U-Net 3+ Model: Performance Comparison Across Retinal Regions.
Pub Date : 2026-01-21 DOI: 10.1007/s10278-025-01837-4
Yeong Hun Kang, Young Jae Kim, Kwang Gi Kim

Diabetic retinopathy (DR) is one of the most common complications of diabetes, and timely detection of retinal hemorrhages is essential for preventing vision loss. This study evaluates the U-Net 3+ model for pixel-level hemorrhage segmentation in fundus images and examines its performance across clinically meaningful retinal regions. Model performance was assessed using accuracy, sensitivity, specificity, and Dice score and further analyzed across perivascular and extravascular areas, perifoveal and extrafoveal regions, fovea-centered quadrants, and images stratified by hemorrhage burden. U-Net 3+ achieved strong overall performance, with 99.93% accuracy, 87.03% sensitivity, 99.97% specificity, and an 85.02% Dice score. Higher segmentation accuracy was observed in extravascular regions and within the foveal area, while quadrant-wise performance remained largely consistent. Images with greater hemorrhage burden demonstrated higher segmentation reliability. These findings highlight the importance of region-aware evaluation and suggest that U-Net 3+ can provide clinically meaningful support for automated DR screening. Further validation using larger and multi-center datasets is required to enhance the model's generalizability for real-world clinical deployment.
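Region-wise evaluation of this kind typically restricts the pixel-level confusion matrix to a region mask; the NumPy sketch below illustrates that idea for a circular foveal mask. The mask construction and the toy data are assumptions for the example, not the study's implementation.

```python
import numpy as np

def region_metrics(pred, truth, region):
    """Accuracy, sensitivity, specificity, and Dice restricted to a region mask."""
    pred, truth, region = pred.astype(bool), truth.astype(bool), region.astype(bool)
    p, t = pred[region], truth[region]           # evaluate only inside the region
    tp = np.sum(p & t); tn = np.sum(~p & ~t)
    fp = np.sum(p & ~t); fn = np.sum(~p & t)
    eps = 1e-8
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn + eps),
        "sensitivity": tp / (tp + fn + eps),
        "specificity": tn / (tn + fp + eps),
        "dice": 2 * tp / (2 * tp + fp + fn + eps),
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = rng.random((128, 128)) > 0.95        # sparse "hemorrhage" pixels
    pred = truth.copy(); pred[0:5] = False       # miss a few rows
    yy, xx = np.mgrid[0:128, 0:128]
    fovea = (yy - 64) ** 2 + (xx - 64) ** 2 < 40 ** 2   # circular foveal region
    print(region_metrics(pred, truth, fovea))
```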

Citations: 0
Evaluation of ChatGPT-4 and Microsoft Copilot for Third-Molar Assessment on Panoramic Radiographs.
Pub Date : 2026-01-21 DOI: 10.1007/s10278-025-01805-y
Thaísa Pinheiro Silva, Maria Fernanda Silva Andrade-Bortoletto, Caio Alencar-Palha, Thaís Santos Cerqueira Ocampo, Christiano Oliveira-Santos, Deborah Queiroz Freitas, Matheus L Oliveira

To assess the performance of two AI chatbot assistants in identifying the presence and classifying the position of third molars on panoramic radiographs. A total of 114 third molars from 100 panoramic radiographs were evaluated consensually by three examiners and independently by two AI chatbot assistants (ChatGPT-4 and Microsoft Copilot). They were asked to provide descriptions regarding the orientation of the third molar's long axis, level of bone inclusion, space between the lower second molar and the mandibular ramus, and proximity of the third molar to the mandibular canal or maxillary sinus. Keywords generated by the AI chatbot assistants were compared to those used by the examiners and scored as 0 (incorrect), 0.5 (partially correct), or 1 (correct). Mean scores and standard deviations were calculated for each parameter and compared using the Wilcoxon test (α = 0.05). Mean scores across the four parameters ranged from 0.08 to 0.30 (SD = 0.42-0.44) for ChatGPT-4 and from 0.25 to 0.31 (SD = 0.42-0.47) for Microsoft Copilot. The only significant difference in performance between the AI chatbots was observed in the space between the lower second molar and ramus, in favor of Microsoft Copilot (p < 0.05). Overall performance scores were 0.22 (SD = 0.42) for ChatGPT-4 and 0.28 (SD = 0.46) for Microsoft Copilot. Furthermore, hallucinations such as classifying absent teeth were also observed. Both ChatGPT-4 and Microsoft Copilot demonstrate generally low performance in accurately identifying and classifying the position of third molars on panoramic radiographs.
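To illustrate the scoring and statistics described above, the sketch below applies the 0/0.5/1 scoring scale to placeholder data and compares the two chatbots with a paired Wilcoxon test via SciPy; the score distributions are invented for the example and are not data from the study.

```python
import numpy as np
from scipy.stats import wilcoxon

# Per-tooth scores (0 = incorrect, 0.5 = partially correct, 1 = correct) for one
# parameter, produced by comparing chatbot keywords against the examiners'.
# The values below are made-up placeholders.
rng = np.random.default_rng(42)
chatgpt_scores = rng.choice([0.0, 0.5, 1.0], size=114, p=[0.6, 0.2, 0.2])
copilot_scores = rng.choice([0.0, 0.5, 1.0], size=114, p=[0.5, 0.2, 0.3])

print("ChatGPT-4 mean score:", chatgpt_scores.mean().round(2))
print("Copilot mean score:  ", copilot_scores.mean().round(2))

# Paired comparison of the two chatbots on the same 114 third molars.
stat, p_value = wilcoxon(chatgpt_scores, copilot_scores)
print(f"Wilcoxon statistic={stat:.1f}, p={p_value:.3f}  (alpha = 0.05)")
```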

Citations: 0
Comparison of Three Automated Measurements of Ventricular Volumes of the Brain.
Pub Date : 2026-01-20 DOI: 10.1007/s10278-025-01819-6
Xue Zhang, Yonggang Li, Ping Mu, Xiaotong Yu, Xiu Zhang, Shulin Liu, Baishi Wang, Ning Li, Fu Ren

Accurate segmentation and measurement of ventricular volume are critical for neuroscience research and neurological disease diagnosis. In resource-limited settings, free and open-source automated tools offer accessible solutions. However, the lack of comparative evaluations limits their application. This study aims to identify reliable free tools for automated ventricular volume measurement to support clinical and research utilization. Magnetic resonance imaging (MRI) data from 150 healthy adults were collected with informed consent. Ventricular volumes were segmented using three open-source tools (3D Slicer, FreeSurfer, ITK-SNAP) and compared with manual segmentation as the reference standard. Pearson's correlation coefficient, intraclass correlation coefficient (ICC), and the Bland-Altman analysis were employed to evaluate consistency and reliability. All three automated tools showed significant correlations with manual measurements (P < 0.01). ITK-SNAP had the highest Pearson correlation and ICC values, followed by 3D Slicer, while FreeSurfer had the lowest. All tools demonstrated strong reliability, with ICCs greater than 0.9. The Bland-Altman analysis showed that ITK-SNAP had the closest consistency with manual results, again followed by 3D Slicer, with FreeSurfer performing least consistently. ITK-SNAP demonstrates higher accuracy and reliability for ventricular volumetry compared to 3D Slicer and FreeSurfer. Its open-source nature supports broader implementation in resource-constrained environments, enhancing neuroimaging accessibility for clinical and research applications.
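The agreement statistics used here are standard; a small NumPy/SciPy sketch of the Pearson correlation and a Bland-Altman bias with 95% limits of agreement is shown below on synthetic volumes. The ICC computation is omitted, and all numbers are illustrative, not results from the study.

```python
import numpy as np
from scipy.stats import pearsonr

def bland_altman(auto: np.ndarray, manual: np.ndarray):
    """Mean bias and 95% limits of agreement between two measurement methods."""
    diff = auto - manual
    bias = diff.mean()
    loa = 1.96 * diff.std(ddof=1)
    return bias, bias - loa, bias + loa

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    manual = rng.normal(25.0, 8.0, size=150)          # manual ventricular volumes (mL)
    auto = manual + rng.normal(0.5, 1.5, size=150)    # automated tool with a small bias
    r, p = pearsonr(auto, manual)
    bias, lower, upper = bland_altman(auto, manual)
    print(f"Pearson r={r:.3f} (p={p:.1e})")
    print(f"Bland-Altman bias={bias:.2f} mL, LoA=({lower:.2f}, {upper:.2f}) mL")
```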

Citations: 0
Predicting HFA 30-2 Visual Fields with Deep Learning from Multimodal OCT-Fundus Feature Fusion and Structure-Function Discordance Analysis.
Pub Date : 2026-01-20 DOI: 10.1007/s10278-025-01798-8
İlknur Tuncer Fırat, Murat Fırat, Haci Erbali, Taner Tuncer

Glaucoma is a leading cause of irreversible vision loss. During clinical follow-up, the visual field (VF) test (Humphrey Field Analyzer 30-2) assesses functional loss, while optical coherence tomography (OCT) and fundus imaging provide structural information. However, VF measurement can be subjective, exhibit test-retest variability, and sometimes exhibit structure-function discordance (SFD). Therefore, predicting VF values from structural images may support clinical decision-making. This study aimed to estimate Humphrey 30-2 measures (mean deviation (MD), pattern standard deviation (PSD), and point-wise threshold sensitivity (TS)) in glaucoma/ocular hypertension (OHT) using a ViT-B/32-based feature-fusion approach on OCT and fundus images, and to examine the effect of SFD via sensitivity analysis. Visual features were extracted from color optic disc photographs, red-free fundus images, retinal nerve fiber layer (RNFL) thickness maps, and circular RNFL plots using Vision Transformer (ViT-B/32)-based models. These features were combined with demographic and clinical data to form a multimodal artificial intelligence model. Global VF indices (MD, PSD) were estimated with probabilistic regression that accounts for uncertainty, and point-wise TS values were predicted using a location-aware network. In a separate analysis, eyes exhibiting SFD were identified and excluded to assess model performance under OCT-VF concordance. Mean absolute errors (MAE) were 2.26, 1.42, and 2.96 dB for MD, PSD, and mean TS, respectively, and the proportions within ± 2 dB were 59.65%, 75.44%, and 57.90%. After excluding SFD eyes, MAEs decreased to 1.82, 1.30, and 2.12 dB for MD, PSD, and mean TS, respectively; the proportions within ± 2 dB increased to 66.7%, 76.5%, and 62.7%, respectively. These findings indicate that discordance affects performance and that predictions are more reliable in clinically concordant cases. ViT-B/32-based deep feature fusion offers clinically meaningful accuracy for predicting VF metrics from multimodal structural images. SFD was frequently detected among the lowest-performing cases, and this possibility should be considered when interpreting low-performing outputs.
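The reported agreement metrics (MAE and the proportion of predictions within ±2 dB) are straightforward to compute; the sketch below shows one way to do so on toy MD values. The data are synthetic, and only the ±2 dB window is taken from the abstract.

```python
import numpy as np

def vf_agreement(pred_db: np.ndarray, true_db: np.ndarray, window_db: float = 2.0):
    """Mean absolute error and the proportion of predictions within +/- window_db."""
    abs_err = np.abs(pred_db - true_db)
    mae = abs_err.mean()
    within = (abs_err <= window_db).mean()
    return mae, within

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    true_md = rng.normal(-4.0, 5.0, size=200)            # toy MD values (dB)
    pred_md = true_md + rng.normal(0.0, 2.5, size=200)   # toy model predictions
    mae, within = vf_agreement(pred_md, true_md)
    print(f"MAE = {mae:.2f} dB, within ±2 dB = {100 * within:.1f}%")
```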

Citations: 0
Multimodal Fusion and Transfer Learning for the Detection of Degenerative Parkinsonisms with Dopamine Transporter SPECT Imaging.
Pub Date : 2026-01-20 DOI: 10.1007/s10278-025-01831-w
Valentin Durand de Gevigney, Nicolas Nicastro, Valentina Garibotto, Jérôme Schmid

Dopamine transporter (DAT) SPECT is a validated biomarker for Parkinson's disease (PD) and related degenerative parkinsonisms. Interpretation relies on visual assessment supported by striatal image features such as striatal binding ratios (SBRs). Deep learning can aid this process but often underuses complementary data, lacks robustness to heterogeneous inputs, and offers limited interpretability. We built an end-to-end multimodal framework that encodes DAT images and scalar data (patient age and striatal image features) using, respectively, a vision transformer and a multilayer perceptron. A transformer-based fusion module then combined the encoded representations, while tackling possible missing inputs. Interpretability was provided through modality-level attention, spatial attention maps, occlusion analysis, and scalar feature saliency. Performance was evaluated on 664 Parkinson's Progression Markers Initiative (PPMI) cases and two local datasets A (N = 228) and B (N = 530) from different devices, including PD, atypical parkinsonisms, and non-degenerative subjects. Transfer learning involved pretraining on two datasets and finetuning on the third. On PPMI, the model reached 97.4% AUROC, 95.5% accuracy, 97.0% sensitivity, and 91.9% specificity, matching state-of-the-art performance. Results were similar on dataset B (98.6% AUROC) but lower on dataset A (92.6% AUROC), likely due to its smaller size and reduced image quality. Explainability analyses showed the model focused on clinically relevant striatal regions and identified key scalar features such as putamen SBR and asymmetry. The fusion module also supported stable predictions despite missing data. Our method efficiently combined multimodal data with heterogeneous datasets and partial multimodal data. Integrated explainability tools showed clinically meaningful attention that is expected to favor its adoption.
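The abstract describes transformer-based fusion of modality embeddings with handling of missing inputs; a minimal PyTorch sketch of that general pattern is given below, using learned placeholder tokens for absent modalities. The dimensions, the two-token layout, and the mean pooling are assumptions for the sketch, not the authors' architecture.

```python
import torch
import torch.nn as nn

class TokenFusion(nn.Module):
    """Fuse image and scalar embeddings as tokens with a transformer encoder;
    a learned placeholder token stands in for any missing modality."""
    def __init__(self, dim: int = 128, num_classes: int = 3):
        super().__init__()
        self.missing_image = nn.Parameter(torch.zeros(dim))
        self.missing_scalar = nn.Parameter(torch.zeros(dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, image_emb=None, scalar_emb=None):
        batch = (image_emb if image_emb is not None else scalar_emb).shape[0]
        img = image_emb if image_emb is not None else self.missing_image.expand(batch, -1)
        sca = scalar_emb if scalar_emb is not None else self.missing_scalar.expand(batch, -1)
        tokens = torch.stack([img, sca], dim=1)       # (batch, 2 modality tokens, dim)
        fused = self.encoder(tokens).mean(dim=1)      # pool over modality tokens
        return self.head(fused)

if __name__ == "__main__":
    model = TokenFusion()
    image_emb = torch.randn(4, 128)   # e.g. from a ViT image encoder
    scalar_emb = torch.randn(4, 128)  # e.g. from an MLP over age and SBR features
    print(model(image_emb, scalar_emb).shape)   # both modalities present
    print(model(image_emb, None).shape)         # scalar data missing
```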

Citations: 0