
Journal of imaging informatics in medicine: Latest Publications

Development of a No-Reference CT Image Quality Assessment Method Using RadImageNet Pre-trained Deep Learning Models.
Pub Date : 2026-02-01 Epub Date: 2025-05-27 DOI: 10.1007/s10278-025-01542-2
Kohei Ohashi, Yukihiro Nagatani, Asumi Yamazaki, Makoto Yoshigoe, Kyohei Iwai, Ryo Uemura, Masayuki Shimomura, Kenta Tanimura, Takayuki Ishida

Accurate assessment of computed tomography (CT) image quality is crucial for ensuring diagnostic accuracy, optimizing imaging protocols, and preventing excessive radiation exposure. In clinical settings, where high-quality reference images are often unavailable, developing no-reference image quality assessment (NR-IQA) methods is essential. Recently, CT-NR-IQA methods using deep learning have been widely studied; however, significant challenges remain in handling multiple degradation factors and accurately reflecting real-world degradations. To address these issues, we propose a novel CT-NR-IQA method. Our approach utilizes a dataset that combines two degradation factors (noise and blur) to train convolutional neural network (CNN) models capable of handling multiple degradation factors. Additionally, we leveraged RadImageNet pre-trained models (ResNet50, DenseNet121, InceptionV3, and InceptionResNetV2), allowing the models to learn deep features from large-scale real clinical images, thus enhancing adaptability to real-world degradations without relying on artificially degraded images. The models' performances were evaluated by measuring the correlation between the subjective scores and predicted image quality scores for both artificially degraded and real clinical image datasets. The results demonstrated positive correlations between the subjective and predicted scores for both datasets. In particular, ResNet50 showed the best performance, with a correlation coefficient of 0.910 for the artificially degraded images and 0.831 for the real clinical images. These findings indicate that the proposed method could serve as a potential surrogate for subjective assessment in CT-NR-IQA.
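The evaluation step described above measures the correlation between subjective scores and model-predicted quality scores. As a minimal sketch, this is a plain Pearson coefficient; the score data below are invented for illustration only:

```python
import numpy as np

def pearson_r(subjective, predicted):
    """Pearson correlation between observer quality scores and model scores."""
    s = np.asarray(subjective, dtype=float)
    p = np.asarray(predicted, dtype=float)
    s -= s.mean()
    p -= p.mean()
    return float((s * p).sum() / np.sqrt((s ** 2).sum() * (p ** 2).sum()))

# Hypothetical 5-point subjective scores and model predictions for 8 images
subjective = [1, 2, 3, 4, 5, 3, 2, 4]
predicted = [1.2, 2.1, 2.8, 3.9, 4.7, 3.2, 2.4, 3.8]
r = pearson_r(subjective, predicted)
print(f"r = {r:.3f}")
```

In the study, the same coefficient is reported separately for the artificially degraded set (0.910 for ResNet50) and the real clinical set (0.831).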

Citations: 0
New Machine Learning Method for Medical Image and Microarray Data Analysis for Heart Disease Classification.
Pub Date : 2026-02-01 Epub Date: 2025-04-01 DOI: 10.1007/s10278-025-01492-9
Jinglan Guo, Jue Liao, Yuanlian Chen, Lisha Wen, Song Cheng

Microarray technology has become a vital tool in cardiovascular research, enabling the simultaneous analysis of thousands of gene expressions. This capability provides a robust foundation for heart disease classification and biomarker discovery. However, the high dimensionality, noise, and sparsity of microarray data present significant challenges for effective analysis. Gene selection, which aims to identify the most relevant subset of genes, is a crucial preprocessing step for improving classification accuracy, reducing computational complexity, and enhancing biological interpretability. Traditional gene selection methods often fall short in capturing complex, nonlinear interactions among genes, limiting their effectiveness in heart disease classification tasks. In this study, we propose a novel framework that leverages deep neural networks (DNNs) for optimizing gene selection and heart disease classification using microarray data. DNNs, known for their ability to model complex, nonlinear patterns, are integrated with feature selection techniques to address the challenges of high-dimensional data. The proposed method, DeepGeneNet (DGN), combines gene selection and DNN-based classification into a unified framework, ensuring robust performance and meaningful insights into the underlying biological mechanisms. Additionally, the framework incorporates hyperparameter optimization and innovative U-Net segmentation techniques to further enhance computational performance and classification accuracy. These optimizations enable DGN to deliver robust and scalable results, outperforming traditional methods in both predictive accuracy and interpretability. Experimental results demonstrate that the proposed approach significantly improves heart disease classification accuracy compared to other methods. 
By focusing on the interplay between gene selection and deep learning, this work advances the field of cardiovascular genomics, providing a scalable and interpretable framework for future applications.
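The gene-selection preprocessing idea can be sketched with a generic variance filter — a hedged illustration of reducing a high-dimensional microarray matrix to its most informative columns, not the DGN selection method itself:

```python
import numpy as np

def top_variance_genes(expr, k):
    """Keep the k most variable genes from a microarray matrix.

    expr: (n_samples, n_genes) expression matrix. A simple filter-style
    selection step; the paper's DNN-based selection is more elaborate.
    """
    variances = expr.var(axis=0)
    keep = np.sort(np.argsort(variances)[::-1][:k])
    return keep, expr[:, keep]

rng = np.random.default_rng(0)
expr = rng.normal(size=(20, 1000))   # 20 samples, 1000 genes (synthetic)
expr[:, 10] *= 5.0                   # make gene 10 highly variable
idx, reduced = top_variance_genes(expr, 50)
```

A filter like this addresses the dimensionality and sparsity problems the abstract mentions, but — as the authors note for traditional methods — it cannot capture nonlinear gene interactions, which motivates the DNN-based approach.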

Citations: 0
How Do Radiologists Currently Monitor AI in Radiology and What Challenges Do They Face? An Interview Study and Qualitative Analysis.
Pub Date : 2026-02-01 Epub Date: 2025-04-08 DOI: 10.1007/s10278-025-01493-8
Jamie Chow, Ryan Lee, Honghan Wu

Artificial intelligence (AI) in radiology is becoming increasingly prevalent; however, there is not a clear picture of how AI is being monitored today, nor of how this should practically be done given the inherent risk of AI model performance degradation over time. This research investigates current practices and the difficulties radiologists face in monitoring AI. Semi-structured virtual interviews were conducted with 6 USA-based and 10 Europe-based radiologists. The interviews were automatically transcribed and underwent thematic analysis. The findings suggest that AI monitoring in radiology is still relatively nascent, as most of the AI projects had not yet progressed into a fully live clinical deployment. The most common method of monitoring involved a manual process of retrospectively comparing the AI results against the radiology report. Automated and statistical methods of monitoring were much less common. The biggest challenges are a lack of resources to support AI monitoring and uncertainty about how to create a robust and scalable process for monitoring the breadth and variety of radiology AI applications available. There is currently a lack of practical guidelines on how to monitor AI, which has led to a variety of approaches being proposed by both healthcare providers and vendors. An ensemble of mixed methods is recommended to monitor AI across multiple domains and metrics. This will be enabled by appropriate allocation of resources and the formation of robust and diverse multidisciplinary AI governance groups.
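The most commonly reported monitoring method — retrospectively comparing AI output against the signed radiology report — can be sketched as a rolling agreement check. The window size and alert threshold below are illustrative choices, not values from the study:

```python
from collections import deque

class AgreementMonitor:
    """Track agreement between AI findings and the signed report over a
    sliding window, flagging possible performance drift."""

    def __init__(self, window=100, alert_below=0.85):
        self.window = deque(maxlen=window)
        self.alert_below = alert_below

    def record(self, ai_positive: bool, report_positive: bool) -> None:
        self.window.append(ai_positive == report_positive)

    @property
    def agreement(self) -> float:
        return sum(self.window) / len(self.window) if self.window else 1.0

    def drifting(self) -> bool:
        # Only alert once the window is full, to avoid noisy early readings
        return (len(self.window) == self.window.maxlen
                and self.agreement < self.alert_below)

mon = AgreementMonitor(window=4, alert_below=0.8)
for ai, rep in [(True, True), (False, False), (True, False), (True, True)]:
    mon.record(ai, rep)
```

Automating even this simple statistic addresses the resource constraint the interviewees describe, though the manual review it replaces also surfaces *why* the AI and the report disagree.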

Citations: 0
Preoperative Prediction of Non-functional Pituitary Neuroendocrine Tumors and Posterior Pituitary Tumors Based on MRI Radiomic Features.
Pub Date : 2026-02-01 Epub Date: 2025-04-14 DOI: 10.1007/s10278-025-01400-1
Shucheng Jin, Qin Xu, Chen Sun, Yuan Zhang, Yangyang Wang, Xi Wang, Xiudong Guan, Deling Li, Yiming Li, Chuanbao Zhang, Wang Jia

Compared to non-functional pituitary neuroendocrine tumors (NF-PitNETs), posterior pituitary tumors (PPTs) require more intraoperative protection of the pituitary stalk and hypothalamus, and their perioperative management is more complex. However, the two are difficult to distinguish on magnetic resonance imaging (MRI) before surgery. Based on clinical features and a radiomic signature extracted from MRI, this study aims to establish a model for distinguishing NF-PitNETs and PPTs. Preoperative MRI of 110 patients with NF-PitNETs and 55 patients with PPTs were retrospectively obtained. Patients were randomly assigned to the training (n = 110) and validation (n = 55) cohorts in a 2:1 ratio. The least absolute shrinkage and selection operator (LASSO) algorithm was applied to develop a radiomic signature. Afterwards, an individualized predictive model (nomogram) incorporating the radiomic signature and predictive clinical features was developed. The nomogram's performance was evaluated by calibration and decision curve analyses. Five features derived from contrast-enhanced images were selected using the LASSO algorithm, from which the calculation formula of the radiomic score was obtained. The constructed nomogram incorporating the radiomic signature and predictive clinical features showed good calibration and outperformed the clinical features alone for predicting NF-PitNETs and PPTs (area under the curve [AUC]: 0.937 vs. 0.595 in the training cohort [p < 0.001]; 0.907 vs. 0.782 in the validation cohort [p = 0.03]). The decision curve shows that the individualized predictive model adds more benefit than the clinical features when the threshold probability ranges from 10 to 100%. The individualized predictive model provides a novel noninvasive imaging biomarker and could be conveniently used to distinguish NF-PitNETs and PPTs, providing a significant reference for preoperative preparation and intraoperative decision-making.
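The radiomic score derived from a fitted LASSO model is a linear combination of the selected features. A minimal sketch of the score formula follows; the intercept and coefficients below are invented placeholders, not the study's fitted weights:

```python
import numpy as np

# Hypothetical LASSO output: intercept plus nonzero coefficients for the
# five selected contrast-enhanced features (values are illustrative only).
intercept = -1.2
coefs = np.array([0.8, -0.3, 0.5, 0.2, -0.6])

def radiomic_score(features):
    """Radiomic score = intercept + sum(coef_i * feature_i)."""
    return float(intercept + coefs @ np.asarray(features, dtype=float))

score = radiomic_score([1.0, 0.5, 2.0, 0.0, 1.5])
```

In the nomogram, this score enters as one axis alongside the predictive clinical features, and the summed points map to a predicted probability of PPT versus NF-PitNET.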

Citations: 0
Cross-Institutional Evaluation of Large Language Models for Radiology Diagnosis Extraction: A Prompt-Engineering Perspective.
Pub Date : 2026-02-01 Epub Date: 2025-05-08 DOI: 10.1007/s10278-025-01523-5
Mana Moassefi, Sina Houshmand, Shahriar Faghani, Peter D Chang, Shawn H Sun, Bardia Khosravi, Aakash G Triphati, Ghulam Rasool, Neil K Bhatia, Les Folio, Katherine P Andriole, Judy W Gichoya, Bradley J Erickson

The rapid evolution of large language models (LLMs) offers promising opportunities for radiology report annotation, aiding in determining the presence of specific findings. This study evaluates the effectiveness of a human-optimized prompt in labeling radiology reports across multiple institutions using LLMs. Six distinct institutions collected 500 radiology reports: 100 in each of 5 categories. A standardized Python script was distributed to participating sites, allowing the use of one common locally executed LLM with a standard human-optimized prompt. The script executed the LLM's analysis for each report and compared predictions to reference labels provided by local investigators. Model performance was measured as accuracy, and results were aggregated centrally. The human-optimized prompt demonstrated high consistency across sites and pathologies. Preliminary analysis indicates significant agreement between the LLM's outputs and the investigator-provided reference labels across multiple institutions. At one site, eight LLMs were systematically compared, with Llama 3.1 70b achieving the highest performance in accurately identifying the specified findings. Comparable performance with Llama 3.1 70b was observed at two additional centers, demonstrating the model's robust adaptability to variations in report structures and institutional practices. Our findings illustrate the potential of optimized prompt engineering in leveraging LLMs for cross-institutional radiology report labeling. This approach is straightforward while maintaining high accuracy and adaptability. Future work will explore model robustness to diverse report structures and further refine prompts to improve generalizability.
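The distributed script's core loop — label each report, compare to the investigator reference, aggregate accuracy — can be sketched as below. The keyword stub, finding name, and example reports are all invented; the stub merely stands in for the locally executed LLM call:

```python
def label_reports(reports, reference, classify):
    """Apply a labeling function to each report and return accuracy
    against investigator-provided reference labels."""
    correct = sum(classify(r) == ref for r, ref in zip(reports, reference))
    return correct / len(reports)

# Toy stand-in for the LLM: flag a report positive if the finding keyword
# appears (illustrative only; not the study's prompt or model).
finding = "pneumothorax"
stub = lambda text: finding in text.lower()

reports = ["No pneumothorax identified.",
           "Large right pneumothorax.",
           "Clear lungs."]
reference = [False, True, False]
acc = label_reports(reports, reference, stub)
```

Note that the stub mislabels the first report because of negation ("No pneumothorax"), which is exactly the kind of case where a well-prompted LLM should outperform simple string matching.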

Citations: 0
Modified Dice Coefficients for Evaluation of Tumor Segmentation from PET Images: A Proof-of-Concept Study.
Pub Date : 2026-02-01 Epub Date: 2025-05-08 DOI: 10.1007/s10278-025-01535-1
Oona Rainio, Riku Klén

The Sørensen-Dice similarity coefficient (DSC) is the most common evaluation metric used for image segmentation, but it is not always ideal. Namely, the DSC values only depend on the number of misplaced elements, not on their location with respect to the correct segments. Because of this, the DSC is ill-suited for tasks in which the correct location of an object's borders is difficult to define objectively, as is the case in tumor segmentation in positron emission tomography (PET) images. To avoid this issue, we introduce two different modifications of the DSC, one with weights and one with an additional loss term, which also evaluate the distance between the real and the predicted segments. We computed the values of DSC and our new coefficient from 191 predicted tumor segmentation masks created by using PET images of 89 head and neck squamous cell carcinoma patients. We compared the values of all three coefficients with the scores given to these masks by human evaluators. According to our results, the weighted modification of DSC had a higher correlation with the scores given by the human evaluators than the original DSC, and it also produced significantly less variation within the two highest score classes (p-value ≤ 0.018). The new weighted coefficient introduced here has much potential in the evaluation of segmentation results from medical imaging.
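The location-blindness of the standard DSC, and the idea of a distance-aware variant, can be sketched as follows. The weighting scheme here (each false positive costs 1 plus its Euclidean distance to the nearest reference pixel) is an illustrative assumption, not the paper's exact formulation:

```python
import numpy as np

def dsc(pred, ref):
    """Standard Sørensen-Dice coefficient for binary masks: 2*TP/(|A|+|B|)."""
    pred, ref = np.asarray(pred, bool), np.asarray(ref, bool)
    inter = np.logical_and(pred, ref).sum()
    return 2.0 * inter / (pred.sum() + ref.sum())

def distance_weighted_dsc(pred, ref):
    """Illustrative distance-aware variant: each false positive contributes
    1 + d, where d is its distance to the nearest reference pixel, so
    distant mistakes are penalized more than adjacent ones."""
    pred, ref = np.asarray(pred, bool), np.asarray(ref, bool)
    inter = np.logical_and(pred, ref).sum()
    fn = np.logical_and(~pred, ref).sum()
    ref_pts = np.argwhere(ref)
    fp_pts = np.argwhere(np.logical_and(pred, ~ref))
    weighted_fp = sum(
        1.0 + np.sqrt(((ref_pts - p) ** 2).sum(axis=1)).min() for p in fp_pts
    )
    return 2.0 * inter / (2.0 * inter + fn + weighted_fp)

ref = np.zeros((10, 10), bool)
ref[2:5, 2:5] = True
near = ref.copy()
near[2, 5] = True   # false positive adjacent to the reference mask
far = ref.copy()
far[9, 9] = True    # false positive far from the reference mask
```

Standard DSC scores `near` and `far` identically, while the weighted variant scores the adjacent mistake higher, matching the motivation above.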

Citations: 0
A Deep-Learning Approach for Vocal Fold Pose Estimation in Videoendoscopy.
Pub Date : 2026-02-01 Epub Date: 2025-02-12 DOI: 10.1007/s10278-025-01431-8
Francesca Pia Villani, Maria Chiara Fiorentino, Lorenzo Federici, Cesare Piazza, Emanuele Frontoni, Alberto Paderno, Sara Moccia

Accurate vocal fold (VF) pose estimation is crucial for diagnosing larynx diseases that can eventually lead to VF paralysis. The videoendoscopic examination is used to assess VF motility, usually estimating the change in the anterior glottic angle (AGA). This is a subjective and time-consuming procedure requiring extensive expertise. This research proposes a deep learning framework to estimate VF pose from laryngoscopy frames acquired in the actual clinical practice. The framework performs heatmap regression relying on three anatomically relevant keypoints as a prior for AGA computation, which is estimated from the coordinates of the predicted points. The assessment of the proposed framework is performed using a newly collected dataset of 471 laryngoscopy frames from 124 patients, 28 of whom with cancer. The framework was tested in various configurations and compared with other state-of-the-art approaches (direct keypoints regression and glottal segmentation) for both pose estimation, and AGA evaluation. The proposed framework obtained the lowest root mean square error (RMSE) computed on all the keypoints (5.09, 6.56, and 6.40 pixels, respectively) among all the models tested for VF pose estimation. Also for the AGA evaluation, heatmap regression reached the lowest mean average error (MAE) ( 5 . 87 ). Results show that relying on keypoints heatmap regression allows to perform VF pose estimation with a small error, overcoming drawbacks of state-of-the-art algorithms, especially in challenging images such as pathologic subjects, presence of noise, and occlusion.

准确的声带(VF)姿态估计是诊断喉疾病的关键,最终可能导致VF瘫痪。视频内窥镜检查用于评估VF运动,通常估计声门前角(AGA)的变化。这是一个主观且耗时的过程,需要广泛的专业知识。本研究提出了一个深度学习框架,从实际临床实践中获得的喉镜框架中估计VF姿势。该框架执行热图回归依赖于三个解剖学上相关的关键点作为AGA计算的先验,这是从预测点的坐标估计的。对拟议框架的评估是使用新收集的来自124名患者的471个喉镜框架数据集进行的,其中28名患者患有癌症。该框架在各种配置下进行了测试,并与其他最先进的方法(直接关键点回归和声门分割)进行了比较,用于姿态估计和AGA评估。在所有模型中,该框架在所有关键点(分别为5.09、6.56和6.40像素)上计算的均方根误差(RMSE)最小。同样对于AGA评价,热图回归达到最低的平均误差(MAE)(5。87°)。结果表明,依靠关键点热图回归可以以较小的误差进行VF姿态估计,克服了最先进算法的缺点,特别是在具有挑战性的图像中,例如病理受试者,存在噪声和遮挡。
{"title":"A Deep-Learning Approach for Vocal Fold Pose Estimation in Videoendoscopy.","authors":"Francesca Pia Villani, Maria Chiara Fiorentino, Lorenzo Federici, Cesare Piazza, Emanuele Frontoni, Alberto Paderno, Sara Moccia","doi":"10.1007/s10278-025-01431-8","DOIUrl":"10.1007/s10278-025-01431-8","url":null,"abstract":"<p><p>Accurate vocal fold (VF) pose estimation is crucial for diagnosing larynx diseases that can eventually lead to VF paralysis. The videoendoscopic examination is used to assess VF motility, usually estimating the change in the anterior glottic angle (AGA). This is a subjective and time-consuming procedure requiring extensive expertise. This research proposes a deep learning framework to estimate VF pose from laryngoscopy frames acquired in the actual clinical practice. The framework performs heatmap regression relying on three anatomically relevant keypoints as a prior for AGA computation, which is estimated from the coordinates of the predicted points. The assessment of the proposed framework is performed using a newly collected dataset of 471 laryngoscopy frames from 124 patients, 28 of whom with cancer. The framework was tested in various configurations and compared with other state-of-the-art approaches (direct keypoints regression and glottal segmentation) for both pose estimation, and AGA evaluation. The proposed framework obtained the lowest root mean square error (RMSE) computed on all the keypoints (5.09, 6.56, and 6.40 pixels, respectively) among all the models tested for VF pose estimation. Also for the AGA evaluation, heatmap regression reached the lowest mean average error (MAE) ( <math><mrow><mn>5</mn> <mo>.</mo> <msup><mn>87</mn> <mo>∘</mo></msup> </mrow> </math> ). 
Results show that relying on keypoints heatmap regression allows to perform VF pose estimation with a small error, overcoming drawbacks of state-of-the-art algorithms, especially in challenging images such as pathologic subjects, presence of noise, and occlusion.</p>","PeriodicalId":516858,"journal":{"name":"Journal of imaging informatics in medicine","volume":" ","pages":"842-852"},"PeriodicalIF":0.0,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12920861/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143412210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
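Once the three keypoints are predicted, the AGA reduces to an angle between two vectors. A minimal sketch, assuming the keypoints are the anterior commissure plus one posterior point on each vocal fold (this role assignment is an assumption; the abstract does not name the keypoints):

```python
import numpy as np

def anterior_glottic_angle(anterior, left_post, right_post):
    """Angle (degrees) at the anterior commissure between the two
    vocal-fold edges, each approximated by the line from the anterior
    point to a posterior point."""
    a, l, r = (np.asarray(p, dtype=float) for p in (anterior, left_post, right_post))
    u, v = l - a, r - a
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))  # clip guards rounding
```

For example, with the anterior commissure at the origin and symmetric posterior points at (1, 1) and (1, -1), the function returns 90°.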
Advancing Visual Perception Through VCANet-Crossover Osprey Algorithm: Integrating Visual Technologies.
Pub Date : 2026-02-01 Epub Date: 2025-04-03 DOI: 10.1007/s10278-025-01467-w pp. 669-698
Yuwen Ning, Jiaxin Li, Shuyi Sun

Diabetic retinopathy (DR) is a significant vision-threatening condition, necessitating accurate and efficient automated screening methods. Traditional deep learning (DL) models struggle to detect subtle lesions and suffer from high computational complexity. Existing models primarily mimic the primary visual cortex (V1) of the human visual system, neglecting other higher-order processing regions. To overcome these limitations, this research introduces the vision core-adapted network-based crossover osprey algorithm (VCANet-COP) for subtle lesion recognition with better computational efficiency. The model integrates sparse autoencoders (SAEs) to extract vascular structures and lesion-specific features at the pixel level for improved abnormality detection. The front-end network in the VCANet emulates the V1, V2, V4, and inferotemporal (IT) regions to capture subtle lesions effectively and improve lesion detection accuracy. Additionally, the COP algorithm, which couples the osprey optimization algorithm (OOA) with a crossover strategy, optimizes hyperparameters and network configurations for better computational efficiency, faster convergence, and enhanced lesion recognition. Experimental assessment of the VCANet-COP model on five DR datasets, namely Diabetic_Retinopathy_Data (DR-Data), the Structured Analysis of the Retina (STARE) dataset, the Indian Diabetic Retinopathy Image Dataset (IDRiD), the Digital Retinal Images for Vessel Extraction (DRIVE) dataset, and the Retinal fundus multi-disease image dataset (RFMID), demonstrates superior performance over the baselines EDLDR, FFU_Net, LSTM_MFORG, fundus-DeepNet, and CNN_SVD, with average outcomes of 98.14% accuracy, 97.9% sensitivity, 98.08% specificity, 98.4% precision, 98.1% F1-score, 96.2% kappa coefficient, 2.0% false positive rate (FPR), 2.1% false negative rate (FNR), and 1.5-s execution time. By addressing these critical limitations, VCANet-COP provides a scalable and robust solution for real-world DR screening and clinical decision support.

Citations: 0
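The abstract does not detail how the crossover strategy is combined with osprey optimization. As a generic illustration only, a uniform-crossover operator over hyperparameter vectors, the kind of step such a hybrid might use between OOA updates, could look like this (function name and scheme are hypothetical, not the paper's):

```python
import random

def uniform_crossover(parent_a, parent_b, rate=0.5, rng=None):
    """Uniform crossover over two hyperparameter vectors: each position is
    inherited from parent_a with probability `rate`, else from parent_b."""
    rng = rng or random.Random(0)  # seeded for reproducibility
    return [a if rng.random() < rate else b
            for a, b in zip(parent_a, parent_b)]
```

With, say, vectors of (learning rate, batch size, depth), the child mixes settings from both parents, which is how crossover injects diversity that a purely trajectory-based optimizer lacks.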
Unified Framework for Enhancement of Low-Quality Fundus Images.
Pub Date : 2026-02-01 Epub Date: 2025-04-29 DOI: 10.1007/s10278-025-01509-3 pp. 699-713
Lihua Ding, Chengyi Zhang, Xingzheng Lyu, Deji Cheng, Shuchang Xu

Compared to desktop fundus cameras, handheld ones offer portability and affordability, although they often produce lower-quality images. This paper primarily addresses the reduced image quality commonly associated with images captured by handheld fundus cameras. We first collected 538 fundus images obtained from handheld devices to form a dataset called Mule. A unified framework consisting of three main modules is then proposed to enhance the quality of fundus images. The Light Balance Module is employed first to suppress overexposure and underexposure. This is followed by the Super Resolution Module to enhance vascular details. Finally, the Vessel Enhancement Module is applied to improve image contrast. A dedicated preservation strategy is additionally applied to retain macular features in the final fundus image. Objective evaluations demonstrate that the proposed framework yields the most promising results. Further experiments also suggest that it improves accuracy in downstream tasks such as vessel segmentation, optic disc/optic cup detection, macula detection, and fundus image quality assessment. Our code is available at: https://github.com/Alen880/UFELQ.

Citations: 0
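The abstract does not specify how the Light Balance Module suppresses over- and underexposure. One common, purely illustrative approach is adaptive gamma correction that steers the image's mean brightness toward a mid-gray target; this sketch is an assumption, not the paper's module:

```python
import numpy as np

def light_balance(img, target_mean=0.5):
    """Illustrative exposure correction (not the paper's module): solve for
    the gamma that maps the image's mean intensity to `target_mean`, so
    dark images are brightened and bright images are darkened.
    Expects intensities in [0, 1]."""
    img = np.clip(np.asarray(img, dtype=float), 1e-6, 1.0 - 1e-6)
    gamma = np.log(target_mean) / np.log(img.mean())  # mean ** gamma ≈ target
    return img ** gamma
```

Because gamma is derived from the global mean, the same function handles both failure modes: an underexposed frame gets gamma < 1 (brightening), an overexposed one gets gamma > 1 (darkening).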
Evaluation of Reporting Methods for Assessment and Surgical Planning of Perianal Fistulas.
Pub Date : 2026-02-01 Epub Date: 2025-05-13 DOI: 10.1007/s10278-025-01524-4 pp. 20-33
Sarra Kharbech, Nabil Sherif Mahmood, Ma'mon Qasem, Julien Abinahed, Amal Alobadli, Mohamed Abunada, Omar Aboumarzouk, Abdulla Al Ansari, Shidin Balakrishnan, Nikhil Navkar, Adham Darweesh

Perianal fistula is a complex condition in which surgeons operate based on the mental map they build from the information in the radiology report. If not properly treated, a fistula can recur. To reduce the chance of recurrence, a patient-specific, visual, and accurate depiction of the internal tracts in relation to the pelvic floor is required. A three-dimensional (3D) parametric model generation software was previously developed and evaluated successfully with radiologists. In this paper, the software's output is evaluated with two colorectal surgeons on 10 fistula cases. The paper compares three different reporting modes: (1) 3D models only, (2) conventional radiology report and picture archiving and communication system (PACS) magnetic resonance (MR) images, and (3) 3D models + standardized radiology report. The percentage of agreement between surgeons across cases and cognitive load are the primary evaluation metrics. Mode 3 outperformed both modes 1 and 2, meaning that surgeons prefer to see a 3D model along with a standardized report when planning a case's surgical intervention. Mode 1 outperformed mode 2, which also shows the surgeons' preference for inspecting a 3D model rather than reviewing cases the conventional way. Surgeons' agreement across cases was 85% in mode 3, versus 18% in mode 1 and 5% in mode 2, showing that information was conveyed more consistently across surgeons in mode 3. NASA TLX tests show that surgeons had the least cognitive load while working with mode 3, followed by mode 1 and then mode 2. Overall, the findings indicate that 3D models, even without radiologists' written input, outperform the current standard practice of delivering unstructured radiology reports alongside raw PACS images.

Citations: 0
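The study's primary metric, the percentage of agreement between the two surgeons across cases, can be sketched in a few lines (how the study handled ties or multi-part ratings is not stated in the abstract, so this is the simplest exact-match form):

```python
def percent_agreement(ratings_a, ratings_b):
    """Percentage of cases on which two raters give the same rating
    (simple exact-match agreement, no chance correction)."""
    assert len(ratings_a) == len(ratings_b) and ratings_a
    same = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * same / len(ratings_a)
```

Note that, unlike Cohen's kappa, this figure is not corrected for agreement expected by chance, which matters when comparing modes with very different agreement levels such as 85% versus 5%.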