Pub Date: 2025-09-01 | Epub Date: 2025-05-28 | DOI: 10.1117/1.JMI.12.5.051805
TFKT V2: task-focused knowledge transfer from natural images for computed tomography perceptual image quality assessment.
Kazi Ramisa Rifa, Md Atik Ahamed, Jie Zhang, Abdullah Imran
Purpose: The accurate assessment of computed tomography (CT) image quality is crucial for ensuring diagnostic reliability while minimizing radiation dose. Radiologists' evaluations are time-consuming and labor-intensive. Existing automated approaches often require large CT datasets with predefined image quality assessment (IQA) scores, which frequently do not align well with clinical evaluations. We aim to develop a reference-free, automated method for CT IQA that closely reflects radiologists' evaluations, reducing the dependency on large annotated datasets.
Approach: We propose Task-Focused Knowledge Transfer (TFKT), a deep learning-based IQA method leveraging knowledge transfer from task-similar natural image datasets. TFKT incorporates a hybrid convolutional neural network-transformer model, enabling accurate quality predictions by learning from natural image distortions with human-annotated mean opinion scores. The model is pre-trained on natural image datasets and fine-tuned on low-dose computed tomography perceptual image quality assessment data to ensure task-specific adaptability.
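As a concrete illustration of the two-stage recipe described above, the sketch below pairs a small convolutional feature extractor with a transformer encoder and trains it first on natural-image mean opinion scores, then on LDCT quality labels. The class and loader names (HybridIQANet, pretrain_loader, finetune_loader) and all hyperparameters are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class HybridIQANet(nn.Module):
    """Hybrid CNN-transformer quality regressor (illustrative architecture)."""
    def __init__(self, dim=64, nhead=4, depth=2):
        super().__init__()
        self.cnn = nn.Sequential(                        # local distortion features
            nn.Conv2d(1, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=nhead, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)  # global context
        self.head = nn.Linear(dim, 1)                    # scalar quality score

    def forward(self, x):
        f = self.cnn(x)                                  # (B, C, H, W)
        tokens = f.flatten(2).transpose(1, 2)            # (B, H*W, C) token sequence
        return self.head(self.transformer(tokens).mean(dim=1))

def train_stage(model, loader, epochs, lr):
    """One training stage: regress predicted scores onto quality labels."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for img, score in loader:
            opt.zero_grad()
            loss_fn(model(img).squeeze(1), score).backward()
            opt.step()

model = HybridIQANet()
# Stage 1: natural images with human MOS; stage 2: LDCT perceptual IQA labels.
# train_stage(model, pretrain_loader, epochs=10, lr=1e-4)
# train_stage(model, finetune_loader, epochs=5, lr=1e-5)
```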
Results: Extensive evaluations demonstrate that the proposed TFKT method effectively predicts IQA scores aligned with radiologists' assessments on in-domain datasets and generalizes well to out-of-domain clinical pediatric CT exams. The model achieves robust performance without requiring high-dose reference images. Our model is capable of assessing the quality of ∼30 CT image slices in a second.
Conclusions: The proposed TFKT approach provides a scalable, accurate, and reference-free solution for CT IQA. The model bridges the gap between traditional and deep learning-based IQA, offering clinically relevant and computationally efficient assessments applicable to real-world clinical settings.
{"title":"TFKT V2: task-focused knowledge transfer from natural images for computed tomography perceptual image quality assessment.","authors":"Kazi Ramisa Rifa, Md Atik Ahamed, Jie Zhang, Abdullah Imran","doi":"10.1117/1.JMI.12.5.051805","DOIUrl":"10.1117/1.JMI.12.5.051805","url":null,"abstract":"<p><strong>Purpose: </strong>The accurate assessment of computed tomography (CT) image quality is crucial for ensuring diagnostic reliability while minimizing radiation dose. Radiologists' evaluations are time-consuming and labor-intensive. Existing automated approaches often require large CT datasets with predefined image quality assessment (IQA) scores, which often do not align well with clinical evaluations. We aim to develop a reference-free, automated method for CT IQA that closely reflects radiologists' evaluations, reducing the dependency on large annotated datasets.</p><p><strong>Approach: </strong>We propose Task-Focused Knowledge Transfer (TFKT), a deep learning-based IQA method leveraging knowledge transfer from task-similar natural image datasets. TFKT incorporates a hybrid convolutional neural network-transformer model, enabling accurate quality predictions by learning from natural image distortions with human-annotated mean opinion scores. The model is pre-trained on natural image datasets and fine-tuned on low-dose computed tomography perceptual image quality assessment data to ensure task-specific adaptability.</p><p><strong>Results: </strong>Extensive evaluations demonstrate that the proposed TFKT method effectively predicts IQA scores aligned with radiologists' assessments on in-domain datasets and generalizes well to out-of-domain clinical pediatric CT exams. The model achieves robust performance without requiring high-dose reference images. Our model is capable of assessing the quality of <math><mrow><mo>∼</mo> <mn>30</mn></mrow> </math> CT image slices in a second.</p><p><strong>Conclusions: </strong>The proposed TFKT approach provides a scalable, accurate, and reference-free solution for CT IQA. The model bridges the gap between traditional and deep learning-based IQA, offering clinically relevant and computationally efficient assessments applicable to real-world clinical settings.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 5","pages":"051805"},"PeriodicalIF":1.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12116730/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144182165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-01 | Epub Date: 2025-09-17 | DOI: 10.1117/1.JMI.12.5.054001
Full-head segmentation of MRI with abnormal brain anatomy: model and data release.
Andrew M Birnbaum, Adam Buchwald, Peter Turkeltaub, Adam Jacks, George Carr, Shreya Kannan, Yu Huang, Abhisheck Datta, Lucas C Parra, Lukas A Hirsch
Purpose: Our goal was to develop a deep network for whole-head segmentation, including clinical magnetic resonance imaging (MRI) with abnormal anatomy, and compile the first public benchmark dataset for this purpose. We collected 98 MRIs with volumetric segmentation labels for a diverse set of human subjects, including normal and abnormal anatomy in clinical cases of stroke and disorders of consciousness.
Approach: Training labels were generated by manually correcting initial automated segmentations for skin/scalp, skull, cerebrospinal fluid, gray matter, white matter, air cavity, and extracephalic air. We developed a "MultiAxial" network consisting of three 2D U-Nets that operate independently in the sagittal, axial, and coronal planes and are then combined to produce a single 3D segmentation.
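A minimal sketch of the fusion step implied by this design is given below: the same volume is sliced along each of the three axes, a 2D network produces per-slice class logits, and the three probability volumes are averaged before a per-voxel argmax. The seven output classes mirror the seven labels listed above; the stand-in networks and random volume are placeholders, not the released model.

```python
import torch

def segment_along_axis(net, volume, axis):
    """Apply a 2D net to every slice of `volume` (D, H, W) along `axis`;
    return class logits of shape (C, D, H, W)."""
    vol = volume.movedim(axis, 0)                                   # slice axis first
    logits = torch.stack([net(s[None, None])[0] for s in vol])      # (S, C, h, w)
    return logits.permute(1, 0, 2, 3).movedim(1, axis + 1)          # back to (C, D, H, W)

def multiaxial_fuse(nets, volume):
    """Average per-plane softmax probabilities, then take the per-voxel argmax."""
    probs = [segment_along_axis(net, volume, ax).softmax(dim=0)
             for ax, net in enumerate(nets)]                        # sagittal/axial/coronal
    return torch.stack(probs).mean(dim=0).argmax(dim=0)             # (D, H, W) label map

# Stand-in 2D networks with 7 output classes (the 7 tissue labels listed above).
nets = [torch.nn.Conv2d(1, 7, 3, padding=1).eval() for _ in range(3)]
with torch.no_grad():
    labels = multiaxial_fuse(nets, torch.randn(32, 32, 32))
```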
Results: The MultiAxial network achieved a test-set Dice score of 0.88 ± 0.04 (median ± interquartile range) on whole-head segmentation, including gray and white matter, compared with 0.86 ± 0.04 for Multipriors and 0.79 ± 0.10 for SPM12, two standard tools currently available for this task. The MultiAxial network gains in robustness by avoiding the need for coregistration with an atlas. It performed well in regions with abnormal anatomy and on images that have been de-identified. It enables more accurate and robust current flow modeling when incorporated into ROAST, a widely used modeling toolbox for transcranial electric stimulation.
Conclusions: We are releasing a new state-of-the-art tool for whole-head MRI segmentation in abnormal anatomy, along with the largest volume of labeled clinical head MRIs, including labels for nonbrain structures. Together, the model and data may serve as a benchmark for future efforts.
{"title":"Full-head segmentation of MRI with abnormal brain anatomy: model and data release.","authors":"Andrew M Birnbaum, Adam Buchwald, Peter Turkeltaub, Adam Jacks, George Carr, Shreya Kannan, Yu Huang, Abhisheck Datta, Lucas C Parra, Lukas A Hirsch","doi":"10.1117/1.JMI.12.5.054001","DOIUrl":"10.1117/1.JMI.12.5.054001","url":null,"abstract":"<p><strong>Purpose: </strong>Our goal was to develop a deep network for whole-head segmentation, including clinical magnetic resonance imaging (MRI) with abnormal anatomy, and compile the first public benchmark dataset for this purpose. We collected 98 MRIs with volumetric segmentation labels for a diverse set of human subjects, including normal and abnormal anatomy in clinical cases of stroke and disorders of consciousness.</p><p><strong>Approach: </strong>Training labels were generated by manually correcting initial automated segmentations for skin/scalp, skull, cerebro-spinal fluid, gray matter, white matter, air cavity, and extracephalic air. We developed a \"MultiAxial\" network consisting of three 2D U-Net that operate independently in sagittal, axial, and coronal planes, which are then combined to produce a single 3D segmentation.</p><p><strong>Results: </strong>The MultiAxial network achieved a test-set Dice scores of <math><mrow><mn>0.88</mn> <mo>±</mo> <mn>0.04</mn></mrow> </math> (median ± interquartile range) on whole-head segmentation, including gray and white matter. This was compared with <math><mrow><mn>0.86</mn> <mo>±</mo> <mn>0.04</mn></mrow> </math> for Multipriors and <math><mrow><mn>0.79</mn> <mo>±</mo> <mn>0.10</mn></mrow> </math> for SPM12, two standard tools currently available for this task. The MultiAxial network gains in robustness by avoiding the need for coregistration with an atlas. It performed well in regions with abnormal anatomy and on images that have been de-identified. It enables more accurate and robust current flow modeling when incorporated into ROAST, a widely used modeling toolbox for transcranial electric stimulation.</p><p><strong>Conclusions: </strong>We are releasing a new state-of-the-art tool for whole-head MRI segmentation in abnormal anatomy, along with the largest volume of labeled clinical head MRIs, including labels for nonbrain structures. Together, the model and data may serve as a benchmark for future efforts.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 5","pages":"054001"},"PeriodicalIF":1.7,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12442731/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145087827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-01 | Epub Date: 2025-10-18 | DOI: 10.1117/1.JMI.12.5.053502
Joint CT reconstruction of anatomy and implants using a mixed prior model.
Xiao Jiang, Grace J Gang, J Webster Stayman
Purpose: Medical implants, often made of dense materials, pose significant challenges to accurate computed tomography (CT) reconstruction, especially near implants due to beam hardening and partial-volume artifacts. Moreover, diagnostics involving implants often require separate visualization for implants and anatomy. In this work, we propose an approach for joint estimation of anatomy and implants as separate volumes using a mixed prior model.
Approach: We leverage a learning-based prior for anatomy and a sparsity prior for implants to decouple the two volumes. In addition, a hybrid mono-polyenergetic forward model is employed to accommodate the spectral effects of implants, and a multiresolution object model is used to achieve high-resolution implant reconstruction. The reconstruction process alternates between diffusion posterior sampling for anatomy updates and classic optimization for implants and spectral coefficients.
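The alternating structure described above can be sketched as follows, with a learned denoiser standing in for the diffusion posterior sampling step on the anatomy and an L1 proximal (soft-threshold) step enforcing sparsity on the implant. The forward projector A, its adjoint At, and the denoiser are assumed placeholders; the toy demo uses identity operators just to make the loop runnable.

```python
import numpy as np

def soft_threshold(x, lam):
    """L1 proximal operator enforcing implant sparsity."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def joint_reconstruct(y, A, At, denoiser, n_iter=50, step=1e-3, lam=1e-2):
    """Alternate anatomy updates (learned prior) and implant updates (sparsity prior)."""
    anatomy = np.zeros_like(At(y))
    implant = np.zeros_like(anatomy)
    for _ in range(n_iter):
        resid = A(anatomy + implant) - y                 # shared data-fit term
        anatomy = denoiser(anatomy - step * At(resid))   # stand-in for diffusion sampling
        resid = A(anatomy + implant) - y
        implant = soft_threshold(implant - step * At(resid), lam)
    return anatomy, implant

# Toy demo: identity forward model and prior, just to exercise the loop.
A = At = denoiser = lambda x: x
anatomy, implant = joint_reconstruct(np.ones((16, 16)), A, At, denoiser)
```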
Results: Evaluations were performed on emulated cardiac imaging with stent and spine imaging with pedicle screws. The structures of the cardiac stent with 0.25 mm wires were clearly visualized in the implant images, whereas the blooming artifacts around the stent were effectively suppressed in the anatomical reconstruction. For pedicle screws, the proposed algorithm mitigated streaking and beam-hardening artifacts in the anatomy volume, demonstrating significant improvements in SSIM and PSNR compared with frequency-splitting metal artifact reduction and model-based reconstruction on slices containing implants.
Conclusion: The proposed mixed prior model coupled with a hybrid spectral and multiresolution model can help to separate spatially and spectrally distinct objects that differ from anatomical features in single-energy CT, improving both image quality and separate visualization of implants and anatomy.
{"title":"Joint CT reconstruction of anatomy and implants using a mixed prior model.","authors":"Xiao Jiang, Grace J Gang, J Webster Stayman","doi":"10.1117/1.JMI.12.5.053502","DOIUrl":"10.1117/1.JMI.12.5.053502","url":null,"abstract":"<p><strong>Purpose: </strong>Medical implants, often made of dense materials, pose significant challenges to accurate computed tomography (CT) reconstruction, especially near implants due to beam hardening and partial-volume artifacts. Moreover, diagnostics involving implants often require separate visualization for implants and anatomy. In this work, we propose a approach for joint estimation of anatomy and implants as separate volumes using a mixed prior model.</p><p><strong>Approach: </strong>We leverage a learning-based prior for anatomy and a sparsity prior for implants to decouple the two volumes. In addition, a hybrid mono-polyenergetic forward model is employed to accommodate the spectral effects of implants, and a multiresolution object model is used to achieve high-resolution implant reconstruction. The reconstruction process alternates between diffusion posterior sampling for anatomy updates and classic optimization for implants and spectral coefficients.</p><p><strong>Results: </strong>Evaluations were performed on emulated cardiac imaging with stent and spine imaging with pedicle screws. The structures of the cardiac stent with 0.25 mm wires were clearly visualized in the implant images, whereas the blooming artifacts around the stent were effectively suppressed in the anatomical reconstruction. For pedicle screws, the proposed algorithm mitigated streaking and beam-hardening artifacts in the anatomy volume, demonstrating significant improvements in SSIM and PSNR compared with frequency-splitting metal artifact reduction and model-based reconstruction on slices containing implants.</p><p><strong>Conclusion: </strong>The proposed mixed prior model coupled with a hybrid spectral and multiresolution model can help to separate spatially and spectrally distinct objects that differ from anatomical features in single-energy CT, improving both image quality and separate visualization of implants and anatomy.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 5","pages":"053502"},"PeriodicalIF":1.7,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12537543/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145349286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-01 | Epub Date: 2025-09-17 | DOI: 10.1117/1.JMI.12.5.054501
Segmentation variability and radiomics stability for predicting triple-negative breast cancer subtype using magnetic resonance imaging.
Isabella Cama, Alejandro Guzmán, Cristina Campi, Michele Piana, Karim Lekadir, Sara Garbarino, Oliver Díaz
Purpose: Many studies caution against using radiomic features that are sensitive to contouring variability in predictive models for disease stratification. Consequently, metrics such as the intraclass correlation coefficient (ICC) are recommended to guide feature selection based on stability. However, the direct impact of segmentation variability on the performance of predictive models remains underexplored. We examine how segmentation variability affects both feature stability and predictive performance in the radiomics-based classification of triple-negative breast cancer (TNBC) using breast magnetic resonance imaging.
Approach: We analyzed 244 images from the Duke dataset, introducing segmentation variability through controlled modifications of manual segmentations. For each segmentation mask, explainable radiomic features were selected using SHapley Additive exPlanations (SHAP) and used to train logistic regression models. Feature stability across segmentations was assessed via ICC, Pearson's correlation, and reliability scores quantifying the relationship between segmentation variability and feature robustness.
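For reference, feature stability of this kind is typically quantified with a two-way random-effects ICC(2,1) across segmentation variants; a minimal sketch using the standard Shrout-Fleiss formula is shown below, on synthetic data shaped like the study (244 lesions, several masks each). The data are illustrative, not the Duke features.

```python
import numpy as np

def icc_2_1(x):
    """Two-way random-effects ICC(2,1); x is (n_lesions, k_segmentations)."""
    n, k = x.shape
    grand = x.mean()
    ms_r = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)   # between-lesion
    ms_c = n * ((x.mean(axis=0) - grand) ** 2).sum() / (k - 1)   # between-segmentation
    ss_e = ((x - x.mean(1, keepdims=True) - x.mean(0, keepdims=True) + grand) ** 2).sum()
    ms_e = ss_e / ((n - 1) * (k - 1))
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

rng = np.random.default_rng(0)
lesion_effect = rng.normal(size=(244, 1))                        # true per-lesion value
feature = lesion_effect + 0.1 * rng.normal(size=(244, 4))        # 4 segmentation variants
print(icc_2_1(feature))                                          # near 1: stable feature
```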
Results: Model performances in predicting TNBC do not exhibit a significant difference across varying segmentations. The most explicative and predictive features exhibit decreasing ICC as segmentation accuracy decreases. However, their predictive power remains intact due to low ICC combined with high Pearson's correlation. No shared numerical relationship is found between feature stability and segmentation variability among the most predictive features.
Conclusions: Moderate segmentation variability has a limited impact on model performance. Although incorporating peritumoral information may reduce feature reproducibility, it does not compromise predictive utility. Notably, feature stability is not a strict prerequisite for predictive relevance, highlighting that exclusive reliance on ICC or stability metrics for feature selection may inadvertently discard informative features.
{"title":"Segmentation variability and radiomics stability for predicting triple-negative breast cancer subtype using magnetic resonance imaging.","authors":"Isabella Cama, Alejandro Guzmán, Cristina Campi, Michele Piana, Karim Lekadir, Sara Garbarino, Oliver Díaz","doi":"10.1117/1.JMI.12.5.054501","DOIUrl":"https://doi.org/10.1117/1.JMI.12.5.054501","url":null,"abstract":"<p><strong>Purpose: </strong>Many studies caution against using radiomic features that are sensitive to contouring variability in predictive models for disease stratification. Consequently, metrics such as the intraclass correlation coefficient (ICC) are recommended to guide feature selection based on stability. However, the direct impact of segmentation variability on the performance of predictive models remains underexplored. We examine how segmentation variability affects both feature stability and predictive performance in the radiomics-based classification of triple-negative breast cancer (TNBC) using breast magnetic resonance imaging.</p><p><strong>Approach: </strong>We analyzed 244 images from the Duke dataset, introducing segmentation variability through controlled modifications of manual segmentations. For each segmentation mask, explainable radiomic features were selected using Shapley Additive exPlanations and used to train logistic regression models. Feature stability across segmentations was assessed via ICC, Pearson's correlation, and reliability scores quantifying the relationship between segmentation variability and feature robustness.</p><p><strong>Results: </strong>Model performances in predicting TNBC do not exhibit a significant difference across varying segmentations. The most explicative and predictive features exhibit decreasing ICC as segmentation accuracy decreases. However, their predictive power remains intact due to low ICC combined with high Pearson's correlation. No shared numerical relationship is found between feature stability and segmentation variability among the most predictive features.</p><p><strong>Conclusions: </strong>Moderate segmentation variability has a limited impact on model performance. Although incorporating peritumoral information may reduce feature reproducibility, it does not compromise predictive utility. Notably, feature stability is not a strict prerequisite for predictive relevance, highlighting that exclusive reliance on ICC or stability metrics for feature selection may inadvertently discard informative features.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 5","pages":"054501"},"PeriodicalIF":1.7,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12443385/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145087856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-01 | Epub Date: 2025-09-24 | DOI: 10.1117/1.JMI.12.5.051808
Gamification for emergency radiology education and image perception: stab the diagnosis.
William F Auffermann, Nathan Barber, Ryan Stockard, Soham Banerjee
Purpose: Gamification can be a helpful adjunct to education and is increasingly used in radiology. We aim to determine if using a gamified framework to teach medical trainees about emergency radiology can improve perceptual and interpretive skills and facilitate learning.
Approach: We obtained approval from the Institutional Review Board, and participation was voluntary. Participants received training at the RadSimPE radiology workstation simulator and were shown three sets of computed tomography images related to emergency radiology diagnoses. Participants were asked to state their certainty that an abnormality was not present, localize it if present, and give their confidence in localization. Between case sets 1 and 2, the experimental group was provided with gamified emergency radiology training on the Stab the Diagnosis program, whereas the control group was not. Following the session, participants completed an eight-question survey to assess their thoughts about the training.
Results: A total of 36 medical trainees participated. Both the experimental and control groups improved in localization accuracy, but the experimental group's localization confidence was significantly greater than the control group's (p = 0.0364). Survey results were generally positive and were statistically significantly greater than the neutral value of 3, with p-values < 0.05 for all eight questions. For example, survey results indicated that participants felt the training was a helpful educational experience (p < 0.001) and that the session was more effective for learning than traditional educational techniques (p = 0.001).
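A one-sample test against the neutral Likert value of 3, as reported above, can be sketched as follows; the responses and the choice of scipy's t-test are illustrative stand-ins for the authors' exact analysis.

```python
import numpy as np
from scipy import stats

# Hypothetical Likert responses (1-5) for one survey question.
responses = np.array([4, 5, 4, 3, 5, 4, 4, 5, 3, 4])
t, p = stats.ttest_1samp(responses, popmean=3)       # test against the neutral value 3
print(f"mean = {responses.mean():.2f}, p = {p:.4f}")
```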
Conclusions: Gamification may be a valuable adjunct to conventional methods in radiology education and may improve trainee confidence.
{"title":"Gamification for emergency radiology education and image perception: stab the diagnosis.","authors":"William F Auffermann, Nathan Barber, Ryan Stockard, Soham Banerjee","doi":"10.1117/1.JMI.12.5.051808","DOIUrl":"https://doi.org/10.1117/1.JMI.12.5.051808","url":null,"abstract":"<p><strong>Purpose: </strong>Gamification can be a helpful adjunct to education and is increasingly used in radiology. We aim to determine if using a gamified framework to teach medical trainees about emergency radiology can improve perceptual and interpretive skills and facilitate learning.</p><p><strong>Approach: </strong>We obtained approval from the Institutional Review Board, and participation was voluntary. Participants received training at the RadSimPE radiology workstation simulator and were shown three sets of computed tomography images related to emergency radiology diagnoses. Participants were asked to state their certainty that an abnormality was not present, localize it if present, and give their confidence in localization. Between case sets 1 and 2, the experimental group was provided with gamified emergency radiology training on the Stab the Diagnosis program, whereas the control group was not. Following the session, participants completed an eight-question survey to assess their thoughts about the training.</p><p><strong>Results: </strong>A total of 36 medical trainees participated. Both the experimental group and control group improved in localization accuracy, but the experimental group's localization confidence was significantly greater than the control group ( <math><mrow><mi>p</mi> <mo>=</mo> <mn>0.0364</mn></mrow> </math> ). Survey results were generally positive and were statistically significantly greater than the neutral value of 3, with <math><mrow><mi>p</mi></mrow> </math> -values <math><mrow><mo><</mo> <mn>0.05</mn></mrow> </math> for all eight questions. For example, survey results indicated that participants felt the training was a helpful educational experience ( <math><mrow><mi>p</mi> <mo><</mo> <mn>0.001</mn></mrow> </math> ) and that the session was more effective for learning than traditional educational techniques ( <math><mrow><mi>p</mi> <mo>=</mo> <mn>0.001</mn></mrow> </math> ).</p><p><strong>Conclusions: </strong>Gamification may be a valuable adjunct to conventional methods in radiology education and may improve trainee confidence.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 5","pages":"051808"},"PeriodicalIF":1.7,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12458100/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145151419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-01 | Epub Date: 2025-04-11 | DOI: 10.1117/1.JMI.12.5.051803
Correlation of objective image quality metrics with radiologists' diagnostic confidence depends on the clinical task performed.
Michelle C Pryde, James Rioux, Adela Elena Cora, David Volders, Matthias H Schmidt, Mohammed Abdolell, Chris Bowen, Steven D Beyea
Purpose: Objective image quality metrics (IQMs) are widely used as outcome measures to assess acquisition and reconstruction strategies for diagnostic images. For nonpathological magnetic resonance (MR) images, these IQMs correlate to varying degrees with expert radiologists' confidence scores of overall perceived diagnostic image quality. However, it is unclear whether IQMs also correlate with task-specific diagnostic image quality or expert radiologists' confidence in performing a specific diagnostic task, which calls into question their use as surrogates for radiologist opinion.
Approach: 0.5 T MR images from 16 stroke patients and two healthy volunteers were retrospectively undersampled (R = 1 to 7×) and reconstructed via compressed sensing. Three neuroradiologists reported the presence/absence of acute ischemic stroke (AIS) and assigned a Fazekas score describing the extent of chronic ischemic lesion burden. Neuroradiologists ranked their confidence in performing each task using a 1 to 5 Likert scale. Confidence scores were correlated with noise quality measure, the visual information fidelity criterion, the feature similarity index, root mean square error, and structural similarity (SSIM) via nonlinear regression modeling.
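A hedged sketch of this pipeline step is shown below: an IQM (SSIM here) is computed per image and related to confidence scores through the four-parameter logistic commonly used in nonlinear IQA regression. All data and parameter values are synthetic placeholders.

```python
import numpy as np
from scipy.optimize import curve_fit
from skimage.metrics import structural_similarity

def logistic4(q, a, b, c, d):
    """Four-parameter logistic mapping an IQM value to a confidence score."""
    return a + (b - a) / (1.0 + np.exp(-(q - c) / d))

rng = np.random.default_rng(0)
ref = rng.random((128, 128))                          # stand-in fully sampled image
rec = ref + 0.05 * rng.standard_normal((128, 128))    # stand-in accelerated recon
ssim_val = structural_similarity(ref, rec, data_range=1.0)

# Fit the IQM-confidence relationship over a set of images (synthetic here).
ssim_scores = rng.uniform(0.5, 1.0, 30)
confidence = logistic4(ssim_scores, 1, 5, 0.75, 0.05) + 0.1 * rng.standard_normal(30)
params, _ = curve_fit(logistic4, ssim_scores, confidence, p0=[1, 5, 0.75, 0.05])
```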
Results: Although acceleration alters image quality, neuroradiologists remain able to report pathology. All of the IQMs tested correlated to some degree with diagnostic confidence for assessing chronic ischemic lesion burden, but none correlated with diagnostic confidence in diagnosing the presence/absence of AIS due to consistent radiologist performance regardless of image degradation.
Conclusions: Accelerated images were helpful for understanding the ability of IQMs to assess task-specific diagnostic image quality in the context of chronic ischemic lesion burden, although not in the case of AIS diagnosis. These findings suggest that commonly used IQMs, such as the SSIM index, do not necessarily indicate an image's utility when performing certain diagnostic tasks.
{"title":"Correlation of objective image quality metrics with radiologists' diagnostic confidence depends on the clinical task performed.","authors":"Michelle C Pryde, James Rioux, Adela Elena Cora, David Volders, Matthias H Schmidt, Mohammed Abdolell, Chris Bowen, Steven D Beyea","doi":"10.1117/1.JMI.12.5.051803","DOIUrl":"10.1117/1.JMI.12.5.051803","url":null,"abstract":"<p><strong>Purpose: </strong>Objective image quality metrics (IQMs) are widely used as outcome measures to assess acquisition and reconstruction strategies for diagnostic images. For nonpathological magnetic resonance (MR) images, these IQMs correlate to varying degrees with expert radiologists' confidence scores of overall perceived diagnostic image quality. However, it is unclear whether IQMs also correlate with task-specific diagnostic image quality or expert radiologists' confidence in performing a specific diagnostic task, which calls into question their use as surrogates for radiologist opinion.</p><p><strong>Approach: </strong>0.5 T MR images from 16 stroke patients and two healthy volunteers were retrospectively undersampled ( <math><mrow><mi>R</mi> <mo>=</mo> <mn>1</mn></mrow> </math> to <math><mrow><mn>7</mn> <mo>×</mo></mrow> </math> ) and reconstructed via compressed sensing. Three neuroradiologists reported the presence/absence of acute ischemic stroke (AIS) and assigned a Fazekas score describing the extent of chronic ischemic lesion burden. Neuroradiologists ranked their confidence in performing each task using a 1 to 5 Likert scale. Confidence scores were correlated with noise quality measure, the visual information fidelity criterion, the feature similarity index, root mean square error, and structural similarity (SSIM) via nonlinear regression modeling.</p><p><strong>Results: </strong>Although acceleration alters image quality, neuroradiologists remain able to report pathology. All of the IQMs tested correlated to some degree with diagnostic confidence for assessing chronic ischemic lesion burden, but none correlated with diagnostic confidence in diagnosing the presence/absence of AIS due to consistent radiologist performance regardless of image degradation.</p><p><strong>Conclusions: </strong>Accelerated images were helpful for understanding the ability of IQMs to assess task-specific diagnostic image quality in the context of chronic ischemic lesion burden, although not in the case of AIS diagnosis. These findings suggest that commonly used IQMs, such as the SSIM index, do not necessarily indicate an image's utility when performing certain diagnostic tasks.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 5","pages":"051803"},"PeriodicalIF":1.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11991859/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144018546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-01 | Epub Date: 2025-06-18 | DOI: 10.1117/1.JMI.12.5.051806
Contrast-enhanced spectral mammography demonstrates better inter-reader repeatability than digital mammography for screening breast cancer patients.
Alisa Mohebbi, Ali Abdi, Saeed Mohammadzadeh, Mohammad Mirza-Aghazadeh-Attari, Ali Abbasian Ardakani, Afshin Mohammadi
Purpose: Our purpose is to assess the inter-rater agreement between digital mammography (DM) and contrast-enhanced spectral mammography (CESM) in evaluating the Breast Imaging Reporting and Data System (BI-RADS) grading.
Approach: This retrospective study included 326 patients recruited between January 2019 and February 2021. The study protocol was pre-registered on the Open Science Framework platform. Two expert radiologists interpreted the CESM and DM findings. Pathological data were used as the gold standard for radiologically suspicious or malignant-appearing lesions, whereas follow-up was considered the gold standard for benign-appearing lesions and breasts without lesions.
Results: For intra-device agreement, both imaging modalities showed "almost perfect" agreement, indicating that different radiologists are expected to report the same BI-RADS score for the same image. Despite showing a similar interpretation, a paired t-test showed significantly higher agreement for CESM compared with DM (p < 0.001). Subgrouping based on the side or view did not show a considerable difference for either imaging modality. For inter-device agreement, "almost perfect" agreement was also achieved. However, for proven malignant lesions, an overall higher BI-RADS score was achieved for CESM, whereas for benign or normal breasts, a lower BI-RADS score was reported, indicating a more precise BI-RADS classification for CESM compared with DM.
Conclusions: Our findings demonstrated strong agreement among readers regarding the identification of DM and CESM findings in breast images from various views. Moreover, it indicates that CESM is equally precise compared with DM and can be used as an alternative in clinical centers.
{"title":"Contrast-enhanced spectral mammography demonstrates better inter-reader repeatability than digital mammography for screening breast cancer patients.","authors":"Alisa Mohebbi, Ali Abdi, Saeed Mohammadzadeh, Mohammad Mirza-Aghazadeh-Attari, Ali Abbasian Ardakani, Afshin Mohammadi","doi":"10.1117/1.JMI.12.5.051806","DOIUrl":"10.1117/1.JMI.12.5.051806","url":null,"abstract":"<p><strong>Purpose: </strong>Our purpose is to assess the inter-rater agreement between digital mammography (DM) and contrast-enhanced spectral mammography (CESM) in evaluating the Breast Imaging Reporting and Data System (BI-RADS) grading.</p><p><strong>Approach: </strong>This retrospective study included 326 patients recruited between January 2019 and February 2021. The study protocol was pre-registered on the Open Science Framework platform. Two expert radiologists interpreted the CESM and DM findings. Pathological data are used for radiologically suspicious or malignant-appearing lesions, whereas follow-up was considered the gold standard for benign-appearing lesions and breasts without lesions.</p><p><strong>Results: </strong>For intra-device agreement, both imaging modalities showed \"almost perfect\" agreement, indicating that different radiologists are expected to report the same BI-RADS score for the same image. Despite showing a similar interpretation, a paired <math><mrow><mi>t</mi></mrow> </math> -test showed significantly higher agreement for CESM compared with DM ( <math><mrow><mi>p</mi> <mo><</mo> <mn>0.001</mn></mrow> </math> ). Subgrouping based on the side or view did not show a considerable difference for both imaging modalities. For inter-device agreement, \"almost perfect\" agreement was also achieved. However, for proven malignant lesions, an overall higher BI-RADS score was achieved for CESM, whereas for benign or normal breasts, a lower BI-RADS score was reported, indicating a more precise BI-RADS classification for CESM compared with DM.</p><p><strong>Conclusions: </strong>Our findings demonstrated strong agreement among readers regarding the identification of DM and CESM findings in breast images from various views. Moreover, it indicates that CESM is equally precise compared with DM and can be used as an alternative in clinical centers.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 5","pages":"051806"},"PeriodicalIF":1.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12175086/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144334196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-01 | Epub Date: 2025-03-19 | DOI: 10.1117/1.JMI.12.5.051802
Breast cancer survivors' perceptual map of breast reconstruction appearance outcomes.
Haoqi Wang, Xiomara T Gonzalez, Gabriela A Renta-López, Mary Catherine Bordes, Michael C Hout, Seung W Choi, Gregory P Reece, Mia K Markey
Purpose: It is often hard for patients to articulate their expectations about breast reconstruction appearance outcomes to their providers. Our overarching goal is to develop a tool to help patients visually express what they expect to look like after reconstruction. We aim to comprehensively understand how breast cancer survivors perceive diverse breast appearance states by mapping them onto a low-dimensional Euclidean space, which simplifies the complex information about perceptual similarity relationships into a more interpretable form.
Approach: We recruited breast cancer survivors and conducted observer experiments to assess the visual similarities among clinical photographs depicting a range of appearances of the torso relevant to breast reconstruction. Then, we developed a perceptual map to illuminate how breast cancer survivors perceive and distinguish among these appearance states.
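Perceptual maps of this kind are commonly built by pooling pairwise dissimilarity judgments and embedding them with multidimensional scaling; the sketch below uses sklearn's metric MDS as an assumed stand-in for the authors' embedding method, on a synthetic 100 × 100 dissimilarity matrix.

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(1)
n_photos = 100
dissim = rng.uniform(0, 1, size=(n_photos, n_photos))  # pooled (1 - similarity) ratings
dissim = (dissim + dissim.T) / 2                       # symmetrize rater judgments
np.fill_diagonal(dissim, 0.0)

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)                     # (100, 2) perceptual map
```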
Results: We sampled 100 photographs as stimuli and recruited 34 breast cancer survivors locally. The resulting perceptual map, constructed in two dimensions, offers valuable insights into factors influencing breast cancer survivors' perceptions of breast reconstruction outcomes. Our findings highlight specific aspects, such as the number of nipples, symmetry, ptosis, scars, and breast shape, that emerge as particularly noteworthy for breast cancer survivors.
Conclusions: Analysis of the perceptual map identified factors associated with breast cancer survivors' perceptions of breast appearance states that should be emphasized in the appearance consultation process. The perceptual map could be used to assist patients in visually expressing what they expect to look like. Our study lays the groundwork for evaluating interventions intended to help patients form realistic expectations.
{"title":"Breast cancer survivors' perceptual map of breast reconstruction appearance outcomes.","authors":"Haoqi Wang, Xiomara T Gonzalez, Gabriela A Renta-López, Mary Catherine Bordes, Michael C Hout, Seung W Choi, Gregory P Reece, Mia K Markey","doi":"10.1117/1.JMI.12.5.051802","DOIUrl":"10.1117/1.JMI.12.5.051802","url":null,"abstract":"<p><strong>Purpose: </strong>It is often hard for patients to articulate their expectations about breast reconstruction appearance outcomes to their providers. Our overarching goal is to develop a tool to help patients visually express what they expect to look like after reconstruction. We aim to comprehensively understand how breast cancer survivors perceive diverse breast appearance states by mapping them onto a low-dimensional Euclidean space, which simplifies the complex information about perceptual similarity relationships into a more interpretable form.</p><p><strong>Approach: </strong>We recruited breast cancer survivors and conducted observer experiments to assess the visual similarities among clinical photographs depicting a range of appearances of the torso relevant to breast reconstruction. Then, we developed a perceptual map to illuminate how breast cancer survivors perceive and distinguish among these appearance states.</p><p><strong>Results: </strong>We sampled 100 photographs as stimuli and recruited 34 breast cancer survivors locally. The resulting perceptual map, constructed in two dimensions, offers valuable insights into factors influencing breast cancer survivors' perceptions of breast reconstruction outcomes. Our findings highlight specific aspects, such as the number of nipples, symmetry, ptosis, scars, and breast shape, that emerge as particularly noteworthy for breast cancer survivors.</p><p><strong>Conclusions: </strong>Analysis of the perceptual map identified factors associated with breast cancer survivors' perceptions of breast appearance states that should be emphasized in the appearance consultation process. The perceptual map could be used to assist patients in visually expressing what they expect to look like. Our study lays the groundwork for evaluating interventions intended to help patients form realistic expectations.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 5","pages":"051802"},"PeriodicalIF":1.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11921042/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143671445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-01 | Epub Date: 2025-10-16 | DOI: 10.1117/1.JMI.12.5.051809
Convolutional neural network model observers discount signal-like anatomical structures during search in virtual digital breast tomosynthesis phantoms.
Aditya Jonnalagadda, Bruno B Barufaldi, Andrew D A Maidment, Susan P Weinstein, Craig K Abbey, Miguel P Eckstein
Purpose: We aim to assess the perceptual tasks in which convolutional neural networks (CNNs) might be better tools than commonly used linear model observers (LMOs) to evaluate medical image quality.
Approach: We compared the LMOs (channelized Hotelling [CHO] and frequency convolution channels observers [FCO]) and CNN detection accuracies for tasks with a few possible signal locations (location known exactly) and for the search for mass and microcalcification signals embedded in 2D/3D breast tomosynthesis phantoms. We also compared the LMOs and CNN accuracies to those of radiologists in the search tasks. We analyzed radiologists' eye position to assess whether they fixate longer at locations considered suspicious by the LMOs or those by the CNN.
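For context, the channelized Hotelling observer used as a baseline here reduces each image to a small set of channel responses, forms a Hotelling template from the pooled channel covariance, and scores images by the scalar template response. A minimal sketch follows; the random channel profiles and synthetic data are illustrative stand-ins (real CHOs use structured channels such as Gabor or Laguerre-Gauss).

```python
import numpy as np

def cho_auc(signal_imgs, noise_imgs, channels):
    """CHO detection AUC; images are (N, P) flattened, channels are (P, C)."""
    vs, vn = signal_imgs @ channels, noise_imgs @ channels   # channel responses
    dv = vs.mean(axis=0) - vn.mean(axis=0)                   # mean signal, channel space
    S = 0.5 * (np.cov(vs.T) + np.cov(vn.T))                  # pooled channel covariance
    w = np.linalg.solve(S, dv)                               # Hotelling template
    ts, tn = vs @ w, vn @ w                                  # scalar test statistics
    return (ts[:, None] > tn[None, :]).mean()                # AUC (Mann-Whitney)

rng = np.random.default_rng(0)
P, C = 32 * 32, 10
channels = rng.normal(size=(P, C))                           # stand-in channel profiles
sig = 0.2 * rng.normal(size=P)                               # fixed weak signal
noise_imgs = rng.normal(size=(200, P))
signal_imgs = rng.normal(size=(200, P)) + sig
print(cho_auc(signal_imgs, noise_imgs, channels))            # above 0.5 when detectable
```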
Results: LMOs resulted in similar detection accuracies [area under the receiver operating characteristic curve (AUC)] to the CNN for tasks with up to 100 signal locations but lower accuracies in the search task for microcalcification and mass 3D images. Radiologists' AUC was significantly higher (p < 1e-4) than that of LMOs for the microcalcification 2D search (CHO, FCO) and 3D mass search (p < 0.05, CHO) but was not higher than the CNN's AUC. For both signal types, radiologists fixated longer on the locations of the highest response scores of the CNN than those of the LMOs but only reached statistical significance for the mass (masses: p = 0.009 versus CHO and p = 0.004 versus FCO).
Conclusion: We show that CNNs are a more suitable model observer for search tasks. Like radiologists but not traditional LMOs, CNNs can discount false positives arising from anatomical backgrounds.
{"title":"Convolutional neural network model observers discount signal-like anatomical structures during search in virtual digital breast tomosynthesis phantoms.","authors":"Aditya Jonnalagadda, Bruno B Barufaldi, Andrew D A Maidment, Susan P Weinstein, Craig K Abbey, Miguel P Eckstein","doi":"10.1117/1.JMI.12.5.051809","DOIUrl":"10.1117/1.JMI.12.5.051809","url":null,"abstract":"<p><strong>Purpose: </strong>We aim to assess the perceptual tasks in which convolutional neural networks (CNNs) might be better tools than commonly used linear model observers (LMOs) to evaluate medical image quality.</p><p><strong>Approach: </strong>We compared the LMOs (channelized Hotelling [CHO] and frequency convolution channels observers [FCO]) and CNN detection accuracies for tasks with a few possible signal locations (location known exactly) and for the search for mass and microcalcification signals embedded in 2D/3D breast tomosynthesis phantoms. We also compared the LMOs and CNN accuracies to those of radiologists in the search tasks. We analyzed radiologists' eye position to assess whether they fixate longer at locations considered suspicious by the LMOs or those by the CNN.</p><p><strong>Results: </strong>LMOs resulted in similar detection accuracies [area under the receiver operating characteristic curve (AUC)] to the CNN for tasks with up to 100 signal locations but lower accuracies in the search task for microcalcification and mass 3D images. Radiologists' AUC was significantly higher ( <math><mrow><mi>p</mi> <mo><</mo> <mn>1</mn> <mi>e</mi> <mo>-</mo> <mn>4</mn></mrow> </math> ) than that of LMOs for the microcalcification 2D search (CHO, FCO) and 3D mass search ( <math><mrow><mi>p</mi> <mo><</mo> <mn>0.05</mn></mrow> </math> , CHO) but was not higher than the CNN's AUC. For both signal types, radiologists fixated longer on the locations of the highest response scores of the CNN than those of the LMOs but only reached statistical significance for the mass (masses: <math><mrow><mi>p</mi> <mo>=</mo> <mn>0.009</mn></mrow> </math> versus CHO and <math><mrow><mi>p</mi> <mo>=</mo> <mn>0.004</mn></mrow> </math> versus FCO).</p><p><strong>Conclusion: </strong>We show that CNNs are a more suitable model observer for search tasks. Like radiologists but not traditional LMOs, CNNs can discount false positives arising from anatomical backgrounds.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 5","pages":"051809"},"PeriodicalIF":1.7,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12530144/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145330494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-01 | Epub Date: 2025-10-07 | DOI: 10.1117/1.JMI.12.5.054502
Machine learning evaluation of pneumonia severity: subgroup performance in the Medical Imaging and Data Resource Center modified radiographic assessment of lung edema mastermind challenge.
Karen Drukker, Samuel G Armato, Lubomir Hadjiiski, Judy Gichoya, Nicholas Gruszauskas, Jayashree Kalpathy-Cramer, Hui Li, Kyle J Myers, Robert M Tomek, Heather M Whitney, Zi Zhang, Maryellen L Giger
Purpose: The Medical Imaging and Data Resource Center Mastermind Grand Challenge of modified radiographic assessment of lung edema (mRALE) tasked participants with developing machine learning techniques for automated COVID-19 severity assessment via mRALE scores on portable chest radiographs (CXRs). We examine potential biases across demographic subgroups for the best-performing models of the nine teams participating in the test phase of the challenge.
Approach: Models were evaluated against a nonpublic test set of CXRs (814 patients) annotated by radiologists for disease severity (mRALE score 0 to 24). Participants used a variety of data and methods for training. Performance was measured using quadratic-weighted kappa (QWK). Bias analyses considered demographics (sex, age, race, ethnicity, and their intersections) using QWK. In addition, for distinguishing no/mild versus moderate/severe disease, equal opportunity difference (EOD) and average absolute odds difference (AAOD) were calculated. Bias was defined as statistically significant QWK subgroup differences, or EOD outside [-0.1; 0.1], or AAOD outside [0; 0.1].
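The evaluation metrics named above can be sketched compactly: quadratic-weighted kappa via sklearn, and EOD/AAOD from subgroup-versus-reference true- and false-positive rates on the binarized task. Group labels and data below are illustrative, not the challenge test set.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

def tpr_fpr(y_true, y_pred):
    """True- and false-positive rates for binary labels."""
    tp = np.sum((y_true == 1) & (y_pred == 1)); fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1)); tn = np.sum((y_true == 0) & (y_pred == 0))
    return tp / (tp + fn), fp / (fp + tn)

def fairness_gaps(y_true, y_pred, group, subgroup, reference):
    """EOD and AAOD of `subgroup` relative to `reference`."""
    tpr_s, fpr_s = tpr_fpr(y_true[group == subgroup], y_pred[group == subgroup])
    tpr_r, fpr_r = tpr_fpr(y_true[group == reference], y_pred[group == reference])
    eod = tpr_s - tpr_r                                      # equal opportunity difference
    aaod = 0.5 * (abs(fpr_s - fpr_r) + abs(tpr_s - tpr_r))   # avg absolute odds difference
    return eod, aaod

rng = np.random.default_rng(0)
severity_true = rng.integers(0, 25, 200)                     # mRALE-like 0-24 scores
severity_pred = np.clip(severity_true + rng.integers(-3, 4, 200), 0, 24)
qwk = cohen_kappa_score(severity_true, severity_pred, weights="quadratic")

group = rng.choice(["A", "B"], 200)
eod, aaod = fairness_gaps(severity_true >= 10, severity_pred >= 10, group, "A", "B")
```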
Results: The nine models demonstrated good agreement with the reference standard (QWK 0.74 to 0.88). The winning model (QWK = 0.884 [0.819; 0.949]) was the only model without biases identified in terms of QWK. The runner-up model (QWK = 0.874 [0.813; 0.936]) showed no identified biases in terms of EOD and AAOD, whereas the winning model disadvantaged three subgroups in each of these metrics. The median number of disadvantaged subgroups for all models was 3.
Conclusions: The challenge demonstrated strong model performances but identified subgroup disparities. Bias analysis is essential as models with similar accuracy may exhibit varying fairness.
{"title":"Machine learning evaluation of pneumonia severity: subgroup performance in the Medical Imaging and Data Resource Center modified radiographic assessment of lung edema mastermind challenge.","authors":"Karen Drukker, Samuel G Armato, Lubomir Hadjiiski, Judy Gichoya, Nicholas Gruszauskas, Jayashree Kalpathy-Cramer, Hui Li, Kyle J Myers, Robert M Tomek, Heather M Whitney, Zi Zhang, Maryellen L Giger","doi":"10.1117/1.JMI.12.5.054502","DOIUrl":"10.1117/1.JMI.12.5.054502","url":null,"abstract":"<p><strong>Purpose: </strong>The Medical Imaging and Data Resource Center Mastermind Grand Challenge of modified radiographic assessment of lung edema (mRALE) tasked participants with developing machine learning techniques for automated COVID-19 severity assessment via mRALE scores on portable chest radiographs (CXRs). We examine potential biases across demographic subgroups for the best-performing models of the nine teams participating in the test phase of the challenge.</p><p><strong>Approach: </strong>Models were evaluated against a nonpublic test set of CXRs (814 patients) annotated by radiologists for disease severity (mRALE score 0 to 24). Participants used a variety of data and methods for training. Performance was measured using quadratic-weighted kappa (QWK). Bias analyses considered demographics (sex, age, race, ethnicity, and their intersections) using QWK. In addition, for distinguishing no/mild versus moderate/severe disease, equal opportunity difference (EOD) and average absolute odds difference (AAOD) were calculated. Bias was defined as statistically significant QWK subgroup differences, or EOD outside [ <math><mrow><mo>-</mo> <mn>0.1</mn></mrow> </math> ; 0.1], or AAOD outside [0; 0.1].</p><p><strong>Results: </strong>The nine models demonstrated good agreement with the reference standard (QWK 0.74 to 0.88). The winning model (QWK = 0.884 [0.819; 0.949]) was the only model without biases identified in terms of QWK. The runner-up model (QWK = 0.874 [0.813; 0.936]) showed no identified biases in terms of EOD and AAOD, whereas the winning model disadvantaged three subgroups in each of these metrics. The median number of disadvantaged subgroups for all models was 3.</p><p><strong>Conclusions: </strong>The challenge demonstrated strong model performances but identified subgroup disparities. Bias analysis is essential as models with similar accuracy may exhibit varying fairness.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 5","pages":"054502"},"PeriodicalIF":1.7,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12503059/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145253373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}