Gamification for emergency radiology education and image perception: stab the diagnosis
Pub Date: 2025-09-01 | Epub Date: 2025-09-24 | DOI: 10.1117/1.JMI.12.5.051808
William F Auffermann, Nathan Barber, Ryan Stockard, Soham Banerjee
Purpose: Gamification can be a helpful adjunct to education and is increasingly used in radiology. We aim to determine if using a gamified framework to teach medical trainees about emergency radiology can improve perceptual and interpretive skills and facilitate learning.
Approach: We obtained approval from the Institutional Review Board, and participation was voluntary. Participants received training at the RadSimPE radiology workstation simulator and were shown three sets of computed tomography images related to emergency radiology diagnoses. Participants were asked to state their certainty that an abnormality was not present, localize it if present, and give their confidence in localization. Between case sets 1 and 2, the experimental group was provided with gamified emergency radiology training on the Stab the Diagnosis program, whereas the control group was not. Following the session, participants completed an eight-question survey to assess their thoughts about the training.
Results: A total of 36 medical trainees participated. Both the experimental group and control group improved in localization accuracy, but the experimental group's localization confidence was significantly greater than the control group's (p = 0.0364). Survey results were generally positive and were statistically significantly greater than the neutral value of 3, with p-values < 0.05 for all eight questions. For example, survey results indicated that participants felt the training was a helpful educational experience (p < 0.001) and that the session was more effective for learning than traditional educational techniques (p = 0.001).
Conclusions: Gamification may be a valuable adjunct to conventional methods in radiology education and may improve trainee confidence.
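The survey comparison against the neutral value of 3 can be sketched with a one-sample Wilcoxon signed-rank test; the abstract does not name the exact test used, and the responses below are invented for illustration:

```python
import numpy as np
from scipy.stats import wilcoxon

def survey_vs_neutral(scores, neutral=3):
    """One-sided test that Likert responses exceed the neutral value."""
    scores = np.asarray(scores, dtype=float)
    # wilcoxon drops zero differences (responses exactly at neutral) by default
    stat, p = wilcoxon(scores - neutral, alternative="greater")
    return stat, p

# Hypothetical responses to one survey question on a 1-5 Likert scale
stat, p = survey_vs_neutral([4, 5, 4, 5, 3, 4, 5, 4, 4, 5])
```

A p-value below 0.05 here would mirror the abstract's finding that responses sat significantly above neutral.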
{"title":"Gamification for emergency radiology education and image perception: stab the diagnosis.","authors":"William F Auffermann, Nathan Barber, Ryan Stockard, Soham Banerjee","doi":"10.1117/1.JMI.12.5.051808","DOIUrl":"https://doi.org/10.1117/1.JMI.12.5.051808","url":null,"abstract":"<p><strong>Purpose: </strong>Gamification can be a helpful adjunct to education and is increasingly used in radiology. We aim to determine if using a gamified framework to teach medical trainees about emergency radiology can improve perceptual and interpretive skills and facilitate learning.</p><p><strong>Approach: </strong>We obtained approval from the Institutional Review Board, and participation was voluntary. Participants received training at the RadSimPE radiology workstation simulator and were shown three sets of computed tomography images related to emergency radiology diagnoses. Participants were asked to state their certainty that an abnormality was not present, localize it if present, and give their confidence in localization. Between case sets 1 and 2, the experimental group was provided with gamified emergency radiology training on the Stab the Diagnosis program, whereas the control group was not. Following the session, participants completed an eight-question survey to assess their thoughts about the training.</p><p><strong>Results: </strong>A total of 36 medical trainees participated. Both the experimental group and control group improved in localization accuracy, but the experimental group's localization confidence was significantly greater than the control group ( <math><mrow><mi>p</mi> <mo>=</mo> <mn>0.0364</mn></mrow> </math> ). Survey results were generally positive and were statistically significantly greater than the neutral value of 3, with <math><mrow><mi>p</mi></mrow> </math> -values <math><mrow><mo><</mo> <mn>0.05</mn></mrow> </math> for all eight questions. 
For example, survey results indicated that participants felt the training was a helpful educational experience ( <math><mrow><mi>p</mi> <mo><</mo> <mn>0.001</mn></mrow> </math> ) and that the session was more effective for learning than traditional educational techniques ( <math><mrow><mi>p</mi> <mo>=</mo> <mn>0.001</mn></mrow> </math> ).</p><p><strong>Conclusions: </strong>Gamification may be a valuable adjunct to conventional methods in radiology education and may improve trainee confidence.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 5","pages":"051808"},"PeriodicalIF":1.7,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12458100/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145151419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Correlation of objective image quality metrics with radiologists' diagnostic confidence depends on the clinical task performed
Pub Date: 2025-09-01 | Epub Date: 2025-04-11 | DOI: 10.1117/1.JMI.12.5.051803
Michelle C Pryde, James Rioux, Adela Elena Cora, David Volders, Matthias H Schmidt, Mohammed Abdolell, Chris Bowen, Steven D Beyea
Purpose: Objective image quality metrics (IQMs) are widely used as outcome measures to assess acquisition and reconstruction strategies for diagnostic images. For nonpathological magnetic resonance (MR) images, these IQMs correlate to varying degrees with expert radiologists' confidence scores of overall perceived diagnostic image quality. However, it is unclear whether IQMs also correlate with task-specific diagnostic image quality or expert radiologists' confidence in performing a specific diagnostic task, which calls into question their use as surrogates for radiologist opinion.
Approach: 0.5 T MR images from 16 stroke patients and two healthy volunteers were retrospectively undersampled (R = 1 to 7×) and reconstructed via compressed sensing. Three neuroradiologists reported the presence/absence of acute ischemic stroke (AIS) and assigned a Fazekas score describing the extent of chronic ischemic lesion burden. Neuroradiologists ranked their confidence in performing each task using a 1 to 5 Likert scale. Confidence scores were correlated with noise quality measure, the visual information fidelity criterion, the feature similarity index, root mean square error, and structural similarity (SSIM) via nonlinear regression modeling.
Results: Although acceleration alters image quality, neuroradiologists remain able to report pathology. All of the IQMs tested correlated to some degree with diagnostic confidence for assessing chronic ischemic lesion burden, but none correlated with diagnostic confidence in diagnosing the presence/absence of AIS due to consistent radiologist performance regardless of image degradation.
Conclusions: Accelerated images were helpful for understanding the ability of IQMs to assess task-specific diagnostic image quality in the context of chronic ischemic lesion burden, although not in the case of AIS diagnosis. These findings suggest that commonly used IQMs, such as the SSIM index, do not necessarily indicate an image's utility when performing certain diagnostic tasks.
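SSIM, one of the IQMs tested, compares luminance, contrast, and structure between a reference and a degraded image. A minimal single-window (global) simplification in numpy, with synthetic stand-in images rather than actual MR data:

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM over whole images -- a simplification of the usual
    sliding-window SSIM, using the standard stabilizing constants C1, C2."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

rng = np.random.default_rng(0)
ref = rng.random((64, 64))                       # stand-in fully sampled slice
noisy = np.clip(ref + 0.2 * rng.standard_normal(ref.shape), 0, 1)  # degraded recon

score_identical = global_ssim(ref, ref)   # 1.0 by construction
score_degraded = global_ssim(ref, noisy)  # < 1.0
```

The study's point is precisely that a drop in such a score need not track a radiologist's task-specific confidence.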
{"title":"Correlation of objective image quality metrics with radiologists' diagnostic confidence depends on the clinical task performed.","authors":"Michelle C Pryde, James Rioux, Adela Elena Cora, David Volders, Matthias H Schmidt, Mohammed Abdolell, Chris Bowen, Steven D Beyea","doi":"10.1117/1.JMI.12.5.051803","DOIUrl":"10.1117/1.JMI.12.5.051803","url":null,"abstract":"<p><strong>Purpose: </strong>Objective image quality metrics (IQMs) are widely used as outcome measures to assess acquisition and reconstruction strategies for diagnostic images. For nonpathological magnetic resonance (MR) images, these IQMs correlate to varying degrees with expert radiologists' confidence scores of overall perceived diagnostic image quality. However, it is unclear whether IQMs also correlate with task-specific diagnostic image quality or expert radiologists' confidence in performing a specific diagnostic task, which calls into question their use as surrogates for radiologist opinion.</p><p><strong>Approach: </strong>0.5 T MR images from 16 stroke patients and two healthy volunteers were retrospectively undersampled ( <math><mrow><mi>R</mi> <mo>=</mo> <mn>1</mn></mrow> </math> to <math><mrow><mn>7</mn> <mo>×</mo></mrow> </math> ) and reconstructed via compressed sensing. Three neuroradiologists reported the presence/absence of acute ischemic stroke (AIS) and assigned a Fazekas score describing the extent of chronic ischemic lesion burden. Neuroradiologists ranked their confidence in performing each task using a 1 to 5 Likert scale. Confidence scores were correlated with noise quality measure, the visual information fidelity criterion, the feature similarity index, root mean square error, and structural similarity (SSIM) via nonlinear regression modeling.</p><p><strong>Results: </strong>Although acceleration alters image quality, neuroradiologists remain able to report pathology. 
All of the IQMs tested correlated to some degree with diagnostic confidence for assessing chronic ischemic lesion burden, but none correlated with diagnostic confidence in diagnosing the presence/absence of AIS due to consistent radiologist performance regardless of image degradation.</p><p><strong>Conclusions: </strong>Accelerated images were helpful for understanding the ability of IQMs to assess task-specific diagnostic image quality in the context of chronic ischemic lesion burden, although not in the case of AIS diagnosis. These findings suggest that commonly used IQMs, such as the SSIM index, do not necessarily indicate an image's utility when performing certain diagnostic tasks.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 5","pages":"051803"},"PeriodicalIF":1.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11991859/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144018546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Contrast-enhanced spectral mammography demonstrates better inter-reader repeatability than digital mammography for screening breast cancer patients
Pub Date: 2025-09-01 | Epub Date: 2025-06-18 | DOI: 10.1117/1.JMI.12.5.051806
Alisa Mohebbi, Ali Abdi, Saeed Mohammadzadeh, Mohammad Mirza-Aghazadeh-Attari, Ali Abbasian Ardakani, Afshin Mohammadi
Purpose: Our purpose is to assess the inter-rater agreement between digital mammography (DM) and contrast-enhanced spectral mammography (CESM) in evaluating the Breast Imaging Reporting and Data System (BI-RADS) grading.
Approach: This retrospective study included 326 patients recruited between January 2019 and February 2021. The study protocol was pre-registered on the Open Science Framework platform. Two expert radiologists interpreted the CESM and DM findings. Pathological data were used for radiologically suspicious or malignant-appearing lesions, whereas follow-up was considered the gold standard for benign-appearing lesions and breasts without lesions.
Results: For intra-device agreement, both imaging modalities showed "almost perfect" agreement, indicating that different radiologists are expected to report the same BI-RADS score for the same image. Despite showing a similar interpretation, a paired t-test showed significantly higher agreement for CESM compared with DM (p < 0.001). Subgrouping based on the side or view did not show a considerable difference for either imaging modality. For inter-device agreement, "almost perfect" agreement was also achieved. However, for proven malignant lesions, an overall higher BI-RADS score was achieved for CESM, whereas for benign or normal breasts, a lower BI-RADS score was reported, indicating a more precise BI-RADS classification for CESM compared with DM.
Conclusions: Our findings demonstrated strong agreement among readers regarding the identification of DM and CESM findings in breast images from various views. Moreover, the results indicate that CESM is at least as precise as DM and can be used as an alternative in clinical centers.
{"title":"Contrast-enhanced spectral mammography demonstrates better inter-reader repeatability than digital mammography for screening breast cancer patients.","authors":"Alisa Mohebbi, Ali Abdi, Saeed Mohammadzadeh, Mohammad Mirza-Aghazadeh-Attari, Ali Abbasian Ardakani, Afshin Mohammadi","doi":"10.1117/1.JMI.12.5.051806","DOIUrl":"10.1117/1.JMI.12.5.051806","url":null,"abstract":"<p><strong>Purpose: </strong>Our purpose is to assess the inter-rater agreement between digital mammography (DM) and contrast-enhanced spectral mammography (CESM) in evaluating the Breast Imaging Reporting and Data System (BI-RADS) grading.</p><p><strong>Approach: </strong>This retrospective study included 326 patients recruited between January 2019 and February 2021. The study protocol was pre-registered on the Open Science Framework platform. Two expert radiologists interpreted the CESM and DM findings. Pathological data are used for radiologically suspicious or malignant-appearing lesions, whereas follow-up was considered the gold standard for benign-appearing lesions and breasts without lesions.</p><p><strong>Results: </strong>For intra-device agreement, both imaging modalities showed \"almost perfect\" agreement, indicating that different radiologists are expected to report the same BI-RADS score for the same image. Despite showing a similar interpretation, a paired <math><mrow><mi>t</mi></mrow> </math> -test showed significantly higher agreement for CESM compared with DM ( <math><mrow><mi>p</mi> <mo><</mo> <mn>0.001</mn></mrow> </math> ). Subgrouping based on the side or view did not show a considerable difference for both imaging modalities. For inter-device agreement, \"almost perfect\" agreement was also achieved. 
However, for proven malignant lesions, an overall higher BI-RADS score was achieved for CESM, whereas for benign or normal breasts, a lower BI-RADS score was reported, indicating a more precise BI-RADS classification for CESM compared with DM.</p><p><strong>Conclusions: </strong>Our findings demonstrated strong agreement among readers regarding the identification of DM and CESM findings in breast images from various views. Moreover, it indicates that CESM is equally precise compared with DM and can be used as an alternative in clinical centers.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 5","pages":"051806"},"PeriodicalIF":1.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12175086/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144334196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Breast cancer survivors' perceptual map of breast reconstruction appearance outcomes
Pub Date: 2025-09-01 | Epub Date: 2025-03-19 | DOI: 10.1117/1.JMI.12.5.051802
Haoqi Wang, Xiomara T Gonzalez, Gabriela A Renta-López, Mary Catherine Bordes, Michael C Hout, Seung W Choi, Gregory P Reece, Mia K Markey
Purpose: It is often hard for patients to articulate their expectations about breast reconstruction appearance outcomes to their providers. Our overarching goal is to develop a tool to help patients visually express what they expect to look like after reconstruction. We aim to comprehensively understand how breast cancer survivors perceive diverse breast appearance states by mapping them onto a low-dimensional Euclidean space, which simplifies the complex information about perceptual similarity relationships into a more interpretable form.
Approach: We recruited breast cancer survivors and conducted observer experiments to assess the visual similarities among clinical photographs depicting a range of appearances of the torso relevant to breast reconstruction. Then, we developed a perceptual map to illuminate how breast cancer survivors perceive and distinguish among these appearance states.
Results: We sampled 100 photographs as stimuli and recruited 34 breast cancer survivors locally. The resulting perceptual map, constructed in two dimensions, offers valuable insights into factors influencing breast cancer survivors' perceptions of breast reconstruction outcomes. Our findings highlight specific aspects, such as the number of nipples, symmetry, ptosis, scars, and breast shape, that emerge as particularly noteworthy for breast cancer survivors.
Conclusions: Analysis of the perceptual map identified factors associated with breast cancer survivors' perceptions of breast appearance states that should be emphasized in the appearance consultation process. The perceptual map could be used to assist patients in visually expressing what they expect to look like. Our study lays the groundwork for evaluating interventions intended to help patients form realistic expectations.
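A perceptual map of the kind described is commonly built by multidimensional scaling (MDS) of a pairwise dissimilarity matrix. A minimal sketch using scikit-learn's metric MDS with an invented 5×5 dissimilarity matrix standing in for the observer-experiment ratings (the paper does not specify its embedding method):

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical symmetric dissimilarity matrix among 5 photographs (0 = identical)
rng = np.random.default_rng(1)
a = rng.random((5, 5))
d = (a + a.T) / 2          # symmetrize
np.fill_diagonal(d, 0.0)   # each photo is identical to itself

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(d)  # 2D perceptual map: one point per photograph
```

Nearby points in `coords` correspond to appearance states the observers judged similar, which is what makes axes interpretable in terms of features such as symmetry or ptosis.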
{"title":"Breast cancer survivors' perceptual map of breast reconstruction appearance outcomes.","authors":"Haoqi Wang, Xiomara T Gonzalez, Gabriela A Renta-López, Mary Catherine Bordes, Michael C Hout, Seung W Choi, Gregory P Reece, Mia K Markey","doi":"10.1117/1.JMI.12.5.051802","DOIUrl":"10.1117/1.JMI.12.5.051802","url":null,"abstract":"<p><strong>Purpose: </strong>It is often hard for patients to articulate their expectations about breast reconstruction appearance outcomes to their providers. Our overarching goal is to develop a tool to help patients visually express what they expect to look like after reconstruction. We aim to comprehensively understand how breast cancer survivors perceive diverse breast appearance states by mapping them onto a low-dimensional Euclidean space, which simplifies the complex information about perceptual similarity relationships into a more interpretable form.</p><p><strong>Approach: </strong>We recruited breast cancer survivors and conducted observer experiments to assess the visual similarities among clinical photographs depicting a range of appearances of the torso relevant to breast reconstruction. Then, we developed a perceptual map to illuminate how breast cancer survivors perceive and distinguish among these appearance states.</p><p><strong>Results: </strong>We sampled 100 photographs as stimuli and recruited 34 breast cancer survivors locally. The resulting perceptual map, constructed in two dimensions, offers valuable insights into factors influencing breast cancer survivors' perceptions of breast reconstruction outcomes. 
Our findings highlight specific aspects, such as the number of nipples, symmetry, ptosis, scars, and breast shape, that emerge as particularly noteworthy for breast cancer survivors.</p><p><strong>Conclusions: </strong>Analysis of the perceptual map identified factors associated with breast cancer survivors' perceptions of breast appearance states that should be emphasized in the appearance consultation process. The perceptual map could be used to assist patients in visually expressing what they expect to look like. Our study lays the groundwork for evaluating interventions intended to help patients form realistic expectations.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 5","pages":"051802"},"PeriodicalIF":1.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11921042/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143671445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Convolutional neural network model observers discount signal-like anatomical structures during search in virtual digital breast tomosynthesis phantoms
Pub Date: 2025-09-01 | Epub Date: 2025-10-16 | DOI: 10.1117/1.JMI.12.5.051809
Aditya Jonnalagadda, Bruno B Barufaldi, Andrew D A Maidment, Susan P Weinstein, Craig K Abbey, Miguel P Eckstein
Purpose: We aim to assess the perceptual tasks in which convolutional neural networks (CNNs) might be better tools than commonly used linear model observers (LMOs) to evaluate medical image quality.
Approach: We compared the LMOs (channelized Hotelling [CHO] and frequency convolution channels observers [FCO]) and CNN detection accuracies for tasks with a few possible signal locations (location known exactly) and for the search for mass and microcalcification signals embedded in 2D/3D breast tomosynthesis phantoms. We also compared the LMOs and CNN accuracies to those of radiologists in the search tasks. We analyzed radiologists' eye position to assess whether they fixate longer at locations considered suspicious by the LMOs or those by the CNN.
Results: LMOs resulted in similar detection accuracies [area under the receiver operating characteristic curve (AUC)] to the CNN for tasks with up to 100 signal locations but lower accuracies in the search task for microcalcification and mass 3D images. Radiologists' AUC was significantly higher (p < 1e-4) than that of LMOs for the microcalcification 2D search (CHO, FCO) and 3D mass search (p < 0.05, CHO) but was not higher than the CNN's AUC. For both signal types, radiologists fixated longer on the locations of the highest response scores of the CNN than those of the LMOs, but the difference only reached statistical significance for masses (p = 0.009 versus CHO and p = 0.004 versus FCO).
Conclusion: We show that CNNs are a more suitable model observer for search tasks. Like radiologists but not traditional LMOs, CNNs can discount false positives arising from anatomical backgrounds.
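A channelized Hotelling observer of the kind compared above forms a linear template from channel responses to signal-present and signal-absent training images. A minimal numpy sketch with synthetic Gaussian channel outputs (the channel model, signal strength, and sample sizes are invented, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_ch = 200, 10
# Hypothetical channel outputs (e.g., from applying channels to phantom images)
absent = rng.standard_normal((n_train, n_ch))
present = rng.standard_normal((n_train, n_ch)) + 0.5  # signal shifts channel means

# Hotelling template: pooled-covariance-weighted mean difference
s = 0.5 * (np.cov(absent, rowvar=False) + np.cov(present, rowvar=False))
w = np.linalg.solve(s, present.mean(0) - absent.mean(0))

scores_a = absent @ w
scores_p = present @ w
# Empirical AUC: probability a signal-present score exceeds a signal-absent score
auc = (scores_p[:, None] > scores_a[None, :]).mean()
```

The paper's point is that such a fixed linear template, unlike a CNN, cannot discount signal-like anatomical structures during search.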
{"title":"Convolutional neural network model observers discount signal-like anatomical structures during search in virtual digital breast tomosynthesis phantoms.","authors":"Aditya Jonnalagadda, Bruno B Barufaldi, Andrew D A Maidment, Susan P Weinstein, Craig K Abbey, Miguel P Eckstein","doi":"10.1117/1.JMI.12.5.051809","DOIUrl":"10.1117/1.JMI.12.5.051809","url":null,"abstract":"<p><strong>Purpose: </strong>We aim to assess the perceptual tasks in which convolutional neural networks (CNNs) might be better tools than commonly used linear model observers (LMOs) to evaluate medical image quality.</p><p><strong>Approach: </strong>We compared the LMOs (channelized Hotelling [CHO] and frequency convolution channels observers [FCO]) and CNN detection accuracies for tasks with a few possible signal locations (location known exactly) and for the search for mass and microcalcification signals embedded in 2D/3D breast tomosynthesis phantoms. We also compared the LMOs and CNN accuracies to those of radiologists in the search tasks. We analyzed radiologists' eye position to assess whether they fixate longer at locations considered suspicious by the LMOs or those by the CNN.</p><p><strong>Results: </strong>LMOs resulted in similar detection accuracies [area under the receiver operating characteristic curve (AUC)] to the CNN for tasks with up to 100 signal locations but lower accuracies in the search task for microcalcification and mass 3D images. Radiologists' AUC was significantly higher ( <math><mrow><mi>p</mi> <mo><</mo> <mn>1</mn> <mi>e</mi> <mo>-</mo> <mn>4</mn></mrow> </math> ) than that of LMOs for the microcalcification 2D search (CHO, FCO) and 3D mass search ( <math><mrow><mi>p</mi> <mo><</mo> <mn>0.05</mn></mrow> </math> , CHO) but was not higher than the CNN's AUC. 
For both signal types, radiologists fixated longer on the locations of the highest response scores of the CNN than those of the LMOs but only reached statistical significance for the mass (masses: <math><mrow><mi>p</mi> <mo>=</mo> <mn>0.009</mn></mrow> </math> versus CHO and <math><mrow><mi>p</mi> <mo>=</mo> <mn>0.004</mn></mrow> </math> versus FCO).</p><p><strong>Conclusion: </strong>We show that CNNs are a more suitable model observer for search tasks. Like radiologists but not traditional LMOs, CNNs can discount false positives arising from anatomical backgrounds.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 5","pages":"051809"},"PeriodicalIF":1.7,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12530144/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145330494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine learning evaluation of pneumonia severity: subgroup performance in the Medical Imaging and Data Resource Center modified radiographic assessment of lung edema mastermind challenge
Pub Date: 2025-09-01 | Epub Date: 2025-10-07 | DOI: 10.1117/1.JMI.12.5.054502
Karen Drukker, Samuel G Armato, Lubomir Hadjiiski, Judy Gichoya, Nicholas Gruszauskas, Jayashree Kalpathy-Cramer, Hui Li, Kyle J Myers, Robert M Tomek, Heather M Whitney, Zi Zhang, Maryellen L Giger
Purpose: The Medical Imaging and Data Resource Center Mastermind Grand Challenge of modified radiographic assessment of lung edema (mRALE) tasked participants with developing machine learning techniques for automated COVID-19 severity assessment via mRALE scores on portable chest radiographs (CXRs). We examine potential biases across demographic subgroups for the best-performing models of the nine teams participating in the test phase of the challenge.
Approach: Models were evaluated against a nonpublic test set of CXRs (814 patients) annotated by radiologists for disease severity (mRALE score 0 to 24). Participants used a variety of data and methods for training. Performance was measured using quadratic-weighted kappa (QWK). Bias analyses considered demographics (sex, age, race, ethnicity, and their intersections) using QWK. In addition, for distinguishing no/mild versus moderate/severe disease, equal opportunity difference (EOD) and average absolute odds difference (AAOD) were calculated. Bias was defined as statistically significant QWK subgroup differences, an EOD outside [-0.1; 0.1], or an AAOD outside [0; 0.1].
Results: The nine models demonstrated good agreement with the reference standard (QWK 0.74 to 0.88). The winning model (QWK = 0.884 [0.819; 0.949]) was the only model without biases identified in terms of QWK. The runner-up model (QWK = 0.874 [0.813; 0.936]) showed no identified biases in terms of EOD and AAOD, whereas the winning model disadvantaged three subgroups in each of these metrics. The median number of disadvantaged subgroups for all models was 3.
Conclusions: The challenge demonstrated strong model performances but identified subgroup disparities. Bias analysis is essential as models with similar accuracy may exhibit varying fairness.
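The quadratic-weighted kappa used as the challenge's performance metric can be computed with scikit-learn. A minimal sketch with invented mRALE-style ordinal scores (the 0-24 label grid follows the abstract; the scores themselves are hypothetical):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical reference-standard and model scores on the 0-24 mRALE scale
reference = [0, 3, 8, 12, 17, 21, 24, 5, 10, 15]
model = [0, 4, 7, 12, 18, 20, 24, 6, 10, 14]

# Fix the label grid so quadratic weights reflect score distance, not just
# the labels that happen to appear in this small sample
grid = list(range(25))
qwk = cohen_kappa_score(reference, model, labels=grid, weights="quadratic")
perfect = cohen_kappa_score(reference, reference, labels=grid, weights="quadratic")
```

Quadratic weighting penalizes a prediction in proportion to the squared distance from the reference score, which is why near-miss scores like these still yield a high QWK.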
{"title":"Machine learning evaluation of pneumonia severity: subgroup performance in the Medical Imaging and Data Resource Center modified radiographic assessment of lung edema mastermind challenge.","authors":"Karen Drukker, Samuel G Armato, Lubomir Hadjiiski, Judy Gichoya, Nicholas Gruszauskas, Jayashree Kalpathy-Cramer, Hui Li, Kyle J Myers, Robert M Tomek, Heather M Whitney, Zi Zhang, Maryellen L Giger","doi":"10.1117/1.JMI.12.5.054502","DOIUrl":"10.1117/1.JMI.12.5.054502","url":null,"abstract":"<p><strong>Purpose: </strong>The Medical Imaging and Data Resource Center Mastermind Grand Challenge of modified radiographic assessment of lung edema (mRALE) tasked participants with developing machine learning techniques for automated COVID-19 severity assessment via mRALE scores on portable chest radiographs (CXRs). We examine potential biases across demographic subgroups for the best-performing models of the nine teams participating in the test phase of the challenge.</p><p><strong>Approach: </strong>Models were evaluated against a nonpublic test set of CXRs (814 patients) annotated by radiologists for disease severity (mRALE score 0 to 24). Participants used a variety of data and methods for training. Performance was measured using quadratic-weighted kappa (QWK). Bias analyses considered demographics (sex, age, race, ethnicity, and their intersections) using QWK. In addition, for distinguishing no/mild versus moderate/severe disease, equal opportunity difference (EOD) and average absolute odds difference (AAOD) were calculated. Bias was defined as statistically significant QWK subgroup differences, or EOD outside [ <math><mrow><mo>-</mo> <mn>0.1</mn></mrow> </math> ; 0.1], or AAOD outside [0; 0.1].</p><p><strong>Results: </strong>The nine models demonstrated good agreement with the reference standard (QWK 0.74 to 0.88). The winning model (QWK = 0.884 [0.819; 0.949]) was the only model without biases identified in terms of QWK. 
The runner-up model (QWK = 0.874 [0.813; 0.936]) showed no identified biases in terms of EOD and AAOD, whereas the winning model disadvantaged three subgroups in each of these metrics. The median number of disadvantaged subgroups for all models was 3.</p><p><strong>Conclusions: </strong>The challenge demonstrated strong model performances but identified subgroup disparities. Bias analysis is essential as models with similar accuracy may exhibit varying fairness.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 5","pages":"054502"},"PeriodicalIF":1.7,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12503059/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145253373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep-learning-based estimation of left ventricle myocardial strain from echocardiograms with occlusion artifacts
Pub Date: 2025-09-01 | Epub Date: 2025-09-27 | DOI: 10.1117/1.JMI.12.5.054002
Alan Romero-Pacheco, Nidiyare Hevia-Montiel, Blanca Vazquez, Fernando Arámbula Cosío, Jorge Perez-Gonzalez
Purpose: We present a deep-learning-based methodology for estimating deformation in 2D echocardiograms. The goal is to automatically estimate the longitudinal strain of the left ventricle (LV) walls in images affected by speckle noise and acoustic occlusions.
Approach: The proposed methodology integrates algorithms for converting sparse to dense flow, a Res-UNet architecture for automatic myocardium segmentation, flow estimation using a global motion aggregation network, and the computation of longitudinal strain curves and the global longitudinal strain (GLS) index. The approach was evaluated using two echocardiographic datasets in apical four-chamber view, both modified with noise and acoustic shadows. The CAMUS dataset (N = 250) was used for LV wall segmentation, whereas a synthetic image database (N = 2037) was employed for flow estimation.
Results: The main performance metrics include a correlation of 98% [96 to 99] for the conversion from sparse to dense flow, a Dice index of 88.2% ± 3.8% for myocardial segmentation, an endpoint error of 0.133 [0.13 to 0.14] pixels for flow estimation, and an error of 1.34% [0.94 to 2.09] for estimation of the GLS index.
Conclusions: The results demonstrate improvements over previously reported performances while maintaining stability in echocardiograms with acoustic shadows. This methodology could be useful in clinical practice for the analysis of echocardiograms with noise artifacts and acoustic occlusions. Our code and trained models are publicly available at https://github.com/ArBioIIMAS/echo-gma.
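The final step of the pipeline above, turning a wall-length curve into strain and a GLS index, follows the standard Lagrangian strain definition. The sketch below is a minimal illustration using that textbook formula; the function names and the sample lengths are ours, not from the paper's code:

```python
import numpy as np

def longitudinal_strain(wall_lengths):
    """Lagrangian longitudinal strain (%) per frame, relative to the
    end-diastolic wall length L0 (taken as the first frame)."""
    L = np.asarray(wall_lengths, dtype=float)
    L0 = L[0]
    return (L - L0) / L0 * 100.0

def global_longitudinal_strain(wall_lengths):
    """GLS: the peak (most negative) strain over the cardiac cycle."""
    return longitudinal_strain(wall_lengths).min()

# Hypothetical LV wall lengths (cm) across one cardiac cycle
lengths = [10.0, 9.5, 8.6, 8.2, 9.1, 10.0]
curve = longitudinal_strain(lengths)
gls = global_longitudinal_strain(lengths)
```

In the paper's method, the per-frame wall lengths would come from contours tracked via the estimated dense flow fields rather than being given directly.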
{"title":"Deep-learning-based estimation of left ventricle myocardial strain from echocardiograms with occlusion artifacts.","authors":"Alan Romero-Pacheco, Nidiyare Hevia-Montiel, Blanca Vazquez, Fernando Arámbula Cosío, Jorge Perez-Gonzalez","doi":"10.1117/1.JMI.12.5.054002","DOIUrl":"https://doi.org/10.1117/1.JMI.12.5.054002","url":null,"abstract":"<p><strong>Purpose: </strong>We present a deep-learning-based methodology for estimating deformation in 2D echocardiograms. The goal is to automatically estimate the longitudinal strain of the left ventricle (LV) walls in images affected by speckle noise and acoustic occlusions.</p><p><strong>Approach: </strong>The proposed methodology integrates algorithms for converting sparse to dense flow, a Res-UNet architecture for automatic myocardium segmentation, flow estimation using a global motion aggregation network, and the computation of longitudinal strain curves and the global longitudinal strain (GLS) index. The approach was evaluated using two echocardiographic datasets in apical four-chamber view, both modified with noise and acoustic shadows. The CAMUS dataset ( <math><mrow><mi>N</mi> <mo>=</mo> <mn>250</mn></mrow> </math> ) was used for LV wall segmentation, whereas a synthetic image database ( <math><mrow><mi>N</mi> <mo>=</mo> <mn>2037</mn></mrow> </math> ) was employed for flow estimation.</p><p><strong>Results: </strong>Among the main performance metrics achieved are 98% [96 to 99] of correlation in the conversion from sparse to dense flow, a Dice index of <math><mrow><mn>88.2</mn> <mo>%</mo> <mo>±</mo> <mn>3.8</mn> <mo>%</mo></mrow> </math> for myocardial segmentation, an endpoint error of 0.133 [0.13 to 0.14] pixels in flow estimation, and an error of 1.34% [0.94 to 2.09] in the estimation of the GLS index.</p><p><strong>Conclusions: </strong>The results demonstrate improvements over previously reported performances while maintaining stability in echocardiograms with acoustic shadows. 
This methodology could be useful in clinical practice for the analysis of echocardiograms with noise artifacts and acoustic occlusions. Our code and trained models are publicly available at https://github.com/ArBioIIMAS/echo-gma.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 5","pages":"054002"},"PeriodicalIF":1.7,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12476231/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145187211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Purpose: Personalized federated learning (PFL) has been explored to address data heterogeneity while preserving privacy, and its application in computer-aided detection/diagnosis (CAD) software has been investigated. Ditto, a commonly studied PFL method, trains global and personalized models but is limited by instability in model updates and high hyperparameter tuning costs. We propose Improved Ditto, a PFL method that dynamically adjusts the proportion of global model weights during personalized model updates to enhance stability and reduce hyperparameter tuning costs.
Approach: We introduced a personalized model update rule in Improved Ditto that dynamically determines the proportion of global model weights based on the L2-norm of the gradient-derived and global-model-derived terms. This method was evaluated using three types of CAD software: cerebral aneurysm detection in magnetic resonance (MR) angiography images (segmentation), brain metastasis detection in contrast-enhanced T1-weighted MR images (object detection), and liver lesion classification in gadolinium-ethoxybenzyl-diethylenetriamine pentaacetic acid-enhanced MR images (classification). The proposed method was compared with several conventional methods.
Results: For two of the three CAD software applications, the performance of Improved Ditto was competitive with that of Ditto and other federated-learning-based methods. The proposed method achieved a narrower hyperparameter search space, which contributed to reducing tuning costs. In addition, it improved the stability of personalized model updates, suggesting enhanced adaptability to diverse datasets and tasks.
Conclusions: We demonstrate that dynamically adjusting global model weights during personalized model updates can improve the stability and adaptability of PFL. The proposed method reduces the hyperparameter tuning costs and offers potential benefits for CAD software.
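The dynamic update rule described above can be sketched for a single personalized step. This is our interpretation of the abstract's description (scaling the pull toward the global model by the ratio of L2-norms of the gradient-derived and global-model-derived terms); the exact rule in the paper may differ, and all names here are ours:

```python
import numpy as np

def improved_ditto_step(v, w_global, grad, lr=0.01):
    """One personalized-model update in the spirit of Improved Ditto.
    The weight on the global-model pull term is set dynamically from
    the L2-norms of the two terms (an assumed formulation)."""
    pull = v - w_global                    # global-model-derived term
    g_norm = np.linalg.norm(grad)          # gradient-derived term norm
    p_norm = np.linalg.norm(pull)
    lam = g_norm / p_norm if p_norm > 0 else 0.0
    return v - lr * (grad + lam * pull)

# Toy one-parameter example
v_new = improved_ditto_step(np.array([1.0]), np.array([0.0]), np.array([2.0]))
```

Compared with vanilla Ditto, where the regularization weight λ is a fixed hyperparameter tuned per deployment, deriving it from the current norms removes one axis of the search space, which is consistent with the reduced tuning cost reported above.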
{"title":"Improving personalized federated learning to optimize site-specific performance in computer-aided detection/diagnosis.","authors":"Aiki Yamada, Shouhei Hanaoka, Tomomi Takenaga, Soichiro Miki, Takeharu Yoshikawa, Osamu Abe, Toshiya Nakaguchi, Yukihiro Nomura","doi":"10.1117/1.JMI.12.5.054503","DOIUrl":"https://doi.org/10.1117/1.JMI.12.5.054503","url":null,"abstract":"<p><strong>Purpose: </strong>Personalized federated learning (PFL) has been explored to address data heterogeneity while preserving privacy, and its application in computer-aided detection/diagnosis (CAD) software has been investigated. Ditto, a commonly studied PFL method, trains global and personalized models but is limited by instability in model updates and high hyperparameter tuning costs. We proposed Improved Ditto, a PFL method that dynamically adjusts the proportion of global model weights during personalized model updates to enhance stability and reduce hyperparameter tuning costs.</p><p><strong>Approach: </strong>We introduced a personalized model update rule in Improved Ditto that dynamically determines the proportion of global model weights based on the L2-norm of the gradient-derived and global-model-derived terms. This method was evaluated using three types of CAD software: cerebral aneurysm detection in magnetic resonance (MR) angiography images (segmentation), brain metastasis detection in contrast-enhanced T1-weighted MR images (object detection), and liver lesion classification in gadolinium-ethoxybenzyl-diethylenetriamine pentaacetic acid-enhanced MR images (classification). The proposed method was compared with several conventional methods.</p><p><strong>Results: </strong>In two out of three CAD software, the performance of Improved Ditto was competitive with Ditto and other federated-learning-based methods. The proposed method achieved a narrower hyperparameter search space, which contributed to reducing the tuning costs. 
In addition, it improved the stability of personalized model updates, suggesting enhanced adaptability to diverse datasets and tasks.</p><p><strong>Conclusions: </strong>We demonstrate that dynamically adjusting global model weights during personalized model updates can improve the stability and adaptability of PFL. The proposed method reduces the hyperparameter tuning costs and offers potential benefits for CAD software.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 5","pages":"054503"},"PeriodicalIF":1.7,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12543029/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145355763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-09-01Epub Date: 2025-10-21DOI: 10.1117/1.JMI.12.5.055002
Hannah G Mason, Jack H Noble
Purpose: Cochlear implants (CIs) are neural prosthetics used to treat patients with severe-to-profound hearing loss. Patient-specific modeling of CI stimulation of the auditory nerve fiber (ANF) can help audiologists improve the CI programming. These models require localization of the ANFs relative to the surrounding anatomy and the CI. Localization is challenging because the ANFs are so small that they are not directly visible in clinical imaging. We hypothesize that the position of the ANFs can be accurately inferred from the location of the internal auditory canal (IAC), which has high contrast in CT because the ANFs pass through this canal between the cochlea and the brain.
Approach: Inspired by VoxelMorph, we propose a deep atlas-based IAC segmentation network. We create a single atlas in which the IAC and ANFs are pre-localized. Our network is trained to produce deformation fields (DFs) mapping coordinates from the atlas to new target volumes and that accurately segment the IAC. We hypothesize that DFs that accurately segment the IAC in target images will also facilitate accurate atlas-based localization of the ANFs. As opposed to VoxelMorph, which aims to produce DFs that accurately register the entire volume, our contribution is an entirely self-supervised training scheme that aims to produce DFs that accurately segment the target structure. This self-supervision is facilitated using a loss function inspired by the Mumford-Shah functional. We call our method Deep Atlas-Based Segmentation using Mumford-Shah (DABS-MS).
Results: Results show that DABS-MS outperforms VoxelMorph for IAC segmentation. Tests with publicly available datasets for trachea and kidney segmentation also show significant improvement in segmentation accuracy, demonstrating the generalizability of the method.
Conclusions: Our proposed DABS-MS method can accurately segment the IAC, which can then facilitate the localization of the ANFs. This patient-specific modeling of CI stimulation of the ANFs can help audiologists improve the CI programming, leading to better outcomes for patients with severe-to-profound hearing loss.
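The Mumford-Shah-inspired self-supervision above rewards deformation fields whose warped atlas label splits the target image into intensity-homogeneous regions. A minimal region-term sketch in the Chan-Vese simplification of the Mumford-Shah functional is below; it illustrates the kind of loss involved, not the authors' exact formulation:

```python
import numpy as np

def mumford_shah_region_loss(image, soft_mask):
    """Chan-Vese-style region term: penalize intensity variance inside
    and outside a (soft) segmentation mask. With a good mask on a
    piecewise-constant image, the loss approaches zero."""
    m = np.clip(soft_mask, 0.0, 1.0)
    c_in = (image * m).sum() / (m.sum() + 1e-8)              # mean inside
    c_out = (image * (1 - m)).sum() / ((1 - m).sum() + 1e-8)  # mean outside
    return (m * (image - c_in) ** 2
            + (1 - m) * (image - c_out) ** 2).mean()

# A mask that matches the bright region gives a near-zero loss
img = np.array([[1.0, 1.0, 0.0, 0.0]])
loss = mumford_shah_region_loss(img, np.array([[1.0, 1.0, 0.0, 0.0]]))
```

Because the loss depends only on the target image and the warped mask, no target-side ground-truth labels are needed, which is what makes the training scheme entirely self-supervised.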
{"title":"DABS-MS: deep atlas-based segmentation using the Mumford-Shah functional.","authors":"Hannah G Mason, Jack H Noble","doi":"10.1117/1.JMI.12.5.055002","DOIUrl":"10.1117/1.JMI.12.5.055002","url":null,"abstract":"<p><strong>Purpose: </strong>Cochlear implants (CIs) are neural prosthetics used to treat patients with severe-to-profound hearing loss. Patient-specific modeling of CI stimulation of the auditory nerve fiber (ANF) can help audiologists improve the CI programming. These models require localization of the ANFs relative to the surrounding anatomy and the CI. Localization is challenging because the ANFs are so small that they are not directly visible in clinical imaging. We hypothesize that the position of the ANFs can be accurately inferred from the location of the internal auditory canal (IAC), which has high contrast in CT because the ANFs pass through this canal between the cochlea and the brain.</p><p><strong>Approach: </strong>Inspired by VoxelMorph, we propose a deep atlas-based IAC segmentation network. We create a single atlas in which the IAC and ANFs are pre-localized. Our network is trained to produce deformation fields (DFs) mapping coordinates from the atlas to new target volumes and that accurately segment the IAC. We hypothesize that DFs that accurately segment the IAC in target images will also facilitate accurate atlas-based localization of the ANFs. As opposed to VoxelMorph, which aims to produce DFs that accurately register the entire volume, our contribution is an entirely self-supervised training scheme that aims to produce DFs that accurately segment the target structure. This self-supervision is facilitated using a loss function inspired by the Mumford-Shah functional. We call our method Deep Atlas-Based Segmentation using Mumford-Shah (DABS-MS).</p><p><strong>Results: </strong>Results show that DABS-MS outperforms VoxelMorph for IAC segmentation. 
Tests with publicly available datasets for trachea and kidney segmentation also show significant improvement in segmentation accuracy, demonstrating the generalizability of the method.</p><p><strong>Conclusions: </strong>Our proposed DABS-MS method can accurately segment the IAC, which can then facilitate the localization of the ANFs. This patient-specific modeling of CI stimulation of the ANFs can help audiologists improve the CI programming, leading to better outcomes for patients with severe-to-profound hearing loss.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 5","pages":"055002"},"PeriodicalIF":1.7,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12539791/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145349288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-09-01Epub Date: 2025-10-23DOI: 10.1117/1.JMI.12.5.051801
Frank Tong, Elizabeth A Krupinski
The editorial introduces the Special Section on Medical Image Perception and Observer Performance for JMI Volume 12 Issue 5.
{"title":"Advancing Medical Image Perception and Quality Assessment Through Technology and Human Factors Research.","authors":"Frank Tong, Elizabeth A Krupinski","doi":"10.1117/1.JMI.12.5.051801","DOIUrl":"https://doi.org/10.1117/1.JMI.12.5.051801","url":null,"abstract":"<p><p>The editorial introduces the Special Section on Medical Image Perception and Observer Performance for JMI Volume 12 Issue 5.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 5","pages":"051801"},"PeriodicalIF":1.7,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12547280/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145372992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}