Purpose: Recently, diffusion posterior sampling (DPS), where score-based diffusion priors are combined with likelihood models, has been used to produce high-quality computed tomography (CT) images given low-quality measurements. This technique permits one-time, unsupervised training of a CT prior, which can then be incorporated with an arbitrary data model. However, current methods rely on a linear model of X-ray CT physics to reconstruct. Although it is common to linearize the transmission tomography reconstruction problem, this is an approximation to the true and inherently nonlinear forward model. We propose a DPS method that integrates a general nonlinear measurement model.
Approach: We implement a traditional unconditional diffusion model by training a prior score function estimator and apply Bayes' rule to combine this prior with a measurement likelihood score function derived from the nonlinear physical model to arrive at a posterior score function that can be used to sample the reverse-time diffusion process. We develop computational enhancements for the approach and evaluate the reconstruction approach in several simulation studies.
Results: The proposed nonlinear DPS provides improved performance over traditional reconstruction methods and DPS with a linear model. Moreover, as compared with a conditionally trained deep learning approach, the nonlinear DPS approach shows a better ability to provide high-quality images for different acquisition protocols.
Conclusion: This plug-and-play method allows the incorporation of a diffusion-based prior with a general nonlinear CT measurement model. This permits the application of the approach to different systems, protocols, etc., without the need for any additional training.
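The score-combination step at the heart of DPS can be sketched on a toy one-dimensional conjugate-Gaussian example (illustrative only; the paper's likelihood comes from the nonlinear CT forward model, and all parameters below are assumptions):

```python
# Illustrative sketch (not the paper's CT model) of diffusion posterior
# sampling's score combination: by Bayes' rule,
# log p(x|y) = log p(y|x) + log p(x) + const, so the posterior score is
# the prior score plus the likelihood score. A 1-D conjugate-Gaussian toy
# example makes the result checkable in closed form.

MU0, SIGMA0 = 0.0, 1.0   # hypothetical Gaussian prior N(MU0, SIGMA0^2)
SIGMA = 0.5              # hypothetical Gaussian measurement noise

def prior_score(x):
    # d/dx log N(x; MU0, SIGMA0^2)
    return -(x - MU0) / SIGMA0**2

def likelihood_score(x, y):
    # d/dx log N(y; x, SIGMA^2)
    return (y - x) / SIGMA**2

def posterior_score(x, y):
    # Bayes' rule in score form: grad log p(x|y) = grad log p(x) + grad log p(y|x)
    return prior_score(x) + likelihood_score(x, y)

# Sanity check: the posterior score vanishes at the conjugate posterior mean.
y = 1.0
post_mean = (MU0 / SIGMA0**2 + y / SIGMA**2) / (1 / SIGMA0**2 + 1 / SIGMA**2)
assert abs(posterior_score(post_mean, y)) < 1e-12
```

In the actual method, `prior_score` is replaced by the trained score network and `likelihood_score` by the gradient of the nonlinear measurement log-likelihood.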
{"title":"CT reconstruction using diffusion posterior sampling conditioned on a nonlinear measurement model.","authors":"Shudong Li, Xiao Jiang, Matthew Tivnan, Grace J Gang, Yuan Shen, J Webster Stayman","doi":"10.1117/1.JMI.11.4.043504","DOIUrl":"10.1117/1.JMI.11.4.043504","url":null,"abstract":"<p><strong>Purpose: </strong>Recently, diffusion posterior sampling (DPS), where score-based diffusion priors are combined with likelihood models, has been used to produce high-quality computed tomography (CT) images given low-quality measurements. This technique permits one-time, unsupervised training of a CT prior, which can then be incorporated with an arbitrary data model. However, current methods rely on a linear model of X-ray CT physics to reconstruct. Although it is common to linearize the transmission tomography reconstruction problem, this is an approximation to the true and inherently nonlinear forward model. We propose a DPS method that integrates a general nonlinear measurement model.</p><p><strong>Approach: </strong>We implement a traditional unconditional diffusion model by training a prior score function estimator and apply Bayes' rule to combine this prior with a measurement likelihood score function derived from the nonlinear physical model to arrive at a posterior score function that can be used to sample the reverse-time diffusion process. We develop computational enhancements for the approach and evaluate the reconstruction approach in several simulation studies.</p><p><strong>Results: </strong>The proposed nonlinear DPS provides improved performance over traditional reconstruction methods and DPS with a linear model. 
Moreover, as compared with a conditionally trained deep learning approach, the nonlinear DPS approach shows a better ability to provide high-quality images for different acquisition protocols.</p><p><strong>Conclusion: </strong>This plug-and-play method allows the incorporation of a diffusion-based prior with a general nonlinear CT measurement model. This permits the application of the approach to different systems, protocols, etc., without the need for any additional training.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"11 4","pages":"043504"},"PeriodicalIF":1.9,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11362816/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142113459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-07-01 | Epub Date: 2024-08-06 | DOI: 10.1117/1.JMI.11.4.044506
Steven Squires, Alistair Mackenzie, Dafydd Gareth Evans, Sacha J Howell, Susan M Astley
Purpose: Breast density is associated with the risk of developing cancer and can be automatically estimated using deep learning models from digital mammograms. Our aim is to evaluate the capacity and reliability of such models to predict density from low-dose mammograms taken to enable risk estimates for younger women.
Approach: We trained deep learning models on standard-dose and simulated low-dose mammograms. The models were then tested on a mammography dataset with paired standard- and low-dose images. The effect of different factors (including age, density, and dose ratio) on the differences between predictions on standard and low doses is analyzed. Methods to improve performance are assessed, and factors that reduce the model quality are demonstrated.
Results: We showed that, although many factors have no significant effect on the quality of low-dose density prediction, both density and breast area have an impact. The correlation between density predictions on low- and standard-dose images of breasts with the largest breast area is 0.985 (0.949 to 0.995), whereas that with the smallest is 0.882 (0.697 to 0.961). We also demonstrated that averaging across craniocaudal-mediolateral oblique (CC-MLO) images and across repeatedly trained models can improve predictive performance.
Conclusions: Low-dose mammography can be used to produce density and risk estimates that are comparable to standard-dose images. Averaging across CC-MLO and model predictions should improve this performance. The model quality is reduced when making predictions on denser and smaller breasts.
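The two averaging strategies the abstract reports as helpful can be illustrated with a synthetic sketch (invented numbers, not study data):

```python
import numpy as np

# Synthetic illustration of averaging CC and MLO view predictions before
# correlating low-dose with standard-dose density estimates. All values
# here are invented for illustration.

rng = np.random.default_rng(0)
true_density = rng.uniform(5, 60, size=200)  # hypothetical density scores

def view_prediction(noise_sd):
    # a per-view model prediction = truth plus view-specific noise
    return true_density + rng.normal(0.0, noise_sd, size=true_density.size)

std_cc, std_mlo = view_prediction(2.0), view_prediction(2.0)  # standard dose
low_cc, low_mlo = view_prediction(4.0), view_prediction(4.0)  # noisier low dose

def pearson(a, b):
    return float(np.corrcoef(a, b)[0, 1])

r_single = pearson(std_cc, low_cc)
r_avg = pearson((std_cc + std_mlo) / 2.0, (low_cc + low_mlo) / 2.0)
# Averaging two views with independent noise halves the noise variance,
# which typically raises the low-/standard-dose correlation.
```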
{"title":"Capability and reliability of deep learning models to make density predictions on low-dose mammograms.","authors":"Steven Squires, Alistair Mackenzie, Dafydd Gareth Evans, Sacha J Howell, Susan M Astley","doi":"10.1117/1.JMI.11.4.044506","DOIUrl":"10.1117/1.JMI.11.4.044506","url":null,"abstract":"<p><strong>Purpose: </strong>Breast density is associated with the risk of developing cancer and can be automatically estimated using deep learning models from digital mammograms. Our aim is to evaluate the capacity and reliability of such models to predict density from low-dose mammograms taken to enable risk estimates for younger women.</p><p><strong>Approach: </strong>We trained deep learning models on standard-dose and simulated low-dose mammograms. The models were then tested on a mammography dataset with paired standard- and low-dose images. The effect of different factors (including age, density, and dose ratio) on the differences between predictions on standard and low doses is analyzed. Methods to improve performance are assessed, and factors that reduce the model quality are demonstrated.</p><p><strong>Results: </strong>We showed that, although many factors have no significant effect on the quality of low-dose density prediction, both density and breast area have an impact. The correlation between density predictions on low- and standard-dose images of breasts with the largest breast area is 0.985 (0.949 to 0.995), whereas that with the smallest is 0.882 (0.697 to 0.961). We also demonstrated that averaging across craniocaudal-mediolateral oblique (CC-MLO) images and across repeatedly trained models can improve predictive performance.</p><p><strong>Conclusions: </strong>Low-dose mammography can be used to produce density and risk estimates that are comparable to standard-dose images. Averaging across CC-MLO and model predictions should improve this performance. 
The model quality is reduced when making predictions on denser and smaller breasts.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"11 4","pages":"044506"},"PeriodicalIF":1.9,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11301609/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141903210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-07-01 | Epub Date: 2024-08-01 | DOI: 10.1117/1.JMI.11.4.044005
Artyom Tsanda, Hannes Nickisch, Tobias Wissel, Tobias Klinder, Tobias Knopp, Michael Grass
Purpose: The trend towards lower radiation doses and advances in computed tomography (CT) reconstruction may impair the operation of pretrained segmentation models, giving rise to the problem of estimating the dose robustness of existing models. Previous studies addressing this issue suffer either from a lack of registered low- and full-dose CT images or from simplified simulations.
Approach: We employed raw data from full-dose acquisitions to simulate low-dose CT scans, avoiding the need to rescan a patient. The accuracy of the simulation is validated using a real CT scan of a phantom. We consider down to 20% reduction of radiation dose, for which we measure deviations of several pretrained segmentation models from the full-dose prediction. In addition, compatibility with existing denoising methods is considered.
Results: The results reveal the surprising robustness of the TotalSegmentator approach, showing minimal differences at the pixel level even without denoising. Less robust models show good compatibility with the denoising methods, which improve robustness in almost all cases. With denoising based on a convolutional neural network (CNN), the median Dice between low- and full-dose data does not fall below 0.9 (and the Hausdorff distance does not exceed 12) for all but one model. We observe volatile results for labels with effective radii less than 19 mm and improved results for contrast-enhanced CT acquisitions.
Conclusion: The proposed approach facilitates clinically relevant analysis of dose robustness for human organ segmentation models. The results outline the robustness properties of a diverse set of models. Further studies are needed to identify the robustness of approaches for lesion segmentation and to rank the factors contributing to dose robustness.
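A minimal sketch (assumed formulation) of the Dice overlap used to compare a segmentation on low-dose data against the full-dose prediction:

```python
import numpy as np

# Dice coefficient between two binary segmentation masks; the masks below
# are toy stand-ins for a full-dose prediction and a low-dose prediction.

def dice(a, b):
    """Dice coefficient between two binary masks (1.0 for two empty masks)."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

full = np.zeros((32, 32), dtype=bool); full[8:24, 8:24] = True  # 16x16 "organ"
low = np.zeros((32, 32), dtype=bool); low[9:25, 8:24] = True    # shifted 1 px
# overlap 15x16 = 240 pixels; each mask has 256 -> Dice = 480/512 = 0.9375
assert abs(dice(full, low) - 0.9375) < 1e-12
```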
{"title":"Dose robustness of deep learning models for anatomic segmentation of computed tomography images.","authors":"Artyom Tsanda, Hannes Nickisch, Tobias Wissel, Tobias Klinder, Tobias Knopp, Michael Grass","doi":"10.1117/1.JMI.11.4.044005","DOIUrl":"10.1117/1.JMI.11.4.044005","url":null,"abstract":"<p><strong>Purpose: </strong>The trend towards lower radiation doses and advances in computed tomography (CT) reconstruction may impair the operation of pretrained segmentation models, giving rise to the problem of estimating the dose robustness of existing segmentation models. Previous studies addressing the issue suffer either from a lack of registered low- and full-dose CT images or from simplified simulations.</p><p><strong>Approach: </strong>We employed raw data from full-dose acquisitions to simulate low-dose CT scans, avoiding the need to rescan a patient. The accuracy of the simulation is validated using a real CT scan of a phantom. We consider down to 20% reduction of radiation dose, for which we measure deviations of several pretrained segmentation models from the full-dose prediction. In addition, compatibility with existing denoising methods is considered.</p><p><strong>Results: </strong>The results reveal the surprising robustness of the TotalSegmentator approach, showing minimal differences at the pixel level even without denoising. Less robust models show good compatibility with the denoising methods, which help to improve robustness in almost all cases. With denoising based on a convolutional neural network (CNN), the median Dice between low- and full-dose data does not fall below 0.9 (12 for the Hausdorff distance) for all but one model. We observe volatile results for labels with effective radii less than 19 mm and improved results for contrasted CT acquisitions.</p><p><strong>Conclusion: </strong>The proposed approach facilitates clinically relevant analysis of dose robustness for human organ segmentation models. 
The results outline the robustness properties of a diverse set of models. Further studies are needed to identify the robustness of approaches for lesion segmentation and to rank the factors contributing to dose robustness.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"11 4","pages":"044005"},"PeriodicalIF":1.9,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11293838/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141890472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-07-01 | Epub Date: 2024-08-30 | DOI: 10.1117/1.JMI.11.4.040101
Bennett Landman
The editorial discusses highlights from JMI Issue 4.
{"title":"Highlights from JMI Issue 4.","authors":"Bennett Landman","doi":"10.1117/1.JMI.11.4.040101","DOIUrl":"https://doi.org/10.1117/1.JMI.11.4.040101","url":null,"abstract":"<p><p>The editorial discusses highlights from JMI Issue 4.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"11 4","pages":"040101"},"PeriodicalIF":1.9,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11368250/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142126996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-07-01 | Epub Date: 2024-07-09 | DOI: 10.1117/1.JMI.11.4.044501
Daniel D Liang, David D Liang, Marc J Pomeroy, Yongfeng Gao, Licheng R Kuo, Lihong C Li
Purpose: Medical imaging-based machine learning (ML) for computer-aided diagnosis of in vivo lesions consists of two basic modules: (i) feature extraction from non-invasively acquired medical images and (ii) feature classification for predicting the malignancy of lesions detected or localized in those images. This study investigates the individual performance of each module for the diagnosis of low-dose computed tomography (CT) screening-detected pulmonary nodules and colorectal polyps.
Approach: Three feature extraction methods were investigated. The first uses the gray-level co-occurrence matrix, a mathematical texture descriptor, to extract Haralick image texture features (HFs). The second uses a convolutional neural network (CNN) architecture to extract deep learning (DL) image abstractive features (DFs). The third uses the interactions between lesion tissues and the X-ray energy of CT to extract tissue-energy specific characteristic features (TFs). Each of these three feature categories was classified by a random forest (RF) classifier and compared with the end-to-end DL-CNN method, which reads the images, extracts the DFs, and classifies them in a single pipeline. The ML diagnosis of lesions, i.e., prediction of lesion malignancy, was measured by the area under the receiver operating characteristic curve (AUC). Three lesion image datasets were used. The lesions' tissue pathological reports were used as the learning labels.
Results: Experiments on the three datasets produced AUC values of 0.724 to 0.878 for the HFs, 0.652 to 0.965 for the DFs, and 0.985 to 0.996 for the TFs, compared with 0.694 to 0.964 for the end-to-end DL-CNN. These outcomes indicate that the RF classifier performed comparably to the DL-CNN classification module and that extraction of tissue-energy specific characteristic features dramatically improved the AUC values.
Conclusions: The feature extraction module is more important than the feature classification module. Extraction of tissue-energy specific characteristic features is more important than extraction of image abstractive and characteristic features.
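The study's figure of merit can be sketched as AUC computed via the Mann-Whitney U statistic, which equals the area under the ROC curve (the scores and labels below are invented for illustration):

```python
import numpy as np

# AUC via pairwise comparison of classifier scores for malignant (positive)
# versus benign (negative) lesions; ties count half. Toy data only.

def auc(scores, labels):
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum()   # malignant scored higher
    ties = (pos[:, None] == neg[None, :]).sum()  # ties count half
    return (wins + 0.5 * ties) / (pos.size * neg.size)

labels = np.array([1, 1, 1, 0, 0, 0], dtype=bool)  # 1 = malignant
scores = np.array([0.9, 0.8, 0.4, 0.7, 0.3, 0.2])  # classifier outputs
# 8 of the 9 (malignant, benign) pairs are ranked correctly -> AUC = 8/9
assert abs(auc(scores, labels) - 8.0 / 9.0) < 1e-12
```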
{"title":"Examining feature extraction and classification modules in machine learning for diagnosis of low-dose computed tomographic screening-detected <i>in vivo</i> lesions.","authors":"Daniel D Liang, David D Liang, Marc J Pomeroy, Yongfeng Gao, Licheng R Kuo, Lihong C Li","doi":"10.1117/1.JMI.11.4.044501","DOIUrl":"10.1117/1.JMI.11.4.044501","url":null,"abstract":"<p><strong>Purpose: </strong>Medical imaging-based machine learning (ML) for computer-aided diagnosis of <i>in vivo</i> lesions consists of two basic components or modules of (i) feature extraction from non-invasively acquired medical images and (ii) feature classification for prediction of malignancy of lesions detected or localized in the medical images. This study investigates their individual performances for diagnosis of low-dose computed tomography (CT) screening-detected lesions of pulmonary nodules and colorectal polyps.</p><p><strong>Approach: </strong>Three feature extraction methods were investigated. One uses the mathematical descriptor of gray-level co-occurrence image texture measure to extract the Haralick image texture features (HFs). One uses the convolutional neural network (CNN) architecture to extract deep learning (DL) image abstractive features (DFs). The third one uses the interactions between lesion tissues and X-ray energy of CT to extract tissue-energy specific characteristic features (TFs). All the above three categories of extracted features were classified by the random forest (RF) classifier with comparison to the DL-CNN method, which reads the images, extracts the DFs, and classifies the DFs in an end-to-end manner. The ML diagnosis of lesions or prediction of lesion malignancy was measured by the area under the receiver operating characteristic curve (AUC). Three lesion image datasets were used. 
The lesions' tissue pathological reports were used as the learning labels.</p><p><strong>Results: </strong>Experiments on the three datasets produced AUC values of 0.724 to 0.878 for the HFs, 0.652 to 0.965 for the DFs, and 0.985 to 0.996 for the TFs, compared to the DL-CNN of 0.694 to 0.964. These experimental outcomes indicate that the RF classifier performed comparably to the DL-CNN classification module and the extraction of tissue-energy specific characteristic features dramatically improved AUC value.</p><p><strong>Conclusions: </strong>The feature extraction module is more important than the feature classification module. Extraction of tissue-energy specific characteristic features is more important than extraction of image abstractive and characteristic features.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"11 4","pages":"044501"},"PeriodicalIF":1.9,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11234229/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141591735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Purpose: Tetralogy of Fallot (TOF) is a congenital heart disease, and patients undergo surgical repair early in their lives. The evaluation of TOF patients is continuous through their adulthood. The use of cardiac magnetic resonance imaging (CMR) is vital for the evaluation of TOF patients. We aim to correlate advanced MRI sequences [parametric longitudinal relaxation time (T1), extracellular volume (ECV) mapping] with cardiac functionality to provide biomarkers for the evaluation of these patients.
Methods: A complete CMR examination with the same imaging protocol was conducted in 11 TOF patients and a control group of 25 healthy individuals. A Modified Look-Locker Inversion recovery (MOLLI) sequence was included to acquire global myocardial T1 relaxation times of the left ventricle (LV) pre- and post-contrast administration. Dedicated software (Circle cmr42) was used for the CMR analysis and the calculation of native T1, post-contrast T1, and ECV maps. A regression analysis was conducted for the correlation between global LV T1 values and right ventricular (RV) functional indices.
Results: Statistically significant results were obtained for RV cardiac index [RV_CI = -32.765 + 0.029 × T1 native; p = 0.003], RV end-diastolic volume [RV_EDV/BSA = -1023.872 + 0.902 × T1 native; p = 0.001], and RV end-systolic volume [RV_ESV/BSA = -536.704 + 0.472 × T1 native; p = 0.011].
Conclusions: We further support the diagnostic importance of T1 mapping as a structural imaging tool in CMR. In addition to the well-known affected RV function in TOF patients, the LV structure is also impaired as there is a strong correlation between LV T1 mapping and RV function, evoking that the heart operates as an entity.
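A worked example applying the regression equations reported in the Results (coefficients from the abstract; the native T1 value of 1200 ms is an illustrative input, not a study measurement):

```python
# Reported linear regressions of RV functional indices on native LV T1.
# The input T1 below is hypothetical.

def rv_ci(t1_native):
    """RV cardiac index: RV_CI = -32.765 + 0.029 * T1_native."""
    return -32.765 + 0.029 * t1_native

def rv_edv_bsa(t1_native):
    """RV end-diastolic volume / BSA: -1023.872 + 0.902 * T1_native."""
    return -1023.872 + 0.902 * t1_native

def rv_esv_bsa(t1_native):
    """RV end-systolic volume / BSA: -536.704 + 0.472 * T1_native."""
    return -536.704 + 0.472 * t1_native

t1 = 1200.0  # hypothetical native T1 in ms
# e.g. rv_ci(1200.0) = -32.765 + 34.8 = 2.035
assert abs(rv_ci(t1) - 2.035) < 1e-9
```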
{"title":"Left ventricular structural integrity on tetralogy of Fallot patients: approach using longitudinal relaxation time mapping.","authors":"Giorgos Broumpoulis, Efstratios Karavasilis, Niki Lama, Ioannis Papadopoulos, Panagiotis Zachos, Sotiria Apostolopoulou, Nikolaos Kelekis","doi":"10.1117/1.JMI.11.4.044004","DOIUrl":"10.1117/1.JMI.11.4.044004","url":null,"abstract":"<p><strong>Purpose: </strong>Tetralogy of Fallot (TOF) is a congenital heart disease, and patients undergo surgical repair early in their lives. The evaluation of TOF patients is continuous through their adulthood. The use of cardiac magnetic resonance imaging (CMR) is vital for the evaluation of TOF patients. We aim to correlate advanced MRI sequences [parametric longitudinal relaxation time (T1), extracellular volume (ECV) mapping] with cardiac functionality to provide biomarkers for the evaluation of these patients.</p><p><strong>Methods: </strong>A complete CMR examination with the same imaging protocol was conducted in a total of 11 TOF patients and a control group of 25 healthy individuals. A Modified Look-Locker Inversion recovery (MOLLI) sequence was included to acquire the global T1 myocardial relaxation times of the left ventricular (LV) pre and post-contrast administration. Appropriate software (Circle cmr42) was used for the CMR analysis and the calculation of native, post-contrast T1, and ECV maps. 
A regression analysis was conducted for the correlation between global LV T1 values and right ventricular (RV) functional indices.</p><p><strong>Results: </strong>Statistically significant results were obtained for RV cardiac index [RV_CI= -32.765 + 0.029 × T1 native; <math><mrow><mi>p</mi> <mo>=</mo> <mn>0.003</mn></mrow> </math> ], RV end diastolic volume [RV_EDV/BSA = -1023.872 + 0.902 × T1 native; <math><mrow><mi>p</mi> <mo>=</mo> <mn>0.001</mn></mrow> </math> ], and RV end systolic volume [RV_ESV/BSA = -536.704 + 0.472 × T1 native; <math><mrow><mi>p</mi> <mo>=</mo> <mn>0.011</mn></mrow> </math> ].</p><p><strong>Conclusions: </strong>We further support the diagnostic importance of T1 mapping as a structural imaging tool in CMR. In addition to the well-known affected RV function in TOF patients, the LV structure is also impaired as there is a strong correlation between LV T1 mapping and RV function, evoking that the heart operates as an entity.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"11 4","pages":"044004"},"PeriodicalIF":1.9,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11293558/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141890473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-07-01 | Epub Date: 2024-08-13 | DOI: 10.1117/1.JMI.11.4.045503
Joshua D Herman, Rachel E Roca, Alexandra G O'Neill, Marcus L Wong, Sajan Goud Lingala, Angel R Pineda
Purpose: Recent research explores the use of neural networks to reconstruct undersampled magnetic resonance imaging (MRI) data. Because of the complexity of the artifacts in the reconstructed images, there is a need for task-based approaches to image quality assessment. We compared conventional global quantitative metrics for evaluating image quality in undersampled images generated by a neural network with human observer performance in a detection task. The purpose is to determine which acceleration (2×, 3×, 4×, or 5×) would be chosen by the conventional metrics and compare it with the acceleration chosen on the basis of human observer performance.
Approach: We used common global metrics for evaluating image quality: the normalized root mean squared error (NRMSE) and structural similarity (SSIM). These metrics are compared with a measure of image quality that incorporates a subtle signal for a specific task to allow for image quality assessment that locally evaluates the effect of undersampling on a signal. We used a U-Net to reconstruct undersampled images with 2×, 3×, 4×, and 5× one-dimensional undersampling rates. Cross-validation was performed for a 500- and a 4000-image training set with both SSIM and MSE losses. A two-alternative forced choice (2-AFC) observer study was carried out for detecting a subtle signal (small blurred disk) in images using the models trained on the 4000-image set.
Results: We found that for both loss functions, the human observer performance on the 2-AFC studies led to a choice of a 2× undersampling, but the SSIM and NRMSE led to a choice of a 3× undersampling.
Conclusions: For this detection task, which uses a subtle, small signal at the edge of detectability, SSIM and NRMSE overestimated the undersampling achievable with a U-Net before a steep loss of image quality: the global metrics selected 3× acceleration, whereas human observer performance in the detection task supported only 2×.
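The NRMSE global metric that the study compares against task-based observer performance can be sketched as follows (one common normalization convention, assumed here; the arrays are toy stand-ins for a reference and a reconstruction):

```python
import numpy as np

# NRMSE between a fully sampled reference and a reconstruction, normalized
# by the root-mean-square value of the reference. Toy data only; SSIM is
# omitted as it requires windowed local statistics.

def nrmse(ref, test):
    ref = np.asarray(ref, dtype=float)
    test = np.asarray(test, dtype=float)
    return float(np.sqrt(np.mean((ref - test) ** 2)) /
                 np.sqrt(np.mean(ref ** 2)))

ref = np.array([1.0, 2.0, 3.0, 4.0])  # "fully sampled" reference
recon = ref + 0.5                     # reconstruction with a constant bias
# RMSE = 0.5 and the reference RMS value is sqrt(7.5)
assert abs(nrmse(ref, recon) - 0.5 / np.sqrt(7.5)) < 1e-12
```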
{"title":"Task-based assessment for neural networks: evaluating undersampled MRI reconstructions based on human observer signal detection.","authors":"Joshua D Herman, Rachel E Roca, Alexandra G O'Neill, Marcus L Wong, Sajan Goud Lingala, Angel R Pineda","doi":"10.1117/1.JMI.11.4.045503","DOIUrl":"10.1117/1.JMI.11.4.045503","url":null,"abstract":"<p><strong>Purpose: </strong>Recent research explores using neural networks to reconstruct undersampled magnetic resonance imaging. Because of the complexity of the artifacts in the reconstructed images, there is a need to develop task-based approaches to image quality. We compared conventional global quantitative metrics to evaluate image quality in undersampled images generated by a neural network with human observer performance in a detection task. The purpose is to study which acceleration (2×, 3×, 4×, 5×) would be chosen with the conventional metrics and compare it to the acceleration chosen by human observer performance.</p><p><strong>Approach: </strong>We used common global metrics for evaluating image quality: the normalized root mean squared error (NRMSE) and structural similarity (SSIM). These metrics are compared with a measure of image quality that incorporates a subtle signal for a specific task to allow for image quality assessment that locally evaluates the effect of undersampling on a signal. We used a U-Net to reconstruct under-sampled images with 2×, 3×, 4×, and 5× one-dimensional undersampling rates. Cross-validation was performed for a 500- and a 4000-image training set with both SSIM and MSE losses. 
A two-alternative forced choice (2-AFC) observer study was carried out for detecting a subtle signal (small blurred disk) from images with the 4000-image training set.</p><p><strong>Results: </strong>We found that for both loss functions, the human observer performance on the 2-AFC studies led to a choice of a 2× undersampling, but the SSIM and NRMSE led to a choice of a 3× undersampling.</p><p><strong>Conclusions: </strong>For this detection task using a subtle small signal at the edge of detectability, SSIM and NRMSE led to an overestimate of the achievable undersampling using a U-Net before a steep loss of image quality between 2×, 3×, 4×, 5× undersampling rates when compared to the performance of human observers in the detection task.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"11 4","pages":"045503"},"PeriodicalIF":1.9,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11321363/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141983636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-07-01 | Epub Date: 2024-08-23 | DOI: 10.1117/1.JMI.11.4.044006
Taebin Kim, Yao Li, Benjamin C Calhoun, Aatish Thennavan, Lisa A Carey, W Fraser Symmans, Melissa A Troester, Charles M Perou, J S Marron
Purpose: We address the need for effective stain domain adaptation methods in histopathology to enhance the performance of downstream computational tasks, particularly classification. Existing methods exhibit varying strengths and weaknesses, prompting the exploration of a different approach. The focus is on improving stain color consistency, expanding the stain domain scope, and minimizing the domain gap between image batches.
Approach: We introduce a new domain adaptation method, Stain simultaneous augmentation and normalization (SAN), designed to adjust the distribution of stain colors to align with a target distribution. Stain SAN combines the merits of established methods, such as stain normalization, stain augmentation, and stain mix-up, while mitigating their inherent limitations. Stain SAN adapts stain domains by resampling stain color matrices from a well-structured target distribution.
Results: Experimental evaluations of cross-dataset clinical estrogen receptor status classification demonstrate the efficacy of Stain SAN and its superior performance compared with existing stain adaptation methods; in one case, the area under the curve (AUC) increased by 11.4%. Overall, the results trace the successive improvements across the development of these methods, culminating in the substantial enhancement provided by Stain SAN. Furthermore, Stain SAN achieves results comparable to the state-of-the-art generative adversarial network-based approach without requiring separate training for stain adaptation or access to the target domain during training. Its performance is on par with HistAuGAN, demonstrating both effectiveness and computational efficiency.
Conclusions: Stain SAN emerges as a promising solution, addressing the potential shortcomings of contemporary stain adaptation methods. Its effectiveness is underscored by notable improvements in the context of clinical estrogen receptor status classification, where it achieves the best AUC performance. The findings endorse Stain SAN as a robust approach for stain domain adaptation in histopathology images, with implications for advancing computational tasks in the field.
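A hedged sketch of the core idea described above: decompose each image into a stain color matrix and per-pixel concentrations, then re-render it under a stain matrix resampled from a target distribution. The Gaussian target, the `adapt_stains` interface, and the H&E-like matrix below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def rgb_to_od(rgb):
    """Beer-Lambert: RGB transmittance in (0, 1] -> optical density."""
    return -np.log(np.clip(rgb, 1e-6, 1.0))

def resample_stain_matrix(mean, std):
    """Draw a 3x2 stain color matrix (columns ~ H and E) from a Gaussian
    target distribution, renormalizing each column to unit length."""
    s = rng.normal(mean, std)
    return s / np.linalg.norm(s, axis=0, keepdims=True)

def adapt_stains(image_rgb, stain_matrix, target_mean, target_std):
    """OD = S @ C; keep the concentrations C, swap in a resampled S'."""
    h, w, _ = image_rgb.shape
    od = rgb_to_od(image_rgb).reshape(-1, 3).T                 # 3 x N pixels
    conc, *_ = np.linalg.lstsq(stain_matrix, od, rcond=None)   # 2 x N
    od_new = resample_stain_matrix(target_mean, target_std) @ conc
    return np.exp(-od_new).T.reshape(h, w, 3)

# Illustrative H&E-like stain matrix (columns: hematoxylin, eosin).
S = np.array([[0.65, 0.07],
              [0.70, 0.99],
              [0.29, 0.11]])
img = rng.uniform(0.2, 1.0, size=(4, 4, 3))
out = adapt_stains(img, S, target_mean=S, target_std=0.02)
```

Centering the target distribution on the source matrix with a small spread, as here, behaves like stain augmentation; shifting the mean toward a reference domain's matrix behaves like normalization.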
{"title":"Stain SAN: simultaneous augmentation and normalization for histopathology images.","authors":"Taebin Kim, Yao Li, Benjamin C Calhoun, Aatish Thennavan, Lisa A Carey, W Fraser Symmans, Melissa A Troester, Charles M Perou, J S Marron","doi":"10.1117/1.JMI.11.4.044006","DOIUrl":"10.1117/1.JMI.11.4.044006","url":null,"abstract":"<p><strong>Purpose: </strong>We address the need for effective stain domain adaptation methods in histopathology to enhance the performance of downstream computational tasks, particularly classification. Existing methods exhibit varying strengths and weaknesses, prompting the exploration of a different approach. The focus is on improving stain color consistency, expanding the stain domain scope, and minimizing the domain gap between image batches.</p><p><strong>Approach: </strong>We introduce a new domain adaptation method, Stain simultaneous augmentation and normalization (SAN), designed to adjust the distribution of stain colors to align with a target distribution. Stain SAN combines the merits of established methods, such as stain normalization, stain augmentation, and stain mix-up, while mitigating their inherent limitations. Stain SAN adapts stain domains by resampling stain color matrices from a well-structured target distribution.</p><p><strong>Results: </strong>Experimental evaluations of cross-dataset clinical estrogen receptor status classification demonstrate the efficacy of Stain SAN and its superior performance compared with existing stain adaptation methods. In one case, the area under the curve (AUC) increased by 11.4%. Overall, our results clearly show the improvements made over the history of the development of these methods culminating with substantial enhancement provided by Stain SAN. 
Furthermore, we show that Stain SAN achieves results comparable with the state-of-the-art generative adversarial network-based approach without requiring separate training for stain adaptation or access to the target domain during training. Stain SAN's performance is on par with HistAuGAN, proving its effectiveness and computational efficiency.</p><p><strong>Conclusions: </strong>Stain SAN emerges as a promising solution, addressing the potential shortcomings of contemporary stain adaptation methods. Its effectiveness is underscored by notable improvements in the context of clinical estrogen receptor status classification, where it achieves the best AUC performance. The findings endorse Stain SAN as a robust approach for stain domain adaptation in histopathology images, with implications for advancing computational tasks in the field.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"11 4","pages":"044006"},"PeriodicalIF":1.9,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11342968/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142056923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-01Epub Date: 2024-08-07DOI: 10.1117/1.JMI.11.4.044507
Mohammad Mehdi Farhangi, Michael Maynord, Cornelia Fermüller, Yiannis Aloimonos, Berkman Sahiner, Nicholas Petrick
Purpose: Synthetic datasets hold the potential to offer cost-effective alternatives to clinical data, ensuring privacy protections and potentially addressing biases in clinical data. We present a method leveraging such datasets to train a machine learning algorithm applied as part of a computer-aided detection (CADe) system.
Approach: Our proposed approach utilizes clinically acquired computed tomography (CT) scans of a physical anthropomorphic phantom into which manufactured lesions were inserted to train a machine learning algorithm. We treated the training database obtained from the anthropomorphic phantom as a simplified representation of clinical data and increased the variability in this dataset using a set of randomized and parameterized augmentations. Furthermore, to mitigate the inherent differences between phantom and clinical datasets, we investigated adding unlabeled clinical data into the training pipeline.
Results: We apply our proposed method to the false positive reduction stage of a lung nodule CADe system in CT scans, in which regions of interest containing potential lesions are classified as nodule or non-nodule regions. Experimental results demonstrate the effectiveness of the proposed method; the system trained on labeled data from physical phantom scans and unlabeled clinical data achieves a sensitivity of 90% at eight false positives per scan. Furthermore, the experimental results demonstrate the benefit of the physical phantom: performance, in terms of the competitive performance metric, increased by 6% when a training set consisting of 50 clinical CT scans was enlarged with the scans obtained from the physical phantom.
Conclusions: The scalability of synthetic datasets can lead to improved CADe performance, particularly in scenarios in which the size of the labeled clinical data is limited or subject to inherent bias. Our proposed approach demonstrates an effective utilization of synthetic datasets for training machine learning algorithms.
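The operating point quoted above (90% sensitivity at eight false positives per scan) is a standard candidate-level CADe summary. A self-contained sketch of how such a point can be read off classifier scores follows; the function name and toy data are illustrative, not the paper's evaluation code:

```python
import numpy as np

def sensitivity_at_fp_rate(scores, labels, scan_ids, fps_per_scan):
    """Sensitivity of a candidate classifier at a fixed false-positive budget.

    scores   : classifier score per candidate region of interest
    labels   : 1 for a true nodule, 0 for a non-nodule candidate
    scan_ids : which scan each candidate came from
    Sweeps candidates from highest to lowest score and stops once the total
    false-positive count exceeds fps_per_scan * (number of scans).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    fp_budget = fps_per_scan * len(set(scan_ids))
    tp = fp = 0
    for idx in np.argsort(-scores):
        if labels[idx] == 1:
            tp += 1
        else:
            fp += 1
            if fp > fp_budget:
                break
    return tp / labels.sum()

# Toy example: four candidates from two scans, budget of 0.5 FP/scan.
sens = sensitivity_at_fp_rate(
    scores=[0.9, 0.8, 0.7, 0.6],
    labels=[1, 0, 1, 0],
    scan_ids=[0, 0, 1, 1],
    fps_per_scan=0.5,
)
print(sens)  # 1.0: both nodules rank above the second false positive
```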
{"title":"Exploring synthetic datasets for computer-aided detection: a case study using phantom scan data for enhanced lung nodule false positive reduction.","authors":"Mohammad Mehdi Farhangi, Michael Maynord, Cornelia Fermüller, Yiannis Aloimonos, Berkman Sahiner, Nicholas Petrick","doi":"10.1117/1.JMI.11.4.044507","DOIUrl":"10.1117/1.JMI.11.4.044507","url":null,"abstract":"<p><strong>Purpose: </strong>Synthetic datasets hold the potential to offer cost-effective alternatives to clinical data, ensuring privacy protections and potentially addressing biases in clinical data. We present a method leveraging such datasets to train a machine learning algorithm applied as part of a computer-aided detection (CADe) system.</p><p><strong>Approach: </strong>Our proposed approach utilizes clinically acquired computed tomography (CT) scans of a physical anthropomorphic phantom into which manufactured lesions were inserted to train a machine learning algorithm. We treated the training database obtained from the anthropomorphic phantom as a simplified representation of clinical data and increased the variability in this dataset using a set of randomized and parameterized augmentations. Furthermore, to mitigate the inherent differences between phantom and clinical datasets, we investigated adding unlabeled clinical data into the training pipeline.</p><p><strong>Results: </strong>We apply our proposed method to the false positive reduction stage of a lung nodule CADe system in CT scans, in which regions of interest containing potential lesions are classified as nodule or non-nodule regions. Experimental results demonstrate the effectiveness of the proposed method; the system trained on labeled data from physical phantom scans and unlabeled clinical data achieves a sensitivity of 90% at eight false positives per scan. 
Furthermore, the experimental results demonstrate the benefit of the physical phantom, in which performance, in terms of the competitive performance metric, increased by 6% when a training set consisting of 50 clinical CT scans was enlarged with the scans obtained from the physical phantom.</p><p><strong>Conclusions: </strong>The scalability of synthetic datasets can lead to improved CADe performance, particularly in scenarios in which the size of the labeled clinical data is limited or subject to inherent bias. Our proposed approach demonstrates an effective utilization of synthetic datasets for training machine learning algorithms.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"11 4","pages":"044507"},"PeriodicalIF":1.9,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11304989/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141907942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-01Epub Date: 2024-08-10DOI: 10.1117/1.JMI.11.4.045502
Jelena M Mihailovic, Yoshihisa Kanaji, Daniel Miller, Malcolm R Bell, Kenneth A Fetterly
Purpose: Spatio-temporal variability in clinical fluoroscopy and cine angiography images combined with nonlinear image processing prevents the application of traditional image quality measurements in the cardiac catheterization laboratory. We aimed to develop and validate methods to measure human observer impressions of the image quality.
Approach: Multi-frame images of the thorax of a euthanized pig were acquired to provide an anatomical background. The detector dose was varied from 6 to 200 nGy (in 2× increments), and 0.6 and 1.0 mm focal spots were used. Two coronary stents with/without 0.5 mm separation and a synthetic right coronary artery (RCA) with hemispherical defects were embedded into the background images as test objects. Quantitative observer (n = 17) performance was measured using a two-alternative forced-choice test of whether the stents were separated and by a count of visible RCA defects. Qualitative impressions of noise, spatial resolution, and overall image quality were measured using a visual analog scale (VAS). A paired t-test and a multinomial logistic regression model were used to identify statistically significant factors affecting the observer's impression of image quality.
Results: The proportion of correct detections of stent separation and the number of reported RCA defects changed significantly with detector dose increments in the 6 to 100 nGy range (p < 0.05). Although a trend favored the 0.6 mm over the 1.0 mm focal spot for these quantitative assessments, the difference was not statistically significant. Visual analog scale measurements changed significantly with detector dose increments in the range of 24 to 100 nGy and with focal spot size (p < 0.05). Applying multinomial logistic regression analysis to observer VAS scores demonstrated sensitivity matching that of the paired t-test applied to the quantitative observer performance measurements.
Conclusions: Both quantitative and qualitative measurements of observer impressions of image quality were sensitive to the image quality changes associated with changing the detector dose and focal spot size. These findings encourage future work using qualitative image quality measurements to assess clinical fluoroscopy and angiography image quality.
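Of the two analyses named above, the paired t-test is the simpler; a minimal stdlib-only sketch of the statistic follows (degrees of freedom n-1; the study presumably used standard statistical software, and the paired scores below are hypothetical, not study data):

```python
import math

def paired_t_statistic(a, b):
    """Paired t statistic for matched samples, e.g., one observer's VAS
    scores under two focal-spot settings: t = mean(d) / (sd(d)/sqrt(n))."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

# Toy paired scores (hypothetical).
t = paired_t_statistic([1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 1.0, 2.0])
```

The statistic would then be compared against a t distribution with n-1 degrees of freedom to obtain the p-value reported in the Results.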
{"title":"Comparison of human observer impression of X-ray fluoroscopy and angiography image quality with technical changes to image quality.","authors":"Jelena M Mihailovic, Yoshihisa Kanaji, Daniel Miller, Malcolm R Bell, Kenneth A Fetterly","doi":"10.1117/1.JMI.11.4.045502","DOIUrl":"10.1117/1.JMI.11.4.045502","url":null,"abstract":"<p><strong>Purpose: </strong>Spatio-temporal variability in clinical fluoroscopy and cine angiography images combined with nonlinear image processing prevents the application of traditional image quality measurements in the cardiac catheterization laboratory. We aimed to develop and validate methods to measure human observer impressions of the image quality.</p><p><strong>Approach: </strong>Multi-frame images of the thorax of a euthanized pig were acquired to provide an anatomical background. The detector dose was varied from 6 to 200 nGy (increments 2×), and 0.6 and 1.0 mm focal spots were used. Two coronary stents with/without 0.5 mm separation and a synthetic right coronary artery (RCA) with hemispherical defects were embedded into the background images as test objects. The quantitative observer ( <math><mrow><mi>n</mi> <mo>=</mo> <mn>17</mn></mrow> </math> ) performance was measured using a two-alternating forced-choice test of whether stents were separated and by a count of visible right coronary artery defects. Qualitative impressions of noise, spatial resolution, and overall image quality were measured using a visual analog scale (VAS). A paired <math><mrow><mi>t</mi></mrow> </math> -test and multinomial logistic regression model were used to identify statistically significant factors affecting the observer's impression image quality.</p><p><strong>Results: </strong>The proportion of correct detection of stent separation and the number of reported right coronary artery defects changed significantly with detector dose increment in the 6 to 100 nGy ( <math><mrow><mi>p</mi> <mo><</mo> <mn>0.05</mn></mrow> </math> ). 
Although a trend favored the 0.6 versus 1.0 mm focal spot for these quantitative assessments, this was insignificant. Visual analog scale measurements changed significantly with detector dose increments in the range of 24 to 100 nGy and focal spot size ( <math><mrow><mi>p</mi> <mo><</mo> <mn>0.05</mn></mrow> </math> ). The application of multinomial logistic regression analysis to observer VAS scores demonstrated sensitivity matching of the paired <math><mrow><mi>t</mi></mrow> </math> -test applied to quantitative observer performance measurements.</p><p><strong>Conclusions: </strong>Both quantitative and qualitative measurements of observer impression of the image quality were sensitive to image quality changes associated with changing the detector dose and focal spot size. These findings encourage future work that uses qualitative image quality measurements to assess clinical fluoroscopy and angiography image quality.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"11 4","pages":"045502"},"PeriodicalIF":1.9,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11316400/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141917779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}