Using deep feature distances for evaluating the perceptual quality of MR image reconstructions
Philip M Adamson, Arjun D Desai, Jeffrey Dominic, Maya Varma, Christian Bluethgen, Jeff P Wood, Ali B Syed, Robert D Boutin, Kathryn J Stevens, Shreyas Vasanawala, John M Pauly, Beliz Gunel, Akshay S Chaudhari
Magnetic Resonance in Medicine, published 2025-02-08. DOI: 10.1002/mrm.30437 (https://doi.org/10.1002/mrm.30437)
Citations: 0
Abstract
Purpose: Commonly used MR image quality (IQ) metrics have poor concordance with radiologist-perceived diagnostic IQ. Here, we develop and explore deep feature distances (DFDs), that is, distances computed in a lower-dimensional feature space encoded by a convolutional neural network (CNN), as improved perceptual IQ metrics for MR image reconstruction. We further explore the impact of distribution shifts between the images used to train the DFD CNN encoder and the images on which the IQ metric is evaluated.
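As a rough illustration of what a deep feature distance computes, the sketch below encodes two images with a toy encoder and takes the mean squared difference between the resulting feature maps. Everything in it is an assumption made for illustration: the names `FILTERS`, `encode`, and `deep_feature_distance` are hypothetical, the three fixed 3x3 filters merely stand in for a trained CNN encoder, and the squared-error distance is one possible choice rather than the one used in the paper.

```python
import numpy as np

# Hypothetical fixed 3x3 filters standing in for a trained CNN encoder.
FILTERS = np.array([
    [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],   # horizontal gradient
    [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],   # vertical gradient
    [[0, 1, 0], [1, -4, 1], [0, 1, 0]],     # Laplacian
], dtype=float)


def encode(image):
    """Toy encoder: 3x3 'valid' filtering, ReLU, then 2x2 average pooling."""
    # All 3x3 patches of the image, shape (H-2, W-2, 3, 3).
    windows = np.lib.stride_tricks.sliding_window_view(image, (3, 3))
    # Apply every filter to every patch, giving shape (filters, H-2, W-2).
    feats = np.einsum("ijkl,fkl->fij", windows, FILTERS)
    feats = np.maximum(feats, 0.0)  # ReLU
    f, h, w = feats.shape
    h2, w2 = h - h % 2, w - w % 2   # crop to even size before pooling
    feats = feats[:, :h2, :w2]
    return feats.reshape(f, h2 // 2, 2, w2 // 2, 2).mean(axis=(2, 4))


def deep_feature_distance(reference, reconstruction):
    """Mean squared difference between the encoded feature maps."""
    diff = encode(reference) - encode(reconstruction)
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)
reference = rng.random((16, 16))
noisy = reference + 0.1 * rng.standard_normal((16, 16))

print(encode(reference).shape)                      # (3, 7, 7): 147 features from 256 pixels
print(deep_feature_distance(reference, reference))  # 0.0 for identical images
print(deep_feature_distance(reference, noisy))      # positive for a degraded image
```

In practice the toy encoder would be replaced by a network trained on natural, medical, or MR images, which is exactly the choice the study varies.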
Methods: We compare commonly used IQ metrics (PSNR and SSIM) to two "out-of-domain" DFDs with encoders trained on natural images, an "in-domain" DFD trained on MR images alone, and two domain-adjacent DFDs trained on large medical imaging datasets. We additionally compare these with several state-of-the-art but less commonly reported IQ metrics: visual information fidelity (VIF), noise quality metric (NQM), and the high-frequency error norm (HFEN). IQ metric performance is assessed via correlations with the scores that five expert radiologist readers assigned to the perceived diagnostic IQ of various accelerated MR image reconstructions. We characterize the behavior of these IQ metrics under common distortions expected during image acquisition, including their sensitivity to acquisition noise.
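To make the evaluation step concrete, the sketch below correlates per-image metric values with reader scores. The abstract does not say which correlation coefficient is used; Spearman rank correlation is assumed here purely for illustration. The function names are hypothetical, and the numbers are made-up toy values, not data from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up toy values: a distance metric per reconstruction and the
# corresponding mean reader score (higher score = better perceived IQ).
metric_values = [0.10, 0.25, 0.40, 0.55]
reader_scores = [5, 4, 3, 2]

print(spearman(metric_values, reader_scores))  # -1.0: larger distance, lower score
```

A distance-type metric that tracks perceived quality well shows a strongly negative correlation with reader scores, since larger distances should accompany lower scores.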
Results: All DFDs and HFEN correlate more strongly with radiologist-perceived diagnostic IQ than SSIM, PSNR, and other state-of-the-art metrics, with correlations being comparable to radiologist inter-reader variability. Surprisingly, out-of-domain DFDs perform comparably to in-domain and domain-adjacent DFDs.
Conclusion: A suite of IQ metrics, including DFDs and HFEN, should be used alongside commonly reported IQ metrics for a more holistic evaluation of the perceptual quality of MR image reconstructions. We also observe that general vision encoders are capable of assessing visual IQ even for MR images.
About the journal:
Magnetic Resonance in Medicine (Magn Reson Med) is an international journal devoted to the publication of original investigations concerned with all aspects of the development and use of nuclear magnetic resonance and electron paramagnetic resonance techniques for medical applications. Reports of original investigations in the areas of mathematics, computing, engineering, physics, biophysics, chemistry, biochemistry, and physiology directly relevant to magnetic resonance will be accepted, as well as methodology-oriented clinical studies.