Chenwei Tang, Laura B Eisenmenger, Leonardo Rivera-Rivera, Eugene Huo, Jacqueline C Junn, Anthony D Kuner, Thekla H Oechtering, Anthony Peret, Jitka Starekova, Kevin M Johnson
Title: Incorporating Radiologist Knowledge Into MRI Quality Metrics for Machine Learning Using Rank-Based Ratings
Journal: Journal of Magnetic Resonance Imaging (published 2024-12-17)
DOI: 10.1002/jmri.29672 (https://doi.org/10.1002/jmri.29672)
Abstract
Background: Deep learning (DL) often requires an image quality metric; however, widely used metrics are not designed for medical images.
Purpose: To develop an image quality metric that is specific to MRI using radiologists' image rankings and DL models.
Study type: Retrospective.
Population: A total of 19,344 rankings on 2916 unique image pairs from the NYU fastMRI Initiative neuro database were used to train the neural network-based image quality metrics, with an 80%/20% training/validation split and fivefold cross-validation.
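As an illustration of the split described above, the following is a minimal sketch of fivefold cross-validation, in which each fold holds out 20% of the data for validation. It assumes the folds are formed over unique image pairs so that repeated rankings of one pair never appear on both sides of a split; the abstract does not say how the folds were formed. The function name `kfold_by_pair` and the toy data are hypothetical.

```python
import random

def kfold_by_pair(pair_ids, k=5, seed=0):
    """Yield (train_ids, val_ids); each unique pair id is validated exactly once."""
    unique = sorted(set(pair_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Deal the shuffled ids into k folds of near-equal size.
    folds = [unique[i::k] for i in range(k)]
    for i in range(k):
        val = set(folds[i])
        train = set(unique) - val
        yield train, val

# Hypothetical toy data: 10 image pairs, each ranked by 3 readers.
rankings = [(pair_id, reader) for pair_id in range(10) for reader in range(3)]

for train_ids, val_ids in kfold_by_pair([p for p, _ in rankings], k=5):
    train = [r for r in rankings if r[0] in train_ids]
    val = [r for r in rankings if r[0] in val_ids]
    print(len(train), len(val))  # 80% / 20% of the rankings in each fold
```

Grouping by pair is one design choice among several; splitting by subject or by source image would be stricter still.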
Field strength/sequence: 1.5 T and 3 T; T1, T1 postcontrast, T2, and fluid-attenuated inversion recovery (FLAIR).
Assessment: Synthetically corrupted image pairs were ranked by radiologists (N = 7), with a subset of radiologists (N = 2) also scoring images on a Likert scale. DL models were trained to match the rankings using two architectures (EfficientNet and IQ-Net), with and without reference image subtraction, and were compared to rankings based on mean squared error (MSE) and structural similarity (SSIM). The image quality-assessing DL models were then evaluated as alternatives to MSE and SSIM as optimization targets for DL denoising and reconstruction.
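To make the comparison above concrete, here is a minimal sketch of how a reference-based metric such as MSE can be turned into a pairwise ranking and scored against a reader's choices. It also shows a Bradley-Terry style logistic loss, which is one common way to train a scoring network from pairwise rankings. The loss is an assumption: the abstract does not state which loss the authors used. All function names and the toy pixel values are hypothetical.

```python
import math

def mse(ref, img):
    """Mean squared error between two equally sized flat pixel lists."""
    return sum((r - x) ** 2 for r, x in zip(ref, img)) / len(ref)

def prefer_by_mse(ref, img_a, img_b):
    """Return 0 if img_a is closer to the reference (lower MSE), else 1."""
    return 0 if mse(ref, img_a) <= mse(ref, img_b) else 1

def ranking_accuracy(predicted, chosen):
    """Fraction of pairs where the metric's preference matches the reader's."""
    return sum(p == c for p, c in zip(predicted, chosen)) / len(chosen)

def pairwise_logistic_loss(score_a, score_b, chosen):
    """Assumed training loss: small when the chosen image has the higher score."""
    margin = score_a - score_b if chosen == 0 else score_b - score_a
    return math.log1p(math.exp(-margin))

# Hypothetical toy data: one reference and two corrupted pairs (flattened pixels).
ref = [0.0, 1.0, 2.0, 3.0]
pairs = [
    ([0.1, 1.1, 2.1, 3.1], [0.0, 1.0, 2.0, 4.0]),  # mild global offset vs. one large local error
    ([0.0, 1.0, 2.5, 3.0], [0.2, 0.8, 2.0, 3.0]),  # one moderate local error vs. two small errors
]
chosen = [0, 0]  # hypothetical reader preferences (index of the preferred image)

predicted = [prefer_by_mse(ref, a, b) for a, b in pairs]
print(ranking_accuracy(predicted, chosen))  # 0.5: MSE agrees with the reader on one of two pairs
```

In this toy case MSE disagrees with the reader on the second pair, which is the kind of mismatch a learned, ranking-trained metric is meant to reduce.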
Statistical tests: Radiologists' agreement was assessed by a percentage metric and quadratic weighted Cohen's kappa. Ranking accuracies were compared using repeated-measures analysis of variance. Reconstruction models trained with the IQ-Net score, MSE, and SSIM were compared by paired t test. P < 0.05 was considered significant.
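The two agreement measures named above can be sketched as follows. This is the textbook definition of percentage agreement and of Cohen's kappa with quadratic weights, not necessarily the exact procedure used in the study. Ratings are assumed to be integer-coded from 0 to `n_categories - 1`, and the example scores are hypothetical.

```python
def percent_agreement(a, b):
    """Fraction of items on which two readers gave the same rating."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def quadratic_weighted_kappa(a, b, n_categories):
    """Cohen's kappa with quadratic weights for ordinal ratings 0..n_categories-1.

    Undefined (division by zero) if both readers use one identical category throughout.
    """
    n = len(a)
    # Observed joint distribution of the two readers' ratings.
    observed = [[0.0] * n_categories for _ in range(n_categories)]
    for x, y in zip(a, b):
        observed[x][y] += 1.0 / n
    row = [sum(observed[i]) for i in range(n_categories)]
    col = [sum(observed[i][j] for i in range(n_categories)) for j in range(n_categories)]
    num = den = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            # Disagreement weight grows with the squared distance between ratings.
            w = (i - j) ** 2 / (n_categories - 1) ** 2
            num += w * observed[i][j]     # observed weighted disagreement
            den += w * row[i] * col[j]    # disagreement expected by chance
    return 1.0 - num / den

# Hypothetical 5-point Likert scores from two readers on five images.
reader_1 = [0, 1, 2, 3, 4]
reader_2 = [0, 2, 2, 3, 3]
print(percent_agreement(reader_1, reader_2))  # 0.6
print(quadratic_weighted_kappa(reader_1, reader_2, n_categories=5))
```

Because the quadratic weights penalize near-misses only lightly, kappa can remain high even when exact percentage agreement on a Likert scale is low.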
Results: Compared to direct Likert scoring, ranking produced a higher level of agreement between radiologists (70.4% vs. 25%). Image ranking was subjective, with a high level of intraobserver agreement (94.9% ± 2.4%) and lower interobserver agreement (61.47% ± 5.51%). IQ-Net and EfficientNet accurately predicted rankings with a reference image (75.2% ± 1.3% and 79.2% ± 1.7%, respectively). However, EfficientNet resulted in images with artifacts and high MSE when used in denoising tasks, while IQ-Net-optimized networks performed well for both denoising and reconstruction tasks.
Data conclusion: Image quality networks can be trained from image ranking and used to optimize DL tasks.
Level of evidence: 3.
Technical efficacy: Stage 1.
About the journal:
The Journal of Magnetic Resonance Imaging (JMRI) is an international journal devoted to the timely publication of basic and clinical research, educational and review articles, and other information related to the diagnostic applications of magnetic resonance.