Deep Convolutional Backbone Comparison for Automated PET Image Quality Assessment
Jessica B. Hopson; Anthime Flaus; Colm J. McGinnity; Radhouene Neji; Andrew J. Reader; Alexander Hammers
IEEE Transactions on Radiation and Plasma Medical Sciences, vol. 8, no. 8, pp. 893-901, 2024-08-01
DOI: 10.1109/TRPMS.2024.3436697
Citations: 0
Abstract
Pretraining deep convolutional network mappings on natural images helps with medical imaging analysis tasks; this is important given the limited number of clinically annotated medical images. However, many 2-D pretrained backbone networks are currently available, making the choice of backbone nontrivial. This work compared 18 different backbones from 5 architecture groups (pretrained on ImageNet) for the task of assessing [18F]FDG brain positron emission tomography (PET) image quality (reconstructed at seven simulated doses), based on three clinical image quality metrics (global quality rating, pattern recognition, and diagnostic confidence). Using 2-D randomly sampled patches, up to eight patients (at three dose levels each) were used for training, with three separate patient datasets used for testing. Each backbone was trained five times with the same training and validation sets, and with six cross-folds. Training only the final fully connected layer (with ~6000–20000 trainable parameters) achieved a test mean absolute error (MAE) of ~0.5 (which was within the intrinsic uncertainty of clinical scoring). To compare "classical" and over-parameterized regimes, the pretrained weights of the last 40% of the network layers were then unfrozen. The MAE fell below 0.5 for 14 out of the 18 backbones assessed, including two that previously failed to train. Generally, backbones with residual units (e.g., DenseNets and ResNetV2s) were best suited to this task, achieving the lowest MAE at test time (~0.45–0.5). This proof-of-concept study shows that over-parameterization may also be important for automated PET image quality assessment.