Imputing Brain Measurements Across Data Sets via Graph Neural Networks.

PRedictive Intelligence in MEdicine. PRIME (Workshop). Pub Date: 2023-01-01. DOI: 10.1007/978-3-031-46005-0_15

Yixin Wang, Wei Peng, Susan F Tapert, Qingyu Zhao, Kilian M Pohl

Volume 14277, pages 172-183. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10634632/pdf/


Abstract

Publicly available data sets of structural MRIs might not contain specific measurements of brain Regions of Interest (ROIs) that are important for training machine learning models. For example, the curvature scores computed by Freesurfer are not released by the Adolescent Brain Cognitive Development (ABCD) Study. One can address this issue by simply reapplying Freesurfer to the data set. However, this approach is generally computationally and labor intensive (e.g., requiring quality control). An alternative is to impute the missing measurements via a deep learning approach. However, state-of-the-art methods are designed to estimate randomly missing values rather than entire measurements. We therefore propose to re-frame the imputation problem as a prediction task on another (public) data set that contains the missing measurements and shares some ROI measurements with the data sets of interest. A deep learning model is then trained to predict the missing measurements from the shared ones and is afterwards applied to the other data sets. Our proposed algorithm models the dependencies between ROI measurements via a graph neural network (GNN) and accounts for demographic differences in brain measurements (e.g., sex) by feeding the graph encoding into a parallel architecture. The architecture simultaneously optimizes a graph decoder to impute values and a classifier to predict demographic factors. We test the approach, called Demographic Aware Graph-based Imputation (DAGI), on imputing those missing Freesurfer measurements of ABCD (N=3760; minimum age 12 years) by training the predictor on those publicly released by the National Consortium on Alcohol and Neurodevelopment in Adolescence (NCANDA, N=540). Five-fold cross-validation on NCANDA reveals that the imputed scores are more accurate than those generated by linear regressors and deep learning models. Adding the imputed scores to a classifier trained to identify sex also results in higher accuracy than using only the Freesurfer scores provided by ABCD.
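The re-framing described in the abstract (train on a data set that has both the shared and the missing measurements, then apply the trained predictor to a data set that has only the shared ones) can be sketched with a plain least-squares predictor. This is a minimal illustration standing in for the kind of linear baseline the abstract compares against; it is not DAGI and contains no graph neural network. The cohort values, the meaning of the two shared features, and all function names are hypothetical.

```python
# Illustrative only: a linear stand-in for the imputation-as-prediction setup.
# All data and names are hypothetical; this is NOT the DAGI model.

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(shared, target):
    """Least-squares weights (intercept first) mapping shared scores to one missing score."""
    rows = [[1.0] + list(x) for x in shared]  # prepend the intercept term
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, shared):
    """Apply the fitted weights to subjects that only have the shared scores."""
    return [weights[0] + sum(w * v for w, v in zip(weights[1:], x)) for x in shared]


# Source cohort (plays the role of NCANDA): shared scores AND the score missing elsewhere.
source_shared = [[2.5, 1.0], [2.7, 1.2], [2.4, 0.9], [2.9, 1.1], [2.6, 1.4]]
# Synthetic, noise-free ground truth so the fit can be checked by hand.
source_missing = [0.1 + 0.05 * t - 0.02 * a for t, a in source_shared]

# Target cohort (plays the role of ABCD): only the shared scores are available.
target_shared = [[2.8, 1.3], [2.3, 1.0]]

weights = fit_linear(source_shared, source_missing)
imputed = predict(weights, target_shared)
print(imputed)
```

The point of the sketch is the data flow rather than the model: the predictor never sees target-cohort values of the missing measurement, so an entire measurement column can be imputed, in contrast to methods that fill in randomly missing entries. According to the abstract, DAGI replaces the linear map with a GNN encoder over ROI measurements whose encoding feeds a graph decoder and a demographic classifier that are optimized jointly.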


