Power and reproducibility in the external validation of brain-phenotype predictions

Matthew Rosenblatt, Link Tejavibulya, Huili Sun, Chris C. Camp, Milana Khaitova, Brendan D. Adkinson, Rongtao Jiang, Margaret L. Westwater, Stephanie Noble, Dustin Scheinost

Nature Human Behaviour, 31 July 2024. DOI: 10.1038/s41562-024-01931-7
Brain-phenotype predictive models seek to identify reproducible and generalizable brain-phenotype associations. External validation, or the evaluation of a model in external datasets, is the gold standard for evaluating the generalizability of models in neuroimaging. Unlike typical studies, external validation involves two sample sizes: the training sample size and the external sample size. Thus, traditional power calculations may not be appropriate. Here we ran over 900 million resampling-based simulations in functional and structural connectivity data to investigate the relationship between training sample size, external sample size, phenotype effect size, theoretical power and simulated power. Our analysis included a wide range of datasets: the Healthy Brain Network, the Adolescent Brain Cognitive Development Study, the Human Connectome Project (Development and Young Adult), the Philadelphia Neurodevelopmental Cohort, the Queensland Twin Adolescent Brain Project and the Chinese Human Connectome Project; and phenotypes: age, body mass index, matrix reasoning, working memory, attention problems, anxiety/depression symptoms and relational processing. High effect size predictions achieved adequate power with training and external sample sizes of a few hundred individuals, whereas low and medium effect size predictions required hundreds to thousands of training and external samples. In addition, most previous external validation studies used sample sizes prone to low power, and theoretical power curves should be adjusted for the training sample size. Furthermore, model performance in internal validation often informed subsequent external validation performance (Pearson’s r difference <0.2), particularly for well-harmonized datasets. These results could help decide how to power future external validation studies.

Rosenblatt et al. run over 900 million resampling-based simulations in functional and structural connectivity data to show that low and medium effect size predictions require training and external samples in the hundreds to thousands of participants.
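To make the two-sample-size logic concrete, the sketch below contrasts the classical theoretical power curve for detecting a Pearson correlation, which depends only on the external sample size, with a resampling-based estimate of simulated power in which both the training and external samples are subsampled, a model is refit each time, and significance is tested on the external predictions. This is a minimal illustration under stated assumptions, not the authors' pipeline: the ridge regression model, the synthetic Gaussian "connectivity" data, the positive-and-significant success criterion and all parameter values are choices made for the example.

# Minimal sketch of resampling-based power estimation for external
# validation, in the spirit of the paper's simulations. All specifics
# here (ridge model, synthetic data, parameter values) are illustrative
# assumptions, not the authors' method.
import numpy as np
from scipy.stats import norm, pearsonr
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def theoretical_power(r, n_external, alpha=0.05):
    # Classical two-sided power to detect Pearson's r via the Fisher z
    # approximation. Note that it sees only the external sample size;
    # the paper argues such curves should also account for n_train.
    z = np.arctanh(r) * np.sqrt(n_external - 3)
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.cdf(z - z_crit) + norm.cdf(-z - z_crit)

def simulated_power(X_tr, y_tr, X_ex, y_ex, n_train, n_ext,
                    n_iter=200, alpha=0.05):
    # Repeatedly subsample both datasets, fit on the training subsample
    # and test the predicted-vs-observed correlation in the external
    # subsample; power = fraction of significant iterations.
    hits = 0
    for _ in range(n_iter):
        tr = rng.choice(len(y_tr), size=n_train, replace=False)
        ex = rng.choice(len(y_ex), size=n_ext, replace=False)
        model = Ridge(alpha=1.0).fit(X_tr[tr], y_tr[tr])
        r, p = pearsonr(model.predict(X_ex[ex]), y_ex[ex])
        if p < alpha and r > 0:  # count only positive, significant r
            hits += 1
    return hits / n_iter

# Synthetic stand-in for two harmonized connectivity datasets:
# 2,000 'participants' each, 500 edges, a shared weak linear signal.
n, n_edges = 2000, 500
w = rng.normal(size=n_edges) * 0.02
X_tr, X_ex = rng.normal(size=(n, n_edges)), rng.normal(size=(n, n_edges))
y_tr = X_tr @ w + rng.normal(size=n)
y_ex = X_ex @ w + rng.normal(size=n)

for n_train, n_ext in [(100, 100), (400, 400)]:
    sim = simulated_power(X_tr, y_tr, X_ex, y_ex, n_train, n_ext)
    print(f"n_train={n_train}, n_ext={n_ext}: "
          f"simulated power={sim:.2f}, "
          f"theoretical (r=0.2)={theoretical_power(0.2, n_ext):.2f}")

Comparing the two estimates illustrates the paper's central point: because the theoretical curve ignores the training sample size, it can overstate power when n_train is small, which is why simulated power from resampling is the more faithful planning tool for external validation.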
About the journal:
Nature Human Behaviour publishes research of outstanding significance into any aspect of human behaviour, spanning its psychological, biological and social bases, as well as its origins, development and disorders. The journal aims to raise the visibility of research in the field and to enhance its societal reach and impact.