TARGETING UNDERREPRESENTED POPULATIONS IN PRECISION MEDICINE: A FEDERATED TRANSFER LEARNING APPROACH.

IF 1.3; Region 4 (Mathematics); Q2 STATISTICS & PROBABILITY; Annals of Applied Statistics; Pub Date: 2023-12-01; Epub Date: 2023-10-30; DOI: 10.1214/23-AOAS1747
By Sai Li, Tianxi Cai, Rui Duan
Annals of Applied Statistics, vol. 17, no. 4, pp. 2970-2992. Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11417462/pdf/
Citations: 0

Abstract

The limited representation of minorities and disadvantaged populations in large-scale clinical and genomics research poses a significant barrier to translating precision medicine research into practice. Prediction models are likely to underperform in underrepresented populations due to heterogeneity across populations, thereby exacerbating known health disparities. To address this issue, we propose FETA, a two-way data integration method that leverages a federated transfer learning approach to integrate heterogeneous data from diverse populations and multiple healthcare institutions, with a focus on a target population of interest having limited sample sizes. We show that FETA achieves performance comparable to the pooled analysis, where individual-level data is shared across institutions, with only a small number of communications across participating sites. Our theoretical analysis and simulation study demonstrate how FETA's estimation accuracy is influenced by communication budgets, privacy restrictions, and heterogeneity across populations. We apply FETA to multisite data from the electronic Medical Records and Genomics (eMERGE) Network to construct genetic risk prediction models for extreme obesity. Compared to models trained using target data only, source data only, and all data without accounting for population-level differences, FETA shows superior predictive performance. FETA has the potential to improve estimation and prediction accuracy in underrepresented populations and reduce the gap in model performance across populations.
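
The abstract does not spell out FETA's algorithm, so the following is only a minimal sketch of the two ideas it names, under simplifying assumptions: a linear model, sites that share aggregate summaries instead of individual-level data, and a target fit that is shrunk toward the source estimate. The function names (`site_summary`, `federated_ols`, `transfer_ridge`), the simulated coefficients, and the penalty value `lam` are all hypothetical and are not taken from the paper.

```python
import numpy as np

def site_summary(X, y):
    """Computed locally at a site; only these aggregates leave the site."""
    return X.T @ X, X.T @ y

def federated_ols(summaries):
    """Combine per-site summaries. For a linear model this reproduces
    ordinary least squares on the pooled individual-level data."""
    gram = sum(g for g, _ in summaries)
    moment = sum(m for _, m in summaries)
    return np.linalg.solve(gram, moment)

def transfer_ridge(X_t, y_t, w_source, lam):
    """Shrink the target fit toward the source estimate:
    argmin_b ||y_t - X_t b||^2 + lam * ||b - w_source||^2."""
    p = X_t.shape[1]
    resid = y_t - X_t @ w_source
    delta = np.linalg.solve(X_t.T @ X_t + lam * np.eye(p), X_t.T @ resid)
    return w_source + delta

rng = np.random.default_rng(0)
p = 5
beta_source = np.array([1.0, -0.5, 0.0, 2.0, 0.3])
# Hypothetical population-level shift between source and target
beta_target = beta_source + np.array([0.0, 0.2, 0.0, -0.3, 0.0])

# Three simulated source sites with comparatively large samples
sites = []
for n in (400, 300, 500):
    X = rng.normal(size=(n, p))
    y = X @ beta_source + rng.normal(size=n)
    sites.append((X, y))

# Simulated target population with a small sample
X_t = rng.normal(size=(30, p))
y_t = X_t @ beta_target + rng.normal(size=30)

# One round of communication: each site sends its summaries once
w_fed = federated_ols([site_summary(X, y) for X, y in sites])

# Target-specific estimate that borrows strength from the source fit
b_transfer = transfer_ridge(X_t, y_t, w_fed, lam=10.0)
```

In this toy setting the summary-based estimate matches the pooled fit exactly after a single exchange, which is special to linear regression; other models would generally need iterative updates. Setting `lam` to 0 recovers the target-only fit, while a very large `lam` returns the source estimate unchanged, so the penalty controls how much is borrowed across populations.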

Source journal
Annals of Applied Statistics (Social Sciences: Statistics & Probability)
CiteScore: 3.10
Self-citation rate: 5.60%
Articles published: 131
Review time: 6-12 weeks
Journal description: Statistical research spans an enormous range from direct subject-matter collaborations to pure mathematical theory. The Annals of Applied Statistics, the newest journal from the IMS, is aimed at papers in the applied half of this range. Published quarterly in both print and electronic form, the journal aims to provide a timely and unified forum for all areas of applied statistics.