Integrated partially linear model for multi-centre studies with heterogeneity and batch effect in covariates

IF 1.2 | Zone 4, Mathematics | Q2 STATISTICS & PROBABILITY | Statistics | Pub Date: 2023-09-03 | DOI: 10.1080/02331888.2023.2258429
Lei Yang, Yongzhao Shao
Citations: 0

Abstract

Multi-centre studies are increasingly used to borrow strength from multiple research groups and obtain reproducible study findings. Regression analysis is widely used for analysing multi-group studies; however, some of the regression predictors are nonlinear and/or often measured with batch effects. Also, the group compositions are potentially heterogeneous across different centres. The conventional pooled data analysis can cause biased regression estimates. This paper proposes an integrated partially linear regression model (IPLM) to account simultaneously for predictors' nonlinearity, general batch effects, group composition heterogeneity, and potential measurement error in covariates. A local linear regression-based approach is employed to estimate the nonlinear component, and a regularization procedure is introduced to identify the predictors' effects. The IPLM-based method has estimation consistency and variable-selection consistency. Moreover, it has a fast computing algorithm, and its effectiveness is supported by simulation studies. A multi-centre Alzheimer's disease research project is provided to illustrate the proposed IPLM-based analysis.

Keywords: multi-centre study; data harmonization; partially linear regression model; general batch effects; group composition heterogeneity

Acknowledgements
The authors would like to thank the reviewers and the associate editor for careful reading and for many constructive suggestions. The authors would like to thank Drs. Mony de Leon, Ricardo Osorio, and Elizabeth Pirraglia for sharing with us the NYU Alzheimer's disease data sets used in Section 5 for the illustration of our proposed model and analysis. The NYU study data are available from figshare (https://figshare.com/s/16d233d4822b810bcd9b, DOI: 10.6084/m9.figshare.5758554). One part of the data used in the preparation of the example in Section 5 of this article was obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu/data-samples/access-data/). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in the design, analysis or writing of this report. A complete list of ADNI investigators is at: http://adni.loni.usc.edu/wpcontent/uploads/how to apply/ADNI Acknowledgement List.pdf.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This research was partially supported by the United States National Institutes of Health grants (NIA grants P30AG066512, P01AG060882; NCI grants P50CA225450, P30CA016087) and Centers for Disease Control and Prevention (CDC) grant U01OH012486.
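
Illustrative sketch
The abstract mentions a local linear regression step for the nonlinear component of a partially linear model. The listing gives no formulas, so the Python sketch below is not the authors' IPLM. It is a generic, textbook-style illustration under stated assumptions: a single smoothing variable t, a Gaussian kernel, a fixed bandwidth, and a double-residual (Robinson-style) estimate of the linear coefficients. The function names local_linear_fit and partially_linear_fit are hypothetical. Batch effects, group heterogeneity, measurement error, and the regularization step are all omitted.

```python
import numpy as np


def local_linear_fit(t_train, y_train, t_eval, bandwidth):
    """Local linear regression with a Gaussian kernel (illustrative only).

    At each evaluation point t0, fit the weighted least-squares line
    y ~ a + b * (t - t0) and return the intercept a as the estimate of g(t0).
    """
    t_train = np.asarray(t_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    t_eval = np.atleast_1d(np.asarray(t_eval, dtype=float))
    estimates = np.empty(t_eval.shape[0])
    for i, t0 in enumerate(t_eval):
        d = t_train - t0
        w = np.exp(-0.5 * (d / bandwidth) ** 2)  # Gaussian kernel weights
        design = np.column_stack([np.ones_like(d), d])
        weighted = design * w[:, None]
        # Solve the weighted normal equations (D'WD) coef = D'Wy.
        coef = np.linalg.solve(design.T @ weighted, weighted.T @ y_train)
        estimates[i] = coef[0]
    return estimates


def partially_linear_fit(X, t, y, bandwidth):
    """Double-residual fit of y = X @ beta + g(t) + noise (assumed model form)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Remove the smooth dependence on t from y and from each column of X.
    y_res = y - local_linear_fit(t, y, t, bandwidth)
    X_res = np.column_stack(
        [X[:, j] - local_linear_fit(t, X[:, j], t, bandwidth)
         for j in range(X.shape[1])]
    )
    # Ordinary least squares on the residuals estimates the linear part.
    beta, *_ = np.linalg.lstsq(X_res, y_res, rcond=None)
    # Smooth what is left of y to estimate the nonlinear component g.
    g_hat = local_linear_fit(t, y - X @ beta, t, bandwidth)
    return beta, g_hat
```

In practice the bandwidth would be chosen by cross-validation rather than fixed, and a variable-selection step like the one the abstract describes would add a penalty to the least-squares fit on the residuals.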
Source journal
Statistics (Mathematics: Statistics & Probability)
CiteScore: 1.00
Self-citation rate: 0.00%
Articles published: 59
Review time: 12 months
Journal description: Statistics publishes papers developing and analysing new methods for any active field of statistics, motivated by real-life problems. Papers submitted for consideration should provide interesting and novel contributions to statistical theory and its applications with rigorous mathematical results and proofs. Moreover, numerical simulations and application to real data sets can improve the quality of papers, and should be included where appropriate. Statistics does not publish papers which represent mere application of existing procedures to case studies, and papers are required to contain methodological or theoretical innovation. Topics of interest include, for example, nonparametric statistics, time series, and analysis of topological or functional data. Furthermore, the journal also welcomes submissions in the field of theoretical econometrics and its links to mathematical statistics.