Integrated partially linear model for multi-centre studies with heterogeneity and batch effect in covariates

IF 1.2 | Zone 4, Mathematics | Q2, STATISTICS & PROBABILITY | Statistics | Pub Date: 2023-09-03 | DOI: 10.1080/02331888.2023.2258429

Lei Yang, Yongzhao Shao

Citations: 0

Integrated partially linear model for multi-centre studies with heterogeneity and batch effect in covariates

Abstract

Multi-centre studies are increasingly used to borrow strength from multiple research groups and obtain reproducible study findings. Regression analysis is widely used for analysing multi-group studies; however, some of the regression predictors are nonlinear and/or often measured with batch effects. Also, the group compositions are potentially heterogeneous across different centres. The conventional pooled data analysis can cause biased regression estimates. This paper proposes an integrated partially linear regression model (IPLM) to account simultaneously for predictors' nonlinearity, general batch effects, group composition heterogeneity, and potential measurement error in covariates. A local linear regression-based approach is employed to estimate the nonlinear component, and a regularization procedure is introduced to identify the predictors' effects. The IPLM-based method has estimation consistency and variable-selection consistency. Moreover, it has a fast computing algorithm, and its effectiveness is supported by simulation studies. A multi-centre Alzheimer's disease research project is provided to illustrate the proposed IPLM-based analysis.

Keywords: Multi-centre study; data harmonization; partially linear regression model; general batch effects; group composition heterogeneity

Acknowledgements

The authors would like to thank the reviewers and the associate editor for careful reading and for many constructive suggestions. The authors would like to thank Drs. Mony de Leon, Ricardo Osorio, and Elizabeth Pirraglia for sharing with us the NYU Alzheimer's disease data sets used in Section 5 for the illustration of our proposed model and analysis. The NYU study data are available from figshare (https://figshare.com/s/16d233d4822b810bcd9b, DOI: 10.6084/m9.figshare.5758554). One part of the data used in the preparation of the example in Section 5 of this article was obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu/data-samples/access-data/). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in the design, analysis or writing of this report. A complete list of ADNI investigators is at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Funding

This research was partially supported by United States National Institutes of Health grants (NIA grants P30AG066512 and P01AG060882; NCI grants P50CA225450 and P30CA016087) and Centers for Disease Control and Prevention (CDC) grant U01OH012486.
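The abstract names two ingredients, local linear regression for the nonlinear component and regularization for identifying predictor effects, without giving the estimator itself. As a rough illustration only, the sketch below fits a generic single-centre partially linear model y = x'beta + g(z) + error. It is not the authors' IPLM: it ignores batch effects, group composition heterogeneity and measurement error, and it substitutes a Robinson-style double-residual step and a plain lasso penalty for whatever the paper actually uses. All function names, the Gaussian kernel, the bandwidth and the simulated data are assumptions made for this example.

```python
import numpy as np


def local_linear_smooth(z, y, bandwidth):
    """Local linear fit of y on scalar z, evaluated at every observed z[i]."""
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=float)
    y2d = y[:, None] if y.ndim == 1 else y
    fitted = np.empty_like(y2d)
    for i, z0 in enumerate(z):
        d = z - z0
        w = np.exp(-0.5 * (d / bandwidth) ** 2)          # Gaussian kernel weights
        design = np.column_stack([np.ones_like(d), d])   # local intercept and slope
        weighted = design * w[:, None]
        # Weighted least squares: (D'WD) coef = D'W y
        coef = np.linalg.solve(design.T @ weighted, weighted.T @ y2d)
        fitted[i] = coef[0]                              # intercept = fit at z0
    return fitted[:, 0] if y.ndim == 1 else fitted


def lasso_cd(x, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - x b||^2 + lam * ||b||_1."""
    n, p = x.shape
    beta = np.zeros(p)
    col_sq = (x ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            partial = y - x @ beta + x[:, j] * beta[j]   # residual excluding column j
            rho = x[:, j] @ partial / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return beta


def fit_partially_linear(x, z, y, bandwidth=0.05, lam=0.0):
    """Double-residual fit: smooth z out of x and y, then regress the residuals."""
    x_res = x - local_linear_smooth(z, x, bandwidth)
    y_res = y - local_linear_smooth(z, y, bandwidth)
    if lam > 0:
        beta = lasso_cd(x_res, y_res, lam)               # penalized linear part
    else:
        beta, *_ = np.linalg.lstsq(x_res, y_res, rcond=None)
    g_hat = local_linear_smooth(z, y - x @ beta, bandwidth)
    return beta, g_hat


def simulate(n=300, seed=0):
    """Toy data (hypothetical): three predictors, the third with zero effect."""
    rng = np.random.default_rng(seed)
    z = rng.uniform(0.0, 1.0, n)
    x = rng.normal(size=(n, 3))
    beta_true = np.array([2.0, -1.0, 0.0])
    y = x @ beta_true + np.sin(2 * np.pi * z) + rng.normal(scale=0.1, size=n)
    return x, z, y, beta_true


if __name__ == "__main__":
    x, z, y, beta_true = simulate()
    print("unpenalized:", fit_partially_linear(x, z, y)[0])
    print("lasso:      ", fit_partially_linear(x, z, y, lam=0.05)[0])
```

With `lam=0` the linear coefficients come from ordinary least squares on the residuals; a positive `lam` shrinks them and can set small coefficients exactly to zero, which is the sense in which a penalty performs variable selection. In a multi-centre setting one would additionally need centre-specific terms, which this sketch deliberately leaves out.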

Source journal

Statistics (Mathematics: Statistics & Probability)

CiteScore

1.00

Self-citation rate

0.00%

Articles published

Review time

12 months

About the journal: Statistics publishes papers developing and analysing new methods for any active field of statistics, motivated by real-life problems. Papers submitted for consideration should provide interesting and novel contributions to statistical theory and its applications, with rigorous mathematical results and proofs. Moreover, numerical simulations and applications to real data sets can improve the quality of papers and should be included where appropriate. Statistics does not publish papers that merely apply existing procedures to case studies; papers are required to contain methodological or theoretical innovation. Topics of interest include, for example, nonparametric statistics, time series, and the analysis of topological or functional data. Furthermore, the journal also welcomes submissions in the field of theoretical econometrics and its links to mathematical statistics.