Integrated partially linear model for multi-centre studies with heterogeneity and batch effect in covariates
Authors: Lei Yang, Yongzhao Shao
Journal: Statistics (JCR Q2, Statistics & Probability; impact factor 1.2)
Published: 2023-09-03
DOI: 10.1080/02331888.2023.2258429 (https://doi.org/10.1080/02331888.2023.2258429)
Abstract
Multi-centre studies are increasingly used to borrow strength from multiple research groups and obtain reproducible findings. Regression analysis is widely used for analysing multi-group studies; however, some regression predictors have nonlinear effects and/or are measured with batch effects. In addition, group compositions are potentially heterogeneous across centres. Conventional pooled data analysis can produce biased regression estimates. This paper proposes an integrated partially linear regression model (IPLM) that simultaneously accounts for predictor nonlinearity, general batch effects, group composition heterogeneity, and potential measurement error in covariates. A local linear regression-based approach is employed to estimate the nonlinear component, and a regularization procedure is introduced to identify the predictors' effects. The IPLM-based method has estimation consistency and variable-selection consistency. Moreover, it admits a fast computing algorithm, and its effectiveness is supported by simulation studies. A multi-centre Alzheimer's disease research project is used to illustrate the proposed IPLM-based analysis.

Keywords: multi-centre study; data harmonization; partially linear regression model; general batch effects; group composition heterogeneity

Acknowledgements
The authors would like to thank the reviewers and the associate editor for their careful reading and many constructive suggestions. The authors would also like to thank Drs. Mony de Leon, Ricardo Osorio, and Elizabeth Pirraglia for sharing the NYU Alzheimer's disease data sets used in Section 5 to illustrate the proposed model and analysis. The NYU study data are available from figshare (https://figshare.com/s/16d233d4822b810bcd9b, DOI: 10.6084/m9.figshare.5758554). Part of the data used in preparing the example in Section 5 of this article was obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu/data-samples/access-data/). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in the design, analysis or writing of this report. A complete list of ADNI investigators is at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This research was partially supported by the United States National Institutes of Health grants (NIA grants P30AG066512, P01AG060882; NCI grants P50CA225450, P30CA016087) and Centers for Disease Control and Prevention (CDC) grant U01OH012486.
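
For readers unfamiliar with the model class named in the abstract, the following is a minimal sketch of a generic partially linear regression, y = xᵀβ + g(z) + ε, fitted with a local linear smoother. It is not the authors' IPLM: it assumes a single centre, ignores batch effects, group heterogeneity and measurement error, and uses ordinary least squares where the paper uses a regularization procedure. The estimator shown is the standard double-residual (partialling-out) approach, and every function name, the bandwidth, and the simulated data are illustrative assumptions.

```python
import numpy as np


def local_linear_fit(z, y, z_eval, bandwidth):
    """Local linear regression of y on a scalar z, using a Gaussian kernel.

    y may be a vector or a matrix; each column is smoothed separately.
    """
    y = np.asarray(y, dtype=float)
    y2d = y.reshape(len(z), -1)
    fitted = np.empty((len(z_eval), y2d.shape[1]))
    for i, z0 in enumerate(z_eval):
        d = z - z0
        w = np.exp(-0.5 * (d / bandwidth) ** 2)          # kernel weights
        design = np.column_stack([np.ones_like(d), d])   # local intercept, slope
        wd = design * w[:, None]
        coef = np.linalg.solve(design.T @ wd, wd.T @ y2d)
        fitted[i] = coef[0]                              # intercept = fit at z0
    return fitted.reshape((len(z_eval),) + y.shape[1:])


def fit_partially_linear(x, z, y, bandwidth):
    """Double-residual estimate of beta and g in y = x @ beta + g(z) + noise."""
    # Step 1: remove the part of x and y that is explained by z.
    x_res = x - local_linear_fit(z, x, z, bandwidth)
    y_res = y - local_linear_fit(z, y, z, bandwidth)
    # Step 2: plain least squares on the residuals (assumption: the paper
    # uses a regularized fit here; OLS is a stand-in for illustration).
    beta, *_ = np.linalg.lstsq(x_res, y_res, rcond=None)
    # Step 3: smooth what remains to recover the nonlinear component.
    g_hat = local_linear_fit(z, y - x @ beta, z, bandwidth)
    return beta, g_hat


# Simulated single-centre data (purely illustrative).
rng = np.random.default_rng(0)
n = 400
z = rng.uniform(0.0, 1.0, size=n)
x = rng.normal(size=(n, 2))
x[:, 0] += 2.0 * z                       # make one linear predictor depend on z
true_beta = np.array([1.5, -2.0])
y = x @ true_beta + np.sin(2 * np.pi * z) + rng.normal(scale=0.3, size=n)

beta_hat, g_hat = fit_partially_linear(x, z, y, bandwidth=0.05)
print("estimated beta:", beta_hat)
```

Partialling z out of both x and y before the least-squares step is what keeps the estimate of β from being distorted by the nonlinear term; extending this to several centres with batch effects is the part the paper itself addresses.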
About the journal:
Statistics publishes papers developing and analysing new methods for any active field of statistics, motivated by real-life problems. Papers submitted for consideration should provide interesting and novel contributions to statistical theory and its applications, with rigorous mathematical results and proofs. Moreover, numerical simulations and applications to real data sets can improve the quality of papers and should be included where appropriate. Statistics does not publish papers that merely apply existing procedures to case studies; papers are required to contain methodological or theoretical innovation. Topics of interest include, for example, nonparametric statistics, time series, and the analysis of topological or functional data. Furthermore, the journal welcomes submissions in the field of theoretical econometrics and its links to mathematical statistics.