{"title":"通过交叉拟合,加快基于 R2 的高维中介效应的区间估计。","authors":"Zhichao Xu, Chunlin Li, Sunyi Chi, Tianzhong Yang, Peng Wei","doi":"10.1093/biostatistics/kxae037","DOIUrl":null,"url":null,"abstract":"<p><p>Mediation analysis is a useful tool in investigating how molecular phenotypes such as gene expression mediate the effect of exposure on health outcomes. However, commonly used mean-based total mediation effect measures may suffer from cancellation of component-wise mediation effects in opposite directions in the presence of high-dimensional omics mediators. To overcome this limitation, we recently proposed a variance-based R-squared total mediation effect measure that relies on the computationally intensive nonparametric bootstrap for confidence interval estimation. In the work described herein, we formulated a more efficient two-stage, cross-fitted estimation procedure for the R2 measure. To avoid potential bias, we performed iterative Sure Independence Screening (iSIS) in two subsamples to exclude the non-mediators, followed by ordinary least squares regressions for the variance estimation. We then constructed confidence intervals based on the newly derived closed-form asymptotic distribution of the R2 measure. Extensive simulation studies demonstrated that this proposed procedure is much more computationally efficient than the resampling-based method, with comparable coverage probability. Furthermore, when applied to the Framingham Heart Study, the proposed method replicated the established finding of gene expression mediating age-related variation in systolic blood pressure and identified the role of gene expression profiles in the relationship between sex and high-density lipoprotein cholesterol level. The proposed estimation procedure is implemented in R package CFR2M.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.8000,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823199/pdf/","citationCount":"0","resultStr":"{\"title\":\"Speeding up interval estimation for R2-based mediation effect of high-dimensional mediators via cross-fitting.\",\"authors\":\"Zhichao Xu, Chunlin Li, Sunyi Chi, Tianzhong Yang, Peng Wei\",\"doi\":\"10.1093/biostatistics/kxae037\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Mediation analysis is a useful tool in investigating how molecular phenotypes such as gene expression mediate the effect of exposure on health outcomes. However, commonly used mean-based total mediation effect measures may suffer from cancellation of component-wise mediation effects in opposite directions in the presence of high-dimensional omics mediators. To overcome this limitation, we recently proposed a variance-based R-squared total mediation effect measure that relies on the computationally intensive nonparametric bootstrap for confidence interval estimation. In the work described herein, we formulated a more efficient two-stage, cross-fitted estimation procedure for the R2 measure. To avoid potential bias, we performed iterative Sure Independence Screening (iSIS) in two subsamples to exclude the non-mediators, followed by ordinary least squares regressions for the variance estimation. We then constructed confidence intervals based on the newly derived closed-form asymptotic distribution of the R2 measure. Extensive simulation studies demonstrated that this proposed procedure is much more computationally efficient than the resampling-based method, with comparable coverage probability. Furthermore, when applied to the Framingham Heart Study, the proposed method replicated the established finding of gene expression mediating age-related variation in systolic blood pressure and identified the role of gene expression profiles in the relationship between sex and high-density lipoprotein cholesterol level. The proposed estimation procedure is implemented in R package CFR2M.</p>\",\"PeriodicalId\":55357,\"journal\":{\"name\":\"Biostatistics\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2024-12-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823199/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Biostatistics\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.1093/biostatistics/kxae037\",\"RegionNum\":3,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"MATHEMATICAL & COMPUTATIONAL BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biostatistics","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1093/biostatistics/kxae037","RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
引用次数: 0
摘要
中介分析是研究基因表达等分子表型如何介导暴露对健康结果影响的有用工具。然而,常用的基于均值的总中介效应测量方法可能会在存在高维表观中介因子的情况下,出现分量-分量-分量的反向中介效应抵消的问题。为了克服这一局限性,我们最近提出了一种基于方差的 R 平方总中介效应测量方法,它依赖于计算密集型非参数自举法进行置信区间估计。在本文所述的工作中,我们为 R2 测量制定了更有效的两阶段交叉拟合估计程序。为了避免潜在的偏差,我们在两个子样本中进行了迭代确定独立性筛选(iSIS),以排除非调解人,然后用普通最小二乘法回归进行方差估计。然后,我们根据新推导出的 R2 测量的闭式渐近分布构建置信区间。广泛的模拟研究表明,与基于重采样的方法相比,我们提出的方法在计算上更有效率,而且覆盖概率相当。此外,当应用于弗雷明汉心脏研究时,所提出的方法复制了基因表达介导收缩压年龄相关变化的既定结论,并确定了基因表达谱在性别与高密度脂蛋白胆固醇水平之间关系中的作用。拟议的估计程序在 R 软件包 CFR2M 中实现。
Speeding up interval estimation for R2-based mediation effect of high-dimensional mediators via cross-fitting.
Mediation analysis is a useful tool in investigating how molecular phenotypes such as gene expression mediate the effect of exposure on health outcomes. However, commonly used mean-based total mediation effect measures may suffer from cancellation of component-wise mediation effects in opposite directions in the presence of high-dimensional omics mediators. To overcome this limitation, we recently proposed a variance-based R-squared total mediation effect measure that relies on the computationally intensive nonparametric bootstrap for confidence interval estimation. In the work described herein, we formulated a more efficient two-stage, cross-fitted estimation procedure for the R2 measure. To avoid potential bias, we performed iterative Sure Independence Screening (iSIS) in two subsamples to exclude the non-mediators, followed by ordinary least squares regressions for the variance estimation. We then constructed confidence intervals based on the newly derived closed-form asymptotic distribution of the R2 measure. Extensive simulation studies demonstrated that this proposed procedure is much more computationally efficient than the resampling-based method, with comparable coverage probability. Furthermore, when applied to the Framingham Heart Study, the proposed method replicated the established finding of gene expression mediating age-related variation in systolic blood pressure and identified the role of gene expression profiles in the relationship between sex and high-density lipoprotein cholesterol level. The proposed estimation procedure is implemented in R package CFR2M.
期刊介绍:
Among the important scientific developments of the 20th century is the explosive growth in statistical reasoning and methods for application to studies of human health. Examples include developments in likelihood methods for inference, epidemiologic statistics, clinical trials, survival analysis, and statistical genetics. Substantive problems in public health and biomedical research have fueled the development of statistical methods, which in turn have improved our ability to draw valid inferences from data. The objective of Biostatistics is to advance statistical science and its application to problems of human health and disease, with the ultimate goal of advancing the public''s health.