Absolute risk from double nested case-control designs: cause-specific proportional hazards models with and without augmented estimating equations.
Minjung Lee, Mitchell H Gail
Biometrics, July 2024. doi:10.1093/biomtc/ujae062

We estimate relative hazards and absolute risks (also called cumulative incidence or crude risk) under cause-specific proportional hazards models for competing risks from double nested case-control (DNCC) data. In the DNCC design, controls are time-matched not only to cases from the cause of primary interest, but also to cases from competing risks (the phase-two sample). Complete covariate data are available in the phase-two sample, but the remaining cohort members have information only on survival outcomes and some covariates. Design-weighted estimators use inverse sampling probabilities computed from Samuelsen-type calculations adapted to the DNCC design. To take advantage of the additional information available on all cohort members, we augment the estimating equations with a mean-zero term that improves the efficiency of estimates from the cause-specific proportional hazards model. We establish the asymptotic properties of the proposed estimators, including the estimator of absolute risk, and derive consistent variance estimators. We show that augmented design-weighted estimators are more efficient than design-weighted estimators. Through simulations, we show that the proposed asymptotic methods yield nominal operating characteristics at practical sample sizes. We illustrate the methods using prostate cancer mortality data from the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial of the National Cancer Institute.
Factor-augmented transformation models for interval-censored failure time data.
Hongxi Li, Shuwei Li, Liuquan Sun, Xinyuan Song
Biometrics, July 2024. doi:10.1093/biomtc/ujae078

Interval-censored failure time data frequently arise in scientific studies in which each subject undergoes periodic examinations for the occurrence of the failure event of interest, so that the failure time is known only to lie within a specific time interval. In addition, the collected data may include multiple observed variables with a certain degree of correlation, leading to severe multicollinearity issues. This work proposes a factor-augmented transformation model that analyzes interval-censored failure time data while reducing model dimensionality and avoiding the multicollinearity elicited by multiple correlated covariates. We provide a joint modeling framework that combines a factor analysis model, which groups the multiple observed variables into a few latent factors, with a class of semiparametric transformation models in which the augmented factors enter as covariates, so that the effects of the latent factors and of the other covariates on the failure event can be examined. Furthermore, we propose a nonparametric maximum likelihood estimation approach and develop a computationally stable and reliable expectation-maximization algorithm for its implementation. We establish the asymptotic properties of the proposed estimators and conduct simulation studies to assess the empirical performance of the proposed method. An application to the Alzheimer's Disease Neuroimaging Initiative (ADNI) study is provided, with data obtained from the ADNI database. An R package, ICTransCFA, is also available for practitioners.
Propensity weighting plus adjustment in proportional hazards model is not doubly robust.
Erin E Gabriel, Michael C Sachs, Ingeborg Waernbaum, Els Goetghebeur, Paul F Blanche, Stijn Vansteelandt, Arvid Sjölander, Thomas Scheike
Biometrics, July 2024. doi:10.1093/biomtc/ujae069

Recently, it has become common in applied work to combine widely used survival analysis modeling methods, such as the multivariable Cox model and propensity score weighting, with the intention of forming a doubly robust estimator of an exposure effect hazard ratio, that is, one that is unbiased in large samples when either the Cox model or the propensity score model is correctly specified. This combination does not, in general, produce a doubly robust estimator, even after regression standardization, when there is truly a causal effect. We demonstrate this lack of double robustness via simulation for the semiparametric Cox model, the Weibull proportional hazards model, and a simple proportional hazards flexible parametric model, with the latter two models fit via maximum likelihood. We provide a novel proof that the combination of propensity score weighting and a proportional hazards survival model, fit via either full or partial likelihood, is consistent under the null of no causal effect of the exposure on the outcome, under particular censoring mechanisms, if either the propensity score model or the outcome model is correctly specified and contains all confounders. Given these results, which suggest that double robustness holds only under the null, we outline 2 simple alternative estimators that are doubly robust (in the above sense) for the survival difference at a given time point, provided the censoring mechanism can be correctly modeled, and one doubly robust method of estimation for the full survival curve. We provide R code to use these estimators for estimation and inference in the supporting information.
Improving prediction of linear regression models by integrating external information from heterogeneous populations: James-Stein estimators.
Peisong Han, Haoyue Li, Sung Kyun Park, Bhramar Mukherjee, Jeremy M G Taylor
Biometrics, July 2024. doi:10.1093/biomtc/ujae072

We consider the setting where (1) an internal study builds a linear regression model for prediction based on individual-level data, (2) some external studies have fitted similar linear regression models that use only subsets of the covariates and provide coefficient estimates for the reduced models without individual-level data, and (3) there is heterogeneity across the study populations. The goal is to integrate the external model summary information into the fitting of the internal model to improve prediction accuracy. We adapt the James-Stein shrinkage method to propose estimators that are no worse, and often better, in prediction mean squared error after information integration, regardless of the degree of study population heterogeneity. We conduct comprehensive simulation studies to investigate the numerical performance of the proposed estimators. We also apply the method to enhance a prediction model for patella bone lead level, in terms of blood lead level and other covariates, by integrating summary information from the published literature.
A Gaussian-process approximation to a spatial SIR process using moment closures and emulators.
Parker Trostle, Joseph Guinness, Brian J Reich
Biometrics, July 2024. doi:10.1093/biomtc/ujae068

The dynamics that govern disease spread are hard to model because infections are functions of both the underlying pathogen and human or animal behavior. This challenge grows when modeling how diseases spread between different spatial locations. Many proposed spatial epidemiological models require trade-offs to fit, whether by abstracting away theoretical spread dynamics, fitting a deterministic model, or requiring large computational resources for many simulations. We propose an approach that approximates the complex spatial spread dynamics with a Gaussian process. We first propose a flexible spatial extension to the well-known SIR stochastic process, and we then derive a moment-closure approximation to this stochastic process. The moment-closure approximation yields ordinary differential equations (ODEs) for the evolution of the means and covariances of the susceptible and infectious compartments through time. Because these ODEs are a bottleneck to fitting our model by MCMC, we approximate them using a low-rank emulator. This approximation serves as the basis for our hierarchical model for noisy, underreported counts of new infections by spatial location and time. We demonstrate the use of our model to conduct inference on simulated infections from the underlying, true spatial SIR jump process. We then apply our method to model counts of new Zika infections in Brazil from late 2015 through early 2016.
A generalized outcome-adaptive sequential multiple assignment randomized trial design.
Xue Yang, Yu Cheng, Peter F Thall, Abdus S Wahed
Biometrics, July 2024. doi:10.1093/biomtc/ujae073

A dynamic treatment regime (DTR) is a mathematical representation of a multistage decision process. When applied to sequential treatment selection in medical settings, DTRs are useful for identifying optimal therapies for chronic diseases such as AIDS, mental illnesses, substance abuse, and many cancers. Sequential multiple assignment randomized trials (SMARTs) provide a useful framework for constructing DTRs and making unbiased between-DTR comparisons. A limitation of SMARTs is that they ignore data from past patients that may be useful for reducing the probability of exposing new patients to inferior treatments. In practice, this may result in decreased treatment adherence or dropout. To address this problem, we propose a generalized outcome-adaptive (GO) SMART design that adaptively unbalances stage-specific randomization probabilities in favor of treatments observed to be more effective in previous patients. To correct for the bias induced by outcome-adaptive randomization, we propose G-estimators and inverse-probability-weighted estimators of the effects of DTRs embedded in a GO-SMART, and show analytically that they are consistent. We report simulation results showing that, compared with a SMART, a response-adaptive SMART, and a SMART with adaptive randomization, a GO-SMART design treats significantly more patients with the optimal DTR and achieves a larger number of total responses while maintaining similar or better statistical power.
Post-selection inference in regression models for group testing data.
Qinyan Shen, Karl Gregory, Xianzheng Huang
Biometrics, July 2024. doi:10.1093/biomtc/ujae101

We develop a methodology for valid inference after variable selection in logistic regression when the responses are partially observed, that is, when one observes a set of error-prone testing outcomes instead of the true values of the responses. To select important covariates while accounting for the missing information in the response data, we apply the expectation-maximization algorithm to compute maximum likelihood estimators subject to LASSO penalization. Following variable selection, we make inference on the selected covariate effects by extending post-selection inference methodology based on the polyhedral lemma. Empirical evidence from an extensive simulation study suggests that our post-selection inference results are more reliable than those from naive inference methods that use the same data to perform variable selection and inference without adjusting for the selection.
Controlling false discovery rate for mediator selection in high-dimensional data.
Ran Dai, Ruiyang Li, Seonjoo Lee, Ying Liu
Biometrics, July 2024. doi:10.1093/biomtc/ujae064

The need to select mediators from a high-dimensional data source, such as neuroimaging or genetic data, arises in much scientific research. In this work, we formulate a multiple-hypothesis testing framework for mediator selection from a high-dimensional candidate set and propose a method that extends recent developments in false discovery rate (FDR)-controlled variable selection with knockoffs to select mediators with FDR control. We show that the proposed method and algorithm achieve finite-sample FDR control. We present extensive simulation results demonstrating the power and finite-sample performance of the method compared with an existing method. Lastly, we apply the method to the Adolescent Brain Cognitive Development (ABCD) study, in which it selects several resting-state functional magnetic resonance imaging connectivity markers as mediators of the relationship between adverse childhood events and the crystallized composite score in the NIH Toolbox.
Causal meta-analysis by integrating multiple observational studies with multivariate outcomes.
Subharup Guha, Yi Li
Biometrics, July 2024. doi:10.1093/biomtc/ujae070

Integrating multiple observational studies to make unconfounded causal or descriptive comparisons of group potential outcomes in a large natural population is challenging. Moreover, retrospective cohorts, being convenience samples, are usually unrepresentative of the natural population of interest and have groups with unbalanced covariates. We propose a general covariate-balancing framework based on pseudo-populations that extends established weighting methods to the meta-analysis of multiple retrospective cohorts with multiple groups. Additionally, by maximizing the effective sample sizes of the cohorts, we propose a FLEXible, Optimized, and Realistic (FLEXOR) weighting method appropriate for integrative analyses. We develop new weighted estimators for unconfounded inferences on wide-ranging population-level features and estimands relevant to group comparisons of quantitative, categorical, or multivariate outcomes. Asymptotic properties of these estimators are examined. Through simulation studies and meta-analyses of TCGA datasets, we demonstrate the versatility and reliability of the proposed weighting strategy, especially for the FLEXOR pseudo-population.
Optimal refinement of strata to balance covariates.
Katherine Brumberg, Dylan S Small, Paul R Rosenbaum
Biometrics, July 2024. doi:10.1093/biomtc/ujae061

What is the best way to split one stratum into two so as to maximally reduce the within-stratum imbalance in many covariates? We formulate this as an integer program and approximate the solution by randomized rounding of a linear program. A linear program may assign a fraction of a person to each refined stratum. Randomized rounding views fractional people as probabilities, assigning intact people to strata using biased coins. Randomized rounding is a well-studied theoretical technique for approximating the optimal solution of certain insoluble integer programs. When the number of people in a stratum is large relative to the number of covariates, we prove the following new results: (i) randomized rounding to split a stratum does very little randomizing, so it closely resembles the linear programming relaxation without splitting intact people; and (ii) the linear relaxation and the randomly rounded solution place lower and upper bounds on the unattainable integer programming solution; because of (i), these bounds are often close, thereby ratifying the usable randomly rounded solution. We illustrate using an observational study that balanced many covariates by forming matched pairs composed of 2016 patients selected from 5735 using a propensity score. Instead, we form 5 propensity score strata and refine them into 10 strata, obtaining excellent covariate balance while retaining all patients. An R package, optrefine, available on CRAN, implements the method. Supplementary materials are available online.