Pub Date: 2024-11-18 | DOI: 10.1093/biostatistics/kxae044
Anastasiia Holovchak, Helen McIlleron, Paolo Denti, Michael Schomaker
Missing data in multiple variables is a common issue. We investigate the applicability of the framework of graphical models for handling missing data to a complex longitudinal pharmacological study of children with HIV treated with an efavirenz-based regimen as part of the CHAPAS-3 trial. Specifically, we examine whether the causal effects of interest, defined through static interventions on multiple continuous variables, can be recovered (estimated consistently) from the available data only. So far, no general algorithms are available to decide on recoverability, and decisions have to be made on a case-by-case basis. We emphasize the sensitivity of recoverability to even the smallest changes in the graph structure, and present recoverability results for three plausible missingness-directed acyclic graphs (m-DAGs) in the CHAPAS-3 study, informed by clinical knowledge. Furthermore, we propose the concept of a "closed missingness mechanism": if missing data are generated based on this mechanism, an available case analysis is admissible for consistent estimation of any statistical or causal estimand, even if data are missing not at random. Both simulations and theoretical considerations demonstrate how, in the assumed MNAR setting of our study, a complete or available case analysis can be superior to multiple imputation, and estimation results vary depending on the assumed missingness DAG. Our analyses demonstrate an innovative application of missingness DAGs to complex longitudinal real-world data, while highlighting the sensitivity of the results with respect to the assumed causal model.
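The abstract's claim that an available case analysis can be consistent even under MNAR has a classical analogue that is easy to simulate: when missingness in a regressor depends only on that regressor (and not on the outcome given covariates), complete-case regression remains consistent even though the observed covariate distribution is distorted. The sketch below is a toy illustration of that general phenomenon, not the paper's "closed missingness mechanism"; all numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Outcome model: Y depends on X with slope 2.
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)

# MNAR mechanism: the chance that X is missing depends on X itself.
p_miss = 1.0 / (1.0 + np.exp(-x))            # larger X -> more likely missing
observed = rng.uniform(size=n) > p_miss      # rows where X was recorded

# The naive mean of the observed X values is biased for E[X] = 0 ...
mean_obs = x[observed].mean()

# ... yet the complete-case OLS slope of Y on X remains consistent, because
# missingness depends only on the regressor, not on Y given X.
slope = np.polyfit(x[observed], y[observed], 1)[0]
```

Here `mean_obs` lands well below zero while `slope` stays close to 2, which is the flavor of result the m-DAG analysis formalizes graph-by-graph.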
Article: "Recoverability of causal effects under presence of missing data: a longitudinal case study." Biostatistics.
Pub Date: 2024-11-11 | DOI: 10.1093/biostatistics/kxae043
Tingting Yu, Lang Wu, Ronald J Bosch, Davey M Smith, Rui Wang
Maximum likelihood inference can often become computationally intensive when performing joint modeling of longitudinal and time-to-event data, due to the intractable integrals in the joint likelihood function. The computational challenges escalate further when modeling HIV-1 viral load data, owing to the nonlinear trajectories and the presence of left-censored data resulting from the assay's lower limit of quantification. In this paper, for a joint model comprising a nonlinear mixed-effect model and a Cox Proportional Hazards model, we develop a computationally efficient Stochastic EM (StEM) algorithm for parameter estimation. Furthermore, we propose a novel technique for fast standard error estimation, which directly estimates standard errors from the results of StEM iterations and is broadly applicable to various joint modeling settings, such as those containing generalized linear mixed-effect models, parametric survival models, or joint models with more than two submodels. We evaluate the performance of the proposed methods through simulation studies and apply them to HIV-1 viral load data from six AIDS Clinical Trials Group studies to characterize viral rebound trajectories following the interruption of antiretroviral therapy (ART), accounting for the informative duration of off-ART periods.
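A minimal stochastic EM loop for a far simpler left-censored problem, estimating a normal mean and standard deviation when values below a lower limit of quantification are censored, illustrates the S-step/M-step alternation the abstract builds on. This is a toy sketch under invented settings, not the authors' joint-model StEM or their standard-error method.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Simulated log viral loads, left-censored at a lower limit of quantification.
true_mu, true_sd, lloq = 2.0, 1.0, 1.5
z = rng.normal(true_mu, true_sd, size=5_000)
censored = z < lloq
obs = np.where(censored, lloq, z)   # censored rows only known to lie below lloq

def stem_censored_normal(obs, censored, lloq, n_iter=300, burn=100, seed=2):
    """Stochastic EM for a left-censored normal sample: the S-step imputes
    censored values from the current truncated normal, the M-step is the
    complete-data MLE; estimates are averaged over the stationary phase."""
    rng = np.random.default_rng(seed)
    mu, sd = obs.mean(), obs.std()
    kept = []
    for it in range(n_iter):
        # S-step: draw each censored value from N(mu, sd) truncated to
        # (-inf, lloq) via u ~ U(0, Phi(b)), x = mu + sd * Phi^{-1}(u).
        b = (lloq - mu) / sd
        u = rng.uniform(size=censored.sum()) * norm.cdf(b)
        filled = obs.copy()
        filled[censored] = mu + sd * norm.ppf(u)
        # M-step: complete-data maximum likelihood update.
        mu, sd = filled.mean(), filled.std()
        if it >= burn:
            kept.append((mu, sd))
    return np.mean(kept, axis=0)

mu_hat, sd_hat = stem_censored_normal(obs, censored, lloq)
```

The spread of the post-burn-in `(mu, sd)` draws is also informative, which hints at why standard errors can be read off the StEM iterations themselves.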
Article: "Fast standard error estimation for joint models of longitudinal and time-to-event data based on stochastic EM algorithms." Biostatistics.
Pub Date: 2024-11-09 | DOI: 10.1093/biostatistics/kxae042
Erin E Gabriel, Michael C Sachs, Arvid Sjölander
In instrumental variable (IV) settings, such as imperfect randomized trials and observational studies with Mendelian randomization, one may encounter a continuous exposure, the causal effect of which is not of true interest. Instead, scientific interest may lie in a coarsened version of this exposure. Although there is a lengthy literature on the impact of coarsening of an exposure with several works focusing specifically on IV settings, all methods proposed in this literature require parametric assumptions. Instead, just as in the standard IV setting, one can consider partial identification via bounds making no parametric assumptions. This was first pointed out in Alexander Balke's PhD dissertation. We extend and clarify his work and derive novel bounds in several settings, including for a three-level IV, which will most likely be the case in Mendelian randomization. We demonstrate our findings in two real data examples, a randomized trial for peanut allergy in infants and a Mendelian randomization setting investigating the effect of homocysteine on cardiovascular disease.
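Balke's construction treats the bounds as a linear program over the joint distribution of "response types." Below is a sketch for the classical binary-instrument, binary-exposure, binary-outcome case; the paper's three-level IV extends the same idea with more instrument arms. The solver choice and the toy observed distribution are mine.

```python
import numpy as np
from itertools import product
from scipy.optimize import linprog

def iv_ate_bounds(p_obs):
    """Bounds on the ATE E[Y(1) - Y(0)] for binary Z, X, Y under the IV
    assumptions, via a linear program over the 16 response types
    (compliance type x potential-outcome pair), as in Balke-Pearl.
    p_obs[z][(x, y)] = P(X=x, Y=y | Z=z)."""
    x_of = {"never": lambda z: 0, "always": lambda z: 1,
            "complier": lambda z: z, "defier": lambda z: 1 - z}
    types = list(product(x_of, [0, 1], [0, 1]))   # (compliance, Y(0), Y(1))

    # Equality constraints: type probabilities must reproduce P(X, Y | Z).
    A_eq, b_eq = [], []
    for z in (0, 1):
        for x in (0, 1):
            for y in (0, 1):
                A_eq.append([1.0 if x_of[c](z) == x and (y1 if x else y0) == y
                             else 0.0 for c, y0, y1 in types])
                b_eq.append(p_obs[z].get((x, y), 0.0))

    ate = np.array([y1 - y0 for _, y0, y1 in types], dtype=float)
    lo = linprog(ate, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1)).fun
    hi = -linprog(-ate, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1)).fun
    return lo, hi

# Perfect compliance (X = Z) with a deterministic effect: the bounds
# collapse to the point-identified ATE of 1.
lo, hi = iv_ate_bounds({0: {(0, 0): 1.0}, 1: {(1, 1): 1.0}})
```

With noncompliance or confounding in `p_obs`, the same program returns a strict interval rather than a point, with no parametric assumptions anywhere.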
Article: "The impact of coarsening an exposure on partial identifiability in instrumental variable settings." Biostatistics.
Pub Date: 2024-10-01 | DOI: 10.1093/biostatistics/kxad018
Ravi Varadhan, Jiafeng Zhu, Karen Bandeen-Roche
Many older adults experience a major stressor at some point in their lives. The ability to recover well after a major stressor is known as resilience. An important goal of geriatric research is to identify factors that influence resilience to stressors. Studies of resilience in older adults are typically conducted with a single-arm design in which everyone experiences the stressor. The simplistic approach of regressing change on baseline yields biased estimates due to mathematical coupling and regression to the mean (RTM). We develop a method to correct the bias. We extend the method to include covariates. Our approach considers a counterfactual control group and involves sensitivity analyses to evaluate different settings of control group parameters. Only minimal distributional assumptions are required. Simulation studies demonstrate the validity of the method. We illustrate the method using a large registry of older adults (N = 7239) who underwent total knee replacement (TKR). We demonstrate how external data can be utilized to constrain the sensitivity analysis. Naive analyses implicated several treatment effect modifiers, including baseline function, age, body mass index (BMI), gender, number of comorbidities, income, and race. Corrected analysis revealed that baseline (pre-stressor) function was not strongly linked to recovery after TKR and that, among the covariates, only age and number of comorbidities were consistently and negatively associated with post-stressor recovery in all functional domains. Correction of mathematical coupling and RTM is necessary for drawing valid inferences regarding the effect of covariates and baseline status on pre-post change.
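The coupling/RTM bias the abstract warns about is easy to reproduce: even when nobody truly changes, regressing change on a noisily measured baseline yields a markedly negative slope. A minimal simulation (invented variances; the authors' counterfactual-control correction is not implemented here):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Stable true function T; baseline and follow-up are noisy measurements of it.
t = rng.normal(0.0, 1.0, size=n)
baseline = t + rng.normal(0.0, 1.0, size=n)
followup = t + rng.normal(0.0, 1.0, size=n)   # no true change for anyone

# Regressing change on baseline "finds" that low baseline predicts recovery:
# the expected slope is -var(noise) / (var(T) + var(noise)) = -0.5 here,
# purely from mathematical coupling and regression to the mean.
slope = np.polyfit(baseline, followup - baseline, 1)[0]
```

Any covariate correlated with the baseline measurement will inherit a spurious association with "recovery" through this same mechanism, which is why the corrected analysis overturns several naive effect modifiers.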
Our method provides a simple estimator to this end.
Article: "Identifying predictors of resilience to stressors in single-arm studies of pre-post change." Biostatistics, pp. 1094-1111.
Pub Date: 2024-10-01 | DOI: 10.1093/biostatistics/kxad031
Jian Wang, Xinyang Jiang, Jing Ning
Interest in analyzing recurrent event data has increased over the past few decades. One essential aspect of a risk prediction model for recurrent event data is to accurately distinguish individuals with different risks of developing a recurrent event. Although the concordance index (C-index) effectively evaluates the overall discriminative ability of a regression model for recurrent event data, a local measure is also desirable to capture dynamic performance of the regression model over time. Therefore, in this study, we propose a time-dependent C-index measure for inferring the model's discriminative ability locally. We formulated the C-index as a function of time using a flexible parametric model and constructed a concordance-based likelihood for estimation and inference. We adapted a perturbation-resampling procedure for variance estimation. Extensive simulations were conducted to investigate the proposed time-dependent C-index's finite-sample performance and estimation procedure. We applied the time-dependent C-index to three regression models of a study of re-hospitalization in patients with colorectal cancer to evaluate the models' discriminative capability.
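For reference, the global measure that the paper localizes in time is the standard C-index. A minimal Harrell-style implementation for right-censored single-event data (a simplification of the recurrent-event setting; tied event times are ignored, and the toy data are invented):

```python
import numpy as np

def harrell_c(time, event, risk):
    """Harrell's C-index for right-censored single-event data: among usable
    pairs (the earlier time is an observed event, times untied), count the
    fraction where the higher predicted risk fails first; risk ties get 1/2."""
    num = den = 0.0
    for i in range(len(time)):
        if not event[i]:
            continue
        for j in range(len(time)):
            if time[j] > time[i]:          # j outlives i -> usable pair
                den += 1.0
                if risk[i] > risk[j]:
                    num += 1.0
                elif risk[i] == risk[j]:
                    num += 0.5
    return num / den

time = np.array([2.0, 4.0, 3.0, 5.0, 1.0])
event = np.array([True, True, False, True, True])
risk = np.array([0.9, 0.1, 0.5, 0.4, 0.95])
c = harrell_c(time, event, risk)           # 7 of 8 usable pairs concordant
```

The time-dependent version proposed in the paper instead models this concordance probability as a smooth function of follow-up time, so discrimination can be tracked as it drifts.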
Article: "Evaluating dynamic and predictive discrimination for recurrent event models: use of a time-dependent C-index." Biostatistics, pp. 1140-1155.
The tree-based scan statistic is a data mining method used to identify signals of adverse drug reactions in a database of spontaneous reporting systems. It is particularly beneficial when dealing with hierarchical data structures. One may use a retrospective case-control study design from spontaneous reporting systems (SRS) to investigate whether a specific adverse event of interest is associated with certain drugs. However, the existing Bernoulli model of the tree-based scan statistic may not be suitable as it fails to adequately account for dependencies within matched pairs. In this article, we propose signal detection statistics for matched case-control data based on McNemar's test, Wald test for conditional logistic regression, and the likelihood ratio test for a multinomial distribution. Through simulation studies, we demonstrate that our proposed methods outperform the existing approach in terms of the type I error rate, power, sensitivity, and false detection rate. To illustrate our proposed approach, we applied the three methods and the existing method to detect drug signals for dizziness-related adverse events related to antihypertensive drugs using the database of the Korea Adverse Event Reporting System.
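The first of the three proposed statistics builds on McNemar's test for paired binary data. A self-contained sketch of that building block (the discordant-pair counts are invented; the tree-based scanning over a drug hierarchy is not shown):

```python
import math

def mcnemar(b, c):
    """McNemar's chi-squared test for a matched case-control pair table:
    b = pairs where only the case was exposed, c = pairs where only the
    control was exposed. Returns (statistic, two-sided p from chi2, 1 df)."""
    stat = (b - c) ** 2 / (b + c)
    # chi2 with 1 df: P(X > s) = 2 * (1 - Phi(sqrt(s))) = erfc(sqrt(s / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

stat, p_value = mcnemar(b=25, c=10)        # 35 discordant pairs
```

Only the discordant pairs carry information, which is exactly the dependence within matched pairs that the existing Bernoulli scan model fails to exploit.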
Article: "Signal detection statistics of adverse drug events in hierarchical structure for matched case-control data." Seok-Jae Heo, Sohee Jeong, Dagyeom Jung, Inkyung Jung. Biostatistics, pp. 1112-1121. Pub Date: 2024-10-01, DOI: 10.1093/biostatistics/kxad029.
Pub Date: 2024-07-01 | DOI: 10.1093/biostatistics/kxad013
Qi Qian, Danh V Nguyen, Donatello Telesca, Esra Kurum, Connie M Rhee, Sudipto Banerjee, Yihao Li, Damla Senturk
Dialysis patients experience frequent hospitalizations and a higher mortality rate compared to other Medicare populations, in whom hospitalizations are a major contributor to morbidity, mortality, and healthcare costs. Patients also typically remain on dialysis for the duration of their lives or until kidney transplantation. Hence, there is growing interest in studying the spatiotemporal trends in the correlated outcomes of hospitalization and mortality among dialysis patients as a function of time starting from transition to dialysis across the United States. Utilizing national data from the United States Renal Data System (USRDS), we propose a novel multivariate spatiotemporal functional principal component analysis model to study the joint spatiotemporal patterns of hospitalization and mortality rates among dialysis patients. The proposal is based on a multivariate Karhunen-Loève expansion that describes leading directions of variation across time and induces spatial correlations among region-specific scores. An efficient estimation procedure is proposed using only univariate principal component decompositions and a Markov chain Monte Carlo framework for targeting the spatial correlations. The finite sample performance of the proposed method is studied through simulations. Novel applications to the USRDS data highlight hot spots across the United States with higher hospitalization and/or mortality rates and time periods of elevated risk.
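The univariate building block of the proposal, a functional PCA via eigendecomposition of the sample covariance on a time grid, can be sketched in a few lines. The two generating modes, score variances, and noise level below are invented; the multivariate spatiotemporal model couples such decompositions across outcomes and regions.

```python
import numpy as np

rng = np.random.default_rng(4)
n_subj, n_grid = 500, 50
tgrid = np.linspace(0, 1, n_grid)

# Curves built from two known orthogonal modes plus measurement noise.
phi1 = np.sqrt(2) * np.sin(2 * np.pi * tgrid)
phi2 = np.sqrt(2) * np.cos(2 * np.pi * tgrid)
scores = rng.normal(size=(n_subj, 2)) * np.array([2.0, 0.5])   # score sds
curves = scores @ np.vstack([phi1, phi2]) + 0.1 * rng.normal(size=(n_subj, n_grid))

# Univariate FPCA on the grid: eigendecomposition of the sample covariance.
centered = curves - curves.mean(axis=0)
cov = centered.T @ centered / n_subj
evals, evecs = np.linalg.eigh(cov)
evals, evecs = evals[::-1], evecs[:, ::-1]          # sort descending

fve1 = evals[0] / evals.sum()                       # variance explained by PC 1
align = abs(np.corrcoef(evecs[:, 0], phi1)[0, 1])   # PC 1 recovers phi1
```

The estimated leading eigenfunction recovers the dominant mode, and the region-specific scores on such components are where the spatial correlation model enters.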
Article: "Multivariate spatiotemporal functional principal component analysis for modeling hospitalization and mortality rates in the dialysis population." Biostatistics, pp. 718-735.
Pub Date: 2024-07-01 | DOI: 10.1093/biostatistics/kxad012
Farhad Hatami, Alex Ocampo, Gordon Graham, Thomas E Nichols, Habib Ganjgahi
Existing methods for fitting continuous time Markov models (CTMM) in the presence of covariates suffer from scalability issues due to the high computational cost of the matrix exponentials calculated for each observation. In this article, we propose an optimization technique for CTMM which uses a stochastic gradient descent algorithm combined with differentiation of the matrix exponential using a Padé approximation. This approach makes fitting large-scale data feasible. We present two methods for computing standard errors, one novel approach using the Padé expansion and the other using a power series expansion of the matrix exponential. Through simulations, we find improved performance relative to existing CTMM methods, and we demonstrate the method on the large-scale multiple sclerosis NO.MS data set.
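The Padé approximation at the heart of the method can be sketched with the diagonal [3/3] approximant plus scaling and squaring, applied to a toy continuous-time Markov generator (the generator entries are invented, and the paper's contribution, differentiating this map inside SGD, is not shown):

```python
import numpy as np
from scipy.linalg import expm

def expm_pade33(a):
    """Matrix exponential via the diagonal [3/3] Pade approximant combined
    with scaling and squaring: exp(A) = (exp(A / 2**s))**(2**s)."""
    a = np.asarray(a, dtype=float)
    norm1 = np.linalg.norm(a, 1)
    s = max(0, int(np.ceil(np.log2(max(norm1, 1e-16)))) + 3)  # shrink ||A/2^s||
    b = a / 2.0 ** s
    eye = np.eye(a.shape[0])
    b2 = b @ b
    b3 = b2 @ b
    num = eye + b / 2 + b2 / 10 + b3 / 120
    den = eye - b / 2 + b2 / 10 - b3 / 120
    f = np.linalg.solve(den, num)       # [3/3] Pade approximant of exp(b)
    for _ in range(s):                  # undo the scaling by repeated squaring
        f = f @ f
    return f

# Generator matrix of a 3-state continuous-time Markov model (rows sum to 0).
q = np.array([[-1.0, 0.7, 0.3],
              [0.2, -0.5, 0.3],
              [0.1, 0.4, -0.5]])
p1 = expm_pade33(q * 1.0)               # transition probabilities over t = 1
err = np.abs(p1 - expm(q * 1.0)).max()
```

Because the approximant is a ratio of low-degree matrix polynomials, it is cheap per observation and, unlike a black-box `expm`, straightforward to differentiate with respect to covariate-dependent generator entries.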
Article: "A scalable approach for continuous time Markov models with covariates." Biostatistics, pp. 681-701.
Pub Date: 2024-07-01 | DOI: 10.1093/biostatistics/kxad024
Lillian M F Haine, Thomas A Murry, Raquel Nahra, Giota Touloumi, Eduardo Fernández-Cruz, Kathy Petoumenos, Joseph S Koopmeiners
The traditional trial paradigm is often criticized as being slow, inefficient, and costly. Statistical approaches that leverage external trial data have emerged to make trials more efficient by augmenting the sample size. However, these approaches assume that external data are from previously conducted trials, leaving a rich source of untapped real-world data (RWD) that cannot yet be effectively leveraged. We propose a semi-supervised mixture (SS-MIX) multisource exchangeability model (MEM), a flexible, two-step Bayesian approach for incorporating RWD into randomized controlled trial analyses. The first step is an SS-MIX model on a modified propensity score and the second step is a MEM. The first step targets a representative subgroup of individuals from the trial population, and the second step avoids borrowing when there are substantial differences in outcomes between the trial sample and the representative observational sample. When comparing the proposed approach to competing borrowing approaches in a simulation study, we find that our approach borrows efficiently when the trial and RWD are consistent, while mitigating bias when the trial and external data differ on either measured or unmeasured covariates. We illustrate the proposed approach with an application to a randomized controlled trial investigating intravenous hyperimmune immunoglobulin in hospitalized patients with influenza, while leveraging data from an external observational study to supplement a subgroup analysis by influenza subtype.
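The MEM step's borrowing decision reduces to a posterior weight on "the external source is exchangeable with the trial." A minimal single-source sketch with normal likelihoods and known standard errors shows how that weight collapses when the sources conflict. All numbers and the vague-prior settings are invented, and the SS-MIX propensity-score step and multi-source structure are omitted.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def exch_weight(y_t, se_t, y_e, se_e, m0=0.0, v0=100.0, prior_exch=0.5):
    """Posterior probability that an external source is exchangeable with the
    trial, from the marginal likelihoods of two normal-normal models with a
    vague prior mu ~ N(m0, v0) on each study-level mean."""
    # Exchangeable model: one common mean, so the two estimates are
    # correlated through the shared prior draw of mu.
    cov1 = np.array([[v0 + se_t ** 2, v0], [v0, v0 + se_e ** 2]])
    lik_exch = multivariate_normal([m0, m0], cov1).pdf([y_t, y_e])
    # Non-exchangeable model: independent means, independent marginals.
    lik_indep = (norm(m0, np.sqrt(v0 + se_t ** 2)).pdf(y_t)
                 * norm(m0, np.sqrt(v0 + se_e ** 2)).pdf(y_e))
    num = prior_exch * lik_exch
    return num / (num + (1 - prior_exch) * lik_indep)

w_agree = exch_weight(y_t=0.10, se_t=0.2, y_e=0.15, se_e=0.1)     # sources agree
w_conflict = exch_weight(y_t=0.10, se_t=0.2, y_e=2.50, se_e=0.1)  # they conflict
```

A high weight pools the external estimate into the posterior; a near-zero weight recovers the trial-only analysis, which is the bias-mitigation behavior reported in the simulation study.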
"Semi-supervised mixture multi-source exchangeability model for leveraging real-world data in clinical trials."
Lillian M F Haine, Thomas A Murry, Raquel Nahra, Giota Touloumi, Eduardo Fernández-Cruz, Kathy Petoumenos, Joseph S Koopmeiners
DOI: 10.1093/biostatistics/kxad024. Biostatistics, pages 617-632. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247180/pdf/
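The MEM borrowing step in the abstract above can be sketched with a deliberately simplified, single-source MEM for a normal mean with known variance. Everything here (the function name, the prior settings `m0`, `tau`, `prior_ex`, and the known-variance assumption) is an illustrative choice of mine, not the paper's SS-MIX implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

def mem_exchangeability_weight(ybar0, n0, ybar1, n1,
                               sigma=1.0, m0=0.0, tau=10.0, prior_ex=0.5):
    """Posterior probability that one external source is exchangeable with
    the trial, for a normal mean with known variance (toy single-source MEM).
    ybar0/n0: trial sample mean and size; ybar1/n1: external source."""
    s0, s1, t = sigma**2 / n0, sigma**2 / n1, tau**2
    # Exchangeable model: both sample means share one mu ~ N(m0, tau^2),
    # which induces correlated marginals for (ybar0, ybar1).
    ml_ex = multivariate_normal.pdf(
        [ybar0, ybar1], mean=[m0, m0],
        cov=[[s0 + t, t], [t, s1 + t]])
    # Non-exchangeable model: each mean gets its own independent prior.
    ml_non = (norm.pdf(ybar0, loc=m0, scale=np.sqrt(s0 + t))
              * norm.pdf(ybar1, loc=m0, scale=np.sqrt(s1 + t)))
    # Posterior weight on exchangeability, starting from prior_ex.
    return prior_ex * ml_ex / (prior_ex * ml_ex + (1 - prior_ex) * ml_non)
```

When the trial and external means are close, the weight moves toward 1 and borrowing occurs; when they diverge, it collapses toward 0, which is the bias-mitigation behavior the abstract describes. The SS-MIX propensity-score step, which first selects a trial-representative observational subgroup, is omitted here.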
Pub Date : 2024-07-01 DOI: 10.1093/biostatistics/kxad028
Quran Wu, Michael Daniels, Areej El-Jawahri, Marie Bakitas, Zhigang Li
Joint modeling of longitudinal data, such as quality-of-life data, together with survival data is important for palliative care researchers to draw efficient inferences, because it can account for the associations between those two types of data. Modeling quality of life on a retrospective-from-death time scale helps investigators interpret the results of palliative care studies, whose patients have relatively short life expectancies. However, informative censoring remains a complex challenge for modeling quality of life on the retrospective time scale, although it has been addressed for joint models on the prospective time scale. To fill this gap, we develop a novel joint modeling approach that addresses the challenge by allowing informative censoring events to depend on patients' quality of life and survival through a random effect. Our approach comprises two sub-models: a linear mixed-effects model for the longitudinal quality of life and a competing-risks model for the death time and dropout time that shares the same random effect as the longitudinal model. By appropriately modeling the informative censoring time, our approach provides unbiased estimates of the parameters of interest. Model performance is assessed in a simulation study and compared with existing approaches. A real-world study is presented to illustrate the application of the new approach.
"Joint modeling in presence of informative censoring on the retrospective time scale with application to palliative care research."
Biostatistics, pages 754-768. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11247190/pdf/
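The shared-random-effect structure linking the two sub-models above can be illustrated with a small data-generating sketch on the retrospective (time-before-death) scale. All names and parameter values below are illustrative assumptions of mine, not the authors' model specification:

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_patient(beta0=40.0, gamma=8.0, sigma_b=5.0, sigma_e=4.0):
    """One patient under a toy shared-random-effect joint model: the same
    random effect b shifts the QoL trajectory and the competing risks of
    death and dropout, making dropout informative."""
    b = rng.normal(0.0, sigma_b)
    # Competing exponential risks; higher b (healthier) -> longer times.
    t_death = rng.exponential(scale=np.exp(3.0 + 0.02 * b))  # months
    t_drop = rng.exponential(scale=np.exp(3.4 + 0.02 * b))
    t_obs = min(t_death, t_drop)
    event = "death" if t_death <= t_drop else "dropout"
    # Monthly QoL visits until the first event; the retrospective time r
    # is the number of months remaining until death at each visit.
    visit_times = np.arange(0.0, t_obs, 1.0)
    r = t_death - visit_times
    qol = beta0 + gamma * np.log1p(r) + b + rng.normal(0.0, sigma_e, r.size)
    return {"event": event, "t_obs": t_obs, "retro_time": r, "qol": qol}
```

In this sketch QoL declines as r shrinks toward death, and patients with low b both report lower QoL and exit earlier, so discarding their censored records instead of modeling the dropout time jointly would bias the estimated trajectory upward.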