Pub Date : 2024-12-31DOI: 10.1093/biostatistics/kxae023
Hyung G Park
This paper presents a Bayesian reformulation of covariate-assisted principal regression for covariance matrix outcomes to identify low-dimensional components in the covariance associated with covariates. By introducing a geometric approach to the covariance matrices and leveraging Euclidean geometry, we estimate dimension reduction parameters and model covariance heterogeneity based on covariates. This method enables joint estimation and uncertainty quantification of relevant model parameters associated with heteroscedasticity. We demonstrate our approach through simulation studies and apply it to analyze associations between covariates and brain functional connectivity using data from the Human Connectome Project.
{"title":"Bayesian estimation of covariate assisted principal regression for brain functional connectivity.","authors":"Hyung G Park","doi":"10.1093/biostatistics/kxae023","DOIUrl":"10.1093/biostatistics/kxae023","url":null,"abstract":"<p><p>This paper presents a Bayesian reformulation of covariate-assisted principal regression for covariance matrix outcomes to identify low-dimensional components in the covariance associated with covariates. By introducing a geometric approach to the covariance matrices and leveraging Euclidean geometry, we estimate dimension reduction parameters and model covariance heterogeneity based on covariates. This method enables joint estimation and uncertainty quantification of relevant model parameters associated with heteroscedasticity. We demonstrate our approach through simulation studies and apply it to analyze associations between covariates and brain functional connectivity using data from the Human Connectome Project.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823071/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141565188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-31DOI: 10.1093/biostatistics/kxaf032
Kaiqiong Zhao, Archer Y Yang, Karim Oualkacha, Yixiao Zeng, Kathleen Klein, Marie Hudson, Inés Colmegna, Sasha Bernatsky, Celia M T Greenwood
Varying coefficient models offer the flexibility to learn the dynamic changes of regression coefficients. Despite their good interpretability and diverse applications, in high-dimensional settings, existing estimation methods for such models have important limitations. For example, we routinely encounter the need for variable selection when faced with a large collection of covariates with nonlinear/varying effects on outcomes, and no ideal solutions exist. One illustration of this situation could be identifying a subset of genetic variants with local influence on methylation levels in a regulatory region. To address this problem, we propose a composite sparse penalty that encourages both sparsity and smoothness for the varying coefficients. We present an efficient proximal gradient descent algorithm that scales to high-dimensional predictor spaces, providing sparse solutions for the varying coefficients. A comprehensive simulation study has been conducted to evaluate the performance of our approach in terms of estimation, prediction and selection accuracy. We show that the inclusion of smoothness control yields much better results over sparsity-only approaches. An adaptive version of the penalty offers additional performance gains. We further demonstrate the utility of our method in identifying regional mQTLs from asymptomatic samples in the CARTaGENE cohort. The methodology is implemented in the R package sparseSOMNiBUS, available on GitHub.
{"title":"A novel high-dimensional model for identifying regional DNA methylation QTLs.","authors":"Kaiqiong Zhao, Archer Y Yang, Karim Oualkacha, Yixiao Zeng, Kathleen Klein, Marie Hudson, Inés Colmegna, Sasha Bernatsky, Celia M T Greenwood","doi":"10.1093/biostatistics/kxaf032","DOIUrl":"10.1093/biostatistics/kxaf032","url":null,"abstract":"<p><p>Varying coefficient models offer the flexibility to learn the dynamic changes of regression coefficients. Despite their good interpretability and diverse applications, in high-dimensional settings, existing estimation methods for such models have important limitations. For example, we routinely encounter the need for variable selection when faced with a large collection of covariates with nonlinear/varying effects on outcomes, and no ideal solutions exist. One illustration of this situation could be identifying a subset of genetic variants with local influence on methylation levels in a regulatory region. To address this problem, we propose a composite sparse penalty that encourages both sparsity and smoothness for the varying coefficients. We present an efficient proximal gradient descent algorithm that scales to high-dimensional predictor spaces, providing sparse solutions for the varying coefficients. A comprehensive simulation study has been conducted to evaluate the performance of our approach in terms of estimation, prediction and selection accuracy. We show that the inclusion of smoothness control yields much better results over sparsity-only approaches. An adaptive version of the penalty offers additional performance gains. We further demonstrate the utility of our method in identifying regional mQTLs from asymptomatic samples in the CARTaGENE cohort. The methodology is implemented in the R package sparseSOMNiBUS, available on GitHub.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":"26 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12554007/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145373301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-31DOI: 10.1093/biostatistics/kxaf038
Raphaël Morsomme, C Jason Liang, Allyson Mateja, Dean A Follmann, Meagan P O'Brien, Chenguang Wang, Jonathan Fintzi
We introduce a computationally efficient and general approach for utilizing multiple, possibly interval-censored, data streams to study complex biomedical endpoints using multistate semi-Markov models. Our motivating application is the REGEN-2069 trial, which investigated the protective efficacy (PE) of the monoclonal antibody combination REGEN-COV against SARS-CoV-2 when administered prophylactically to individuals in households at high risk of secondary transmission. Using data on symptom onset, episodic RT-qPCR sampling, and serological testing, we estimate the PE of REGEN-COV for asymptomatic infection, its effect on seroconversion following infection, and the duration of viral shedding. We find that REGEN-COV reduced the risk of asymptomatic infection and the duration of viral shedding, and led to lower rates of seroconversion among asymptomatically infected participants. Our algorithm for fitting semi-Markov models to interval-censored data employs a Monte Carlo expectation-maximization algorithm combined with importance sampling to efficiently address the intractability of the marginal likelihood when data are intermittently observed. Our algorithm provides substantial computational improvements over existing methods and allows us to fit semi-parametric models despite complex coarsening of the data.
{"title":"Assessing treatment efficacy for interval-censored endpoints using multistate semi-Markov models fit to multiple data streams.","authors":"Raphaël Morsomme, C Jason Liang, Allyson Mateja, Dean A Follmann, Meagan P O'Brien, Chenguang Wang, Jonathan Fintzi","doi":"10.1093/biostatistics/kxaf038","DOIUrl":"10.1093/biostatistics/kxaf038","url":null,"abstract":"<p><p>We introduce a computationally efficient and general approach for utilizing multiple, possibly interval-censored, data streams to study complex biomedical endpoints using multistate semi-Markov models. Our motivating application is the REGEN-2069 trial, which investigated the protective efficacy (PE) of the monoclonal antibody combination REGEN-COV against SARS-CoV-2 when administered prophylactically to individuals in households at high risk of secondary transmission. Using data on symptom onset, episodic RT-qPCR sampling, and serological testing, we estimate the PE of REGEN-COV for asymptomatic infection, its effect on seroconversion following infection, and the duration of viral shedding. We find that REGEN-COV reduced the risk of asymptomatic infection and the duration of viral shedding, and led to lower rates of seroconversion among asymptomatically infected participants. Our algorithm for fitting semi-Markov models to interval-censored data employs a Monte Carlo expectation-maximization algorithm combined with importance sampling to efficiently address the intractability of the marginal likelihood when data are intermittently observed. Our algorithm provides substantial computational improvements over existing methods and allows us to fit semi-parametric models despite complex coarsening of the data.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":"26 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12629085/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145558436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-31DOI: 10.1093/biostatistics/kxae050
{"title":"Correction to: Scalable kernel balancing weights in a nationwide observational study of hospital profit status and heart attack outcomes.","authors":"","doi":"10.1093/biostatistics/kxae050","DOIUrl":"10.1093/biostatistics/kxae050","url":null,"abstract":"","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142883691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-31DOI: 10.1093/biostatistics/kxae053
Yingfa Xie, Haoda Fu, Yuan Huang, Vladimir Pozdnyakov, Jun Yan
Patients with type 2 diabetes need to closely monitor blood sugar levels as their routine diabetes self-management. Although many treatment agents aim to tightly control blood sugar, hypoglycemia often stands as an adverse event. In practice, patients can observe hypoglycemic events more easily than hyperglycemic events due to the perception of neurogenic symptoms. We propose to model each patient's observed hypoglycemic event as a lower boundary crossing event for a reflected Brownian motion with an upper reflection barrier. The lower boundary is set by clinical standards. To capture patient heterogeneity and within-patient dependence, covariates and a patient level frailty are incorporated into the volatility and the upper reflection barrier. This framework provides quantification for the underlying glucose level variability, patients heterogeneity, and risk factors' impact on glucose. We make inferences based on a Bayesian framework using Markov chain Monte Carlo. Two model comparison criteria, the deviance information criterion and the logarithm of the pseudo-marginal likelihood, are used for model selection. The methodology is validated in simulation studies. In analyzing a dataset from the diabetic patients in the DURABLE trial, our model provides adequate fit, generates data similar to the observed data, and offers insights that could be missed by other models.
{"title":"Recurrent events modeling based on a reflected Brownian motion with application to hypoglycemia.","authors":"Yingfa Xie, Haoda Fu, Yuan Huang, Vladimir Pozdnyakov, Jun Yan","doi":"10.1093/biostatistics/kxae053","DOIUrl":"10.1093/biostatistics/kxae053","url":null,"abstract":"<p><p>Patients with type 2 diabetes need to closely monitor blood sugar levels as their routine diabetes self-management. Although many treatment agents aim to tightly control blood sugar, hypoglycemia often stands as an adverse event. In practice, patients can observe hypoglycemic events more easily than hyperglycemic events due to the perception of neurogenic symptoms. We propose to model each patient's observed hypoglycemic event as a lower boundary crossing event for a reflected Brownian motion with an upper reflection barrier. The lower boundary is set by clinical standards. To capture patient heterogeneity and within-patient dependence, covariates and a patient level frailty are incorporated into the volatility and the upper reflection barrier. This framework provides quantification for the underlying glucose level variability, patients heterogeneity, and risk factors' impact on glucose. We make inferences based on a Bayesian framework using Markov chain Monte Carlo. Two model comparison criteria, the deviance information criterion and the logarithm of the pseudo-marginal likelihood, are used for model selection. The methodology is validated in simulation studies. In analyzing a dataset from the diabetic patients in the DURABLE trial, our model provides adequate fit, generates data similar to the observed data, and offers insights that could be missed by other models.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":"26 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143048852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-31DOI: 10.1093/biostatistics/kxaf019
Yi Zhao, Xi Luo, Michael E Sobel, Martin A Lindquist, Brian S Caffo
A primary goal of task-based functional magnetic resonance imaging (fMRI) studies is to quantify the effective connectivity between brain regions when stimuli are presented. Assessing the dynamics of effective connectivity has attracted increasing attention. Causal mediation analysis serves as a widely implemented tool aiming to delineate the mechanism between task stimuli and brain activations. However, the case, where the treatment, mediator, and outcome are continuous functions, has not been studied. Causal mediation analysis for functional data is considered. Semiparametric functional linear structural equation models are introduced and causal assumptions are discussed. The proposed models allow for the estimation of individual effect curves. The models are applied to a task-based fMRI study, providing a new perspective of studying dynamic brain connectivity. The R package cfma for implementation is available on CRAN.
{"title":"Causal functional mediation analysis with an application to functional magnetic resonance imaging data.","authors":"Yi Zhao, Xi Luo, Michael E Sobel, Martin A Lindquist, Brian S Caffo","doi":"10.1093/biostatistics/kxaf019","DOIUrl":"10.1093/biostatistics/kxaf019","url":null,"abstract":"<p><p>A primary goal of task-based functional magnetic resonance imaging (fMRI) studies is to quantify the effective connectivity between brain regions when stimuli are presented. Assessing the dynamics of effective connectivity has attracted increasing attention. Causal mediation analysis serves as a widely implemented tool aiming to delineate the mechanism between task stimuli and brain activations. However, the case, where the treatment, mediator, and outcome are continuous functions, has not been studied. Causal mediation analysis for functional data is considered. Semiparametric functional linear structural equation models are introduced and causal assumptions are discussed. The proposed models allow for the estimation of individual effect curves. The models are applied to a task-based fMRI study, providing a new perspective of studying dynamic brain connectivity. The R package cfma for implementation is available on CRAN.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":"26 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12206356/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144531230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-31DOI: 10.1093/biostatistics/kxae020
Wei Zong, Danyang Li, Marianne L Seney, Colleen A Mcclung, George C Tseng
High-dimensional omics data often contain intricate and multifaceted information, resulting in the coexistence of multiple plausible sample partitions based on different subsets of selected features. Conventional clustering methods typically yield only one clustering solution, limiting their capacity to fully capture all facets of cluster structures in high-dimensional data. To address this challenge, we propose a model-based multifacet clustering (MFClust) method based on a mixture of Gaussian mixture models, where the former mixture achieves facet assignment for gene features and the latter mixture determines cluster assignment of samples. We demonstrate superior facet and cluster assignment accuracy of MFClust through simulation studies. The proposed method is applied to three transcriptomic applications from postmortem brain and lung disease studies. The result captures multifacet clustering structures associated with critical clinical variables and provides intriguing biological insights for further hypothesis generation and discovery.
{"title":"Model-based multifacet clustering with high-dimensional omics applications.","authors":"Wei Zong, Danyang Li, Marianne L Seney, Colleen A Mcclung, George C Tseng","doi":"10.1093/biostatistics/kxae020","DOIUrl":"10.1093/biostatistics/kxae020","url":null,"abstract":"<p><p>High-dimensional omics data often contain intricate and multifaceted information, resulting in the coexistence of multiple plausible sample partitions based on different subsets of selected features. Conventional clustering methods typically yield only one clustering solution, limiting their capacity to fully capture all facets of cluster structures in high-dimensional data. To address this challenge, we propose a model-based multifacet clustering (MFClust) method based on a mixture of Gaussian mixture models, where the former mixture achieves facet assignment for gene features and the latter mixture determines cluster assignment of samples. We demonstrate superior facet and cluster assignment accuracy of MFClust through simulation studies. The proposed method is applied to three transcriptomic applications from postmortem brain and lung disease studies. The result captures multifacet clustering structures associated with critical clinical variables and provides intriguing biological insights for further hypothesis generation and discovery.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823124/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141604511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-31DOI: 10.1093/biostatistics/kxae046
Yiqun T Chen, Lucy L Gao
For many applications, it is critical to interpret and validate groups of observations obtained via clustering. A common interpretation and validation approach involves testing differences in feature means between observations in two estimated clusters. In this setting, classical hypothesis tests lead to an inflated Type I error rate. To overcome this problem, we propose a new test for the difference in means in a single feature between a pair of clusters obtained using hierarchical or k-means clustering. The test controls the selective Type I error rate in finite samples and can be efficiently computed. We further illustrate the validity and power of our proposal in simulation and demonstrate its use on single-cell RNA-sequencing data.
{"title":"Testing for a difference in means of a single feature after clustering.","authors":"Yiqun T Chen, Lucy L Gao","doi":"10.1093/biostatistics/kxae046","DOIUrl":"10.1093/biostatistics/kxae046","url":null,"abstract":"<p><p>For many applications, it is critical to interpret and validate groups of observations obtained via clustering. A common interpretation and validation approach involves testing differences in feature means between observations in two estimated clusters. In this setting, classical hypothesis tests lead to an inflated Type I error rate. To overcome this problem, we propose a new test for the difference in means in a single feature between a pair of clusters obtained using hierarchical or k-means clustering. The test controls the selective Type I error rate in finite samples and can be efficiently computed. We further illustrate the validity and power of our proposal in simulation and demonstrate its use on single-cell RNA-sequencing data.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":"26 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11687323/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142911253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-31DOI: 10.1093/biostatistics/kxaf028
Andrea Sottosanti, Enrico Bovo, Pietro Belloni, Giovanna Boccuzzo
Disease mapping analyses the distribution of several disease outcomes within a territory. Primary goals include identifying areas with unexpected changes in mortality rates, studying the relation among multiple diseases, and dividing the analysed territory into clusters based on the observed levels of disease incidence or mortality. In this work, we focus on detecting spatial mortality clusters, that occur when neighbouring areas within a territory exhibit similar mortality levels due to one or more diseases. When multiple causes of death are examined together, it is relevant to identify not only the spatial boundaries of the clusters but also the diseases that lead to their formation. However, existing methods in literature struggle to address this dual problem effectively and simultaneously. To overcome these limitations, we introduce perla, a multivariate Bayesian model that clusters areas in a territory according to the observed mortality rates of multiple causes of death, also exploiting the information of external covariates. Our model incorporates the spatial structure of data directly into the clustering probabilities by leveraging the stick-breaking formulation of the multinomial distribution. Additionally, it exploits suitable global-local shrinkage priors to ensure that the detection of clusters depends on diseases showing concrete increases or decreases in mortality levels, while excluding uninformative diseases. We propose a Markov chain Monte Carlo algorithm for posterior inference that consists of closed-form Gibbs sampling moves for nearly every model parameter, without requiring complex tuning operations. This work is primarily motivated by a case study on the territory of a local unit within the Italian public healthcare system, known as ULSS6 Euganea. To demonstrate the flexibility and effectiveness of our methodology, we also validate perla with a series of simulation experiments and an extensive case study on mortality levels in U.S. counties.
{"title":"Bayesian mapping of mortality clusters.","authors":"Andrea Sottosanti, Enrico Bovo, Pietro Belloni, Giovanna Boccuzzo","doi":"10.1093/biostatistics/kxaf028","DOIUrl":"10.1093/biostatistics/kxaf028","url":null,"abstract":"<p><p>Disease mapping analyses the distribution of several disease outcomes within a territory. Primary goals include identifying areas with unexpected changes in mortality rates, studying the relation among multiple diseases, and dividing the analysed territory into clusters based on the observed levels of disease incidence or mortality. In this work, we focus on detecting spatial mortality clusters, that occur when neighbouring areas within a territory exhibit similar mortality levels due to one or more diseases. When multiple causes of death are examined together, it is relevant to identify not only the spatial boundaries of the clusters but also the diseases that lead to their formation. However, existing methods in literature struggle to address this dual problem effectively and simultaneously. To overcome these limitations, we introduce perla, a multivariate Bayesian model that clusters areas in a territory according to the observed mortality rates of multiple causes of death, also exploiting the information of external covariates. Our model incorporates the spatial structure of data directly into the clustering probabilities by leveraging the stick-breaking formulation of the multinomial distribution. Additionally, it exploits suitable global-local shrinkage priors to ensure that the detection of clusters depends on diseases showing concrete increases or decreases in mortality levels, while excluding uninformative diseases. We propose a Markov chain Monte Carlo algorithm for posterior inference that consists of closed-form Gibbs sampling moves for nearly every model parameter, without requiring complex tuning operations. This work is primarily motivated by a case study on the territory of a local unit within the Italian public healthcare system, known as ULSS6 Euganea. To demonstrate the flexibility and effectiveness of our methodology, we also validate perla with a series of simulation experiments and an extensive case study on mortality levels in U.S. counties.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":"26 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12596199/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145477223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-31DOI: 10.1093/biostatistics/kxae043
Tingting Yu, Lang Wu, Ronald J Bosch, Davey M Smith, Rui Wang
Maximum likelihood inference can often become computationally intensive when performing joint modeling of longitudinal and time-to-event data, due to the intractable integrals in the joint likelihood function. The computational challenges escalate further when modeling HIV-1 viral load data, owing to the nonlinear trajectories and the presence of left-censored data resulting from the assay's lower limit of quantification. In this paper, for a joint model comprising a nonlinear mixed-effect model and a Cox Proportional Hazards model, we develop a computationally efficient Stochastic EM (StEM) algorithm for parameter estimation. Furthermore, we propose a novel technique for fast standard error estimation, which directly estimates standard errors from the results of StEM iterations and is broadly applicable to various joint modeling settings, such as those containing generalized linear mixed-effect models, parametric survival models, or joint models with more than two submodels. We evaluate the performance of the proposed methods through simulation studies and apply them to HIV-1 viral load data from six AIDS Clinical Trials Group studies to characterize viral rebound trajectories following the interruption of antiretroviral therapy (ART), accounting for the informative duration of off-ART periods.
{"title":"Fast standard error estimation for joint models of longitudinal and time-to-event data based on stochastic EM algorithms.","authors":"Tingting Yu, Lang Wu, Ronald J Bosch, Davey M Smith, Rui Wang","doi":"10.1093/biostatistics/kxae043","DOIUrl":"10.1093/biostatistics/kxae043","url":null,"abstract":"<p><p>Maximum likelihood inference can often become computationally intensive when performing joint modeling of longitudinal and time-to-event data, due to the intractable integrals in the joint likelihood function. The computational challenges escalate further when modeling HIV-1 viral load data, owing to the nonlinear trajectories and the presence of left-censored data resulting from the assay's lower limit of quantification. In this paper, for a joint model comprising a nonlinear mixed-effect model and a Cox Proportional Hazards model, we develop a computationally efficient Stochastic EM (StEM) algorithm for parameter estimation. Furthermore, we propose a novel technique for fast standard error estimation, which directly estimates standard errors from the results of StEM iterations and is broadly applicable to various joint modeling settings, such as those containing generalized linear mixed-effect models, parametric survival models, or joint models with more than two submodels. We evaluate the performance of the proposed methods through simulation studies and apply them to HIV-1 viral load data from six AIDS Clinical Trials Group studies to characterize viral rebound trajectories following the interruption of antiretroviral therapy (ART), accounting for the informative duration of off-ART periods.</p>","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823262/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142632694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}