Pub Date: 2024-12-31 DOI: 10.1093/biostatistics/kxaf031
Brian Gilbert, Katherine Hoffman, Nicholas Williams, Kara Rudolph, Edward J Schenck, Iván Díaz
We demonstrate a comprehensive semiparametric approach to causal mediation analysis, addressing the complexities inherent in settings with longitudinal and continuous treatments, confounders, and mediators. Our methodology utilizes a nonparametric structural equation model and a cross-fitted sequential regression technique based on doubly robust pseudo-outcomes, yielding an efficient, asymptotically normal estimator without relying on restrictive parametric modeling assumptions. We are motivated by a recent scientific controversy regarding the effects of invasive mechanical ventilation on the survival of COVID-19 patients, considering acute kidney injury as a mediating factor. We highlight the possibility of "inconsistent mediation," in which the direct and indirect effects of the exposure operate in opposite directions. We discuss the significance of mediation analysis for scientific understanding and its potential utility in treatment decisions.
Title: Identification and estimation of mediational effects of longitudinal modified treatment policies. Journal: Biostatistics 26(1).
Pub Date: 2024-12-31 DOI: 10.1093/biostatistics/kxaf003
Ales Kotalik, David M Vock, Nancy E Sherwood, Brian P Hobbs, Joseph S Koopmeiners
The Sequential Multiple Assignment Randomized Trial (SMART) is a complex trial design that involves randomizing a single participant multiple times in a sequential manner. This results in the branching nature of a SMART, which represents several distinct groups defined by different combinations of treatments, response statuses, etc. A SMART can then answer various scientific questions of interest, e.g., the optimal dynamic treatment regime (DTR) for treating a chronic illness, what intervention to offer first, and what intervention to offer to nonresponders (or suboptimal responders). However, the analysis of a SMART can suffer from low precision, as the potentially widely branching structure can lead to reduced sample sizes in some groups of interest. In this paper, we propose a novel analysis method for a SMART in which dynamic borrowing is used to borrow strength across groups with similar expected outcomes, thus providing increased precision for the estimation of the expected outcomes of DTRs. We apply our method to a SMART evaluating various weight loss strategies using a binary endpoint of clinically significant weight loss and show by simulation that our method can improve the precision of the estimated expected outcome of a DTR, aid in the identification of the optimal DTR, and produce a clustering analysis of DTRs embedded in a SMART.
Title: Within-trial data borrowing for sequential multiple assignment randomized trials. Journal: Biostatistics 26(1). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11963638/pdf/
Pub Date: 2024-12-31 DOI: 10.1093/biostatistics/kxaf047
Xi Fang, Hajime Uno, Fan Li
Composite endpoints are frequently used in clinical trials to enhance the event rate and improve the statistical power. In the presence of a terminal event, the while-alive cumulative frequency measure offers a useful alternative to define composite survival outcomes, by relating the average event rate to the survival time. Although non-parametric methods have been proposed for two-sample comparisons, limited attention has been given to regression methods that directly address time-varying association effects in while-alive measures. We address this gap by developing a regression framework for exposure-weighted while-alive measures for composite survival outcomes that include a terminal component event. Our regression approach uses splines to model time-varying association between covariates and a generalized while-alive loss rate of all component events, and can be applied to both independent and clustered data. We derive the asymptotic properties of the regression estimator under both independent data and cluster-correlated data settings, and study the operating characteristics of our methods through simulations. Finally, we apply our regression method to analyze data from two randomized clinical trials. The proposed methods are implemented in the WAreg R package.
Title: While-alive regression analysis of composite survival endpoints. Journal: Biostatistics 26(1).
Pub Date: 2024-12-31 DOI: 10.1093/biostatistics/kxae014
Margaux Delporte, Geert Molenberghs, Steffen Fieuws, Geert Verbeke
In biomedical studies, continuous and ordinal longitudinal variables are frequently encountered. In many of these studies it is of interest to estimate the effect of one of these longitudinal variables on the other. Time-dependent covariates, however, have several limitations; for example, they cannot be included when the data are not collected at fixed intervals. These issues can be circumvented by implementing joint models, where two or more longitudinal variables are treated as a response and modeled with a correlated random effect. Next, by conditioning on these response(s), we can study the effect of one or more longitudinal variables on another. We propose a normal-ordinal (probit) joint model. First, we derive closed-form formulas to estimate the model-based correlations between the responses on their original scale. In addition, we derive the marginal model, where the interpretation is no longer conditional on the random effects. As a consequence, we can make predictions for a subvector of one response conditional on the other response and potentially a subvector of the history of the response. Next, we extend the approach to a high-dimensional case with more than two ordinal and/or continuous longitudinal variables. The methodology is applied to a case study where, among others, a longitudinal ordinal response is predicted with a longitudinal continuous variable.
Title: A joint normal-ordinal (probit) model for ordinal and continuous longitudinal data. Journal: Biostatistics.
Pub Date: 2024-12-31 DOI: 10.1093/biostatistics/kxae027
Yue Wang, Haoran Shi
This paper tackles the challenge of estimating correlations between higher-level biological variables (e.g. proteins and gene pathways) when only lower-level measurements are directly observed (e.g. peptides and individual genes). Existing methods typically aggregate lower-level data into higher-level variables and then estimate correlations based on the aggregated data. However, different data aggregation methods can yield varying correlation estimates as they target different higher-level quantities. Our solution is a latent factor model that directly estimates these higher-level correlations from lower-level data without the need for data aggregation. We further introduce a shrinkage estimator to ensure the positive definiteness and improve the accuracy of the estimated correlation matrix. Furthermore, we establish the asymptotic normality of our estimator, enabling efficient computation of P-values for the identification of significant correlations. The effectiveness of our approach is demonstrated through comprehensive simulations and the analysis of proteomics and gene expression datasets. We develop the R package highcor for implementing our method.
Title: Direct estimation and inference of higher-level correlations from lower-level measurements with applications in gene-pathway and proteomics studies. Journal: Biostatistics.
Pub Date: 2024-12-31 DOI: 10.1093/biostatistics/kxaf008
Ruihan Lu, Yehua Li, Weixin Yao
The transitional phase of menopause induces significant hormonal fluctuations, exerting a profound influence on the long-term well-being of women. In an extensive longitudinal investigation of women's health during mid-life and beyond, known as the Study of Women's Health Across the Nation (SWAN), hormonal biomarkers are repeatedly assessed, following an asynchronous schedule compared to other error-prone covariates, such as physical and cardiovascular measurements. We conduct a subgroup analysis of the SWAN data employing a semiparametric mixture regression model, which allows us to explore how the relationship between hormonal responses and other time-varying or time-invariant covariates varies across subgroups. To address the challenges posed by asynchronous scheduling and measurement errors, we model the time-varying covariate trajectories as functional data with reduced-rank Karhunen-Loève expansions, where splines are employed to capture the mean and eigenfunctions. Treating the latent subgroup membership and the functional principal component (FPC) scores as missing data, we propose an Expectation-Maximization algorithm to effectively fit the joint model, combining the mixture regression for the hormonal response and the FPC model for the asynchronous, time-varying covariates. In addition, we explore data-driven methods to determine the optimal number of subgroups within the population. Through our comprehensive analysis of the SWAN data, we unveil a crucial subgroup structure within the aging female population, shedding light on important distinctions and patterns among women undergoing menopause.
Title: Semiparametric mixture regression for asynchronous longitudinal data using multivariate functional principal component analysis. Journal: Biostatistics 26(1). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11929387/pdf/
Pub Date: 2024-12-31 DOI: 10.1093/biostatistics/kxae023
Hyung G Park
This paper presents a Bayesian reformulation of covariate-assisted principal regression for covariance matrix outcomes to identify low-dimensional components in the covariance associated with covariates. By introducing a geometric approach to the covariance matrices and leveraging Euclidean geometry, we estimate dimension reduction parameters and model covariance heterogeneity based on covariates. This method enables joint estimation and uncertainty quantification of relevant model parameters associated with heteroscedasticity. We demonstrate our approach through simulation studies and apply it to analyze associations between covariates and brain functional connectivity using data from the Human Connectome Project.
Title: Bayesian estimation of covariate assisted principal regression for brain functional connectivity. Journal: Biostatistics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823071/pdf/
Pub Date: 2024-12-31 DOI: 10.1093/biostatistics/kxaf032
Kaiqiong Zhao, Archer Y Yang, Karim Oualkacha, Yixiao Zeng, Kathleen Klein, Marie Hudson, Inés Colmegna, Sasha Bernatsky, Celia M T Greenwood
Varying coefficient models offer the flexibility to learn the dynamic changes of regression coefficients. Despite their good interpretability and diverse applications, in high-dimensional settings, existing estimation methods for such models have important limitations. For example, we routinely encounter the need for variable selection when faced with a large collection of covariates with nonlinear/varying effects on outcomes, and no ideal solutions exist. One illustration of this situation could be identifying a subset of genetic variants with local influence on methylation levels in a regulatory region. To address this problem, we propose a composite sparse penalty that encourages both sparsity and smoothness for the varying coefficients. We present an efficient proximal gradient descent algorithm that scales to high-dimensional predictor spaces, providing sparse solutions for the varying coefficients. A comprehensive simulation study has been conducted to evaluate the performance of our approach in terms of estimation, prediction and selection accuracy. We show that the inclusion of smoothness control yields much better results than sparsity-only approaches. An adaptive version of the penalty offers additional performance gains. We further demonstrate the utility of our method in identifying regional mQTLs from asymptomatic samples in the CARTaGENE cohort. The methodology is implemented in the R package sparseSOMNiBUS, available on GitHub.
Title: A novel high-dimensional model for identifying regional DNA methylation QTLs. Journal: Biostatistics 26(1). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12554007/pdf/
Pub Date: 2024-12-31 DOI: 10.1093/biostatistics/kxaf038
Raphaël Morsomme, C Jason Liang, Allyson Mateja, Dean A Follmann, Meagan P O'Brien, Chenguang Wang, Jonathan Fintzi
We introduce a computationally efficient and general approach for utilizing multiple, possibly interval-censored, data streams to study complex biomedical endpoints using multistate semi-Markov models. Our motivating application is the REGEN-2069 trial, which investigated the protective efficacy (PE) of the monoclonal antibody combination REGEN-COV against SARS-CoV-2 when administered prophylactically to individuals in households at high risk of secondary transmission. Using data on symptom onset, episodic RT-qPCR sampling, and serological testing, we estimate the PE of REGEN-COV for asymptomatic infection, its effect on seroconversion following infection, and the duration of viral shedding. We find that REGEN-COV reduced the risk of asymptomatic infection and the duration of viral shedding, and led to lower rates of seroconversion among asymptomatically infected participants. Our algorithm for fitting semi-Markov models to interval-censored data employs a Monte Carlo expectation-maximization algorithm combined with importance sampling to efficiently address the intractability of the marginal likelihood when data are intermittently observed. Our algorithm provides substantial computational improvements over existing methods and allows us to fit semi-parametric models despite complex coarsening of the data.
Title: Assessing treatment efficacy for interval-censored endpoints using multistate semi-Markov models fit to multiple data streams. Journal: Biostatistics 26(1). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12629085/pdf/
Pub Date: 2024-12-31 DOI: 10.1093/biostatistics/kxae050
Title: Correction to: Scalable kernel balancing weights in a nationwide observational study of hospital profit status and heart attack outcomes. Journal: Biostatistics.