Pub Date: 2025-01-19 | DOI: 10.1177/09622802241307613
Arnoldus F Otto, Johannes T Ferreira, Salvatore Daniele Tomarchio, Andriëtte Bekker, Antonio Punzo
In medical and health research, investigators are often interested in countable quantities such as hospital length of stay (e.g., in days) or the number of doctor visits. Poisson regression is commonly used to model such count data, but it cannot accommodate overdispersion, that is, when the variance exceeds the mean. To address this issue, the negative binomial (NB) distribution (NB-D) and, by extension, NB regression provide a well-documented alternative. However, real-data applications present additional challenges. Two such challenges are (i) the presence of (mild) outliers that can influence the performance of the NB-D and (ii) the availability of covariates that can enhance inference about the mean of the count variable of interest. To jointly address these issues, we propose the contaminated NB (cNB) distribution, which is flexible enough to accommodate mild outliers while remaining simple and intuitive to interpret. In addition to the parameters of the NB-D, the model has one parameter describing the proportion of mild outliers and one specifying the degree of contamination. To let available covariates improve estimation of the mean of the cNB distribution, we propose the cNB regression model. An expectation-maximization algorithm is outlined for parameter estimation, and its performance is evaluated through a parameter recovery study. The effectiveness of the model is demonstrated via a sensitivity analysis and on two health datasets, where it outperforms well-known count models. The proposed methodology is implemented in an R package available at https://github.com/arnootto/cNB.
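The cNB idea, a two-component NB mixture in which a small fraction of counts comes from a heavier-tailed component, can be sketched in Python. The parameterization below (alpha as the outlier proportion, eta inflating the dispersion of the contaminated component) is illustrative; the paper's exact formulation may differ.

```python
import math

def nb_pmf(y, mu, nu):
    """Negative binomial pmf with mean mu and dispersion nu (variance mu + mu^2/nu)."""
    log_p = (math.lgamma(y + nu) - math.lgamma(nu) - math.lgamma(y + 1)
             + nu * math.log(nu / (nu + mu)) + y * math.log(mu / (nu + mu)))
    return math.exp(log_p)

def cnb_pmf(y, mu, nu, alpha, eta):
    """Illustrative contaminated NB: with probability alpha the count comes from
    an inflated-dispersion NB component that generates the mild outliers."""
    return (1 - alpha) * nb_pmf(y, mu, nu) + alpha * nb_pmf(y, mu, nu / eta)

# sanity check: the mixture is a proper pmf and keeps mean mu
probs = [cnb_pmf(y, mu=5.0, nu=2.0, alpha=0.1, eta=4.0) for y in range(2000)]
print(round(sum(probs), 6))                               # ≈ 1.0
print(round(sum(y * p for y, p in enumerate(probs)), 4))  # ≈ 5.0
```

Because both components share the mean mu, contamination fattens the tails without shifting the center, which is what makes the mean regression structure carry over to cNB regression.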
Title: A contaminated regression model for count health data.
Journal: Statistical Methods in Medical Research
Pub Date: 2025-01-19 | DOI: 10.1177/09622802241289557
Luca Genetti, Giuliana Cortese, Henrik Ravn, Thomas Scheike
Recurrent events data are often encountered in biomedical settings, where individuals may also experience a terminal event such as death. A useful estimand for summarizing such data is the marginal mean of the cumulative number of recurrent events up to a specific time horizon, allowing for the possible presence of a terminal event. Recently, augmented estimators were shown to estimate this quantity efficiently, providing improved inference. Improving efficiency through covariate adjustment is growing in popularity as the methods develop further, and is supported by the regulatory agencies EMA (2015) and FDA (2023). Motivated by these arguments, this article presents novel efficient estimators for clinical data from randomized controlled trials, accounting for additional information from auxiliary covariates. Moreover, for randomized studies in which both right censoring and competing risks are present, we propose a novel doubly augmented estimator of the marginal mean, with two optimal augmentation components due to censoring and randomization. We provide theoretical and asymptotic details for the novel estimators, confirmed by simulation studies. We then discuss how to improve efficiency, both theoretically, by computing the expected amount of variance reduction, and practically, by showing the performance of the different working regression models needed in the augmentation when they are correctly specified or misspecified. The methods are applied to the LEADER study, a randomized controlled trial that studied the cardiovascular safety of treatments in patients with type 2 diabetes.
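The covariate-augmentation principle the article builds on can be illustrated for a simple difference in means under randomization. This is not the authors' recurrent-events estimator; the per-arm linear working models h1, h0 are illustrative choices, and the estimator stays unbiased even if they are misspecified.

```python
import numpy as np

def augmented_diff(y, a, x, p=0.5):
    """Augmented difference-in-means for a randomized trial with P(A=1)=p.
    Working models h1, h0 are per-arm linear regressions of Y on X; the
    augmentation removes covariate-explained variance from the estimate."""
    X = np.column_stack([np.ones_like(x, dtype=float), x])
    b1, *_ = np.linalg.lstsq(X[a == 1], y[a == 1], rcond=None)
    b0, *_ = np.linalg.lstsq(X[a == 0], y[a == 0], rcond=None)
    h1, h0 = X @ b1, X @ b0
    return np.mean(a * (y - h1) / p - (1 - a) * (y - h0) / (1 - p) + h1 - h0)

# noiseless toy data: outcome linear in x, true treatment effect 1.5
x = np.array([0.0, 1, 2, 3, 4, 5])
a = np.array([1, 0, 1, 0, 1, 0])
y = 2 + 3 * x + 1.5 * a
print(round(float(augmented_diff(y, a, x)), 6))  # 1.5
```

When the working models fit perfectly, as here, the residual terms vanish and the estimator recovers the effect exactly; with noise, the augmentation shrinks the variance relative to the raw difference in means.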
Title: Efficient estimation of the marginal mean of recurrent events in randomized controlled trials.
Pub Date: 2025-01-19 | DOI: 10.1177/09622802241304111
Zhaojin Li, Xiang Geng, Yawen Hou, Zheng Chen
The proportional hazards (PH) assumption is often violated in clinical trials. If the commonly used log-rank test is used for trial design under non-proportional hazards (NPH), power is lost or inflated, and the hazard ratio becomes difficult to interpret as an effect measure. To circumvent these issues and to make the effect measure easy to interpret and communicate, two simulation-free restricted mean survival time group sequential (GS-RMST) designs are introduced in this study: the independent-increment design (GS-RMSTi) and the non-independent-increment design (GS-RMSTn). For both designs, we give the analytic expression of the variance-covariance matrix and the calculations of the stopping boundaries and sample size. Simulation studies show that both designs achieve their nominal type I error and nominal power. The GS-RMSTn simulation studies show that the Max-Combo group sequential design is robust across NPH scenarios and is suitable for detecting whether a treatment effect difference exists, but it lacks an easy-to-interpret effect measure indicating the magnitude of that difference. GS-RMST performs well in both PH and NPH scenarios, and it yields a time-scale effect measure that both physicians and patients can understand. Examples of both GS-RMST designs are illustrated.
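The design's effect measure, the RMST, is the area under the Kaplan-Meier curve up to a horizon tau. A minimal point estimator (sequential risk-set updates, no variance formula) might look like the sketch below.

```python
import numpy as np

def rmst(time, event, tau):
    """Restricted mean survival time: area under the Kaplan-Meier
    survival curve from 0 up to the truncation time tau."""
    order = np.argsort(time)
    t = np.asarray(time, float)[order]
    d = np.asarray(event, int)[order]
    s, area, prev_t = 1.0, 0.0, 0.0
    for i, (ti, di) in enumerate(zip(t, d)):
        ti_c = min(ti, tau)
        area += s * (ti_c - prev_t)   # rectangle under the current step
        prev_t = ti_c
        if ti >= tau:
            break
        if di:                        # event: Kaplan-Meier step down
            s *= 1.0 - 1.0 / (len(t) - i)
    area += s * max(0.0, tau - prev_t)
    return area

# no censoring and tau beyond the last event: RMST equals the mean survival time
print(round(rmst([1, 2, 3], [1, 1, 1], tau=10), 6))   # 2.0
print(round(rmst([1, 2, 3], [1, 1, 1], tau=2.5), 4))  # 1.8333 (= mean of min(T, 2.5))
```

A GS-RMST design then compares RMST estimates between arms at each interim analysis against boundaries derived from the variance-covariance matrix given in the paper.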
Title: Group sequential design using restricted mean survival time as the primary endpoint in clinical trials.
Pub Date: 2025-01-19 | DOI: 10.1177/09622802241307237
Remi Luschei, Werner Brannath
The population-wise error rate is a type I error rate for clinical trials with multiple target populations, in which a treatment is tested for efficacy in each population. It is defined as the probability that a randomly selected future patient will be exposed to an inefficient treatment based on the study results. It can be understood and computed as a prevalence-weighted average of strata-specific family-wise error rates. A major issue with this concept is that the prevalences are usually unknown in practice, so the population-wise error rate cannot be controlled directly. Instead, one could use an estimator based on the given sample, such as the maximum-likelihood estimator under a multinomial distribution. In this article, we demonstrate through simulations that this does not substantially inflate the true population-wise error rate. We differentiate between the expected population-wise error rate, which is almost perfectly controlled, and study-specific values of the population-wise error rate, which are conditioned on all subgroup sample sizes and vary within a narrow range. We consider up to eight different overlapping populations and moderate to large sample sizes. In these settings, we also consider the maximum strata-wise family-wise error rate, which is found to be, on average, bounded by twice the significance level used for population-wise error rate control.
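The prevalence-weighted construction takes only a few lines: the multinomial MLEs of the prevalences are the observed stratum proportions, and the stratum family-wise error rates are assumed given (e.g., from the trial's multiple-testing procedure; the numbers below are hypothetical).

```python
def pwer(counts, stratum_fwer):
    """Population-wise error rate: prevalence-weighted average of the
    strata-specific family-wise error rates, with prevalences replaced
    by their multinomial MLEs (the sample proportions)."""
    n = sum(counts)
    return sum(c / n * e for c, e in zip(counts, stratum_fwer))

# two overlapping populations -> three disjoint strata (only-A, only-B, A-and-B);
# patients in the overlap can receive either treatment, so that stratum's FWER
# reflects two hypotheses and is larger
print(round(pwer([40, 40, 20], [0.025, 0.025, 0.049]), 6))  # 0.0298
```

The article's question is how much replacing the true prevalences with these sample proportions perturbs control of the resulting rate.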
Title: The effect of estimating prevalences on the population-wise error rate.
Pub Date: 2025-01-17 | DOI: 10.1177/09622802241300049
Helen Bell Gorrod, Shahrul Mt-Isa, Jingyi Xuan, Kristel Vandormael, William Malbecq, Victoria Yorke-Edwards, Ian R White, Nicholas Latimer
Treatment switching is common in randomised controlled trials (RCTs). Participants may switch onto a variety of different treatments, each of which may have a different treatment effect. Adjustment analyses that target hypothetical estimands (estimating the outcomes that would have been observed in the absence of treatment switching) have focused primarily on a single type of switch. In this study, we assess the performance of inverse probability of censoring weights (IPCW) and two-stage estimation (TSE) applications that adjust for multiple switches by either (i) adjusting for each type of switch separately ('treatments separate') or (ii) adjusting for all switches combined, without differentiating between switched-to treatments ('treatments combined'). We simulate 48 scenarios in which RCT participants may switch to multiple treatments, varying switch proportions, treatment effects, the number of switched-to treatments and censoring proportions. Performance measures included mean percentage bias in restricted mean survival time and the frequency of model convergence. Treatments combined and treatments separate produced similar levels of bias in both TSE and IPCW applications. In the scenarios examined, there was no demonstrable advantage to adjusting for each type of switch separately rather than adjusting for all switches together.
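The IPCW building block re-weights uncensored subjects by the inverse of the estimated probability of remaining uncensored, here via a Kaplan-Meier estimate of the censoring distribution. Real switching adjustments censor at switch time and typically model censoring with covariates; this unconditional version is only an illustration of the weighting mechanics.

```python
import numpy as np

def ipcw_weights(time, censored):
    """Kaplan-Meier estimate of the censoring survivor function K(t-),
    evaluated at each subject's own time; weight = 1/K for uncensored
    subjects, 0 for (artificially) censored ones."""
    order = np.argsort(time)
    t = np.asarray(time, float)[order]
    c = np.asarray(censored, int)[order]   # 1 = censored (e.g. at switch)
    n = len(t)
    K, K_at = 1.0, np.empty(n)
    for i in range(n):
        K_at[i] = K                         # K just before t[i]
        if c[i]:                            # censoring event: K steps down
            K *= 1.0 - 1.0 / (n - i)
    w = np.zeros(n)
    w[c == 0] = 1.0 / K_at[c == 0]
    out = np.empty(n)
    out[order] = w                          # restore input order
    return out

# one subject censored at t=2: the two later uncensored subjects are up-weighted
print(list(np.round(ipcw_weights([1, 2, 3, 4], [0, 1, 0, 0]), 3)))  # [1.0, 0.0, 1.5, 1.5]
```

The 'treatments separate' and 'treatments combined' strategies in the abstract differ only in whether a separate censoring model (and hence weight) is built per switched-to treatment.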
Title: Adjusting for switches to multiple treatments: Should switches be handled separately or combined?
Pub Date: 2025-01-01 | Epub Date: 2024-12-10 | DOI: 10.1177/09622802241295335
Yi Niu, Duze Fan, Jie Ding, Yingwei Peng
The semiparametric accelerated failure time mixture cure model is an appealing alternative to the proportional hazards mixture cure model for analyzing failure time data with long-term survivors. However, the model was proposed only for independent survival data and has not been extended to clustered or correlated survival data, partly because of the complexity of its estimation method. In this paper, we consider a marginal semiparametric accelerated failure time mixture cure model for clustered right-censored failure time data with a potential cure fraction. We overcome the complexity of the existing semiparametric method by proposing a generalized estimating equations approach, based on the expectation-maximization algorithm, to estimate the regression parameters. The correlation structures within clusters are modeled by working correlation matrices in the proposed generalized estimating equations. The large-sample properties of the regression estimators are established. Numerical studies demonstrate that the proposed estimation method is easy to use and robust to misspecification of the working matrices, and that higher efficiency is achieved when the working correlation structure is closer to the true one. We apply the proposed model and estimation method to a contralateral breast cancer study and reveal new insights when the potential correlation between patients is taken into account.
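A standard ingredient of EM algorithms for mixture cure models, plausibly similar in spirit to the one used here, is the E-step weight: an observed event identifies a subject as uncured, while a censored subject receives the posterior probability of being uncured given the incidence probability pi and the latency survivor function S_u(t). A minimal sketch, with the model-specific details left to the paper:

```python
def e_step_uncured_prob(pi, surv_u, event):
    """E-step of a mixture cure EM: posterior probability of being uncured.
    pi: incidence (uncured) probability; surv_u: S_u(t_i), the uncured
    survivor function at each subject's observed time; event: 1 if the
    event was observed, 0 if censored."""
    return [1.0 if d else pi * s / (1.0 - pi + pi * s)
            for s, d in zip(surv_u, event)]

# being censored late in follow-up (small uncured survival) points to cure
print([round(w, 3) for w in e_step_uncured_prob(0.6, [1.0, 0.5, 0.05], [1, 0, 0])])
# [1.0, 0.429, 0.07]
```

In the paper's GEE setting these fractional weights feed the M-step estimating equations, with within-cluster dependence captured by the working correlation matrices.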
Title: Marginal semiparametric accelerated failure time cure model for clustered survival data. | Pages: 150-169
Pub Date: 2025-01-01 | Epub Date: 2024-12-12 | DOI: 10.1177/09622802241298704
Kuan Liu, Olli Saarela, George Tomlinson, Brian M Feldman, Eleanor Pullenayegum
Bayesian methods are increasingly in demand in clinical and public health comparative effectiveness research. Limited literature has explored parametric Bayesian causal approaches that handle time-dependent treatment and time-dependent covariates. In this article, building on work on Bayesian g-computation, we propose a fully Bayesian causal approach implemented using latent confounder classes that represent the patient's disease and health status. Our setting is suitable when the latent class represents a true disease state that the physician can infer without misclassification from manifest variables. We consider a causal effect confounded by the visit-specific latent class in a longitudinal setting and formulate the joint likelihood of the treatment, outcome and latent class models conditionally on the class indicators. The proposed causal structure with latent classes features dimension reduction of the time-dependent confounders. We examine the performance of the proposed method using simulation studies and compare it to other causal methods for longitudinal data with time-dependent treatment and time-dependent confounding. Our approach is illustrated through a study of the effectiveness of intravenous immunoglobulin in treating newly diagnosed juvenile dermatomyositis.
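The g-computation backbone the article builds on, standardizing the conditional outcome mean over the confounder distribution, reduces at a single time point to the g-formula; in the proposed method the role of L is played by the latent class. The numbers below are hypothetical.

```python
def g_formula(p_l, mean_y):
    """Single-time-point g-formula: standardize E[Y | A=a, L=l] over the
    confounder distribution P(L=l) to get E[Y^a] = sum_l P(L=l) E[Y|A=a,L=l]."""
    return {a: sum(p_l[l] * mean_y[(a, l)] for l in p_l) for a in (0, 1)}

# toy example with a binary confounder L
p_l = {0: 0.7, 1: 0.3}
mean_y = {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 2.0, (1, 1): 5.0}
effects = g_formula(p_l, mean_y)
print(round(effects[1] - effects[0], 2))  # 1.3, the standardized mean difference
```

In the longitudinal Bayesian version, this standardization is iterated over visits and averaged over the posterior of the latent-class, treatment and outcome models rather than over fixed plug-in estimates.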
Title: A Bayesian latent class approach to causal inference with longitudinal data. | Pages: 55-68
Pub Date: 2025-01-01 | Epub Date: 2024-12-12 | DOI: 10.1177/09622802241299410
Takehiro Shoji, Jun Tsuchida, Hiroshi Yadohisa
When using the propensity score method to estimate treatment effects, it is important to select the covariates included in the propensity score model. Including covariates unrelated to the outcome leads to bias and large variance in the estimator of treatment effects. Many data-driven covariate selection methods have been proposed for selecting covariates related to outcomes. However, most of them assume average treatment effect estimation and may not be designed for quantile treatment effects (QTEs), the effects of treatment on the quantiles of the outcome distribution. In QTE estimation, we consider two types of relation with the outcome: through its expected value and through its quantiles. Accordingly, we propose a data-driven covariate selection method for propensity score models that can select covariates related to both the expected value and the quantiles of the outcome. Taking a quantile regression model as the outcome regression model, covariate selection is performed using a regularization method with the partial regression coefficients of the quantile regression model as weights. The proposed method was applied to artificial data and to a dataset of mothers and children born in King County, Washington, to compare the performance of existing methods and QTE estimators. The proposed method performs well in the presence of covariates related to both the expected value and the quantiles of the outcome.
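Two ingredients of the proposal can be sketched directly: the check (pinball) loss that defines quantile regression, and adaptive-lasso-style penalty weights built from fitted quantile-regression coefficients, so covariates unrelated to the tau-quantile of the outcome are penalized away first. Parameter names below are illustrative.

```python
def pinball(u, tau):
    """Check (pinball) loss: the objective minimized by tau-quantile regression."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def adaptive_weights(beta, gamma=1.0, eps=1e-8):
    """Adaptive-lasso penalty weights from quantile-regression coefficients:
    covariates with small |beta_j| receive large penalties and drop out of
    the propensity score model first."""
    return [1.0 / (abs(b) + eps) ** gamma for b in beta]

# under-prediction is penalized 9x more than over-prediction at tau = 0.9
print(round(pinball(2.0, 0.9), 6), round(pinball(-2.0, 0.9), 6))  # 1.8 0.2
w = adaptive_weights([2.0, 0.01])
print(w[1] > w[0])  # the near-zero coefficient gets the larger penalty: True
```

The resulting inverse probability weighting estimator of the QTE then uses a propensity score model fitted with this weighted penalty.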
Title: Quantile outcome adaptive lasso: Covariate selection for inverse probability weighting estimator of quantile treatment effects. | Pages: 69-84
Pub Date: 2025-01-01 | Epub Date: 2024-12-12 | DOI: 10.1177/09622802241300845
Divan Aristo Burger, Sean van der Merwe, Janet van Niekerk, Emmanuel Lesaffre, Antoine Pironet
This study introduces a novel joint modeling framework integrating quantile regression for longitudinal continuous proportions data with Cox regression for time-to-event analysis, employing integrated nested Laplace approximation for Bayesian inference. Our approach facilitates an examination across the entire distribution of patient health metrics over time, including the occurrence of key health events and their impact on patient outcomes, particularly in the context of medication adherence and persistence. Integrated nested Laplace approximation's fast computational speed significantly enhances the efficiency of this process, making the model particularly suitable for applications requiring rapid data analysis and updates. Applying this model to a dataset of patients who underwent treatment with atorvastatin, we demonstrate the significant impact of targeted interventions on improving medication adherence and persistence across various patient subgroups. Furthermore, we have developed a dynamic prediction method within this framework that rapidly estimates persistence probabilities based on the latest medication adherence data, demonstrating integrated nested Laplace approximation's quick updates and prediction capability. The simulation study validates the reliability of our modeling approach, evidenced by minimal bias and appropriate credible interval coverage probabilities across different quantile levels.
{"title":"Joint quantile regression of longitudinal continuous proportions and time-to-event data: Application in medication adherence and persistence.","authors":"Divan Aristo Burger, Sean van der Merwe, Janet van Niekerk, Emmanuel Lesaffre, Antoine Pironet","doi":"10.1177/09622802241300845","DOIUrl":"10.1177/09622802241300845","url":null,"abstract":"<p><p>This study introduces a novel joint modeling framework integrating quantile regression for longitudinal continuous proportions data with Cox regression for time-to-event analysis, employing integrated nested Laplace approximation for Bayesian inference. Our approach facilitates an examination across the entire distribution of patient health metrics over time, including the occurrence of key health events and their impact on patient outcomes, particularly in the context of medication adherence and persistence. Integrated nested Laplace approximation's fast computational speed significantly enhances the efficiency of this process, making the model particularly suitable for applications requiring rapid data analysis and updates. Applying this model to a dataset of patients who underwent treatment with atorvastatin, we demonstrate the significant impact of targeted interventions on improving medication adherence and persistence across various patient subgroups. Furthermore, we have developed a dynamic prediction method within this framework that rapidly estimates persistence probabilities based on the latest medication adherence data, demonstrating integrated nested Laplace approximation's quick updates and prediction capability. 
The simulation study validates the reliability of our modeling approach, evidenced by minimal bias and appropriate credible interval coverage probabilities across different quantile levels.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"111-130"},"PeriodicalIF":1.6,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142819220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
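The dynamic prediction described in this abstract reduces to a conditional survival calculation: the probability of persisting past time t2, given persistence to t1, is S(t2)/S(t1). A minimal sketch under an assumed exponential survival curve (illustrative only; in the paper, S would come from the fitted joint model and be updated via INLA as new adherence data arrive):

```python
import math

def conditional_persistence(surv, t_now, t_future):
    """P(persist past t_future | persisted to t_now) = S(t_future) / S(t_now)."""
    if t_future < t_now:
        raise ValueError("t_future must be >= t_now")
    return surv(t_future) / surv(t_now)

# Illustrative survival curve: exponential with a constant hazard of 0.1 per month.
surv = lambda t: math.exp(-0.1 * t)
```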
Pub Date: 2025-01-01 | Epub Date: 2024-12-10 | DOI: 10.1177/09622802241293768
Julien St-Pierre, Karim Oualkacha, Sahir Rai Bhatnagar
Interactions between genes and environmental factors may play a key role in the etiology of many common disorders. Several regularized generalized linear models have been proposed for hierarchical selection of gene by environment interaction effects, where a gene-environment interaction effect is selected only if the corresponding genetic main effect is also selected in the model. However, none of these methods allows the inclusion of random effects to account for population structure, subject relatedness, and shared environmental exposure. In this article, we develop a unified approach based on regularized penalized quasi-likelihood estimation to perform hierarchical selection of gene-environment interaction effects in sparse regularized mixed models. We compare the selection and prediction accuracy of our proposed model with existing methods through simulations in the presence of population structure and shared environmental exposure. We show that for all simulation scenarios, including an additional random effect to account for the shared environmental exposure reduces the false positive rate and false discovery rate of our proposed method for selection of both gene-environment interaction and main effects. Using the F1 score as a balanced measure of the false discovery rate and true positive rate, we further show that in the hierarchical simulation scenarios, our method outperforms other methods for retrieving important gene-environment interaction effects. Finally, we apply our method to data from the Orofacial Pain: Prospective Evaluation and Risk Assessment (OPPERA) study and find that our method retrieves previously reported significant loci.
{"title":"Hierarchical selection of genetic and gene by environment interaction effects in high-dimensional mixed models.","authors":"Julien St-Pierre, Karim Oualkacha, Sahir Rai Bhatnagar","doi":"10.1177/09622802241293768","DOIUrl":"10.1177/09622802241293768","url":null,"abstract":"<p><p>Interactions between genes and environmental factors may play a key role in the etiology of many common disorders. Several regularized generalized linear models have been proposed for hierarchical selection of gene by environment interaction effects, where a gene-environment interaction effect is selected only if the corresponding genetic main effect is also selected in the model. However, none of these methods allows the inclusion of random effects to account for population structure, subject relatedness, and shared environmental exposure. In this article, we develop a unified approach based on regularized penalized quasi-likelihood estimation to perform hierarchical selection of gene-environment interaction effects in sparse regularized mixed models. We compare the selection and prediction accuracy of our proposed model with existing methods through simulations in the presence of population structure and shared environmental exposure. We show that for all simulation scenarios, including an additional random effect to account for the shared environmental exposure reduces the false positive rate and false discovery rate of our proposed method for selection of both gene-environment interaction and main effects. Using the <math><msub><mi>F</mi><mn>1</mn></msub></math> score as a balanced measure of the false discovery rate and true positive rate, we further show that in the hierarchical simulation scenarios, our method outperforms other methods for retrieving important gene-environment interaction effects. 
Finally, we apply our method to data from the Orofacial Pain: Prospective Evaluation and Risk Assessment (OPPERA) study and find that our method retrieves previously reported significant loci.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"180-198"},"PeriodicalIF":1.6,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11800719/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142808101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
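Two ingredients of this abstract are simple to state in code: the hierarchical (strong-heredity) constraint keeps a gene-by-environment interaction only if its genetic main effect is selected, and the F1 score balances precision and recall of the selected set. A hedged pure-Python sketch (names illustrative, not from the paper's implementation):

```python
def enforce_heredity(main_coefs, inter_coefs):
    """Zero out G-by-E interaction coefficients whose genetic main effect is not selected."""
    return {g: (b if main_coefs.get(g, 0.0) != 0.0 else 0.0)
            for g, b in inter_coefs.items()}

def f1_score(selected, truth):
    """F1 = 2PR/(P+R) for a set of selected effects against the true set."""
    tp = len(selected & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(selected)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)
```

In the paper the heredity constraint is imposed inside the penalized quasi-likelihood optimization rather than as a post-hoc filter; this sketch only illustrates the constraint and the evaluation metric.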