Pub Date: 2024-08-12; eCollection Date: 2024-11-01; DOI: 10.1515/ijb-2023-0116
Awa Diop, Caroline Sirois, Jason R Guertin, Mireille E Schnitzer, James M Brophy, Claudia Blais, Denis Talbot
In previous work, we introduced a framework that combines latent class growth analysis (LCGA) with marginal structural models (LCGA-MSM). LCGA-MSM first summarizes the numerous time-varying treatment patterns into a few trajectory groups and then allows for a population-level causal interpretation of the group differences. However, the LCGA-MSM framework is not suitable when the outcome is time-dependent. In this study, we propose combining a nonparametric history-restricted marginal structural model (HRMSM) with LCGA. HRMSMs can be seen as an application of standard MSMs on multiple time intervals. To the best of our knowledge, we also present the first application of HRMSMs with a time-to-event outcome. It was previously noted that HRMSMs could pose interpretation problems in survival analysis when targeting either a hazard ratio or a survival curve. We propose a causal parameter that bypasses these interpretation challenges. We consider three different estimators of the parameters: inverse probability of treatment weighting (IPTW), g-computation, and a pooled longitudinal targeted maximum likelihood estimator (pooled LTMLE). We conduct simulation studies to measure the performance of the proposed LCGA-HRMSM. In all scenarios, we obtained unbiased estimates when using either g-computation or pooled LTMLE. IPTW produced estimates with slightly larger bias in some scenarios. Overall, the 95% confidence intervals of all approaches had good coverage. We applied our approach to a population of older Quebecers composed of 57,211 statin initiators and found that greater adherence to statins was associated with a lower combined risk of cardiovascular disease or all-cause mortality.
Title: History-restricted marginal structural model and latent class growth analysis of treatment trajectories for a time-dependent outcome.; Journal: International Journal of Biostatistics; Pages: 467-490; PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11661564/pdf/
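As a rough illustration of the IPTW idea only (not the authors' LCGA-HRMSM estimator), the sketch below computes cumulative, unstabilized inverse-probability-of-treatment weights over a single interval and then a normalized weighted mean outcome per trajectory group, which is what a saturated MSM on group membership estimates. It assumes that the probability of each subject's observed treatment at every time point (`p_obs`) has already been estimated, for example with a logistic model given past treatment and confounders. The function name `iptw_group_means` is hypothetical.

```python
import numpy as np


def iptw_group_means(p_obs, group, y):
    """Weighted mean outcome per trajectory group using cumulative IPT weights.

    p_obs : array (n, T), probability of the treatment each subject actually
            received at each time, given their past (assumed pre-estimated)
    group : length-n trajectory-group labels (e.g. assigned by LCGA)
    y     : length-n outcomes
    """
    p_obs = np.asarray(p_obs, dtype=float)
    group = np.asarray(group)
    y = np.asarray(y, dtype=float)
    # Unstabilized weight: inverse of the product of per-time probabilities.
    w = 1.0 / np.prod(p_obs, axis=1)
    means = {}
    for g in np.unique(group):
        in_g = group == g
        # Hajek (normalized) weighted mean within the group.
        means[g.item()] = float(np.sum(w[in_g] * y[in_g]) / np.sum(w[in_g]))
    return means


res = iptw_group_means(
    [[0.5, 0.5], [1.0, 0.5], [0.5, 1.0], [1.0, 1.0]],
    ["low", "low", "high", "high"],
    [1, 0, 0, 1],
)
```

In practice, stabilized weights (a numerator model for treatment given past treatment only) are usually preferred to limit variance; the unstabilized form is used here to keep the sketch short.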
Pub Date: 2024-07-01; eCollection Date: 2024-11-01; DOI: 10.1515/ijb-2023-0050
Valeria Sambucini
Traditional methods for Sample Size Determination (SSD) based on power analysis use fixed values or preliminary estimates for the unknown parameters. A hybrid classical-Bayesian approach can be used to formally incorporate information or model uncertainty about unknown quantities through prior distributions, as in the Bayesian approach, while still analysing the data in a frequentist framework. In this paper, we propose a hybrid procedure for SSD in two-arm superiority trials that takes into account the different roles played by the unknown parameters involved in the statistical power. Thus, different prior distributions are used to formalize design expectations and to model information or uncertainty on preliminary estimates involved at the analysis stage. To illustrate the method, we consider binary data and derive the proposed hybrid criteria using three possible parameters of interest, i.e. the difference between proportions of successes, the logarithm of the relative risk and the logarithm of the odds ratio. Numerical examples taken from the literature are presented to show how to implement the proposed procedure.
Title: Hybrid classical-Bayesian approach to sample size determination for two-arm superiority clinical trials.; Journal: International Journal of Biostatistics; Pages: 553-570
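A simplified sketch of the hybrid idea for the difference in proportions: frequentist power from a one-sided z-test (normal approximation, unpooled variance) is averaged over a discrete design prior for the two success probabilities, and the smallest per-arm sample size reaching a target expected power is returned. It uses a single design prior only; the paper's procedure additionally uses separate analysis priors for preliminary estimates, which is not reproduced here. All function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_diff(n, p_ctrl, p_trt, alpha=0.025):
    """Approximate power of a one-sided z-test of H0: p_trt <= p_ctrl, n per arm."""
    se = ((p_ctrl * (1 - p_ctrl) + p_trt * (1 - p_trt)) / n) ** 0.5
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha)
    return _STD_NORMAL.cdf((p_trt - p_ctrl) / se - z_alpha)


def expected_power(n, design_prior, alpha=0.025):
    """Power averaged over a discrete design prior [(p_ctrl, p_trt, weight), ...]."""
    total = sum(w for _, _, w in design_prior)
    return sum(w * power_diff(n, pc, pt, alpha) for pc, pt, w in design_prior) / total


def hybrid_sample_size(design_prior, target=0.8, alpha=0.025, n_max=100_000):
    """Smallest n per arm whose expected power reaches `target`, else None."""
    for n in range(2, n_max + 1):
        if expected_power(n, design_prior, alpha) >= target:
            return n
    return None
```

With a point-mass prior the search reduces to the classical power calculation; spreading prior mass toward smaller effects increases the required size. If the prior puts enough mass on `p_trt <= p_ctrl`, the target may be unreachable and the function returns `None`.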
Pub Date: 2024-06-25; eCollection Date: 2024-11-01; DOI: 10.1515/ijb-2023-0061
Massimo Bilancia, Andrea Nigri, Barbara Cafarelli, Danilo Di Bona
Asthma is a disease characterized by chronic airway hyperresponsiveness and inflammation, with signs of variable airflow limitation and impaired lung function leading to respiratory symptoms such as shortness of breath, chest tightness and cough. Eosinophilic asthma is a distinct phenotype that affects more than half of patients diagnosed with severe asthma. It can be effectively treated with monoclonal antibodies targeting specific immunological signaling pathways that fuel the inflammation underlying the disease, particularly Interleukin-5 (IL-5), a cytokine that plays a crucial role in asthma. In this study, we propose a data analysis pipeline aimed at identifying subphenotypes of severe eosinophilic asthma in relation to response to therapy at follow-up, which could have great potential for use in routine clinical practice. Once an optimal partition of patients into subphenotypes has been determined, the labels indicating the group to which each patient has been assigned are used in a novel way. For each input variable in a specialized logistic regression model, a clusterwise effect on response to therapy is determined by an appropriate interaction term between the input variable under consideration and the cluster label. We show that the clusterwise odds ratios can be meaningfully interpreted conditional on the cluster label. In this way, we can define an effect measure for the response variable for each input variable in each of the groups identified by the clustering algorithm, which is not possible in standard logistic regression because the effect of the reference class is aliased with the overall intercept. The interpretability of the model is enforced by promoting sparsity, a goal achieved by learning interactions in a hierarchical manner using a special group-Lasso technique. In addition, valid expressions are provided for computing odds ratios in the unusual parameterization used by the sparsity-promoting algorithm. 
We show how to apply the proposed data analysis pipeline to the problem of sub-phenotyping asthma patients, including in terms of the quality of their response to therapy with monoclonal antibodies.
Title: An interpretable cluster-based logistic regression model, with application to the characterization of response to therapy in severe eosinophilic asthma.; Journal: International Journal of Biostatistics; Pages: 361-388
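In a reference-coded model, logit P(Y = 1) = b0 + sum_k a_k C_k + beta_x x + sum_k gamma_k x C_k, where C_k are dummies for the non-reference clusters, the clusterwise odds ratio for a one-unit increase in `x` is `exp(beta_x + gamma_k)`, with `gamma` fixed at zero for the reference cluster. The minimal sketch below assumes that standard parameterization and already-fitted coefficients; the paper derives separate expressions for the unusual parameterization used by its hierarchical group-Lasso, which this does not cover. The function and argument names are hypothetical.

```python
import math


def clusterwise_odds_ratios(beta_x, gamma, reference):
    """Odds ratio for a one-unit increase in x within each cluster.

    beta_x    : main-effect coefficient of x (its log-OR in the reference cluster)
    gamma     : dict {cluster_label: coefficient of x * 1[cluster == label]}
                for the non-reference clusters
    reference : label of the reference cluster (its interaction is fixed at 0)
    """
    ors = {reference: math.exp(beta_x)}
    for label, g in gamma.items():
        # Log-OR in cluster k is the main effect plus the cluster's interaction.
        ors[label] = math.exp(beta_x + g)
    return ors


r = clusterwise_odds_ratios(0.5, {"B": -0.5, "C": 0.2}, reference="A")
```

Here cluster "B" has an odds ratio of 1, meaning the main effect is exactly cancelled by its interaction term in that cluster.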
Pub Date: 2024-05-06; eCollection Date: 2024-11-01; DOI: 10.1515/ijb-2024-0013
Heng Chen, Daniel F Heitjan
Title: Response to comments on 'sensitivity of estimands in clinical trials with imperfect compliance'.; Journal: International Journal of Biostatistics; Pages: 433
The mean residual lifetime (MRL) of a unit in a population at a given time t is the average remaining lifetime among those population units still alive at time t. In some applications, it is reasonable to assume that the MRL function decreases over time. Thus, one natural way to improve the estimation of the MRL function is to use this assumption in the estimation process. In this paper, we develop an MRL estimator in ranked set sampling (RSS) that enjoys the monotonicity property. We prove that it is a strongly uniformly consistent estimator of the true MRL function. We also show that the asymptotic distribution of the introduced estimator is the same as that of the empirical one, and therefore the novel estimator is obtained "free of charge", at least in an asymptotic sense. We then compare the proposed estimator with its competitors in RSS and simple random sampling (SRS) using Monte Carlo simulation. Our simulation results confirm the superiority of the proposed procedure for finite sample sizes. Finally, a real dataset from the Surveillance, Epidemiology and End Results (SEER) program of the US National Cancer Institute (NCI) is used to show that the introduced technique can provide more accurate estimates of the average remaining lifetime of patients with breast cancer.
Title: Estimation of a decreasing mean residual life based on ranked set sampling with an application to survival analysis.; Authors: Elham Zamanzade, Ehsan Zamanzade, Afshin Parvardeh; Pub Date: 2024-03-29; DOI: 10.1515/ijb-2023-0051; Journal: International Journal of Biostatistics; Pages: 571-583
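To make the monotonicity idea concrete, the sketch below computes the empirical MRL (the average of `X - t` over units with `X > t`) on an ascending grid and then enforces a nonincreasing shape with a running minimum, m_tilde(t) = min over s <= t of m_hat(s). This running-minimum projection is an assumed, simple choice and not necessarily the estimator proposed in the paper. For balanced RSS, one simple option is to pass the pooled measured units as `sample`, since their empirical distribution function is unbiased for F. Function names are hypothetical.

```python
import numpy as np


def empirical_mrl(sample, t_grid):
    """Empirical mean residual life at each t in t_grid (nan if nobody survives t)."""
    x = np.asarray(sample, dtype=float)
    out = []
    for t in t_grid:
        alive = x[x > t]
        out.append(np.mean(alive - t) if alive.size else np.nan)
    return np.array(out)


def decreasing_mrl(sample, t_grid):
    """Nonincreasing MRL estimate via a running minimum over an ascending grid."""
    m = empirical_mrl(sample, t_grid)
    return np.minimum.accumulate(m)


m_hat = empirical_mrl([1, 2, 3, 10], [0, 1, 2, 3])
m_dec = decreasing_mrl([1, 2, 3, 10], [0, 1, 2, 3])
```

The grid must be sorted in ascending order and should stay below the largest observation; beyond it the empirical MRL is undefined (`nan`), and `nan` propagates through the running minimum.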
Pub Date: 2024-02-22; eCollection Date: 2024-11-01; DOI: 10.1515/ijb-2023-0037
Claus Thorn Ekstrøm, Bendix Carstensen
Agreement between methods for quantitative measurements is typically assessed by computing limits of agreement between pairs of methods and/or by illustration through Bland-Altman plots. We consider the situation where the observed measurement methods are considered a random sample from a population of possible methods, and discuss how the underlying linear mixed effects model can be extended to this situation. This is relevant when, for example, the methods represent raters/judges that are used to score specific individuals or items. In the case of random methods, we are not interested in estimates pertaining to the specific methods, but are instead interested in quantifying the variation between the methods actually involved in making measurements, and accommodating this as an extra source of variation when generalizing to the clinical performance of a method. In the model we allow raters to have individual precision/skill and permit linked replicates (i.e., when the numbering, labeling or ordering of the replicates within items is important). Applications involving estimation of the limits of agreement for two datasets are shown: a dataset of spatial perception among a group of students and a dataset on consumer preference of French chocolate. The models are implemented in the MethComp package for R [Carstensen B, Gurrin L, Ekstrøm CT, Figurski M. MethComp: functions for analysis of agreement in method comparison studies; 2013. R package version 1.22, R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2012].
Title: Statistical models for assessing agreement for quantitative data with heterogeneous random raters and replicate measurements.; Journal: International Journal of Biostatistics; Pages: 455-466
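For reference, the classical Bland-Altman limits of agreement for two fixed methods are the mean of the paired differences plus or minus 1.96 standard deviations of those differences. The sketch below computes only this fixed-method baseline; the random-rater model described above adds a between-rater variance component (and rater-specific precision) to the width, which is not included here. The function name `limits_of_agreement` is hypothetical and is not part of MethComp.

```python
import statistics


def limits_of_agreement(method_1, method_2, z=1.96):
    """Classical Bland-Altman 95% limits of agreement for paired measurements."""
    diffs = [a - b for a, b in zip(method_1, method_2)]
    bias = statistics.mean(diffs)          # mean difference between methods
    sd = statistics.stdev(diffs)           # sample SD of the differences
    return bias - z * sd, bias + z * sd


lo, hi = limits_of_agreement([10, 12, 14, 16], [9, 12, 13, 14])
```

Here the differences are 1, 0, 1, 2, so the bias is 1 and the limits are roughly -0.60 and 2.60.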
Pub Date: 2024-02-13; eCollection Date: 2024-11-01; DOI: 10.1515/ijb-2023-0059
Brian D Williamson, Ying Huang
In many applications, it is of interest to identify a parsimonious set of features, or panel, from multiple candidates that achieves a desired level of performance in predicting a response. This task is often complicated in practice by missing data arising from the sampling design or other random mechanisms. Most recent work on variable selection in missing data contexts relies in some part on a finite-dimensional statistical model, e.g., a generalized or penalized linear model. In cases where this model is misspecified, the selected variables may not all be truly scientifically relevant and can result in panels with suboptimal classification performance. To address this limitation, we propose a nonparametric variable selection algorithm combined with multiple imputation to develop flexible panels in the presence of missing-at-random data. We outline strategies based on the proposed algorithm that achieve control of commonly used error rates. Through simulations, we show that our proposal has good operating characteristics and results in panels with higher classification and variable selection performance compared to several existing penalized regression approaches in cases where a generalized linear model is misspecified. Finally, we use the proposed method to develop biomarker panels for separating pancreatic cysts with differing malignancy potential in a setting where complicated missingness in the biomarkers arose due to limited specimen volumes.
Title: Flexible variable selection in the presence of missing data.; Journal: International Journal of Biostatistics; Pages: 347-359; PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11323294/pdf/
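One common way to combine an estimate across multiply imputed datasets is Rubin's rules: average the point estimates, then add the mean within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The sketch assumes that an estimate and its variance (for example, a variable-importance measure) have already been computed on each of the m imputed datasets; it does not reproduce the paper's selection algorithm or its error-rate control. The function name `rubin_pool` is hypothetical.

```python
import statistics


def rubin_pool(estimates, variances):
    """Pool per-imputation estimates and variances with Rubin's rules."""
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputations")
    q_bar = statistics.mean(estimates)        # pooled point estimate
    w = statistics.mean(variances)            # within-imputation variance
    b = statistics.variance(estimates)        # between-imputation variance (m - 1 denominator)
    total = w + (1 + 1 / m) * b               # total variance of q_bar
    return q_bar, total


q, t = rubin_pool([1, 2, 3], [0.5, 0.5, 0.5])
```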
Pub Date: 2023-11-29; eCollection Date: 2024-11-01; DOI: 10.1515/ijb-2022-0101
Bingkai Wang, Yu Du
In randomized trials, repeated measures of the outcome are routinely collected. The mixed model for repeated measures (MMRM) leverages the information from these repeated outcome measures, and is often used for the primary analysis to estimate the average treatment effect at the primary endpoint. MMRM, however, can suffer from bias and precision loss when it models intermediate outcomes incorrectly, and hence its use of post-randomization information is not guaranteed to be harmless. This paper proposes an extension of the commonly used MMRM, called IMMRM, that improves robustness and optimizes the precision gain from covariate adjustment, stratified randomization, and adjustment for intermediate outcome measures. Under regularity conditions and missing completely at random, we prove that the IMMRM estimator of the average treatment effect is robust to arbitrary model misspecification and is asymptotically as precise as or more precise than the analysis of covariance (ANCOVA) estimator and the MMRM estimator. Under missing at random, IMMRM is less likely to be misspecified than MMRM, and we demonstrate via simulation studies that IMMRM continues to have less bias and smaller variance. Our results are further supported by a re-analysis of a randomized trial for the treatment of diabetes.
Title: Improving the mixed model for repeated measures to robustly increase precision in randomized trials.; Journal: International Journal of Biostatistics; Pages: 585-598
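The ANCOVA estimator used as a comparator above can be sketched as the coefficient on treatment in an ordinary least-squares fit of the primary-endpoint outcome on an intercept, the treatment indicator and baseline covariates. This minimal NumPy sketch assumes complete data at the primary endpoint; it is not the IMMRM estimator, which additionally models intermediate outcomes. The function name `ancova_effect` is hypothetical.

```python
import numpy as np


def ancova_effect(y, a, x):
    """OLS coefficient of the 0/1 treatment indicator `a`, adjusting for baseline `x`.

    x may be a 1-D array (one covariate) or a 2-D array (n, k).
    """
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(y), np.asarray(a, dtype=float), np.asarray(x, dtype=float)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[1])  # column 1 is the treatment indicator


a = [0, 1, 0, 1, 0, 1]
x = [1, 2, 3, 4, 5, 7]
y = [1 + 2 * ai + 0.5 * xi for ai, xi in zip(a, x)]
tau = ancova_effect(y, a, x)
```

In the noise-free toy data the true treatment effect is 2, which the fit recovers.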
Pub Date: 2023-11-27; eCollection Date: 2024-05-01; DOI: 10.1515/ijb-2023-0052
Sophie Potts, Elisabeth Bergherr, Constantin Reinke, Colin Griesbach
Model-based component-wise gradient boosting is a popular tool for data-driven variable selection. To improve its prediction and selection qualities even further, several modifications of the original algorithm have been developed, which mainly focus on different stopping criteria and leave the actual variable selection mechanism untouched. We investigate different prediction-based mechanisms for the variable selection step in model-based component-wise gradient boosting. These approaches include Akaike's Information Criterion (AIC) as well as a selection rule relying on the component-wise test error computed via cross-validation. We implemented the AIC and cross-validation routines for Generalized Linear Models and evaluated them with regard to their variable selection properties and predictive performance. An extensive simulation study revealed improved selection properties, and in a real-world application with age-standardized COVID-19 incidence rates the prediction error could be lowered.
Title: Prediction-based variable selection for component-wise gradient boosting.; Journal: International Journal of Biostatistics; Pages: 293-314
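The selection step that the paper modifies can be seen in a bare-bones component-wise L2 boosting loop: at each iteration every covariate is fitted separately to the current residuals by a simple least-squares base learner, the one with the smallest in-sample residual sum of squares is selected, and its coefficient is updated by a small step `nu`. The sketch implements only this standard RSS-based rule; the AIC- or cross-validation-based rules studied in the paper would replace the `argmin` over `rss`. It assumes centered, non-constant covariates (the base learners have no intercept), and the function name is hypothetical.

```python
import numpy as np


def componentwise_l2_boost(X, y, n_iter=200, nu=0.1):
    """Component-wise L2 boosting with linear base learners and RSS-based selection."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    intercept = y.mean()
    coef = np.zeros(p)
    resid = y - intercept
    sq_norms = np.sum(X ** 2, axis=0)  # assumes no all-zero columns
    selected = []
    for _ in range(n_iter):
        # Least-squares slope of the residuals on each covariate separately.
        slopes = X.T @ resid / sq_norms
        rss = np.array([np.sum((resid - slopes[j] * X[:, j]) ** 2) for j in range(p)])
        j = int(np.argmin(rss))  # standard rule: best in-sample fit
        coef[j] += nu * slopes[j]
        resid -= nu * slopes[j] * X[:, j]
        selected.append(j)
    return intercept, coef, selected


X_demo = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]], dtype=float)
y_demo = 3 * X_demo[:, 0]
b0, coef, sel = componentwise_l2_boost(X_demo, y_demo, n_iter=300, nu=0.1)
```

Because only one coefficient moves per iteration, stopping early (the focus of earlier modifications) leaves never-selected covariates at exactly zero, which is where the variable selection comes from.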
Pub Date: 2023-11-27; eCollection Date: 2024-11-01; DOI: 10.1515/ijb-2022-0014
Elahe Momeni Roochi, Samaneh Eftekhari Mahabadi
Incomplete data are a common complication in longitudinal studies because individuals drop out before the intended completion time. Methods currently available in commercial software for analyzing incomplete longitudinal data rely, at best, on the drop-out being ignorable. If the underlying missingness mechanism is non-ignorable, the statistical inferences may be biased. To remove this bias when drop-out is non-ignorable, joint models for the complete data and the drop-out process have been proposed, but they involve computational difficulties and untestable assumptions. Because the critical ignorability assumption cannot be verified from the observed part of the sample, several local sensitivity indices have been proposed in the literature. Specifically, Eftekhari Mahabadi (Second-order local sensitivity to non-ignorability in Bayesian inferences. Stat Med 2018;59:55-95) proposed a second-order local sensitivity tool for the Bayesian analysis of cross-sectional studies and showed that it handles bias better than first-order indices. In this paper, we aim to extend this index to the Bayesian sensitivity analysis of longitudinal studies with normally distributed outcomes and drop-out. The index is derived from a selection model for the drop-out mechanism and a Bayesian linear mixed-effects model for the complete data. The resulting formulas are evaluated using posterior estimates and draws from the simpler ignorable model. The method is illustrated through simulation studies and a sensitivity analysis of data from a real antidepressant clinical trial. Overall, the numerical analyses showed that, when repeated outcomes are subject to missingness, the regression coefficient estimates are well approximated by a linear function in the neighbourhood of the missing-at-random (MAR) model, whereas the error-term and random-effect variances show considerable second-order sensitivity within the Bayesian linear mixed-effects framework.
Bayesian second-order sensitivity of longitudinal inferences to non-ignorability: an application to antidepressant clinical trial data. Elahe Momeni Roochi, Samaneh Eftekhari Mahabadi. International Journal of Biostatistics, pp. 599-629. Published 2023-11-27. DOI: 10.1515/ijb-2022-0014
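The contrast between first-order (linear) and second-order sensitivity can be made concrete with a generic importance-weighting identity. The following is a minimal sketch under stated assumptions, not the index derived in the paper, whose formulas come from a selection model combined with a Bayesian linear mixed-effects model. It assumes the non-ignorable posterior equals the ignorable (MAR, γ = 0) posterior reweighted by exp(γ·s), where s is a hypothetical per-draw score. Taking all moments over draws from the ignorable model, the first derivative of E_γ[θ] at γ = 0 is Cov(θ, s) and the second is Cov(θ, s²) − 2·E[s]·Cov(θ, s). A quantity with a large second derivative is poorly described by the linear approximation, which mirrors the distinction drawn above between regression coefficients and variance components. The names `local_sensitivity`, `approximations`, `tilted_mean` and `toy_draws` are hypothetical.

```python
import math
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def local_sensitivity(theta, s):
    """Derivatives at gamma = 0 of E_gamma[theta], where gamma = 0 is the
    ignorable (MAR) model and each ignorable-model draw is reweighted by
    exp(gamma * s). Assumes the log-weight is linear in gamma (hypothetical)."""
    d1 = cov(theta, s)                                       # first order
    d2 = cov(theta, [v * v for v in s]) - 2 * mean(s) * d1   # second order
    return d1, d2


def approximations(theta, s, gamma):
    """First- and second-order Taylor approximations of E_gamma[theta]."""
    m0 = mean(theta)
    d1, d2 = local_sensitivity(theta, s)
    return m0 + gamma * d1, m0 + gamma * d1 + 0.5 * gamma ** 2 * d2


def tilted_mean(theta, s, gamma):
    """Importance-reweighted mean of the same draws, for comparison."""
    w = [math.exp(gamma * v) for v in s]
    return sum(t * wi for t, wi in zip(theta, w)) / sum(w)


def toy_draws(n=400_000, seed=7):
    """Hypothetical toy posterior: z ~ N(0, 1), theta = z^2, s = z^2 / 2.
    Then E_gamma[theta] = 1 / (1 - gamma) exactly for gamma < 1."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [v * v for v in z], [v * v / 2 for v in z]


if __name__ == "__main__":
    theta, s = toy_draws()
    print(local_sensitivity(theta, s))
    print(approximations(theta, s, 0.2), tilted_mean(theta, s, 0.2))
```

The toy draws are chosen so the answer is known: with θ = z² and s = z²/2, the reweighted mean is 1/(1 − γ), so the two derivatives should come out close to 1 and 2, and at γ = 0.2 the second-order approximation should sit closer to the reweighted mean than the first-order one.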