Learning individualized treatment rules (ITRs) for a target patient population with mental disorders is confronted with many challenges. First, the target population may be different from the training population that provided data for learning ITRs. Ignoring differences between the training patient data and the target population can result in sub-optimal treatment strategies for the target population. Second, for mental disorders, a patient's underlying mental state is not observed but can be inferred from measures of high-dimensional combinations of symptomatology. Treatment mechanisms are unknown and can be complex, and thus treatment effect moderation can take complicated forms. To address these challenges, we propose a novel method that connects measurement models, efficient weighting schemes, and flexible neural network architecture through latent variables to tailor treatments for a target population. Patients' underlying mental states are represented by a compact set of latent state variables while preserving interpretability. Weighting schemes are designed based on lower-dimensional latent variables to efficiently balance population differences so that biases in learning the latent structure and treatment effects are mitigated. Extensive simulation studies demonstrated consistent superiority of the proposed method and the weighting approach. Applications to two real-world studies of patients with major depressive disorder have shown a broad utility of the proposed method in improving treatment outcomes in the target population.
Optimizing personalized treatments for targeted patient populations across multiple domains. Yuan Chen, Donglin Zeng, Yuanjia Wang. International Journal of Biostatistics. Pub Date: 2024-09-26. DOI: 10.1515/ijb-2024-0068
Awa Diop, Caroline Sirois, Jason R Guertin, Mireille E Schnitzer, James M Brophy, Claudia Blais, Denis Talbot
In previous work, we introduced a framework that combines latent class growth analysis (LCGA) with marginal structural models (LCGA-MSM). LCGA-MSM first summarizes the numerous time-varying treatment patterns into a few trajectory groups and then allows for a population-level causal interpretation of the group differences. However, the LCGA-MSM framework is not suitable when the outcome is time-dependent. In this study, we propose combining a nonparametric history-restricted marginal structural model (HRMSM) with LCGA. HRMSMs can be seen as an application of standard MSMs on multiple time intervals. To the best of our knowledge, we also present the first application of HRMSMs with a time-to-event outcome. It was previously noted that HRMSMs could pose interpretation problems in survival analysis when either targeting a hazard ratio or a survival curve. We propose a causal parameter that bypasses these interpretation challenges. We consider three different estimators of the parameters: inverse probability of treatment weighting (IPTW), g-computation, and a pooled longitudinal targeted maximum likelihood estimator (pooled LTMLE). We conduct simulation studies to measure the performance of the proposed LCGA-HRMSM. For all scenarios, we obtain unbiased estimates when using either g-computation or pooled LTMLE. IPTW produced estimates with slightly larger bias in some scenarios. Overall, all approaches have good coverage of the 95 % confidence interval. We applied our approach to a population of older Quebecers composed of 57,211 statin initiators and found that a greater adherence to statins was associated with a lower combined risk of cardiovascular disease or all-cause mortality.
History-restricted marginal structural model and latent class growth analysis of treatment trajectories for a time-dependent outcome. Awa Diop, Caroline Sirois, Jason R Guertin, Mireille E Schnitzer, James M Brophy, Claudia Blais, Denis Talbot. International Journal of Biostatistics. Pub Date: 2024-08-12. DOI: 10.1515/ijb-2023-0116
Traditional methods for Sample Size Determination (SSD) based on power analysis exploit relevant fixed values or preliminary estimates for the unknown parameters. A hybrid classical-Bayesian approach can be used to formally incorporate information or model uncertainty on unknown quantities by using prior distributions according to the Bayesian approach, while still analysing the data in a frequentist framework. In this paper, we propose a hybrid procedure for SSD in two-arm superiority trials that takes into account the different roles played by the unknown parameters involved in the statistical power. Thus, different prior distributions are used to formalize design expectations and to model information or uncertainty on preliminary estimates involved at the analysis stage. To illustrate the method, we consider binary data and derive the proposed hybrid criteria using three possible parameters of interest, i.e., the difference between proportions of successes, the logarithm of the relative risk, and the logarithm of the odds ratio. Numerical examples taken from the literature are presented to show how to implement the proposed procedure.
Hybrid classical-Bayesian approach to sample size determination for two-arm superiority clinical trials. Valeria Sambucini. International Journal of Biostatistics. Pub Date: 2024-07-01. DOI: 10.1515/ijb-2023-0050
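To make the contrast between fixed design values and prior-averaged design values concrete, the following is a minimal sketch, not the criteria derived in the paper. It assumes a one-sided z-test for the difference between two proportions with an unpooled normal approximation, and it averages that power over a Beta prior on the control success rate. The function names, the Beta prior, and the choice to keep the treatment rate fixed are all illustrative assumptions.

```python
from statistics import NormalDist
import random

_STD_NORMAL = NormalDist()  # standard normal, used for quantiles and CDF


def power_diff_proportions(p_control, p_treat, n_per_arm, alpha=0.025):
    """Approximate power of a one-sided z-test of p_treat - p_control > 0.

    Uses the unpooled normal approximation with equal arm sizes
    (an assumption of this sketch).
    """
    delta = p_treat - p_control
    var = (p_control * (1 - p_control) + p_treat * (1 - p_treat)) / n_per_arm
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha)
    return _STD_NORMAL.cdf(delta / var ** 0.5 - z_alpha)


def smallest_n(p_control, p_treat, target_power=0.8, alpha=0.025, n_max=100_000):
    """Smallest per-arm sample size whose approximate power reaches the target."""
    for n in range(2, n_max + 1):
        if power_diff_proportions(p_control, p_treat, n, alpha) >= target_power:
            return n
    raise ValueError("target power not reached within n_max")


def prior_averaged_power(n_per_arm, a, b, p_treat, alpha=0.025, draws=5000, seed=0):
    """Monte Carlo average of the power over a Beta(a, b) prior on p_control.

    Hypothetical illustration of replacing a fixed design value by a prior;
    the treatment rate is held fixed for simplicity.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        p_c = rng.betavariate(a, b)
        total += power_diff_proportions(p_c, p_treat, n_per_arm, alpha)
    return total / draws
```

A call such as `smallest_n(0.3, 0.5)` gives the classical fixed-value answer, while `prior_averaged_power(n, 30, 70, 0.5)` shows how uncertainty about the control rate changes the power attained at that `n`.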
Massimo Bilancia, Andrea Nigri, Barbara Cafarelli, Danilo Di Bona
Asthma is a disease characterized by chronic airway hyperresponsiveness and inflammation, with signs of variable airflow limitation and impaired lung function leading to respiratory symptoms such as shortness of breath, chest tightness and cough. Eosinophilic asthma is a distinct phenotype that affects more than half of patients diagnosed with severe asthma. It can be effectively treated with monoclonal antibodies targeting specific immunological signaling pathways that fuel the inflammation underlying the disease, particularly Interleukin-5 (IL-5), a cytokine that plays a crucial role in asthma. In this study, we propose a data analysis pipeline aimed at identifying subphenotypes of severe eosinophilic asthma in relation to response to therapy at follow-up, which could have great potential for use in routine clinical practice. Once an optimal partition of patients into subphenotypes has been determined, the labels indicating the group to which each patient has been assigned are used in a novel way. For each input variable in a specialized logistic regression model, a clusterwise effect on response to therapy is determined by an appropriate interaction term between the input variable under consideration and the cluster label. We show that the clusterwise odds ratios can be meaningfully interpreted conditional on the cluster label. In this way, we can define an effect measure for the response variable for each input variable in each of the groups identified by the clustering algorithm, which is not possible in standard logistic regression because the effect of the reference class is aliased with the overall intercept. The interpretability of the model is enforced by promoting sparsity, a goal achieved by learning interactions in a hierarchical manner using a special group-Lasso technique. In addition, valid expressions are provided for computing odds ratios in the unusual parameterization used by the sparsity-promoting algorithm. 
We show how to apply the proposed data analysis pipeline to the problem of sub-phenotyping asthma patients also in terms of quality of response to therapy with monoclonal antibodies.
An interpretable cluster-based logistic regression model, with application to the characterization of response to therapy in severe eosinophilic asthma. Massimo Bilancia, Andrea Nigri, Barbara Cafarelli, Danilo Di Bona. International Journal of Biostatistics. Pub Date: 2024-06-25. DOI: 10.1515/ijb-2023-0061
Response to comments on 'sensitivity of estimands in clinical trials with imperfect compliance'. Heng Chen, Daniel F Heitjan. International Journal of Biostatistics. Pub Date: 2024-05-06. DOI: 10.1515/ijb-2024-0013
The mean residual lifetime (MRL) of a unit in a population at a given time t is the average remaining lifetime among those population units still alive at time t. In some applications, it is reasonable to assume that the MRL function is decreasing over time. Thus, one natural way to improve the estimation of the MRL function is to use this assumption in the estimation process. In this paper, we develop an MRL estimator in ranked set sampling (RSS) that enjoys the monotonicity property. We prove that it is a strongly uniformly consistent estimator of the true MRL function. We also show that the asymptotic distribution of the introduced estimator is the same as the empirical one, and therefore the novel estimator is obtained "free of charge", at least in an asymptotic sense. We then compare the proposed estimator with its competitors in RSS and simple random sampling (SRS) using Monte Carlo simulation. Our simulation results confirm the superiority of the proposed procedure for finite sample sizes. Finally, a real dataset from the Surveillance, Epidemiology and End Results (SEER) program of the US National Cancer Institute (NCI) is used to show that the introduced technique can provide more accurate estimates for the average remaining lifetime of patients with breast cancer.
Estimation of a decreasing mean residual life based on ranked set sampling with an application to survival analysis. Elham Zamanzade, Ehsan Zamanzade, Afshin Parvardeh. International Journal of Biostatistics. Pub Date: 2024-03-29. DOI: 10.1515/ijb-2023-0051
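The definition of the MRL at time t can be illustrated with the plain empirical estimator under simple random sampling, which is the baseline the abstract refers to as "the empirical one". This sketch does not implement the monotone RSS estimator proposed in the paper; the function name and the convention of returning 0.0 when no unit survives past t are assumptions made here.

```python
def empirical_mrl(lifetimes, t):
    """Empirical mean residual lifetime at time t from complete lifetimes.

    Averages (x - t) over the units with lifetime x > t. Returns 0.0 when
    no unit is still alive at t (a convention chosen for this sketch).
    """
    residuals = [x - t for x in lifetimes if x > t]
    if not residuals:
        return 0.0
    return sum(residuals) / len(residuals)


# Example: four observed lifetimes, evaluated at t = 3.
# Units alive at t = 3 have lifetimes 4, 6, 8, so the residuals are 1, 3, 5.
print(empirical_mrl([2, 4, 6, 8], 3))  # 3.0
```

The empirical curve obtained by evaluating this function on a grid of t values need not be decreasing, which is the motivation for a shape-constrained estimator.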
Agreement between methods for quantitative measurements is typically assessed by computing limits of agreement between pairs of methods and/or by illustration through Bland-Altman plots. We consider the situation where the observed measurement methods are considered a random sample from a population of possible methods, and discuss how the underlying linear mixed effects model can be extended to this situation. This is relevant when, for example, the methods represent raters/judges that are used to score specific individuals or items. In the case of random methods, we are not interested in estimates pertaining to the specific methods, but are instead interested in quantifying the variation between the methods actually involved in making measurements, and accommodating this as an extra source of variation when generalizing to the clinical performance of a method. In the model we allow raters to have individual precision/skill and permit linked replicates (i.e., when the numbering, labeling or ordering of the replicates within items is important). Applications involving estimation of the limits of agreement for two datasets are shown: a dataset of spatial perception among a group of students and a dataset on consumer preference of French chocolate. The models are implemented in the MethComp package for R [Carstensen B, Gurrin L, Ekstrøm CT, Figurski M. MethComp: functions for analysis of agreement in method comparison studies; 2013. R package version 1.22; R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2012].
Statistical models for assessing agreement for quantitative data with heterogeneous random raters and replicate measurements. Claus Thorn Ekstrøm, Bendix Carstensen. International Journal of Biostatistics. Pub Date: 2024-02-22. DOI: 10.1515/ijb-2023-0037
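For readers unfamiliar with limits of agreement, the classical two-method, no-replicate version can be sketched as follows. This is the simple Bland-Altman calculation (mean difference plus or minus 1.96 standard deviations of the paired differences), not the mixed-effects model with random raters described in the abstract, and it does not use the MethComp package. The function name and the example data are illustrative assumptions.

```python
from statistics import mean, stdev


def limits_of_agreement(method_a, method_b, z=1.96):
    """Classical limits of agreement for paired measurements by two methods.

    Returns (lower, bias, upper), where bias is the mean paired difference
    and the limits are bias -/+ z times the sample SD of the differences.
    """
    if len(method_a) != len(method_b):
        raise ValueError("both methods must measure the same items")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias - z * sd, bias, bias + z * sd


# Hypothetical paired measurements on four items.
lower, bias, upper = limits_of_agreement([10, 12, 14, 16], [9, 12, 13, 17])
print(lower, bias, upper)
```

With random raters, the spread between raters would add a further variance component to the interval, which this two-method calculation cannot represent.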
In many applications, it is of interest to identify a parsimonious set of features, or panel, from multiple candidates that achieves a desired level of performance in predicting a response. This task is often complicated in practice by missing data arising from the sampling design or other random mechanisms. Most recent work on variable selection in missing data contexts relies in some part on a finite-dimensional statistical model, e.g., a generalized or penalized linear model. In cases where this model is misspecified, the selected variables may not all be truly scientifically relevant and can result in panels with suboptimal classification performance. To address this limitation, we propose a nonparametric variable selection algorithm combined with multiple imputation to develop flexible panels in the presence of missing-at-random data. We outline strategies based on the proposed algorithm that achieve control of commonly used error rates. Through simulations, we show that our proposal has good operating characteristics and results in panels with higher classification and variable selection performance compared to several existing penalized regression approaches in cases where a generalized linear model is misspecified. Finally, we use the proposed method to develop biomarker panels for separating pancreatic cysts with differing malignancy potential in a setting where complicated missingness in the biomarkers arose due to limited specimen volumes.
Flexible variable selection in the presence of missing data. Brian D Williamson, Ying Huang. International Journal of Biostatistics. Pub Date: 2024-02-13. DOI: 10.1515/ijb-2023-0059. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11323294/pdf/
In randomized trials, repeated measures of the outcome are routinely collected. The mixed model for repeated measures (MMRM) leverages the information from these repeated outcome measures, and is often used for the primary analysis to estimate the average treatment effect at the primary endpoint. MMRM, however, can suffer from bias and precision loss when it models intermediate outcomes incorrectly, and hence fails to use the post-randomization information harmlessly. This paper proposes an extension of the commonly used MMRM, called IMMRM, that improves the robustness and optimizes the precision gain from covariate adjustment, stratified randomization, and adjustment for intermediate outcome measures. Under regularity conditions and missing completely at random, we prove that the IMMRM estimator for the average treatment effect is robust to arbitrary model misspecification and is asymptotically equal or more precise than the analysis of covariance (ANCOVA) estimator and the MMRM estimator. Under missing at random, IMMRM is less likely to be misspecified than MMRM, and we demonstrate via simulation studies that IMMRM continues to have less bias and smaller variance. Our results are further supported by a re-analysis of a randomized trial for the treatment of diabetes.
Improving the mixed model for repeated measures to robustly increase precision in randomized trials. Bingkai Wang, Yu Du. International Journal of Biostatistics. Pub Date: 2023-11-29. DOI: 10.1515/ijb-2022-0101
Pub Date: 2023-11-27; eCollection Date: 2024-05-01; DOI: 10.1515/ijb-2023-0052
Sophie Potts, Elisabeth Bergherr, Constantin Reinke, Colin Griesbach
Model-based component-wise gradient boosting is a popular tool for data-driven variable selection. To improve its prediction and selection qualities even further, several modifications of the original algorithm have been developed that mainly focus on different stopping criteria, leaving the actual variable selection mechanism untouched. We investigate different prediction-based mechanisms for the variable selection step in model-based component-wise gradient boosting. These approaches include Akaike's Information Criterion (AIC) as well as a selection rule relying on the component-wise test error computed via cross-validation. We implemented the AIC and cross-validation routines for Generalized Linear Models and evaluated them regarding their variable selection properties and predictive performance. An extensive simulation study revealed improved selection properties, whereas the prediction error could be lowered in a real-world application with age-standardized COVID-19 incidence rates.
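For readers unfamiliar with the baseline algorithm, the sketch below shows classical component-wise L2 boosting with one linear base-learner per covariate. Its selection step picks the component with the smallest residual sum of squares on the training data, which is the step the abstract proposes to replace with AIC-based or cross-validated rules. Those prediction-based rules are not implemented here. The function name `componentwise_l2_boost`, the step length `nu`, and the simulated data are assumptions made for this illustration.

```python
import numpy as np

def componentwise_l2_boost(X, y, n_iter=100, nu=0.1):
    """Classical component-wise L2 boosting with linear base-learners.

    In each iteration every covariate is fitted separately to the current
    residuals, and only the covariate with the smallest residual sum of
    squares (RSS) is updated, by a fraction nu of its least-squares slope.
    Covariates that are never selected keep a coefficient of exactly zero.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                      # centred covariates
    col_ss = (Xc ** 2).sum(axis=0)       # per-column sum of squares
    intercept = y.mean()
    coef = np.zeros(X.shape[1])
    resid = y - intercept
    for _ in range(n_iter):
        # Least-squares slope of the residuals on each single covariate.
        slopes = Xc.T @ resid / col_ss
        # RSS after fitting each candidate: ||r||^2 - slope^2 * ||x_j||^2.
        rss = resid @ resid - slopes ** 2 * col_ss
        j = int(np.argmin(rss))          # selection step (training RSS)
        coef[j] += nu * slopes[j]        # weak update of the winner only
        resid = resid - nu * slopes[j] * Xc[:, j]
    # Express the intercept on the original (uncentred) covariate scale.
    intercept -= x_mean @ coef
    return intercept, coef

# Hypothetical data: only covariates 0 and 3 carry signal (assumed).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=200)

intercept, coef = componentwise_l2_boost(X, y, n_iter=50)
selected = np.flatnonzero(coef)
print("selected covariates:", selected)
```

Variable selection arises from early stopping: with few iterations only the informative covariates have been picked, while running longer lets noise covariates enter. This is why both the stopping criterion and the selection rule affect which variables end up in the model.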
{"title":"Prediction-based variable selection for component-wise gradient boosting.","authors":"Sophie Potts, Elisabeth Bergherr, Constantin Reinke, Colin Griesbach","doi":"10.1515/ijb-2023-0052","DOIUrl":"10.1515/ijb-2023-0052","journal":{"name":"International Journal of Biostatistics","pages":"293-314"},"publicationDate":"2023-11-27","publicationTypes":"Journal Article","isOpenAccess":false}