Two-stage Bayesian network meta-analysis of individualized treatment rules for multiple treatments with siloed data.
Pub Date: 2026-01-01 | Epub Date: 2025-10-24 | DOI: 10.1177/09622802251387430
Junwei Shen, Erica Em Moodie, Shirin Golchi
Individualized treatment rules leverage patient-level information to tailor treatments for individuals. Estimating these rules, with the goal of optimizing expected patient outcomes, typically relies on individual-level data to identify the variability in treatment effects across patient subgroups defined by different covariate combinations. To increase the statistical power for detecting treatment-covariate interactions and the generalizability of the findings, data from multisite studies are often used. However, sharing sensitive patient-level health data is sometimes restricted. Additionally, due to funding or time constraints, only a subset of available treatments can be included at each site, but an individualized treatment rule considering all treatments is desired. In this work, we adopt a two-stage Bayesian network meta-analysis approach to estimate individualized treatment rules for multiple treatments using multisite data without disclosing individual-level data beyond the sites. Simulation results demonstrate that our approach can provide consistent estimates of the parameters that fully characterize the optimal individualized treatment rule. We illustrate the method's application through an analysis of data from the Sequenced Treatment Alternatives to Relieve Depression study, the Establishing Moderators and Biosignatures of Antidepressant Response for Clinical Care study, and the Research Evaluating the Value of Augmenting Medication with Psychotherapy study.
{"title":"Two-stage Bayesian network meta-analysis of individualized treatment rules for multiple treatments with siloed data.","authors":"Junwei Shen, Erica Em Moodie, Shirin Golchi","doi":"10.1177/09622802251387430","DOIUrl":"10.1177/09622802251387430","url":null,"abstract":"<p><p>Individualized treatment rules leverage patient-level information to tailor treatments for individuals. Estimating these rules, with the goal of optimizing expected patient outcomes, typically relies on individual-level data to identify the variability in treatment effects across patient subgroups defined by different covariate combinations. To increase the statistical power for detecting treatment-covariate interactions and the generalizability of the findings, data from multisite studies are often used. However, sharing sensitive patient-level health data is sometimes restricted. Additionally, due to funding or time constraints, only a subset of available treatments can be included at each site, but an individualized treatment rule considering all treatments is desired. In this work, we adopt a two-stage Bayesian network meta-analysis approach to estimate individualized treatment rules for multiple treatments using multisite data without disclosing individual-level data beyond the sites. Simulation results demonstrate that our approach can provide consistent estimates of the parameters that fully characterize the optimal individualized treatment rule. We illustrate the method's application through an analysis of data from the Sequenced Treatment Alternatives to Relieve Depression study, the Establishing Moderators and Biosignatures of Antidepressant Response for Clinical Care study, and the Research Evaluating the Value of Augmenting Medication with Psychotherapy study.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"3-20"},"PeriodicalIF":1.9,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12783382/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145369039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A jackknife approach to estimate the prediction uncertainty from binary classifiers under right-censoring.
Pub Date: 2026-01-01 | DOI: 10.1177/09622802251393626
Antje Jahn-Eimermacher, Lukas Klein, Gunter Grieser
Clinical prediction models are developed to estimate a patient's risk for a specific outcome, and machine learning is frequently employed to improve prediction accuracy. When the outcome is an event that happens over time, binary classifiers can predict the risk at specific time points if right-censoring is addressed by inverse-probability-of-censoring weighting (IPCW). Assessing prediction uncertainty is crucial for interpreting individual risks, but little is known about how to account for IPCW when estimating this uncertainty. We propose an adjustment of the infinitesimal jackknife estimator for the standard error of predictions that incorporates IPCW. Being nonparametric, it is broadly applicable, especially to machine learning classifiers. For a simple tractable example, we show that the proposed adjustment yields unbiased standard error estimates. For other situations, we evaluate performance through simulation studies under both parametric models with an IPCW-customized log-likelihood and machine learning with an IPCW-customized loss function. We illustrate the methods by predicting post-transplant survival probabilities using national kidney transplant registry data. Our findings show that the proposed estimator is useful for quantifying the prediction uncertainty of IPCW classifiers. Applications to simulated and real data show that prediction uncertainty increases when employing binary classifiers on dichotomized data compared to predictions from survival models.
Diagnostic accuracy analysis for multiple raters using probit hierarchical model for ordinal ratings.
Pub Date: 2026-01-01 | Epub Date: 2025-12-08 | DOI: 10.1177/09622802251404063
Yun Yang, Xiaoyan Lin, Kerrie P Nelson
This paper studies ordinal classification processes involving multiple raters. A probit hierarchical model is proposed that links raters' ordinal ratings to rater diagnostic skills (bias and magnifier) and to patient latent disease severity, where the latent severity is assumed to follow a latent-class normal mixture distribution. This model specification provides closed-form expressions for both overall and individual-rater receiver operating characteristic (ROC) curves and the area under these ROC curves (AUC). We further extend the model by incorporating covariate information, adding a regression layer for the rater diagnostic skill parameters and/or for the patient latent disease severity. The extended covariate models also offer closed-form solutions for covariate-specific ROCs and AUCs. These analytical tools greatly facilitate traditional diagnostic accuracy analysis. We demonstrate our methods thoroughly with a practical mammography example.
{"title":"Diagnostic accuracy analysis for multiple raters using probit hierarchical model for ordinal ratings.","authors":"Yun Yang, Xiaoyan Lin, Kerrie P Nelson","doi":"10.1177/09622802251404063","DOIUrl":"10.1177/09622802251404063","url":null,"abstract":"<p><p>This paper delves into the realm of ordinal classification processes for multiple raters. A Probit hierarchical model is proposed linking rater's ordinal ratings with rater diagnostic skills (bias and magnifier) and patient latent disease severity, where patient latent disease severity is assumed to follow a latent class normal mixture distribution. This model specification provides closed-form expressions for both overall and individual rater receiver operator characteristic (ROC) curves and the area under these ROC curves (AUC). We further extend the model by incorporating covariate information and adding a regression layer for rater diagnostic skill parameters and/or for patient latent disease severity. The extended covariate models also offer closed-form solutions for covariate-specific ROCs and AUCs. These analytical tools greatly facilitate traditional diagnostic accuracy analysis. We demonstrate our methods thoroughly with a practical mammography example.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"205-221"},"PeriodicalIF":1.9,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12783370/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145709497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Estimation of the short-term and long-term hazard ratios for interval-censored and truncated data.
Pub Date: 2026-01-01 | Epub Date: 2025-12-02 | DOI: 10.1177/09622802251399915
Rui Wang, Yiwei Fan
Survival analysis is a vital field in statistics with widespread applications. The short-term and long-term hazard ratio model is a semiparametric framework designed to handle crossing survival curves, encompassing the proportional hazards and proportional odds models as special cases. In this paper, we extend the short-term and long-term hazard ratio model to accommodate interval-censored and truncated data with covariates. The identifiability challenges arising from truncation are also discussed. We first prove that the nonparametric maximum likelihood estimate of the baseline survival function remains piecewise constant. An efficient iterative convex minorant algorithm, enhanced with a half-stepping strategy, is then developed for computation. Additionally, we present a straightforward Wald test for hypothesis testing under a simplified yet commonly encountered practical scenario. Extensive simulation studies under diverse censoring and truncation scenarios demonstrate the robustness and estimation accuracy of the proposed approach, particularly when the traditional proportional hazards or proportional odds assumptions are violated. Applications to three real-world datasets further demonstrate the model's ability to capture varying covariate effects on survival probabilities across early and late stages, offering valuable insights for clinical practice and decision-making.
{"title":"Estimation of the short-term and long-term hazard ratios for interval-censored and truncated data.","authors":"Rui Wang, Yiwei Fan","doi":"10.1177/09622802251399915","DOIUrl":"10.1177/09622802251399915","url":null,"abstract":"<p><p>Survival analysis is a vital field in statistics with widespread applications. The short-term and long-term hazard ratio model is a novel semiparametric framework designed to handle crossing survival curves, encompassing the proportional hazards and proportional odds models as special cases. In this paper, we extend the short-term and long-term hazard ratio model to accommodate interval-censored and truncated data with covariates. The identifiability challenges arising from truncation are also discussed. We first prove that the nonparametric maximum likelihood estimation of the baseline survival function retains piecewise constant. Then an efficient iterative convex minorant algorithm, enhanced with a half-stepping strategy, is developed for computation. Additionally, we present a straightforward Wald test for hypothesis testing under a simplified yet commonly encountered practical scenario. Extensive simulation studies under diverse censoring and truncation scenarios demonstrate the robustness and accuracy in estimation of the proposed approach, particularly when traditional proportional hazards or proportional odds assumptions are violated. Applications to three real-world datasets further demonstrate the model's ability to capture varying covariate effects on survival probabilities across early and late stages, offering valuable insights for clinical practice and decision-making.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"131-146"},"PeriodicalIF":1.9,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145661977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robust Emax model fitting: Addressing nonignorable missing binary outcome in dose-response analysis.
Pub Date: 2025-12-29 | DOI: 10.1177/09622802251403356
Jiangshan Zhang, Vivek Pradhan, Yuxi Zhao
The binary Emax model is widely employed in dose-response analysis during drug development, where missing data often pose significant challenges. Addressing nonignorable missing binary responses, where the likelihood of missingness is related to unobserved outcomes, is particularly important, yet existing methods often lead to biased estimates. This issue is compounded when using the regulatory-recommended "imputing as treatment failure" approach, known as non-responder imputation (NRI). Moreover, the problem of separation, where a predictor perfectly distinguishes between outcome classes, can further complicate likelihood maximization. In this paper, we introduce a penalized likelihood-based method that integrates a modified expectation-maximization (EM) algorithm in the spirit of Ibrahim and Lipsitz to effectively manage both nonignorable missing data and separation issues. Our approach applies a noninformative Jeffreys prior to the likelihood, reducing bias in parameter estimation. Simulation studies demonstrate that our method outperforms existing approaches such as NRI, and this superiority is further supported by an application to data from a Phase II clinical trial. Additionally, we have developed an R package, ememax (https://github.com/Celaeno1017/ememax), to facilitate implementation of the proposed method.
{"title":"Robust Emax model fitting: Addressing nonignorable missing binary outcome in dose-response analysis.","authors":"Jiangshan Zhang, Vivek Pradhan, Yuxi Zhao","doi":"10.1177/09622802251403356","DOIUrl":"https://doi.org/10.1177/09622802251403356","url":null,"abstract":"<p><p>The Binary Emax model is widely employed in dose-response analysis during drug development, where missing data often pose significant challenges. Addressing nonignorable missing binary responses-where the likelihood of missing data is related to unobserved outcomes-is particularly important, yet existing methods often lead to biased estimates. This issue is compounded when using the regulatory-recommended ''imputing as treatment failure'' approach, known as non-responder imputation (NRI). Moreover, the problem of separation, where a predictor perfectly distinguishes between outcome classes, can further complicate likelihood maximization. In this paper, we introduce a penalized likelihood-based method that integrates a modified expectation-maximization (EM) algorithm in the spirit of Ibrahim and Lipsitz to effectively manage both nonignorable missing data and separation issues. Our approach applies a noninformative Jeffreys' prior to the likelihood, reducing bias in parameter estimation. Simulation studies demonstrate that our method outperforms existing methods, such as NRI, and the superiority is further supported by its application to data from a Phase II clinical trial. Additionally, we have developed an R package, <i>ememax</i> (<i>https://github.com/Celaeno1017/ememax</i>), to facilitate the implementation of the proposed method.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802251403356"},"PeriodicalIF":1.9,"publicationDate":"2025-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145857966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic prediction of death risk given a renewal hospitalization process.
Pub Date: 2025-12-19 | DOI: 10.1177/09622802251404065
Telmo Pérez-Izquierdo, Irantzu Barrio, Cristobal Esteban
Predicting the risk of death for chronic patients is highly valuable for informed medical decision-making. This paper proposes a general framework for dynamic prediction of a patient's risk of death given her hospitalization history. Predictions are based on a joint model for the death and hospitalization processes, thereby avoiding the potential bias arising from selection of survivors. The framework is valid for arbitrary models of the hospitalization process: it requires independence of neither the hospitalization times nor the gap times. In particular, we study the prediction of the risk of death in a renewal model for hospitalizations, a common approach to recurrent event modeling. In the renewal model, the distribution of hospitalizations throughout the follow-up period impacts the risk of death. This result differs from the previously studied prediction of death under a Poisson model for the hospitalization process, where only the number of hospitalizations matters. We apply our methodology to a prospective, observational cohort study of 512 patients treated for chronic obstructive pulmonary disease in one of six outpatient respiratory clinics run by the Respiratory Service of Galdakao University Hospital, with a median follow-up of 4.7 years. We find that more concentrated hospitalizations increase the risk of death and that the hazard ratio for death increases continuously as the number of hospitalizations during follow-up increases.
Joint mixed-effects models for causal inference in clustered network-based observational studies.
Pub Date: 2025-12-15 | DOI: 10.1177/09622802251403355
Vanessa McNealis, Erica Em Moodie, Nema Dean
Causal inference on populations embedded in social networks poses technical challenges, since the typical no-interference assumption frequently does not hold. Existing methods developed in the context of network interference rely upon the assumption of no unmeasured confounding. However, when faced with multilevel network data, there may be a latent factor influencing both the exposure and the outcome at the cluster level. We propose a Bayesian inference approach that combines a joint mixed-effects model for the outcome and the exposure with direct standardisation to identify and estimate causal effects in the presence of network interference and unmeasured cluster confounding. In simulations, we compare our proposed method with linear mixed and fixed effects models and show that unbiased estimation is achieved using the joint model. Having derived valid tools for estimation, we examine the effect of home environment on adolescent school performance using data from the National Longitudinal Study of Adolescent Health.
{"title":"Joint mixed-effects models for causal inference in clustered network-based observational studies.","authors":"Vanessa McNealis, Erica Em Moodie, Nema Dean","doi":"10.1177/09622802251403355","DOIUrl":"https://doi.org/10.1177/09622802251403355","url":null,"abstract":"<p><p>Causal inference on populations embedded in social networks poses technical challenges, since the typical no-interference assumption frequently does not hold. Existing methods developed in the context of network interference rely upon the assumption of no unmeasured confounding. However, when faced with multilevel network data, there may be a latent factor influencing both the exposure and the outcome at the cluster level. We propose a Bayesian inference approach that combines a joint mixed-effects model for the outcome and the exposure with direct standardisation to identify and estimate causal effects in the presence of network interference and unmeasured cluster confounding. In simulations, we compare our proposed method with linear mixed and fixed effects models and show that unbiased estimation is achieved using the joint model. Having derived valid tools for estimation, we examine the effect of home environment on adolescent school performance using data from the National Longitudinal Study of Adolescent Health.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802251403355"},"PeriodicalIF":1.9,"publicationDate":"2025-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145763951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Two stage least squares with time-varying instruments: An application to an evaluation of treatment intensification for type-2 diabetes.
Pub Date: 2025-12-15 | DOI: 10.1177/09622802251404064
Daniel Tompsett, Stijn Vansteelandt, Richard Grieve, John Robson, Manuel Gomes
As routinely collected longitudinal data become more available in many settings, policy makers are increasingly interested in the effect of time-varying treatments (sustained treatment strategies). In such settings, many commonly used statistical approaches for estimating treatment effects, such as g-methods, adopt the 'no unmeasured confounding' assumption. Instrumental variable (IV) methods aim to reduce biases due to unmeasured confounding but have received limited attention in settings with time-varying treatments. This paper extends and critically evaluates a commonly used IV estimation approach, two-stage least squares (2SLS), for evaluating time-varying treatments. In a simulation study, we found that, unlike standard 2SLS, the extended 2SLS performs relatively well across a wide range of circumstances, including certain model misspecifications. We illustrate the methods in an evaluation of treatment intensification for type-2 diabetes mellitus, exploring the exogeneity in prescribing preferences to operationalise a time-varying instrument.
{"title":"Two stage least squares with time-varying instruments: An application to an evaluation of treatment intensification for type-2 diabetes.","authors":"Daniel Tompsett, Stijn Vansteelandt, Richard Grieve, John Robson, Manuel Gomes","doi":"10.1177/09622802251404064","DOIUrl":"https://doi.org/10.1177/09622802251404064","url":null,"abstract":"<p><p>As routinely collected longitudinal data becomes more available in many settings, policy makers are increasingly interested in the effect of time-varying treatments (sustained treatment strategies). In settings such as this, many commonly used statistical approaches for estimating treatment effects, such as g-methods, often adopt the 'no unmeasured confounding' assumption. Instrumental variable (IV) methods aim to reduce biases due to unmeasured confounding, but have received limited attention in settings with time-varying treatments. This paper extends and critically evaluates a commonly used IV estimating approach, Two Stage Least Squares (2SLS), for evaluating time-varying treatments. Using a simulation study, we found that, unlike standard 2SLS, the extended 2SLS performs relatively well across a wide range of circumstances, including certain model misspecifications. We illustrate the methods in an evaluation of treatment intensification for Type-2 Diabetes Mellitus, exploring the exogeneity in prescribing preferences to operationalise a time-varying instrument.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802251404064"},"PeriodicalIF":1.9,"publicationDate":"2025-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145757655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The use of the EM algorithm for regularization problems in high-dimensional linear mixed-effects models.
Pub Date: 2025-12-09 | DOI: 10.1177/09622802251399913
Daniela Cr Oliveira, Fernanda L Schumacher, Victor H Lachos
The expectation-maximization (EM) algorithm is a popular tool for maximum likelihood estimation, but its use in high-dimensional regularization problems in linear mixed-effects models has been limited. In this article, we introduce the EMLMLasso algorithm, which combines the EM algorithm with the popular and efficient R package glmnet for Lasso variable selection of fixed effects in linear mixed-effects models and allows for automatic selection of the tuning parameter. A comprehensive performance evaluation is conducted, comparing the proposed EMLMLasso algorithm against two existing algorithms implemented in the R packages glmmLasso and splmm. In both the simulated and real-world applications analyzed, our algorithm showed robustness and effectiveness in variable selection, including cases where the number of predictors (p) is greater than the number of independent observations (n). In most evaluated scenarios, the EMLMLasso algorithm consistently outperformed both glmmLasso and splmm. The proposed method is quite general and simple to implement, allowing for extensions based on ridge and elastic net penalties in linear mixed-effects models.
{"title":"The use of the EM algorithm for regularization problems in high-dimensional linear mixed-effects models.","authors":"Daniela Cr Oliveira, Fernanda L Schumacher, Victor H Lachos","doi":"10.1177/09622802251399913","DOIUrl":"https://doi.org/10.1177/09622802251399913","url":null,"abstract":"<p><p>The expectation-maximization (EM) algorithm is a popular tool for maximum likelihood estimation, but its use in high-dimensional regularization problems in linear mixed-effects models has been limited. In this article, we introduce the EMLMLasso algorithm, which combines the EM algorithm with the popular and efficient R package glmnet for Lasso variable selection of fixed effects in linear mixed-effects models and allows for automatic selection of the tuning parameter. A comprehensive performance evaluation is conducted, comparing the proposed EMLMLasso algorithm against two existing algorithms implemented in the R packages glmmLasso and splmm. In both simulated and real-world applications analyzed, our algorithm showed robustness and effectiveness in variable selection, including cases where the number of predictors <math><mo>(</mo><mi>p</mi><mo>)</mo></math> is greater than the number of independent observations <math><mo>(</mo><mi>n</mi><mo>)</mo></math>. In most evaluated scenarios, the EMLMLasso algorithm consistently outperformed both glmmLasso and splmm. The proposed method is quite general and simple to implement, allowing for extensions based on ridge and elastic net penalties in linear mixed-effects models.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802251399913"},"PeriodicalIF":1.9,"publicationDate":"2025-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145709555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic prediction by landmarking with data from cohort subsampling designs.
Pub Date: 2025-12-08 | DOI: 10.1177/09622802251403279
Yen Chang, Anastasia Ivanova, Demetrius Albanes, Jason P Fine, Yei Eun Shin
Longitudinal data are often available in cohort studies and clinical settings, such as covariates collected at cohort follow-up visits or prescriptions captured in electronic health records. Such longitudinal information, if it correlates with the health event of interest, may be incorporated to dynamically predict the probability of the event with better precision. Landmarking is a popular approach to dynamic prediction. There are well-established methods for landmarking using full cohort data, but collecting data on all cohort members may not be feasible when resources are limited. Instead, one may select a subset of the cohort using subsampling designs and collect data only on this subset. In this work, we present conditional likelihood and inverse-probability weighted methods for landmarking using data from cohort subsampling designs, and we discuss considerations for choosing a particular method. Simulations are conducted to evaluate the applicability of the methods and their predictive performance in different scenarios. The results show that our methods achieve predictive performance similar to the full cohort analysis while using only small fractions of the full cohort data. We use real nested case-control data from the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial to illustrate the methods.
{"title":"Dynamic prediction by landmarking with data from cohort subsampling designs.","authors":"Yen Chang, Anastasia Ivanova, Demetrius Albanes, Jason P Fine, Yei Eun Shin","doi":"10.1177/09622802251403279","DOIUrl":"https://doi.org/10.1177/09622802251403279","url":null,"abstract":"<p><p>Longitudinal data are often available in cohort studies and clinical settings, such as covariates collected at cohort follow-up visits or prescriptions captured in electronic health records. Such longitudinal information, if correlates with the health event of interest, may be incorporated to dynamically predict the probability of a health event with better precision. Landmarking is a popular approach to dynamic prediction. There are well-established methods for landmarking using full cohort data, but collecting data on all cohort members may not be feasible when resource is limited. Instead, one may select a subset of the cohort using subsampling designs, and only collect data on this subset. In this work, we present conditional likelihood and inverse-probability weighted methods for landmarking using data from cohort subsampling designs, and discuss considerations for choosing a particular method. Simulations are conducted to evaluate the applicability of the methods and their predictive performance in different scenarios. Results show that our methods have similar predictive performance to the full cohort analysis but only use small fractions of the full cohort data. We use real nested case-control data from the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial to illustrate the methods.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802251403279"},"PeriodicalIF":1.9,"publicationDate":"2025-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145701680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}