What's the Weight? Estimating Controlled Outcome Differences in Complex Surveys for Health Disparities Research.
Stephen Salerno, Emily K Roberts, Belinda L Needham, Tyler H McCormick, Fan Li, Bhramar Mukherjee, Xu Shi
In this work, we are motivated by the problem of estimating racial disparities in health outcomes, specifically the average controlled difference (ACD) in telomere length between Black and White individuals, using data from the National Health and Nutrition Examination Survey (NHANES). To do so, we build a propensity score model for race to properly adjust for other social determinants while characterizing the controlled effect of race on telomere length. Propensity score methods are broadly employed with observational data as a tool to achieve covariate balance, but how to implement them in complex surveys is less studied, particularly when the survey weights depend on the group variable under comparison (as the NHANES sampling scheme depends on self-reported race). We propose identification formulas to properly estimate the ACD in outcomes between Black and White individuals, with appropriate weighting for both covariate imbalance across the two racial groups and generalizability. Via extensive simulation, we show that our proposed methods outperform traditional analytic approaches in terms of bias, mean squared error, and coverage when estimating the ACD in our setting of interest. In our data, we find that evidence of racial differences in telomere length between Black and White individuals attenuates after accounting for confounding by socioeconomic factors and applying appropriate propensity score and survey weighting techniques. Software to implement these methods and code to reproduce our results are available in the R package svycdiff, on the Comprehensive R Archive Network (CRAN) at cran.r-project.org/web/packages/svycdiff/, with a development version on GitHub at github.com/salernos/svycdiff.
{"title":"What's the Weight? Estimating Controlled Outcome Differences in Complex Surveys for Health Disparities Research.","authors":"Stephen Salerno, Emily K Roberts, Belinda L Needham, Tyler H McCormick, Fan Li, Bhramar Mukherjee, Xu Shi","doi":"10.1002/sim.70289","DOIUrl":"10.1002/sim.70289","url":null,"abstract":"<p><p>In this work, we are motivated by the problem of estimating racial disparities in health outcomes, specifically the average controlled difference (ACD) in telomere length between Black and White individuals, using data from the National Health and Nutrition Examination Survey (NHANES). To do so, we build a propensity for race to properly adjust for other social determinants while characterizing the controlled effect of race on telomere length. Propensity score methods are broadly employed with observational data as a tool to achieve covariate balance, but how to implement them in complex surveys is less studied-in particular, when the survey weights depend on the group variable under comparison (as the NHANES sampling scheme depends on self-reported race). We propose identification formulas to properly estimate the ACD in outcomes between Black and White individuals, with appropriate weighting for both covariate imbalance across the two racial groups and generalizability. Via extensive simulation, we show that our proposed methods outperform traditional analytic approaches in terms of bias, mean squared error, and coverage when estimating the ACD for our setting of interest. In our data, we find that evidence of racial differences in telomere length between Black and White individuals attenuates after accounting for confounding by socioeconomic factors and utilizing appropriate propensity score and survey weighting techniques. Software to implement these methods and code to reproduce our results can be found in the R package svycdiff, available through the Comprehensive R Archive Network (CRAN) at cran.r-project.org/web/packages/svycdiff/, or in a development version on GitHub at github.com/salernos/svycdiff.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"44 23-24","pages":"e70289"},"PeriodicalIF":1.8,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12636266/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145239759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic Prediction Using Functional Latent Trait Joint Models for Multivariate Longitudinal Outcomes: An Application to Parkinson's Disease.
Mohammad Samsul Alam, Dongrak Choi, Salil Koner, Sheng Luo
The progressive and multifaceted nature of Parkinson's disease (PD) calls for the integration of diverse data types, including continuous, ordinal, and binary outcomes, in longitudinal studies for a comprehensive understanding of symptom progression and disease trajectory. Significant terminal events, such as severe disability or mortality, highlight the need for joint modeling approaches that simultaneously address multivariate outcomes and time-to-event data. We introduce the functional latent trait joint model (FLTM-JM), a novel joint modeling framework based on the functional latent trait model (FLTM), to jointly analyze multivariate longitudinal data and survival outcomes. The FLTM component leverages a non-parametric, function-on-scalar regression framework, enabling flexible modeling of complex relationships between covariates and patient outcomes over time. This joint modeling approach supports dynamic, subject-specific predictions, offering valuable insights for personalized treatment strategies. Applied to Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS) data from the Parkinson's Progression Markers Initiative (PPMI), our model effectively identifies the influence of key covariates and demonstrates the utility of dynamic predictions in clinical decision-making. Extensive simulation studies validate the accuracy, robustness, and computational efficiency of FLTM-JM, even under model misspecification.
{"title":"Dynamic Prediction Using Functional Latent Trait Joint Models for Multivariate Longitudinal Outcomes: An Application to Parkinson's Disease.","authors":"Mohammad Samsul Alam, Dongrak Choi, Salil Koner, Sheng Luo","doi":"10.1002/sim.70285","DOIUrl":"10.1002/sim.70285","url":null,"abstract":"<p><p>The progressive and multifaceted nature of Parkinson's disease (PD) calls for the integration of diverse data types, including continuous, ordinal, and binary, in longitudinal studies for a comprehensive understanding of symptom progression and disease trajectory. Significant terminal events, such as severe disability or mortality, highlight the need for joint modeling approaches that simultaneously address multivariate outcomes and time-to-event data. We introduce functional latent trait model-joint model (FLTM-JM), a novel joint modeling framework based on the functional latent trait model (FLTM), to jointly analyze multivariate longitudinal data and survival outcomes. The FLTM component leverages a non-parametric, function-on-scalar regression framework, enabling flexible modeling of complex relationships between covariates and patient outcomes over time. This joint modeling approach supports dynamic, subject-specific predictions, offering valuable insights for personalized treatment strategies. Applied to Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS) data from the Parkinson's Progression Markers Initiative (PPMI), our model effectively identifies the influence of key covariates and demonstrates the utility of dynamic predictions in clinical decision-making. Extensive simulation studies validate the accuracy, robustness, and computational efficiency of FLTM-JM, even under model misspecification.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"44 23-24","pages":"e70285"},"PeriodicalIF":1.8,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12614809/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145309197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Estimating Risk Factors for Pathogenic Dose Accrual From Longitudinal Data.
Daniel K Sewell, Kelly K Baker
Estimating risk factors for the incidence of a disease is crucial for understanding its etiology. For diseases caused by enteric pathogens, off-the-shelf statistical model-based approaches do not consider the biological mechanisms through which infection occurs and thus can only be used to make comparatively weak statements about the association between risk factors and incidence. Building on established work in quantitative microbiological risk assessment, we propose a new approach to determining the association between risk factors and dose accrual rates. Our more mechanistic approach achieves a higher degree of biological plausibility, incorporates currently ignored sources of variability, and provides regression parameters that are easily interpretable as the dose accrual rate ratio due to changes in the risk factors under study. We also describe a method for leveraging information across multiple pathogens. The proposed methods are available as an R package at https://github.com/dksewell/dare. Our simulation study shows unacceptable coverage rates from generalized linear models, while the proposed approach empirically maintains the nominal rate even when the model is misspecified. Finally, we demonstrate our proposed approach by applying it to infant data obtained through the PATHOME study (https://reporter.nih.gov/project-details/10227256), discovering the impact of various environmental factors on infant enteric infections.
{"title":"Estimating Risk Factors for Pathogenic Dose Accrual From Longitudinal Data.","authors":"Daniel K Sewell, Kelly K Baker","doi":"10.1002/sim.70291","DOIUrl":"10.1002/sim.70291","url":null,"abstract":"<p><p>Estimating risk factors for the incidence of a disease is crucial for understanding its etiology. For diseases caused by enteric pathogens, off-the-shelf statistical model-based approaches do not consider the biological mechanisms through which infection occurs and thus can only be used to make comparatively weak statements about the association between risk factors and incidence. Building off of established work in quantitative microbiological risk assessment, we propose a new approach to determining the association between risk factors and dose accrual rates. Our more mechanistic approach achieves a higher degree of biological plausibility, incorporates currently ignored sources of variability, and provides regression parameters that are easily interpretable as the dose accrual rate ratio due to changes in the risk factors under study. We also describe a method for leveraging information across multiple pathogens. The proposed methods are available as an R package at https://github.com/dksewell/dare. Our simulation study shows unacceptable coverage rates from generalized linear models, while the proposed approach empirically maintains the nominal rate even when the model is misspecified. Finally, we demonstrated our proposed approach by applying our method to infant data obtained through the PATHOME study (https://reporter.nih.gov/project-details/10227256), discovering the impact of various environmental factors on infant enteric infections.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"44 23-24","pages":"e70291"},"PeriodicalIF":1.8,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12503088/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145239741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Approach to Design Adaptive Clinical Trials With Time-to-Event Outcomes Based on a General Bayesian Posterior Distribution.
James M McGree, Antony M Overstall, Mark Jones, Robert K Mahar
Clinical trials are an integral component of medical research. Trials require careful design to, for example, maintain the safety of participants and to use resources efficiently. Adaptive clinical trials are often more efficient and ethical than standard or non-adaptive trials because they can require fewer participants, target more promising treatments, and stop early with sufficient evidence of effectiveness or harm. The design of adaptive trials is usually undertaken via simulation, which requires assumptions about the data-generating process to be specified a priori. Unfortunately, if such assumptions are misspecified, then the resulting trial design may not perform as expected, leading to, for example, reduced statistical power or an inflated Type I error rate. Motivated by a clinical trial of a vaccine to protect against gastroenteritis in infants, we propose an approach to design adaptive clinical trials with time-to-event outcomes without needing to explicitly define the data-generating process. To facilitate this, we consider trial design within a general Bayesian framework where inference about the treatment effect is based on the partial likelihood. As a result, inference is robust to the form of the baseline hazard function, and we exploit this property to undertake trial design when the data-generating process is only implicitly defined. The benefits of this approach are demonstrated through an illustrative example and by redesigning our motivating clinical trial.
{"title":"An Approach to Design Adaptive Clinical Trials With Time-to-Event Outcomes Based on a General Bayesian Posterior Distribution.","authors":"James M McGree, Antony M Overstall, Mark Jones, Robert K Mahar","doi":"10.1002/sim.70207","DOIUrl":"10.1002/sim.70207","url":null,"abstract":"<p><p>Clinical trials are an integral component of medical research. Trials require careful design to, for example, maintain the safety of participants and to use resources efficiently. Adaptive clinical trials are often more efficient and ethical than standard or non-adaptive trials because they can require fewer participants, target more promising treatments, and stop early with sufficient evidence of effectiveness or harm. The design of adaptive trials is usually undertaken via simulation, which requires assumptions about the data-generating process to be specified a priori. Unfortunately, if such assumptions are misspecified, then the resulting trial design may not perform as expected, leading to, for example, reduced statistical power or an increased Type I error. Motivated by a clinical trial of a vaccine to protect against gastroenteritis in infants, we propose an approach to design adaptive clinical trials with time-to-event outcomes without needing to explicitly define the data-generating process. To facilitate this, we consider trial design within a general Bayesian framework where inference about the treatment effect is based on the partial likelihood. As a result, inference is robust to the form of the baseline hazard function, and we exploit this property to undertake trial design when the data-generating process is only implicitly defined. The benefits of this approach are demonstrated via an illustrative example and via redesigning our motivating clinical trial.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"44 23-24","pages":"e70207"},"PeriodicalIF":1.8,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12510400/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145252732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Mixture of Linear Mixed Models for Complex Longitudinal Data.
Lucas Kock, Nadja Klein, David J Nott
Mixtures of linear mixed models are widely used for modeling longitudinal data for which observation times differ between subjects. In typical applications, temporal trends are described using a basis expansion, with basis coefficients treated as random effects varying by subject. Additional random effects can describe variation between mixture components or other known sources of variation in complex designs. A key advantage of these models is that they provide a natural mechanism for clustering. Current versions of mixtures of linear mixed models are not specifically designed for the case where there are many observations per subject and complex temporal trends, which require a large number of basis functions to capture. In this case, the subject-specific basis coefficients are a high-dimensional random effects vector, for which the covariance matrix is hard to specify and estimate, especially if it varies between mixture components. To address this issue, we consider the use of deep mixture of factor analyzers models as a prior for the random effects. The resulting deep mixture of linear mixed models is well suited for high-dimensional settings, and we describe an efficient variational inference approach to posterior computation. The efficacy of the method is demonstrated in biomedical applications and on simulated data.
{"title":"Deep Mixture of Linear Mixed Models for Complex Longitudinal Data.","authors":"Lucas Kock, Nadja Klein, David J Nott","doi":"10.1002/sim.70288","DOIUrl":"10.1002/sim.70288","url":null,"abstract":"<p><p>Mixtures of linear mixed models are widely used for modeling longitudinal data for which observation times differ between subjects. In typical applications, temporal trends are described using a basis expansion, with basis coefficients treated as random effects varying by subject. Additional random effects can describe variation between mixture components or other known sources of variation in complex designs. A key advantage of these models is that they provide a natural mechanism for clustering. Current versions of mixtures of linear mixed models are not specifically designed for the case where there are many observations per subject and complex temporal trends, which require a large number of basis functions to capture. In this case, the subject-specific basis coefficients are a high-dimensional random effects vector, for which the covariance matrix is hard to specify and estimate, especially if it varies between mixture components. To address this issue, we consider the use of deep mixture of factor analyzers models as a prior for the random effects. The resulting deep mixture of linear mixed models is well suited for high-dimensional settings, and we describe an efficient variational inference approach to posterior computation. The efficacy of the method is demonstrated in biomedical applications and on simulated data.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"44 23-24","pages":"e70288"},"PeriodicalIF":1.8,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12503021/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145239632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accounting for Misclassification of Cause of Death in Weighted Cumulative Incidence Functions for Causal Analyses.
Jessie K Edwards, Bonnie E Shook-Sa, Giorgos Bakoyannis, Paul N Zivich, Michael E Herce, Stephen R Cole
Misclassification between causes of death can produce bias in estimated cumulative incidence functions. When estimating causal quantities, such as comparing the cumulative incidence of death due to specific causes under interventions, such bias can lead to suboptimal decision-making. Here, a consistent semiparametric estimator of the cumulative incidence function under interventions in settings with misclassification between two event types is presented. The measurement parameters for this estimator can be informed by validation data or expert knowledge. Moreover, a modified bootstrap approach to variance estimation is proposed for confidence interval construction. The proposed estimator was applied to estimate the cumulative incidence of AIDS-related mortality in the Multicenter AIDS Cohort Study under single- versus combination-drug antiretroviral therapy regimens that may be subject to confounding. The proposed estimator is shown to be consistent and performs well in finite samples across a series of simulation experiments.
{"title":"Accounting for Misclassification of Cause of Death in Weighted Cumulative Incidence Functions for Causal Analyses.","authors":"Jessie K Edwards, Bonnie E Shook-Sa, Giorgos Bakoyannis, Paul N Zivich, Michael E Herce, Stephen R Cole","doi":"10.1002/sim.70281","DOIUrl":"10.1002/sim.70281","url":null,"abstract":"<p><p>Misclassification between causes of death can produce bias in estimated cumulative incidence functions. When estimating causal quantities, such as comparing the cumulative incidence of death due to specific causes under interventions, such bias can lead to suboptimal decision making. Here, a consistent semiparametric estimator of the cumulative incidence function under interventions in settings with misclassification between two event types is presented. The measurement parameters for this estimator can be informed by validation data or expert knowledge. Moreover, a modified bootstrap approach to variance estimation is proposed for confidence interval construction. The proposed estimator was applied to estimate the cumulative incidence of AIDS-related mortality in the Multicenter AIDS Cohort Study under single- versus combination-drug antiretroviral therapy regimens that may be subject to confounding. The proposed estimator is shown to be consistent and performed well in finite samples via a series of simulation experiments.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"44 23-24","pages":"e70281"},"PeriodicalIF":1.8,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12695060/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145239714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Estimating and Evaluating Counterfactual Prediction Models.
Christopher B Boyer, Issa J Dahabreh, Jon A Steingrimsson
Counterfactual prediction methods are required when a model will be deployed in a setting where treatment policies differ from the setting where the model was developed, or when a model provides predictions under hypothetical interventions to support decision-making. However, estimating and evaluating counterfactual prediction models is challenging because, unlike traditional (factual) prediction, one does not observe the potential outcomes for all individuals under all treatment strategies of interest. Here, we discuss how to estimate a counterfactual prediction model, how to assess the model's performance, and how to perform model and tuning parameter selection. We provide identification and estimation results for counterfactual prediction models and for multiple measures of counterfactual model performance, including loss-based measures, the area under the receiver operating characteristic curve, and the calibration curve. Importantly, our results allow valid estimates of model performance under counterfactual intervention even if the candidate prediction model is misspecified, permitting a wider array of use cases. We illustrate these methods using simulation and apply them to the task of developing a statin-naïve risk prediction model for cardiovascular disease.
{"title":"Estimating and Evaluating Counterfactual Prediction Models.","authors":"Christopher B Boyer, Issa J Dahabreh, Jon A Steingrimsson","doi":"10.1002/sim.70287","DOIUrl":"10.1002/sim.70287","url":null,"abstract":"<p><p>Counterfactual prediction methods are required when a model will be deployed in a setting where treatment policies differ from the setting where the model was developed, or when a model provides predictions under hypothetical interventions to support decision-making. However, estimating and evaluating counterfactual prediction models is challenging because, unlike traditional (factual) prediction, one does not observe the potential outcomes for all individuals under all treatment strategies of interest. Here, we discuss how to estimate a counterfactual prediction model, how to assess the model's performance, and how to perform model and tuning parameter selection. We provide identification and estimation results for counterfactual prediction models and for multiple measures of counterfactual model performance, including loss-based measures, the area under the receiver operating characteristic curve, and the calibration curve. Importantly, our results allow valid estimates of model performance under counterfactual intervention even if the candidate prediction model is misspecified, permitting a wider array of use cases. We illustrate these methods using simulation and apply them to the task of developing a statin-naïve risk prediction model for cardiovascular disease.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"44 23-24","pages":"e70287"},"PeriodicalIF":1.8,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12503020/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145239796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modeling the Role of Baseline Risk and Additional Study-Level Covariates in Meta-Analysis of Treatment Effects.
Phuc T Tran, Annamaria Guolo
The relationship between the treatment effect and the baseline risk is a recognized tool for investigating the heterogeneity of treatment effects in meta-analyses of clinical trials. Since the baseline risk is difficult to measure, a proxy is adopted, based on the rate of events among subjects under the control condition. Because the proxy is aggregated information at the study level, the data are affected by measurement errors, a problem that the literature has explored and addressed in recent years. This paper proposes an extension of the classical meta-analysis with baseline risk information, which includes additional study-specific covariates other than the rate of events to explain heterogeneity. Likelihood-based inference is carried out with measurement error correction techniques to prevent the unreliable inference that results from error-prone covariates summarized at the study level. Within-study covariances between risk measures and the covariate components are computed using Taylor expansions based on study-level covariate subgroup summary information. When such information is not available and, more generally, in order to reduce computational difficulties, a pseudo-likelihood solution is developed under a working independence assumption between the observed error-prone measures. The performance of the methods is investigated in a series of simulation studies under different specifications for the sample size, the between-study heterogeneity, and the underlying risk distribution. The methods are applied to a meta-analysis of the association between COVID-19 and schizophrenia.
{"title":"Modeling the Role of Baseline Risk and Additional Study-Level Covariates in Meta-Analysis of Treatment Effects.","authors":"Phuc T Tran, Annamaria Guolo","doi":"10.1002/sim.70278","DOIUrl":"10.1002/sim.70278","url":null,"abstract":"<p><p>The relationship between the treatment effect and the baseline risk is a recognized tool to investigate the heterogeneity of treatment effects in meta-analyses of clinical trials. Since the baseline risk is difficult to measure, a proxy is adopted, which is based on the rate of events for the subject under the control condition. The use of the proxy in terms of aggregated information at the study level implies that the data are affected by measurement errors, a problem that the literature has explored and addressed in recent years. This paper proposes an extension of the classical meta-analysis with baseline risk information, which includes additional study-specific covariates other than the rate of events to explain heterogeneity. Likelihood-based inference is carried out by including measurement error correction techniques necessary to prevent unreliable inference due to the measurement errors affecting the covariates summarized at the study level. Within-study covariances between risk measures and the covariate components are computed using Taylor expansions based on study-level covariate subgroup summary information. When such information is not available and, more generally, in order to reduce computational difficulties, a pseudo-likelihood solution is developed under a working independence assumption between the observed error-prone measures. The performance of the methods is investigated in a series of simulation studies under different specifications for the sample size, the between-study heterogeneity, and the underlying risk distribution. They are applied to a meta-analysis about the association between COVID-19 and schizophrenia.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"44 23-24","pages":"e70278"},"PeriodicalIF":1.8,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12548021/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145347463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Specification of Estimands for Complex Disease Processes Using Multistate Models and Utility Functions.
Alexandra Bühler, Richard J Cook, Jerald F Lawless
In complex diseases, individuals are often at risk of several types of possibly semi-competing events and may experience recurrent symptomatic episodes. This complex disease course makes it challenging to define target estimands for clinical trials. While composite endpoints are routinely adopted, recent innovations involving the win ratio and other methods based on ranking the disease course have received considerable attention. We emphasize the usefulness of multistate models for addressing challenges arising in complex diseases, along with the simplicity and interpretability that come from defining utilities to synthesize evidence of treatment effects on different aspects of the disease process. Robust variance estimation based on the infinitesimal jackknife means that such methods can be used as the basis of primary analyses of clinical trials. We illustrate the use of utilities for the assessment of bleeding outcomes in a trial of cancer patients with thrombocytopenia.
{"title":"Specification of Estimands for Complex Disease Processes Using Multistate Models and Utility Functions.","authors":"Alexandra Bühler, Richard J Cook, Jerald F Lawless","doi":"10.1002/sim.70269","DOIUrl":"10.1002/sim.70269","url":null,"abstract":"<p><p>In complex diseases, individuals are often at risk of several types of possibly semi-competing events and may experience recurrent symptomatic episodes. This complex disease course makes it challenging to define target estimands for clinical trials. While composite endpoints are routinely adopted, recent innovations involving the win ratio and other methods based on ranking the disease course have received considerable attention. We emphasize the usefulness of multistate models for addressing challenges arising in complex diseases, along with the simplicity and interpretability that come from defining utilities to synthesize evidence of treatment effects on different aspects of the disease process. Robust variance estimation based on the infinitesimal jackknife means that such methods can be used as the basis of primary analyses of clinical trials. We illustrate the use of utilities for the assessment of bleeding outcomes in a trial of cancer patients with thrombocytopenia.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"44 23-24","pages":"e70269"},"PeriodicalIF":1.8,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12519945/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145287044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ChatGPT as a Tool for Biostatisticians: A Tutorial on Applications, Opportunities, and Limitations.
Dennis Dobler, Harald Binder, Anne-Laure Boulesteix, Jan-Bernd Igelmann, David Köhler, Ulrich Mansmann, Markus Pauly, André Scherag, Matthias Schmid, Amani Al Tawil, Susanne Weber
Modern large language models (LLMs) have reshaped the workflows of people across countless fields, and biostatistics is no exception. These models offer novel support in drafting study plans, generating software code, or writing reports. However, reliance on LLMs carries the risk of inaccuracies due to potential hallucinations that may produce fabricated "facts", leading to erroneous statistical statements and conclusions. Such errors could compromise the high precision and transparency fundamental to our field. This tutorial aims to illustrate the impact of LLM-based applications on various contemporary biostatistical tasks. We explore both the risks and opportunities presented by this new era of artificial intelligence. Our ultimate conclusion emphasizes that advanced applications should only be used in combination with sufficient background knowledge. Over time, consistently verifying LLM outputs may lead to an appropriately calibrated trust in these tools among users.
{"title":"ChatGPT as a Tool for Biostatisticians: A Tutorial on Applications, Opportunities, and Limitations.","authors":"Dennis Dobler, Harald Binder, Anne-Laure Boulesteix, Jan-Bernd Igelmann, David Köhler, Ulrich Mansmann, Markus Pauly, André Scherag, Matthias Schmid, Amani Al Tawil, Susanne Weber","doi":"10.1002/sim.70263","DOIUrl":"10.1002/sim.70263","url":null,"abstract":"<p><p>Modern large language models (LLMs) have reshaped the workflows of people across countless fields-and biostatistics is no exception. These models offer novel support in drafting study plans, generating software code, or writing reports. However, reliance on LLMs carries the risk of inaccuracies due to potential hallucinations that may produce fabricated \"facts\", leading to erroneous statistical statements and conclusions. Such errors could compromise the high precision and transparency fundamental to our field. This tutorial aims to illustrate the impact of LLM-based applications on various contemporary biostatistical tasks. We will explore both the risks and opportunities presented by this new era of artificial intelligence. Our ultimate conclusion emphasizes that advanced applications should only be used in combination with sufficient background knowledge. Over time, consistently verifying LLM outputs may lead to an appropriately calibrated trust in these tools among users.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":"44 23-24","pages":"e70263"},"PeriodicalIF":1.8,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12548020/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145347357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}