Pub Date: 2025-02-11, DOI: 10.1177/09622802241311455
Carlos García Meixide, Marcos Matabuena
Counterfactual inference at the distributional level presents new challenges with censored targets, especially in modern healthcare problems. To mitigate selection bias in this context, we exploit the intrinsic structure of reproducing kernel Hilbert spaces (RKHS) harnessing the notion of kernel mean embedding. This enables the development of a non-parametric estimator of counterfactual survival functions. We provide rigorous theoretical guarantees regarding consistency and convergence rates of our new estimator under general hypotheses related to smoothness of the underlying RKHS. We illustrate the practical viability of our methodology through extensive simulations and a relevant case study: the SPRINT trial. Our estimator presents a distinct perspective compared to existing methods within the literature, which often rely on semi-parametric approaches and confront limitations in causal interpretations of model parameters.
Title: Causal survival embeddings: Non-parametric counterfactual inference under right-censoring.
Journal: Statistical Methods in Medical Research
Pub Date: 2025-02-11, DOI: 10.1177/09622802241313293
Moritz Berger, Nadja Klein, Michael Wagner, Matthias Schmid
Modeling the ratio of two dependent components as a function of covariates is a frequently pursued objective in observational research. Despite the high relevance of this topic in medical studies, where biomarker ratios are often used as surrogate endpoints for specific diseases, existing models are commonly based on oversimplified assumptions, assuming e.g. independence or strictly positive associations between the components. In this paper, we overcome such limitations and propose a regression model where the marginal distributions of the two components are linked by a copula. A key feature of our model is that it allows for both positive and negative associations between the components, with one of the model parameters being directly interpretable in terms of Kendall's rank correlation coefficient. We study our method theoretically, evaluate finite sample properties in a simulation study and demonstrate its efficacy in an application to diagnosis of Alzheimer's disease via ratios of amyloid-beta and total tau protein biomarkers.
Title: Modeling the ratio of correlated biomarkers using copula regression.
Journal: Statistical Methods in Medical Research
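As a rough illustration of the dependence measure mentioned in the abstract above, the following sketch computes Kendall's rank correlation for a small sample and evaluates the standard relation tau = (2/pi) * arcsin(rho) that holds for a Gaussian copula. The abstract does not say which copula family the authors use, so the Gaussian copula, the function names `kendall_tau` and `gaussian_copula_tau`, and the toy numbers are assumptions made purely for illustration, not the authors' model.

```python
import math
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    score = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            score += 1  # concordant pair
        elif product < 0:
            score -= 1  # discordant pair
    n_pairs = len(x) * (len(x) - 1) // 2
    return score / n_pairs


def gaussian_copula_tau(rho):
    """Kendall's tau implied by a Gaussian copula with correlation rho."""
    return 2.0 / math.pi * math.asin(rho)


# Hypothetical toy values for two components with a negative association.
component_a = [1.0, 2.0, 3.0, 4.0]
component_b = [40.0, 30.0, 20.0, 10.0]
tau_negative = kendall_tau(component_a, component_b)
tau_from_rho = gaussian_copula_tau(0.5)
```

A negative `tau_negative` is the kind of association that, according to the abstract, models restricted to strictly positive dependence cannot represent.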
Multivariate control charts have found wide application in healthcare, yet they primarily cater to continuous or categorical variables. However, the emergence of mixed-type data has sparked interest in adapting traditional control charts to handle such complexity. Unfortunately, existing methods often struggle to effectively manage this complexity, particularly in scenarios with limited historical in-control data. In response, this article introduces three distribution-free control charts specifically designed for monitoring mixed-type processes. The proposed approach revolves around computing distances between observations and a specified point, thereby reducing the data to a single dimension. Subsequently, the ranks of these one-dimensional distances are leveraged to develop monitoring statistics. Furthermore, to facilitate dimensionality reduction, a novel distance measure tailored for mixed-type data is introduced. Extensive validation of our proposed method is conducted through comprehensive simulation experiments. Moreover, we demonstrate the practical applicability of the proposed method using an example related to heart disease.
Title: Distribution-free control charts for mixed-type data based on rank of interpoint distances.
Authors: Guojun Liu, Jyun-You Chiang, Yajie Bai, Zhengcheng Mou
Journal: Statistical Methods in Medical Research
Pub Date: 2025-02-10, DOI: 10.1177/09622802251316964
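The general idea described in this abstract (reduce each mixed-type observation to its distance from a specified point, then work with the rank of that distance) can be sketched as follows. This is only a minimal sketch under stated assumptions: a Gower-style distance stands in for the authors' novel distance measure, the reference point is taken to be the per-variable median or mode of the in-control sample, and the names `gower_distance`, `reference_point` and `rank_among` are hypothetical. It does not reproduce the three control charts or their control limits.

```python
from statistics import median, mode


def gower_distance(x, ref, numeric_ranges):
    """Gower-style distance between two mixed-type records (stand-in measure).

    numeric_ranges maps the index of each numeric field to its observed range;
    every other field is treated as categorical (0 if equal, 1 otherwise).
    """
    total = 0.0
    for j, (a, b) in enumerate(zip(x, ref)):
        if j in numeric_ranges:
            rng = numeric_ranges[j]
            total += abs(a - b) / rng if rng > 0 else 0.0
        else:
            total += 0.0 if a == b else 1.0
    return total / len(x)


def reference_point(data, numeric_idx):
    """Median for numeric columns, mode for categorical columns."""
    columns = list(zip(*data))
    return tuple(
        median(col) if j in numeric_idx else mode(col)
        for j, col in enumerate(columns)
    )


def rank_among(reference_distances, d):
    """1-based rank of d among the in-control distances (strictly smaller count + 1)."""
    return 1 + sum(1 for r in reference_distances if r < d)


# Hypothetical in-control sample: (age in years, sex).
in_control = [(50, "M"), (60, "F"), (55, "M"), (65, "F"), (58, "M")]
numeric_idx = {0}
ages = [row[0] for row in in_control]
ranges = {0: max(ages) - min(ages)}

ref = reference_point(in_control, numeric_idx)
ref_distances = [gower_distance(row, ref, ranges) for row in in_control]

# Rank of two new observations relative to the in-control distances.
rank_far = rank_among(ref_distances, gower_distance((90, "F"), ref, ranges))
rank_near = rank_among(ref_distances, gower_distance((57, "M"), ref, ranges))
```

A new observation whose distance ranks above all in-control distances (here `rank_far`) is the kind of signal a rank-based monitoring statistic would accumulate; how the ranks are turned into a charting statistic is specific to the paper and is not shown here.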
Pub Date: 2025-02-05, DOI: 10.1177/09622802241313284
Pierre Masselot, Antonio Gasparrini
Multi-location studies are increasingly used in environmental epidemiology. Their application is supported by designs and statistical techniques developed over recent decades, which, however, have known limitations. In this contribution, we propose an improved modelling framework that addresses these issues. Specifically, this flexible framework allows the direct modelling of demographic differences across locations, defining geographical variations linked to multiple vulnerability factors, capturing spatial heterogeneity and predicting risks for new locations, and improving the assessment of uncertainty. We illustrate these new developments in an analysis of temperature-mortality associations in Italian cities, providing fully reproducible R code and data.
Title: Modelling extensions for multi-location studies in environmental epidemiology.
Journal: Statistical Methods in Medical Research
Pub Date: 2025-01-31, DOI: 10.1177/09622802241309348
Dane Isenberg, Michael O Harhay, Nandita Mitra, Fan Li
Patient-centered outcomes, such as quality of life and length of hospital stay, are the focus in a wide array of clinical studies. However, participants in randomized trials for elderly or critically and severely ill patient populations may have truncated or undefined non-mortality outcomes if they do not survive through the measurement time point. To address truncation by death, the survivor average causal effect has been proposed as a causally interpretable subgroup treatment effect defined under the principal stratification framework. However, the majority of methods for estimating the survivor average causal effect have been developed in the context of individually randomized trials. Only limited discussions have been centered around cluster-randomized trials, where methods typically involve strong distributional assumptions for outcome modeling. In this article, we propose two weighting methods to estimate the survivor average causal effect in cluster-randomized trials that obviate the need for potentially complicated outcome distribution modeling. We establish the requisite assumptions that address latent clustering effects to enable point identification of the survivor average causal effect, and we provide computationally efficient asymptotic variance estimators for each weighting estimator. In simulations, we evaluate our weighting estimators, demonstrating their finite-sample operating characteristics and robustness to certain departures from the identification assumptions. We illustrate our methods using data from a cluster-randomized trial to assess the impact of a sedation protocol on mechanical ventilation among children with acute respiratory failure.
Title: Weighting methods for truncation by death in cluster-randomized trials.
Journal: Statistical Methods in Medical Research
Pub Date: 2025-01-31, DOI: 10.1177/09622802241304112
Lars Leonardus Joannes van der Burg, Hein Putter, Henning Baldauf, Jürgen Sauter, Johannes Schetelig, Liesbeth C de Wreede, Stefan Böhringer
Missing data problems are common in biological, high-dimensional data, where data can be partially or completely missing. Algorithms have been developed to reconstruct the missing values by means of imputation or expectation-maximization algorithms. For missing data problems, it has been suggested that the regression model of interest should be incorporated into the imputation procedure to reduce bias of the regression coefficients. Here we consider a challenging missing data problem, where diplotypes of the KIR loci are to be reconstructed. These loci are difficult to genotype, resulting in ambiguous genotype calls. We extend a previously proposed expectation-maximization algorithm by incorporating a potentially high-dimensional regression model to model the outcome. Three strategies are evaluated: (1) only allelic predictors, (2) allelic predictors and forward-backward selection on haplotype predictors, and (3) penalized regression on a saturated model. In a simulation study, we compared these strategies with a baseline expectation-maximization algorithm without an outcome model. For extreme choices of effect sizes and missingness levels, the outcome-based expectation-maximization algorithms outperformed the no-outcome expectation-maximization algorithm. However, in all other cases, the no-outcome expectation-maximization algorithm performed either better than or comparably to the three strategies, suggesting that the outcome model can have a harmful effect. In a data analysis concerning death after allogeneic hematopoietic stem cell transplantation as a function of donor KIR genes, expectation-maximization algorithms with and without outcome showed very similar results. In conclusion, outcome-based missing data models in the high-dimensional setting have to be used with care and are likely to lead to biased results.
Title: High-dimensional, outcome-dependent missing data problems: Models for the human KIR loci.
Journal: Statistical Methods in Medical Research
Pub Date: 2025-01-31, DOI: 10.1177/09622802241309349
Tsung-I Lin, Wan-Lun Wang
The article proposes a robust approach to jointly modeling multiple repeated clinical measures with intricate features. More specifically, we aim to expand the scope of the multivariate linear mixed model by using the multivariate contaminated normal distribution. The proposed model, called the multivariate contaminated normal linear mixed model with censored and missing responses (MCNLMM-CM), is designed to handle minor outliers effectively, while simultaneously accommodating censored measurements and intermittent missing responses. An expectation conditional maximization either algorithm is developed to estimate the parameters of the proposed model in situations involving missing at random responses. We also provide techniques for approximating the asymptotic standard errors of the parameters, recovering censored data, imputing missing values, and identifying outliers. A simulation study is conducted to evaluate the finite-sample properties of the parameter estimators and demonstrate the superior performance of the proposed model compared to existing models. The proposed methodology is inspired by and applied to data from the Alzheimer's disease neuroimaging initiative cohort study, which involves longitudinal clinical measurements of patients with mild cognitive impairment.
Title: Multivariate contaminated normal linear mixed models applied to Alzheimer's disease study with censored and missing data.
Journal: Statistical Methods in Medical Research
Pub Date: 2025-01-23, DOI: 10.1177/09622802241311458
Pablo Martínez-Camblor, Juan Carlos Pardo-Fernández
The study of the predictive ability of a marker is mainly based on the accuracy measures provided by the so-called confusion matrix. In addition, the area under the receiver operating characteristic curve has become a popular index for summarizing the overall accuracy of a marker. However, the nature of the relationship between the marker and the outcome, and the role that potential confounders play in this relationship, could be fundamental in order to extrapolate the observed results. Directed acyclic graphs, commonly used in epidemiology and in causality, could provide good feedback for learning the possibilities and limits of this extrapolation applied to the binary classification problem. Both the covariate-specific and the covariate-adjusted receiver operating characteristic curves are valuable tools, which can help to better understand the real classification abilities of a marker. Since they are strongly related to the conditional distributions of the marker in the positive (subjects with the studied characteristic) and negative (subjects without the studied characteristic) populations, the use of proportional hazard regression models arises in a very natural way. We explore the use of flexible proportional hazard Cox regression models for estimating the covariate-specific and the covariate-adjusted receiver operating characteristic curves. We study their large- and finite-sample properties and apply the proposed estimators to a real-world problem. The developed code (in R language) is provided in the Supplemental Material.
Title: Semiparametric estimator for the covariate-specific receiver operating characteristic curve.
Journal: Statistical Methods in Medical Research
Pub Date: 2025-01-23, DOI: 10.1177/09622802241310328
Xuqiao Li, Qiuyan Zhou, Ying Wu, Ying Yan
One primary goal of precision medicine is to estimate the individualized treatment rules that optimize patients' health outcomes based on individual characteristics. Health studies with multiple treatments are commonly seen in practice. However, most existing individualized treatment rule estimation methods were developed for studies with binary treatments. Many require that the outcomes are fully observed. In this article, we propose a matching-based machine learning method to estimate the optimal individualized treatment rules in observational studies with multiple treatments when the outcomes are fully observed or right-censored. We establish theoretical properties of the proposed method. It is compared with existing competing methods in simulation studies and a hepatocellular carcinoma study.
Title: Multicategory matched learning for estimating optimal individualized treatment rules in observational studies with application to a hepatocellular carcinoma study.
Journal: Statistical Methods in Medical Research
Pub Date: 2025-01-19, DOI: 10.1177/09622802241307642
Hwanhee Hong, Lu Liu, Elizabeth A Stuart
Meta-analysis of randomized controlled trials is commonly used to evaluate treatments and inform policy decisions because it provides comprehensive summaries of all available evidence. However, meta-analyses are limited in their ability to draw population-level inferences about treatment effects because they usually do not define target populations of interest specifically, and results of the individual randomized controlled trials in those meta-analyses may not generalize to the target populations. To leverage evidence from multiple randomized controlled trials in the generalizability context, we bridge ideas from meta-analysis and causal inference. We integrate meta-analysis with causal inference approaches for estimating the target population average treatment effect. We evaluate the performance of the methods via simulation studies and apply the methods to generalize meta-analysis results from randomized controlled trials of treatments for schizophrenia to adults with schizophrenia who present to usual care settings in the United States. Our simulation results show that all methods perform comparably and well across different settings. The data analysis results show that the treatment effect in the target population is meaningful, although the effect size is smaller than the sample average treatment effect. We recommend applying multiple methods and comparing the results to ensure robustness, rather than relying on a single method.
Title: Estimating target population treatment effects in meta-analysis with individual participant-level data.
Journal: Statistical Methods in Medical Research