Pub Date: 2025-09-08 | eCollection Date: 2025-11-01 | DOI: 10.1515/ijb-2024-0108
Mark A van de Wiel, Wessel N van Wieringen
Variable selection is challenging for high-dimensional data, in particular when the sample size is low. It is widely recognized that external information in the form of complementary data on the variables, 'co-data', may improve results. Examples are known variable groups or p-values from a related study. Such co-data are ubiquitous in genomics settings due to the availability of public repositories, and are likely equally relevant for other applications. Yet, the uptake of prediction methods that structurally use such co-data is limited. We review guided adaptive shrinkage methods: a class of regression-based learners that use co-data to adapt the shrinkage parameters, which are crucial for the performance of those learners. We discuss technical aspects, but also applicability in terms of the types of co-data that can be handled. This class of methods is contrasted with several others. In particular, group-adaptive shrinkage is compared with the better-known sparse group-lasso by evaluating variable selection. Moreover, we demonstrate the versatility of the guided shrinkage methodology by showing how to 'do it yourself': we integrate implementations of a co-data learner and the spike-and-slab prior to improve variable selection in genetics studies. We conclude with a real data example.
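The core idea of group-adaptive shrinkage — co-data groups receiving their own penalty strength — can be sketched without the authors' software. The toy example below (all data simulated; the two-group co-data, grid, and validation split are illustrative assumptions, not the paper's method) fits closed-form ridge regression with group-specific penalties and compares it to a single shared penalty:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 40
# co-data: variables 0-19 form a "promising" group (nonzero effects), 20-39 are noise
groups = np.repeat([0, 1], 20)
beta_true = np.where(groups == 0, 1.0, 0.0)
X = rng.normal(size=(n, p))
y = X @ beta_true + rng.normal(size=n)
Xv = rng.normal(size=(n, p))          # held-out validation set
yv = Xv @ beta_true + rng.normal(size=n)

def ridge(X, y, lam_vec):
    # closed-form ridge with a variable-specific penalty vector
    return np.linalg.solve(X.T @ X + np.diag(lam_vec), X.T @ y)

grid = [0.1, 1.0, 10.0, 100.0]
def val_err(lam_vec):
    b = ridge(X, y, lam_vec)
    return np.mean((yv - Xv @ b) ** 2)

# single shared penalty vs. one penalty per co-data group
uniform_err = min(val_err(np.full(p, g)) for g in grid)
adaptive_err = min(val_err(np.where(groups == 0, g0, g1))
                   for g0 in grid for g1 in grid)
```

Because the group-adaptive search space contains every uniform choice, its validation error can only match or improve on the shared penalty; the gain comes precisely from letting the noise group be shrunk harder than the signal group.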
"Leveraging external information by guided adaptive shrinkage to improve variable selection in high-dimensional regression settings." International Journal of Biostatistics, pp. 271-283.
Pub Date: 2025-09-05 | eCollection Date: 2025-11-01 | DOI: 10.1515/ijb-2024-0120
Leonora Pahirko, Janis Valeinis, Deivids Jēkabsons
In this paper, a two-sample empirical likelihood method for right censored data is established. This method allows for comparisons between various functionals of survival distributions, such as mean lifetimes, survival probabilities at a fixed time, restricted mean survival times, and other parameters of interest. It is demonstrated that under some regularity conditions, the scaled empirical likelihood statistic converges to a chi-squared distributed random variable with one degree of freedom. A consistent estimator for the scaling constant is proposed, involving the jackknife estimator of the asymptotic variance of the Kaplan-Meier integral. A simulation study is carried out to investigate the coverage accuracy of confidence intervals. Finally, two real datasets are analyzed to illustrate the application of the proposed method.
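The two building blocks named in the abstract — a Kaplan-Meier functional and a jackknife variance estimate for it — can be illustrated in a few lines. The sketch below (simulated exponential survival and censoring times; restricted mean survival time chosen as the example functional; this is not the paper's empirical likelihood procedure) computes the RMST from a Kaplan-Meier fit and its leave-one-out jackknife variance:

```python
import numpy as np

def kaplan_meier(time, event):
    # event times and the Kaplan-Meier survival estimate just after each one
    uniq = np.unique(time[event == 1])
    surv, s = [], 1.0
    for u in uniq:
        at_risk = np.sum(time >= u)
        deaths = np.sum((time == u) & (event == 1))
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return uniq, np.array(surv)

def rmst(time, event, tau):
    # restricted mean survival time: area under the KM curve on [0, tau]
    ts, s = kaplan_meier(time, event)
    grid = np.concatenate(([0.0], ts[ts < tau], [tau]))
    vals = np.concatenate(([1.0], s[ts < tau]))
    return np.sum(vals * np.diff(grid))

rng = np.random.default_rng(1)
n, tau = 80, 2.0
t_true = rng.exponential(1.0, n)
c = rng.exponential(2.0, n)                  # independent right censoring
time = np.minimum(t_true, c)
event = (t_true <= c).astype(int)

theta = rmst(time, event, tau)
# leave-one-out jackknife estimate of Var(theta_hat)
loo = np.array([rmst(np.delete(time, i), np.delete(event, i), tau)
                for i in range(n)])
var_jack = (n - 1) / n * np.sum((loo - loo.mean()) ** 2)
```

In the paper this kind of jackknife variance is what calibrates the scaling constant of the empirical likelihood statistic; here it simply demonstrates the mechanics.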
"Two-sample empirical likelihood method for right censored data." International Journal of Biostatistics, pp. 299-319.
Pub Date: 2025-09-05 | eCollection Date: 2025-11-01 | DOI: 10.1515/ijb-2024-0106
Raju Dey, Arne C Bathke, Somesh Kumar
The quantification of overlap between two distributions has applications in biological, medical, genetic, and ecological research. In this article, new overlap and containment indices are considered for quantifying the niche overlap between two species or populations. Some new properties of these indices are established, and the problem of estimation is studied when the two distributions are exponential with different scale parameters. We propose several estimators and compare their relative performance with respect to different loss functions. The asymptotic normality of the maximum likelihood estimators of these indices is proved under certain conditions. We also obtain confidence intervals for the indices based on three different approaches and compare their average lengths and coverage probabilities. The point and confidence interval procedures developed here are applied to a breast cancer data set to analyze the similarity between the survival times of patients undergoing two different types of surgery. Additionally, the similarity between the relapse-free times of these two sets of patients is also studied.
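A standard overlap index of this kind is the overlap coefficient OVL = ∫ min(f₁, f₂), which the abstract's exponential setting makes easy to check. The sketch below (a generic numerical version, not the paper's proposed indices or estimators) evaluates OVL for two exponential densities by quadrature; for rates 2 and 1 the closed form is 0.5 + 0.25 = 0.75:

```python
import numpy as np

def exp_pdf(x, rate):
    return rate * np.exp(-rate * x)

def overlap_exponential(rate1, rate2, upper=50.0, n=200_000):
    # OVL = integral of min(f1, f2); trapezoidal quadrature on a fine grid
    x = np.linspace(0.0, upper, n)
    return np.trapz(np.minimum(exp_pdf(x, rate1), exp_pdf(x, rate2)), x)

ovl = overlap_exponential(2.0, 1.0)   # → ≈ 0.75 (densities cross at x = ln 2)
```

Plugging maximum likelihood rate estimates into such a formula gives the MLE of the index, whose asymptotic normality is what the paper establishes.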
"Inference on overlap index: with an application to cancer data." International Journal of Biostatistics, pp. 357-383.
Pub Date: 2025-09-02 | eCollection Date: 2025-11-01 | DOI: 10.1515/ijb-2024-0075
Rawiyah Muneer Alraddadi, Mohamed Abd Allah El-Hadidy, Qin Shao, Qu Xianggui, Sadik Khuder
Hyponatremia, characterized by a serum sodium concentration below 135 mEq/L, is a prevalent electrolyte imbalance associated with increased morbidity and mortality across various clinical conditions. This study employs the Holt-Winters seasonal method, a robust time series forecasting model, to predict mortality rates attributed to hyponatremia. Leveraging retrospective mortality data from a cohort of hospitals in the United States, our analysis aims to elucidate temporal patterns and trends in hyponatremia-related deaths. The findings underscore the critical role of statistical forecasting in healthcare, facilitating proactive resource allocation and targeted interventions to mitigate mortality risks associated with electrolyte imbalances. Integrating predictive analytics into clinical practice holds promise for enhancing patient care and optimizing health outcomes in populations vulnerable to hyponatremia-related complications.
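The Holt-Winters seasonal method the study relies on is simple enough to sketch directly. The implementation below uses the standard additive recursions (level, trend, seasonal components) with textbook first-cycle initialization; the series is synthetic — a trend plus a 12-period seasonal cycle standing in for monthly mortality counts, not the study's hospital data:

```python
import numpy as np

def holt_winters_additive(y, m, alpha=0.3, beta=0.1, gamma=0.2, h=12):
    # additive Holt-Winters: level/trend/seasonal smoothing, first-cycle init
    y = np.asarray(y, dtype=float)
    level = y[:m].mean()
    trend = (y[m:2 * m].mean() - y[:m].mean()) / m
    season = list(y[:m] - level)
    for t in range(m, len(y)):
        prev_level = level
        level = alpha * (y[t] - season[t - m]) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season.append(gamma * (y[t] - level) + (1 - gamma) * season[t - m])
    n = len(y)
    # h-step forecasts reuse the most recent fitted seasonal indices
    return np.array([level + (k + 1) * trend + season[n - m + (k % m)]
                     for k in range(h)])

t = np.arange(120)
y = 10 + 0.05 * t + 5 * np.sin(2 * np.pi * t / 12)   # trend + monthly seasonality
fc = holt_winters_additive(y[:108], m=12, h=12)      # forecast the final year
mae = np.mean(np.abs(fc - y[108:]))
```

Production work would typically use a library routine (e.g. `statsmodels`' `ExponentialSmoothing`) with fitted rather than fixed smoothing constants, but the recursions above are the whole model.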
"Forecasting mortality rates in hyponatremia: a statistical approach using Holt-Winters models." International Journal of Biostatistics, pp. 463-471.
Pub Date: 2025-08-29 | eCollection Date: 2025-11-01 | DOI: 10.1515/ijb-2024-0016
Yichen Lou, Mingyue Du
This paper discusses regression analysis of interval-censored failure time data arising from semiparametric transformation models in the presence of covariates that are missing at random (MAR). We define a specific formulation of the MAR mechanism tailored to interval censoring, where the timing of observation adds complexity to handling missing covariates. To overcome the limitations and computational challenges of existing methods, we propose a multiple imputation procedure that can be easily implemented with standard software. The proposed method makes use of two predictive scores for each individual and the distance defined by these scores. Furthermore, it utilizes partial information from incomplete observations and thus yields more efficient estimators than the complete-case analysis and the inverse probability weighting approach. An extensive simulation study is conducted to assess the performance of the proposed method and indicates that it performs well in practical situations. Finally, we apply the proposed approach to the Alzheimer's disease study that motivated this work.
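The flavor of score-and-distance imputation can be conveyed with a generic predictive-score matching sketch. Everything below is an illustrative simplification — a linear regression setting with one MAR covariate, a single predictive score, and donor draws from the nearest complete cases — not the paper's two-score procedure for interval-censored data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)        # covariate that goes missing
y = 1.0 + 0.5 * x1 + 1.0 * x2 + rng.normal(size=n)
# MAR: x2 is more likely missing when x1 is large
miss = rng.uniform(size=n) < 1 / (1 + np.exp(-(x1 - 0.5)))
x2_obs = np.where(miss, np.nan, x2)

def impute_once(k=5):
    cc = ~np.isnan(x2_obs)                           # complete cases
    A = np.column_stack([np.ones(n), x1, y])         # predictors of x2
    coef, *_ = np.linalg.lstsq(A[cc], x2_obs[cc], rcond=None)
    score = A @ coef                                 # predictive score for everyone
    x2_imp = x2_obs.copy()
    for i in np.where(~cc)[0]:
        # donor pool: the k complete cases closest in predictive score
        donors = np.where(cc)[0][np.argsort(np.abs(score[cc] - score[i]))[:k]]
        x2_imp[i] = x2_obs[rng.choice(donors)]       # draw an observed value
    return x2_imp

betas = []
for _ in range(10):                                  # M = 10 imputed data sets
    X = np.column_stack([np.ones(n), x1, impute_once()])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    betas.append(b)
beta_pooled = np.mean(betas, axis=0)                 # pooled point estimate
```

Drawing observed donor values rather than model predictions is what lets matching-based imputation stay robust to misspecification of the score model, which is the same appeal the paper's procedure has over fully parametric imputation.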
"Regression analysis of interval-censored failure time data under semiparametric transformation models with missing covariates." International Journal of Biostatistics, pp. 321-337.
Pub Date: 2025-06-05 | eCollection Date: 2025-05-01 | DOI: 10.1515/ijb-2023-0134
Quentin Edward Seifert, Anton Thielmann, Elisabeth Bergherr, Benjamin Säfken, Jakob Zierk, Manfred Rauh, Tobias Hepp
Mixture Density Networks (MDNs) belong to a class of models for data that cannot be adequately described by a single distribution because they originate from different components of the main unit, and therefore need to be described by a mixture of densities. In some situations, MDNs may have problems with the proper identification of the latent components. While these identification issues can to some extent be contained by using custom initialization strategies for the network weights, this solution is still less than ideal since it involves subjective choices. We therefore suggest replacing the hidden layers between the model input and the output parameter vector of MDNs and estimating the respective distributional parameters with penalized cubic regression splines. On simulated data from both Gaussian and Gamma mixture distributions, motivated by an application to indirect reference interval estimation, this drastically improved identification performance, with all splines reliably converging to their true parameter values.
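The identification problem the paper addresses — recovering which latent component generated each observation — is easiest to see in the covariate-free special case. The sketch below fits a plain two-component Gaussian mixture by EM (via scikit-learn); it is the baseline the spline-based approach generalizes, not the paper's model:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
# two well-separated latent components, e.g. two patient subpopulations
x = np.concatenate([rng.normal(0.0, 1.0, 500),
                    rng.normal(5.0, 1.0, 500)])

# EM estimation of component means, variances, and mixing weights
gm = GaussianMixture(n_components=2, random_state=0).fit(x.reshape(-1, 1))
means = np.sort(gm.means_.ravel())     # sort to resolve label switching
```

When the component parameters additionally depend on covariates — the MDN setting — EM-style estimation becomes sensitive to initialization, which is exactly where the paper's penalized cubic regression splines come in.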
"Penalized regression splines in Mixture Density Networks." International Journal of Biostatistics, pp. 239-253.
Pub Date: 2025-06-03 | eCollection Date: 2025-11-01 | DOI: 10.1515/ijb-2023-0040
Masahiro Kojima
Phase I trials aim to identify the maximum tolerated dose (MTD) early and proceed quickly to an expansion cohort or a Phase II trial to assess the efficacy of the treatment. We present an early completion method based on multiple dosages (adjacent dose information) to accelerate the identification of the MTD in model-assisted designs. By using not only toxicity data for the current dose but also toxicity data for the next higher and lower doses, the MTD can be identified early without compromising accuracy. The early completion method is performed based on dose-assignment probabilities for multiple dosages. These probabilities are straightforward to calculate. We evaluated the early completion method using data from an actual clinical trial. In a simulation study, we evaluated the percentage of correct MTD selection and the impact of early completion on trial outcomes. The results indicate that our proposed early completion method maintains a high level of accuracy in MTD selection, with minimal reduction compared to the standard approach. In certain scenarios, the accuracy of MTD selection even improves under the early completion framework.
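The abstract notes that the relevant probabilities are straightforward to calculate, and in model-assisted designs they reduce to beta-binomial posteriors. The sketch below (made-up dose data and an illustrative completion rule — the paper's actual criterion is defined on dose-assignment probabilities) computes, for the current dose and its neighbors, the posterior probability that the DLT rate exceeds the target:

```python
from scipy.stats import beta

target = 0.30                     # target dose-limiting toxicity (DLT) probability
# (patients treated, DLTs observed) at the doses below, at, and above the current one
data = {"lower": (6, 0), "current": (9, 3), "upper": (3, 2)}

def prob_over_target(n, tox, a0=1.0, b0=1.0):
    # posterior P(p > target) under a Beta(a0, b0) prior and binomial toxicity
    return 1.0 - beta.cdf(target, a0 + tox, b0 + n - tox)

post = {dose: prob_over_target(n, tox) for dose, (n, tox) in data.items()}
# illustrative rule: complete early when the current dose looks near-target while
# the dose above is likely too toxic and the dose below likely under-dosed
complete_early = (0.1 < post["current"] < 0.9
                  and post["upper"] > 0.8 and post["lower"] < 0.2)
```

With the numbers above, all three adjacent-dose posteriors point the same way, so the trial could stop dose finding without waiting for further cohorts — the acceleration the method is after.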
"Early completion based on adjacent dose information for model-assisted designs to accelerate maximum tolerated dose finding." International Journal of Biostatistics, pp. 411-421.
Pub Date: 2025-05-30 | eCollection Date: 2025-11-01 | DOI: 10.1515/ijb-2023-0027
Aya Kuchiba, Ran Gao, Molin Wang
A disease of interest can often be classified into subtypes based on its various molecular or pathological characteristics. Recent epidemiological studies have increasingly provided evidence that some molecular subtypes of a disease may have distinct etiologies, by assessing whether the associations of a potential risk factor vary by disease subtype (i.e., etiologic heterogeneity). Case-control and case-case studies are popular study designs in molecular epidemiology, and both can be validly applied in studies of etiologic heterogeneity. This study compared the efficiency of the etiologic heterogeneity parameter estimation between these two study designs by theoretical and numerical examination. In settings where the two study designs have the same number of cases, the results showed that, compared with the case-case study, case-control studies always provided more efficient estimates, or estimates with at least equivalent efficiency, for heterogeneity parameters.
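The two designs target the same heterogeneity parameter in different ways, which a small simulation makes concrete. Below (simulated exposure data with assumed log-ORs of 0.8 and 0.2 for two subtypes — numbers chosen for illustration, not taken from the paper), the case-control route estimates two subtype-versus-control log-ORs and differences them, while the case-case route regresses subtype on exposure among cases only:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 2000
# equal-variance normal exposure => the logistic log-OR equals the mean shift
x_control = rng.normal(0.0, 1.0, n)
x_sub1 = rng.normal(0.8, 1.0, n)     # subtype 1: strong exposure association
x_sub2 = rng.normal(0.2, 1.0, n)     # subtype 2: weak exposure association

def logor(x_case, x_ref):
    X = np.concatenate([x_case, x_ref]).reshape(-1, 1)
    y = np.concatenate([np.ones(len(x_case)), np.zeros(len(x_ref))])
    # large C => essentially unpenalized logistic regression
    return LogisticRegression(C=1e6).fit(X, y).coef_[0, 0]

# case-control: difference of subtype-specific log-ORs (truth: 0.8 - 0.2 = 0.6)
het_cc = logor(x_sub1, x_control) - logor(x_sub2, x_control)
# case-case: the same heterogeneity parameter estimated from cases only
het_case_case = logor(x_sub1, x_sub2)
```

Both estimators center on the same value; the paper's contribution is showing that, case numbers held fixed, the case-control version is never less efficient.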
"Efficiency for evaluation of disease etiologic heterogeneity in case-case and case-control studies." International Journal of Biostatistics, pp. 339-356. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12707193/pdf/
Pub Date: 2025-05-23 | eCollection Date: 2025-05-01 | DOI: 10.1515/ijb-2024-0021
Juan Chen, Yingchun Zhou
With the increasing complexity of data, researchers in various fields have become increasingly interested in estimating the causal effect of a matrix exposure, which involves complex multivariate treatments, on an outcome. Balancing covariates for the matrix exposure is essential to achieve this goal. While exact balancing and approximate balancing methods have been proposed for multiple balancing constraints, a matrix treatment introduces a large number of constraints, making it challenging to achieve exact balance or to select suitable threshold parameters for approximate balancing methods. To address this challenge, the weighted Euclidean balancing method is proposed, which offers an approximate balance of covariates from an overall perspective. In this study, both parametric and nonparametric methods for estimating the causal effect of a matrix treatment are proposed, and theoretical properties of both estimators are provided. Extensive simulation results demonstrate that the proposed method outperforms alternative approaches across various scenarios. Finally, we apply the method to analyze the causal impact of omics variables on the drug sensitivity of Vandetanib.
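The "overall perspective" on balance — minimizing a single Euclidean measure of aggregate imbalance rather than enforcing per-covariate constraints — can be sketched in the simplest binary-treatment case. The code below (simulated confounded data; a scalar treatment and a small ridge term standing in for the paper's matrix-exposure formulation) solves for control-group weights that shrink the overall mean imbalance:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n, p = 100, 4
X = rng.normal(size=(n, p))
# treatment probability depends on covariates -> confounding / imbalance
ps = 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1])))
treat = rng.uniform(size=n) < ps
target = X[treat].mean(axis=0)        # treated covariate means to be matched
Xc = X[~treat]
m = len(Xc)

def objective(w, lam=1e-3):
    # overall Euclidean imbalance plus a small penalty that spreads the weights
    diff = Xc.T @ w - target
    return diff @ diff + lam * (w @ w)

res = minimize(objective, np.full(m, 1.0 / m),
               bounds=[(0.0, None)] * m,
               constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
               method="SLSQP")
w = res.x
imb_before = np.linalg.norm(Xc.mean(axis=0) - target)
imb_after = np.linalg.norm(Xc.T @ w - target)
```

Collapsing many balancing constraints into one quadratic objective is what removes the need to hand-pick a threshold for each constraint — the difficulty the abstract identifies for matrix treatments.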
"Weighted Euclidean balancing for a matrix exposure in estimating causal effect." International Journal of Biostatistics, pp. 219-237.
Pub Date: 2025-05-22 | eCollection Date: 2025-05-01 | DOI: 10.1515/ijb-2024-0005
Philippe Boileau, Ning Leng, Sandrine Dudoit
Individualized treatment rules, cornerstones of precision medicine, inform patient treatment decisions with the goal of optimizing patient outcomes. These rules are generally unknown functions of patients' pre-treatment covariates, meaning they must be estimated from clinical or observational study data. Myriad methods have been developed to learn these rules, and these procedures are demonstrably successful in traditional asymptotic settings with a moderate number of covariates. The finite-sample performance of these methods in high-dimensional covariate settings, which are increasingly the norm in modern clinical trials, has not been well characterized, however. We perform a comprehensive comparison of state-of-the-art individualized treatment rule estimators, assessing performance on the basis of the estimators' rule quality, interpretability, and computational efficiency. Sixteen data-generating processes with continuous outcomes and binary treatment assignments are considered, reflecting a diversity of randomized and observational studies. We summarize our findings and provide succinct advice to practitioners needing to estimate individualized treatment rules in high dimensions. Owing to individualized treatment rule estimators' poor interpretability, we propose a novel pre-treatment covariate filtering procedure based on recent work for uncovering treatment effect modifiers. We show that it improves estimators' rule quality and interpretability.
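One common member of the estimator class the paper benchmarks is a T-learner with sparse outcome models. The sketch below (simulated randomized-trial data with a single assumed effect modifier; one representative estimator, not the paper's full comparison or its filtering procedure) fits a lasso per treatment arm and treats a patient when the predicted uplift is positive:

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(6)
n, p = 400, 50                        # many covariates, sparse signal
X = rng.normal(size=(n, p))
a = rng.integers(0, 2, n)             # randomized binary treatment
tau = 2.0 * X[:, 0]                   # effect modified by the first covariate
y = X[:, 1] + a * tau + rng.normal(size=n)

# T-learner: separate lasso outcome model per arm,
# rule = treat when predicted treated outcome exceeds predicted control outcome
m1 = LassoCV(cv=5).fit(X[a == 1], y[a == 1])
m0 = LassoCV(cv=5).fit(X[a == 0], y[a == 0])
rule = (m1.predict(X) - m0.predict(X)) > 0

oracle = tau > 0                      # the optimal rule, known here by construction
accuracy = np.mean(rule == oracle)
```

The lasso's variable selection doubles as a crude form of the interpretability the paper pursues: inspecting which covariates survive in `m1` versus `m0` hints at the effect modifiers driving the rule.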
"Guidance on individualized treatment rule estimation in high dimensions." International Journal of Biostatistics, pp. 183-218.