This paper considers semiparametric estimation strategies for the nonlinear semiparametric regression model (NSRM) under the sparsity assumption by modifying the Gauss-Newton method for both low- and high-dimensional data scenarios. In the low-dimensional case, coefficients are partitioned into two parts that represent nonzero (strong signals) and sparse coefficients. In the high-dimensional case, a weighted-ridge approach is employed, and coefficients are partitioned into three parts, adding weak signals as well. Shrinkage estimators are then obtained in both cases. More importantly, in this paper, we assume that a nonlinear structure is present in the parametric component of the model, which makes the direct application of penalized least squares to the NSRM impossible. To solve this problem, we employ the iterative Gauss-Newton method to obtain the final NSRM estimators. We provide both theoretical and practical details for the suggested estimators. Asymptotic results are derived for both low- and high-dimensional cases. We conduct an extensive simulation study to evaluate the performance of the estimators in a practical setting. Moreover, we substantiate our findings with data examples from two distinct breast cancer datasets: the Breast Cancer in the United States (BCUS) and Wisconsin datasets. By demonstrating the effectiveness of our introduced estimators in these particular biostatistical contexts, our numerical study provides support for the theoretical efficacy of shrinkage estimators, suggesting their potential relevance to breast cancer research and biostatistical methodologies.
{"title":"Post-shrinkage strategies for nonlinear semiparametric regression models in low and high-dimensional settings.","authors":"S Ejaz Ahmed, Dursun Aydın, Ersin Yılmaz","doi":"10.1515/ijb-2024-0011","DOIUrl":"https://doi.org/10.1515/ijb-2024-0011","url":null,"abstract":"<p><p>This paper considers semiparametric estimation strategies for the nonlinear semiparametric regression model (NSRM) under the sparsity assumption by modifying the Gauss-Newton method for both low- and high-dimensional data scenarios. In the low-dimensional case, coefficients are partitioned into two parts that represent nonzero (strong signals) and sparse coefficients. In the high-dimensional case, a weighted-ridge approach is employed, and coefficients are partitioned into three parts, adding weak signals as well. Shrinkage estimators are then obtained in both cases. More importantly, in this paper, we assume that a nonlinear structure is present in the parametric component of the model, which makes the direct application of penalized least squares to the NSRM impossible. To solve this problem, we employ the iterative Gauss-Newton method to obtain the final NSRM estimators. We provide both theoretical and practical details for the suggested estimators. Asymptotic results are derived for both low- and high-dimensional cases. We conduct an extensive simulation study to evaluate the performance of the estimators in a practical setting. Moreover, we substantiate our findings with data examples from two distinct breast cancer datasets: the Breast Cancer in the United States (BCUS) and Wisconsin datasets. By demonstrating the effectiveness of our introduced estimators in these particular biostatistical contexts, our numerical study provides support for the theoretical efficacy of shrinkage estimators, suggesting their potential relevance to breast cancer research and biostatistical methodologies.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.2,"publicationDate":"2025-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145558461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information borrowing from historical data is gaining increasing attention in clinical trials for rare and pediatric diseases, where small sample sizes may lead to insufficient statistical power for confirming efficacy. While Bayesian information borrowing methods are well established, recent frequentist approaches, such as the test-then-pool and equivalence-based test-then-pool methods, have been proposed to determine whether historical data should be incorporated into statistical hypothesis testing. Depending on the outcome of these hypothesis tests, historical data may or may not be utilized. This paper introduces a dynamic borrowing method for leveraging historical information based on the similarity between current and historical data. Similar to Bayesian dynamic borrowing, our proposed method adjusts the degree of information borrowing dynamically, ranging from 0 to 100 %. We present two approaches to measure similarity: one using the density function of the t-distribution and the other employing a logistic function. The performance of the proposed methods is evaluated through Monte Carlo simulations. Additionally, we demonstrate the utility of dynamic information borrowing by reanalyzing data from an actual clinical trial.
{"title":"DBMS: dynamic borrowing method for frequentist hybrid control designs based on historical-current data similarity.","authors":"Masahiro Kojima","doi":"10.1515/ijb-2024-0051","DOIUrl":"https://doi.org/10.1515/ijb-2024-0051","url":null,"abstract":"<p><p>Information borrowing from historical data is gaining increasing attention in clinical trials for rare and pediatric diseases, where small sample sizes may lead to insufficient statistical power for confirming efficacy. While Bayesian information borrowing methods are well established, recent frequentist approaches, such as the test-then-pool and equivalence-based test-then-pool methods, have been proposed to determine whether historical data should be incorporated into statistical hypothesis testing. Depending on the outcome of these hypothesis tests, historical data may or may not be utilized. This paper introduces a dynamic borrowing method for leveraging historical information based on the similarity between current and historical data. Similar to Bayesian dynamic borrowing, our proposed method adjusts the degree of information borrowing dynamically, ranging from 0 to 100 %. We present two approaches to measure similarity: one using the density function of the t-distribution and the other employing a logistic function. The performance of the proposed methods is evaluated through Monte Carlo simulations. Additionally, we demonstrate the utility of dynamic information borrowing by reanalyzing data from an actual clinical trial.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.2,"publicationDate":"2025-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145459898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Competing risks models are survival models with several events of interest acting in competition, where only the event that occurs first in time is observed. This paper presents a Bayesian approach to these models in which the issue of model selection is treated in a special way by proposing generalizations of some of the Bayesian procedures used in univariate survival analysis. This research is motivated by a study on the survival of patients with heart failure undergoing cardiac resynchronization therapy, a procedure that involves implanting a device to stabilize the heartbeat. Two different causes of death are considered, cardiovascular and non-cardiovascular, and a set of baseline covariates is examined in order to better understand their relationship with both causes of death. Model selection, model checking, and model comparison procedures have been implemented and assessed. The posterior distributions of some relevant outputs, such as the overall survival function, cumulative incidence functions, and transition probabilities, have been computed and discussed.
{"title":"Bayesian competing risks survival modeling for assessing the cause of death of patients with heart failure.","authors":"Jesús Gutiérrez-Botella, Carmen Armero, Thomas Kneib, María P Pata, Javier García-Seara","doi":"10.1515/ijb-2025-0011","DOIUrl":"https://doi.org/10.1515/ijb-2025-0011","url":null,"abstract":"<p><p>Competing risks models are survival models with several events of interest acting in competition and whose occurrence is only observed for the event that occurs first in time. This paper presents a Bayesian approach to these models in which the issue of model selection is treated in a special way by proposing generalizations of some of the Bayesian procedures used in univariate survival analysis. This research is motivated by a study on the survival of patients with heart failure undergoing cardiac resynchronization therapy, a procedure which involves the implant of a device to stabilize the heartbeat. Two different causes of death have been considered: cardiovascular and non-cardiovascular, and a set of baseline covariates are examined in order to better understand their relationship with both causes of death. Model selection, model checking, and model comparison procedures have been implemented and assessed. The posterior distribution of some relevant outputs such as the overall survival function, cumulative incidence functions, and transition probabilities have been computed and discussed.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.2,"publicationDate":"2025-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145423494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The binary classification problem (BCP) aims to correctly allocate subjects to one of two possible groups, frequently defined by the presence or absence of a characteristic of interest. Different types of information can be used toward this goal, and a huge number of methods deal with the problem, including standard binary regression models and complex machine learning techniques such as support vector machines, boosting, or the perceptron, among others. When this information is summarized in a continuous score, classification regions (or subsets) must be defined that determine whether a subject is classified as positive, with the characteristic under study, or as negative, otherwise. The standard (or regular) receiver-operating characteristic (ROC) curve assumes that higher values of the marker are associated with higher probabilities of being positive, considers as positive those patients with values within the intervals [c, ∞) (c ∈ ℝ), and plots the true-positive against the false-positive rates (sensitivity against one minus specificity) for all potential c. The so-called generalized ROC curve, gROC, allows both higher and lower values of the score to be associated with higher probabilities of being positive. The efficient ROC curve, eROC, considers the best ROC curve based on a transformation of the score. In this manuscript, we are interested in studying, comparing and approximating the transformations leading to the eROC and the gROC curves. We prove that, when the optimal transformation has no relative maximum, both curves are equivalent. In addition, we investigate the use of the gROC curve in some theoretical models, explore the relationship between the gROC and eROC curves, and propose two non-parametric procedures for approximating the transformation leading to the gROC curve. The finite-sample behavior of the proposed estimators is explored through Monte Carlo simulations. Two real data sets illustrate the practical use of the proposed methods.
{"title":"The gROC curve and the optimal classification.","authors":"Pablo Martínez-Camblor, Sonia Pérez-Fernández","doi":"10.1515/ijb-2025-0016","DOIUrl":"https://doi.org/10.1515/ijb-2025-0016","url":null,"abstract":"<p><p>The binary classification problem (BCP) aims to correctly allocate subjects in one of two possible groups. The groups are frequently defined as having or not one characteristic of interest. With this goal, we are allowed to use different types of information. There is a huge number of methods dealing with this problem; including standard binary regression models, or complex machine learning techniques such as support vector machine, boosting, or perceptron, among others. When this information is summarized in a continuous score, we have to define classification regions (or subsets) which will determine whether the subjects are classified as positive, with the characteristic under study, or as negative, otherwise. The standard (or regular) receiver-operating characteristic (ROC) curve assumes that higher values of the marker are associated with higher probabilities of being positive and considers as positive those patients with values within the intervals [<i>c</i>, ∞) <math><mrow><mo>(</mo> <mrow><mi>c</mi> <mo>∈</mo> <mi>R</mi></mrow> <mo>)</mo></mrow> </math> , and plots the true- against the false- positive rates (sensitivity against one minus specificity) for all potential <i>c</i>. The so-called generalized ROC curve, gROC, allows that both higher and lower values of the score are associated with higher probabilities of being positive. The efficient ROC curve, eROC, considers the best ROC curve based on a transformation of the score. In this manuscript, we are interested in studying, comparing and approximating the transformations leading to the eROC and to the gROC curves. We will prove that, when the optimal transformation does not have relative maximum, both curves are equivalent. Besides, we investigate the use of the gROC curve on some theoretical models, explore the relationship between the gROC and the eROC curves, and propose two non-parametric procedures for approximating the transformation leading to the gROC curve. The finite-sample behavior of the proposed estimators is explored through Monte Carlo simulations. Two real-data sets illustrate the practical use of the proposed methods.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.2,"publicationDate":"2025-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145423453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In observational studies, the treatment assignment is typically not random. Even in randomized clinical trials, the randomization may be imperfect given the limitation of sample size. In these cases, traditional statistical methods may lead to biased estimates of treatment effects, and causal inference methods are needed to obtain unbiased estimates. The doubly robust estimator (DRE) is a recent development in causal inference, but the literature on DRE for survival data is very limited, and existing methods tend to have complicated forms and may not have double robustness in the original sense. Some are constructed based on the Nelson-Aalen estimator, and to our knowledge no DRE is constructed based on the Kaplan-Meier estimator. Furthermore, in these methods, the propensity score model is often subjectively specified with a logistic model. DRE can be seriously biased if the propensity score and outcome models are slightly misspecified. Here we propose a new semiparametric robust estimator that utilizes the Kaplan-Meier estimator and Stute weighted empirical form to address these issues. Our proposed estimator is not only doubly robust in the original sense but also enhances robustness with the use of semiparametric specification. The asymptotic properties of the proposed estimator are derived, and extensive simulation studies are conducted to evaluate its finite sample performance and compare it with existing methods. Finally, we apply our proposed method to a real clinical study.
{"title":"Enhanced doubly robust estimate with semiparametric models for causal inference of survival outcome.","authors":"Tianmin Wu, Ao Yuan, Ming Tan","doi":"10.1515/ijb-2023-0131","DOIUrl":"https://doi.org/10.1515/ijb-2023-0131","url":null,"abstract":"<p><p>In observational studies, the treatment assignment is typically not random. Even in randomized clinical trials, the randomization may be imperfect given the limitation of sample size. In these cases, traditional statistical methods may lead to biased estimates of treatment effects, and causal inference methods are needed to obtain unbiased estimates. The doubly robust estimator (DRE) is a recent development in causal inference, but the literature on DRE for survival data is very limited, and existing methods tend to have complicated forms and may not have double robustness in the original sense. Some are constructed based on the Nelson-Aalen estimator, and to our knowledge no DRE is constructed based on the Kaplan-Meier estimator. Furthermore, in these methods, the propensity score model is often subjectively specified with a logistic model. DRE can be seriously biased if the propensity score and outcome models are slightly misspecified. Here we propose a new semiparametric robust estimator that utilizes the Kaplan-Meier estimator and Stute weighted empirical form to address these issues. Our proposed estimator is not only doubly robust in the original sense but also enhances robustness with the use of semiparametric specification. The asymptotic properties of the proposed estimator are derived, and extensive simulation studies are conducted to evaluate its finite sample performance and compare it with existing methods. Finally, we apply our proposed method to a real clinical study.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.2,"publicationDate":"2025-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145410580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Traditional survival analysis typically assumes that all subjects will eventually experience the event of interest given a sufficiently long follow-up period. Nevertheless, due to advancements in medical technology, researchers now frequently observe that some subjects never experience the event and are considered cured. Furthermore, traditional survival analysis assumes independence between failure time and censoring time. However, practical applications often reveal dependence between them. Ignoring both the cured subgroup and this dependence structure can introduce bias in model estimates. Among the methods for handling dependent censoring data, the numerical integration process of frailty models is complex and sensitive to the assumptions about the latent variable distribution. In contrast, the copula method, by flexibly modeling the dependence between variables, avoids strong assumptions about the latent variable structure, offering greater robustness and computational feasibility. Therefore, this paper proposes a copula-based method to handle dependent current status data involving a cure fraction. In the modeling process, we establish a logistic model to describe the susceptible rate and a Cox proportional hazards model to describe the failure time and censoring time. In the estimation process, we employ a sieve maximum likelihood estimation method based on Bernstein polynomials for parameter estimation. Extensive simulation experiments show that the proposed method demonstrates consistency and asymptotic efficiency under various settings. Finally, this paper applies the method to lymph follicle cell data, verifying its effectiveness in practical data analysis.
{"title":"Copula-based Cox models for dependent current status data with a cure fraction.","authors":"Shuying Wang, Danping Zhou, Yunfei Yang, Bo Zhao","doi":"10.1515/ijb-2025-0038","DOIUrl":"https://doi.org/10.1515/ijb-2025-0038","url":null,"abstract":"<p><p>Traditional survival analysis typically assumes that all subjects will eventually experience the event of interest given a sufficiently long follow-up period. Nevertheless, due to advancements in medical technology, researchers now frequently observe that some subjects never experience the event and are considered cured. Furthermore, traditional survival analysis assumes independence between failure time and censoring time. However, practical applications often reveal dependence between them. Ignoring both the cured subgroup and this dependence structure can introduce bias in model estimates. Among the methods for handling dependent censoring data, the numerical integration process of frailty models is complex and sensitive to the assumptions about the latent variable distribution. In contrast, the copula method, by flexibly modeling the dependence between variables, avoids strong assumptions about the latent variable structure, offering greater robustness and computational feasibility. Therefore, this paper proposes a copula-based method to handle dependent current status data involving a cure fraction. In the modeling process, we establish a logistic model to describe the susceptible rate and a Cox proportional hazards model to describe the failure time and censoring time. In the estimation process, we employ a sieve maximum likelihood estimation method based on Bernstein polynomials for parameter estimation. Extensive simulation experiments show that the proposed method demonstrates consistency and asymptotic efficiency under various settings. Finally, this paper applies the method to lymph follicle cell data, verifying its effectiveness in practical data analysis.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.2,"publicationDate":"2025-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145253416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-stage models for cohort data are widely used in various fields, including disease progression, the biological development of plants and animals, and laboratory studies of life cycle development. However, the likelihood functions of these models are often intractable and complex. These complexities in the likelihood functions frequently result in significant biases and high computational costs when estimating parameters using current Bayesian methods. This paper aims to address these challenges by applying the enhanced Sequential Monte Carlo approximate Bayesian computation (ABC-SMC) method, which does not rely on explicit likelihood functions, to stage-structured development models with non-hazard rates and stage-wise constant hazard rates. Instead of using a likelihood function, the proposed method determines parameter estimates based on matching vector summary statistics. It incorporates stage-wise parameter estimations and retains accepted parameters across stages. This approach not only reduces model biases but also improves the computational efficiency of parameter estimations, despite the computational intractability of the likelihood functions. The proposed ABC-SMC method is validated through simulation studies on stage-structured development models and applied to a case study of breast development in New Zealand schoolgirls. The results demonstrate that the proposed methods effectively reduce biases in later-stage estimates for stage-structured models, enhance computational efficiency, and maintain accuracy and reliability in parameter estimations compared to the current methods.
{"title":"An enhanced approximate Bayesian computation method for stage-structured development models.","authors":"Hoa Pham, Huong T T Pham, Kai Siong Yow","doi":"10.1515/ijb-2025-0065","DOIUrl":"https://doi.org/10.1515/ijb-2025-0065","url":null,"abstract":"<p><p>Multi-stage models for cohort data are widely used in various fields, including disease progression, the biological development of plants and animals, and laboratory studies of life cycle development. However, the likelihood functions of these models are often intractable and complex. These complexities in the likelihood functions frequently result in significant biases and high computational costs when estimating parameters using current Bayesian methods. This paper aims to address these challenges by applying the enhanced Sequential Monte Carlo approximate Bayesian computation (ABC-SMC) method, which does not rely on explicit likelihood functions, to stage-structured development models with non-hazard rates and stage-wise constant hazard rates. Instead of using a likelihood function, the proposed method determines parameter estimates based on matching vector summary statistics. It incorporates stage-wise parameter estimations and retains accepted parameters across stages. This approach not only reduces model biases but also improves the computational efficiency of parameter estimations, despite the computational intractability of the likelihood functions. The proposed ABC-SMC method is validated through simulation studies on stage-structured development models and applied to a case study of breast development in New Zealand schoolgirls. The results demonstrate that the proposed methods effectively reduce biases in later-stage estimates for stage-structured models, enhance computational efficiency, and maintain accuracy and reliability in parameter estimations compared to the current methods.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.2,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145126234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Variable selection is challenging for high-dimensional data, in particular when the sample size is low. It is widely recognized that external information in the form of complementary data on the variables, 'co-data', may improve results. Examples are known variable groups or p-values from a related study. Such co-data are ubiquitous in genomics settings due to the availability of public repositories, and are likely equally relevant for other applications. Yet, the uptake of prediction methods that structurally use such co-data is limited. We review guided adaptive shrinkage methods: a class of regression-based learners that use co-data to adapt the shrinkage parameters, which are crucial for the performance of those learners. We discuss technical aspects as well as the applicability in terms of the types of co-data that can be handled. This class of methods is contrasted with several others. In particular, group-adaptive shrinkage is compared with the better-known sparse group-lasso by evaluating variable selection. Moreover, we demonstrate the versatility of the guided shrinkage methodology by showing how to 'do-it-yourself': we integrate implementations of a co-data learner and the spike-and-slab prior for the purpose of improving variable selection in genetics studies. We conclude with a real data example.
{"title":"Leveraging external information by guided adaptive shrinkage to improve variable selection in high-dimensional regression settings.","authors":"Mark A van de Wiel, Wessel N van Wieringen","doi":"10.1515/ijb-2024-0108","DOIUrl":"https://doi.org/10.1515/ijb-2024-0108","url":null,"abstract":"<p><p>Variable selection is challenging for high-dimensional data, in particular when sample size is low. It is widely recognized that external information in the form of complementary data on the variables, 'co-data', may improve results. Examples are known variable groups or <i>p</i>-values from a related study. Such co-data are ubiquitous in genomics settings due to the availability of public repositories, and is likely equally relevant for other applications. Yet, the uptake of prediction methods that structurally use such co-data is limited. We review guided adaptive shrinkage methods: a class of regression-based learners that use co-data to adapt the shrinkage parameters, crucial for the performance of those learners. We discuss technical aspects, but also the applicability in terms of types of co-data that can be handled. This class of methods is contrasted with several others. In particular, group-adaptive shrinkage is compared with the better-known sparse group-lasso by evaluating variable selection. Moreover, we demonstrate the versatility of the guided shrinkage methodology by showing how to 'do-it-yourself': we integrate implementations of a co-data learner and the spike-and-slab prior for the purpose of improving variable selection in genetics studies. We conclude with a real data example.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.2,"publicationDate":"2025-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145076513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, a two-sample empirical likelihood method for right censored data is established. This method allows for comparisons between various functionals of survival distributions, such as mean lifetimes, survival probabilities at a fixed time, restricted mean survival times, and other parameters of interest. It is demonstrated that under some regularity conditions, the scaled empirical likelihood statistic converges to a chi-squared distributed random variable with one degree of freedom. A consistent estimator for the scaling constant is proposed, involving the jackknife estimator of the asymptotic variance of the Kaplan-Meier integral. A simulation study is carried out to investigate the coverage accuracy of confidence intervals. Finally, two real datasets are analyzed to illustrate the application of the proposed method.
{"title":"Two-sample empirical likelihood method for right censored data.","authors":"Leonora Pahirko, Janis Valeinis, Deivids Jēkabsons","doi":"10.1515/ijb-2024-0120","DOIUrl":"https://doi.org/10.1515/ijb-2024-0120","url":null,"abstract":"<p><p>In this paper, a two-sample empirical likelihood method for right censored data is established. This method allows for comparisons between various functionals of survival distributions, such as mean lifetimes, survival probabilities at a fixed time, restricted mean survival times, and other parameters of interest. It is demonstrated that under some regularity conditions, the scaled empirical likelihood statistic converges to a chi-squared distributed random variable with one degree of freedom. A consistent estimator for the scaling constant is proposed, involving the jackknife estimator of the asymptotic variance of the Kaplan-Meier integral. A simulation study is carried out to investigate the coverage accuracy of confidence intervals. Finally, two real datasets are analyzed to illustrate the application of the proposed method.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.2,"publicationDate":"2025-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145070810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The quantification of overlap between two distributions has applications in various fields of biological, medical, genetic, and ecological research. In this article, new overlap and containment indices are considered for quantifying the niche overlap between two species/populations. Some new properties of these indices are established, and the problem of estimation is studied when the two distributions are exponential with different scale parameters. We propose several estimators and compare their relative performance with respect to different loss functions. The asymptotic normality of the maximum likelihood estimators of these indices is proved under certain conditions. We also obtain confidence intervals for the indices based on three different approaches and compare their average lengths and coverage probabilities. The point and confidence interval procedures developed here are applied to a breast cancer data set to analyze the similarity between the survival times of patients undergoing two different types of surgery. Additionally, the similarity between the relapse-free times of these two sets of patients is also studied.
{"title":"Inference on overlap index: with an application to cancer data.","authors":"Raju Dey, Arne C Bathke, Somesh Kumar","doi":"10.1515/ijb-2024-0106","DOIUrl":"https://doi.org/10.1515/ijb-2024-0106","url":null,"abstract":"<p><p>The quantification of overlap between two distributions has applications in various fields of biology, medical, genetic, and ecological research. In this article, new overlap and containment indices are considered for quantifying the niche overlap between two species/populations. Some new properties of these indices are established and the problem of estimation is studied, when the two distributions are exponential with different scale parameters. We propose several estimators and compare their relative performance with respect to different loss functions. The asymptotic normality of the maximum likelihood estimators of these indices is proved under certain conditions. We also obtain confidence intervals of the indices based on three different approaches and compare their average lengths and coverage probabilities. The point and confidence interval procedures developed here are applied on a breast cancer data set to analyze the similarity between the survival times of patients undergoing two different types of surgery. Additionally, the similarity between the relapse free times of these two sets of patients is also studied.</p>","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":" ","pages":""},"PeriodicalIF":1.2,"publicationDate":"2025-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145070713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}