
International Journal of Biostatistics: Latest Articles

Post-shrinkage strategies for nonlinear semiparametric regression models in low and high-dimensional settings.
IF 1.2 | CAS Zone 4, Mathematics | Pub Date: 2025-11-20 | DOI: 10.1515/ijb-2024-0011
S Ejaz Ahmed, Dursun Aydın, Ersin Yılmaz

This paper considers semiparametric estimation strategies for the nonlinear semiparametric regression model (NSRM) under the sparsity assumption by modifying the Gauss-Newton method for both low- and high-dimensional data scenarios. In the low-dimensional case, coefficients are partitioned into two parts that represent nonzero (strong signals) and sparse coefficients. In the high-dimensional case, a weighted-ridge approach is employed, and coefficients are partitioned into three parts, adding weak signals as well. Shrinkage estimators are then obtained in both cases. More importantly, in this paper, we assume that a nonlinear structure is present in the parametric component of the model, which makes the direct application of penalized least squares to the NSRM impossible. To solve this problem, we employ the iterative Gauss-Newton method to obtain the final NSRM estimators. We provide both theoretical and practical details for the suggested estimators. Asymptotic results are derived for both low- and high-dimensional cases. We conduct an extensive simulation study to evaluate the performance of the estimators in a practical setting. Moreover, we substantiate our findings with data examples from two distinct breast cancer datasets: the Breast Cancer in the United States (BCUS) and Wisconsin datasets. By demonstrating the effectiveness of our introduced estimators in these particular biostatistical contexts, our numerical study provides support for the theoretical efficacy of shrinkage estimators, suggesting their potential relevance to breast cancer research and biostatistical methodologies.
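
As a rough illustration of the two ingredients named above, the sketch below fits a toy nonlinear model y = a*exp(b*x) by Gauss-Newton (solving the 2x2 normal equations at each step) and combines a full-model and a submodel coefficient vector with the classical positive-part Stein-type rule. The toy model omits the nonparametric component of the NSRM, and the function names, the test statistic test_stat, and k2 (the number of coefficients set to zero in the submodel) are assumptions for illustration, not the authors' estimators.

```python
import math


def gauss_newton_exp(x, y, a, b, max_iter=50, tol=1e-12):
    """Fit y ~ a * exp(b * x) by Gauss-Newton (toy model, hypothetical)."""
    for _ in range(max_iter):
        # Jacobian columns of f(x) = a * exp(b * x) with respect to a and b
        ja = [math.exp(b * xi) for xi in x]
        jb = [a * xi * e for xi, e in zip(x, ja)]
        r = [yi - a * e for yi, e in zip(y, ja)]  # current residuals
        # Solve the 2x2 normal equations (J^T J) delta = J^T r in closed form
        s_aa = sum(u * u for u in ja)
        s_ab = sum(u * v for u, v in zip(ja, jb))
        s_bb = sum(v * v for v in jb)
        g_a = sum(u * ri for u, ri in zip(ja, r))
        g_b = sum(v * ri for v, ri in zip(jb, r))
        det = s_aa * s_bb - s_ab * s_ab
        da = (s_bb * g_a - s_ab * g_b) / det
        db = (s_aa * g_b - s_ab * g_a) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def positive_part_shrinkage(beta_full, beta_sub, test_stat, k2):
    """Positive-part Stein-type combination of full-model and submodel estimates.

    k2 is the number of coefficients restricted to zero in the submodel (k2 >= 3)
    and test_stat is a statistic for testing that restriction (both assumed given).
    """
    factor = max(0.0, 1.0 - (k2 - 2) / test_stat)
    return [bs + factor * (bf - bs) for bf, bs in zip(beta_full, beta_sub)]
```

For noise-free data generated with a = 2 and b = 0.5 on x = 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, starting from (1.8, 0.45) the iteration should return values very close to (2, 0.5). When test_stat is small relative to k2 - 2, the shrinkage rule falls back entirely to the submodel; as it grows, the result moves toward the full-model estimate.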

Citations: 0
DBMS: dynamic borrowing method for frequentist hybrid control designs based on historical-current data similarity.
IF 1.2 | CAS Zone 4, Mathematics | Pub Date: 2025-11-06 | DOI: 10.1515/ijb-2024-0051
Masahiro Kojima

Information borrowing from historical data is gaining increasing attention in clinical trials for rare and pediatric diseases, where small sample sizes may lead to insufficient statistical power for confirming efficacy. While Bayesian information borrowing methods are well established, recent frequentist approaches, such as the test-then-pool and equivalence-based test-then-pool methods, have been proposed to determine whether historical data should be incorporated into statistical hypothesis testing. Depending on the outcome of these hypothesis tests, historical data may or may not be utilized. This paper introduces a dynamic borrowing method for leveraging historical information based on the similarity between current and historical data. Similar to Bayesian dynamic borrowing, our proposed method adjusts the degree of information borrowing dynamically, ranging from 0 to 100 %. We present two approaches to measure similarity: one using the density function of the t-distribution and the other employing a logistic function. The performance of the proposed methods is evaluated through Monte Carlo simulations. Additionally, we demonstrate the utility of dynamic information borrowing by reanalyzing data from an actual clinical trial.
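
A minimal sketch of similarity-based dynamic borrowing for a continuous endpoint may make the 0 to 100 % borrowing range concrete. Here similarity is a function of the standardized difference d between the historical and current means, either the t density at d divided by its value at 0 or a decreasing logistic curve, and the historical sample enters the pooled mean with effective size w*n_h. The standardized difference, the default values of df, slope and midpoint, and the pooling rule are all assumptions for illustration, not the exact DBMS formulas.

```python
import math


def t_similarity(d, df=3.0):
    """t density at d divided by its value at 0, so similarity is 1 when d = 0."""
    return (1.0 + d * d / df) ** (-(df + 1.0) / 2.0)


def logistic_similarity(d, slope=2.0, midpoint=1.5):
    """Similarity that falls from about 1 to 0 as |d| passes the midpoint."""
    z = slope * (abs(d) - midpoint)
    if z > 700.0:  # avoid overflow in math.exp; similarity is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(z))


def dynamic_borrowing_mean(current, historical, similarity):
    """Borrow a fraction w in [0, 1] of the historical sample, with w = similarity(d).

    Assumes both samples have at least two values and nonzero total variance.
    """
    n_c, n_h = len(current), len(historical)
    m_c, m_h = sum(current) / n_c, sum(historical) / n_h
    v_c = sum((v - m_c) ** 2 for v in current) / (n_c - 1)
    v_h = sum((v - m_h) ** 2 for v in historical) / (n_h - 1)
    d = (m_h - m_c) / math.sqrt(v_c / n_c + v_h / n_h)  # standardized difference
    w = similarity(d)
    # Historical data enter with effective sample size w * n_h
    pooled = (n_c * m_c + w * n_h * m_h) / (n_c + w * n_h)
    return w, pooled
```

Identical historical and current samples give w = 1 (full borrowing), while a historical mean many standard errors away gives w close to 0, so the pooled mean reverts to the current-trial mean.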

Citations: 0
Bayesian competing risks survival modeling for assessing the cause of death of patients with heart failure.
IF 1.2 | CAS Zone 4, Mathematics | Pub Date: 2025-11-03 | DOI: 10.1515/ijb-2025-0011
Jesús Gutiérrez-Botella, Carmen Armero, Thomas Kneib, María P Pata, Javier García-Seara

Competing risks models are survival models with several events of interest acting in competition, of which only the event that occurs first in time is observed. This paper presents a Bayesian approach to these models in which model selection is treated in a special way, by proposing generalizations of some of the Bayesian procedures used in univariate survival analysis. This research is motivated by a study on the survival of patients with heart failure undergoing cardiac resynchronization therapy, a procedure that involves implanting a device to stabilize the heartbeat. Two different causes of death have been considered: cardiovascular and non-cardiovascular, and a set of baseline covariates is examined in order to better understand their relationship with both causes of death. Model selection, model checking, and model comparison procedures have been implemented and assessed. The posterior distributions of some relevant outputs, such as the overall survival function, cumulative incidence functions, and transition probabilities, have been computed and discussed.
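
To show how the overall survival function and the cumulative incidence functions (CIFs) follow from cause-specific hazards, the sketch below assumes constant (exponential) hazards for each cause, which is a simplification rather than the model used in the paper. Given posterior draws of the hazards (for example, MCMC output), posterior means of S(t) and of each CIF are obtained by evaluating the formulas draw by draw and averaging. The function names are hypothetical.

```python
import math


def survival_and_cif(t, hazards):
    """Overall survival S(t) and cause-specific CIFs for constant hazards.

    With hazards (h_1, ..., h_K) and total h = sum(h_k):
    S(t) = exp(-h t) and CIF_k(t) = (h_k / h) * (1 - exp(-h t)).
    """
    total = sum(hazards)
    surv = math.exp(-total * t)
    return surv, [h / total * (1.0 - surv) for h in hazards]


def posterior_mean_curves(t, hazard_draws):
    """Average S(t) and each CIF_k(t) over posterior draws of the hazards."""
    n = len(hazard_draws)
    s_sum = 0.0
    cif_sum = [0.0] * len(hazard_draws[0])
    for draw in hazard_draws:
        s, cifs = survival_and_cif(t, draw)
        s_sum += s
        cif_sum = [acc + c for acc, c in zip(cif_sum, cifs)]
    return s_sum / n, [c / n for c in cif_sum]
```

In the heart-failure setting, the two entries of each draw would be the cardiovascular and non-cardiovascular hazards; at every t the overall survival and the CIFs sum to one.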

Citations: 0
The gROC curve and the optimal classification.
IF 1.2 | CAS Zone 4, Mathematics | Pub Date: 2025-11-03 | DOI: 10.1515/ijb-2025-0016
Pablo Martínez-Camblor, Sonia Pérez-Fernández

The binary classification problem (BCP) aims to correctly allocate subjects to one of two possible groups, frequently defined by having or not having a characteristic of interest. Different types of information can be used for this purpose, and a large number of methods address the problem, including standard binary regression models and more complex machine learning techniques such as support vector machines, boosting, and the perceptron, among others. When this information is summarized in a continuous score, classification regions (or subsets) must be defined that determine whether subjects are classified as positive (having the characteristic under study) or negative. The standard (or regular) receiver operating characteristic (ROC) curve assumes that higher values of the marker are associated with higher probabilities of being positive, classifies as positive those patients whose values lie in the interval [c, ∞) (c ∈ ℝ), and plots the true-positive rate against the false-positive rate (sensitivity against one minus specificity) for all potential c. The so-called generalized ROC curve, gROC, allows both higher and lower values of the score to be associated with higher probabilities of being positive. The efficient ROC curve, eROC, is the best ROC curve obtainable from a transformation of the score. In this manuscript, we study, compare and approximate the transformations leading to the eROC and gROC curves. We prove that, when the optimal transformation does not have a relative maximum, both curves are equivalent. In addition, we investigate the use of the gROC curve in some theoretical models, explore the relationship between the gROC and eROC curves, and propose two non-parametric procedures for approximating the transformation leading to the gROC curve. The finite-sample behavior of the proposed estimators is explored through Monte Carlo simulations. Two real-data sets illustrate the practical use of the proposed methods.
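
The difference between the standard ROC curve and the gROC curve can be made concrete with a brute-force empirical sketch. The standard curve classifies as positive the scores in [c, ∞); the gROC sketch assumes classification regions of the form (-∞, xl] ∪ [xu, ∞), so that both low and high scores can indicate a positive subject, and reports the best true-positive rate attainable with a false-positive rate of at most t. This is an illustration only (cubic in the sample size), not the non-parametric procedures proposed in the paper, and the function names are hypothetical.

```python
import math


def _rates(neg, pos, is_positive):
    """False- and true-positive rates of a classification rule."""
    fpr = sum(1 for s in neg if is_positive(s)) / len(neg)
    tpr = sum(1 for s in pos if is_positive(s)) / len(pos)
    return fpr, tpr


def empirical_roc(neg, pos, t):
    """Standard ROC at FPR t: positive iff score >= c, best TPR with FPR <= t."""
    best = 0.0
    for c in sorted(set(neg + pos)) + [math.inf]:
        fpr, tpr = _rates(neg, pos, lambda s, c=c: s >= c)
        if fpr <= t:
            best = max(best, tpr)
    return best


def empirical_groc(neg, pos, t):
    """gROC at FPR t: positive iff score <= xl or score >= xu, with xl < xu."""
    values = sorted(set(neg + pos))
    best = 0.0
    for xl in [-math.inf] + values:
        for xu in values + [math.inf]:
            if xu <= xl:
                continue
            fpr, tpr = _rates(neg, pos, lambda s, lo=xl, hi=xu: s <= lo or s >= hi)
            if fpr <= t:
                best = max(best, tpr)
    return best
```

With negatives concentrated in the middle of the score range and positives in both tails, for example negatives 4.0, 4.5, 5.0, 5.5, 6.0 and positives 1, 2, 9, 10, the standard curve reaches only a true-positive rate of 0.5 at zero false-positive rate, whereas the gROC sketch reaches 1.0.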

Citations: 0
Enhanced doubly robust estimate with semiparametric models for causal inference of survival outcome.
IF 1.2 | CAS Zone 4, Mathematics | Pub Date: 2025-10-31 | DOI: 10.1515/ijb-2023-0131
Tianmin Wu, Ao Yuan, Ming Tan

In observational studies, the treatment assignment is typically not random. Even in randomized clinical trials, the randomization may be imperfect given the limitation of sample size. In these cases, traditional statistical methods may lead to biased estimates of treatment effects, and causal inference methods are needed to obtain unbiased estimates. The doubly robust estimator (DRE) is a recent development in causal inference, but the literature on DRE for survival data is very limited, and existing methods tend to have complicated forms and may not have double robustness in the original sense. Some are constructed based on the Nelson-Aalen estimator, and to our knowledge no DRE is constructed based on the Kaplan-Meier estimator. Furthermore, in these methods, the propensity score model is often subjectively specified with a logistic model. DRE can be seriously biased if the propensity score and outcome models are slightly misspecified. Here we propose a new semiparametric robust estimator that utilizes the Kaplan-Meier estimator and Stute weighted empirical form to address these issues. Our proposed estimator is not only doubly robust in the original sense but also enhances robustness with the use of semiparametric specification. The asymptotic properties of the proposed estimator are derived, and extensive simulation studies are conducted to evaluate its finite sample performance and compare it with existing methods. Finally, we apply our proposed method to a real clinical study.
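
Because the estimator above is built on the Kaplan-Meier estimator, a minimal sketch of a weighted Kaplan-Meier curve may help fix ideas. With unit weights it is the ordinary Kaplan-Meier estimator; with inverse-propensity weights (for example 1/e(x) for treated subjects, where e(x) is an assumed, separately fitted propensity score) it gives an IPW-adjusted survival curve. This is only one ingredient of doubly robust survival estimators, not the authors' semiparametric estimator, which also involves an outcome model and a Stute weighted empirical form; the function name is hypothetical.

```python
def weighted_km(times, events, weights=None):
    """Weighted Kaplan-Meier estimate as a list of (event time, S(t)) pairs.

    events[i] is 1 for an observed event and 0 for a censored time;
    weights default to 1, which gives the ordinary Kaplan-Meier estimator.
    """
    if weights is None:
        weights = [1.0] * len(times)
    data = sorted(zip(times, events, weights))
    at_risk = sum(weights)  # weighted number at risk just before the first time
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths, removed = 0.0, 0.0
        # Pool all observations tied at time t
        while i < len(data) and data[i][0] == t:
            _, event, w = data[i]
            if event:
                deaths += w
            removed += w
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk  # product-limit step
            curve.append((t, surv))
        at_risk -= removed  # events and censorings leave the risk set
    return curve
```

For times 1, 2, 3, 4 with the second observation censored, unit weights give S = 0.75, 0.375 and 0 at the three event times.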

Citations: 0
Copula-based Cox models for dependent current status data with a cure fraction.
IF 1.2 | CAS Zone 4, Mathematics | Pub Date: 2025-10-09 | DOI: 10.1515/ijb-2025-0038
Shuying Wang, Danping Zhou, Yunfei Yang, Bo Zhao

Traditional survival analysis typically assumes that all subjects will eventually experience the event of interest given a sufficiently long follow-up period. Nevertheless, due to advancements in medical technology, researchers now frequently observe that some subjects never experience the event and are considered cured. Furthermore, traditional survival analysis assumes independence between failure time and censoring time. However, practical applications often reveal dependence between them. Ignoring both the cured subgroup and this dependence structure can introduce bias in model estimates. Among the methods for handling dependent censoring data, the numerical integration process of frailty models is complex and sensitive to the assumptions about the latent variable distribution. In contrast, the copula method, by flexibly modeling the dependence between variables, avoids strong assumptions about the latent variable structure, offering greater robustness and computational feasibility. Therefore, this paper proposes a copula-based method to handle dependent current status data involving a cure fraction. In the modeling process, we establish a logistic model to describe the susceptible rate and a Cox proportional hazards model to describe the failure time and censoring time. In the estimation process, we employ a sieve maximum likelihood estimation method based on Bernstein polynomials for parameter estimation. Extensive simulation experiments show that the proposed method demonstrates consistency and asymptotic efficiency under various settings. Finally, this paper applies the method to lymph follicle cell data, verifying its effectiveness in practical data analysis.
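
The model components named above can be sketched as follows: a Bernstein-polynomial approximation of the baseline cumulative hazard (the sieve), a logistic model for the probability of being susceptible, the Cox form for the survival of susceptible subjects, and a copula linking failure and censoring times. The Clayton family, the coefficient values, and all function names are assumptions for illustration; the abstract does not state which copula is used, and the current status likelihood and its maximization are omitted.

```python
import math


def bernstein_cumhaz(t, tau, phi):
    """Baseline cumulative hazard as a Bernstein polynomial on [0, tau].

    Lambda0(t) = sum_k phi[k] * C(m, k) * x^k * (1 - x)^(m - k), x = t / tau,
    m = len(phi) - 1. Nondecreasing phi with phi[0] = 0 gives a monotone
    Lambda0 with Lambda0(0) = 0. Beyond tau the value is held at Lambda0(tau).
    """
    m = len(phi) - 1
    x = min(max(t / tau, 0.0), 1.0)
    return sum(phi[k] * math.comb(m, k) * x**k * (1.0 - x) ** (m - k)
               for k in range(m + 1))


def population_survival(t, z, x, beta, gamma, tau, phi):
    """Mixture cure model: S_pop(t) = 1 - p + p * S_u(t).

    p = logistic(gamma[0] + gamma[1:] . x) is the susceptible probability and
    S_u(t) = exp(-Lambda0(t) * exp(beta . z)) is the Cox survival of the susceptible.
    """
    eta = gamma[0] + sum(g * v for g, v in zip(gamma[1:], x))
    p = 1.0 / (1.0 + math.exp(-eta))
    risk = math.exp(sum(b * v for b, v in zip(beta, z)))
    s_u = math.exp(-bernstein_cumhaz(t, tau, phi) * risk)
    return 1.0 - p + p * s_u


def clayton_joint_survival(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta).

    Applied to marginal survival values u = S_T(t), v = S_C(c) in (0, 1],
    it gives P(T > t, C > c) under this assumed dependence model; theta > 0.
    """
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)
```

At t = 0 the population survival equals 1, and when the baseline cumulative hazard is large it levels off at 1 - p, the cure fraction implied by the logistic model.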

Citations: 0
An enhanced approximate Bayesian computation method for stage-structured development models.
IF 1.2 | CAS Zone 4, Mathematics | Pub Date: 2025-09-23 | DOI: 10.1515/ijb-2025-0065
Hoa Pham, Huong T T Pham, Kai Siong Yow

Multi-stage models for cohort data are widely used in various fields, including disease progression, the biological development of plants and animals, and laboratory studies of life cycle development. However, the likelihood functions of these models are often intractable and complex. These complexities in the likelihood functions frequently result in significant biases and high computational costs when estimating parameters using current Bayesian methods. This paper aims to address these challenges by applying the enhanced Sequential Monte Carlo approximate Bayesian computation (ABC-SMC) method, which does not rely on explicit likelihood functions, to stage-structured development models with non-hazard rates and stage-wise constant hazard rates. Instead of using a likelihood function, the proposed method determines parameter estimates based on matching vector summary statistics. It incorporates stage-wise parameter estimations and retains accepted parameters across stages. This approach not only reduces model biases but also improves the computational efficiency of parameter estimations, despite the computational intractability of the likelihood functions. The proposed ABC-SMC method is validated through simulation studies on stage-structured development models and applied to a case study of breast development in New Zealand schoolgirls. The results demonstrate that the proposed methods effectively reduce biases in later-stage estimates for stage-structured models, enhance computational efficiency, and maintain accuracy and reliability in parameter estimations compared to the current methods.
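
A likelihood-free ABC-SMC scheme of the kind described above can be sketched for a single development stage whose duration is assumed exponential with constant hazard rate. The sketch follows the generic population Monte Carlo ABC recipe (a Gaussian perturbation kernel with variance twice the weighted particle variance, and importance weights equal to the prior divided by the kernel mixture), uses the mean stage duration as the only summary statistic, and assumes a Uniform(0, 10) prior and a caller-supplied tolerance schedule. It does not implement the paper's stage-wise enhancements, and all names are hypothetical.

```python
import math
import random


def simulate_mean_duration(rate, n, rng):
    """Mean of n simulated exponential stage durations with the given hazard rate."""
    return sum(rng.expovariate(rate) for _ in range(n)) / n


def abc_smc(obs_stat, n_obs, tolerances, n_particles=100, prior_hi=10.0, seed=1):
    """ABC-SMC for the hazard rate, with a Uniform(0, prior_hi) prior."""
    rng = random.Random(seed)
    # Generation 0: rejection sampling from the prior at the first tolerance
    particles = []
    while len(particles) < n_particles:
        theta = rng.uniform(0.0, prior_hi)
        if theta <= 0.0:
            continue
        if abs(simulate_mean_duration(theta, n_obs, rng) - obs_stat) < tolerances[0]:
            particles.append(theta)
    weights = [1.0 / n_particles] * n_particles

    for eps in tolerances[1:]:
        mean = sum(w * p for w, p in zip(weights, particles))
        var = sum(w * (p - mean) ** 2 for w, p in zip(weights, particles))
        sigma = math.sqrt(2.0 * var)  # perturbation kernel scale
        new_particles, new_weights = [], []
        while len(new_particles) < n_particles:
            # Resample a previous particle by weight, then perturb it
            theta = rng.choices(particles, weights)[0] + rng.gauss(0.0, sigma)
            if not 0.0 < theta < prior_hi:  # outside prior support
                continue
            if abs(simulate_mean_duration(theta, n_obs, rng) - obs_stat) >= eps:
                continue
            # Importance weight: uniform prior / kernel mixture (constants cancel)
            denom = sum(w * math.exp(-0.5 * ((theta - p) / sigma) ** 2)
                        for w, p in zip(weights, particles))
            new_particles.append(theta)
            new_weights.append(1.0 / denom)
        total = sum(new_weights)
        particles, weights = new_particles, [w / total for w in new_weights]
    return particles, weights
```

The weighted mean of the returned particles approximates the posterior mean of the hazard rate under these assumptions.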

Citations: 0
Leveraging external information by guided adaptive shrinkage to improve variable selection in high-dimensional regression settings.
IF 1.2, CAS Zone 4 (Mathematics), Pub Date: 2025-09-08, DOI: 10.1515/ijb-2024-0108
Mark A van de Wiel, Wessel N van Wieringen

Variable selection is challenging for high-dimensional data, in particular when the sample size is small. It is widely recognized that external information in the form of complementary data on the variables, 'co-data', may improve results. Examples are known variable groups or p-values from a related study. Such co-data are ubiquitous in genomics settings due to the availability of public repositories, and are likely equally relevant for other applications. Yet the uptake of prediction methods that make structural use of such co-data is limited. We review guided adaptive shrinkage methods: a class of regression-based learners that use co-data to adapt the shrinkage parameters, which are crucial for the performance of those learners. We discuss technical aspects, as well as applicability in terms of the types of co-data that can be handled. This class of methods is contrasted with several others. In particular, group-adaptive shrinkage is compared with the better-known sparse group-lasso by evaluating variable selection. Moreover, we demonstrate the versatility of the guided shrinkage methodology by showing how to 'do-it-yourself': we integrate implementations of a co-data learner and the spike-and-slab prior for the purpose of improving variable selection in genetics studies. We conclude with a real data example.
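To make "using co-data to adapt the shrinkage parameters" concrete, here is a minimal sketch of group-adaptive ridge regression, assuming the co-data is a grouping of the variables: each group gets its own penalty multiplier, and the estimate is the closed-form generalized ridge solution. The function name `group_adaptive_ridge`, the group labels, and the simulated data are hypothetical, and the multipliers are fixed by hand. Guided methods of the kind reviewed estimate such group-level penalties from the data (for example, by empirical Bayes), which this sketch does not attempt, and ridge by itself does not select variables.

```python
import numpy as np


def group_adaptive_ridge(X, y, groups, lam, group_mult):
    """Generalized ridge with co-data-driven, group-specific penalties.

    Coefficient j is penalized by lam * group_mult[groups[j]]; a multiplier
    below 1 shrinks that group less, above 1 shrinks it more.
    """
    penalties = lam * np.array([group_mult[g] for g in groups], dtype=float)
    # Closed-form solution: (X^T X + diag(penalties))^{-1} X^T y
    return np.linalg.solve(X.T @ X + np.diag(penalties), X.T @ y)


# Hypothetical example: n = 30 samples, p = 100 variables; co-data flags the
# first 10 variables ("prior") as likely relevant, and only they carry signal.
rng = np.random.default_rng(0)
n, p = 30, 100
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:10] = 1.0
y = X @ beta_true + rng.standard_normal(n)
groups = ["prior"] * 10 + ["other"] * (p - 10)

plain = group_adaptive_ridge(X, y, groups, lam=10.0,
                             group_mult={"prior": 1.0, "other": 1.0})
guided = group_adaptive_ridge(X, y, groups, lam=10.0,
                              group_mult={"prior": 0.1, "other": 10.0})
```

Comparing, say, `np.abs(plain[:10]).mean()` with `np.abs(guided[:10]).mean()` shows how much the hand-set multipliers change shrinkage on the flagged group in this toy setup.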

Citations: 0
Two-sample empirical likelihood method for right censored data.
IF 1.2, CAS Zone 4 (Mathematics), Pub Date: 2025-09-05, DOI: 10.1515/ijb-2024-0120
Leonora Pahirko, Janis Valeinis, Deivids Jēkabsons

In this paper, a two-sample empirical likelihood method for right-censored data is established. The method allows comparisons of various functionals of survival distributions, such as mean lifetimes, survival probabilities at a fixed time, restricted mean survival times, and other parameters of interest. It is demonstrated that, under some regularity conditions, the scaled empirical likelihood statistic converges in distribution to a chi-squared random variable with one degree of freedom. A consistent estimator for the scaling constant is proposed, involving the jackknife estimator of the asymptotic variance of the Kaplan-Meier integral. A simulation study is carried out to investigate the coverage accuracy of the resulting confidence intervals. Finally, two real datasets are analyzed to illustrate the application of the proposed method.
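The restricted mean survival time (RMST), one of the functionals listed above, is the area under the Kaplan-Meier curve up to a horizon tau, i.e. a Kaplan-Meier integral. As a minimal sketch, the code computes the Kaplan-Meier estimator, the RMST of each sample, and a leave-one-out jackknife variance estimate of the RMST. It does not implement the two-sample empirical likelihood ratio or its scaling constant; the function names, the horizon `tau`, and the toy samples are illustrative assumptions.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier steps as (event time, survival just after that time).

    events[i] is 1 for an observed event and 0 for a right-censored time.
    """
    data = sorted(zip(times, events))
    at_risk, surv, steps, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:   # handle tied times together
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            steps.append((t, surv))
        at_risk -= removed
    return steps


def rmst(times, events, tau):
    """Restricted mean survival time: integral of the KM curve over [0, tau]."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)


def jackknife_variance(times, events, tau):
    """Leave-one-out jackknife variance of the RMST estimate.

    Multiplying by n gives an estimate of the asymptotic variance.
    """
    n = len(times)
    loo = [rmst(times[:i] + times[i + 1:], events[:i] + events[i + 1:], tau)
           for i in range(n)]
    mean = sum(loo) / n
    return (n - 1) / n * sum((v - mean) ** 2 for v in loo)


# Hypothetical toy samples (time, event: 1 = event, 0 = censored)
t_a, e_a = [2, 3, 3, 5, 7, 8, 11, 12], [1, 1, 0, 1, 0, 1, 1, 0]
t_b, e_b = [1, 2, 2, 4, 4, 6, 9, 10], [1, 1, 1, 0, 1, 1, 0, 1]
tau = 8
diff = rmst(t_a, e_a, tau) - rmst(t_b, e_b, tau)
# Independent samples, so the variances of the two estimates add
var_diff = jackknife_variance(t_a, e_a, tau) + jackknife_variance(t_b, e_b, tau)
```

Without censoring, the RMST reduces to the sample mean of min(T, tau), and the jackknife variance reduces to s^2/n, which is a quick way to sanity-check the code.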

Citations: 0
Inference on overlap index: with an application to cancer data.
IF 1.2, CAS Zone 4 (Mathematics), Pub Date: 2025-09-05, DOI: 10.1515/ijb-2024-0106
Raju Dey, Arne C Bathke, Somesh Kumar

The quantification of overlap between two distributions has applications in biological, medical, genetic, and ecological research. In this article, new overlap and containment indices are considered for quantifying the niche overlap between two species/populations. Some new properties of these indices are established, and the estimation problem is studied for the case in which the two distributions are exponential with different scale parameters. We propose several estimators and compare their performance under different loss functions. The asymptotic normality of the maximum likelihood estimators of these indices is proved under certain conditions. We also obtain confidence intervals for the indices based on three different approaches and compare their average lengths and coverage probabilities. The point and interval estimation procedures developed here are applied to a breast cancer dataset to analyze the similarity between the survival times of patients undergoing two different types of surgery. The similarity between the relapse-free times of the two groups of patients is also studied.
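For intuition about overlap under the exponential model, the sketch uses the classical Weitzman overlap coefficient OVL, the integral of min(f1, f2), which may differ from the new overlap and containment indices studied in the paper. With scales theta1 != theta2 and rho = min(theta)/max(theta), the two densities cross once, and integrating each density on its side of the crossing point gives OVL = 1 - rho^(rho/(1-rho)) + rho^(1/(1-rho)). Because the MLE of an exponential scale is the sample mean, invariance of MLEs gives a plug-in MLE of OVL. The function names are hypothetical, and the numerical integral is only a sanity check of the closed form.

```python
import math


def overlap_exponential(theta1, theta2):
    """Weitzman overlap coefficient of exponential densities with scales theta1, theta2.

    With rho = min(theta)/max(theta) < 1:
    OVL = 1 - rho**(rho/(1-rho)) + rho**(1/(1-rho)).
    """
    if theta1 == theta2:
        return 1.0
    rho = min(theta1, theta2) / max(theta1, theta2)
    return 1.0 - rho ** (rho / (1.0 - rho)) + rho ** (1.0 / (1.0 - rho))


def overlap_numeric(theta1, theta2, n=100_000):
    """Midpoint-rule integral of min(f1, f2) over [0, 60 * max scale]."""
    upper = 60.0 * max(theta1, theta2)
    h = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        f1 = math.exp(-x / theta1) / theta1
        f2 = math.exp(-x / theta2) / theta2
        total += min(f1, f2)
    return total * h


def overlap_mle(sample1, sample2):
    """Plug-in MLE of OVL: each exponential scale is estimated by its sample mean."""
    return overlap_exponential(sum(sample1) / len(sample1),
                               sum(sample2) / len(sample2))


print(overlap_exponential(1.0, 2.0))  # 0.75
```

For scales 1 and 2, rho = 0.5, so OVL = 1 - 0.5 + 0.25 = 0.75; the coefficient tends to 1 as the scales approach each other and to 0 as their ratio grows.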

Citations: 0
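Book学术 provides a free academic resource search service that helps scholars in China and abroad find Chinese- and English-language literature, and aims to offer a convenient, high-quality user experience. Contact: info@booksci.cn
Copyright © 2023 Book学术 All rights reserved.
Beijing Public Security Network Filing No. 11010802042870 | Beijing ICP Filing No. 2023020795-1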