International Journal of Biostatistics: Latest Articles

Relative Risk Estimation in Cluster Randomized Trials: A Comparison of Generalized Estimating Equation Methods
IF 1.2 | Zone 4, Mathematics | Pub Date: 2011-05-21 | DOI: 10.2202/1557-4679.1323
L. Yelland, A. Salter, Philip Ryan
Relative risks have become a popular measure of treatment effect for binary outcomes in randomized controlled trials (RCTs). Relative risks can be estimated directly using log binomial regression but the model may fail to converge. Alternative methods are available for estimating relative risks but these have generally only been evaluated for independent data. As some of these methods are now being applied in cluster RCTs, investigation of their performance in this context is needed. We compare log binomial regression and three alternative methods (expanded logistic regression, log Poisson regression and log normal regression) for estimating relative risks in cluster RCTs. Clustering is taken into account using generalized estimating equations (GEEs) with an independence or exchangeable working correlation structure. The results of our large simulation study show that the log binomial GEE generally performs well for clustered data but suffers from convergence problems, as expected. Both the log Poisson GEE and log normal GEE have advantages in certain settings in terms of type I error, bias and coverage. The expanded logistic GEE can perform poorly and is sensitive to the chosen working correlation structure. Conclusions about the effectiveness of treatment often differ depending on the method used, highlighting the need to pre-specify an analysis approach. We recommend pre-specifying that either the log Poisson GEE or log normal GEE will be used in the event that the log binomial GEE fails to converge.
Citations: 19
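For the cluster RCT comparison above, the log Poisson GEE with an independence working correlation has a closed form when treatment arm is the only covariate, which makes the idea easy to see. The sketch below is illustrative only and is not the authors' code: the function name `log_poisson_gee_rr` and the toy data are hypothetical, and the cluster-robust (sandwich) standard error carries no small-sample correction.

```python
import math

def log_poisson_gee_rr(clusters):
    """Relative risk from a log Poisson GEE with independence working
    correlation and treatment arm as the only covariate.

    clusters: list of (arm, outcomes), arm in {0, 1}, outcomes a list of 0/1.
    Each arm needs at least one event, otherwise log(risk) is undefined.
    Returns (relative_risk, robust_se_of_log_rr).
    """
    log_risk = {}
    variance = {}
    for arm in (0, 1):
        arm_clusters = [y for a, y in clusters if a == arm]
        n = sum(len(y) for y in arm_clusters)
        events = sum(sum(y) for y in arm_clusters)
        risk = events / n  # solves the estimating equation for this arm
        # Cluster-level sums of residuals drive the sandwich variance.
        resid_sq = sum((sum(y) - len(y) * risk) ** 2 for y in arm_clusters)
        log_risk[arm] = math.log(risk)
        variance[arm] = resid_sq / (n * risk) ** 2
    rr = math.exp(log_risk[1] - log_risk[0])
    se = math.sqrt(variance[0] + variance[1])
    return rr, se

# Hypothetical toy data: two clusters per arm.
toy = [
    (1, [1, 1, 0, 0]), (1, [1, 0, 0, 0]),
    (0, [1, 0, 0, 0]), (0, [0, 0, 0, 1]),
]
rr, se = log_poisson_gee_rr(toy)
lower = rr * math.exp(-1.96 * se)
upper = rr * math.exp(1.96 * se)
print(f"RR = {rr:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

With covariates or an exchangeable working correlation there is no such closed form, and an iterative GEE fit from a statistics package would be needed instead.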
A First Passage Time Model for Long-Term Survivors with Competing Risks
IF 1.2 | Zone 4, Mathematics | Pub Date: 2011-05-21 | DOI: 10.2202/1557-4679.1224
Ruimin Xu, P. McNicholas, A. Desmond, G. Darlington
We investigate a competing risks model, using the specification of the Gompertz distribution for failure times from competing causes and the inverse Gaussian distribution for failure times from the cause of interest. The expectation-maximization algorithm is used for parameter estimation and the model is applied to real data on breast cancer and melanoma. In these applications, our models compare favourably with existing techniques. The proposed method provides a useful technique that may be more broadly applicable than existing alternatives.
Citations: 5
A Lower Bound Model for Multiple Record Systems Estimation with Heterogeneous Catchability
IF 1.2 | Zone 4, Mathematics | Pub Date: 2011-05-18 | DOI: 10.2202/1557-4679.1283
L. Rivest
This work considers the estimation of the size N of a closed population using incomplete lists of its members. Capture histories are constructed by establishing the presence or the absence of each individual in all the lists available. Models for data featuring a heterogeneous catchability and list dependencies are considered. A log-linear model leading to a lower bound for the population size is derived for a known set of list dependencies and a latent catchability variable with an arbitrary distribution. This generalizes Chao’s lower bound to models with interactions. The proposed model can be used to carry out a search for important list interactions. It also provides diagnostic information about the nature of the underlying heterogeneity. Indeed, it is shown that the Poisson maximum likelihood estimator of N under a dichotomous latent class model does not exist for a particular set of LB models. Several distributions for the heterogeneous catchability are considered; they allow one to investigate the sensitivity of the population size estimate to the model for the heterogeneous catchability.
Citations: 5
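The abstract above describes building capture histories from several incomplete lists and generalizing Chao's lower bound. As a rough illustration of the starting point only, the sketch below computes the classical Chao bound, N >= n + f1^2 / (2 f2), from list membership. It is not the paper's log-linear model; the helper names and the toy lists are hypothetical.

```python
from collections import Counter

def capture_frequencies(lists):
    """Count how many individuals appear on exactly k of the lists.

    lists: iterable of sets of individual identifiers.
    Returns (n_observed, Counter mapping k -> number seen on exactly k lists).
    """
    times_seen = Counter()
    for members in lists:
        times_seen.update(set(members))
    return len(times_seen), Counter(times_seen.values())

def chao_lower_bound(n_observed, f1, f2):
    """Classical Chao lower bound for the size of a closed population."""
    if f2 <= 0:
        raise ValueError("this form of the bound needs f2 > 0")
    return n_observed + f1 ** 2 / (2 * f2)

# Hypothetical toy lists of identifiers.
toy_lists = [{"a", "b", "c", "d"}, {"c", "d", "e"}, {"d", "f"}]
n_obs, freq = capture_frequencies(toy_lists)
estimate = chao_lower_bound(n_obs, freq[1], freq[2])
print(n_obs, freq[1], freq[2], estimate)
```

Only the counts of individuals seen once and twice enter the bound, which is why it stays valid under an arbitrary catchability distribution but ignores list interactions.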
Reaction to Pearl's Critique of Principal Stratification
IF 1.2 | Zone 4, Mathematics | Pub Date: 2011-04-13 | DOI: 10.2202/1557-4679.1324
Arvid Sjolander
This Reader’s Reaction contains some brief remarks on Pearl’s concerns about the value of principal stratification.
Citations: 11
Bayesian Variable Selection in Semiparametric Proportional Hazards Model for High Dimensional Survival Data
IF 1.2 | Zone 4, Mathematics | Pub Date: 2011-04-07 | DOI: 10.2202/1557-4679.1301
Kyu Ha Lee, S. Chakraborty, Jianguo Sun
Variable selection for high dimensional data has recently received a great deal of attention. However, due to the complex structure of the likelihood, only limited developments have been made for time-to-event data where censoring is present. In this paper, we propose a Bayesian variable selection scheme for a Bayesian semiparametric survival model for right censored survival data sets. A special shrinkage prior on the coefficients corresponding to the predictor variables is used to handle cases when the explanatory variables are of very high-dimension. The shrinkage prior is obtained through a scale mixture representation of Normal and Gamma distributions. Our proposed variable selection prior corresponds to the well known lasso penalty. The likelihood function is based on the Cox proportional hazards model framework, where the cumulative baseline hazard function is modeled a priori by a gamma process. We assign a prior on the tuning parameter of the shrinkage prior and adaptively control the sparsity of our model. The primary use of the proposed model is to identify the important covariates relating to the survival curves. To implement our methodology, we have developed a fast Markov chain Monte Carlo algorithm with an adaptive jumping rule. We have successfully applied our method on simulated data sets under two different settings and real microarray data sets which contain right censored survival time. The performance of our Bayesian variable selection model compared with other competing methods is also provided to demonstrate the superiority of our method. A short description of the biological relevance of the selected genes in the real data sets is provided, further strengthening our claims.
Citations: 25
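The abstract above states that the shrinkage prior is a scale mixture of Normal and Gamma distributions and corresponds to the lasso penalty. One standard identity of this kind is sketched below. The symbols \beta_j, \tau_j^2 and \lambda are notation assumed here, the Gamma is written with shape 1 and rate \lambda^2/2, and the paper's exact hierarchy (for example any conditioning on a variance parameter) may differ.

```latex
\beta_j \mid \tau_j^2 \sim \mathcal{N}(0, \tau_j^2), \qquad
\tau_j^2 \sim \mathrm{Gamma}\!\left(1, \tfrac{\lambda^2}{2}\right)
\;\Longrightarrow\;
p(\beta_j)
= \int_0^\infty \frac{1}{\sqrt{2\pi\tau_j^2}}\,
  e^{-\beta_j^2 / (2\tau_j^2)}\,
  \frac{\lambda^2}{2}\, e^{-\lambda^2 \tau_j^2 / 2}\, d\tau_j^2
= \frac{\lambda}{2}\, e^{-\lambda |\beta_j|}
```

The negative log of this marginal prior is \lambda |\beta_j| plus a constant, which is the lasso penalty; placing a further prior on \lambda is what lets the sparsity adapt to the data.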
An Alternative to Pooling Kaplan-Meier Curves in Time-to-Event Meta-Analysis
IF 1.2 | Zone 4, Mathematics | Pub Date: 2011-03-30 | DOI: 10.2202/1557-4679.1289
D. Rubin
A meta-analysis that uses individual-level data instead of study-level data is widely considered to be a gold standard approach, in part because it allows a time-to-event analysis. Unfortunately, with the common practice of presenting Kaplan-Meier survival curves after pooling subjects across randomized trials, using individual-level data can actually be a step backwards; a Simpson's paradox can occur in which pooling incorrectly reverses the direction of an association. We introduce a nonparametric procedure for synthesizing survival curves across studies that is designed to avoid this difficulty and preserve the integrity of randomization. The technique is based on a counterfactual formulation in which we ask what pooled survival curves would look like if all subjects in all studies had been assigned treatment, or if all subjects had been assigned to control arms. The method is related to a Kaplan-Meier adjustment proposed in 2005 by Xie and Liu to correct for confounding in nonrandomized studies, but is formulated for the meta-analysis setting. The procedure is discussed in the context of examining rosiglitazone and cardiovascular adverse events.
Citations: 2
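The Simpson's paradox described in the abstract above can be reproduced with a few lines of arithmetic. The sketch below uses hypothetical event counts at a single fixed time point rather than full Kaplan-Meier curves, and its study-size weighting is only an assumption in the spirit of the counterfactual formulation, not the paper's procedure.

```python
# Hypothetical (events, subjects) per arm in two trials that differ in
# baseline risk and in allocation ratio.
trials = {
    "low-risk trial": {"treated": (10, 100), "control": (108, 900)},
    "high-risk trial": {"treated": (360, 900), "control": (42, 100)},
}

def pooled_risk(arm):
    """Naive pooling: lump all subjects in one arm across trials."""
    events = sum(t[arm][0] for t in trials.values())
    subjects = sum(t[arm][1] for t in trials.values())
    return events / subjects

def standardized_risk(arm):
    """Average the within-trial risks, weighting each trial by its total size,
    as if every subject in every trial had been assigned to this arm."""
    total = sum(t["treated"][1] + t["control"][1] for t in trials.values())
    return sum(
        (t["treated"][1] + t["control"][1]) / total * t[arm][0] / t[arm][1]
        for t in trials.values()
    )

for arm in ("treated", "control"):
    print(arm, round(pooled_risk(arm), 3), round(standardized_risk(arm), 3))
```

Within each trial the treated risk is two percentage points lower than control, yet naive pooling makes treatment look harmful because most treated subjects come from the high-risk trial; the standardized comparison keeps the within-trial direction.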
Marginal Models for Censored Longitudinal Cost Data: Appropriate Working Variance Matrices in Inverse-Probability-Weighted GEEs Can Improve Precision
IF 1.2 | Zone 4, Mathematics | Pub Date: 2011-02-07 | DOI: 10.2202/1557-4679.1170
E. Pullenayegum, A. Willan
When cost data are collected in a clinical study, interest centers on the between-treatment difference in mean cost. When censoring is present, the resulting loss of information can be limited by collecting cost data for several pre-specified time intervals, leading to censored longitudinal cost data. Most models for marginal costs stratify by time interval. However, in few other areas of biostatistics would we stratify by default. We argue that there are benefits to considering more general models: for example, in some settings, pooling regression coefficients across intervals can improve the precision of the estimated between-treatment difference in mean cost. Previous work has used inverse-probability-weighted GEEs coupled with an independent working variance to estimate parameters from these more general models. We show that the greatest precision benefits of non-stratified models are achieved by using more sophisticated working variance matrices.
Citations: 1
HingeBoost: ROC-Based Boost for Classification and Variable Selection
IF 1.2 | Zone 4, Mathematics | Pub Date: 2011-02-04 | DOI: 10.2202/1557-4679.1304
Zhuo Wang
In disease classification, a traditional technique is the receiver operating characteristic (ROC) curve and the area under the curve (AUC). With high-dimensional data, the ROC techniques are needed to conduct classification and variable selection. The current ROC methods do not explicitly incorporate unequal misclassification costs or do not have a theoretical grounding for optimizing the AUC. Empirical studies in the literature have demonstrated that optimizing the hinge loss can maximize the AUC approximately. In theory, minimizing the hinge rank loss is equivalent to maximizing the AUC in the asymptotic limit. In this article, we propose a novel nonparametric method HingeBoost to optimize a weighted hinge loss incorporating misclassification costs. HingeBoost can be used to construct linear and nonlinear classifiers. The estimation and variable selection for the hinge loss are addressed by a new boosting algorithm. Furthermore, the proposed twin HingeBoost can select more sparse predictors. Some properties of HingeBoost are studied as well. To compare HingeBoost with existing classification methods, we present empirical study results using data from simulations and a prostate cancer study with mass spectrometry-based proteomics.
Citations: 30
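The central object in the HingeBoost abstract above is a hinge loss weighted by misclassification costs. A minimal sketch of such a loss follows; the function name and the parameters `cost_fn` and `cost_fp` are hypothetical, and the boosting algorithm that minimizes the loss is not shown.

```python
def weighted_hinge_loss(labels, scores, cost_fn=1.0, cost_fp=1.0):
    """Mean cost-weighted hinge loss.

    labels: class labels coded as +1 (case) or -1 (non-case).
    scores: real-valued classifier outputs f(x).
    cost_fn: weight on cases (penalizes false negatives).
    cost_fp: weight on non-cases (penalizes false positives).
    """
    total = 0.0
    for y, f in zip(labels, scores):
        weight = cost_fn if y == 1 else cost_fp
        # The hinge term is zero once the margin y * f reaches 1.
        total += weight * max(0.0, 1.0 - y * f)
    return total / len(labels)

# Hypothetical example: a missed case costs three times a false alarm.
print(weighted_hinge_loss([1, -1, 1], [2.0, 0.5, -1.0], cost_fn=3.0, cost_fp=1.0))
```

Setting both costs to 1 recovers the ordinary hinge loss, so the weights only shift how strongly each class pulls the decision boundary.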
Targeting the Optimal Design in Randomized Clinical Trials with Binary Outcomes and No Covariate: Simulation Study
IF 1.2 | Zone 4, Mathematics | Pub Date: 2011-01-24 | DOI: 10.2202/1557-4679.1310
A. Chambaz, M. J. van der Laan
We undertake here a comprehensive simulation study of the theoretical properties that we derive in a companion article devoted to the asymptotic study of adaptive group sequential designs in the case of randomized clinical trials (RCTs) with binary treatment, binary outcome and no covariate. By adaptive design, we mean in this setting a RCT design that allows the investigator to dynamically modify its course through data-driven adjustment of the randomization probability based on data accrued so far without negatively impacting on the statistical integrity of the trial. By adaptive group sequential design, we refer to the fact that group sequential testing methods can be equally well applied on top of adaptive designs. The simulation study validates the theory. It notably shows in the estimation framework that the confidence intervals we obtain achieve the desired coverage even for moderate sample sizes. In addition, it shows in the testing framework that type I error control at the prescribed level is guaranteed and that all sampling procedures only suffer from a very slight increase of the type II error. A three-sentence take-home message is “Adaptive designs do learn the targeted optimal design and inference and testing can be carried out under adaptive sampling as they would under the targeted optimal randomization probability iid sampling. In particular, adaptive designs achieve the same efficiency as the fixed oracle design. This is confirmed by a simulation study, at least for moderate or large sample sizes, across a large collection of targeted randomization probabilities.”
Citations: 30
Relative Risk Estimation in Randomized Controlled Trials: A Comparison of Methods for Independent Observations
IF 1.2 Zone 4, Mathematics Pub Date: 2011-01-06 DOI: 10.2202/1557-4679.1278
L. Yelland, A. Salter, Philip Ryan
The relative risk is a clinically important measure of the effect of treatment on binary outcomes in randomized controlled trials (RCTs). An adjusted relative risk can be estimated using log binomial regression; however, convergence problems are common with this model. While alternative methods have been proposed for estimating relative risks, comparisons between methods have been limited, particularly in the context of RCTs. We compare ten different methods for estimating relative risks under a variety of scenarios relevant to RCTs with independent observations. Results of a large simulation study show that some methods may fail to overcome the convergence problems of log binomial regression, while others may substantially overestimate the treatment effect or produce inaccurate confidence intervals. Further, conclusions about the effectiveness of treatment may differ depending on the method used. We give recommendations for choosing a method for estimating relative risks in the context of RCTs with independent observations.
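As a point of reference for the quantity this abstract discusses, the following is a minimal sketch of the simplest case: an unadjusted relative risk from a two-arm trial with a Wald confidence interval built on the log scale. It is not one of the ten methods compared by the authors, and it does no covariate adjustment. The function name `relative_risk` and the event counts are hypothetical and chosen only for illustration.

```python
import math


def relative_risk(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Unadjusted relative risk with a log-scale Wald interval.

    Hypothetical helper for illustration; assumes nonzero event counts
    in both arms so that the logarithms and reciprocals are defined.
    """
    risk_trt = events_trt / n_trt
    risk_ctl = events_ctl / n_ctl
    rr = risk_trt / risk_ctl

    # Delta-method standard error of log(RR) for two independent binomials.
    se_log_rr = math.sqrt(
        1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl
    )

    # Build the interval on the log scale, then transform back.
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Illustrative counts only: 30/100 events on treatment, 20/100 on control.
rr, lower, upper = relative_risk(30, 100, 20, 100)
print(f"RR = {rr:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

Working on the log scale keeps the interval limits positive. An adjusted relative risk, as described in the abstract, would instead come from a regression model such as log binomial regression, where the exponentiated treatment coefficient plays the role of `rr`.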
Citations: 38