International Journal of Biostatistics: Latest Articles

Score Statistics for Current Status Data: Comparisons with Likelihood Ratio and Wald Statistics
IF 1.2 Zone 4 Mathematics Pub Date: 2005-08-04 DOI: 10.2202/1557-4679.1001
M. Banerjee, J. Wellner
In this paper we introduce three natural "score statistics" for testing the hypothesis that F(t_0) takes on a fixed value in the context of nonparametric inference with current status data. These three new test statistics have natural interpretations in terms of certain (weighted) L_2 distances, and are also connected to natural "one-sided" scores. We compare these new test statistics with the analogue of the classical Wald statistic and the likelihood ratio statistic introduced in Banerjee and Wellner (2001) for the same testing problem. In classical "regular" statistical problems the likelihood ratio, score, and Wald statistics all have the same chi-squared limiting distribution under the null hypothesis. In sharp contrast, in this non-regular problem all three statistics have different limiting distributions under the null hypothesis. Thus we begin by establishing the limit distribution theory of the statistics under the null hypothesis, and discuss calculation of the relevant critical points for the test statistics. Once the null distribution theory is known, the immediate question becomes that of power. We establish the limiting behavior of the three types of statistics under local alternatives. We have also compared the power of these five different statistics via a limited Monte Carlo study. Our conclusions are: (a) the Wald statistic is less powerful than the likelihood ratio and score statistics; and (b) one of the score statistics may have more power than the likelihood ratio statistic for some alternatives.
Citations: 13
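As background for the entry above: with current status data one observes only an inspection time T and an indicator of whether the event had already occurred by T. A standard fact is that the nonparametric maximum likelihood estimate of F at the ordered inspection times is the isotonic (non-decreasing) regression of those indicators. The sketch below computes it with the pool-adjacent-violators algorithm. It is an illustration only and does not implement the score, Wald, or likelihood ratio statistics of the paper; the function name `npmle_current_status` and the toy data are assumptions, and inspection times are assumed to be distinct.

```python
def npmle_current_status(times, deltas):
    """Isotonic-regression NPMLE of F at the sorted inspection times.

    times  : inspection times T_i (assumed distinct; hypothetical input)
    deltas : indicators, 1 if the event occurred by T_i, else 0
    Returns (sorted_times, fitted_F_values).
    """
    # Sort observations by inspection time.
    order = sorted(range(len(times)), key=lambda i: times[i])

    # Pool adjacent violators: each block stores [sum of indicators, count].
    blocks = []
    for i in order:
        blocks.append([float(deltas[i]), 1])
        # Merge while the previous block mean exceeds the newest block mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c

    # Expand block means back to one fitted value per observation.
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return [times[i] for i in order], fitted


if __name__ == "__main__":
    t, f = npmle_current_status([3, 1, 2, 5, 4], [0, 0, 1, 1, 1])
    print(t, f)
```

On the toy data, the indicators ordered by time are 0, 1, 0, 1, 1; the violating pair at times 2 and 3 is pooled to 0.5, so the fitted values are non-decreasing.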
Some Variants of the Backcalculation Method for Estimation of Disease Incidence: An Application to Multiple Sclerosis Data from the Faroe Islands
IF 1.2 Zone 4 Mathematics Pub Date: 2005-06-09 DOI: 10.2202/1557-4679.1002
N. Jewell, B. Lu
Backcalculation is a technique that was originally developed for the study of HIV incidence. Here we introduce some variants of the estimation technique that allow for (i) correlation of the unobserved disease incidence counts, and (ii) the use of a smoothing step as part of the maximizing step in the EM algorithm to reduce instability due to small diagnosis counts. Both of these issues can be important in the analysis of small "epidemics." In addition, identification of correlation between diagnosis counts provides indirect evidence of correlation among unobserved incidence counts, hinting at the possibility of an infectious agent. We illustrate the ideas by reconstructing an incidence intensity function for the onset of multiple sclerosis, using data from the Faroe Islands. Previously, these data had been examined statistically by Joseph, Wolfson & Wolfson (1990) to address the issue of infectiousness of multiple sclerosis. We argue that the incidence function cannot directly shed light on the enigmatic origin of multiple sclerosis in the Faroe Islands during World War II, and, in particular, cannot discriminate between hypotheses of an infectious or environmental agent.
Citations: 2
A Weighted Risk Set Estimator for Survival Distributions in Two-Stage Randomization Designs with Censored Survival Data
IF 1.2 Zone 4 Mathematics Pub Date: 2005-01-01 DOI: 10.2202/1557-4679.1000
Xiang Guo, A. Tsiatis
In many clinical trials related to diseases such as cancers and HIV, patients are treated by different combinations of therapies. This leads to two-stage designs, where patients are initially randomized to a primary therapy and then, depending on disease remission and patients' consent, a maintenance therapy is randomly assigned. In such designs, the effects of different treatment policies, i.e., combinations of primary and maintenance therapy, are of great interest. In this paper, we propose an estimator for the survival distribution for each treatment policy in such two-stage studies with right-censoring, using the method of weighted estimating equations within risk sets. We also derive the large-sample properties. The method is demonstrated and compared with other estimators through simulations and applied to analyze a two-stage randomized study with leukemia patients.
Citations: 4
Frontmatter
IF 1.2 Zone 4 Mathematics Pub Date: 1900-01-01 DOI: 10.1515/ijb-2021-frontmatter1
Citations: 0
Relationship between Derivatives of the Observed and Full Loglikelihoods and Application to Newton-Raphson Algorithm
IF 1.2 Zone 4 Mathematics Pub Date: 1900-01-01 DOI: 10.2202/1557-4679.1010
D. Commenges, V. Rondeau
In the case of incomplete data we give general relationships between the first and second derivatives of the loglikelihood relative to the full and the incomplete observation set-ups. In the case where these quantities are easy to compute for the full observation set-up we propose to compute their analogues for the incomplete observation set-up using the above-mentioned relationships: this involves numerical integration. Once we are able to compute these quantities, Newton-Raphson type algorithms can be applied to find the maximum likelihood estimators, together with estimates of their variances. We detail the application of this approach to parametric multiplicative frailty models and we show that the method works well in practice using both a real data set and a simulated example. The proposed algorithm outperforms a Newton-Raphson type algorithm using numerical derivatives.
Citations: 0
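The entry above relies on Newton-Raphson iterations driven by analytic first and second derivatives of the loglikelihood, with the variance estimate taken from the inverse of the negative second derivative at the maximum. A minimal one-parameter sketch of that generic scheme follows. It uses a fully observed exponential model with a log-rate parameter, chosen here only because its derivatives are available in closed form; it does not implement the paper's incomplete-data relationships or frailty models, and the names `newton_raphson` and `fit_exponential_log_rate` are hypothetical.

```python
import math


def newton_raphson(score, hessian, theta0, tol=1e-10, max_iter=100):
    """Scalar Newton-Raphson for a loglikelihood maximum.

    score   : first derivative of the loglikelihood
    hessian : second derivative of the loglikelihood (negative near the maximum)
    Returns (theta_hat, variance_estimate), where the variance estimate is
    the inverse of the observed information, -1 / hessian(theta_hat).
    """
    theta = theta0
    for _ in range(max_iter):
        step = score(theta) / hessian(theta)
        theta -= step
        if abs(step) < tol:
            break
    return theta, -1.0 / hessian(theta)


def fit_exponential_log_rate(durations):
    """MLE of theta = log(rate) for i.i.d. exponential durations.

    Loglikelihood: l(theta) = n * theta - exp(theta) * sum(durations).
    """
    n = len(durations)
    total = sum(durations)
    score = lambda th: n - math.exp(th) * total
    hessian = lambda th: -math.exp(th) * total
    return newton_raphson(score, hessian, theta0=0.0)


if __name__ == "__main__":
    theta_hat, var_hat = fit_exponential_log_rate([1.0, 2.0, 3.0, 4.0])
    print(math.exp(theta_hat), var_hat)
```

For this model the closed-form answer is rate = n / sum(durations) with variance 1/n for the log-rate, which gives a simple check on the iteration.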
Approximate Power and Sample Size Calculations with the Benjamini-Hochberg Method
IF 1.2 Zone 4 Mathematics Pub Date: 1900-01-01 DOI: 10.2202/1557-4679.1018
J. A. Ferreira, A. Zwinderman
We provide a method for calculating the sample size required to attain a given average power (the ratio of rejected hypotheses to the number of false hypotheses) and a given false discovery rate (the number of incorrect rejections divided by the number of rejections) in adaptive versions of the Benjamini-Hochberg method of multiple testing. The method works in an asymptotic sense as the number of hypotheses grows to infinity and under quite general conditions, and it requires data from a pilot study. The consistency of the method follows from several results in classical areas of nonparametric statistics developed in a new context of "weak" dependence.
Citations: 25
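For readers unfamiliar with the procedure named in the entry above, the following is a minimal sketch of the standard (non-adaptive) Benjamini-Hochberg step-up rule, together with the observed false discovery proportion defined as in the abstract (incorrect rejections divided by rejections). It is not the adaptive variant or the sample size method of the paper; the function names and the example p-values are assumptions made for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Standard BH step-up rule; returns one boolean per hypothesis."""
    m = len(p_values)
    # Indices of the p-values in increasing order.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank

    # Reject the k_max smallest p-values, even if some failed their own threshold.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def false_discovery_proportion(rejected, is_true_null):
    """Incorrect rejections divided by rejections (0.0 if nothing is rejected)."""
    n_rejected = sum(rejected)
    if n_rejected == 0:
        return 0.0
    n_false = sum(1 for r, h0 in zip(rejected, is_true_null) if r and h0)
    return n_false / n_rejected


if __name__ == "__main__":
    decisions = benjamini_hochberg([0.02, 0.03, 0.035, 0.9], alpha=0.05)
    print(decisions)
```

In the example, the two smallest p-values exceed their own thresholds (0.0125 and 0.025), but the third satisfies 0.035 <= 0.0375, so the step-up rule rejects the first three hypotheses.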
Statistical Classification of Abnormal Blood Profiles in Athletes
IF 1.2 Zone 4 Mathematics Pub Date: 1900-01-01 DOI: 10.2202/1557-4679.1011
P. Sottas, N. Robinson, S. Giraud, F. Taroni, M. Kamber, P. Mangin, M. Saugy
Blood doping has been challenging the scientific community since the early 1970s, when it was demonstrated that blood transfusion significantly improves physical performance. Here we show, through three applications, how statistical classification techniques can assist the implementation of effective tests to deter blood doping in elite sports. In particular, we developed a new indirect and universal test of blood doping, called the Abnormal Blood Profile Score (ABPS), based on the statistical classification of indirect biomarkers of altered erythropoiesis. Up to 601 hematological profiles have been compiled in a reference database. Twenty-one of them were obtained from blood samples withdrawn from professional athletes convicted of blood doping by other direct tests. Discriminative training algorithms were used jointly with cross-validation techniques to map these labeled reference profiles to target outputs. The strict cross-validation procedure facilitates adherence to medico-legal standards mandated by the World Anti-Doping Agency (WADA). The test has a sensitivity to recombinant erythropoietin (rhEPO) abuse up to 3 times better than current generative models, regardless of whether the athlete is currently taking rhEPO or has stopped the treatment. The test is also sensitive to any form of blood transfusion, autologous transfusion included. Finally, we explain why a probabilistic approach should be encouraged for the evaluation of evidence in anti-doping investigations.
Citations: 17
Estimating a Survival Distribution with Current Status Data and High-dimensional Covariates
IF 1.2 Zone 4 Mathematics Pub Date: 1900-01-01 DOI: 10.2202/1557-4679.1014
A. van der Vaart, M. J. van der Laan
We consider the inverse problem of estimating a survival distribution when the survival times are only observed to be in one of the intervals of a random bisection of the time axis. We are particularly interested in the case that high-dimensional and/or time-dependent covariates are available, and/or the survival events and censoring times are only conditionally independent given the covariate process. The method of estimation consists of regularizing the survival distribution by taking the primitive function or smoothing, estimating the regularized parameter by using estimating equations, and finally recovering an estimator for the parameter of interest.
Citations: 47
Statistical Inference for Variable Importance
IF 1.2 Zone 4 Mathematics Pub Date: 1900-01-01 DOI: 10.2202/1557-4679.1008
M. J. van der Laan
Many statistical problems involve learning the importance or effect of a variable for predicting an outcome of interest, based on a sample of n independent and identically distributed observations on a list of input variables and an outcome. For example, though prediction/machine learning is, in principle, concerned with learning the optimal unknown mapping from input variables to an outcome from the data, the typical reported output is a list of importance measures for each input variable. The approach in prediction has been to learn the unknown optimal predictor from the data and derive, for each of the input variables, the variable importance from the obtained fit. In this article we propose a new approach which involves, for each variable separately, 1) defining variable importance as a real-valued parameter, 2) deriving the efficient influence curve and thereby the optimal estimating function for this parameter in the assumed (possibly nonparametric) model, and 3) developing a corresponding double robust locally efficient estimator of this variable importance, obtained by substituting data-adaptive estimators for the nuisance parameters in the optimal estimating function. We illustrate this methodology in the context of prediction, and obtain in this manner double robust locally optimal estimators of marginal variable importance, accompanied by p-values and confidence intervals. In addition, we present a model-based and machine learning approach to estimate covariate-adjusted variable importance. Finally, we generalize this methodology to variable importance parameters for time-dependent variables.
Citations: 163
Properties of the Projected Length of the Curve (PLC) and Area Swept out by the Curve (ASC) Indices for the Receiver Operating Characteristic (SROC) Curve
IF 1.2 Zone 4 Mathematics Pub Date: 1900-01-01 DOI: 10.2202/1557-4679.1096
Xuan Zhang, S. Walter, R. Agnihotram
Several measures have been proposed to summarize the Receiver Operating Characteristic (ROC) curve, including the Projected Length of the Curve (PLC) and the Area Swept out by the Curve (ASC). These indices were first proposed by Lee (Epidemiology 1996; 7:605-611) to avoid certain deficiencies of the traditional Area Under the Curve (AUC) summary measure. More recently, meta-analysis methods for assessing diagnostic test accuracy have been developed, and the Summary Receiver Operating Characteristic (SROC) curve has been recommended to represent the performance of a diagnostic test. Some properties of the SROC curve were discussed by Walter (Statist. Med. 2002; 21:1237-1256). Here we extend that work to focus on properties of PLC and ASC in the context of the SROC curve. Mathematical expressions for these two indices and their variances are derived in terms of the overall diagnostic odds ratio and the magnitude of inter-study heterogeneity in the odds ratio. Expressions for PLC and ASC and their variances are easily computed in homogeneous studies, and their values provide good approximations to the corresponding values for heterogeneous studies in most practical situations. General variances of PLC and ASC are derived by using delta methods, and are found to be smaller if the odds ratio is large. The methods are illustrated using data from two studies, the first being a meta-analysis on the detection of metastases in cervical cancer patients, and the second being a single study of HPV infection and pre-invasive cervical lesions.
Citations: 0