Likelihood-based inference for epidemic models can be challenging, in part due to difficulties in evaluating the likelihood. The problem is particularly acute in models of large-scale outbreaks, and unobserved or partially observed data further complicates this process. Here we investigate the performance of Markov Chain Monte Carlo and Sequential Monte Carlo algorithms for parameter inference, where the routines are based on approximate likelihoods generated from model simulations. We compare our results to a gold-standard data-augmented MCMC for both complete and incomplete data. We illustrate our techniques using simulated epidemics as well as data from a recent outbreak of Ebola Haemorrhagic Fever in the Democratic Republic of Congo and discuss situations in which we think simulation-based inference may be preferable to likelihood-based inference.
{"title":"Inference in Epidemic Models without Likelihoods","authors":"T. McKinley, A. Cook, R. Deardon","doi":"10.2202/1557-4679.1171","DOIUrl":"https://doi.org/10.2202/1557-4679.1171","url":null,"abstract":"Likelihood-based inference for epidemic models can be challenging, in part due to difficulties in evaluating the likelihood. The problem is particularly acute in models of large-scale outbreaks, and unobserved or partially observed data further complicates this process. Here we investigate the performance of Markov Chain Monte Carlo and Sequential Monte Carlo algorithms for parameter inference, where the routines are based on approximate likelihoods generated from model simulations. We compare our results to a gold-standard data-augmented MCMC for both complete and incomplete data. We illustrate our techniques using simulated epidemics as well as data from a recent outbreak of Ebola Haemorrhagic Fever in the Democratic Republic of Congo and discuss situations in which we think simulation-based inference may be preferable to likelihood-based inference.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"5 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2009-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1171","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68716137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
To select items from a uni-dimensional scale to create a reduced scale for disease screening, Liu and Jin (2007) developed a non-parametric method based on binary risk classification. When the measure for the risk of a disease is ordinal or quantitative, and possibly subject to random censoring, this method is inefficient because it requires dichotomizing the risk measure, which may cause information loss and sample size reduction. In this paper, we modify Harrell's C-index (1984) such that the concordance probability, used as a measure of the discrimination accuracy of a scale with integer-valued scores, can be estimated consistently when data are subject to random censoring. By evaluating changes in discrimination accuracy with the addition or deletion of items, we can select risk-related items without specifying parametric models. The procedure first removes the least useful items from the full scale and then applies forward stepwise selection to the remaining items to obtain a reduced scale whose discrimination accuracy matches or exceeds that of the full scale. A simulation study shows the procedure to have good finite-sample performance. We illustrate the method using a data set of patients at risk of developing Alzheimer's disease, who were administered a 40-item test of olfactory function before their semi-annual follow-up assessment.
{"title":"A Non-Parametric Approach to Scale Reduction for Uni-Dimensional Screening Scales","authors":"Xinhua Liu, Zhezhen Jin","doi":"10.2202/1557-4679.1094","DOIUrl":"https://doi.org/10.2202/1557-4679.1094","url":null,"abstract":"To select items from a uni-dimensional scale to create a reduced scale for disease screening, Liu and Jin (2007) developed a non-parametric method based on binary risk classification. When the measure for the risk of a disease is ordinal or quantitative, and possibly subject to random censoring, this method is inefficient because it requires dichotomizing the risk measure, which may cause information loss and sample size reduction. In this paper, we modify Harrell's C-index (1984) such that the concordance probability, used as a measure of the discrimination accuracy of a scale with integer valued scores, can be estimated consistently when data are subject to random censoring. By evaluating changes in discrimination accuracy with the addition or deletion of items, we can select risk-related items without specifying parametric models. The procedure first removes the least useful items from the full scale, then, applies forward stepwise selection to the remaining items to obtain a reduced scale whose discrimination accuracy matches or exceeds that of the full scale. A simulation study shows the procedure to have good finite sample performance. We illustrate the method using a data set of patients at risk of developing Alzheimer's disease, who were administered a 40-item test of olfactory function before their semi-annual follow-up assessment.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"23 1","pages":"1-22"},"PeriodicalIF":1.2,"publicationDate":"2009-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1094","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68715807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Introduction. When faced with a medical classification, clinicians often rank-order the likelihood of potential diagnoses, treatment choices, or prognoses as a way to focus on likely occurrences without dropping rarer ones from consideration. Knowing how well clinicians agree on such rankings might help extend the realm of clinical judgment farther into the purview of evidence-based medicine. If rankings by different clinicians agree better than chance, the order of assignments and their relative likelihoods may justifiably contribute to medical decisions. If the agreement is no better than chance, the ranking should not influence the medical decision. Background. Available rank-order methods measure agreement over a set of decision choices by two rankers or by a set of rankers over two choices (rank correlation methods), or overall agreement over a set of choices by a set of rankers (Kendall's W), but they will not measure agreement about a single decision choice across a set of rankers. Rating methods (e.g., kappa) assign multiple subjects to nominal categories rather than ranking possible choices about a single subject, and likewise will not measure agreement about a single decision choice across a set of rankers. Method. In this article, we propose an agreement coefficient A for measuring agreement among a set of clinicians about a single decision choice and compare several potential forms of A. A takes on the value 0 when agreement is random and 1 when agreement is perfect. It is shown that A = 1 - observed disagreement/maximum disagreement. A particular form of A is recommended, and tables of critical values of A at the 5% and 10% significance levels are generated for common numbers of ranks and rankers. Examples. In the selection of potential treatment assignments by a Tumor Board for a patient with a neck mass, there is no significant agreement about any treatment. Another example involves ranking decisions about a proposed medical research protocol by an Institutional Review Board (IRB). The decision to pass a protocol with minor revisions shows agreement at the 5% significance level, adequate for a consistent decision.
{"title":"Measuring Agreement about Ranked Decision Choices for a Single Subject","authors":"R. Riffenburgh, P. Johnstone","doi":"10.2202/1557-4679.1113","DOIUrl":"https://doi.org/10.2202/1557-4679.1113","url":null,"abstract":"Introduction. When faced with a medical classification, clinicians often rank-order the likelihood of potential diagnoses, treatment choices, or prognoses as a way to focus on likely occurrences without dropping rarer ones from consideration. To know how well clinicians agree on such rankings might help extend the realm of clinical judgment farther into the purview of evidence-based medicine. If rankings by different clinicians agree better than chance, the order of assignments and their relative likelihoods may justifiably contribute to medical decisions. If the agreement is no better than chance, the ranking should not influence the medical decision.Background. Available rank-order methods measure agreement over a set of decision choices by two rankers or by a set of rankers over two choices (rank correlation methods), or an overall agreement over a set of choices by a set of rankers (Kendall's W), but will not measure agreement about a single decision choice across a set of rankers. Rating methods (e.g. kappa) assign multiple subjects to nominal categories rather than ranking possible choices about a single subject and will not measure agreement about a single decision choice across a set of rankers.Method. In this article, we pose an agreement coefficient A for measuring agreement among a set of clinicians about a single decision choice and compare several potential forms of A. A takes on the value 0 when agreement is random and 1 when agreement is perfect. It is shown that A = 1 - observed disagreement/maximum disagreement. A particular form of A is recommended and tables of 5% and 10% significant values of A are generated for common numbers of ranks and rankers.Examples. In the selection of potential treatment assignments by a Tumor Board to a patient with a neck mass, there is no significant agreement about any treatment. Another example involves ranking decisions about a proposed medical research protocol by an Institutional Review Board (IRB). The decision to pass a protocol with minor revisions shows agreement at the 5% significance level, adequate for a consistent decision.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"47 47 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2009-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1113","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68715496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
It has long been recognized that covariate adjustment can increase precision in randomized experiments, even when it is not strictly necessary. Adjustment is often straightforward when a discrete covariate partitions the sample into a handful of strata, but becomes more involved with even a single continuous covariate such as age. As randomized experiments remain a gold standard for scientific inquiry, and the information age facilitates the collection of massive amounts of baseline information, the longstanding problem of whether and how to adjust for covariates is likely to engage investigators for the foreseeable future. In the locally efficient estimation approach introduced for general coarsened data structures by James Robins and collaborators, one first fits a relatively small working model, often by maximum likelihood, giving a nuisance parameter fit that is plugged into an estimating equation for the parameter of interest. The usual advertisement is that the estimator will be asymptotically efficient if the working model is correct, but otherwise will still be consistent and asymptotically Gaussian. However, by applying standard likelihood-based fits to misspecified working models in covariate adjustment problems, one can poorly estimate the parameter of interest. We propose a new method, empirical efficiency maximization, to optimize the working model fit for the resulting parameter estimate. In addition to the randomized experiment setting, we show how our covariate adjustment procedure can be used in survival analysis applications. Numerical asymptotic efficiency calculations demonstrate gains relative to standard locally efficient estimators.
{"title":"Empirical Efficiency Maximization: Improved Locally Efficient Covariate Adjustment in Randomized Experiments and Survival Analysis","authors":"D. Rubin, M. J. van der Laan","doi":"10.2202/1557-4679.1084","DOIUrl":"https://doi.org/10.2202/1557-4679.1084","url":null,"abstract":"It has long been recognized that covariate adjustment can increase precision in randomized experiments, even when it is not strictly necessary. Adjustment is often straightforward when a discrete covariate partitions the sample into a handful of strata, but becomes more involved with even a single continuous covariate such as age. As randomized experiments remain a gold standard for scientific inquiry, and the information age facilitates a massive collection of baseline information, the longstanding problem of if and how to adjust for covariates is likely to engage investigators for the foreseeable future.In the locally efficient estimation approach introduced for general coarsened data structures by James Robins and collaborators, one first fits a relatively small working model, often with maximum likelihood, giving a nuisance parameter fit in an estimating equation for the parameter of interest. The usual advertisement is that the estimator will be asymptotically efficient if the working model is correct, but otherwise will still be consistent and asymptotically Gaussian.However, by applying standard likelihood-based fits to misspecified working models in covariate adjustment problems, one can poorly estimate the parameter of interest. We propose a new method, empirical efficiency maximization, to optimize the working model fit for the resulting parameter estimate.In addition to the randomized experiment setting, we show how our covariate adjustment procedure can be used in survival analysis applications. Numerical asymptotic efficiency calculations demonstrate gains relative to standard locally efficient estimators.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"13 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2008-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1084","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68715781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Prostate Cancer Prevention Trial (PCPT) recently demonstrated a significant reduction in prostate cancer incidence of about 25% among men taking finasteride compared to men taking placebo. However, the effect of finasteride on the natural history of prostate cancer is not well understood. We adapted a convolution model developed by Pinsky (2001) to characterize the natural history of prostate cancer in the presence and absence of finasteride. The model was applied to data from 10,995 men in PCPT who had disease status determined by interim diagnosis of prostate cancer or end-of-study biopsy. Prostate cancer cases were either screen-detected by Prostate-Specific Antigen (PSA), biopsy-detected at the end of the study, or clinically detected, that is, detected by methods other than PSA screening. The hazard ratio (HR) for the incidence of preclinical disease on finasteride versus placebo was 0.42 (95% CI: 0.20-0.58). The progression from preclinical to clinical disease was relatively unaffected by finasteride, with mean sojourn time being 16 years for placebo cases and 18.5 years for finasteride cases (p-value for difference = 0.2). We conclude that finasteride appears to affect prostate cancer primarily by preventing the emergence of new, preclinical tumors with little impact on established, latent disease.
{"title":"Modeling the Effect of a Preventive Intervention on the Natural History of Cancer: Application to the Prostate Cancer Prevention Trial","authors":"P. Pinsky, Ruth Etzioni, N. Howlader, P. Goodman, I. Thompson","doi":"10.2202/1557-4679.1036","DOIUrl":"https://doi.org/10.2202/1557-4679.1036","url":null,"abstract":"The Prostate Cancer Prevention Trial (PCPT) recently demonstrated a significant reduction in prostate cancer incidence of about 25% among men taking finasteride compared to men taking placebo. However, the effect of finasteride on the natural history of prostate cancer is not well understood. We adapted a convolution model developed by Pinsky (2001) to characterize the natural history of prostate cancer in the presence and absence of finasteride. The model was applied to data from 10,995 men in PCPT who had disease status determined by interim diagnosis of prostate cancer or end-of-study biopsy. Prostate cancer cases were either screen-detected by Prostate-Specific Antigen (PSA), biopsy-detected at the end of the study, or clinically detected, that is, detected by methods other than PSA screening. The hazard ratio (HR) for the incidence of preclinical disease on finasteride versus placebo was 0.42 (95% CI: 0.20-0.58). The progression from preclinical to clinical disease was relatively unaffected by finasteride, with mean sojourn time being 16 years for placebo cases and 18.5 years for finasteride cases (p-value for difference = 0.2). We conclude that finasteride appears to affect prostate cancer primarily by preventing the emergence of new, preclinical tumors with little impact on established, latent disease.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"2 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2006-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1036","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68715671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Suppose one observes a sample of independent and identically distributed observations from a particular data-generating distribution. Suppose that one is concerned with estimation of a particular pathwise differentiable Euclidean parameter. A substitution estimator evaluating the parameter of a given likelihood-based density estimator is typically too biased and might not even converge at the parametric rate: that is, the density estimator was targeted to be a good estimator of the density and might therefore result in a poor estimator of a particular smooth functional of the density. In this article we propose a one-step (and, by iteration, k-th step) targeted maximum likelihood density estimator which involves 1) creating a hardest parametric submodel with parameter epsilon through the given density estimator, with score equal to the efficient influence curve of the pathwise differentiable parameter at the density estimator, 2) estimating epsilon with the maximum likelihood estimator, and 3) defining a new density estimator as the corresponding update of the original density estimator. We show that iteration of this algorithm results in a targeted maximum likelihood density estimator which solves the efficient influence curve estimating equation and thereby yields a locally efficient estimator of the parameter of interest, under regularity conditions. In particular, we show that, if the parameter is linear and the model is convex, then the targeted maximum likelihood estimator is often achieved in the first step, and it results in a locally efficient estimator at an arbitrary (e.g., heavily misspecified) starting density. We also show that the targeted maximum likelihood estimators are now in full agreement with the locally efficient estimating function methodology as presented in Robins and Rotnitzky (1992) and van der Laan and Robins (2003), creating, in particular, algebraic equivalence between the double robust locally efficient estimators that use the targeted maximum likelihood estimators as estimates of their nuisance parameters, and the targeted maximum likelihood estimators themselves. In addition, it is argued that the targeted MLE has various advantages relative to the current estimating-function-based approach. We proceed by providing data-driven methodologies to select the initial density estimator for the targeted MLE, thereby providing a data-adaptive targeted maximum likelihood estimation methodology. We illustrate the method with various worked-out examples.
{"title":"Targeted Maximum Likelihood Learning","authors":"M. J. van der Laan, D. Rubin","doi":"10.2202/1557-4679.1043","DOIUrl":"https://doi.org/10.2202/1557-4679.1043","url":null,"abstract":"Suppose one observes a sample of independent and identically distributed observations from a particular data generating distribution. Suppose that one is concerned with estimation of a particular pathwise differentiable Euclidean parameter. A substitution estimator evaluating the parameter of a given likelihood based density estimator is typically too biased and might not even converge at the parametric rate: that is, the density estimator was targeted to be a good estimator of the density and might therefore result in a poor estimator of a particular smooth functional of the density. In this article we propose a one step (and, by iteration, k-th step) targeted maximum likelihood density estimator which involves 1) creating a hardest parametric submodel with parameter epsilon through the given density estimator with score equal to the efficient influence curve of the pathwise differentiable parameter at the density estimator, 2) estimating epsilon with the maximum likelihood estimator, and 3) defining a new density estimator as the corresponding update of the original density estimator. We show that iteration of this algorithm results in a targeted maximum likelihood density estimator which solves the efficient influence curve estimating equation and thereby yields a locally efficient estimator of the parameter of interest, under regularity conditions. In particular, we show that, if the parameter is linear and the model is convex, then the targeted maximum likelihood estimator is often achieved in the first step, and it results in a locally efficient estimator at an arbitrary (e.g., heavily misspecified) starting density.We also show that the targeted maximum likelihood estimators are now in full agreement with the locally efficient estimating function methodology as presented in Robins and Rotnitzky (1992) and van der Laan and Robins (2003), creating, in particular, algebraic equivalence between the double robust locally efficient estimators using the targeted maximum likelihood estimators as an estimate of its nuisance parameters, and targeted maximum likelihood estimators. In addition, it is argued that the targeted MLE has various advantages relative to the current estimating function based approach. We proceed by providing data driven methodologies to select the initial density estimator for the targeted MLE, thereby providing data adaptive targeted maximum likelihood estimation methodology. We illustrate the method with various worked out examples.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"2 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2006-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1043","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68715717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimal designs of dose levels in order to estimate parameters from a model for binary response data have a long and rich history. These designs are based on parametric models. Here we consider fully nonparametric models with interest focused on estimation of smooth functionals using plug-in estimators based on the nonparametric maximum likelihood estimator. An important application of the results is the derivation of the optimal choice of the monitoring time distribution function for current status observation of a survival distribution. The optimal choice depends in a simple way on the dose-response function and the form of the functional. The results can be extended to allow dependence of the monitoring mechanism on covariates.
{"title":"Choice of Monitoring Mechanism for Optimal Nonparametric Functional Estimation for Binary Data","authors":"N. Jewell, M. J. van der Laan, S. Shiboski","doi":"10.2202/1557-4679.1031","DOIUrl":"https://doi.org/10.2202/1557-4679.1031","url":null,"abstract":"Optimal designs of dose levels in order to estimate parameters from a model for binary response data have a long and rich history. These designs are based on parametric models. Here we consider fully nonparametric models with interest focused on estimation of smooth functionals using plug-in estimators based on the nonparametric maximum likelihood estimator. An important application of the results is the derivation of the optimal choice of the monitoring time distribution function for current status observation of a survival distribution. The optimal choice depends in a simple way on the dose-response function and the form of the functional. The results can be extended to allow dependence of the monitoring mechanism on covariates.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"2 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2006-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68715343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Van der Laan (2005) proposed a targeted method for constructing variable importance measures together with corresponding statistical inference. This technique involves determining the importance of a variable in predicting an outcome. The method can be implemented with inverse probability of treatment weighted (IPTW) or double robust inverse probability of treatment weighted (DR-IPTW) estimators. The variance and corresponding p-value of the estimate are calculated by estimating the influence curve. This article applies the Van der Laan (2005) variable importance measures and corresponding inference to HIV-1 sequence data. In this application, the method is targeted at every codon position: protease and reverse transcriptase codon positions in the HIV-1 sequence are assessed to determine their respective variable importance with respect to viral replication capacity as the outcome. We estimate the DR-IPTW W-adjusted variable importance measure for a specified set of potential effect modifiers W. In addition, simulations were performed on two separate datasets to examine the DR-IPTW estimator.
{"title":"Application of a Variable Importance Measure Method","authors":"M. Birkner, M. J. van der Laan","doi":"10.2202/1557-4679.1013","DOIUrl":"https://doi.org/10.2202/1557-4679.1013","url":null,"abstract":"Van der Laan (2005) proposed a targeted method used to construct variable importance measures coupled with respective statistical inference. This technique involves determining the importance of a variable in predicting an outcome. This method can be applied as inverse probability of treatment weighted (IPTW) or double robust inverse probability of treatment weighted (DR-IPTW) estimators. The variance and respective p-value of the estimate are calculated by estimating the influence curve. This article applies the Van der Laan (2005) variable importance measures and corresponding inference to HIV-1 sequence data. In this application, the method is targeted at every codon position. In this data application, protease and reverse transcriptase codon positions on the HIV-1 strand are assessed to determine their respective variable importance, with respect to an outcome of viral replication capacity. We estimate the DR-IPTW W-adjusted variable importance measure for a specified set of potential effect modifiers W. In addition, simulations were performed on two separate datasets to examine the DR-IPTW estimator.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"2 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2006-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1013","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68715116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comparing two large multivariate distributions is potentially complicated for at least the following reasons. First, some variable/level combinations may have a redundant difference in prevalence between groups, in the sense that the difference can be completely explained in terms of lower-order combinations. Second, the total number of variable/level combinations to compare between groups is very large, making an exhaustive comparison computationally prohibitive. In this paper, for both the paired and independent sample cases, an approximate comparison method is proposed, along with a computationally efficient algorithm, that estimates the set of variable/level combinations having a non-redundant difference in prevalence between two populations. The probability that the estimate contains one or more false or redundant differences is asymptotically bounded above by any pre-specified level for arbitrary data-generating distributions. The method is shown to perform well for finite samples in a simulation study, and is used to investigate HIV-1 genotype evolution in a recent AIDS clinical trial.
{"title":"The Two Sample Problem for Multiple Categorical Variables","authors":"A. DiRienzo","doi":"10.2202/1557-4679.1019","DOIUrl":"https://doi.org/10.2202/1557-4679.1019","url":null,"abstract":"Comparing two large multivariate distributions is potentially complicated at least for the following reasons. First, some variable/level combinations may have a redundant difference in prevalence between groups in the sense that the difference can be completely explained in terms of lower-order combinations. Second, the total number of variable/level combinations to compare between groups is very large, and likely computationally prohibitive. In this paper, for both the paired and independent sample case, an approximate comparison method is proposed, along with a computationally efficient algorithm, that estimates the set of variable/level combinations that have a non-redundant different prevalence between two populations. The probability that the estimate contains one or more false or redundant differences is asymptotically bounded above by any pre-specified level for arbitrary data-generating distributions. The method is shown to perform well for finite samples in a simulation study, and is used to investigate HIV-1 genotype evolution in a recent AIDS clinical trial.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"2 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2006-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1019","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68715010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper develops empirical likelihood based simultaneous confidence bands for differences and ratios of two distribution functions from independent samples of right-censored survival data. The proposed confidence bands provide a flexible way of comparing treatments in biomedical settings, and bring empirical likelihood methods to bear on important target functions for which only Wald-type confidence bands have been available in the literature. The approach is illustrated with a real data example.
{"title":"Comparing Distribution Functions via Empirical Likelihood","authors":"I. McKeague, Yichuan Zhao","doi":"10.2202/1557-4679.1007","DOIUrl":"https://doi.org/10.2202/1557-4679.1007","url":null,"abstract":"This paper develops empirical likelihood based simultaneous confidence bands for differences and ratios of two distribution functions from independent samples of right-censored survival data. The proposed confidence bands provide a flexible way of comparing treatments in biomedical settings, and bring empirical likelihood methods to bear on important target functions for which only Wald-type confidence bands have been available in the literature. The approach is illustrated with a real data example.","PeriodicalId":50333,"journal":{"name":"International Journal of Biostatistics","volume":"1 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2006-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2202/1557-4679.1007","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68714433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}