
Latest Articles in The American Statistician

P-Value Precision and Reproducibility.
IF 1.8 | CAS Tier 4 (Mathematics) | Q1 Mathematics | Pub Date: 2011-01-01 | Epub Date: 2012-01-24 | DOI: 10.1198/tas.2011.10129
Dennis D Boos, Leonard A Stefanski

P-values are useful statistical measures of evidence against a null hypothesis. In contrast to other statistical estimates, however, their sample-to-sample variability is usually not considered or estimated, and therefore not fully appreciated. Via a systematic study of log-scale p-value standard errors, bootstrap prediction bounds, and reproducibility probabilities for future replicate p-values, we show that p-values exhibit surprisingly large variability in typical data situations. In addition to providing context to discussions about the failure of statistical results to replicate, our findings shed light on the relative value of exact p-values vis-a-vis approximate p-values, and indicate that the use of *, **, and *** to denote levels .05, .01, and .001 of statistical significance in subject-matter journals is about the right level of precision for reporting p-values when judged by widely accepted rules for rounding statistical estimates.
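The scale of this variability is easy to reproduce in a small simulation. The sketch below is not the authors' code; the sample size, effect size, and replication count are arbitrary choices made for illustration. It repeatedly draws two-sample data at a fixed true effect, records the two-sided t-test p-value, and summarizes the spread of log10(p) across replicates.

```python
# Minimal sketch of sample-to-sample p-value variability (assumed settings,
# not the study's actual design): two-sample t-test with a fixed true effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, effect, reps = 30, 0.6, 10_000   # per-group n, true mean difference, replicates

pvals = np.empty(reps)
for r in range(reps):
    x = rng.normal(0.0, 1.0, n)
    y = rng.normal(effect, 1.0, n)
    pvals[r] = stats.ttest_ind(x, y).pvalue

logp = np.log10(pvals)
print(f"median p = {np.median(pvals):.4f}")
print(f"SD of log10(p) = {logp.std(ddof=1):.2f}")          # log-scale spread
print("2.5%/97.5% quantiles of p:", np.round(np.quantile(pvals, [0.025, 0.975]), 4))
```

Even at moderate power, replicate p-values typically spread across one to two orders of magnitude, which is the phenomenon the article quantifies more carefully with log-scale standard errors and prediction bounds.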

Citations: 178
A Note on Comparing the Power of Test Statistics at Low Significance Levels.
IF 1.8 | CAS Tier 4 (Mathematics) | Q1 Mathematics | Pub Date: 2011-01-01 | DOI: 10.1198/tast.2011.10117
Nathan Morris, Robert Elston

It is an obvious fact that the power of a test statistic is dependent upon the significance (alpha) level at which the test is performed. It is perhaps a less obvious fact that the relative performance of two statistics in terms of power is also a function of the alpha level. Through numerous personal discussions, we have noted that even some competent statisticians have the mistaken intuition that relative power comparisons at traditional levels such as α = 0.05 will be roughly similar to relative power comparisons at very low levels, such as the level α = 5 × 10⁻⁸, which is commonly used in genome-wide association studies. In this brief note, we demonstrate that this notion is in fact quite wrong, especially with respect to comparing tests with differing degrees of freedom. In fact, at very low alpha levels the cost of additional degrees of freedom is often comparatively low. Thus we recommend that statisticians exercise caution when interpreting the results of power comparison studies which use alpha levels that will not be used in practice.
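The pattern is easy to reproduce with noncentral chi-square power calculations. The sketch below is illustrative rather than taken from the note: it computes, for 1-df and 2-df chi-square tests, the noncentrality parameter required for 80% power at α = 0.05 and at α = 5 × 10⁻⁸, and compares the ratio (noncentrality scales with sample size, so the ratio reflects the relative sample-size cost of the extra degree of freedom).

```python
# Noncentrality required for 80% power, 1-df vs 2-df chi-square tests, at a
# conventional and a genome-wide alpha level (illustrative power target).
from scipy.stats import chi2, ncx2
from scipy.optimize import brentq

def required_ncp(alpha, df, target=0.80):
    crit = chi2.ppf(1.0 - alpha, df)
    # Find ncp such that P(noncentral chi2(df, ncp) > crit) equals the target power.
    return brentq(lambda ncp: (1.0 - ncx2.cdf(crit, df, ncp)) - target, 1e-6, 200.0)

for alpha in (0.05, 5e-8):
    n1, n2 = required_ncp(alpha, 1), required_ncp(alpha, 2)
    print(f"alpha={alpha:<8g} ncp(1 df)={n1:6.2f}  ncp(2 df)={n2:6.2f}  ratio={n2/n1:.3f}")
```

The ratio is noticeably smaller at the stricter level, consistent with the note's point that the cost of extra degrees of freedom is comparatively low at very small alpha.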

Citations: 9
Consistency of Normal Distribution Based Pseudo Maximum Likelihood Estimates When Data Are Missing at Random.
IF 1.8 | CAS Tier 4 (Mathematics) | Q1 Mathematics | Pub Date: 2010-08-01 | DOI: 10.1198/tast.2010.09203
Ke-Hai Yuan, Peter M Bentler

This paper shows that, when variables with missing values are linearly related to observed variables, the normal-distribution-based pseudo MLEs are still consistent. The population distribution may be unknown while the missing data process can follow an arbitrary missing at random mechanism. Enough details are provided for the bivariate case so that readers having taken a course in statistics/probability can fully understand the development. Sufficient conditions for the consistency of the MLEs in higher dimensions are also stated, while the details are omitted.
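For the bivariate, monotone-missingness case the normal-theory MLE has a convenient closed form via the factored likelihood, which makes the consistency claim easy to probe numerically. The sketch below is an illustration consistent with that setup rather than the paper's own development: x2 is linearly related to x1 with deliberately non-normal errors, x2 is made missing at random given x1, and the normal-based estimates of the means and covariances are computed and compared with the truth.

```python
# Normal-theory (pseudo) ML estimates for a bivariate sample with x2 missing at
# random given x1, via the factored likelihood for the monotone pattern.
# Errors are deliberately non-normal to probe the consistency result.
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
x1 = rng.normal(0.0, 1.0, n)
e = rng.exponential(1.0, n) - 1.0                   # non-normal, mean-zero errors
x2 = 1.0 + 2.0 * x1 + e                             # x2 linearly related to x1
miss = rng.random(n) < 1.0 / (1.0 + np.exp(-x1))    # MAR: depends on x1 only
x2_obs = np.where(miss, np.nan, x2)

obs = ~np.isnan(x2_obs)
mu1, s11 = x1.mean(), x1.var()                      # marginal pieces use all cases (ML, /n)
b1 = np.cov(x1[obs], x2_obs[obs], bias=True)[0, 1] / x1[obs].var()
b0 = x2_obs[obs].mean() - b1 * x1[obs].mean()
s_e = np.mean((x2_obs[obs] - b0 - b1 * x1[obs]) ** 2)   # residual variance (ML)

mu2_hat = b0 + b1 * mu1
s12_hat = b1 * s11
s22_hat = s_e + b1 ** 2 * s11
print(f"mu2: true {1.0:.3f}  est {mu2_hat:.3f}")
print(f"cov(x1,x2): true {2.0:.3f}  est {s12_hat:.3f}")
print(f"var(x2): true {2.0**2 + 1.0:.3f}  est {s22_hat:.3f}")
```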

Citations: 15
Non-linear Models for Longitudinal Data.
IF 1.8 | CAS Tier 4 (Mathematics) | Q1 Mathematics | Pub Date: 2009-11-01 | DOI: 10.1198/tast.2009.07256
Jan Serroyen, Geert Molenberghs, Geert Verbeke, Marie Davidian

While marginal models, random-effects models, and conditional models are routinely considered to be the three main modeling families for continuous and discrete repeated measures with linear and generalized linear mean structures, respectively, it is less common to consider non-linear models, let alone frame them within the above taxonomy. In the latter situation, indeed, when considered at all, the focus is often exclusively on random-effects models. In this paper, we consider all three families, exemplify their great flexibility and relative ease of use, and apply them to a simple but illustrative set of data on tree circumference growth of orange trees.
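For a flavor of the non-linear mean structures involved, the sketch below fits a three-parameter logistic growth curve by least squares. The measurement ages are similar to those in the classic orange-tree data, but the circumference values are synthetic stand-ins generated for illustration, and the marginal and random-effects formulations discussed in the paper are not shown.

```python
# Least-squares fit of a logistic growth curve, a standard mean structure for
# tree-circumference growth (data below are synthetic stand-ins).
import numpy as np
from scipy.optimize import curve_fit

def logistic(age, asym, xmid, scal):
    """Expected circumference at a given age (days)."""
    return asym / (1.0 + np.exp((xmid - age) / scal))

rng = np.random.default_rng(3)
age = np.tile([118, 484, 664, 1004, 1231, 1372, 1582], 5).astype(float)
true = logistic(age, asym=190.0, xmid=725.0, scal=350.0)
circ = true + rng.normal(0.0, 8.0, size=age.size)        # add measurement noise

params, cov = curve_fit(logistic, age, circ, p0=[180.0, 700.0, 300.0])
se = np.sqrt(np.diag(cov))
for name, est, s in zip(["asym", "xmid", "scal"], params, se):
    print(f"{name}: {est:7.1f}  (SE {s:.1f})")
```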

Citations: 34
Rating Movies and Rating the Raters Who Rate Them.
IF 1.8 | CAS Tier 4 (Mathematics) | Q1 Mathematics | Pub Date: 2009-11-01 | DOI: 10.1198/tast.2009.08278
Hua Zhou, Kenneth Lange

The movie distribution company Netflix has generated considerable buzz in the statistics community by offering a million dollar prize for improvements to its movie rating system. Among the statisticians and computer scientists who have disclosed their techniques, the emphasis has been on machine learning approaches. This article has the modest goal of discussing a simple model for movie rating and other forms of democratic rating. Because the model involves a large number of parameters, it is nontrivial to carry out maximum likelihood estimation. Here we derive a straightforward EM algorithm from the perspective of the more general MM algorithm. The algorithm is capable of finding the global maximum on a likelihood landscape littered with inferior modes. We apply two variants of the model to a dataset from the MovieLens archive and compare their results. Our model identifies quirky raters, redefines the raw rankings, and permits imputation of missing ratings. The model is intended to stimulate discussion and development of better theory rather than to win the prize. It has the added benefit of introducing readers to some of the issues connected with analyzing high-dimensional data.
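The article's model and its EM/MM algorithm are not reproduced here. As a generic flavor of alternating updates and imputation on a sparse rating matrix, the sketch below fits a deliberately simple additive model (overall mean plus movie effect plus rater bias) to hypothetical data by block coordinate updates and imputes the unobserved cells from the fit.

```python
# A deliberately simple stand-in for democratic rating: an additive model
# rating ~ overall mean + movie effect + rater bias, fit by alternating block
# updates on the observed entries only, then used to impute missing cells.
# This is NOT the article's model or its EM/MM algorithm; it only conveys the
# flavor of alternating updates and imputation on a sparse rating matrix.
import numpy as np

rng = np.random.default_rng(8)
n_raters, n_movies = 60, 25
movie_true = rng.normal(0.0, 1.0, n_movies)
bias_true = rng.normal(0.0, 0.5, n_raters)
full = 3.0 + movie_true[None, :] + bias_true[:, None] + rng.normal(0, 0.3, (n_raters, n_movies))
mask = rng.random((n_raters, n_movies)) < 0.6          # ~60% of cells observed

mu, movie, bias = full[mask].mean(), np.zeros(n_movies), np.zeros(n_raters)
for _ in range(50):                                    # alternating block updates
    resid = full - mu - bias[:, None]
    movie = np.array([resid[mask[:, j], j].mean() for j in range(n_movies)])
    resid = full - mu - movie[None, :]
    bias = np.array([resid[i, mask[i, :]].mean() for i in range(n_raters)])

imputed = mu + movie[None, :] + bias[:, None]          # fitted/imputed ratings
print("corr(true movie effect, estimated):",
      np.round(np.corrcoef(movie_true, movie)[0, 1], 3))
```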

Citations: 20
Easy Multiplicity Control in Equivalence Testing Using Two One-sided Tests.
IF 1.8 | CAS Tier 4 (Mathematics) | Q1 Mathematics | Pub Date: 2009-05-01 | DOI: 10.1198/tast.2009.0029
Carolyn Lauzon, Brian Caffo

Equivalence testing is growing in use in scientific research outside of its traditional role in the drug approval process. Largely due to its ease of use and recommendation from the United States Food and Drug Administration guidance, the most common statistical method for testing equivalence is the two one-sided tests procedure (TOST). Like classical point-null hypothesis testing, TOST is subject to multiplicity concerns as more comparisons are made. In this manuscript, a condition that bounds the family-wise error rate using TOST is given. This condition then leads to a simple solution for controlling the family-wise error rate. Specifically, we demonstrate that if all pair-wise comparisons of k independent groups are being evaluated for equivalence, then simply scaling the nominal Type I error rate down by (k - 1) is sufficient to maintain the family-wise error rate at the desired value or less. The resulting rule is much less conservative than the equally simple Bonferroni correction. An example of equivalence testing in a non drug-development setting is given.
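The adjustment is simple to apply. The sketch below illustrates the rule as stated, with made-up data and an arbitrary equivalence margin: it runs a pooled-variance TOST for every pair of k groups, with each one-sided test performed at level alpha/(k - 1).

```python
# All pairwise TOST equivalence tests for k groups, each run at level
# alpha/(k - 1) to bound the family-wise error rate (per the rule above).
# Data and margins are illustrative.
from itertools import combinations
import numpy as np
from scipy import stats

def tost(x, y, low, high, alpha):
    """Two one-sided pooled t-tests of H0: diff <= low or diff >= high."""
    nx, ny = len(x), len(y)
    diff = x.mean() - y.mean()
    sp2 = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    se = np.sqrt(sp2 * (1.0 / nx + 1.0 / ny))
    df = nx + ny - 2
    p_lower = 1.0 - stats.t.cdf((diff - low) / se, df)    # H0: diff <= low
    p_upper = stats.t.cdf((diff - high) / se, df)         # H0: diff >= high
    p = max(p_lower, p_upper)
    return p, p < alpha            # equivalent only if both one-sided tests reject

rng = np.random.default_rng(11)
k, alpha, margin = 4, 0.05, 0.5
groups = [rng.normal(0.0, 1.0, 50) for _ in range(k)]
adj_alpha = alpha / (k - 1)                                # scaled-down nominal level

for i, j in combinations(range(k), 2):
    p, eq = tost(groups[i], groups[j], -margin, margin, adj_alpha)
    print(f"groups ({i},{j}): TOST p = {p:.3f}  equivalent at adjusted alpha? {eq}")
```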

Citations: 45
On the Assessment of Monte Carlo Error in Simulation-Based Statistical Analyses.
IF 1.8 | CAS Tier 4 (Mathematics) | Q1 Mathematics | Pub Date: 2009-05-01 | DOI: 10.1198/tast.2009.0030
Elizabeth Koehler, Elizabeth Brown, Sebastien J-P A Haneuse

Statistical experiments, more commonly referred to as Monte Carlo or simulation studies, are used to study the behavior of statistical methods and measures under controlled situations. Whereas recent computing and methodological advances have permitted increased efficiency in the simulation process, known as variance reduction, such experiments remain limited by their finite nature and hence are subject to uncertainty; when a simulation is run more than once, different results are obtained. However, virtually no emphasis has been placed on reporting the uncertainty, referred to here as Monte Carlo error, associated with simulation results in the published literature, or on justifying the number of replications used. These deserve broader consideration. Here we present a series of simple and practical methods for estimating Monte Carlo error as well as determining the number of replications required to achieve a desired level of accuracy. The issues and methods are demonstrated with two simple examples, one evaluating operating characteristics of the maximum likelihood estimator for the parameters in logistic regression and the other in the context of using the bootstrap to obtain 95% confidence intervals. The results suggest that in many settings, Monte Carlo error may be more substantial than traditionally thought.
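Two of the simplest ideas, a Monte Carlo standard error for a simulated proportion and the replication count needed to reach a target precision, can be sketched as follows; the Type I error setting and the numbers are illustrative rather than taken from the article's examples.

```python
# Monte Carlo standard error of a simulated rejection rate, and the number of
# replications needed to reach a target MC error (illustrative setting).
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
R, n = 2_000, 20
reject = np.empty(R, dtype=bool)
for r in range(R):                       # simulate Type I error of a t-test under H0
    x = rng.normal(0.0, 1.0, n)
    reject[r] = stats.ttest_1samp(x, 0.0).pvalue < 0.05

p_hat = reject.mean()
mc_se = np.sqrt(p_hat * (1.0 - p_hat) / R)        # binomial Monte Carlo standard error
print(f"estimated rejection rate = {p_hat:.4f} (MC SE = {mc_se:.4f})")

target_se = 0.001                                  # desired Monte Carlo precision
R_needed = int(np.ceil(p_hat * (1.0 - p_hat) / target_se**2))
print(f"replications needed for MC SE of {target_se}: about {R_needed}")
```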

Citations: 244
A Fresh Look at the Discriminant Function Approach for Estimating Crude or Adjusted Odds Ratios.
IF 1.8 | CAS Tier 4 (Mathematics) | Q1 Mathematics | Pub Date: 2009-01-01 | DOI: 10.1198/tast.2009.08246
Robert H Lyles, Ying Guo, Andrew N Hill

Assuming a binary outcome, logistic regression is the most common approach to estimating a crude or adjusted odds ratio corresponding to a continuous predictor. We revisit a method termed the discriminant function approach, which leads to closed-form estimators and corresponding standard errors. In its most appealing application, we show that the approach suggests a multiple linear regression of the continuous predictor of interest on the outcome and other covariates, in place of the traditional logistic regression model. If standard diagnostics support the assumptions (including normality of errors) accompanying this linear regression model, the resulting estimator has demonstrable advantages over the usual maximum likelihood estimator via logistic regression. These include improvements in terms of bias and efficiency based on a minimum variance unbiased estimator of the log odds ratio, as well as the availability of an estimate when logistic regression fails to converge due to a separation of data points. Use of the discriminant function approach as described here for multivariable analysis requires less stringent assumptions than those for which it was historically criticized, and is worth considering when the adjusted odds ratio associated with a particular continuous predictor is of primary interest. Simulation and case studies illustrate these points.
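In its crude (unadjusted) form the estimator amounts to regressing the continuous predictor on the binary outcome and dividing the fitted coefficient by the residual variance. The sketch below is a minimal illustration of that recipe under the normal, equal-variance assumption, using simulated data; it is not the authors' software, and the standard errors and small-sample refinements discussed in the article are omitted.

```python
# Discriminant-function (crude) log odds ratio: OLS of continuous X on binary Y,
# then divide the Y coefficient by the residual variance.  Simulated data.
import numpy as np

rng = np.random.default_rng(2)
n, true_log_or = 5_000, 0.8
# Generate X | Y normal with equal variance so the true log OR per unit X is
# (mu1 - mu0) / sigma^2 = 0.8.
y = rng.binomial(1, 0.4, n)
x = rng.normal(0.5 + true_log_or * 1.0 * y, 1.0)    # sigma = 1, mu1 - mu0 = 0.8

design = np.column_stack([np.ones(n), y])            # OLS of x on (1, y)
beta, *_ = np.linalg.lstsq(design, x, rcond=None)
resid = x - design @ beta
sigma2 = resid @ resid / (n - design.shape[1])       # residual variance

log_or_hat = beta[1] / sigma2                        # discriminant-function estimate
print(f"estimated log OR = {log_or_hat:.3f} (true {true_log_or})")
print(f"estimated OR     = {np.exp(log_or_hat):.3f}")
```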

Citations: 0
Flexible Frames and Control Sampling in Case-Control Studies: Weighters (Survey Statisticians) Versus Anti-Weighters (Epidemiologists).
IF 1.8 | CAS Tier 4 (Mathematics) | Q1 Mathematics | Pub Date: 2008-11-01 | DOI: 10.1198/000313008X364525
Richard F Potthoff, Susan Halabi, Joellen M Schildkraut, Beth Newman

We propose two innovations in statistical sampling for controls to enable better design of population-based case-control studies. The main innovation leads to novel solutions, without using weights, of the difficult and long-standing problem of selecting a control from persons in a household. Another advance concerns the drawing (at the outset) of the households themselves and involves random-digit dialing with atypical use of list-assisted sampling. A common element throughout is that one capitalizes on flexibility (not broadly available in usual survey settings) in choosing the frame, which specifies the population of persons from which both cases and controls come.

Citations: 5
Linear Transformations and the k-Means Clustering Algorithm: Applications to Clustering Curves.
IF 1.8 | CAS Tier 4 (Mathematics) | Q1 Mathematics | Pub Date: 2007-02-01 | DOI: 10.1198/000313007X171016
Thaddeus Tarpey

Functional data can be clustered by plugging estimated regression coefficients from individual curves into the k-means algorithm. Clustering results can differ depending on how the curves are fit to the data. Estimating curves using different sets of basis functions corresponds to different linear transformations of the data. k-means clustering is not invariant to linear transformations of the data. The optimal linear transformation for clustering will stretch the distribution so that the primary direction of variability aligns with actual differences in the clusters. It is shown that clustering the raw data will often give results similar to clustering regression coefficients obtained using an orthogonal design matrix. Clustering functional data using an L₂ metric on function space can be achieved by clustering a suitable linear transformation of the regression coefficients. An example where depressed individuals are treated with an antidepressant is used for illustration.
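A minimal version of the procedure is to fit a basis regression to each curve on a common design and run k-means on the stacked coefficients. The sketch below uses synthetic curves and an orthogonal (Legendre) polynomial basis for illustration; as the article notes, with an orthogonal design matrix the coefficient clustering closely mirrors clustering of the raw curves.

```python
# Cluster curves by k-means on per-curve regression coefficients (synthetic data).
import numpy as np
from numpy.polynomial import legendre
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
t = np.linspace(0.0, 1.0, 25)                      # common observation grid
# Two true groups of curves: different trends plus noise.
flat = 0.2 * t[None, :] + rng.normal(0, 0.15, (40, t.size))
steep = 1.5 * t[None, :] - t[None, :] ** 2 + rng.normal(0, 0.15, (40, t.size))
curves = np.vstack([flat, steep])

# Orthogonal (Legendre) polynomial design of degree 3 on the grid mapped to [-1, 1].
design = legendre.legvander(2.0 * t - 1.0, 3)
coefs, *_ = np.linalg.lstsq(design, curves.T, rcond=None)   # one column per curve

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coefs.T)
print("cluster sizes:", np.bincount(labels))
print("fraction of first 40 curves in their majority cluster:",
      (labels[:40] == np.argmax(np.bincount(labels[:40]))).mean())
```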

Citations: 85