
American Statistician: Latest Articles

Bayesian Causality.
IF 1.8; Zone 4 (Mathematics); Q1 Mathematics; Pub Date: 2020-01-01; Epub Date: 2019-08-26; DOI: 10.1080/00031305.2019.1647876
Pierre Baldi, Babak Shahbaba

Although no universally accepted definition of causality exists, in practice one is often faced with the question of statistically assessing causal relationships in different settings. We present a uniform general approach to causality problems derived from the axiomatic foundations of the Bayesian statistical framework. In this approach, causality statements are viewed as hypotheses, or models, about the world and the fundamental object to be computed is the posterior distribution of the causal hypotheses, given the data and the background knowledge. Computation of the posterior, illustrated here in simple examples, may involve complex probabilistic modeling but this is no different than in any other Bayesian modeling situation. The main advantage of the approach is its connection to the axiomatic foundations of the Bayesian framework, and the general uniformity with which it can be applied to a variety of causality settings, ranging from specific to general cases, or from causes of effects to effects of causes.
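
As a rough illustration of the central quantity named in this abstract, the sketch below computes a posterior distribution over a small set of causal hypotheses with Bayes' rule. The function name, the hypothesis labels, the priors, and the likelihood values are all hypothetical placeholders chosen for illustration; none of them come from the paper.

```python
def posterior_over_hypotheses(priors, likelihoods):
    """Return P(H | D) for each hypothesis H via Bayes' rule.

    priors:      dict mapping hypothesis name -> P(H)
    likelihoods: dict mapping hypothesis name -> P(D | H)
    """
    # Unnormalized posterior: P(D | H) * P(H)
    joint = {h: likelihoods[h] * priors[h] for h in priors}
    evidence = sum(joint.values())  # P(D), the normalizing constant
    if evidence == 0:
        raise ValueError("data have zero probability under every hypothesis")
    return {h: p / evidence for h, p in joint.items()}


# Hypothetical numbers, for illustration only.
priors = {"A causes B": 0.5, "A does not cause B": 0.5}
likelihoods = {"A causes B": 0.08, "A does not cause B": 0.02}

posterior = posterior_over_hypotheses(priors, likelihoods)
for hypothesis, prob in posterior.items():
    print(f"{hypothesis}: {prob:.2f}")
```

In a realistic analysis the likelihood of the data under each causal hypothesis would itself come from a probabilistic model, which is where the modeling effort mentioned in the abstract would go.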

American Statistician, vol. 74, no. 3, pp. 249-257.
Citations: 3
The Role of Expert Judgment in Statistical Inference and Evidence-Based Decision-Making.
IF 1.8; Zone 4 (Mathematics); Q1 STATISTICS & PROBABILITY; Pub Date: 2019-03-20; eCollection Date: 2019-01-01; DOI: 10.1080/00031305.2018.1529623
Naomi C Brownstein, Thomas A Louis, Anthony O'Hagan, Jane Pendergast

This article resulted from our participation in the session on the "role of expert opinion and judgment in statistical inference" at the October 2017 ASA Symposium on Statistical Inference. We present a strong, unified statement on roles of expert judgment in statistics with processes for obtaining input, whether from a Bayesian or frequentist perspective. Topics include the role of subjectivity in the cycle of scientific inference and decisions, followed by a clinical trial and a greenhouse gas emissions case study that illustrate the role of judgments and the importance of basing them on objective information and a comprehensive uncertainty assessment. We close with a call for increased proactivity and involvement of statisticians in study conceptualization, design, conduct, analysis, and communication.

American Statistician, vol. 73, no. 1, pp. 56-68.
Citations: 0
The New Statistics for Better Science: Ask How Much, How Uncertain, and What Else is Known.
IF 1.8; Zone 4 (Mathematics); Q1 Mathematics; Pub Date: 2019-01-01; DOI: 10.1080/00031305.2018.1518266
Robert J Calin-Jageman, Geoff Cumming

The "New Statistics" emphasizes effect sizes, confidence intervals, meta-analysis, and the use of Open Science practices. We present 3 specific ways in which a New Statistics approach can help improve scientific practice: by reducing over-confidence in small samples, by reducing confirmation bias, and by fostering more cautious judgments of consistency. We illustrate these points through consideration of the literature on oxytocin and human trust, a research area that typifies some of the endemic problems that arise with poor statistical practice.

American Statistician, vol. 73, Suppl. 1, pp. 271-280.
Citations: 99
Evidence from marginally significant t statistics.
IF 1.8; Zone 4 (Mathematics); Q1 Mathematics; Pub Date: 2019-01-01; Epub Date: 2019-03-20; DOI: 10.1080/00031305.2018.1518788
Valen E Johnson

This article examines the evidence contained in t statistics that are marginally significant in 5% tests. The bases for evaluating evidence are likelihood ratios and integrated likelihood ratios, computed under a variety of assumptions regarding the alternative hypotheses in null hypothesis significance tests. Likelihood ratios and integrated likelihood ratios provide a useful measure of the evidence in favor of competing hypotheses because they can be interpreted as representing the ratio of the probabilities that each hypothesis assigns to observed data. When they are either very large or very small, they suggest that one hypothesis is much better than the other in predicting observed data. If they are close to 1.0, then both hypotheses provide approximately equally valid explanations for observed data. I find that p-values that are close to 0.05 (i.e., that are "marginally significant") correspond to integrated likelihood ratios that are bounded by approximately 7 in two-sided tests, and by approximately 4 in one-sided tests. The modest magnitude of integrated likelihood ratios corresponding to p-values close to 0.05 clearly suggests that higher standards of evidence are needed to support claims of novel discoveries and new effects.
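
The magnitudes quoted in this abstract can be sanity-checked with a simpler, well-known calculation. The sketch below is not the paper's integrated likelihood ratio for t statistics; it assumes a z statistic (the large-sample normal case) and computes the largest likelihood ratio any point alternative can achieve against the null, which is exp(z²/2) when the alternative is placed at the observed value. The function name `max_likelihood_ratio` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def max_likelihood_ratio(p_value, two_sided=True):
    """Upper bound on the likelihood ratio against the null for a z statistic.

    The bound is attained by the point alternative placed at the observed
    value of the statistic, giving exp(z**2 / 2).
    """
    tail = p_value / 2 if two_sided else p_value
    z = NormalDist().inv_cdf(1 - tail)  # z value whose tail area matches the p-value
    return math.exp(z * z / 2)


print(f"two-sided, p = 0.05: {max_likelihood_ratio(0.05):.2f}")
print(f"one-sided, p = 0.05: {max_likelihood_ratio(0.05, two_sided=False):.2f}")
```

Under these assumptions the bounds come out a little under 7 for the two-sided case and a little under 4 for the one-sided case, in line with the approximate figures stated in the abstract.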

American Statistician, pp. 129-134.
Citations: 18
Power and Sample Size for Fixed-Effects Inference in Reversible Linear Mixed Models.
IF 1.8; Zone 4 (Mathematics); Q1 Mathematics; Pub Date: 2019-01-01; Epub Date: 2018-06-04; DOI: 10.1080/00031305.2017.1415972
Yueh-Yun Chi, Deborah H Glueck, Keith E Muller

Despite the popularity of the general linear mixed model for data analysis, power and sample size methods and software are not generally available for commonly used test statistics and reference distributions. Statisticians resort to simulations with homegrown and uncertified programs or rough approximations which are misaligned with the data analysis. For a wide range of designs with longitudinal and clustering features, we provide accurate power and sample size approximations for inference about fixed effects in linear models we call reversible. We show that under widely applicable conditions, the general linear mixed-model Wald test has non-central distributions equivalent to well-studied multivariate tests. In turn, exact and approximate power and sample size results for the multivariate Hotelling-Lawley test provide exact and approximate power and sample size results for the mixed-model Wald test. The calculations are easily computed with a free, open-source product that requires only a web browser to use. Commercial software can be used for a smaller range of reversible models. Simple approximations allow accounting for modest amounts of missing data. A real-world example illustrates the methods. Sample size results are presented for a multicenter study on pregnancy. The proposed study, an extension of a funded project, has clustering within clinic. Exchangeability among participants allows averaging across them to remove the clustering structure. The resulting simplified design is a single level longitudinal study. Multivariate methods for power provide an approximate sample size. All proofs and inputs for the example are in the Supplementary Materials (available online).

American Statistician, vol. 73, no. 4, pp. 350-359.
Citations: 8
Testing for positive quadrant dependence.
IF 1.8; Zone 4 (Mathematics); Q1 Mathematics; Pub Date: 2019-01-01; Epub Date: 2019-05-30; DOI: 10.1080/00031305.2019.1607554
Chuan-Fa Tang, Dewei Wang, Hammou El Barmi, Joshua M Tebbs

We develop an empirical likelihood approach to test independence of two univariate random variables X and Y versus the alternative that X and Y are strictly positive quadrant dependent (PQD). Establishing this type of ordering between X and Y is of interest in many applications, including finance, insurance, engineering, and other areas. Adopting the framework in Einmahl and McKeague (2003, Bernoulli), we create a distribution-free test statistic that integrates a localized empirical likelihood ratio test statistic with respect to the empirical joint distribution of X and Y. When compared to well known existing tests and distance-based tests we develop by using copula functions, simulation results show the EL testing procedure performs well in a variety of scenarios when X and Y are strictly PQD. We use three data sets for illustration and provide an online R resource practitioners can use to implement the methods in this article.
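
For readers unfamiliar with the term, X and Y are positive quadrant dependent when P(X ≤ x, Y ≤ y) ≥ P(X ≤ x) P(Y ≤ y) for all x and y. The sketch below is only a descriptive check of that inequality on a sample, using empirical distribution functions; it is not the empirical likelihood test developed in the article. The function name `pqd_gaps` and the toy data are hypothetical.

```python
def pqd_gaps(xs, ys):
    """Empirical H(x, y) - F(x) * G(y) evaluated at every sample point."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    gaps = []
    for x0, y0 in zip(xs, ys):
        f = sum(x <= x0 for x in xs) / n  # empirical marginal CDF of X
        g = sum(y <= y0 for y in ys) / n  # empirical marginal CDF of Y
        h = sum(x <= x0 and y <= y0 for x, y in zip(xs, ys)) / n  # empirical joint CDF
        gaps.append(h - f * g)
    return gaps


# Toy data (hypothetical): y tends to increase with x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.5, 3.1, 5.4]

gaps = pqd_gaps(xs, ys)
# Non-negative gaps at every sample point are consistent with PQD.
print(min(gaps) >= 0)
```

A formal test would also need to account for sampling variability in these gaps, which is the role of the test statistic described in the abstract.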

Citations: 5
Limitations of P-Values and R-squared for Stepwise Regression Building: A Fairness Demonstration in Health Policy Risk Adjustment.
IF 1.8; Zone 4 (Mathematics); Q1 Mathematics; Pub Date: 2019-01-01; Epub Date: 2019-03-20; DOI: 10.1080/00031305.2018.1518269
Sherri Rose, Thomas G McGuire

Stepwise regression building procedures are commonly used applied statistical tools, despite their well-known drawbacks. While many of their limitations have been widely discussed in the literature, other aspects of the use of individual statistical fit measures, especially in high-dimensional stepwise regression settings, have not. Giving primacy to individual fit, as is done with p-values and R², when group fit may be the larger concern, can lead to misguided decision making. One of the most consequential uses of stepwise regression is in health care, where these tools allocate hundreds of billions of dollars to health plans enrolling individuals with different predicted health care costs. The main goal of this "risk adjustment" system is to convey incentives to health plans such that they provide health care services fairly, a component of which is not to discriminate in access or care for persons or groups likely to be expensive. We address some specific limitations of p-values and R² for high-dimensional stepwise regression in this policy problem through an illustrated example by additionally considering a group-level fairness metric.

American Statistician, pp. 152-156.
Citations: 23
Comparing Objective and Subjective Bayes Factors for the Two-Sample Comparison: The Classification Theorem in Action.
IF 1.8; Zone 4 (Mathematics); Q1 STATISTICS & PROBABILITY; Pub Date: 2019-01-01; Epub Date: 2018-05-10; DOI: 10.1080/00031305.2017.1322142
Mithat Gönen, Wesley O Johnson, Yonggang Lu, Peter H Westfall

Many Bayes factors have been proposed for comparing population means in two-sample (independent samples) studies. Recently, Wang and Liu (2015) presented an "objective" Bayes factor (BF) as an alternative to a "subjective" one presented by Gönen et al. (2005). Their report was evidently intended to show the superiority of their BF based on "undesirable behavior" of the latter. A wonderful aspect of Bayesian models is that they provide an opportunity to "lay all cards on the table." What distinguishes the various BFs in the two-sample problem is the choice of priors (cards) for the model parameters. This article discusses desiderata of BFs that have been proposed, and proposes a new criterion to compare BFs, no matter whether subjectively or objectively determined: A BF may be preferred if it correctly classifies the data as coming from the correct model most often. The criterion is based on a famous result in classification theory to minimize the total probability of misclassification. This criterion is objective, easily verified by simulation, shows clearly the effects (positive or negative) of assuming particular priors, provides new insights into the appropriateness of BFs in general, and provides a new answer to the question, "Which BF is best?"
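
The classification criterion described in this abstract lends itself to a simulation check. The sketch below is a deliberately simplified stand-in, not either of the Bayes factors compared in the article: it assumes two groups of equal size with known unit variance, a point null of no mean difference, and a normal prior N(0, tau²) on the mean difference under the alternative. All function names and parameter values are hypothetical. It estimates how often "choose H1 when BF10 > 1" picks the model that actually generated the data.

```python
import math
import random


def log_bf10(d, n, tau):
    """log Bayes factor for H1: delta ~ N(0, tau^2) versus H0: delta = 0.

    d is the difference in sample means of two groups of size n with
    unit variance, so d ~ N(delta, 2 / n).
    """
    v0 = 2.0 / n          # variance of d under H0
    v1 = tau ** 2 + v0    # marginal variance of d under H1
    return 0.5 * math.log(v0 / v1) + 0.5 * d * d * (1.0 / v0 - 1.0 / v1)


def classification_rate(tau_prior, tau_true=0.5, n=20, reps=20000, seed=1):
    """Fraction of simulated data sets assigned to the model that generated them."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(reps):
        from_h1 = rng.random() < 0.5                      # both models equally likely
        delta = rng.gauss(0.0, tau_true) if from_h1 else 0.0
        d = rng.gauss(delta, math.sqrt(2.0 / n))          # observed mean difference
        chose_h1 = log_bf10(d, n, tau_prior) > 0.0        # BF10 > 1 selects H1
        correct += chose_h1 == from_h1
    return correct / reps


matched = classification_rate(tau_prior=0.5)
diffuse = classification_rate(tau_prior=5.0)
print(f"prior matched to the simulation: {matched:.3f}")
print(f"much more diffuse prior:         {diffuse:.3f}")
```

Because both calls reuse the same seed, the two priors are scored on the same simulated data sets. In this setup the prior that matches the data-generating distribution classifies correctly more often than the diffuse one, which is the kind of comparison the proposed criterion is meant to support.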

American Statistician, vol. 73, no. 1, pp. 22-31.
Citations: 0
Joint clustering with correlated variables.
IF 1.8; Zone 4 (Mathematics); Q1 Mathematics; Pub Date: 2019-01-01; Epub Date: 2018-07-09; DOI: 10.1080/00031305.2018.1424033
Hongmei Zhang, Yubo Zou, Will Terry, Wilfried Karmaus, Hasan Arshad

Traditional clustering methods focus on grouping subjects or (dependent) variables assuming independence between the variables. Clusters formed through these approaches can potentially lack homogeneity. This article proposes a joint clustering method by which both variables and subjects are clustered. In each joint cluster (in general composed of a subset of variables and a subset of subjects), there exists a unique association between dependent variables and covariates of interest. To this end, a Bayesian method is designed, in which a semi-parametric model is used to evaluate any unknown relationships between possibly correlated variables and covariates of interest, and a Dirichlet process is utilized to cluster subjects. Compared to existing clustering techniques, the major novelty of the method exists in its ability to improve the homogeneity of clusters, along with the ability to take the correlations between variables into account. Via simulations, we examine the performance and efficiency of the proposed method. Applying the method to cluster allergens and subjects based on the association of wheal size in reaction to allergens with age, we found that a certain pattern of allergic sensitization to a set of allergens has a potential to reduce the occurrence of asthma.

American Statistician, vol. 73, no. 3, pp. 296-306.
Citations: 3
Letter to the Editor.
IF 1.8 Zone 4 (Mathematics) Q1 Mathematics Pub Date: 2019-01-01 Epub Date: 2019-08-05 DOI: 10.1080/00031305.2018.1537894
Thaddeus Tarpey, Eva Petkova

Hutson and Vexler (2018) demonstrate an example of aliasing with the beta and normal distribution. This letter presents another illustration of aliasing using the beta and normal distributions via an infinite mixture model, inspired by the problem of modeling placebo response.

Thaddeus Tarpey, Eva Petkova. Letter to the Editor. American Statistician, 73(3), 312, 2019. DOI: 10.1080/00031305.2018.1537894
Citations: 0