Digging into p Values

Journal of Visual Impairment & Blindness · Pub Date: 2022-11-01 · DOI: 10.1177/0145482x221144443
R. W. Emerson
{"title":"Digging into p Values","authors":"R. W. Emerson","doi":"10.1177/0145482x221144443","DOIUrl":null,"url":null,"abstract":"Back in the January to February 2016 issue of this journal, I discussed p value and the increased need to report effect sizes along with p value. Since some time has passed and p value remains an important aspect of statistical reporting, I thought it wise to revisit the topic. To illustrate some points, we will refer to the article from this issue entitled “COVID-19: Social Distancing and Physical Activity in United Kingdom Residents with Visual Impairment,” by Strongman, Swain, Chung, Merzbach, and Gordon. The authors of this article made a number of t test comparisons where they are comparing the mean of one group to the mean of another group. If you cast your mind back, you will remember that, in the social sciences, we generally have a cutoff for “statistical significance” of .05 for such comparisons. This measure of significance means that, if the p value or significance level, is < .05, the difference in means between the two groups is deemed “statistically significant.” Statistical significance means that there is less than a 5% chance that the observed difference is due to chance. It is the accepted level of chance that experimenters are willing to accept in the social sciences, where data tend to be a little more noisy or hard to measure accurately than in something like physics. Let us unpack this matter a little more. A number of the comparisons in the article I am using as an example today has p values close to .05, either slightly more or less. How meaningful is it to claim that a comparison with a p value of .051 is not statistically meaningful while one with a p value of .049 is? This question is the reason why I made the case in 2016 that we should also include a measure of effect size when reporting the results of statistical tests so that the magnitude of the difference can also be known. In 1994, Jacob Cohen, a big name in statistics circles, wrote a piece entitled, “The Earth is Round (p < .05),” in which he summarized a long history of people noting that null hypothesis significance testing (which is what you are doing when you rely on the p level) is a dangerous game. Let us take this suggestion step by step. In null hypothesis significance testing (or NHST, for short), we start with the null hypothesis that the groups we are comparing are not different, or are drawn from the same larger population. If the p value from our statistical comparison is less than our cutoff (which is often .05), we “fail to accept the null hypothesis,” which leads one to want to say that the two groups are different. As Jacob Cohen notes,","PeriodicalId":47438,"journal":{"name":"Journal of Visual Impairment & Blindness","volume":"116 1","pages":"857 - 858"},"PeriodicalIF":1.0000,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Visual Impairment & Blindness","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1177/0145482x221144443","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"REHABILITATION","Score":null,"Total":0}
引用次数: 0

Abstract

Back in the January to February 2016 issue of this journal, I discussed p values and the growing need to report effect sizes alongside them. Since some time has passed and the p value remains an important aspect of statistical reporting, I thought it wise to revisit the topic. To illustrate some points, we will refer to the article from this issue entitled "COVID-19: Social Distancing and Physical Activity in United Kingdom Residents with Visual Impairment," by Strongman, Swain, Chung, Merzbach, and Gordon. The authors of that article made a number of t test comparisons, each comparing the mean of one group with the mean of another.

If you cast your mind back, you will remember that, in the social sciences, we generally use a cutoff of .05 for "statistical significance" in such comparisons. If the p value, or significance level, is less than .05, the difference in means between the two groups is deemed "statistically significant." In other words, if the two groups truly did not differ, a difference at least as large as the one observed would arise by chance less than 5% of the time. This is the level of risk that experimenters are willing to accept in the social sciences, where data tend to be noisier and harder to measure accurately than in a field such as physics.

Let us unpack this matter a little more. A number of the comparisons in the article I am using as an example today have p values close to .05, some slightly above and some slightly below. How meaningful is it to claim that a comparison with a p value of .051 is not statistically significant while one with a p value of .049 is? This question is why I argued in 2016 that we should also include a measure of effect size when reporting the results of statistical tests, so that the magnitude of the difference is known as well. In 1994, Jacob Cohen, a big name in statistics circles, wrote a piece entitled "The Earth Is Round (p < .05)," in which he summarized a long history of people noting that null hypothesis significance testing (which is what you are doing when you rely on the p value) is a dangerous game. Let us take this argument step by step. In null hypothesis significance testing (NHST, for short), we start with the null hypothesis that the groups we are comparing are not different, or are drawn from the same larger population. If the p value from our statistical comparison is less than our cutoff (often .05), we reject the null hypothesis, which leads one to want to say that the two groups are different. As Jacob Cohen notes,
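
To make the effect-size point concrete, here is a minimal sketch, not drawn from the article itself: the data are simulated and the group names are invented for illustration. It shows an independent-samples t test reported together with Cohen's d, so the magnitude of a group difference stays visible even when the p value hovers near the .05 cutoff.

```python
# Minimal sketch with simulated data: report an effect size (Cohen's d)
# alongside the p value from an independent-samples t test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=50.0, scale=10.0, size=40)  # simulated scores, group A
group_b = rng.normal(loc=55.0, scale=10.0, size=40)  # simulated scores, group B

# The t test's p value says how surprising a difference this large would be
# if the two groups were really drawn from the same population.
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Cohen's d: the standardized mean difference (using the pooled standard
# deviation), which conveys how large the difference actually is.
n_a, n_b = len(group_a), len(group_b)
pooled_var = ((n_a - 1) * group_a.var(ddof=1) + (n_b - 1) * group_b.var(ddof=1)) / (n_a + n_b - 2)
cohens_d = (group_b.mean() - group_a.mean()) / np.sqrt(pooled_var)

print(f"t = {t_stat:.2f}, p = {p_value:.3f}, Cohen's d = {cohens_d:.2f}")
```

Two comparisons can land on opposite sides of p = .05 yet have nearly identical effect sizes, which is exactly why reporting both numbers gives the fuller picture the column argues for.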