Comments From the Editor: How Big Are Your P Values?
B. Silvey
Update: Applications of Research in Music Education, 40(1), 3–4. https://doi.org/10.1177/87551233211043843 (published October 1, 2021)
For researchers who conduct quantitative analyses that involve statistical software such as SPSS or R, nothing is more perilous than hitting the execute button and waiting for those p values to appear on the screen. In the case of p ≤ .05, there is usually great rejoicing and a satisfaction that achieving statistical significance means that your study may result in a publishable article. However, if the output shows p > .05, the need for a stiff drink and psychotherapy develops suddenly. If only the number had been different! All of that hard work for nothing! Given the distorted nature of this easy and lazy categorization, which sorts research findings into those that matter and those that do not, it should come as no surprise that statisticians disagree on the importance and use of p values (Wasserstein & Lazar, 2016). Indeed, many researchers and data scientists have called for the retirement of statistical significance altogether (Amrhein et al., 2019). Although a complete description and discussion of the debate surrounding significance testing is beyond the scope of these comments, there are many resources available for those individuals who like to fall asleep early (cf. Brereton, 2020; Kennedy-Shaffer, 2019; Vidgen & Yasseri, 2016). Even though the mission of Update is to present “findings of individual studies without research terminology or jargon” (Update, n.d.), we often include quantitative studies that have varying degrees of statistical mumbo jumbo. (I won’t make any excuses, though, other than maybe I should do a better job as Editor.) Now that you have woken from the slumber induced by reading the exhaustive list of the strengths and weaknesses of null hypothesis significance testing cited previously, I thought it would be better to present some ways that researchers are attempting to move beyond the p value.
Many of our readers, in addition to being excellent practitioners, endeavor to consume even more sophisticated quantitative research, so knowing more about what is happening within the social sciences and other music education research journals could prove beneficial. In a terrific article by Resnick (2017), the case for and against redefining statistical significance is debated. He claims that there are more nuanced ways to move science forward, and asserts that researchers should consider several things when reporting their data. One consideration is to include effect sizes. For those unaware of effect sizes, they are a quantitative measure of the magnitude of an experimental effect and are reported alongside p values. Rather than only reporting whether there was a statistically significant difference, researchers should include effect sizes to contextualize the importance and practicality of their findings. Depending on the type of statistical test that was computed, you might find Cohen’s d, Hedges’ g, or partial eta squared (ηp²) hanging out behind that p value. In other words, just because something is statistically significant does not mean that it has any real importance. Researchers can also provide additional statistical information and a bit more wiggle room when reporting findings through the use of confidence intervals (CIs). These intervals are a “range of values around that statistic that are believed to contain, with a certain probability, the real value of that statistic” (Field, 2009, p. 783). Most researchers use a CI value of 95%. This is a neat way to indicate a range of values, using a lower and an upper limit, that you are 95% confident will include the population value you are estimating. You might see something like this after the p value: p = .01, 95% CI [1.2, 2.5]. There are also matters that do not involve statistics that can help alleviate the anxiety induced by p values.
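To make these ideas concrete, here is a minimal sketch of how an effect size (Cohen’s d) and a 95% CI might be computed and reported alongside a p value. This example is not from the editorial: the two groups, their means, and their standard deviations are simulated, hypothetical data, and the p value uses a large-sample normal approximation rather than the exact t distribution.

```python
import math
import random
import statistics

random.seed(42)
# Simulated scores for two hypothetical teaching conditions (illustrative only)
group_a = [random.gauss(50, 10) for _ in range(200)]
group_b = [random.gauss(51, 10) for _ in range(200)]

mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
n_a, n_b = len(group_a), len(group_b)

# Welch's t statistic and a normal-approximation p value (fine for large samples)
se = math.sqrt(var_a / n_a + var_b / n_b)
t = (mean_b - mean_a) / se
p = 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

# Cohen's d: the mean difference scaled by the pooled standard deviation
pooled_sd = math.sqrt((var_a + var_b) / 2)
d = (mean_b - mean_a) / pooled_sd

# 95% CI for the mean difference: estimate +/- 1.96 standard errors
diff = mean_b - mean_a
ci = (diff - 1.96 * se, diff + 1.96 * se)

print(f"p = {p:.3f}, d = {d:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
```

Notice that the groups here differ by only about one tenth of a standard deviation, a "small" effect by Cohen's conventional benchmarks, so even a p value that crept under .05 would say little about practical importance on its own.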
The first consideration is that researchers should contextualize the results of a study by whether the findings are novel or have been replicated. If no one has ever studied the problem, examined the associated data, or used a similar methodology, it stands to reason that regardless of the statistical findings, the research community should exercise discretion in extrapolating the results. However, if the study is a replication of previous research or part of a continued line of study, we have greater reason to accept the underlying implications of those studies. Finally, there has been a push to make data from studies free and accessible online. This includes both quantitative and qualitative data. If researchers are willing to make their data accessible for …