Falsifiable research is a basic goal of science and is needed for science to be self-correcting. However, the methods for conducting falsifiable research are not widely known among psychological researchers. Describing the effect sizes that can be confidently investigated in confirmatory research is as important as describing the subject population. Power curves or operating characteristics provide this information and are needed for both frequentist and Bayesian analyses. These evaluations of inferential error rates indicate the performance (validity and reliability) of the planned statistical analysis. For meaningful, falsifiable research, the study plan should specify a minimum effect size that is the goal of the study. If any tiny effect, no matter how small, is considered meaningful evidence, the research is not falsifiable and often has negligible predictive value. Power ≥ .95 for the minimum effect is optimal for confirmatory research and .90 is good. From a frequentist perspective, the statistical model for the alternative hypothesis in the power analysis can be used to obtain a p value that can reject the alternative hypothesis, analogous to rejecting the null hypothesis. However, confidence intervals generally provide more intuitive and more informative inferences than p values. The preregistration for falsifiable confirmatory research should include (a) criteria for evidence the alternative hypothesis is true, (b) criteria for evidence the alternative hypothesis is false, and (c) criteria for outcomes that will be inconclusive. Not all confirmatory studies are or need to be falsifiable. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
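As a concrete illustration of the planning the abstract describes, the sketch below computes a frequentist power curve for a two-sample t test and a p value for rejecting the alternative hypothesis under the minimum-effect model. The minimum effect d = 0.40, the per-group n of 165, and the observed t statistic are illustrative assumptions, not values from the article.

```r
# Power curve for a two-sample t test across candidate effect sizes,
# assuming an illustrative minimum effect of interest d = 0.40.
n <- 165                                  # per-group n chosen so power is ~ .95 at d = .40
d_grid <- seq(0.1, 0.8, by = 0.1)
power  <- sapply(d_grid, function(d)
  power.t.test(n = n, delta = d, sd = 1, sig.level = .05)$power)
round(setNames(power, d_grid), 3)

# Rejecting the alternative: under H1 with d = 0.40, t follows a noncentral t
# with ncp = d * sqrt(n / 2); a small lower-tail p value rejects H1.
t_obs <- 1.2                              # hypothetical observed t statistic
pt(t_obs, df = 2 * n - 2, ncp = 0.40 * sqrt(n / 2))
```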
{"title":"Planning falsifiable confirmatory research.","authors":"James E Kennedy","doi":"10.1037/met0000639","DOIUrl":"https://doi.org/10.1037/met0000639","url":null,"abstract":"<p><p>Falsifiable research is a basic goal of science and is needed for science to be self-correcting. However, the methods for conducting falsifiable research are not widely known among psychological researchers. Describing the effect sizes that can be confidently investigated in confirmatory research is as important as describing the subject population. Power curves or operating characteristics provide this information and are needed for both frequentist and Bayesian analyses. These evaluations of inferential error rates indicate the performance (validity and reliability) of the planned statistical analysis. For meaningful, falsifiable research, the study plan should specify a minimum effect size that is the goal of the study. If any tiny effect, no matter how small, is considered meaningful evidence, the research is not falsifiable and often has negligible predictive value. Power ≥ .95 for the minimum effect is optimal for confirmatory research and .90 is good. From a frequentist perspective, the statistical model for the alternative hypothesis in the power analysis can be used to obtain a <i>p</i> value that can reject the alternative hypothesis, analogous to rejecting the null hypothesis. However, confidence intervals generally provide more intuitive and more informative inferences than p values. The preregistration for falsifiable confirmatory research should include (a) criteria for evidence the alternative hypothesis is true, (b) criteria for evidence the alternative hypothesis is false, and (c) criteria for outcomes that will be inconclusive. Not all confirmatory studies are or need to be falsifiable. (PsycInfo Database Record (c) 2024 APA, all rights reserved).</p>","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":" ","pages":""},"PeriodicalIF":7.6,"publicationDate":"2024-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142819026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Most studies in psychology, neuroscience, and life science research make inferences about how strong an effect is on average in the population. Yet, many research questions could instead be answered by testing for the universality of the phenomenon under investigation. By using reliable experimental designs that maximize both the sensitivity and specificity of individual experiments, each participant or subject can be treated as an independent replication. This approach is common in certain subfields. To date, however, there is no formal approach for calculating the evidential value of such small sample studies and for defining a priori evidence thresholds that must be met to draw meaningful conclusions. Here we present such a framework, based on the ratio of binomial probabilities under a model assuming the universality of the phenomenon versus the null hypothesis that any incidence of the effect is sporadic. We demonstrate the benefits of this approach, which permits strong conclusions from samples as small as two to five participants and allows flexible sequential testing. This approach will enable researchers to preregister experimental designs based on small samples and thus enhance the utility and credibility of such studies. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
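A minimal sketch of the binomial evidence ratio the abstract describes, under assumed per-participant operating characteristics (the sensitivity of .8 under universality and the false-positive rate of .05 under sporadic incidence are illustrative, not the authors' values):

```r
# Ratio of binomial probabilities: universality model (each participant shows
# the effect with probability `sens`) vs. sporadic-incidence null (`alpha`).
evidence_ratio <- function(k, n, sens = 0.8, alpha = 0.05) {
  dbinom(k, n, sens) / dbinom(k, n, alpha)
}
evidence_ratio(k = 4, n = 4)  # all of four participants show the effect: (16)^4 = 65536
evidence_ratio(k = 3, n = 5)  # mixed results give much weaker evidence
```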
{"title":"A simple statistical framework for small sample studies.","authors":"D Samuel Schwarzkopf, Zien Huang","doi":"10.1037/met0000710","DOIUrl":"https://doi.org/10.1037/met0000710","url":null,"abstract":"<p><p>Most studies in psychology, neuroscience, and life science research make inferences about how strong an effect is on average in the population. Yet, many research questions could instead be answered by testing for the universality of the phenomenon under investigation. By using reliable experimental designs that maximize both sensitivity and specificity of individual experiments, each participant or subject can be treated as an independent replication. This approach is common in certain subfields. To date, there is however no formal approach for calculating the evidential value of such small sample studies and to define a priori evidence thresholds that must be met to draw meaningful conclusions. Here we present such a framework, based on the ratio of binomial probabilities between a model assuming the universality of the phenomenon versus the null hypothesis that any incidence of the effect is sporadic. We demonstrate the benefits of this approach, which permits strong conclusions from samples as small as two to five participants and the flexibility of sequential testing. This approach will enable researchers to preregister experimental designs based on small samples and thus enhance the utility and credibility of such studies. (PsycInfo Database Record (c) 2024 APA, all rights reserved).</p>","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":" ","pages":""},"PeriodicalIF":7.6,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142786849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A sequential stopping rule (SSR) can generate a confidence interval (CI) for a standardized mean difference d that has an exact standardized width, ω. Two methods were tested using a broad range of ω and standardized effect sizes δ. A noncentral t (NCt) CI used with normally distributed data had coverages that were nominal at narrow widths but slightly inflated at wider widths. A distribution-free (Dist-Free) method used with normally distributed data exhibited superior coverage and stopped, on average, at the expected sample sizes. When used with moderately to severely skewed lognormal distributions, coverage was too low at large effect sizes even with a very narrow width, where Dist-Free was expected to perform well, and the mean stopping sample sizes were absurdly elevated (thousands per group). SSR procedures negatively biased both the raw difference and the "unbiased" Hedges' g in the stopping sample with all methods and distributions. The d was the less biased estimator of δ when the distribution was normal. The poor coverage with a lognormal distribution resulted from a large positive bias in d that increased as a function of both ω and δ. Coverage and point estimation were little improved by using g instead of d. The increased stopping time resulted from the way the variance estimate is calculated when it encounters occasional extreme scores generated by the skewed distribution. The Dist-Free SSR method was superior when the distribution was normal or only slightly skewed but is not recommended for moderately skewed distributions. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
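The sketch below illustrates the general logic of a width-controlled sequential stopping rule. It uses a large-sample Wald CI for d rather than the article's exact noncentral-t or distribution-free procedures, and ω, δ, and the starting sample size are illustrative.

```r
# Add one observation per group until the 95% CI for d is no wider than omega.
set.seed(1)
omega <- 0.6; delta <- 0.5                 # target width and true effect (illustrative)
g1 <- rnorm(10); g2 <- rnorm(10, mean = delta)
repeat {
  n  <- length(g1)
  sp <- sqrt((var(g1) + var(g2)) / 2)      # pooled SD (equal group sizes)
  d  <- (mean(g2) - mean(g1)) / sp
  se <- sqrt(2 / n + d^2 / (4 * n))        # large-sample SE of d
  if (2 * qnorm(.975) * se <= omega) break # stop once the CI width reaches omega
  g1 <- c(g1, rnorm(1)); g2 <- c(g2, rnorm(1, mean = delta))
}
c(n_per_group = n, d_at_stop = round(d, 3))
```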
{"title":"Comparison of noncentral t and distribution-free methods when using sequential procedures to control the width of a confidence interval for a standardized mean difference.","authors":"Douglas A Fitts","doi":"10.1037/met0000671","DOIUrl":"https://doi.org/10.1037/met0000671","url":null,"abstract":"<p><p>sequential stopping rule (SSR) can generate a confidence interval (CI) for a standardized mean difference <i>d</i> that has an exact standardized width, ω. Two methods were tested using a broad range of ω and standardized effect sizes δ. A noncentral t (NCt) CI used with normally distributed data had coverages that were nominal at narrow widths but were slightly inflated at wider widths. A distribution-free (Dist-Free) method used with normally distributed data exhibited superior coverage and stopped on average at the expected sample sizes. When used with moderate to severely skewed lognormal distributions, the coverage was too low at large effect sizes even with a very narrow width where Dist-Free was expected to perform well, and the mean stopping sample sizes were absurdly elevated (thousands per group). SSR procedures negatively biased both the raw difference and the \"unbiased\" Hedges' g in the stopping sample with all methods and distributions. The <i>d</i> was the less biased estimator of δ when the distribution was normal. The poor coverage with a lognormal distribution resulted from a large positive bias in <i>d</i> that increased as a function of both ω and δ. Coverage and point estimation were little improved by using g instead of <i>d</i>. Increased stopping time resulted from the way an estimate of the variance is calculated when it encounters occasional extreme scores generated from the skewed distribution. The Dist-Free SSR method was superior when the distribution was normal or only slightly skewed but is not recommended with moderately skewed distributions. (PsycInfo Database Record (c) 2024 APA, all rights reserved).</p>","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":"29 6","pages":"1188-1208"},"PeriodicalIF":7.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142882871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-01. Epub Date: 2022-07-18. DOI: 10.1037/met0000508
Lihan Chen, Rachel T Fouladi
Extreme groups design (EGD) refers to the use of a screening variable to inform further data collection, such that only participants with the lowest and highest scores are recruited in subsequent stages of the study. It is an effective way to improve the power of a study under a limited budget, but produces biased standardized estimates. We demonstrate that the bias in EGD results from its inherent missing at random mechanism, which can be corrected using modern missing data techniques such as full information maximum likelihood (FIML). Further, we provide a tutorial on computing correlations in EGD data with FIML using R. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
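A sketch of the kind of FIML correction the tutorial covers, using lavaan's lavCor() on simulated extreme-groups data; the variable names, the true correlation of .5, and the quartile cutoffs are illustrative assumptions, not the authors' example.

```r
library(lavaan)
set.seed(42)
n <- 1000
screen  <- rnorm(n)                                   # screening variable
outcome <- 0.5 * screen + rnorm(n, sd = sqrt(0.75))   # true correlation = .5
# EGD: only the bottom and top quartiles are measured at the second stage.
mid <- screen > quantile(screen, .25) & screen < quantile(screen, .75)
outcome[mid] <- NA
dat <- data.frame(screen, outcome)
cor(dat$screen, dat$outcome, use = "complete.obs")    # inflated by the design
lavCor(dat, missing = "fiml")                         # FIML recovers ~ .5
```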
{"title":"Correcting bias in extreme groups design using a missing data approach.","authors":"Lihan Chen, Rachel T Fouladi","doi":"10.1037/met0000508","DOIUrl":"10.1037/met0000508","url":null,"abstract":"<p><p>Extreme groups design (EGD) refers to the use of a screening variable to inform further data collection, such that only participants with the lowest and highest scores are recruited in subsequent stages of the study. It is an effective way to improve the power of a study under a limited budget, but produces biased standardized estimates. We demonstrate that the bias in EGD results from its inherent <i>missing at random</i> mechanism, which can be corrected using modern missing data techniques such as <i>full information maximum likelihood</i> (FIML). Further, we provide a tutorial on computing correlations in EGD data with FIML using R. (PsycInfo Database Record (c) 2024 APA, all rights reserved).</p>","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":" ","pages":"1123-1131"},"PeriodicalIF":7.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9922061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-01. Epub Date: 2023-03-06. DOI: 10.1037/met0000519
Daniel Redhead, Richard McElreath, Cody T Ross
Social network analysis provides an important framework for studying the causes, consequences, and structure of social ties. However, standard self-report measures-for example, as collected through the popular "name-generator" method-do not provide an impartial representation of such ties, be they transfers, interactions, or social relationships. At best, they represent perceptions filtered through the cognitive biases of respondents. Individuals may, for example, report transfers that did not really occur, or forget to mention transfers that really did. The propensity to make such reporting inaccuracies is both an individual-level and item-level characteristic-variable across members of any given group. Past research has highlighted that many network-level properties are highly sensitive to such reporting inaccuracies. However, there remains a dearth of easily deployed statistical tools that account for such biases. To address this issue, we provide a latent network model that allows researchers to jointly estimate parameters measuring both reporting biases and a latent, underlying social network. Building upon past research, we conduct several simulation experiments in which network data are subject to various reporting biases, and find that these reporting biases strongly impact fundamental network properties. These impacts are not adequately remedied using the most frequently deployed approaches for network reconstruction in the social sciences (i.e., treating either the union or the intersection of double-sampled data as the true network), but are appropriately resolved through the use of our latent network models. To make implementation of our models easier for end-users, we provide a fully documented R package, STRAND, and include a tutorial illustrating its functionality when applied to empirical food/money sharing data from a rural Colombian population. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
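The base-R sketch below (deliberately not using the STRAND API) reproduces the core problem the package addresses: union and intersection reconstructions of double-sampled reports systematically misstate network density. The tie probability and reporting error rates are illustrative assumptions.

```r
set.seed(7)
n <- 40
true_net <- matrix(rbinom(n^2, 1, 0.15), n, n); diag(true_net) <- 0
report <- function(net, fpr = 0.02, fnr = 0.30) {    # assumed reporting biases
  obs <- net
  obs[net == 1] <- rbinom(sum(net == 1), 1, 1 - fnr) # forgotten real ties
  obs[net == 0] <- rbinom(sum(net == 0), 1, fpr)     # falsely reported ties
  diag(obs) <- 0
  obs
}
r1 <- report(true_net); r2 <- report(true_net)       # two independent reports
c(true_density      = mean(true_net),
  union_density     = mean(r1 == 1 | r2 == 1),       # inflated
  intersect_density = mean(r1 == 1 & r2 == 1))       # deflated
```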
{"title":"Reliable network inference from unreliable data: A tutorial on latent network modeling using STRAND.","authors":"Daniel Redhead, Richard McElreath, Cody T Ross","doi":"10.1037/met0000519","DOIUrl":"10.1037/met0000519","url":null,"abstract":"<p><p>Social network analysis provides an important framework for studying the causes, consequences, and structure of social ties. However, standard self-report measures-for example, as collected through the popular \"name-generator\" method-do not provide an impartial representation of such ties, be they transfers, interactions, or social relationships. At best, they represent perceptions filtered through the cognitive biases of respondents. Individuals may, for example, report transfers that did not really occur, or forget to mention transfers that really did. The propensity to make such reporting inaccuracies is both an individual-level and item-level characteristic-variable across members of any given group. Past research has highlighted that many network-level properties are highly sensitive to such reporting inaccuracies. However, there remains a dearth of easily deployed statistical tools that account for such biases. To address this issue, we provide a latent network model that allows researchers to jointly estimate parameters measuring both reporting biases and a latent, underlying social network. Building upon past research, we conduct several simulation experiments in which network data are subject to various reporting biases, and find that these reporting biases strongly impact fundamental network properties. These impacts are not adequately remedied using the most frequently deployed approaches for network reconstruction in the social sciences (i.e., treating either the union or the intersection of double-sampled data as the true network), but are appropriately resolved through the use of our latent network models. To make implementation of our models easier for end-users, we provide a fully documented R package, STRAND, and include a tutorial illustrating its functionality when applied to empirical food/money sharing data from a rural Colombian population. (PsycInfo Database Record (c) 2024 APA, all rights reserved).</p>","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":" ","pages":"1100-1122"},"PeriodicalIF":7.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10821258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-01. Epub Date: 2023-11-02. DOI: 10.1037/met0000610
Andrew H Hales
When preregistered, one-tailed tests control false-positive results at the same rate as two-tailed tests. They are also more powerful, provided the researcher correctly identified the direction of the effect. So it is surprising that they are not more common in psychology. Here I make an argument in favor of one-tailed tests and address common mistaken objections that researchers may have to using them. The arguments presented here only apply in situations where the test is clearly preregistered. If power is truly as urgent an issue as statistics reformers suggest, then the deliberate and thoughtful use of preregistered one-tailed tests ought to be not only permitted, but encouraged in cases where researchers desire greater power. One-tailed tests are especially well suited for applied questions, replications of previously documented effects, or situations where directionally unexpected effects would be meaningless. Preregistered one-tailed tests can sensibly align the researcher's stated theory with their tested hypothesis, bring a coherence to the practice of null hypothesis statistical testing, and produce generally more persuasive results. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
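A quick sketch of the power argument, with illustrative n and d:

```r
# Same design, same alpha: the preregistered one-tailed test is more powerful
# when the effect is in the predicted direction (n and d are illustrative).
n <- 50; d <- 0.4
power.t.test(n = n, delta = d, sig.level = .05, alternative = "one.sided")$power  # ~ .64
power.t.test(n = n, delta = d, sig.level = .05, alternative = "two.sided")$power  # ~ .51
```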
{"title":"One-tailed tests: Let's do this (responsibly).","authors":"Andrew H Hales","doi":"10.1037/met0000610","DOIUrl":"10.1037/met0000610","url":null,"abstract":"<p><p>When preregistered, one-tailed tests control false-positive results at the same rate as two-tailed tests. They are also more powerful, provided the researcher correctly identified the direction of the effect. So it is surprising that they are not more common in psychology. Here I make an argument in favor of one-tailed tests and address common mistaken objections that researchers may have to using them. The arguments presented here only apply in situations where the test is clearly preregistered. If power is truly as urgent an issue as statistics reformers suggest, then the deliberate and thoughtful use of preregistered one-tailed tests ought to be not only permitted, but encouraged in cases where researchers desire greater power. One-tailed tests are especially well suited for applied questions, replications of previously documented effects, or situations where directionally unexpected effects would be meaningless. Preregistered one-tailed tests can sensibly align the researcher's stated theory with their tested hypothesis, bring a coherence to the practice of null hypothesis statistical testing, and produce generally more persuasive results. (PsycInfo Database Record (c) 2024 APA, all rights reserved).</p>","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":" ","pages":"1209-1218"},"PeriodicalIF":7.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71426349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-01. Epub Date: 2023-03-09. DOI: 10.1037/met0000538
Young Ri Lee, James E Pustejovsky
Cross-classified random effects modeling (CCREM) is a common approach for analyzing cross-classified data in psychology, education research, and other fields. However, when the focus of a study is on the regression coefficients at Level 1 rather than on the random effects, ordinary least squares regression with cluster robust variance estimators (OLS-CRVE) or fixed effects regression with CRVE (FE-CRVE) could be appropriate approaches. These alternative methods are potentially advantageous because they rely on weaker assumptions than those required by CCREM. We conducted a Monte Carlo simulation study to compare the performance of CCREM, OLS-CRVE, and FE-CRVE under conditions where the homoscedasticity and exogeneity assumptions held, conditions where they were violated, and conditions with unmodeled random slopes. We found that CCREM outperformed the alternative approaches when its assumptions were all met. However, when homoscedasticity assumptions were violated, OLS-CRVE and FE-CRVE provided similar or better performance than CCREM. When the exogeneity assumption was violated, only FE-CRVE provided adequate performance. Further, OLS-CRVE and FE-CRVE provided more accurate inferences than CCREM in the presence of unmodeled random slopes. Thus, we recommend two-way FE-CRVE as a good alternative to CCREM, particularly if the homoscedasticity or exogeneity assumptions of the CCREM might be in doubt. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
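A minimal sketch of the recommended two-way FE-CRVE analysis, using lm() with multiway cluster-robust standard errors from the sandwich package; the simulated data set and effect sizes are illustrative, not the article's simulation design.

```r
library(sandwich); library(lmtest)
set.seed(3)
N <- 400
dat <- data.frame(school       = factor(sample(20, N, replace = TRUE)),
                  neighborhood = factor(sample(15, N, replace = TRUE)),
                  x            = rnorm(N))
dat$y <- 0.3 * dat$x + rnorm(20)[dat$school] +
         rnorm(15)[dat$neighborhood] + rnorm(N)
# Two-way fixed effects for the cross-classified factors, CRVE for inference:
fe <- lm(y ~ x + school + neighborhood, data = dat)
coeftest(fe, vcov = vcovCL(fe, cluster = ~ school + neighborhood))["x", ]
```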
{"title":"Comparing random effects models, ordinary least squares, or fixed effects with cluster robust standard errors for cross-classified data.","authors":"Young Ri Lee, James E Pustejovsky","doi":"10.1037/met0000538","DOIUrl":"10.1037/met0000538","url":null,"abstract":"<p><p>Cross-classified random effects modeling (CCREM) is a common approach for analyzing cross-classified data in psychology, education research, and other fields. However, when the focus of a study is on the regression coefficients at Level 1 rather than on the random effects, ordinary least squares regression with cluster robust variance estimators (OLS-CRVE) or fixed effects regression with CRVE (FE-CRVE) could be appropriate approaches. These alternative methods are potentially advantageous because they rely on weaker assumptions than those required by CCREM. We conducted a Monte Carlo Simulation study to compare the performance of CCREM, OLS-CRVE, and FE-CRVE in models, including conditions where homoscedasticity assumptions and exogeneity assumptions held and conditions where they were violated, as well as conditions with unmodeled random slopes. We found that CCREM out-performed the alternative approaches when its assumptions are all met. However, when homoscedasticity assumptions are violated, OLS-CRVE and FE-CRVE provided similar or better performance than CCREM. When the exogeneity assumption is violated, only FE-CRVE provided adequate performance. Further, OLS-CRVE and FE-CRVE provided more accurate inferences than CCREM in the presence of unmodeled random slopes. Thus, we recommend two-way FE-CRVE as a good alternative to CCREM, particularly if the homoscedasticity or exogeneity assumptions of the CCREM might be in doubt. (PsycInfo Database Record (c) 2024 APA, all rights reserved).</p>","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":" ","pages":"1084-1099"},"PeriodicalIF":7.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10871401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-01. Epub Date: 2023-02-13. DOI: 10.1037/met0000546
Miriam K Forbes
Goldberg's (2006) bass-ackward approach to elucidating the hierarchical structure of individual differences data has been used widely to improve our understanding of the relationships among constructs of varying levels of granularity. The traditional approach has been to extract a single component or factor on the first level of the hierarchy, two on the second level, and so on, treating the correlations between adjoining levels akin to path coefficients in a hierarchical structure. This article proposes three modifications to the traditional approach with a particular focus on examining associations among all levels of the hierarchy: (a) identify and remove redundant elements that perpetuate through multiple levels of the hierarchy; (b) (optionally) identify and remove artefactual elements; and (c) plot the strongest correlations among the remaining elements to identify their hierarchical associations. Together these steps can offer a simpler and more complete picture of the underlying hierarchical structure among a set of observed variables. The rationale for each step is described, illustrated in a hypothetical example and three basic simulations, and then applied in real data. The results are compared with the traditional bass-ackward approach together with agglomerative hierarchical cluster analysis, and a basic tutorial with code is provided to apply the extended bass-ackward approach in other data. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
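A sketch of the traditional bass-ackward step that the article extends, using psych::principal() on simulated data; the article's modifications then prune redundant and artefactual elements before plotting the correlations among all levels, not just adjoining ones.

```r
library(psych)
set.seed(11)
X <- matrix(rnorm(300 * 12), 300, 12) %*% matrix(runif(144, -0.3, 0.6), 12, 12)
# Component scores for solutions extracting 1, 2, 3, and 4 varimax-rotated components:
scores <- lapply(1:4, function(k) principal(X, nfactors = k)$scores)
# Traditional approach: correlations between adjoining levels (here levels 3 and 4),
# treated like path coefficients in a hierarchy.
round(cor(scores[[3]], scores[[4]]), 2)
```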
{"title":"Improving hierarchical models of individual differences: An extension of Goldberg's bass-ackward method.","authors":"Miriam K Forbes","doi":"10.1037/met0000546","DOIUrl":"10.1037/met0000546","url":null,"abstract":"<p><p>Goldberg's (2006) bass-ackward approach to elucidating the hierarchical structure of individual differences data has been used widely to improve our understanding of the relationships among constructs of varying levels of granularity. The traditional approach has been to extract a single component or factor on the first level of the hierarchy, two on the second level, and so on, treating the correlations between adjoining levels akin to path coefficients in a hierarchical structure. This article proposes three modifications to the traditional approach with a particular focus on examining associations among <i>all</i> levels of the hierarchy: (a) identify and remove redundant elements that perpetuate through multiple levels of the hierarchy; (b) (optionally) identify and remove artefactual elements; and (c) plot the strongest correlations among the remaining elements to identify their hierarchical associations. Together these steps can offer a simpler and more complete picture of the underlying hierarchical structure among a set of observed variables. The rationale for each step is described, illustrated in a hypothetical example and three basic simulations, and then applied in real data. The results are compared with the traditional bass-ackward approach together with agglomerative hierarchical cluster analysis, and a basic tutorial with code is provided to apply the extended bass-ackward approach in other data. (PsycInfo Database Record (c) 2024 APA, all rights reserved).</p>","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":" ","pages":"1062-1073"},"PeriodicalIF":7.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10696269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-01. Epub Date: 2022-10-06. DOI: 10.1037/met0000532
Benjamin W Domingue, Klint Kanopka, Sam Trejo, Mijke Rhemtulla, Elliot M Tucker-Drob
Studies of interaction effects are of great interest because they identify crucial interplay between predictors in explaining outcomes. Previous work has considered several potential sources of statistical bias and substantive misinterpretation in the study of interactions, but less attention has been devoted to the role of the outcome variable in such research. Here, we consider bias and false discovery associated with estimates of interaction parameters as a function of the distributional and metric properties of the outcome variable. We begin by illustrating that, for a variety of noncontinuously distributed outcomes (i.e., binary and count outcomes), attempts to use the linear model for recovery leads to catastrophic levels of bias and false discovery. Next, focusing on transformations of normally distributed variables (i.e., censoring and noninterval scaling), we show that linear models again produce spurious interaction effects. We provide explanations offering geometric and algebraic intuition as to why interactions are a challenge for these incorrectly specified models. In light of these findings, we make two specific recommendations. First, a careful consideration of the outcome's distributional properties should be a standard component of interaction studies. Second, researchers should approach research focusing on interactions with heightened levels of scrutiny. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
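A compact sketch of the false-discovery problem for binary outcomes: data are generated with main effects only on the logit scale, then tested for an interaction with a linear model versus a correctly specified logistic model. The coefficients and predictor means are illustrative.

```r
set.seed(5)
n  <- 5000
x1 <- rnorm(n, mean = 0.5); x2 <- rnorm(n, mean = 0.5)   # noncentered predictors
y  <- rbinom(n, 1, plogis(0.8 * x1 + 0.8 * x2))          # NO interaction in truth
# Linear model on the binary outcome: the sigmoid's curvature loads onto the
# product term, yielding a spurious "significant" interaction.
summary(lm(y ~ x1 * x2))$coefficients["x1:x2", ]
# Correctly specified logistic model: interaction estimate near zero.
summary(glm(y ~ x1 * x2, family = binomial))$coefficients["x1:x2", ]
```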
{"title":"Ubiquitous bias and false discovery due to model misspecification in analysis of statistical interactions: The role of the outcome's distribution and metric properties.","authors":"Benjamin W Domingue, Klint Kanopka, Sam Trejo, Mijke Rhemtulla, Elliot M Tucker-Drob","doi":"10.1037/met0000532","DOIUrl":"10.1037/met0000532","url":null,"abstract":"<p><p>Studies of interaction effects are of great interest because they identify crucial interplay between predictors in explaining outcomes. Previous work has considered several potential sources of statistical bias and substantive misinterpretation in the study of interactions, but less attention has been devoted to the role of the outcome variable in such research. Here, we consider bias and false discovery associated with estimates of interaction parameters as a function of the distributional and metric properties of the outcome variable. We begin by illustrating that, for a variety of noncontinuously distributed outcomes (i.e., binary and count outcomes), attempts to use the linear model for recovery leads to catastrophic levels of bias and false discovery. Next, focusing on transformations of normally distributed variables (i.e., censoring and noninterval scaling), we show that linear models again produce spurious interaction effects. We provide explanations offering geometric and algebraic intuition as to why interactions are a challenge for these incorrectly specified models. In light of these findings, we make two specific recommendations. First, a careful consideration of the outcome's distributional properties should be a standard component of interaction studies. Second, researchers should approach research focusing on interactions with heightened levels of scrutiny. (PsycInfo Database Record (c) 2024 APA, all rights reserved).</p>","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":" ","pages":"1164-1179"},"PeriodicalIF":7.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10369499/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9862990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Most scientific disciplines use significance testing to draw conclusions about experimental or observational data. This classical approach provides a theoretical guarantee for controlling the number of false positives across a set of hypothesis tests, making it an appealing framework for scientists seeking to limit the number of false effects or associations that they claim to observe. Unfortunately, this theoretical guarantee applies to few experiments, and the true false positive rate (FPR) is much higher. Scientists have plenty of freedom to choose the error rate to control, the tests to include in the adjustment, and the method of correction, making strong error control difficult to attain. In addition, hypotheses are often tested after finding unexpected relationships or patterns, the data are analyzed in several ways, and analyses may be run repeatedly as data accumulate. As a result, adjusted p values are too small, incorrect conclusions are often reached, and results are harder to reproduce. In the following, I argue why the FPR is rarely controlled meaningfully and why shrinking parameter estimates is preferable to p value adjustments. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
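A small sketch contrasting the two strategies the author discusses: Bonferroni-adjusted p values versus empirical-Bayes-style shrinkage of the estimates themselves. The mixture of true effects and the method-of-moments prior variance are illustrative assumptions.

```r
set.seed(9)
m     <- 100
theta <- c(rep(0, 90), rnorm(10, mean = 0.5, sd = 0.1))  # mostly null effects
se    <- rep(0.15, m)
est   <- rnorm(m, theta, se)                             # observed estimates
# p value adjustment route:
p_adj <- p.adjust(2 * pnorm(-abs(est / se)), method = "bonferroni")
# Shrinkage route: posterior means under a N(0, tau2) prior on the true effects.
tau2   <- max(mean(est^2) - mean(se^2), 0)               # method-of-moments estimate
shrunk <- est * tau2 / (tau2 + se^2)
cbind(estimate = est, shrunk = shrunk, p_adj = p_adj)[88:92, ]
```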
{"title":"Why multiple hypothesis test corrections provide poor control of false positives in the real world.","authors":"Stanley E Lazic","doi":"10.1037/met0000678","DOIUrl":"https://doi.org/10.1037/met0000678","url":null,"abstract":"<p><p>Most scientific disciplines use significance testing to draw conclusions about experimental or observational data. This classical approach provides a theoretical guarantee for controlling the number of false positives across a set of hypothesis tests, making it an appealing framework for scientists seeking to limit the number of false effects or associations that they claim to observe. Unfortunately, this theoretical guarantee applies to few experiments, and the true false positive rate (FPR) is much higher. Scientists have plenty of freedom to choose the error rate to control, the tests to include in the adjustment, and the method of correction, making strong error control difficult to attain. In addition, hypotheses are often tested after finding unexpected relationships or patterns, the data are analyzed in several ways, and analyses may be run repeatedly as data accumulate. As a result, adjusted <i>p</i> values are too small, incorrect conclusions are often reached, and results are harder to reproduce. In the following, I argue why the FPR is rarely controlled meaningfully and why shrinking parameter estimates is preferable to <i>p</i> value adjustments. (PsycInfo Database Record (c) 2024 APA, all rights reserved).</p>","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":" ","pages":""},"PeriodicalIF":7.6,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142688594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}