Corrigendum: Simulation Studies as a Tool to Understand Bayes Factors
Pub Date: 2021-10-01 | DOI: 10.1177/25152459211061266
{"title":"Corrigendum: Simulation Studies as a Tool to Understand Bayes Factors","authors":"","doi":"10.1177/25152459211061266","DOIUrl":"https://doi.org/10.1177/25152459211061266","url":null,"abstract":"","PeriodicalId":55645,"journal":{"name":"Advances in Methods and Practices in Psychological Science","volume":" ","pages":""},"PeriodicalIF":13.6,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48159337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Causal Framework for Cross-Cultural Generalizability
Pub Date: 2021-09-23 | DOI: 10.1177/25152459221106366
Dominik Deffner, J. Rohrer, R. McElreath
Behavioral researchers increasingly recognize the need for more diverse samples that capture the breadth of human experience. Current attempts to establish generalizability across populations focus on threats to validity, constraints on generalization, and the accumulation of large, cross-cultural data sets. But for continued progress, we also require a framework that lets us determine which inferences can be drawn and how to make informative cross-cultural comparisons. We describe a generative causal-modeling framework and outline simple graphical criteria to derive analytic strategies and implied generalizations. Using both simulated and real data, we demonstrate how to project and compare estimates across populations and further show how to formally represent measurement equivalence or inequivalence across societies. We conclude with a discussion of how a formal framework for generalizability can assist researchers in designing more informative cross-cultural studies and thus provides a more solid foundation for cumulative and generalizable behavioral research.
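As a concrete, heavily simplified illustration of projecting an estimate from a study population to a target population, the R sketch below reweights hypothetical stratum-specific effect estimates by each population's composition. All numbers and stratum labels are invented for the example; this is not the authors' code or data.

```r
# Minimal sketch: projecting an effect estimate to a target population by
# reweighting stratum-specific estimates (poststratification-style).
# All values are hypothetical illustrations, not data from the article.

# Stratum-specific effect estimates obtained in the source sample
effect_by_stratum <- c(young_loweduc  = 0.40,
                       young_higheduc = 0.25,
                       old_loweduc    = 0.55,
                       old_higheduc   = 0.30)

# Stratum proportions in the source sample and in the target population
p_source <- c(0.35, 0.30, 0.20, 0.15)
p_target <- c(0.20, 0.20, 0.30, 0.30)

# Estimate as observed in the source sample
sum(effect_by_stratum * p_source)

# Estimate projected to the target population
sum(effect_by_stratum * p_target)
```

The two weighted averages diverge whenever effects are heterogeneous and the populations differ in composition, which is precisely the situation in which unadjusted cross-population comparisons mislead.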
{"title":"A Causal Framework for Cross-Cultural Generalizability","authors":"Dominik Deffner, J. Rohrer, R. Mcelreath","doi":"10.1177/25152459221106366","DOIUrl":"https://doi.org/10.1177/25152459221106366","url":null,"abstract":"Behavioral researchers increasingly recognize the need for more diverse samples that capture the breadth of human experience. Current attempts to establish generalizability across populations focus on threats to validity, constraints on generalization, and the accumulation of large, cross-cultural data sets. But for continued progress, we also require a framework that lets us determine which inferences can be drawn and how to make informative cross-cultural comparisons. We describe a generative causal-modeling framework and outline simple graphical criteria to derive analytic strategies and implied generalizations. Using both simulated and real data, we demonstrate how to project and compare estimates across populations and further show how to formally represent measurement equivalence or inequivalence across societies. We conclude with a discussion of how a formal framework for generalizability can assist researchers in designing more informative cross-cultural studies and thus provides a more solid foundation for cumulative and generalizable behavioral research.","PeriodicalId":55645,"journal":{"name":"Advances in Methods and Practices in Psychological Science","volume":"5 1","pages":""},"PeriodicalIF":13.6,"publicationDate":"2021-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43042392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Guide to Visualizing Trajectories of Change With Confidence Bands and Raw Data
Pub Date: 2021-08-27 | DOI: 10.1177/25152459211047228
Andrea L. Howard
This tutorial is aimed at researchers working with repeated measures or longitudinal data who are interested in enhancing their visualizations of model-implied mean-level trajectories plotted over time with confidence bands and raw data. The intended audience is researchers who are already modeling their experimental, observational, or other repeated measures data over time using random-effects regression or latent curve modeling but who lack a comprehensive guide to visualize trajectories over time. This tutorial uses an example plotting trajectories from two groups, as seen in random-effects models that include Time × Group interactions and latent curve models that regress the latent time slope factor onto a grouping variable. This tutorial is also geared toward researchers who are satisfied with their current software environment for modeling repeated measures data but who want to make graphics using R software. Prior knowledge of R is not assumed, and readers can follow along using data and other supporting materials available via OSF at https://osf.io/78bk5/. Readers should come away from this tutorial with the tools needed to begin visualizing mean trajectories over time from their own models and enhancing those plots with graphical estimates of uncertainty and raw data that adhere to transparent practices in research reporting.
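For readers who want a quick sense of the kind of figure the tutorial builds, the sketch below fits a random-effects model to simulated two-group data and plots model-implied trajectories with approximate 95% confidence bands over the raw individual trajectories. The data, variable names, and model are invented for illustration; the tutorial's own materials are at the OSF link above.

```r
# Minimal sketch (not the tutorial's code): model-implied trajectories from a
# random-effects model, plotted with confidence bands over the raw data.
library(lme4)
library(ggplot2)

set.seed(1)
d <- expand.grid(id = 1:60, time = 0:4)
d$group <- ifelse(d$id <= 30, "Control", "Treatment")
d$y <- 2 + 0.3 * d$time + 0.4 * d$time * (d$group == "Treatment") +
  rnorm(60)[d$id] + rnorm(nrow(d), sd = 0.8)

fit <- lmer(y ~ time * group + (1 | id), data = d)

# Model-implied means and approximate 95% confidence bands
newd <- expand.grid(time = 0:4, group = c("Control", "Treatment"))
X <- model.matrix(~ time * group, newd)
newd$fit <- as.vector(X %*% fixef(fit))
se <- sqrt(diag(X %*% vcov(fit) %*% t(X)))
newd$lo <- newd$fit - 1.96 * se
newd$hi <- newd$fit + 1.96 * se

ggplot(newd, aes(time, fit, color = group, fill = group)) +
  geom_line(data = d, aes(time, y, group = id), alpha = 0.15, color = "grey50") +
  geom_ribbon(aes(ymin = lo, ymax = hi), alpha = 0.2, color = NA) +
  geom_line(linewidth = 1)
```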
{"title":"A Guide to Visualizing Trajectories of Change With Confidence Bands and Raw Data","authors":"Andrea L. Howard","doi":"10.1177/25152459211047228","DOIUrl":"https://doi.org/10.1177/25152459211047228","url":null,"abstract":"This tutorial is aimed at researchers working with repeated measures or longitudinal data who are interested in enhancing their visualizations of model-implied mean-level trajectories plotted over time with confidence bands and raw data. The intended audience is researchers who are already modeling their experimental, observational, or other repeated measures data over time using random-effects regression or latent curve modeling but who lack a comprehensive guide to visualize trajectories over time. This tutorial uses an example plotting trajectories from two groups, as seen in random-effects models that include Time × Group interactions and latent curve models that regress the latent time slope factor onto a grouping variable. This tutorial is also geared toward researchers who are satisfied with their current software environment for modeling repeated measures data but who want to make graphics using R software. Prior knowledge of R is not assumed, and readers can follow along using data and other supporting materials available via OSF at https://osf.io/78bk5/. Readers should come away from this tutorial with the tools needed to begin visualizing mean trajectories over time from their own models and enhancing those plots with graphical estimates of uncertainty and raw data that adhere to transparent practices in research reporting.","PeriodicalId":55645,"journal":{"name":"Advances in Methods and Practices in Psychological Science","volume":" ","pages":""},"PeriodicalIF":13.6,"publicationDate":"2021-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46770629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Conceptual Framework for Investigating and Mitigating Machine-Learning Measurement Bias (MLMB) in Psychological Assessment
Pub Date: 2021-07-30 | DOI: 10.1177/25152459211061337
L. Tay, S. E. Woo, L. Hickman, Brandon M. Booth, Sidney K. D’Mello
Given significant concerns about fairness and bias in the use of artificial intelligence (AI) and machine learning (ML) for psychological assessment, we provide a conceptual framework for investigating and mitigating machine-learning measurement bias (MLMB) from a psychometric perspective. MLMB is defined as differential functioning of the trained ML model between subgroups. MLMB manifests empirically when a trained ML model produces different predicted score levels for different subgroups (e.g., race, gender) despite them having the same ground-truth levels for the underlying construct of interest (e.g., personality) and/or when the model yields differential predictive accuracies across the subgroups. Because the development of ML models involves both data and algorithms, both biased data and algorithm-training bias are potential sources of MLMB. Data bias can occur in the form of nonequivalence between subgroups in the ground truth, platform-based construct, behavioral expression, and/or feature computing. Algorithm-training bias can occur when algorithms are developed with nonequivalence in the relation between extracted features and ground truth (i.e., algorithm features are differentially used, weighted, or transformed between subgroups). We explain how these potential sources of bias may manifest during ML model development and share initial ideas for mitigating them, including recognizing that new statistical and algorithmic procedures need to be developed. We also discuss how this framework clarifies MLMB but does not reduce the complexity of the issue.
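One empirical symptom of MLMB noted above, namely different predicted score levels across subgroups despite equal ground-truth levels, can be probed with a simple moderated regression of the model's predictions on ground truth and subgroup membership. The R sketch below uses simulated data and hypothetical variable names; it illustrates the idea only and is not a procedure taken from the article.

```r
# Minimal sketch: checking whether an ML model's predictions differ by subgroup
# after conditioning on the ground-truth score (simulated, hypothetical data).
set.seed(42)
n <- 1000
group <- factor(sample(c("A", "B"), n, replace = TRUE))
truth <- rnorm(n)                                                 # ground-truth construct scores
pred  <- 0.8 * truth + 0.3 * (group == "B") + rnorm(n, sd = 0.5)  # biased predictions

# Subgroup differences in intercept or slope indicate differential functioning
summary(lm(pred ~ truth * group))

# Differential predictive accuracy: mean squared error by subgroup
tapply((pred - truth)^2, group, mean)
```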
{"title":"A Conceptual Framework for Investigating and Mitigating Machine-Learning Measurement Bias (MLMB) in Psychological Assessment","authors":"L. Tay, S. E. Woo, L. Hickman, Brandon M. Booth, Sidney K. D’Mello","doi":"10.1177/25152459211061337","DOIUrl":"https://doi.org/10.1177/25152459211061337","url":null,"abstract":"Given significant concerns about fairness and bias in the use of artificial intelligence (AI) and machine learning (ML) for psychological assessment, we provide a conceptual framework for investigating and mitigating machine-learning measurement bias (MLMB) from a psychometric perspective. MLMB is defined as differential functioning of the trained ML model between subgroups. MLMB manifests empirically when a trained ML model produces different predicted score levels for different subgroups (e.g., race, gender) despite them having the same ground-truth levels for the underlying construct of interest (e.g., personality) and/or when the model yields differential predictive accuracies across the subgroups. Because the development of ML models involves both data and algorithms, both biased data and algorithm-training bias are potential sources of MLMB. Data bias can occur in the form of nonequivalence between subgroups in the ground truth, platform-based construct, behavioral expression, and/or feature computing. Algorithm-training bias can occur when algorithms are developed with nonequivalence in the relation between extracted features and ground truth (i.e., algorithm features are differentially used, weighted, or transformed between subgroups). We explain how these potential sources of bias may manifest during ML model development and share initial ideas for mitigating them, including recognizing that new statistical and algorithmic procedures need to be developed. We also discuss how this framework clarifies MLMB but does not reduce the complexity of the issue.","PeriodicalId":55645,"journal":{"name":"Advances in Methods and Practices in Psychological Science","volume":" ","pages":""},"PeriodicalIF":13.6,"publicationDate":"2021-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47805881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ManyClasses 1: Assessing the Generalizable Effect of Immediate Feedback Versus Delayed Feedback Across Many College Classes
Pub Date: 2021-07-01 | DOI: 10.1177/25152459211027575
Emily R. Fyfe, J. D. Leeuw, Paulo F. Carvalho, Robert L. Goldstone, Janelle Sherman, D. Admiraal, Laura K. Alford, Alison Bonner, C. Brassil, Christopher A. Brooks, Tracey Carbonetto, Sau Hou Chang, Laura Cruz, Melina T. Czymoniewicz-Klippel, F. Daniel, M. Driessen, Noel Habashy, Carrie Hanson-Bradley, E. Hirt, Virginia Hojas Carbonell, Daniel K. Jackson, Shay Jones, Jennifer L. Keagy, Brandi Keith, Sarah J. Malmquist, B. McQuarrie, K. Metzger, Maung Min, S. Patil, Ryan S. Patrick, Etienne Pelaprat, Maureen L. Petrunich-Rutherford, Meghan R. Porter, Kristina K. Prescott, Cathrine Reck, Terri Renner, E. Robbins, Adam R. Smith, P. Stuczynski, J. Thompson, N. Tsotakos, J. Turk, Kyle Unruh, Jennifer Webb, S. Whitehead, E. Wisniewski, Ke Anne Zhang, Benjamin A. Motz
Psychology researchers have long attempted to identify educational practices that improve student learning. However, experimental research on these practices is often conducted in laboratory contexts or in a single course, which threatens the external validity of the results. In this article, we establish an experimental paradigm for evaluating the benefits of recommended practices across a variety of authentic educational contexts—a model we call ManyClasses. The core feature is that researchers examine the same research question and measure the same experimental effect across many classes spanning a range of topics, institutions, teacher implementations, and student populations. We report the first ManyClasses study, in which we examined how the timing of feedback on class assignments, either immediate or delayed by a few days, affected subsequent performance on class assessments. Across 38 classes, the overall estimate for the effect of feedback timing was 0.002 (95% highest density interval = [−0.05, 0.05]), which indicates that there was no effect of immediate feedback compared with delayed feedback on student learning that generalizes across classes. Furthermore, there were no credibly nonzero effects for 40 preregistered moderators related to class-level and student-level characteristics. Yet our results provide hints that in certain kinds of classes, which were undersampled in the current study, there may be modest advantages for delayed feedback. More broadly, these findings provide insights regarding the feasibility of conducting within-class randomized experiments across a range of naturally occurring learning environments.
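As a rough sketch of the analytic idea (the same within-class randomized contrast estimated jointly across many classes), the code below fits a random-slope multilevel model to simulated data. The article reports a Bayesian analysis summarized with 95% highest density intervals; the frequentist lme4 stand-in here, and all variable names and numbers, are assumptions made for illustration.

```r
# Minimal sketch: feedback-timing contrast randomized within each of many
# classes, estimated with a random-slope multilevel model (simulated data;
# simplified frequentist analogue of the article's Bayesian analysis).
library(lme4)

set.seed(7)
n_class <- 38; n_per <- 40
d <- data.frame(class = rep(1:n_class, each = n_per))
d$immediate <- rbinom(nrow(d), 1, 0.5)            # randomized within class
baseline     <- rnorm(n_class, 0, 0.10)           # class-specific baselines
class_effect <- rnorm(n_class, 0, 0.08)           # class-specific timing effects
d$score <- 0.6 + baseline[d$class] + class_effect[d$class] * d$immediate +
  rnorm(nrow(d), sd = 0.15)

fit <- lmer(score ~ immediate + (1 + immediate | class), data = d)
summary(fit)                      # overall immediate-vs.-delayed feedback effect
confint(fit, method = "Wald")     # approximate interval for the fixed effects
```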
{"title":"ManyClasses 1: Assessing the Generalizable Effect of Immediate Feedback Versus Delayed Feedback Across Many College Classes","authors":"Emily R. Fyfe, J. D. Leeuw, Paulo F. Carvalho, Robert L. Goldstone, Janelle Sherman, D. Admiraal, Laura K. Alford, Alison Bonner, C. Brassil, Christopher A. Brooks, Tracey Carbonetto, Sau Hou Chang, Laura Cruz, Melina T. Czymoniewicz-Klippel, F. Daniel, M. Driessen, Noel Habashy, Carrie Hanson-Bradley, E. Hirt, Virginia Hojas Carbonell, Daniel K. Jackson, Shay Jones, Jennifer L. Keagy, Brandi Keith, Sarah J. Malmquist, B. McQuarrie, K. Metzger, Maung Min, S. Patil, Ryan S. Patrick, Etienne Pelaprat, Maureen L. Petrunich-Rutherford, Meghan R. Porter, Kristina K. Prescott, Cathrine Reck, Terri Renner, E. Robbins, Adam R. Smith, P. Stuczynski, J. Thompson, N. Tsotakos, J. Turk, Kyle Unruh, Jennifer Webb, S. Whitehead, E. Wisniewski, Ke Anne Zhang, Benjamin A. Motz","doi":"10.1177/25152459211027575","DOIUrl":"https://doi.org/10.1177/25152459211027575","url":null,"abstract":"Psychology researchers have long attempted to identify educational practices that improve student learning. However, experimental research on these practices is often conducted in laboratory contexts or in a single course, which threatens the external validity of the results. In this article, we establish an experimental paradigm for evaluating the benefits of recommended practices across a variety of authentic educational contexts—a model we call ManyClasses. The core feature is that researchers examine the same research question and measure the same experimental effect across many classes spanning a range of topics, institutions, teacher implementations, and student populations. We report the first ManyClasses study, in which we examined how the timing of feedback on class assignments, either immediate or delayed by a few days, affected subsequent performance on class assessments. Across 38 classes, the overall estimate for the effect of feedback timing was 0.002 (95% highest density interval = [−0.05, 0.05]), which indicates that there was no effect of immediate feedback compared with delayed feedback on student learning that generalizes across classes. Furthermore, there were no credibly nonzero effects for 40 preregistered moderators related to class-level and student-level characteristics. Yet our results provide hints that in certain kinds of classes, which were undersampled in the current study, there may be modest advantages for delayed feedback. More broadly, these findings provide insights regarding the feasibility of conducting within-class randomized experiments across a range of naturally occurring learning environments.","PeriodicalId":55645,"journal":{"name":"Advances in Methods and Practices in Psychological Science","volume":" ","pages":""},"PeriodicalIF":13.6,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1177/25152459211027575","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45755906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Summary Plots With Adjusted Error Bars: The superb Framework With an Implementation in R
Pub Date: 2021-07-01 | DOI: 10.1177/25152459211035109
D. Cousineau, Marc-André Goulet, Bradley Harding
Plotting the data of an experiment allows researchers to illustrate the main results of a study, show effect sizes, compare conditions, and guide interpretations. To achieve all this, it is necessary to show point estimates of the results and their precision using error bars. Often, and potentially unbeknownst to them, researchers use a type of error bars—the confidence intervals—that convey limited information. For instance, confidence intervals do not allow comparing results (a) between groups, (b) between repeated measures, (c) when participants are sampled in clusters, and (d) when the population size is finite. The use of such stand-alone error bars can lead to discrepancies between the plot’s display and the conclusions derived from statistical tests. To overcome this problem, we propose to generalize the precision of the results (the confidence intervals) by adjusting them so that they take into account the experimental design and the sampling methodology. Unfortunately, most software dedicated to statistical analyses does not offer options to adjust error bars. As a solution, we developed an open-access, open-source library for R—superb—that allows users to create summary plots with easily adjusted error bars.
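To make the repeated-measures case concrete, the sketch below computes decorrelated, Morey-corrected (Cousineau-Morey-style) confidence intervals by hand for a small simulated within-subject design. It is a generic illustration of the kind of adjustment the superb package automates, not a demonstration of the package's own interface.

```r
# Minimal sketch: Cousineau-Morey-style decorrelated confidence intervals for a
# within-subject design (simulated data; not the superb package's own code).
set.seed(3)
n <- 30; k <- 4                              # 30 participants, 4 conditions
true_means <- c(10, 11, 11.5, 13)
subj <- rnorm(n, sd = 3)                     # shared subject effect -> correlated measures
y <- sapply(true_means, function(m) m + subj + rnorm(n, sd = 1.5))

# Subject-centering removes between-subject variability (decorrelation)
y_centered <- y - rowMeans(y) + mean(y)

# Morey (2008) correction for the number of conditions
se_adj  <- apply(y_centered, 2, sd) / sqrt(n) * sqrt(k / (k - 1))
ci_half <- qt(0.975, df = n - 1) * se_adj

data.frame(condition = 1:k,
           mean  = colMeans(y),
           lower = colMeans(y) - ci_half,
           upper = colMeans(y) + ci_half)
```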
{"title":"Summary Plots With Adjusted Error Bars: The superb Framework With an Implementation in R","authors":"D. Cousineau, Marc-André Goulet, Bradley Harding","doi":"10.1177/25152459211035109","DOIUrl":"https://doi.org/10.1177/25152459211035109","url":null,"abstract":"Plotting the data of an experiment allows researchers to illustrate the main results of a study, show effect sizes, compare conditions, and guide interpretations. To achieve all this, it is necessary to show point estimates of the results and their precision using error bars. Often, and potentially unbeknownst to them, researchers use a type of error bars—the confidence intervals—that convey limited information. For instance, confidence intervals do not allow comparing results (a) between groups, (b) between repeated measures, (c) when participants are sampled in clusters, and (d) when the population size is finite. The use of such stand-alone error bars can lead to discrepancies between the plot’s display and the conclusions derived from statistical tests. To overcome this problem, we propose to generalize the precision of the results (the confidence intervals) by adjusting them so that they take into account the experimental design and the sampling methodology. Unfortunately, most software dedicated to statistical analyses do not offer options to adjust error bars. As a solution, we developed an open-access, open-source library for R—superb—that allows users to create summary plots with easily adjusted error bars.","PeriodicalId":55645,"journal":{"name":"Advances in Methods and Practices in Psychological Science","volume":"4 1","pages":""},"PeriodicalIF":13.6,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42212519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Justify Your Alpha: A Primer on Two Practical Approaches
Pub Date: 2021-06-14 | DOI: 10.1177/25152459221080396
Maximilian Maier, D. Lakens
The default use of an alpha level of .05 is suboptimal for two reasons. First, decisions based on data can be made more efficiently by choosing an alpha level that minimizes the combined Type 1 and Type 2 error rate. Second, it is possible that in studies with very high statistical power, p values lower than the alpha level can be more likely when the null hypothesis is true than when the alternative hypothesis is true (i.e., Lindley’s paradox). In this article, we explain two approaches that can be used to justify a better choice of an alpha level than relying on the default threshold of .05. The first approach is based on the idea of either minimizing or balancing Type 1 and Type 2 error rates. The second approach lowers the alpha level as a function of the sample size to prevent Lindley’s paradox. An R package and Shiny app are provided to perform the required calculations. Both approaches have their limitations (e.g., the challenge of specifying relative costs and priors) but can offer an improvement to current practices, especially when sample sizes are large. The use of alpha levels that are better justified should improve statistical inferences and can increase the efficiency and informativeness of scientific research.
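A minimal version of the first approach, picking the alpha level that minimizes the combined Type 1 and Type 2 error rate for a planned design, can be sketched in base R as follows. The sample size, expected effect size, and equal weighting of the two error types are arbitrary assumptions for illustration; the authors' R package and Shiny app implement the full procedure.

```r
# Minimal sketch: find the alpha level that minimizes the (equally weighted)
# combined Type 1 and Type 2 error rate for a planned two-sample t test.
# Design assumptions (n per group, expected effect size) are illustrative only.
n <- 100          # participants per group
d <- 0.5          # expected standardized effect size

alphas <- seq(0.001, 0.2, by = 0.001)
beta <- sapply(alphas, function(a)
  1 - power.t.test(n = n, delta = d, sd = 1, sig.level = a,
                   type = "two.sample")$power)

combined <- (alphas + beta) / 2          # equal weights and equal prior odds
alphas[which.min(combined)]              # alpha minimizing the combined error rate
```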
{"title":"Justify Your Alpha: A Primer on Two Practical Approaches","authors":"Maximilian Maier, D. Lakens","doi":"10.1177/25152459221080396","DOIUrl":"https://doi.org/10.1177/25152459221080396","url":null,"abstract":"The default use of an alpha level of .05 is suboptimal for two reasons. First, decisions based on data can be made more efficiently by choosing an alpha level that minimizes the combined Type 1 and Type 2 error rate. Second, it is possible that in studies with very high statistical power, p values lower than the alpha level can be more likely when the null hypothesis is true than when the alternative hypothesis is true (i.e., Lindley’s paradox). In this article, we explain two approaches that can be used to justify a better choice of an alpha level than relying on the default threshold of .05. The first approach is based on the idea to either minimize or balance Type 1 and Type 2 error rates. The second approach lowers the alpha level as a function of the sample size to prevent Lindley’s paradox. An R package and Shiny app are provided to perform the required calculations. Both approaches have their limitations (e.g., the challenge of specifying relative costs and priors) but can offer an improvement to current practices, especially when sample sizes are large. The use of alpha levels that are better justified should improve statistical inferences and can increase the efficiency and informativeness of scientific research.","PeriodicalId":55645,"journal":{"name":"Advances in Methods and Practices in Psychological Science","volume":" ","pages":""},"PeriodicalIF":13.6,"publicationDate":"2021-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43639586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluating Response Shift in Statistical Mediation Analysis
Pub Date: 2021-04-01 | DOI: 10.1177/25152459211012271
A. Georgeson, Matthew J. Valente, Oscar Gonzalez
Researchers and prevention scientists often develop interventions to target intermediate variables (known as mediators) that are thought to be related to an outcome. When researchers target a mediating construct measured by self-report, the meaning of the self-report measure could change from pretest to posttest for the individuals who received the intervention—which is a phenomenon referred to as response shift. As a result, any observed changes on the mediator measure across groups or across time might reflect a combination of true change on the construct and response shift. Although previous studies have focused on identifying the source and type of response shift in measures after an intervention, there has been limited research on how using sum scores in the presence of response shift affects the estimation of mediated effects via statistical mediation analysis, which is critical for explaining how the intervention worked. In this article, we focus on recalibration response shift, which is a change in internal standards of measurement and affects how respondents interpret the response scale. We provide background on the theory of response shift and the methodology used to detect response shift (i.e., tests of measurement invariance). In addition, we used simulated data sets to provide an illustration of how recalibration in the mediator can bias estimates of the mediated effect and affect Type I error and power.
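The invariance tests mentioned above are commonly run by comparing factor models with increasingly constrained parameters across groups or occasions. The lavaan sketch below, with simulated data and made-up item names, shows one generic way to do this; it is not the simulation code used in the article.

```r
# Minimal sketch: testing measurement invariance of a mediator measure across
# intervention groups with lavaan (simulated data, hypothetical item names).
library(lavaan)

set.seed(5)
pop <- ' mediator =~ 0.7*item1 + 0.7*item2 + 0.7*item3 + 0.7*item4 '
dat <- simulateData(pop, sample.nobs = 400)
dat$condition <- rep(c("treatment", "control"), each = 200)

model <- ' mediator =~ item1 + item2 + item3 + item4 '

configural <- cfa(model, data = dat, group = "condition")
metric     <- cfa(model, data = dat, group = "condition",
                  group.equal = "loadings")
scalar     <- cfa(model, data = dat, group = "condition",
                  group.equal = c("loadings", "intercepts"))

# A clear drop in fit when intercepts are constrained equal is one signal of
# recalibration-type response shift
anova(configural, metric, scalar)
```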
{"title":"Evaluating Response Shift in Statistical Mediation Analysis","authors":"A. Georgeson, Matthew J. Valente, Oscar Gonzalez","doi":"10.1177/25152459211012271","DOIUrl":"https://doi.org/10.1177/25152459211012271","url":null,"abstract":"Researchers and prevention scientists often develop interventions to target intermediate variables (known as mediators) that are thought to be related to an outcome. When researchers target a mediating construct measured by self-report, the meaning of the self-report measure could change from pretest to posttest for the individuals who received the intervention—which is a phenomenon referred to as response shift. As a result, any observed changes on the mediator measure across groups or across time might reflect a combination of true change on the construct and response shift. Although previous studies have focused on identifying the source and type of response shift in measures after an intervention, there has been limited research on how using sum scores in the presence of response shift affects the estimation of mediated effects via statistical mediation analysis, which is critical for explaining how the intervention worked. In this article, we focus on recalibration response shift, which is a change in internal standards of measurement and affects how respondents interpret the response scale. We provide background on the theory of response shift and the methodology used to detect response shift (i.e., tests of measurement invariance). In addition, we used simulated data sets to provide an illustration of how recalibration in the mediator can bias estimates of the mediated effect and affect Type I error and power.","PeriodicalId":55645,"journal":{"name":"Advances in Methods and Practices in Psychological Science","volume":" ","pages":""},"PeriodicalIF":13.6,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1177/25152459211012271","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44476513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving Transparency, Falsifiability, and Rigor by Making Hypothesis Tests Machine-Readable
Pub Date: 2021-04-01 | DOI: 10.1177/2515245920970949
D. Lakens, L. DeBruine
Making scientific information machine-readable greatly facilitates its reuse. Many scientific articles have the goal of testing a hypothesis, so making the tests of statistical predictions easier to find and access could be very beneficial. We propose an approach that can be used to make hypothesis tests machine-readable. We believe there are two benefits to specifying a hypothesis test in such a way that a computer can evaluate whether the statistical prediction is corroborated or not. First, hypothesis tests become more transparent, falsifiable, and rigorous. Second, scientists benefit if information related to hypothesis tests in scientific articles is easily findable and reusable, for example, to perform meta-analyses, conduct peer review, and examine metascientific research questions. We examine what a machine-readable hypothesis test should look like and demonstrate the feasibility of machine-readable hypothesis tests in a real-life example using the fully operational prototype R package scienceverse.
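As a generic illustration of the idea (not the interface of the scienceverse package itself), a hypothesis test can be stored as structured data, a test, a parameter, and a corroboration criterion, and then evaluated automatically:

```r
# Minimal sketch of a machine-readable hypothesis test: the prediction is stored
# as data and evaluated by code. Generic illustration only; this is not the
# scienceverse package's API.
hypothesis <- list(
  id        = "H1",
  test      = "t.test",
  formula   = y ~ group,
  parameter = "p.value",
  criterion = function(p) p < .05
)

set.seed(9)
dat <- data.frame(group = rep(c("a", "b"), each = 50),
                  y = c(rnorm(50, 0), rnorm(50, 0.6)))

result <- do.call(hypothesis$test, list(hypothesis$formula, data = dat))
corroborated <- hypothesis$criterion(result[[hypothesis$parameter]])
corroborated   # TRUE if the statistical prediction is corroborated
```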
{"title":"Improving Transparency, Falsifiability, and Rigor by Making Hypothesis Tests Machine-Readable","authors":"D. Lakens, L. DeBruine","doi":"10.1177/2515245920970949","DOIUrl":"https://doi.org/10.1177/2515245920970949","url":null,"abstract":"Making scientific information machine-readable greatly facilitates its reuse. Many scientific articles have the goal to test a hypothesis, so making the tests of statistical predictions easier to find and access could be very beneficial. We propose an approach that can be used to make hypothesis tests machine-readable. We believe there are two benefits to specifying a hypothesis test in such a way that a computer can evaluate whether the statistical prediction is corroborated or not. First, hypothesis tests become more transparent, falsifiable, and rigorous. Second, scientists benefit if information related to hypothesis tests in scientific articles is easily findable and reusable, for example, to perform meta-analyses, conduct peer review, and examine metascientific research questions. We examine what a machine-readable hypothesis test should look like and demonstrate the feasibility of machine-readable hypothesis tests in a real-life example using the fully operational prototype R package scienceverse.","PeriodicalId":55645,"journal":{"name":"Advances in Methods and Practices in Psychological Science","volume":" ","pages":""},"PeriodicalIF":13.6,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1177/2515245920970949","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44442991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Psychologists Should Use Brunner-Munzel’s Instead of Mann-Whitney’s U Test as the Default Nonparametric Procedure
Pub Date: 2021-04-01 | DOI: 10.1177/2515245921999602
J. Karch
To investigate whether a variable tends to be larger in one population than in another, the t test is the standard procedure. In some situations, the parametric t test is inappropriate, and a nonparametric procedure should be used instead. The default nonparametric procedure is Mann-Whitney’s U test. Despite being a nonparametric test, Mann-Whitney’s test is associated with a strong assumption, known as exchangeability. I demonstrate that if exchangeability is violated, Mann-Whitney’s test can lead to wrong statistical inferences even for large samples. In addition, I argue that in psychology, exchangeability is typically not met. As a remedy, I introduce Brunner-Munzel’s test and demonstrate that it provides good Type I error rate control even if exchangeability is not met and that it has similar power as Mann-Whitney’s test. Consequently, I recommend using Brunner-Munzel’s test by default. To facilitate this, I provide advice on how to perform and report on Brunner-Munzel’s test.
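To show what switching defaults looks like in practice, the sketch below runs both tests on simulated groups with equal centers but unequal spreads, a case in which exchangeability is violated. It assumes the lawstat package's brunner.munzel.test(); the brunnermunzel package offers an alternative implementation. The data and settings are invented for illustration, not taken from the article.

```r
# Minimal sketch: Mann-Whitney's U test vs. the Brunner-Munzel test on two
# groups with unequal variances (simulated data). Assumes the lawstat package,
# which provides brunner.munzel.test().
library(lawstat)

set.seed(11)
x <- rnorm(40, mean = 0, sd = 1)     # group 1
y <- rnorm(40, mean = 0, sd = 3)     # group 2: same center, larger spread

wilcox.test(x, y)                    # Mann-Whitney's U test (assumes exchangeability)
brunner.munzel.test(x, y)            # Brunner-Munzel test (no exchangeability assumption)
```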
{"title":"Psychologists Should Use Brunner-Munzel’s Instead of Mann-Whitney’s U Test as the Default Nonparametric Procedure","authors":"J. Karch","doi":"10.1177/2515245921999602","DOIUrl":"https://doi.org/10.1177/2515245921999602","url":null,"abstract":"To investigate whether a variable tends to be larger in one population than in another, the t test is the standard procedure. In some situations, the parametric t test is inappropriate, and a nonparametric procedure should be used instead. The default nonparametric procedure is Mann-Whitney’s U test. Despite being a nonparametric test, Mann-Whitney’s test is associated with a strong assumption, known as exchangeability. I demonstrate that if exchangeability is violated, Mann-Whitney’s test can lead to wrong statistical inferences even for large samples. In addition, I argue that in psychology, exchangeability is typically not met. As a remedy, I introduce Brunner-Munzel’s test and demonstrate that it provides good Type I error rate control even if exchangeability is not met and that it has similar power as Mann-Whitney’s test. Consequently, I recommend using Brunner-Munzel’s test by default. To facilitate this, I provide advice on how to perform and report on Brunner-Munzel’s test.","PeriodicalId":55645,"journal":{"name":"Advances in Methods and Practices in Psychological Science","volume":" ","pages":""},"PeriodicalIF":13.6,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1177/2515245921999602","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44939061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}