Pub Date: 2025-10-01 | Epub Date: 2023-09-07 | DOI: 10.1037/met0000602
Irene Klugkist, Thom Benjamin Volker
To establish a theory, one needs cleverly designed and well-executed studies with appropriate and correctly interpreted statistical analyses. Equally important, one also needs replications of such studies and a way to combine the results of several replications into an accumulated state of knowledge. An approach that provides an appropriate and powerful analysis for studies targeting prespecified theories is Bayesian informative hypothesis testing. An additional advantage of this Bayesian approach is that combining the results from multiple studies is straightforward. In this article, we discuss the behavior of Bayes factors in the context of evaluating informative hypotheses with multiple studies. By using simple models and (partly) analytical solutions, we introduce and evaluate Bayesian evidence synthesis (BES) and compare its results to Bayesian sequential updating. By doing so, we clarify how different replication or updating questions can be evaluated. In addition, we illustrate BES with two simulations, in which multiple studies are generated to resemble conceptual replications. The studies in these simulations are too heterogeneous to be aggregated with conventional research synthesis methods. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
{"title":"Bayesian evidence synthesis for informative hypotheses: An introduction.","authors":"Irene Klugkist, Thom Benjamin Volker","doi":"10.1037/met0000602","DOIUrl":"10.1037/met0000602","url":null,"abstract":"<p><p>To establish a theory one needs cleverly designed and well-executed studies with appropriate and correctly interpreted statistical analyses. Equally important, one also needs replications of such studies and a way to combine the results of several replications into an accumulated state of knowledge. An approach that provides an appropriate and powerful analysis for studies targeting prespecified theories is the use of Bayesian informative hypothesis testing. An additional advantage of the use of this Bayesian approach is that combining the results from multiple studies is straightforward. In this article, we discuss the behavior of Bayes factors in the context of evaluating informative hypotheses with multiple studies. By using simple models and (partly) analytical solutions, we introduce and evaluate Bayesian evidence synthesis (BES) and compare its results to Bayesian sequential updating. By doing so, we clarify how different replications or updating questions can be evaluated. In addition, we illustrate BES with two simulations, in which multiple studies are generated to resemble conceptual replications. The studies in these simulations are too heterogeneous to be aggregated with conventional research synthesis methods. (PsycInfo Database Record (c) 2025 APA, all rights reserved).</p>","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":" ","pages":"949-965"},"PeriodicalIF":7.8,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10173540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-01 | Epub Date: 2023-12-25 | DOI: 10.1037/met0000624
Esther Maassen, E Damiano D'Urso, Marcel A L M van Assen, Michèle B Nuijten, Kim De Roover, Jelte M Wicherts
Self-report scales are widely used in psychology to compare means in latent constructs across groups, experimental conditions, or time points. However, for these comparisons to be meaningful and unbiased, the scales must demonstrate measurement invariance (MI) across compared time points or (experimental) groups. MI testing determines whether the latent constructs are measured equivalently across groups or time, which is essential for meaningful comparisons. We conducted a systematic review of 426 psychology articles with openly available data to (a) examine common practices in conducting and reporting MI testing, (b) assess whether we could reproduce the reported MI results, and (c) conduct MI tests for the comparisons that enabled sufficiently powerful MI testing. We identified 96 articles that contained a total of 929 comparisons. Results showed that only 4% of the 929 comparisons underwent MI testing, and the tests were generally poorly reported. None of the reported MI tests were reproducible, and only 26% of the 174 newly performed MI tests reached sufficient (scalar) invariance, with MI failing completely in 58% of tests. Exploratory analyses suggested that in nearly half of the comparisons where configural invariance was rejected, the number of factors differed between groups. These results indicate that MI tests are rarely conducted and poorly reported in psychological studies. We observed frequent violations of MI, suggesting that reported differences between (experimental) groups may not be solely attributed to group differences in the latent constructs. We offer recommendations aimed at improving reporting and computational reproducibility practices in psychology. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
{"title":"The dire disregard of measurement invariance testing in psychological science.","authors":"Esther Maassen, E Damiano D'Urso, Marcel A L M van Assen, Michèle B Nuijten, Kim De Roover, Jelte M Wicherts","doi":"10.1037/met0000624","DOIUrl":"10.1037/met0000624","url":null,"abstract":"<p><p>Self-report scales are widely used in psychology to compare means in latent constructs across groups, experimental conditions, or time points. However, for these comparisons to be meaningful and unbiased, the scales must demonstrate measurement invariance (MI) across compared time points or (experimental) groups. MI testing determines whether the latent constructs are measured equivalently across groups or time, which is essential for meaningful comparisons. We conducted a systematic review of 426 psychology articles with openly available data, to (a) examine common practices in conducting and reporting of MI testing, (b) assess whether we could reproduce the reported MI results, and (c) conduct MI tests for the comparisons that enabled sufficiently powerful MI testing. We identified 96 articles that contained a total of 929 comparisons. Results showed that only 4% of the 929 comparisons underwent MI testing, and the tests were generally poorly reported. None of the reported MI tests were reproducible, and only 26% of the 174 newly performed MI tests reached sufficient (scalar) invariance, with MI failing completely in 58% of tests. Exploratory analyses suggested that in nearly half of the comparisons where configural invariance was rejected, the number of factors differed between groups. These results indicate that MI tests are rarely conducted and poorly reported in psychological studies. We observed frequent violations of MI, suggesting that reported differences between (experimental) groups may not be solely attributed to group differences in the latent constructs. We offer recommendations aimed at improving reporting and computational reproducibility practices in psychology. (PsycInfo Database Record (c) 2025 APA, all rights reserved).</p>","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":" ","pages":"966-979"},"PeriodicalIF":7.8,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139037948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-01 | Epub Date: 2023-10-16 | DOI: 10.1037/met0000615
Mar J F Ollero, Eduardo Estrada, Michael D Hunter, Pablo F Cáncer
People show stable differences in the way their affect fluctuates over time. Within the general framework of dynamical systems, the damped linear oscillator (DLO) model has been proposed as a useful approach to study affect dynamics. The DLO model can be applied to repeated measures provided by a single individual, and the resulting parameters can capture relevant features of the person's affect dynamics. Focusing on negative affect, we provide an accessible interpretation of the DLO model parameters in terms of emotional lability, resilience, and vulnerability. We conducted a Monte Carlo study to test the DLO model performance under different empirically relevant conditions in terms of individual characteristics and sampling scheme. We used state-space models in continuous time. The results show that, under certain conditions, the DLO model is able to accurately and efficiently recover the parameters underlying the affective dynamics of a single individual. We discuss the results and the theoretical and practical implications of using this model, illustrate how to use it for studying psychological phenomena at the individual level, and provide specific recommendations on how to collect data for this purpose. We also provide a tutorial website and computer code in R to implement this approach. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
{"title":"Characterizing affect dynamics with a damped linear oscillator model: Theoretical considerations and recommendations for individual-level applications.","authors":"Mar J F Ollero, Eduardo Estrada, Michael D Hunter, Pablo F Cáncer","doi":"10.1037/met0000615","DOIUrl":"10.1037/met0000615","url":null,"abstract":"<p><p>People show stable differences in the way their affect fluctuates over time. Within the general framework of dynamical systems, the damped linear oscillator (DLO) model has been proposed as a useful approach to study affect dynamics. The DLO model can be applied to repeated measures provided by a single individual, and the resulting parameters can capture relevant features of the person's affect dynamics. Focusing on negative affect, we provide an accessible interpretation of the DLO model parameters in terms of emotional lability, resilience, and vulnerability. We conducted a Monte Carlo study to test the DLO model performance under different empirically relevant conditions in terms of individual characteristics and sampling scheme. We used state-space models in continuous time. The results show that, under certain conditions, the DLO model is able to accurately and efficiently recover the parameters underlying the affective dynamics of a single individual. We discuss the results and the theoretical and practical implications of using this model, illustrate how to use it for studying psychological phenomena at the individual level, and provide specific recommendations on how to collect data for this purpose. We also provide a tutorial website and computer code in R to implement this approach. (PsycInfo Database Record (c) 2025 APA, all rights reserved).</p>","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":" ","pages":"1095-1112"},"PeriodicalIF":7.8,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41238100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-01 | Epub Date: 2023-08-10 | DOI: 10.1037/met0000605
Miriam Brinberg, Graham D Bodie, Denise H Solomon, Susanne M Jones, Nilam Ram
Several theoretical perspectives suggest that dyadic experiences are distinguished by patterns of behavioral change that emerge during interactions. Methods for examining change in behavior over time are well elaborated for the study of change along continuous dimensions. Extensions for charting increases and decreases in individuals' use of specific, categorically defined behaviors, however, are rarely invoked. Greater accessibility of Bayesian frameworks that facilitate formulation and estimation of the requisite models is opening new possibilities. This article provides a primer on how multinomial logistic growth models can be used to examine between-dyad differences in within-dyad behavioral change over the course of an interaction. We describe and illustrate how these models are implemented in the Bayesian framework using data from support conversations between strangers (N = 118 dyads) to examine (RQ1) how six types of listeners' and disclosers' behaviors change as support conversations unfold and (RQ2) how the disclosers' preconversation distress moderates the change in conversation behaviors. The primer concludes with a series of notes on (a) implications of modeling choices, (b) flexibility in modeling nonlinear change, (c) necessity for theory that specifies how and why change trajectories differ, and (d) how multinomial logistic growth models can help refine current theory about dyadic interaction. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
{"title":"Examining individual differences in how interaction behaviors change over time: A dyadic multinomial logistic growth modeling approach.","authors":"Miriam Brinberg, Graham D Bodie, Denise H Solomon, Susanne M Jones, Nilam Ram","doi":"10.1037/met0000605","DOIUrl":"10.1037/met0000605","url":null,"abstract":"<p><p>Several theoretical perspectives suggest that dyadic experiences are distinguished by patterns of behavioral change that emerge during interactions. Methods for examining change in behavior over time are well elaborated for the study of change along continuous dimensions. Extensions for charting increases and decreases in individuals' use of specific, categorically defined behaviors, however, are rarely invoked. Greater accessibility of Bayesian frameworks that facilitate formulation and estimation of the requisite models is opening new possibilities. This article provides a primer on how multinomial logistic growth models can be used to examine between-dyad differences in within-dyad behavioral change over the course of an interaction. We describe and illustrate how these models are implemented in the Bayesian framework using data from support conversations between strangers (<i>N</i> = 118 dyads) to examine (RQ1) how six types of listeners' and disclosers' behaviors change as support conversations unfold and (RQ2) how the disclosers' preconversation distress moderates the change in conversation behaviors. The primer concludes with a series of notes on (a) implications of modeling choices, (b) flexibility in modeling nonlinear change, (c) necessity for theory that specifies how and why change trajectories differ, and (d) how multinomial logistic growth models can help refine current theory about dyadic interaction. (PsycInfo Database Record (c) 2025 APA, all rights reserved).</p>","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":" ","pages":"1079-1094"},"PeriodicalIF":7.8,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9967422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-01 | Epub Date: 2023-12-21 | DOI: 10.1037/met0000630
Eunsook Kim, Yan Wang, Hsien-Yuan Hsu
Factor mixture modeling (FMM) incorporates both continuous latent variables and categorical latent variables in a single analytic model, clustering items and observations simultaneously. Two decades after the introduction of FMM to psychological and behavioral science research, it is an opportune time to review FMM applications to understand how the method is used in real-world research. We conducted a systematic review of 76 FMM applications. We developed a comprehensive coding scheme based on the current methodological literature on FMM and evaluated common usages and practices of FMM. Based on the review, we identify challenges and issues that applied researchers encounter in the practice of FMM and provide practical suggestions to promote well-informed decision making. Lastly, we discuss future methodological directions and suggest how FMM can be expanded beyond its typical use in applied studies. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
{"title":"A systematic review of and reflection on the applications of factor mixture modeling.","authors":"Eunsook Kim, Yan Wang, Hsien-Yuan Hsu","doi":"10.1037/met0000630","DOIUrl":"10.1037/met0000630","url":null,"abstract":"<p><p>Factor mixture modeling (FMM) incorporates both continuous latent variables and categorical latent variables in a single analytic model clustering items and observations simultaneously. After two decades since the introduction of FMM to psychological and behavioral science research, it is an opportune time to review FMM applications to understand how these applications are utilized in real-world research. We conducted a systematic review of 76 FMM applications. We developed a comprehensive coding scheme based on the current methodological literature of FMM and evaluated common usages and practices of FMM. Based on the review, we identify challenges and issues that applied researchers encounter in the practice of FMM and provide practical suggestions to promote well-informed decision making. Lastly, we discuss future methodological directions and suggest how FMM can be expanded beyond its typical use in applied studies. (PsycInfo Database Record (c) 2025 APA, all rights reserved).</p>","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":" ","pages":"997-1016"},"PeriodicalIF":7.8,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138831225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-01 | Epub Date: 2023-11-06 | DOI: 10.1037/met0000612
Manshu Yang, Darrell J Gaskin
Partially clustered designs are widely used in psychological research, especially in randomized controlled trials that examine the effectiveness of prevention or intervention strategies. In a partially clustered trial, individuals are clustered into intervention groups in one or more study arms, for the purpose of intervention delivery, whereas individuals in other arms (e.g., the waitlist control arm) are unclustered. Missing data are almost inevitable in partially clustered trials and could pose a major challenge in drawing valid research conclusions. This article focuses on handling auxiliary-variable-dependent missing at random data in partially clustered studies. Five methods were compared via a simulation study, including simultaneous multiple imputation using joint modeling (MI-JM-SIM), arm-specific multiple imputation using joint modeling (MI-JM-AS), arm-specific multiple imputation using substantive-model-compatible sequential modeling (MI-SMC-AS), sequential fully Bayesian estimation using noninformative priors (SFB-NON), and sequential fully Bayesian estimation using weakly informative priors (SFB-WEAK). The results suggest that the MI-JM-AS method outperformed other methods when the variables with missing values only involved fixed effects, whereas the MI-SMC-AS method was preferred if the incomplete variables featured random effects. Applications of different methods are also illustrated using an empirical data example. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
{"title":"Handling missing data in partially clustered randomized controlled trials.","authors":"Manshu Yang, Darrell J Gaskin","doi":"10.1037/met0000612","DOIUrl":"10.1037/met0000612","url":null,"abstract":"<p><p>Partially clustered designs are widely used in psychological research, especially in randomized controlled trials that examine the effectiveness of prevention or intervention strategies. In a partially clustered trial, individuals are clustered into intervention groups in one or more study arms, for the purpose of intervention delivery, whereas individuals in other arms (e.g., the waitlist control arm) are unclustered. Missing data are almost inevitable in partially clustered trials and could pose a major challenge in drawing valid research conclusions. This article focuses on handling auxiliary-variable-dependent missing at random data in partially clustered studies. Five methods were compared via a simulation study, including simultaneous multiple imputation using joint modeling (MI-JM-SIM), arm-specific multiple imputation using joint modeling (MI-JM-AS), arm-specific multiple imputation using substantive-model-compatible sequential modeling (MI-SMC-AS), sequential fully Bayesian estimation using noninformative priors (SFB-NON), and sequential fully Bayesian estimation using weakly informative priors (SFB-WEAK). The results suggest that the MI-JM-AS method outperformed other methods when the variables with missing values only involved fixed effects, whereas the MI-SMC-AS method was preferred if the incomplete variables featured random effects. Applications of different methods are also illustrated using an empirical data example. (PsycInfo Database Record (c) 2025 APA, all rights reserved).</p>","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":" ","pages":"927-948"},"PeriodicalIF":7.8,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11906213/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71485253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-01 | Epub Date: 2023-03-23 | DOI: 10.1037/met0000572
Samantha F Anderson, Xinran Liu
Despite increased attention to open science and transparency, questionable research practices (QRPs) remain common, and studies published using QRPs will remain a part of the published record for some time. A particularly common type of QRP involves multiple testing, and in some forms of this, researchers report only a selection of the tests conducted. Methodological investigations of multiple testing and QRPs have often focused on implications for a single study, as well as how these practices can increase the likelihood of false positive results. However, it is illuminating to consider the role of these QRPs from a broader, literature-wide perspective, focusing on consequences that affect the interpretability of results across the literature. In this article, we use a Monte Carlo simulation study to explore the consequences of two QRPs involving multiple testing, cherry picking and question trolling, on effect size bias and heterogeneity among effect sizes. Importantly, we explicitly consider the role of real-world conditions, including sample size, effect size, and publication bias, that amend the influence of these QRPs. Results demonstrated that QRPs can substantially affect both bias and heterogeneity, although there were many nuances, particularly relating to the influence of publication bias, among other factors. The present study adds a new perspective to how QRPs may influence researchers' ability to evaluate a literature accurately and cumulatively, and points toward yet another reason to continue to advocate for initiatives that reduce QRPs. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
{"title":"Questionable research practices and cumulative science: The consequences of selective reporting on effect size bias and heterogeneity.","authors":"Samantha F Anderson, Xinran Liu","doi":"10.1037/met0000572","DOIUrl":"10.1037/met0000572","url":null,"abstract":"<p><p>Despite increased attention to open science and transparency, questionable research practices (QRPs) remain common, and studies published using QRPs will remain a part of the published record for some time. A particularly common type of QRP involves multiple testing, and in some forms of this, researchers report only a selection of the tests conducted. Methodological investigations of multiple testing and QRPs have often focused on implications for a single study, as well as how these practices can increase the likelihood of false positive results. However, it is illuminating to consider the role of these QRPs from a broader, literature-wide perspective, focusing on consequences that affect the interpretability of results across the literature. In this article, we use a Monte Carlo simulation study to explore the consequences of two QRPs involving multiple testing, cherry picking and question trolling, on effect size bias and heterogeneity among effect sizes. Importantly, we explicitly consider the role of real-world conditions, including sample size, effect size, and publication bias, that amend the influence of these QRPs. Results demonstrated that QRPs can substantially affect both bias and heterogeneity, although there were many nuances, particularly relating to the influence of publication bias, among other factors. The present study adds a new perspective to how QRPs may influence researchers' ability to evaluate a literature accurately and cumulatively, and points toward yet another reason to continue to advocate for initiatives that reduce QRPs. (PsycInfo Database Record (c) 2025 APA, all rights reserved).</p>","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":" ","pages":"1017-1042"},"PeriodicalIF":7.8,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9367002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-01 | Epub Date: 2024-02-08 | DOI: 10.1037/met0000646
Timothy Hayes
Multilevel models allow researchers to test hypotheses at multiple levels of analysis; for example, assessing the effects of both individual-level and school-level predictors on a target outcome. To assess these effects with the greatest clarity, researchers are well-advised to cluster-mean center all Level 1 predictors and explicitly incorporate the cluster means into the model at Level 2. When an outcome of interest is continuous, this unconflated model specification serves both to increase model accuracy, by separating the level-specific effects of each predictor, and to increase model interpretability, by reframing the random intercepts as unadjusted cluster means. When an outcome of interest is binary or ordinal, however, only the first of these benefits is fully realized: In these models, the intuitive cluster mean interpretations of Level 2 effects are only available on the metric of the linear predictor (e.g., the logit) or, equivalently, the latent response propensity y*ij. Because the calculations for obtaining predicted probabilities, odds, and ORs operate on the entire combined model equation, the interpretations of these quantities are inextricably tied to individual-level, rather than cluster-level, outcomes. This is unfortunate, given that the probability and odds metrics are often of greatest interest to researchers in practice. To address this issue, I propose a novel rescaling method designed to calculate cluster average success proportions, odds, and ORs in two-level binary and ordinal logistic and probit models. I apply the approach to a real data example and provide supplemental R functions to help users implement the method easily. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
{"title":"Individual-level probabilities and cluster-level proportions: Toward interpretable level 2 estimates in unconflated multilevel models for binary outcomes.","authors":"Timothy Hayes","doi":"10.1037/met0000646","DOIUrl":"10.1037/met0000646","url":null,"abstract":"<p><p>Multilevel models allow researchers to test hypotheses at multiple levels of analysis-for example, assessing the effects of both individual-level and school-level predictors on a target outcome. To assess these effects with the greatest clarity, researchers are well-advised to cluster mean center all Level 1 predictors and explicitly incorporate the cluster means into the model at Level 2. When an outcome of interest is continuous, this unconflated model specification serves both to increase model accuracy, by separating the level-specific effects of each predictor, and to increase model interpretability, by reframing the random intercepts as unadjusted cluster means. When an outcome of interest is binary or ordinal, however, only the first of these benefits is fully realized: In these models, the intuitive cluster mean interpretations of Level 2 effects are only available on the metric of the linear predictor (e.g., the logit) or, equivalently, the latent response propensity, <i>y</i><sub>ij</sub>∗. Because the calculations for obtaining predicted probabilities, odds, and <i>OR</i>s operate on the entire combined model equation, the interpretations of these quantities are inextricably tied to individual-level, rather than cluster-level, outcomes. This is unfortunate, given that the probability and odds metrics are often of greatest interest to researchers in practice. To address this issue, I propose a novel rescaling method designed to calculate cluster average success proportions, odds, and <i>OR</i>s in two-level binary and ordinal logistic and probit models. I apply the approach to a real data example and provide supplemental R functions to help users implement the method easily. (PsycInfo Database Record (c) 2025 APA, all rights reserved).</p>","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":" ","pages":"1113-1132"},"PeriodicalIF":7.8,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139707688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-01 | Epub Date: 2024-01-25 | DOI: 10.1037/met0000621
Daniel J Schad, Bruno Nicenboim, Shravan Vasishth
Bayesian linear mixed-effects models (LMMs) and Bayesian analysis of variance (ANOVA) are increasingly being used in the cognitive sciences to perform null hypothesis tests, where a null hypothesis that an effect is zero is compared with an alternative hypothesis that the effect exists and is different from zero. While software tools for Bayes factor null hypothesis tests are easily accessible, how to specify the data and the model correctly is often not clear. In Bayesian approaches, many authors use data aggregation at the by-subject level and estimate Bayes factors on aggregated data. Here, we use simulation-based calibration for model inference applied to several example experimental designs to demonstrate that, as with frequentist analysis, such null hypothesis tests on aggregated data can be problematic in Bayesian analysis. Specifically, when random slope variances differ (i.e., violated sphericity assumption), Bayes factors are too conservative for contrasts where the variance is small and they are too liberal for contrasts where the variance is large. Running Bayesian ANOVA on aggregated data can, if the sphericity assumption is violated, likewise lead to biased Bayes factor results. Moreover, Bayes factors for by-subject aggregated data are biased (too liberal) when random item slope variance is present but ignored in the analysis. These problems can be circumvented or reduced by running Bayesian LMMs on nonaggregated data such as on individual trials, and by explicitly modeling the full random effects structure. Reproducible code is available from https://osf.io/mjf47/. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
{"title":"Data aggregation can lead to biased inferences in Bayesian linear mixed models and Bayesian analysis of variance.","authors":"Daniel J Schad, Bruno Nicenboim, Shravan Vasishth","doi":"10.1037/met0000621","DOIUrl":"10.1037/met0000621","url":null,"abstract":"<p><p>Bayesian linear mixed-effects models (LMMs) and Bayesian analysis of variance (ANOVA) are increasingly being used in the cognitive sciences to perform null hypothesis tests, where a null hypothesis that an effect is zero is compared with an alternative hypothesis that the effect exists and is different from zero. While software tools for Bayes factor null hypothesis tests are easily accessible, how to specify the data and the model correctly is often not clear. In Bayesian approaches, many authors use data aggregation at the by-subject level and estimate Bayes factors on aggregated data. Here, we use simulation-based calibration for model inference applied to several example experimental designs to demonstrate that, as with frequentist analysis, such null hypothesis tests on aggregated data can be problematic in Bayesian analysis. Specifically, when random slope variances differ (i.e., violated sphericity assumption), Bayes factors are too conservative for contrasts where the variance is small and they are too liberal for contrasts where the variance is large. Running Bayesian ANOVA on aggregated data can-if the sphericity assumption is violated-likewise lead to biased Bayes factor results. Moreover, Bayes factors for by-subject aggregated data are biased (too liberal) when random item slope variance is present but ignored in the analysis. These problems can be circumvented or reduced by running Bayesian LMMs on nonaggregated data such as on individual trials, and by explicitly modeling the full random effects structure. Reproducible code is available from https://osf.io/mjf47/. (PsycInfo Database Record (c) 2025 APA, all rights reserved).</p>","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":" ","pages":"1133-1168"},"PeriodicalIF":7.8,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139564771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-01 | Epub Date: 2023-11-13 | DOI: 10.1037/met0000614
Craig K Enders, Brian T Keller, Michael P Woller
Estimating power for multilevel models is complex because there are many moving parts, several sources of variation to consider, and unique sample sizes at Level 1 and Level 2. Monte Carlo computer simulation is a flexible tool that has received considerable attention in the literature. However, much of the work to date has focused on very simple models with one predictor at each level and one cross-level interaction effect, and approaches that do not share this limitation require users to specify a large set of population parameters. The goal of this tutorial is to describe a flexible Monte Carlo approach that accommodates a broad class of multilevel regression models with continuous outcomes. Our tutorial makes three important contributions. First, it allows any number of within-cluster effects, between-cluster effects, covariate effects at either level, cross-level interactions, and random coefficients. Moreover, we do not assume orthogonal effects, and predictors can correlate at either level. Second, our approach accommodates models with multiple interaction effects, and it does so with exact expressions for the variances and covariances of product random variables. Finally, our strategy for deriving hypothetical population parameters does not require pilot or comparable data. Instead, we use intuitive variance-explained effect size expressions to reverse-engineer solutions for the regression coefficients and variance components. We describe a new R package mlmpower that computes these solutions and automates the process of generating artificial data sets and summarizing the simulation results. The online supplemental materials provide detailed vignettes that annotate the R scripts and resulting output. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
{"title":"A simple Monte Carlo method for estimating power in multilevel designs.","authors":"Craig K Enders, Brian T Keller, Michael P Woller","doi":"10.1037/met0000614","DOIUrl":"10.1037/met0000614","url":null,"abstract":"<p><p>Estimating power for multilevel models is complex because there are many moving parts, several sources of variation to consider, and unique sample sizes at Level 1 and Level 2. Monte Carlo computer simulation is a flexible tool that has received considerable attention in the literature. However, much of the work to date has focused on very simple models with one predictor at each level and one cross-level interaction effect, and approaches that do not share this limitation require users to specify a large set of population parameters. The goal of this tutorial is to describe a flexible Monte Carlo approach that accommodates a broad class of multilevel regression models with continuous outcomes. Our tutorial makes three important contributions. First, it allows any number of within-cluster effects, between-cluster effects, covariate effects at either level, cross-level interactions, and random coefficients. Moreover, we do not assume orthogonal effects, and predictors can correlate at either level. Second, our approach accommodates models with multiple interaction effects, and it does so with exact expressions for the variances and covariances of product random variables. Finally, our strategy for deriving hypothetical population parameters does not require pilot or comparable data. Instead, we use intuitive variance-explained effect size expressions to reverse-engineer solutions for the regression coefficients and variance components. We describe a new R package mlmpower that computes these solutions and automates the process of generating artificial data sets and summarizing the simulation results. The online supplemental materials provide detailed vignettes that annotate the R scripts and resulting output. (PsycInfo Database Record (c) 2025 APA, all rights reserved).</p>","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":" ","pages":"980-996"},"PeriodicalIF":7.8,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"92156263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}