Pub Date: 2025-08-01 | Epub Date: 2023-07-27 | DOI: 10.1037/met0000561
Oliver Schmidt, Edgar Erdfelder, Daniel W Heck
Many psychological theories assume that observable responses are determined by multiple latent processes. Multinomial processing tree (MPT) models are a class of cognitive models for discrete responses that allow researchers to disentangle and measure such processes. Before applying MPT models to a specific psychological theory, it is necessary to tailor the model to the experimental design at hand. In this tutorial, we explain how to develop, fit, and test MPT models using the classical pair-clustering model as a running example. The first part covers the required data structures, model equations, identifiability, model validation, maximum-likelihood estimation, hypothesis tests, and power analyses using the software multiTree. The second part introduces hierarchical MPT modeling, which allows researchers to account for individual differences and to estimate correlations of latent processes with one another and with additional covariates using the TreeBUGS package in R. All examples, including data and annotated analysis scripts, are provided at the Open Science Framework (https://osf.io/24pbm/). (PsycInfo Database Record (c) 2025 APA, all rights reserved).
Published as "How to develop, test, and extend multinomial processing tree models: A tutorial," Psychological Methods, pp. 720-743.
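As a rough companion to the model equations and maximum-likelihood steps described in the abstract above, the sketch below fits the classical pair-clustering model to hypothetical category counts in base R. The Batchelder-Riefer parameterization (c, r, u) and the counts are assumptions for illustration; the tutorial itself works with multiTree and the TreeBUGS package rather than a hand-rolled optimizer.

```r
## Minimal ML fit of the classical pair-clustering MPT model (hypothetical counts).
##   n1 = both items of a pair recalled adjacently, n2 = both recalled non-adjacently,
##   n3 = exactly one item recalled,                n4 = neither item recalled.
n <- c(n1 = 45, n2 = 12, n3 = 28, n4 = 35)

# Category probabilities as functions of c (cluster storage),
# r (cluster retrieval), and u (storage-retrieval of singleton items).
pc_probs <- function(c, r, u) {
  c(c * r,
    (1 - c) * u^2,
    2 * (1 - c) * u * (1 - u),
    c * (1 - r) + (1 - c) * (1 - u)^2)
}

# Negative multinomial log-likelihood on the logit scale (keeps parameters in (0, 1)).
negLL <- function(theta, n) {
  p <- pc_probs(plogis(theta[1]), plogis(theta[2]), plogis(theta[3]))
  -sum(n * log(p))
}

fit <- optim(par = c(0, 0, 0), fn = negLL, n = n, method = "BFGS")
est <- plogis(fit$par)                     # ML estimates of c, r, u
names(est) <- c("c", "r", "u")
round(est, 3)
```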
Pub Date: 2025-08-01 | Epub Date: 2023-08-21 | DOI: 10.1037/met0000607
Haoran Li, Wen Luo, Eunkyeng Baek, Christopher G Thompson, Kwok Hap Lam
The outcomes in single-case experimental designs (SCEDs) are often counts or proportions. In our study, we provided an accessible illustration of a new class of generalized linear mixed models (GLMMs) for fitting count and proportion data from SCEDs. We also addressed important aspects of the GLMM framework, including overdispersion, estimation methods, statistical inference, model selection based on detecting overdispersion, and the interpretation of regression coefficients. We then demonstrated the GLMMs with two empirical examples featuring count and proportion outcomes in SCEDs. In addition, we conducted simulation studies to examine the performance of GLMMs in terms of bias and coverage rates for the immediate treatment effect and the treatment effect on the trend. We also examined the empirical Type I error rates of statistical tests. Finally, we provided recommendations, based on the simulation findings, about how to make sound statistical decisions when using GLMMs. Our hope is that this article will provide SCED researchers with the basic information necessary to conduct appropriate statistical analyses of count and proportion data in their own research and outline a future agenda for methodologists to explore the full potential of GLMMs for analyzing or meta-analyzing SCED data. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
Published as "Multilevel modeling in single-case studies with count and proportion data: A demonstration and evaluation," Psychological Methods, pp. 815-842.
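The core workflow described above, fitting a count outcome from an SCED with a GLMM and probing for overdispersion, can be sketched with lme4 as below. The data set, the variable names (case, session, phase, count), and the effect sizes are simulated assumptions rather than the article's empirical examples; the negative binomial refit is only one simple way to check overdispersion.

```r
library(lme4)
set.seed(1)

# Toy multiple-baseline data: overdispersed counts per session for six cases,
# with an immediate treatment effect (phase) and a treatment effect on the trend.
dat <- expand.grid(case = factor(1:6), session = 1:20)
dat$phase <- as.integer(dat$session > 10)                 # 0 = baseline, 1 = treatment
eta <- 1.2 + 0.04 * dat$session - 0.6 * dat$phase -
  0.03 * dat$phase * dat$session + rnorm(6, 0, 0.3)[as.integer(dat$case)]
dat$count <- rnbinom(nrow(dat), mu = exp(eta), size = 2)  # size = 2 -> overdispersion

# Poisson GLMM: 'phase' is the immediate effect, 'phase:session' the effect on the trend.
m_pois <- glmer(count ~ phase * session + (1 | case), data = dat, family = poisson)

# Negative binomial refit as a simple overdispersion check; a clearly better AIC
# suggests the Poisson model understates the residual variability.
m_nb <- glmer.nb(count ~ phase * session + (1 | case), data = dat)
AIC(m_pois, m_nb)
fixef(m_nb)
```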
Pub Date: 2025-08-01 | Epub Date: 2023-07-10 | DOI: 10.1037/met0000595
Rohit Batra, Simran K Johal, Meng Chen, Emilio Ferrer
Continuous-time (CT) models are a flexible approach for modeling longitudinal data of psychological constructs. When using CT models, a researcher can assume one underlying continuous function for the phenomenon of interest. In principle, these models overcome some limitations of discrete-time (DT) models and allow researchers to compare findings across measures collected using different time intervals, such as daily, weekly, or monthly intervals. Theoretically, the parameters for equivalent models can be rescaled into a common time interval that allows for comparisons across individuals and studies, irrespective of the time interval used for sampling. In this study, we carry out a Monte Carlo simulation to examine the capability of CT autoregressive (CT-AR) models to recover the true dynamics of a process when the sampling interval is different from the time scale of the true generating process. We use two generating time intervals (daily or weekly) with varying strengths of the AR parameter and assess its recovery when sampled at different intervals (daily, weekly, or monthly). Our findings indicate that sampling at a faster time interval than the generating dynamics can mostly recover the generating AR effects. Sampling at a slower time interval requires stronger generating AR effects for satisfactory recovery; otherwise, the estimation results show high bias and poor coverage. Based on our findings, we recommend that researchers choose sampling intervals guided by theory about the variable under study and, whenever feasible, sample as frequently as possible. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
Published as "Consequences of sampling frequency on the estimated dynamics of AR processes using continuous-time models," Psychological Methods, pp. 843-865.
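The rescaling logic that motivates this simulation can be written down directly: for a univariate first-order process, the discrete-time AR coefficient implied at interval dt is phi(dt) = exp(a * dt), where a is the continuous-time drift. The short sketch below, which assumes a daily AR coefficient of .80, shows how quickly the implied effect shrinks under weekly or monthly sampling.

```r
## Rescaling a first-order autoregressive effect across sampling intervals,
## using the standard CT relation phi(dt) = exp(a * dt) for a univariate process.
phi_daily <- 0.80                 # assumed AR(1) coefficient at a 1-day interval
a <- log(phi_daily) / 1           # implied continuous-time drift (per day)

phi_at <- function(dt) exp(a * dt)
round(c(daily = phi_at(1), weekly = phi_at(7), monthly = phi_at(30)), 3)
# Weekly or monthly sampling of a daily process implies much weaker discrete-time
# AR effects (0.8^7 ~= 0.21, 0.8^30 ~= 0.001), which is why slow sampling makes
# the generating dynamics hard to recover unless the AR effect is strong.
```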
Pub Date: 2025-08-01 | Epub Date: 2023-07-27 | DOI: 10.1037/met0000587
Ed Donnellan, Satoshi Usami, Kou Murayama
In psychology, researchers often predict a dependent variable (DV) consisting of multiple measurements (e.g., scale items measuring a concept). To analyze the data, researchers typically aggregate (sum/average) scores across items and use this as a DV. Alternatively, they may define the DV as a common factor using structural equation modeling. However, both approaches neglect the possibility that an independent variable (IV) may have different relationships to individual items. This variance in individual item slopes arises because items are randomly sampled from an infinite pool of items reflecting the construct that the scale purports to measure. Here, we offer a mixed-effects model called random item slope regression, which accounts for both similarities and differences of individual item associations. Critically, we argue that random item slope regression poses an alternative measurement model to common factor models prevalent in psychology. Unlike these models, the proposed model supposes no latent constructs and instead assumes that individual items have direct causal relationships with the IV. Such operationalization is especially useful when researchers want to assess a broad construct with heterogeneous items. Using mathematical proof and simulation, we demonstrate that random item slopes cause inflation of Type I error when not accounted for, particularly when the sample size (number of participants) is large. In real-world data (n = 564 participants) using commonly used surveys and two reaction time tasks, we demonstrate that random item slopes are present at problematic levels. We further demonstrate that common statistical indices are not sufficient to diagnose the presence of random item slopes. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
Published as "Random item slope regression: An alternative measurement model that accounts for both similarities and differences in association with individual items," Psychological Methods, pp. 744-769.
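A minimal lme4 sketch of the proposed measurement model is given below: the person-level IV receives a fixed slope plus random slopes across items, alongside random intercepts for participants and items. The simulated data, the variable names (score, iv, id, item), and the effect sizes are assumptions for illustration, and the likelihood-ratio comparison is only a rough way to gauge whether item slopes vary.

```r
library(lme4)
set.seed(2)

# Toy long-format data: 200 participants x 10 items, with item-specific slopes for the IV.
n_id <- 200; n_item <- 10
dat <- expand.grid(id = factor(1:n_id), item = factor(1:n_item))
iv_person  <- rnorm(n_id)                                  # person-level predictor
item_slope <- rnorm(n_item, mean = 0.3, sd = 0.2)          # item-varying IV effects
dat$iv <- iv_person[as.integer(dat$id)]
dat$score <- 0.5 + item_slope[as.integer(dat$item)] * dat$iv +
  rnorm(n_id)[as.integer(dat$id)] * 0.4 + rnorm(nrow(dat), 0, 1)

# Random item slope regression: fixed IV effect plus item-varying IV slopes.
m_random_slope <- lmer(score ~ iv + (1 | id) + (1 + iv | item), data = dat)

# Conventional alternative without random item slopes (analogous to aggregating items).
m_fixed_slope  <- lmer(score ~ iv + (1 | id) + (1 | item), data = dat)
anova(m_fixed_slope, m_random_slope)   # tests whether item slopes vary
```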
Pub Date: 2025-08-01 | Epub Date: 2023-07-06 | DOI: 10.1037/met0000590
Marcos Jiménez, Francisco J Abad, Eduardo Garcia-Garzon, Hudson Golino, Alexander P Christensen, Luis Eduardo Garrido
The accuracy of factor retention methods for structures with one or more general factors, such as those typically encountered in fields like intelligence, personality, and psychopathology, has often been overlooked in dimensionality research. To address this issue, we compared the performance of several factor retention methods in this context, including a network psychometrics approach developed in this study. For estimating the number of group factors, these methods were the Kaiser criterion, empirical Kaiser criterion, parallel analysis with principal components (PAPCA) or principal axis, and exploratory graph analysis with Louvain clustering (EGALV). We then estimated the number of general factors using the factor scores of the first-order solution suggested by the best two methods, yielding a "second-order" version of PAPCA (PAPCA-FS) and EGALV (EGALV-FS). Additionally, we examined the direct multilevel solution provided by EGALV. All the methods were evaluated in an extensive simulation manipulating nine variables of interest, including population error. The results indicated that EGALV and PAPCA displayed the best overall performance in retrieving the true number of group factors, the former being more sensitive to high cross-loadings, and the latter to weak group factors and small samples. Regarding the estimation of the number of general factors, both PAPCA-FS and EGALV-FS showed a close to perfect accuracy across all the conditions, while EGALV was inaccurate. The methods based on EGA were robust to the conditions most likely to be encountered in practice. Therefore, we highlight the particular usefulness of EGALV (group factors) and EGALV-FS (general factors) for assessing bifactor structures with multiple general factors. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
Published as "Dimensionality assessment in bifactor structures with multiple general factors: A network psychometrics approach," Psychological Methods, pp. 770-792.
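To convey the idea behind exploratory graph analysis with Louvain clustering (EGALV), the sketch below builds a simple absolute-correlation network among items with igraph and counts the Louvain communities as an estimate of the number of group factors. This is a deliberately stripped-down stand-in: EGAnet's EGA() estimates a regularized partial-correlation network rather than raw correlations, and the two-factor toy data are an assumption for illustration.

```r
library(igraph)

# Estimate the number of dimensions by community detection on an item network.
estimate_dimensions <- function(items) {
  w <- abs(cor(items))                      # item-by-item association weights
  diag(w) <- 0
  g <- graph_from_adjacency_matrix(w, mode = "undirected", weighted = TRUE)
  communities <- cluster_louvain(g)         # Louvain community detection
  list(n_factors  = length(unique(membership(communities))),
       membership = membership(communities))
}

# Toy data with two group factors of five items each (loadings of .7, noise SD .5).
set.seed(3)
f <- matrix(rnorm(300 * 2), ncol = 2)
items <- cbind(f[, 1] %*% t(rep(0.7, 5)), f[, 2] %*% t(rep(0.7, 5))) +
  matrix(rnorm(300 * 10, 0, 0.5), ncol = 10)
estimate_dimensions(items)$n_factors        # expected: 2
```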
Pub Date: 2025-08-01 | Epub Date: 2023-08-10 | DOI: 10.1037/met0000591
James L Peugh, Kaylee Litson, David F Feldon
Decades of published methodological research have shown the chi-square test of model fit performs inconsistently and unreliably as a determinant of structural equation model (SEM) fit. Likewise, SEM indices of model fit, such as comparative fit index (CFI) and root-mean-square error of approximation (RMSEA) also perform inconsistently and unreliably. Despite rather unreliable ways to statistically assess model fit, researchers commonly rely on these methods for lack of a suitable inferential alternative. Marcoulides and Yuan (2017) have proposed the first inferential test of SEM fit in many years: an equivalence test adaptation of the RMSEA and CFI indices (i.e., RMSEA_t and CFI_t). However, the ability of this equivalence testing approach to accurately judge acceptable and unacceptable model fit has not been empirically tested. This fully crossed Monte Carlo simulation evaluated the accuracy of equivalence testing combining many of the same independent variable (IV) conditions used in previous fit index simulation studies, including sample size (N = 100-1,000), model specification (correctly specified or misspecified), model type (confirmatory factor analysis [CFA], path analysis, or SEM), number of variables analyzed (low or high), data distribution (normal or skewed), and missing data (none, 10%, or 25%). Results show equivalence testing performs rather inconsistently and unreliably across IV conditions, with acceptable or unacceptable RMSEA_t and CFI_t model fit index values often being contingent on complex interactions among conditions. Proportional z-tests and logistic regression analyses indicated that equivalence tests of model fit are problematic under multiple conditions, especially those where models are mildly misspecified. Recommendations for researchers are offered, but with the proviso that they be used with caution until more research and development is available. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
Published as "Equivalence testing to judge model fit: A Monte Carlo simulation," Psychological Methods, pp. 888-925.
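The inferential logic of an equivalence test of fit can be sketched with base R's noncentral chi-square functions: fit is declared acceptably close when the model chi-square falls below the alpha quantile of its distribution under a boundary RMSEA value epsilon0, using the noncentrality (N - 1) * df * epsilon0^2. This is only the generic construction; Marcoulides and Yuan's RMSEA_t and CFI_t adjusted cutoffs are not reproduced here, and the fit statistics plugged in below are assumptions.

```r
## Sketch of an RMSEA-based equivalence test of model fit: reject "non-equivalence"
## (i.e., conclude acceptably close fit) when the chi-square statistic falls below
## the alpha quantile of the noncentral chi-square implied by the boundary RMSEA.
rmsea_equiv_test <- function(chisq, df, N, epsilon0 = 0.05, alpha = 0.05) {
  ncp_bound <- (N - 1) * df * epsilon0^2
  crit <- qchisq(alpha, df = df, ncp = ncp_bound)   # lower-tail critical value
  list(statistic       = chisq,
       critical_value  = crit,
       equivalent_fit  = chisq < crit)
}

# Hypothetical fit result: chi-square = 92 on df = 80 with N = 400.
rmsea_equiv_test(chisq = 92, df = 80, N = 400)
```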
Pub Date: 2025-08-01 | Epub Date: 2023-07-20 | DOI: 10.1037/met0000581
Antoinette D A Kroes, Jason R Finley
Omega squared (ω²) is a measure of effect size for analysis of variance (ANOVA) designs. It is less biased than eta squared, but reported less often. This is in part due to lack of clear guidance on how to calculate it. In this paper, we discuss the logic behind effect size measures, the problem with eta squared, the history of omega squared, and why it has been underused. We then provide a user-friendly guide to omega squared and partial omega squared for ANOVA designs with fixed factors, including one-way, two-way, and three-way designs, using within-subjects factors and/or between-subjects factors. We show how to calculate omega squared using output from SPSS. We provide information on the calculation of confidence intervals. We examine the problems of nonadditivity, and intrinsic versus extrinsic factors. We argue that statistical package developers could play an important role in making the calculation of omega squared easier. Finally, we recommend that researchers report the formulas used in calculating effect sizes, include confidence intervals if possible, and include ANOVA tables in the online supplemental materials of their work. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
Published as "Demystifying omega squared: Practical guidance for effect size in common analysis of variance designs," Psychological Methods, pp. 866-887.
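The article works through the calculations with SPSS output; for readers using R, the one-way between-subjects case can be computed directly from the ANOVA table with the standard formula ω² = (SS_effect - df_effect * MS_error) / (SS_total + MS_error), as in this sketch.

```r
## Omega squared for a one-way between-subjects ANOVA with a fixed factor,
## computed from the sums of squares in the fitted model's ANOVA table.
omega_sq_oneway <- function(aov_fit) {
  tab <- summary(aov_fit)[[1]]
  ss_effect <- tab[1, "Sum Sq"];  df_effect <- tab[1, "Df"]
  ms_error  <- tab[2, "Mean Sq"]; ss_total  <- sum(tab[, "Sum Sq"])
  (ss_effect - df_effect * ms_error) / (ss_total + ms_error)
}

# Example with a built-in data set: does weight differ across the three groups?
fit <- aov(weight ~ group, data = PlantGrowth)
omega_sq_oneway(fit)   # omega squared; compare with eta squared = SS_effect / SS_total
```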
Pub Date: 2025-08-01 | Epub Date: 2023-08-10 | DOI: 10.1037/met0000603
Lydia Craig Aulisi, Hannah M Markell-Goldstein, Jose M Cortina, Carol M Wong, Xue Lei, Cyrus K Foroughi
Meta-analyses in the psychological sciences typically examine moderators that may explain heterogeneity in effect sizes. One of the most commonly examined moderators is gender. Overall, tests of gender as a moderator are rarely significant, which may be because effects rarely differ substantially between men and women. While this may be true in some cases, we also suggest that the lack of significant findings may be attributable to the way in which gender is examined as a meta-analytic moderator, such that detecting moderating effects is very unlikely even when such effects are substantial in magnitude. More specifically, we suggest that a lack of between-study variance in gender composition across primary studies makes it exceedingly difficult to detect moderation. That is, because primary studies tend to have similar male-to-female ratios, there is very little variance in gender composition between primary studies, making it nearly impossible to detect between-study differences in the relationship of interest as a function of gender. In the present article, we report results from two studies: (a) a meta-meta-analysis in which we demonstrate the magnitude of this problem by computing the between-study variance in gender composition across 286 meta-analytic moderation tests from 50 meta-analyses, and (b) a Monte Carlo simulation study in which we show that this lack of variance results in near-zero moderator effects even when male-female differences in correlations are quite large. Our simulations are also used to show the value of single-gender studies for detecting moderating effects. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
Published as "Detecting gender as a moderator in meta-analysis: The problem of restricted between-study variance," Psychological Methods, pp. 687-719.
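The restricted-variance argument is easy to reproduce in a toy simulation: even with a large true male-female difference in the correlation, a meta-regression on the proportion of female participants has almost no leverage when that proportion barely varies across primary studies. The sketch below uses metafor's rma(); every number in it (50 studies, n = 200 per study, proportions near .55) is an assumption for illustration.

```r
library(metafor)
set.seed(4)

k <- 50                                       # number of primary studies
n <- 200                                      # participants per study
p_female <- rnorm(k, mean = 0.55, sd = 0.03)  # restricted gender composition
rho_f <- 0.40; rho_m <- 0.10                  # large true male-female difference
rho_study <- p_female * rho_f + (1 - p_female) * rho_m  # simple mixture approximation

# Fisher-z effect sizes and sampling variances for each study.
yi <- atanh(rho_study) + rnorm(k, 0, sqrt(1 / (n - 3)))
vi <- rep(1 / (n - 3), k)
dat_ma <- data.frame(yi = yi, vi = vi, p_female = p_female)

# Meta-regression of effect size on gender composition: the moderator is rarely
# significant because p_female has almost no between-study variance.
summary(rma(yi, vi, mods = ~ p_female, data = dat_ma))
```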
Pub Date: 2025-08-01 | Epub Date: 2023-07-27 | DOI: 10.1037/met0000597
Peter M Steiner, Patrick Sheehan, Vivian C Wong
Given recent evidence challenging the replicability of results in the social and behavioral sciences, critical questions have been raised about appropriate measures for determining replication success when comparing effect estimates across studies. At issue is the fact that conclusions about replication success often depend on the measure used for evaluating correspondence in results. Despite the importance of choosing an appropriate measure, there is still no widespread agreement about which measures should be used. This article addresses these questions by describing formally the most commonly used measures for assessing replication success, and by comparing their performance in different contexts according to their replication probabilities, that is, the probability of obtaining replication success given study-specific settings. The measures may be characterized broadly as conclusion-based approaches, which assess the congruence of two independent studies' conclusions about the presence of an effect, and distance-based approaches, which test for a significant difference or equivalence of two effect estimates. We also introduce a new measure for assessing replication success called the correspondence test, which combines a difference and equivalence test in the same framework. To help researchers plan prospective replication efforts, we provide closed formulas for power calculations that can be used to determine the minimum detectable effect size (and thus, sample sizes) for each study so that a predetermined minimum replication probability can be achieved. Finally, we use a replication data set from the Open Science Collaboration (2015) to demonstrate the extent to which conclusions about replication success depend on the correspondence measure selected. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
Published as "Correspondence measures for assessing replication success," Psychological Methods, pp. 793-814.
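The two distance-based ingredients discussed above can be sketched as a pair of z tests on two independent effect estimates: a difference test of the null hypothesis of no difference, and a two-one-sided equivalence test of the null hypothesis that the difference exceeds a tolerance delta. The decision rule that combines them into the article's correspondence test, and the choice of delta, follow the paper itself; the function and example values below are assumptions for illustration.

```r
## Difference and equivalence tests for two independent effect estimates.
compare_estimates <- function(est1, se1, est2, se2, delta = 0.2, alpha = 0.05) {
  diff    <- est1 - est2
  se_diff <- sqrt(se1^2 + se2^2)

  # Difference test: H0 diff = 0 (two-sided).
  p_diff <- 2 * pnorm(-abs(diff) / se_diff)

  # Equivalence test (two one-sided tests): H0 |diff| >= delta.
  p_equiv <- max(pnorm((diff - delta) / se_diff),    # H0: diff >= delta
                 pnorm(-(diff + delta) / se_diff))   # H0: diff <= -delta

  list(difference_p = p_diff, equivalence_p = p_equiv,
       conclusion = if (p_diff < alpha) "significant difference"
                    else if (p_equiv < alpha) "significant equivalence"
                    else "inconclusive at this precision")
}

# Hypothetical original and replication estimates (standardized effects with SEs).
compare_estimates(est1 = 0.45, se1 = 0.08, est2 = 0.30, se2 = 0.10)
```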
Pub Date: 2025-07-31 | DOI: 10.1037/met0000783
Yutian T. Thompson, Yaqi Li, Hairong Song, David Bard
Published as "Joint variable selection in generalized linear mixed models with random regularized penalized quasi-likelihood technique," Psychological Methods.