British Journal of Mathematical & Statistical Psychology: Latest Articles

Idiographic interrater reliability measures for intensive longitudinal multirater data.
IF 1.8; Zone 3 (Psychology); Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-12-20; DOI: 10.1111/bmsp.70022
Tobias Koch, Miriam F Jaehne, Michaela Riediger, Antje Rauers, Jana Holtmann

Interrater reliability plays a crucial role in various areas of psychology. In this article, we propose a multilevel latent time series model for intensive longitudinal data with structurally different raters (e.g., self-reports and partner reports). The new MR-MLTS model enables researchers to estimate idiographic (person-specific) rater consistency coefficients for contemporaneous or dynamic rater agreement. Additionally, the model allows rater consistency coefficients to be linked to external explanatory or outcome variables. It can be implemented in Mplus as well as in the newly developed R package mlts. We illustrate the model using data from an intensive longitudinal multirater study involving 100 heterosexual couples (200 individuals) assessed across 86 time points. Our findings show that relationship duration and partner cognitive resources positively predict rater consistency for the innovations. Results from a simulation study indicate that the number of time points is critical for accurately estimating idiographic rater consistency coefficients, whereas the number of participants is important for accurately recovering the random effect variances. We discuss advantages, limitations, and future extensions of the MR-MLTS model.

Bayesian model averaging of (a)symmetric item response models in small samples.
IF 1.8; Zone 3 (Psychology); Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-12-16; DOI: 10.1111/bmsp.70024
Fabio Setti, Leah Feuerstahler

Asymmetric IRT models present theoretically desirable features, but often require large sample sizes for stable estimation due to additional item parameters. When applying item response theory (IRT) to small samples, it is often the case that only models with relatively few item parameters can be reliably estimated. Two recently developed asymmetric IRT models, the negative log-log and the complementary log-log, allow for different IRF shapes compared to conventional IRT models and can be fit with small samples. In this paper, we propose Bayesian model averaging (BMA) of simple symmetric and asymmetric IRT models to explore item asymmetry and to flexibly estimate IRFs in small samples. We also consider model averaging at both the item level and the test level. We first show the feasibility of the approach with an empirical example. Then, in a simulation study involving complex data-generating conditions and small sample sizes (i.e., 100 and 250), we show that averaging methods recover asymmetry in the data-generating process and consistently outperform model selection and kernel smoothing. The methods proposed in this study are a practical alternative to more complex asymmetric IRT models and may also be a useful method in exploratory semi-parametric IRT analysis.
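
The contrast between symmetric and asymmetric item response functions (IRFs) can be made concrete with a minimal sketch. It uses the standard logistic, complementary log-log, and negative log-log link functions in a one-parameter form; the single difficulty parameter `b`, the function names, and the averaging weights are illustrative assumptions, not the parameterization or weights used in the paper. In Bayesian model averaging the weights would be posterior model probabilities.

```python
import math

def irf_logistic(theta, b):
    """Symmetric logistic IRF with one difficulty parameter (assumed form)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def irf_cloglog(theta, b):
    """Complementary log-log IRF: P = 1 - exp(-exp(theta - b))."""
    return 1.0 - math.exp(-math.exp(theta - b))

def irf_negloglog(theta, b):
    """Negative log-log IRF: P = exp(-exp(-(theta - b)))."""
    return math.exp(-math.exp(-(theta - b)))

def averaged_irf(theta, b_by_model, weights):
    """Weighted average of the three IRFs; weights are hypothetical and must sum to 1."""
    models = {
        "logistic": irf_logistic,
        "cloglog": irf_cloglog,
        "negloglog": irf_negloglog,
    }
    return sum(weights[name] * models[name](theta, b_by_model[name]) for name in models)
```

A symmetric IRF satisfies P(b + d) + P(b - d) = 1 for every d. The logistic curve passes this check, while the two log-log curves do not, and they mirror each other: `irf_negloglog(x, 0)` equals `1 - irf_cloglog(-x, 0)`.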

Differential item functioning detection across multiple groups.
IF 1.8; Zone 3 (Psychology); Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-12-16; DOI: 10.1111/bmsp.70023
Michela Battauz

Differential item functioning (DIF) can be investigated by estimating item response theory (IRT) parameters separately for different respondent groups, thus allowing for the detection of discrepancies in parameter estimates across groups. However, before comparing the estimates, it is necessary to convert them to a common metric due to the constraints required to identify the model. These processes influence each other, as the presence of DIF items affects the estimation of scale conversion. This paper proposes a novel method that simultaneously performs scale conversion and DIF detection. By doing so, the estimated scale conversion automatically takes into account the presence of DIF. Differences in item parameter estimates across groups can be explained through variables at the within-group item level or by the group itself. Penalized likelihood estimation is used to perform an automatic selection of the item parameters that differ in some groups. Real-data applications and simulation studies show that the proposal performs well.
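
For orientation, the conventional two-step procedure that the abstract contrasts against can be sketched as follows: first convert one group's difficulty estimates to the other group's metric, then compare. The sketch uses the classical mean-sigma linking method, which is a stand-in chosen for brevity and not the penalized-likelihood method the paper proposes; the function names and the `threshold` value are hypothetical.

```python
from statistics import mean, pstdev

def mean_sigma_linking(b_focal, b_ref):
    """Linear conversion b* = A * b + B that matches the mean and SD of the difficulties."""
    A = pstdev(b_ref) / pstdev(b_focal)
    B = mean(b_ref) - A * mean(b_focal)
    return A, B

def convert_difficulties(b_focal, A, B):
    """Place focal-group difficulties on the reference-group metric."""
    return [A * b + B for b in b_focal]

def flag_dif(b_focal, b_ref, threshold=0.5):
    """Flag items whose converted difficulty differs from the reference by more than threshold."""
    A, B = mean_sigma_linking(b_focal, b_ref)
    converted = convert_difficulties(b_focal, A, B)
    return [abs(c - r) > threshold for c, r in zip(converted, b_ref)]
```

Because `A` and `B` are computed from all items, any item with DIF shifts the conversion itself, which is the mutual dependence between scale conversion and DIF detection that the abstract describes.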

Identifiability conditions in cognitive diagnosis: Implications for Q-matrix estimation algorithms.
IF 1.8; Zone 3 (Psychology); Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-12-12; DOI: 10.1111/bmsp.70020
Hyunjoo Kim, Hans Friedrich Köhn, Chia-Yi Chiu

The Q-matrix of a cognitively diagnostic assessment (CDA), documenting the item-attribute associations, is a key component of any CDA. However, the true Q-matrix underlying a CDA is never known and must be estimated, typically by content experts. Because human judgment is fallible, misspecifications of the Q-matrix may occur, resulting in the misclassification of examinees. In response to this challenge, algorithms have been developed to estimate the Q-matrix from item responses. Some algorithms impose identifiability conditions while others do not. The debate about which approach is "right" is ongoing, especially because these conditions are sufficient but not necessary, which means that viable alternative Q-matrix estimates may be ignored. In this study, the performance of Q-matrix estimation algorithms that impose identifiability conditions on the Q-matrix estimate was compared with that of estimation algorithms that do not impose such conditions. Large-scale simulations examined the impact of factors such as sample size, test length, attributes, and error levels. The estimated Q-matrices were evaluated for meeting identifiability conditions and for their accuracy in classifying examinees. The simulation results showed that, for the various estimation algorithms studied here, imposing identifiability conditions on Q-matrix estimation did not change outcomes with respect to identifiability or examinee classification.
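
To illustrate what checking an estimated Q-matrix against a structural condition can look like, here is a minimal sketch. The abstract does not say which identifiability conditions the study uses; the check below implements one condition that is commonly discussed in the cognitive diagnosis literature (the Q-matrix contains a single-attribute item for every attribute), and it is an assumption that this is among the conditions examined. The function name is hypothetical.

```python
def contains_identity(q_matrix):
    """Return True if every attribute is measured alone by at least one item.

    q_matrix is a list of rows (items), each a list of 0/1 entries (attributes).
    """
    n_attributes = len(q_matrix[0])
    # Column index of the lone 1 in each single-attribute item.
    single_attribute_items = {row.index(1) for row in q_matrix if sum(row) == 1}
    return single_attribute_items == set(range(n_attributes))
```

For example, `[[1, 0], [0, 1], [1, 1]]` passes the check, whereas `[[1, 0], [1, 1]]` does not, because no item measures the second attribute on its own.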

A multilevel Ornstein-Uhlenbeck process with individual- and variable-specific estimates as random effects.
IF 1.8; Zone 3 (Psychology); Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-12-08; DOI: 10.1111/bmsp.70019
José Ángel Martínez-Huertas, Emilio Ferrer

In the present study, we extend a stochastic differential equation (SDE) model, the Ornstein-Uhlenbeck (OU) process, to the simultaneous analysis of time series of multiple variables by means of random effects for individuals and variables using a Bayesian framework. This SDE model is a stationary Gauss-Markov process that varies over time around its mean. Our extension allows us to estimate the variability of different parameters of the process, such as the mean (μ) or the drift parameter (φ), across individuals and variables of the system by means of marginalized posterior distributions. We illustrate the estimations and the interpretability of the parameters of this multilevel OU process in an empirical study of affect dynamics where multiple individuals were measured on different variables at multiple time points. We also conducted a simulation study to evaluate whether the model can recover the population parameters generating the OU process. Our results support the use of this model to obtain both the general parameters (common to all individuals and variables) and the variable-specific point estimates (random effects). We conclude that this multilevel OU process with individual- and variable-specific estimates as random effects can be a useful approach to analyse time series for multiple variables simultaneously.
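
As background, a single-level OU process can be simulated exactly on a time grid. The sketch below assumes the common parameterization dX = φ(μ - X)dt + σ dW with φ > 0, so that φ acts as the rate of return to the mean μ; whether the paper uses this sign convention, and the symbol σ for the diffusion scale, are assumptions. The multilevel random-effects structure of the paper is not reproduced here.

```python
import math
import random

def ou_transition(x, mu, phi, sigma, dt):
    """Conditional mean and variance of X(t + dt) given X(t) = x, for phi > 0."""
    decay = math.exp(-phi * dt)
    cond_mean = mu + (x - mu) * decay
    cond_var = sigma ** 2 / (2.0 * phi) * (1.0 - decay ** 2)
    return cond_mean, cond_var

def simulate_ou(x0, mu, phi, sigma, dt, n_steps, seed=0):
    """Simulate one path using the exact Gaussian transition density."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        m, v = ou_transition(path[-1], mu, phi, sigma, dt)
        path.append(rng.gauss(m, math.sqrt(v)))
    return path
```

For long intervals the conditional mean approaches μ and the conditional variance approaches the stationary variance σ²/(2φ), which matches the abstract's description of a stationary process that varies around its mean.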

From tetrachoric to kappa: How to assess reliability on binary scales.
IF 1.8; Zone 3 (Psychology); Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-12-08; DOI: 10.1111/bmsp.70021
Sophie Vanbelle

Reliability is crucial in psychometrics, reflecting the extent to which a measurement instrument can discriminate between individuals or items. While classical test theory and intraclass correlation coefficients are well established for quantitative scales, estimating reliability for binary outcomes presents unique challenges due to their discrete nature. This paper reviews and links three major approaches to estimating reliability for single ratings on binary scales: the normal approximation approach, kappa coefficients, and the latent variable approach, which enables estimation at both latent and manifest scale levels. We clarify their conceptual relationships, show conditions for asymptotic equivalence, and evaluate their performance across two common study designs: repeatability and reproducibility studies. Then, we extend the Bayesian Dirichlet-multinomial method for estimating kappa coefficients to settings with more than two replicates, without requiring Bayesian software. Additionally, we introduce a Bayesian method to estimate manifest scale reliability from latent scale reliability that can be implemented in standard Bayesian software. A simulation study compares the statistical properties of the three major approaches across Bayesian and frequentist frameworks. Overall, the normal approximation approach performed poorly, and the frequentist approach was unreliable due to singularity issues. The findings offer further refined practical recommendations.
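
As a point of reference for the kappa family mentioned above, the following sketch computes Cohen's kappa from a 2x2 table of two binary ratings. Cohen's kappa is used here only as a familiar member of that family; it is an assumption that it is among the coefficients the paper covers, and the Bayesian Dirichlet-multinomial estimator described in the abstract is not reproduced.

```python
def cohen_kappa_binary(table):
    """Cohen's kappa for a 2x2 table [[a, b], [c, d]] of counts.

    Rows index the first rating (1, 0) and columns the second rating (1, 0).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    p_observed = (a + d) / n              # proportion of matching ratings
    p1_row = (a + b) / n                  # marginal P(first rating = 1)
    p1_col = (a + c) / n                  # marginal P(second rating = 1)
    p_expected = p1_row * p1_col + (1 - p1_row) * (1 - p1_col)
    return (p_observed - p_expected) / (1 - p_expected)
```

With the table `[[40, 10], [10, 40]]`, observed agreement is 0.8 and chance agreement is 0.5, so kappa is (0.8 - 0.5) / (1 - 0.5) = 0.6. The function is undefined when chance agreement equals 1, for example when both ratings are constant.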

Editorial acknowledgement
IF 1.8; Zone 3 (Psychology); Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-11-17; DOI: 10.1111/bmsp.70017
British Journal of Mathematical & Statistical Psychology, vol. 79, no. 1, pp. 229-230.
Sample size determination for hypothesis testing on the intraclass correlation coefficient in a two-way analysis of variance model.
IF 1.8; Zone 3 (Psychology); Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-11-14; DOI: 10.1111/bmsp.70016
Dipro Mondal, Alberto Cassese, Math J J M Candel, Sophie Vanbelle

Reliability evaluation is critical in fields such as psychology and medicine to ensure accurate diagnosis and effective treatment management. When participants are evaluated by the same raters, a two-way ANOVA model is suitable to model the data, with the intraclass correlation coefficient (ICC) serving as the reliability metric. In these domains, the ICC for agreement (ICCa) is commonly used, as the values of the measurements themselves are of interest. Designing such reliability studies requires determining the sample size of participants and raters for the ICCa. Although procedures for sample size determination exist based on the expected width of the confidence interval for the ICCa, there is limited work on hypothesis testing. This paper addresses this gap by proposing procedures to ensure sufficient power to statistically test whether the ICCa exceeds a predetermined value, utilizing confidence intervals for the ICCa. We compared the available confidence interval methods for the ICCa and proposed sample size procedures using the lower confidence limit of the best-performing methods. These procedures were evaluated considering the empirical power of the hypothesis test under various parameter configurations. Furthermore, these procedures are implemented in an interactive R Shiny app, freely available to researchers for determining sample sizes.
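
For readers unfamiliar with the ICC for agreement, it is the participant variance divided by the sum of participant, rater, and error variances in the two-way model. A minimal sketch of the usual mean-squares point estimate for single ratings follows; the function name is hypothetical, and the confidence-interval and sample-size procedures that are the paper's actual contribution are not reproduced.

```python
def icc_agreement(ratings):
    """Point estimate of the single-rating ICC for agreement.

    ratings is a list of rows (participants), each a list of scores from the same k raters.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)                 # participants
    ms_cols = ss_cols / (k - 1)                 # raters
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual
    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    return (ms_rows - ms_error) / denominator
```

If the second rater always scores one point higher than the first, as in `[[1, 2], [2, 3], [3, 4], [4, 5]]`, the ratings are perfectly consistent but not in perfect agreement, and the estimate is 10/13 (about 0.77) rather than 1. This is why the agreement version is preferred when the measurement values themselves matter.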

Generalized extreme value IRT models.
IF 1.8; Zone 3 (Psychology); Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-11-12; DOI: 10.1111/bmsp.70015
Jessica Alves, Jorge Bazán, Jorge González

This paper introduces two new Item Response Theory (IRT) models based on the Generalized Extreme Value (GEV) distribution. These new models have asymmetric item characteristic curves (ICCs), which have drawn growing interest, as they may better model actual item response behaviours in specific scenarios. The models are analysed using a Bayesian approach, and their properties are examined and discussed. The validity of the models is verified by means of extensive simulation studies that evaluate the sensitivity of the model to the choice of priors on the newly introduced item parameter, the accuracy of parameter recovery, and the capacity of model comparison criteria to choose the best model against other IRT models. The new models are exemplified using real data from two mathematics tests, one applied in Peruvian public schools and another administered to incoming university students in Chile. In both cases, the proposed models proved to be a promising alternative to asymmetric IRT models, offering new insights into item response modelling.

Reliability measures in knowledge structure theory.
IF 1.8 Zone 3 Psychology Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-11-01 DOI: 10.1111/bmsp.70013
Debora de Chiusole, Andrea Spoto, Umberto Granziol, Luca Stefanutti

Within the framework of knowledge structure theory (KST), this study evaluates the reliability of knowledge state estimation by introducing two key measures: the expected accuracy rate and the expected discrepancy. The expected accuracy rate quantifies the probability that the estimated knowledge state coincides with the true state, while the expected discrepancy measures the average deviation between the two when misclassification occurs. To support the theoretical framework, we provide an in-depth discussion of these indices, supplemented by two simulation studies and an empirical example. The simulation results reveal a trade-off between the number of items and the size of the knowledge structure. Specifically, smaller structures exhibit consistent accuracy across different error levels, while larger structures show increasing discrepancies as error rates rise. Nevertheless, in larger structures accuracy improves as the number of items grows, mitigating the impact of errors. Additionally, the expected discrepancy analysis shows that when misclassification occurs, the estimated state is generally close to the true one, which limits the effect of errors on the assessment. Finally, an empirical application using real assessment data demonstrates the practical relevance of the proposed measures. These results suggest that KST-based assessments provide reliable and meaningful diagnostic information, highlighting their potential for use in educational and psychological testing.
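
The two indices can be illustrated with a toy calculation. The sketch below is not the authors' implementation: it assumes a basic local independence model with a single careless-error rate `beta` and a single lucky-guess rate `eta` shared by all items, a modal-posterior estimate of the knowledge state, and the size of the symmetric difference as the distance between states. None of these choices is stated in the abstract, and the function names are hypothetical.

```python
from itertools import product


def pattern_likelihood(pattern, state, beta, eta):
    """P(response pattern | knowledge state) under local independence (assumed model)."""
    p = 1.0
    for item, correct in enumerate(pattern):
        if item in state:
            # Mastered item: correct unless a careless error occurs.
            p *= (1 - beta) if correct else beta
        else:
            # Non-mastered item: correct only through a lucky guess.
            p *= eta if correct else (1 - eta)
    return p


def reliability_indices(states, prior, beta, eta, n_items):
    """Return (expected accuracy rate, expected discrepancy given misclassification)."""
    hit_prob = 0.0   # P(estimated state == true state)
    miss_prob = 0.0  # P(estimated state != true state)
    miss_dist = 0.0  # accumulated distance over misclassified cases
    # Enumerate every possible response pattern (feasible only for small tests).
    for pattern in product((0, 1), repeat=n_items):
        # Joint probability of this pattern and each candidate true state.
        joint = [pattern_likelihood(pattern, s, beta, eta) * w
                 for s, w in zip(states, prior)]
        # Modal-posterior estimate: the state with the largest joint probability.
        est = max(range(len(states)), key=joint.__getitem__)
        for k, j in enumerate(joint):
            if k == est:
                hit_prob += j
            else:
                miss_prob += j
                miss_dist += j * len(states[k] ^ states[est])
    discrepancy = miss_dist / miss_prob if miss_prob > 0 else 0.0
    return hit_prob, discrepancy


# Hypothetical three-item structure in which items are mastered in a fixed order.
chain = [frozenset(), frozenset({0}), frozenset({0, 1}), frozenset({0, 1, 2})]
uniform = [0.25] * len(chain)
accuracy, discrepancy = reliability_indices(chain, uniform, beta=0.1, eta=0.1, n_items=3)
print(f"expected accuracy rate: {accuracy:.3f}")
print(f"expected discrepancy when misclassified: {discrepancy:.3f}")
```

Exhaustive enumeration is used here only because the example is tiny; the number of response patterns doubles with every added item, so a realistic test length would call for simulation instead.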
