
British Journal of Mathematical & Statistical Psychology: Latest Articles

From tetrachoric to kappa: How to assess reliability on binary scales.
IF 1.8 Zone 3 Psychology Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-12-08 DOI: 10.1111/bmsp.70021
Sophie Vanbelle

Reliability is crucial in psychometrics, reflecting the extent to which a measurement instrument can discriminate between individuals or items. While classical test theory and intraclass correlation coefficients are well established for quantitative scales, estimating reliability for binary outcomes presents unique challenges due to their discrete nature. This paper reviews and links three major approaches to estimating reliability for single ratings on binary scales: the normal approximation approach, kappa coefficients, and the latent variable approach, which enables estimation at both latent and manifest scale levels. We clarify their conceptual relationships, show conditions for asymptotic equivalence, and evaluate their performance across two common study designs, repeatability and reproducibility studies. Then, we extend the Bayesian Dirichlet-multinomial method for estimating kappa coefficients to settings with more than two replicates, without requiring Bayesian software. Additionally, we introduce a Bayesian method to estimate manifest scale reliability from latent scale reliability that can be implemented in standard Bayesian software. A simulation study compares the statistical properties of the three major approaches across Bayesian and frequentist frameworks. Overall, the normal approximation approach performed poorly, and the frequentist approach was unreliable due to singularity issues. The findings offer further refined practical recommendations.
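
To make the Dirichlet-multinomial idea concrete, the following is a minimal Python sketch for the simplest case of two replicates only; it is not the paper's extension to more than two replicates. The 2x2 counts, the uniform prior weight of 1 per cell, and the function names are assumptions chosen for illustration. Because the Dirichlet prior is conjugate to the multinomial, posterior draws of the cell probabilities can be generated directly with NumPy, which is one way such an analysis can avoid dedicated Bayesian software.

```python
import numpy as np

def kappa_from_probs(p):
    """Cohen's kappa from a 2x2 table of joint cell probabilities."""
    p = np.asarray(p, dtype=float)
    po = p[0, 0] + p[1, 1]                    # observed agreement
    row, col = p.sum(axis=1), p.sum(axis=0)   # marginal distributions
    pe = float(row @ col)                     # agreement expected by chance
    return (po - pe) / (1.0 - pe)

def posterior_kappa(counts, prior=1.0, draws=4000, seed=0):
    """Draw kappa from the Dirichlet posterior of the four cell probabilities."""
    rng = np.random.default_rng(seed)
    alpha = np.asarray(counts, dtype=float).ravel() + prior  # conjugate update
    samples = rng.dirichlet(alpha, size=draws).reshape(draws, 2, 2)
    return np.array([kappa_from_probs(s) for s in samples])

# Hypothetical 2x2 table: rows = first rating, columns = second rating.
counts = np.array([[40, 5], [7, 48]])
point = kappa_from_probs(counts / counts.sum())
kappa_draws = posterior_kappa(counts)
lo, hi = np.percentile(kappa_draws, [2.5, 97.5])
print(f"kappa = {point:.3f}, 95% credible interval = ({lo:.3f}, {hi:.3f})")
```

The credible interval is read off the percentiles of the posterior draws, so no asymptotic variance formula is needed.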

Citations: 0
Editorial acknowledgement
IF 1.8 Zone 3 Psychology Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-11-17 DOI: 10.1111/bmsp.70017 Vol. 79(1), pp. 229-230
Citations: 0
Sample size determination for hypothesis testing on the intraclass correlation coefficient in a two-way analysis of variance model.
IF 1.8 Zone 3 Psychology Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-11-14 DOI: 10.1111/bmsp.70016
Dipro Mondal, Alberto Cassese, Math J J M Candel, Sophie Vanbelle

Reliability evaluation is critical in fields such as psychology and medicine to ensure accurate diagnosis and effective treatment management. When participants are evaluated by the same raters, a two-way ANOVA model is suitable to model the data, with the intraclass correlation coefficient (ICC) serving as the reliability metric. In these domains, the ICC for agreement (ICCa) is commonly used, as the values of the measurements themselves are of interest. Designing such reliability studies requires determining the number of participants and raters needed for the ICCa. Although procedures for sample size determination exist based on the expected width of the confidence interval for the ICCa, there is limited work on hypothesis testing. This paper addresses this gap by proposing procedures to ensure sufficient power to statistically test whether the ICCa exceeds a predetermined value, utilizing confidence intervals for the ICCa. We compared the available confidence interval methods for the ICCa and proposed sample size procedures using the lower confidence limit of the best performing methods. These procedures were evaluated considering the empirical power of the hypothesis test under various parameter configurations. Furthermore, these procedures are implemented in an interactive R Shiny app, freely available to researchers for determining sample sizes.
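
As background for the quantity being tested, here is a small Python sketch of the single-rating ICC for agreement computed from the mean squares of a two-way layout. It uses the standard textbook formula, (MSR - MSE) / (MSR + (k - 1) MSE + k (MSC - MSE) / n), and does not implement the paper's confidence interval or sample size procedures. The function name is an assumption; the example data are the classic 6 x 4 ratings table from Shrout and Fleiss (1979), whose ICC for agreement is about 0.29.

```python
import numpy as np

def icc_agreement(x):
    """Single-rating ICC for agreement from an (n subjects x k raters) array."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between subjects
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between raters
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    # Rater variance enters the denominator, so systematic rater
    # differences lower the coefficient (agreement, not consistency).
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

ratings = np.array([
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
])
icc_a = icc_agreement(ratings)
print(f"ICC for agreement = {icc_a:.3f}")
```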

Citations: 0
Generalized extreme value IRT models.
IF 1.8 Zone 3 Psychology Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-11-12 DOI: 10.1111/bmsp.70015
Jessica Alves, Jorge Bazán, Jorge González

This paper introduces two new Item Response Theory (IRT) models based on the Generalized Extreme Value (GEV) distribution. These new models have asymmetric item characteristic curves (ICCs), which have drawn growing interest, as they may better model actual item response behaviours in specific scenarios. The models are analysed using a Bayesian approach, and their properties are examined and discussed. The validity of the models is verified by means of extensive simulation studies that evaluate the sensitivity of the model to the choice of priors on the newly introduced item parameter, the accuracy of parameter recovery, and the capacity of model comparison criteria to choose the best model against other IRT models. The new models are exemplified using real data from two mathematics tests, one applied in Peruvian public schools and another administered to incoming university students in Chile. In both cases, the proposed models proved to be a promising alternative to asymmetric IRT models, offering new insights into item response modelling.
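
To illustrate what an asymmetric item characteristic curve looks like, the sketch below uses the standard GEV cumulative distribution function as a link, P(correct) = F(a(theta - b); xi). This is an assumed, generic parameterization for illustration only; the paper's two models may define the curve differently. The names `a`, `b` and `xi` (discrimination, difficulty, shape) are hypothetical labels.

```python
import numpy as np

def gev_cdf(z, xi):
    """CDF of the standard GEV distribution with shape parameter xi."""
    z = np.asarray(z, dtype=float)
    if abs(xi) < 1e-12:
        return np.exp(-np.exp(-z))            # Gumbel limit as xi -> 0
    t = 1.0 + xi * z
    inside = t > 0                            # support of the distribution
    safe_t = np.where(inside, t, 1.0)         # avoid invalid powers off-support
    cdf = np.exp(-safe_t ** (-1.0 / xi))
    # Off the support the CDF is 0 (xi > 0) or 1 (xi < 0).
    return np.where(inside, cdf, 0.0 if xi > 0 else 1.0)

def icc_gev(theta, a, b, xi):
    """Probability of a correct response under the assumed GEV link."""
    return gev_cdf(a * (theta - b), xi)

theta = np.linspace(-4, 4, 81)
p = icc_gev(theta, a=1.0, b=0.0, xi=0.3)

# A symmetric curve (e.g. logistic) would satisfy P(b + d) + P(b - d) = 1.
asymmetry = float(icc_gev(1.0, 1.0, 0.0, 0.0) + icc_gev(-1.0, 1.0, 0.0, 0.0))
print(f"P(b+1) + P(b-1) under the Gumbel case = {asymmetry:.3f}")
```

Under this link the curve passes through exp(-1) rather than 0.5 at theta = b, which is one visible consequence of the asymmetry.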

Citations: 0
Reliability measures in knowledge structure theory.
IF 1.8 Zone 3 Psychology Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-11-01 DOI: 10.1111/bmsp.70013
Debora de Chiusole, Andrea Spoto, Umberto Granziol, Luca Stefanutti

Within the knowledge structure theory (KST) framework, this study evaluates the reliability of knowledge state estimation by introducing two key measures: the expected accuracy rate and the expected discrepancy. The accuracy rate quantifies the likelihood that the estimated knowledge state aligns with the true state, while the expected discrepancy measures the average deviation when misclassification occurs. To support the theoretical framework, we provide an in-depth discussion of these indices, supplemented by two simulation studies and an empirical example. The simulation results reveal a trade-off between the number of items and the size of the knowledge structure. Specifically, smaller structures exhibit consistent accuracy across different error levels, while larger structures show increasing discrepancies as error rates rise. Nevertheless, accuracy improves with a greater number of items in larger structures, mitigating the impact of errors. Additionally, the expected discrepancy analysis shows that when misclassification occurs, the estimated state is generally close to the true one, minimizing the effect of errors in the assessment. Finally, an empirical application using real assessment data demonstrates the practical relevance of the proposed measures. This suggests that KST-based assessments provide reliable and meaningful diagnostic information, highlighting their potential for use in educational and psychological testing.
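
One way to read the two measures is sketched below in Python for a tiny, hypothetical knowledge structure. The sketch assumes a local-independence response model with careless-error (`beta`) and lucky-guess (`eta`) rates, a maximum a posteriori rule for estimating the state, and the symmetric-difference count as the distance between states. These choices, and all parameter values, are assumptions for illustration; the paper's exact definitions may differ.

```python
from itertools import product
import numpy as np

def blim_likelihood(r, state, beta, eta):
    """P(response pattern r | knowledge state) under local independence."""
    p = 1.0
    for q, (resp, known) in enumerate(zip(r, state)):
        p_correct = 1.0 - beta[q] if known else eta[q]
        p *= p_correct if resp else 1.0 - p_correct
    return p

def accuracy_and_discrepancy(states, pi, beta, eta):
    """Expected accuracy of the MAP estimate and mean distance when it is wrong."""
    n_items = len(states[0])
    acc, disc, miss = 0.0, 0.0, 0.0
    for r in product((0, 1), repeat=n_items):          # every response pattern
        joint = np.array([pi[i] * blim_likelihood(r, s, beta, eta)
                          for i, s in enumerate(states)])  # P(r, K)
        est = int(np.argmax(joint))                    # MAP state for pattern r
        for i, s in enumerate(states):
            if i == est:
                acc += joint[i]
            else:
                d = sum(a != b for a, b in zip(s, states[est]))
                disc += joint[i] * d
                miss += joint[i]
    return acc, (disc / miss if miss > 0 else 0.0)

# Hypothetical chain structure on three items with uniform state probabilities.
states = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
pi = [0.25, 0.25, 0.25, 0.25]
acc, disc = accuracy_and_discrepancy(states, pi, beta=[0.1] * 3, eta=[0.1] * 3)
print(f"expected accuracy = {acc:.3f}, expected discrepancy = {disc:.3f}")
```

Exhaustive enumeration is only feasible for a handful of items; larger structures would need simulation instead.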

Citations: 0
An investigation into in-sample and out-of-sample model selection for nonstationary autoregressive models.
IF 1.8 Zone 3 Psychology Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-10-28 DOI: 10.1111/bmsp.70012
Yong Zhang, Anja F Ernst, Ginette Lafit, Ward B Eiling, Laura F Bringmann

The stationary autoregressive model forms an important base of time-series analysis in today's psychology research. Diverse nonstationary extensions of this model have been developed to capture different types of changing temporal dynamics. However, researchers do not always have a solid theoretical base to rely on for deciding a priori which of these nonstationary models is the most appropriate for a given time series. In this case, correct model selection becomes a crucial step to ensure an accurate understanding of the temporal dynamics. This study consists of two main parts. First, with a simulation study, we investigated the performance of in-sample (information criteria) and out-of-sample (cross-validation, out-of-sample prediction) model selection techniques in identifying six different univariate nonstationary processes. We found that the Bayesian information criterion (BIC) has the best overall performance, whereas the performance of the other techniques depends largely on the length of the time series. Then, we re-analysed a 239-day-long time series of positive and negative affect to illustrate the model selection process. Combining the simulation results and practical considerations from the empirical analysis, we argue that model selection for nonstationary time series should not completely rely on data-driven approaches. Instead, more theory-driven approaches where researchers actively integrate their qualitative understanding will inform the data-driven approaches.
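
The in-sample side of this comparison can be sketched in a few lines of Python. The example simulates a series whose intercept drifts linearly over time and compares a stationary AR(1) with an AR(1) that includes a time trend, using the Gaussian BIC of an ordinary least squares fit. The data-generating values, the two candidate models, and the function name are assumptions for illustration; the paper considers six nonstationary processes and several other selection techniques.

```python
import numpy as np

def bic_ols(y, X):
    """Gaussian BIC (up to an additive constant) of an OLS fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(((y - X @ coef) ** 2).sum())
    n, k = X.shape
    return n * np.log(rss / n) + (k + 1) * np.log(n)  # +1 for the error variance

# Simulate an AR(1) process whose intercept grows linearly with time.
rng = np.random.default_rng(1)
T = 300
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.05 * t + 0.3 * y[t - 1] + rng.normal()

target, lag = y[1:], y[:-1]
time_index = np.arange(1, T, dtype=float)
ones = np.ones(T - 1)

bic_stationary = bic_ols(target, np.column_stack([ones, lag]))
bic_trend = bic_ols(target, np.column_stack([ones, time_index, lag]))
print(f"BIC stationary AR(1): {bic_stationary:.1f}")
print(f"BIC AR(1) with trend: {bic_trend:.1f}")
```

The model with the lower BIC is preferred; both fits use the same observations, so the values are directly comparable.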

Citations: 0
Reinforcement learning-based adaptive learning: Rewards improvement considering learning duration.
IF 1.8 Zone 3 Psychology Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-10-24 DOI: 10.1111/bmsp.70014
Tongxin Zhang, Canxi Cao, Tao Xin, Xiaoming Zhai

Reinforcement learning (RL) powers adaptive learning systems, which recommend customized learning materials to individual learners in their varying learning states to optimize learning effectiveness. However, some argue that improving learning effectiveness alone may be insufficient, particularly if it overly extends learning efforts and requires additional time to work on the recommended materials. Learners with different amounts of prior knowledge spend different amounts of time on the same material. Therefore, designers should consider both the usefulness of the material and the time that individual learners with a specific amount of prior knowledge need to make sense of it. To fill this gap, this study proposes an RL-based adaptive learning system wherein the reward is improved by considering both factors. We then conducted Monte Carlo simulation studies to verify the effects of the improved reward and uncover the mechanisms of the RL recommendation strategies. Results show that the improved reward substantially reduces learners' learning duration through interpretable recommendation strategies, which results in growing learning efficiency for learners with varying prior knowledge.
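
The abstract does not state the reward formula, so the following Python sketch only illustrates the general idea of trading learning gain against time. The linear penalty, the `time_weight` value, and the candidate materials are all hypothetical and should not be read as the authors' reward function.

```python
def duration_aware_reward(gain, minutes, time_weight=0.01):
    """Hypothetical reward: learning gain minus a linear penalty on time spent."""
    return gain - time_weight * minutes

def recommend(materials, time_weight):
    """Pick the material with the highest reward for one learner."""
    return max(
        materials,
        key=lambda name: duration_aware_reward(*materials[name], time_weight),
    )

# Hypothetical (expected gain, expected minutes) for one learner.
materials = {"A": (0.30, 20.0), "B": (0.25, 5.0)}

gain_only_choice = recommend(materials, time_weight=0.0)
duration_aware_choice = recommend(materials, time_weight=0.01)
print(gain_only_choice, duration_aware_choice)
```

With no time penalty the larger gain wins; once time is priced in, the slightly less effective but much quicker material is preferred. In a full RL system the expected minutes would depend on the learner's prior knowledge.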

Citations: 0
An extension of the basic local independence model to multiple observed classifications.
IF 1.8 Zone 3 Psychology Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-09-21 DOI: 10.1111/bmsp.70008
Pasquale Anselmi, Debora de Chiusole, Egidio Robusto, Alice Bacherini, Giulia Balboni, Andrea Brancaccio, Ottavia M Epifania, Noemi Mazzoni, Luca Stefanutti

The basic local independence model (BLIM) is appropriate in situations where populations do not differ in the probabilities of the knowledge states and the probabilities of careless errors and lucky guesses of the items. In some situations, this is not the case. This work introduces the multiple observed classification local independence model (MOCLIM), which extends the BLIM by allowing the above probabilities to vary across populations. In the MOCLIM, each individual is characterized by proficiency, careless and guessing classes, which are observed and determine the probabilities of knowledge states, careless errors and lucky guesses of a population. Given a particular class type (proficiency, careless, or guessing), the probabilities are the same for populations with the same class but may vary between populations with different classes. Algorithms for maximum likelihood estimation of the MOCLIM parameters are provided. The results of a simulation study suggest that the true parameter values are well recovered by the estimation algorithm and that the true model can be uncovered by comparing the goodness-of-fit of alternative models. The results of an empirical application to data from Raven-like matrices suggest that the MOCLIM effectively discriminates between situations where group differences are expected and those where they are not.
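
The structure described above can be sketched as a lookup of class-specific parameters followed by the usual BLIM mixture over knowledge states. In the Python sketch below, the two-item structure, the class labels, and every probability are hypothetical values chosen for illustration; the paper's estimation algorithms are not reproduced.

```python
from itertools import product

def pattern_prob(r, states, pi, beta, eta):
    """BLIM probability of response pattern r: a mixture over knowledge states."""
    total = 0.0
    for p_state, state in zip(pi, states):
        like = 1.0
        for q, known in enumerate(state):
            p_correct = 1.0 - beta[q] if known else eta[q]
            like *= p_correct if r[q] else 1.0 - p_correct
        total += p_state * like
    return total

# Hypothetical structure and class-specific parameters.
states = [(0, 0), (1, 0), (1, 1)]
pi_by_class = {"low": [0.6, 0.3, 0.1], "high": [0.1, 0.3, 0.6]}
beta_by_class = {"careful": [0.05, 0.05], "careless": [0.25, 0.25]}
eta_by_class = {"no_guess": [0.02, 0.02], "guess": [0.20, 0.20]}

def moclim_pattern_prob(r, proficiency, careless, guessing):
    """Each observed class selects its own block of BLIM parameters."""
    return pattern_prob(r, states, pi_by_class[proficiency],
                        beta_by_class[careless], eta_by_class[guessing])

patterns = list(product((0, 1), repeat=2))
total_prob = sum(moclim_pattern_prob(r, "low", "careless", "guess")
                 for r in patterns)
p_low = moclim_pattern_prob((1, 1), "low", "careful", "no_guess")
p_high = moclim_pattern_prob((1, 1), "high", "careful", "no_guess")
print(f"sum over patterns = {total_prob:.3f}, "
      f"P(all correct): low = {p_low:.3f}, high = {p_high:.3f}")
```

Setting all classes of one type to share the same parameters recovers the ordinary BLIM for that parameter block.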

Citations: 0
A Bayes factor framework for unified parameter estimation and hypothesis testing.
IF 1.8 Zone 3 Psychology Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-09-18 DOI: 10.1111/bmsp.70011
Samuel Pawel

The Bayes factor, the data-based updating factor of the prior to posterior odds of two hypotheses, is a natural measure of statistical evidence for one hypothesis over the other. We show how Bayes factors can also be used for parameter estimation. The key idea is to consider the Bayes factor as a function of the parameter value under the null hypothesis. This 'support curve' is inverted to obtain point estimates ('maximum evidence estimates') and interval estimates ('support intervals'), similar to how p-value functions are inverted to obtain point estimates and confidence intervals. This provides data analysts with a unified inference framework as Bayes factors (for any tested parameter value), support intervals (at any level), and point estimates can be easily read off from a plot of the support curve. This approach shares similarities but is also distinct from conventional Bayesian and frequentist approaches: It uses the Bayesian evidence calculus, but without synthesizing data and prior, and it defines statistical evidence in terms of (integrated) likelihood ratios, but also includes a natural way for dealing with nuisance parameters. Applications to meta-analysis, replication studies and logistic regression illustrate how our framework is of practical value for making quantitative inferences.
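
A support curve is easy to compute when the estimate is approximately normal. The Python sketch below assumes an estimate `est` with standard error `se`, a point null hypothesis theta = theta0, and a normal prior under the alternative; the Bayes factor is then a ratio of two normal densities. The numerical values, the prior, and the support level k = 1 are assumptions for illustration, not taken from the paper's applications.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def support_curve(theta0, est, se, prior_mean, prior_var):
    """Bayes factor of H0: theta = theta0 versus H1: theta ~ N(prior_mean, prior_var)."""
    marginal_h0 = normal_pdf(est, theta0, se**2)
    marginal_h1 = normal_pdf(est, prior_mean, se**2 + prior_var)
    return marginal_h0 / marginal_h1

est, se = 0.4, 0.15                      # hypothetical estimate and standard error
grid = np.linspace(-0.5, 1.5, 2001)      # candidate null values theta0
bf = support_curve(grid, est, se, prior_mean=0.0, prior_var=1.0)

max_evidence_estimate = grid[np.argmax(bf)]   # null value with the most support
k = 1.0                                       # support level
supported = grid[bf >= k]
support_interval = (supported.min(), supported.max())
print(f"maximum evidence estimate = {max_evidence_estimate:.3f}")
print(f"k = 1 support interval = ({support_interval[0]:.3f}, {support_interval[1]:.3f})")
```

Raising k narrows the interval to parameter values that the data support more strongly; plotting `bf` against `grid` gives the support curve from which all three summaries can be read.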

Citations: 0
Detecting Critical Change in Dynamics Through Outlier Detection with Time-Varying Parameters.
IF 1.8 Zone 3 (Psychology) Q3 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-09-14 DOI: 10.1111/bmsp.70010
Meng Chen, Michael D Hunter, Sy-Miin Chow

Intensive longitudinal data are often found to be non-stationary, namely, showing changes in statistical properties, such as means and variance-covariance structures, over time. One way to accommodate non-stationarity is to specify key parameters that show over-time changes as time-varying parameters (TVPs). However, the nature and dynamics of TVPs may themselves be heterogeneous across time, contexts, developmental stages, individuals and as related to other biopsychosocial-cultural influences. We propose an outlier detection method designed to facilitate the detection of critical shifts in any differentiable linear and non-linear dynamic functions, including dynamic functions for TVPs. This approach can be readily applied to various data scenarios, including single-subject and multisubject, univariate and multivariate processes, as well as with and without latent variables. We demonstrate the utility and performance of this approach with three sets of simulation studies and an empirical illustration using facial electromyography data from a laboratory emotion induction study.

Citations: 0