Differential item functioning (DIF) can be investigated by estimating item response theory (IRT) parameters separately for different respondent groups, thus allowing for the detection of discrepancies in parameter estimates across groups. However, before comparing the estimates, it is necessary to convert them to a common metric because of the constraints required to identify the model. These two processes influence each other, as the presence of DIF items affects the estimation of the scale conversion. This paper proposes a novel method that performs scale conversion and DIF detection simultaneously, so that the estimated scale conversion automatically takes the presence of DIF into account. Differences in the item parameter estimates across groups can be explained through variables at the within-group item level or by the group itself. Penalized likelihood estimation is used to automatically select the item parameters that differ in some groups. Real-data applications and simulation studies show the good performance of the proposed method.
Michela Battauz. Differential item functioning detection across multiple groups. British Journal of Mathematical & Statistical Psychology, published 16 December 2025. https://doi.org/10.1111/bmsp.70023
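The idea of selecting group-varying item parameters through a penalty can be sketched generically as follows; this is an illustrative group-lasso-type formulation, not necessarily the paper's exact objective. With group-specific item parameter vectors β_jg for item j in group g, and a reference group r, non-DIF items are shrunk to equality across groups:

```latex
\hat{\boldsymbol{\beta}} \;=\; \arg\max_{\boldsymbol{\beta}}
\left\{ \sum_{g=1}^{G} \ell_g(\boldsymbol{\beta}_g)
\;-\; \lambda \sum_{j=1}^{J} \sum_{g \neq r}
\bigl\lVert \boldsymbol{\beta}_{jg} - \boldsymbol{\beta}_{jr} \bigr\rVert \right\}
```

Items whose estimated parameters collapse onto the reference values are treated as DIF-free, and the tuning parameter λ controls how many items are flagged.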
The Q-matrix of a cognitively diagnostic assessment (CDA), documenting the item-attribute associations, is a key component of any CDA. However, the true Q-matrix underlying a CDA is never known and must be estimated, typically by content experts. Due to fallible human judgment, misspecifications of the Q-matrix may occur, resulting in the misclassification of examinees. In response to this challenge, algorithms have been developed to estimate the Q-matrix from item responses. Some algorithms impose identifiability conditions while others do not. The debate about which approach is "right" is ongoing, especially since these conditions are sufficient but not necessary, which means viable alternative Q-matrix estimates may be ignored. In this study, the performance of Q-matrix estimation algorithms that impose identifiability conditions on the Q-matrix estimate was compared with that of estimation algorithms that do not. Large-scale simulations examined the impact of factors such as sample size, test length, number of attributes, and error levels. The estimated Q-matrices were evaluated for meeting identifiability conditions and for their accuracy in classifying examinees. The simulation results showed that, for the various estimation algorithms studied here, imposing identifiability conditions on Q-matrix estimation did not change outcomes with respect to identifiability or examinee classification.
Hyunjoo Kim, Hans Friedrich Köhn, Chia-Yi Chiu. Identifiability conditions in cognitive diagnosis: Implications for Q-matrix estimation algorithms. British Journal of Mathematical & Statistical Psychology, published 12 December 2025. https://doi.org/10.1111/bmsp.70020
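To make the identifiability discussion concrete, the sketch below represents a Q-matrix as a binary items-by-attributes array and checks one commonly cited sufficient (but, as the abstract notes, not necessary) condition: completeness, i.e., the Q-matrix containing an identity submatrix with one single-attribute item per attribute. The matrix itself is hypothetical, chosen only for illustration.

```python
import numpy as np

def contains_identity(Q: np.ndarray) -> bool:
    """Check whether the binary Q-matrix (items x attributes) contains every
    single-attribute row e_k, i.e., an identity submatrix I_K among its rows.
    This 'completeness' requirement is one commonly cited sufficient (but not
    necessary) identifiability condition."""
    K = Q.shape[1]
    rows = {tuple(row) for row in Q}
    return all(tuple(np.eye(K, dtype=int)[k]) in rows for k in range(K))

# A hypothetical 5-item, 3-attribute Q-matrix used only for illustration.
Q = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
])
print(contains_identity(Q))  # True: the first three rows form I_3
```

Algorithms that impose such conditions restrict the search to Q-matrices passing checks like this one; algorithms that do not may return estimates failing it.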
In the present study, we extend a stochastic differential equation (SDE) model, the Ornstein-Uhlenbeck (OU) process, to the simultaneous analysis of time series of multiple variables by means of random effects for individuals and variables, using a Bayesian framework. This SDE model is a stationary Gauss-Markov process that varies over time around its mean. Our extension allows us to estimate the variability of different parameters of the process, such as the mean (μ) or the drift parameter (φ), across individuals and variables of the system by means of marginalized posterior distributions. We illustrate the estimation and interpretability of the parameters of this multilevel OU process in an empirical study of affect dynamics in which multiple individuals were measured on different variables at multiple time points. We also conducted a simulation study to evaluate whether the model can recover the population parameters generating the OU process. Our results support the use of this model to obtain both the general parameters (common to all individuals and variables) and the variable-specific point estimates (random effects). We conclude that this multilevel OU process with individual- and variable-specific estimates as random effects can be a useful approach to analyse time series for multiple variables simultaneously.
José Ángel Martínez-Huertas, Emilio Ferrer. A multilevel Ornstein-Uhlenbeck process with individual- and variable-specific estimates as random effects. British Journal of Mathematical & Statistical Psychology, published 8 December 2025. https://doi.org/10.1111/bmsp.70019
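For readers unfamiliar with the OU process, the sketch below simulates a single series via an Euler-Maruyama discretization of dX_t = φ(μ − X_t)dt + σ dW_t, using the abstract's notation for μ (long-run mean) and φ (drift, or mean-reversion rate). This is a minimal one-series sketch, not the paper's multilevel Bayesian model; all parameter values are illustrative.

```python
import numpy as np

def simulate_ou(mu, phi, sigma, x0, dt, n_steps, rng):
    """Euler-Maruyama discretization of the OU SDE
    dX_t = phi * (mu - X_t) dt + sigma dW_t."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(n_steps):
        drift = phi * (mu - x[t]) * dt                 # pull back towards mu
        diffusion = sigma * np.sqrt(dt) * rng.standard_normal()
        x[t + 1] = x[t] + drift + diffusion
    return x

rng = np.random.default_rng(1)
path = simulate_ou(mu=2.0, phi=0.8, sigma=0.5, x0=0.0, dt=0.1, n_steps=2000, rng=rng)
print(path[-1000:].mean())  # long-run average fluctuates around mu = 2.0
```

The multilevel extension described in the abstract would give each individual-variable pair its own (μ, φ, σ) drawn from population distributions, estimated jointly in a Bayesian model.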
Reliability is crucial in psychometrics, reflecting the extent to which a measurement instrument can discriminate between individuals or items. While classical test theory and intraclass correlation coefficients are well established for quantitative scales, estimating reliability for binary outcomes presents unique challenges due to their discrete nature. This paper reviews and links three major approaches to estimating reliability for single ratings on binary scales: the normal approximation approach, kappa coefficients, and the latent variable approach, which enables estimation at both latent and manifest scale levels. We clarify their conceptual relationships, show conditions for asymptotic equivalence, and evaluate their performance across two common study designs: repeatability and reproducibility studies. We then extend the Bayesian Dirichlet-multinomial method for estimating kappa coefficients to settings with more than two replicates, without requiring Bayesian software. Additionally, we introduce a Bayesian method to estimate manifest scale reliability from latent scale reliability that can be implemented in standard Bayesian software. A simulation study compares the statistical properties of the three major approaches across Bayesian and frequentist frameworks. Overall, the normal approximation approach performed poorly, and the frequentist approach was unreliable due to singularity issues. The findings yield refined practical recommendations.
Sophie Vanbelle. From tetrachoric to kappa: How to assess reliability on binary scales. British Journal of Mathematical & Statistical Psychology, published 8 December 2025. https://doi.org/10.1111/bmsp.70021
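As background for the kappa-coefficient approach mentioned above, the snippet below computes Cohen's kappa for two binary ratings from a 2x2 agreement table. This is a minimal frequentist sketch with a hypothetical table; the paper's Bayesian Dirichlet-multinomial extension and multi-replicate setting are not shown.

```python
import numpy as np

def cohens_kappa(table: np.ndarray) -> float:
    """Cohen's kappa for a 2x2 agreement table of two binary ratings:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e the chance agreement implied by the marginal proportions."""
    n = table.sum()
    p_o = np.trace(table) / n                 # proportion on the diagonal
    row = table.sum(axis=1) / n               # rater 1 marginals
    col = table.sum(axis=0) / n               # rater 2 marginals
    p_e = float(row @ col)                    # expected agreement by chance
    return float((p_o - p_e) / (1 - p_e))

# Hypothetical counts: rows = rater 1 (0/1), columns = rater 2 (0/1).
table = np.array([[40, 10],
                  [5, 45]])
print(cohens_kappa(table))  # 0.7: observed 0.85 vs chance 0.50 agreement
```

The latent variable approach would instead posit a continuous (e.g., normal) trait dichotomized at a threshold, linking kappa to a tetrachoric-style correlation.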
Dipro Mondal, Alberto Cassese, Math J J M Candel, Sophie Vanbelle
Reliability evaluation is critical in fields such as psychology and medicine to ensure accurate diagnosis and effective treatment management. When participants are evaluated by the same raters, a two-way ANOVA model is suitable for the data, with the intraclass correlation coefficient (ICC) serving as the reliability metric. In these domains, the ICC for agreement (ICCa) is commonly used, as the values of the measurements themselves are of interest. Designing such reliability studies requires determining the sample size of participants and raters for the ICCa. Although procedures for sample size determination exist based on the expected width of the confidence interval for the ICCa, there is limited work on hypothesis testing. This paper addresses this gap by proposing procedures that ensure sufficient power to statistically test whether the ICCa exceeds a predetermined value, utilizing confidence intervals for the ICCa. We compared the available confidence interval methods for the ICCa and proposed sample size procedures using the lower confidence limit of the best-performing methods. These procedures were evaluated by examining the empirical power of the hypothesis test under various parameter configurations. Furthermore, the procedures are implemented in an interactive R Shiny app, freely available to researchers for determining sample sizes.
Dipro Mondal, Alberto Cassese, Math J J M Candel, Sophie Vanbelle. Sample size determination for hypothesis testing on the intraclass correlation coefficient in a two-way analysis of variance model. British Journal of Mathematical & Statistical Psychology, published 14 November 2025. https://doi.org/10.1111/bmsp.70016
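For orientation, the sketch below computes the ICC for agreement from the mean squares of the two-way ANOVA, using the standard single-rating agreement formula ICC(A,1) = (MSR − MSE) / (MSR + (k−1)MSE + k(MSC − MSE)/n). The ratings are hypothetical; the sample size and power procedures proposed in the paper are not reproduced here.

```python
import numpy as np

def icc_agreement(data: np.ndarray) -> float:
    """ICC for agreement, ICC(A,1), from a two-way ANOVA with n participants
    (rows) all rated by the same k raters (columns)."""
    n, k = data.shape
    grand = data.mean()
    row_means = data.mean(axis=1)
    col_means = data.mean(axis=0)
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # participants
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # raters
    resid = data - row_means[:, None] - col_means[None, :] + grand
    mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))         # error
    return float((msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n))

# Hypothetical data: 6 participants each scored by the same 3 raters.
scores = np.array([
    [9, 2, 5],
    [6, 1, 3],
    [8, 4, 6],
    [7, 1, 2],
    [10, 5, 6],
    [6, 2, 4],
], dtype=float)
print(icc_agreement(scores))  # low here, since the rater effect is large
```

Because ICC(A,1) charges systematic rater differences (MSC) against reliability, the large between-rater gaps in this example pull the coefficient down even though participants are ordered consistently.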
This paper introduces two new item response theory (IRT) models based on the generalized extreme value (GEV) distribution. These new models have asymmetric item characteristic curves (ICCs), which have drawn growing interest because they may better model actual item response behaviours in specific scenarios. The models are analysed using a Bayesian approach, and their properties are examined and discussed. The validity of the models is verified through extensive simulation studies that evaluate the sensitivity of the models to the choice of priors on the newly introduced item parameter, the accuracy of parameter recovery, and the capacity of model comparison criteria to select the best model against other IRT models. The new models are exemplified using real data from two mathematics tests, one administered in Peruvian public schools and another administered to incoming university students in Chile. In both cases, the proposed models proved to be a promising alternative to existing asymmetric IRT models, offering new insights into item response modelling.
Jessica Alves, Jorge Bazán, Jorge González. Generalized extreme value IRT models. British Journal of Mathematical & Statistical Psychology, published 12 November 2025. https://doi.org/10.1111/bmsp.70015
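As a sketch of how a GEV distribution can induce an asymmetric ICC (an illustrative construction; the paper's exact parameterization may differ), the GEV distribution function with shape parameter ξ can serve as the link for the success probability:

```latex
P(Y_{ij} = 1 \mid \theta_i) \;=\;
\exp\!\left\{ -\bigl[\, 1 + \xi\, a_j(\theta_i - b_j) \,\bigr]_{+}^{-1/\xi} \right\}
```

As ξ → 0 this reduces to the Gumbel-type (complementary log-log style) link, and ξ governs the direction and degree of asymmetry of the ICC, in contrast to the symmetric logistic or probit curves of standard IRT models.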
Debora de Chiusole, Andrea Spoto, Umberto Granziol, Luca Stefanutti
Within the knowledge structure theory (KST) framework, this study evaluates the reliability of knowledge state estimation by introducing two key measures: the expected accuracy rate and the expected discrepancy. The accuracy rate quantifies the likelihood that the estimated knowledge state aligns with the true state, while the expected discrepancy measures the average deviation when misclassification occurs. To support the theoretical framework, we provide an in-depth discussion of these indices, supplemented by two simulation studies and an empirical example. The simulation results reveal a trade-off between the number of items and the size of the knowledge structure. Specifically, smaller structures exhibit consistent accuracy across different error levels, while larger structures show increasing discrepancies as error rates rise. Nevertheless, accuracy improves with a greater number of items in larger structures, mitigating the impact of errors. Additionally, the expected discrepancy analysis shows that, when misclassification occurs, the estimated state is generally close to the true one, minimizing the effect of errors in the assessment. Finally, an empirical application using real assessment data demonstrates the practical relevance of the proposed measures. This suggests that KST-based assessments provide reliable and meaningful diagnostic information, highlighting their potential for use in educational and psychological testing.
Debora de Chiusole, Andrea Spoto, Umberto Granziol, Luca Stefanutti. Reliability measures in knowledge structure theory. British Journal of Mathematical & Statistical Psychology, published 1 November 2025. https://doi.org/10.1111/bmsp.70013
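To illustrate the kind of state estimation whose reliability the paper measures, the sketch below uses a tiny hypothetical 3-item knowledge structure and BLIM-style careless-error (β) and lucky-guess (η) rates, assigning each response pattern the maximum-likelihood knowledge state. The structure, rates, and patterns are all illustrative, not taken from the paper.

```python
# Hypothetical 3-item knowledge structure: each state is a set of mastered items.
items = [0, 1, 2]
structure = [frozenset(), frozenset({0}), frozenset({0, 1}), frozenset({0, 1, 2})]
beta, eta = 0.1, 0.1  # careless-error and lucky-guess probabilities

def likelihood(response, state):
    """P(response pattern | knowledge state) under a BLIM-style model:
    an item in the state is solved unless a careless error occurs; an item
    outside the state is solved only by a lucky guess."""
    p = 1.0
    for i in items:
        if i in state:
            p *= (1 - beta) if response[i] else beta
        else:
            p *= eta if response[i] else (1 - eta)
    return p

def estimate_state(response):
    """Maximum-likelihood knowledge state for an observed response pattern."""
    return max(structure, key=lambda s: likelihood(response, s))

print(sorted(estimate_state((1, 1, 0))))  # [0, 1]
```

The expected accuracy rate then averages, over states and response patterns, the probability that this estimate equals the true state; the expected discrepancy averages the set distance between the two when it does not.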
Yong Zhang, Anja F Ernst, Ginette Lafit, Ward B Eiling, Laura F Bringmann
The stationary autoregressive model forms an important basis of time-series analysis in today's psychological research. Diverse nonstationary extensions of this model have been developed to capture different types of changing temporal dynamics. However, researchers do not always have a solid theoretical basis for deciding a priori which of these nonstationary models is the most appropriate for a given time series. In this case, correct model selection becomes a crucial step towards an accurate understanding of the temporal dynamics. This study consists of two main parts. First, in a simulation study, we investigated the performance of in-sample (information criteria) and out-of-sample (cross-validation, out-of-sample prediction) model selection techniques in identifying six different univariate nonstationary processes. We found that the Bayesian information criterion (BIC) has the best overall performance, whereas the performance of the other techniques depends largely on the length of the time series. Then, we re-analysed a 239-day time series of positive and negative affect to illustrate the model selection process. Combining the simulation results with practical considerations from the empirical analysis, we argue that model selection for nonstationary time series should not rely completely on data-driven approaches. Instead, theory-driven approaches in which researchers actively integrate their qualitative understanding should inform the data-driven approaches.
Yong Zhang, Anja F Ernst, Ginette Lafit, Ward B Eiling, Laura F Bringmann. An investigation into in-sample and out-of-sample model selection for nonstationary autoregressive models. British Journal of Mathematical & Statistical Psychology, published 28 October 2025. https://doi.org/10.1111/bmsp.70012
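As a minimal illustration of the in-sample route, the sketch below fits a stationary AR(1) model by ordinary least squares and computes its BIC on a simulated series; the true coefficient, series length, and the BIC convention for counting parameters are all illustrative choices, and the nonstationary competitor models from the study are not shown.

```python
import numpy as np

def ar1_bic(x):
    """Fit x_t = c + phi * x_{t-1} + e_t by OLS and return (phi_hat, bic),
    using BIC = n * log(RSS / n) + k * log(n) with k = 3 free parameters
    (intercept, phi, residual variance); conventions for k vary."""
    y, lag = x[1:], x[:-1]
    X = np.column_stack([np.ones_like(lag), lag])
    coef, res, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(res[0]) if res.size else float(np.sum((y - X @ coef) ** 2))
    n, k = len(y), 3
    bic = n * np.log(rss / n) + k * np.log(n)
    return float(coef[1]), float(bic)

# Simulate a stationary AR(1) series with true phi = 0.6.
rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.6 * x[t - 1] + rng.standard_normal()

phi_hat, bic = ar1_bic(x)
print(phi_hat)  # estimate lands close to the true value 0.6
```

Model selection then compares this BIC against the BICs of the candidate nonstationary models (e.g., time-varying mean or coefficient), choosing the lowest; the out-of-sample alternatives instead score each model's forecasts on held-out observations.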
Reinforcement learning (RL) powers the engine of adaptive learning systems, which recommend customized learning materials to individual learners in their varying learning states to optimize learning effectiveness. However, some argue that improving learning effectiveness alone may be insufficient, particularly if it overly extends learning effort and requires additional time to work on the recommended materials. Learners with different amounts of prior knowledge spend different amounts of time on the same material. Therefore, designers should consider both the usefulness of a material and the time that individual learners with a specific amount of prior knowledge dedicate to making sense of it. To fill this gap, this study proposes an RL-based adaptive learning system in which the reward is improved by considering both factors. We then conducted Monte Carlo simulation studies to verify the effects of the improved reward and uncover the mechanisms of the RL recommendation strategies. Results show that the improved reward substantially reduces learners' learning duration through interpretable recommendation strategies, resulting in growing learning efficiency for learners with varying prior knowledge.
Tongxin Zhang, Canxi Cao, Tao Xin, Xiaoming Zhai. Reinforcement learning-based adaptive learning: Rewards improvement considering learning duration. British Journal of Mathematical & Statistical Psychology, published 24 October 2025. https://doi.org/10.1111/bmsp.70014
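The reward trade-off described above can be sketched with a toy epsilon-greedy bandit over learning materials, where each attempt's reward is the learning gain minus a time penalty. This is an illustrative sketch, not the paper's actual system: the materials, gain and duration distributions, and the weight lam are all hypothetical.

```python
import random

random.seed(7)
# Hypothetical materials: (expected learning gain, expected study minutes).
materials = {
    "easy":   (0.2, 5.0),
    "medium": (0.5, 12.0),
    "hard":   (0.7, 30.0),
}
lam = 0.02                            # hypothetical penalty per study minute
q = {m: 0.0 for m in materials}       # running average reward per material
counts = {m: 0 for m in materials}

for step in range(5000):
    # Epsilon-greedy: occasionally explore, otherwise exploit the best estimate.
    if random.random() < 0.1:
        m = random.choice(list(materials))
    else:
        m = max(q, key=q.get)
    gain_mu, time_mu = materials[m]
    gain = max(0.0, random.gauss(gain_mu, 0.1))
    minutes = max(1.0, random.gauss(time_mu, 2.0))
    reward = gain - lam * minutes          # learning gain net of time cost
    counts[m] += 1
    q[m] += (reward - q[m]) / counts[m]    # incremental mean update

print(max(q, key=q.get))  # "medium": the best gain-per-time trade-off here
```

With a gain-only reward the agent would favour "hard" (highest raw gain); charging for duration shifts the recommendation towards materials that are efficient for the learner, which is the mechanism the improved reward exploits.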