Differential item functioning (DIF) can be investigated by estimating item response theory (IRT) parameters separately for different respondent groups, thus allowing for the detection of discrepancies in parameter estimates across groups. However, before comparing the estimates, it is necessary to convert them to a common metric because of the constraints required to identify the model. These two processes influence each other, as the presence of DIF items affects the estimation of the scale conversion. This paper proposes a novel method that performs scale conversion and DIF detection simultaneously, so that the estimated scale conversion automatically takes the presence of DIF into account. Differences in the item parameter estimates across groups can be explained through variables at the within-group item level or by the group itself. Penalized likelihood estimation is used to automatically select the item parameters that differ in some groups. Real-data applications and simulation studies show the good performance of the proposed method.
Michela Battauz. Differential item functioning detection across multiple groups. British Journal of Mathematical & Statistical Psychology, published 16 December 2025. https://doi.org/10.1111/bmsp.70023
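The idea of selecting group-varying item parameters through a penalty can be sketched generically as follows; this is an illustrative group-lasso-type formulation, not necessarily the paper's exact objective. With group-specific item parameter vectors β_jg for item j in group g, and a reference group r, non-DIF items are shrunk to equality across groups:

```latex
\hat{\boldsymbol{\beta}} \;=\; \arg\max_{\boldsymbol{\beta}}
\left\{ \sum_{g=1}^{G} \ell_g(\boldsymbol{\beta}_g)
\;-\; \lambda \sum_{j=1}^{J} \sum_{g \neq r}
\bigl\lVert \boldsymbol{\beta}_{jg} - \boldsymbol{\beta}_{jr} \bigr\rVert \right\}
```

Items whose estimated parameters collapse onto the reference values are treated as DIF-free, and the tuning parameter λ controls how many items are flagged.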
The Q-matrix of a cognitively diagnostic assessment (CDA), documenting the item-attribute associations, is a key component of any CDA. However, the true Q-matrix underlying a CDA is never known and must be estimated, typically by content experts. Due to fallible human judgment, misspecifications of the Q-matrix may occur, resulting in the misclassification of examinees. In response to this challenge, algorithms have been developed to estimate the Q-matrix from item responses. Some algorithms impose identifiability conditions while others do not. The debate about which approach is "right" is ongoing, especially since these conditions are sufficient but not necessary, which means viable alternative Q-matrix estimates may be ignored. In this study, the performance of Q-matrix estimation algorithms that impose identifiability conditions on the Q-matrix estimate was compared with that of estimation algorithms that do not. Large-scale simulations examined the impact of factors such as sample size, test length, number of attributes, and error levels. The estimated Q-matrices were evaluated for meeting identifiability conditions and for their accuracy in classifying examinees. The simulation results showed that, for the various estimation algorithms studied here, imposing identifiability conditions on Q-matrix estimation did not change outcomes with respect to identifiability or examinee classification.
Hyunjoo Kim, Hans Friedrich Köhn, Chia-Yi Chiu. Identifiability conditions in cognitive diagnosis: Implications for Q-matrix estimation algorithms. British Journal of Mathematical & Statistical Psychology, published 12 December 2025. https://doi.org/10.1111/bmsp.70020
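To make the identifiability discussion concrete, the sketch below represents a Q-matrix as a binary items-by-attributes array and checks one commonly cited sufficient (but, as the abstract notes, not necessary) condition: completeness, i.e., the Q-matrix containing an identity submatrix with one single-attribute item per attribute. The matrix itself is hypothetical, chosen only for illustration.

```python
import numpy as np

def contains_identity(Q: np.ndarray) -> bool:
    """Check whether the binary Q-matrix (items x attributes) contains every
    single-attribute row e_k, i.e., an identity submatrix I_K among its rows.
    This 'completeness' requirement is one commonly cited sufficient (but not
    necessary) identifiability condition."""
    K = Q.shape[1]
    rows = {tuple(row) for row in Q}
    return all(tuple(np.eye(K, dtype=int)[k]) in rows for k in range(K))

# A hypothetical 5-item, 3-attribute Q-matrix used only for illustration.
Q = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
])
print(contains_identity(Q))  # True: the first three rows form I_3
```

Algorithms that impose such conditions restrict the search to Q-matrices passing checks like this one; algorithms that do not may return estimates failing it.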
In the present study, we extend a stochastic differential equation (SDE) model, the Ornstein-Uhlenbeck (OU) process, to the simultaneous analysis of time series of multiple variables by means of random effects for individuals and variables, using a Bayesian framework. This SDE model is a stationary Gauss-Markov process that varies over time around its mean. Our extension allows us to estimate the variability of different parameters of the process, such as the mean (μ) or the drift parameter (φ), across individuals and variables of the system by means of marginalized posterior distributions. We illustrate the estimation and interpretability of the parameters of this multilevel OU process in an empirical study of affect dynamics in which multiple individuals were measured on different variables at multiple time points. We also conducted a simulation study to evaluate whether the model can recover the population parameters generating the OU process. Our results support the use of this model to obtain both the general parameters (common to all individuals and variables) and the variable-specific point estimates (random effects). We conclude that this multilevel OU process with individual- and variable-specific estimates as random effects can be a useful approach to analyse time series for multiple variables simultaneously.
José Ángel Martínez-Huertas, Emilio Ferrer. A multilevel Ornstein-Uhlenbeck process with individual- and variable-specific estimates as random effects. British Journal of Mathematical & Statistical Psychology, published 8 December 2025. https://doi.org/10.1111/bmsp.70019
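For readers unfamiliar with the OU process, the sketch below simulates a single series via an Euler-Maruyama discretization of dX_t = φ(μ − X_t)dt + σ dW_t, using the abstract's notation for μ (long-run mean) and φ (drift, or mean-reversion rate). This is a minimal one-series sketch, not the paper's multilevel Bayesian model; all parameter values are illustrative.

```python
import numpy as np

def simulate_ou(mu, phi, sigma, x0, dt, n_steps, rng):
    """Euler-Maruyama discretization of the OU SDE
    dX_t = phi * (mu - X_t) dt + sigma dW_t."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(n_steps):
        drift = phi * (mu - x[t]) * dt                 # pull back towards mu
        diffusion = sigma * np.sqrt(dt) * rng.standard_normal()
        x[t + 1] = x[t] + drift + diffusion
    return x

rng = np.random.default_rng(1)
path = simulate_ou(mu=2.0, phi=0.8, sigma=0.5, x0=0.0, dt=0.1, n_steps=2000, rng=rng)
print(path[-1000:].mean())  # long-run average fluctuates around mu = 2.0
```

The multilevel extension described in the abstract would give each individual-variable pair its own (μ, φ, σ) drawn from population distributions, estimated jointly in a Bayesian model.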
Reliability is crucial in psychometrics, reflecting the extent to which a measurement instrument can discriminate between individuals or items. While classical test theory and intraclass correlation coefficients are well established for quantitative scales, estimating reliability for binary outcomes presents unique challenges due to their discrete nature. This paper reviews and links three major approaches to estimating reliability for single ratings on binary scales: the normal approximation approach, kappa coefficients, and the latent variable approach, which enables estimation at both latent and manifest scale levels. We clarify their conceptual relationships, show conditions for asymptotic equivalence, and evaluate their performance across two common study designs: repeatability and reproducibility studies. We then extend the Bayesian Dirichlet-multinomial method for estimating kappa coefficients to settings with more than two replicates, without requiring Bayesian software. Additionally, we introduce a Bayesian method to estimate manifest scale reliability from latent scale reliability that can be implemented in standard Bayesian software. A simulation study compares the statistical properties of the three major approaches across Bayesian and frequentist frameworks. Overall, the normal approximation approach performed poorly, and the frequentist approach was unreliable due to singularity issues. The findings yield refined practical recommendations.
Sophie Vanbelle. From tetrachoric to kappa: How to assess reliability on binary scales. British Journal of Mathematical & Statistical Psychology, published 8 December 2025. https://doi.org/10.1111/bmsp.70021
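As background for the kappa-coefficient approach mentioned above, the snippet below computes Cohen's kappa for two binary ratings from a 2x2 agreement table. This is a minimal frequentist sketch with a hypothetical table; the paper's Bayesian Dirichlet-multinomial extension and multi-replicate setting are not shown.

```python
import numpy as np

def cohens_kappa(table: np.ndarray) -> float:
    """Cohen's kappa for a 2x2 agreement table of two binary ratings:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e the chance agreement implied by the marginal proportions."""
    n = table.sum()
    p_o = np.trace(table) / n                 # proportion on the diagonal
    row = table.sum(axis=1) / n               # rater 1 marginals
    col = table.sum(axis=0) / n               # rater 2 marginals
    p_e = float(row @ col)                    # expected agreement by chance
    return float((p_o - p_e) / (1 - p_e))

# Hypothetical counts: rows = rater 1 (0/1), columns = rater 2 (0/1).
table = np.array([[40, 10],
                  [5, 45]])
print(cohens_kappa(table))  # 0.7: observed 0.85 vs chance 0.50 agreement
```

The latent variable approach would instead posit a continuous (e.g., normal) trait dichotomized at a threshold, linking kappa to a tetrachoric-style correlation.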
Dipro Mondal, Alberto Cassese, Math J J M Candel, Sophie Vanbelle
Reliability evaluation is critical in fields such as psychology and medicine to ensure accurate diagnosis and effective treatment management. When participants are evaluated by the same raters, a two-way ANOVA model is suitable for the data, with the intraclass correlation coefficient (ICC) serving as the reliability metric. In these domains, the ICC for agreement (ICCa) is commonly used, as the values of the measurements themselves are of interest. Designing such reliability studies requires determining the sample size of participants and raters for the ICCa. Although procedures for sample size determination exist based on the expected width of the confidence interval for the ICCa, there is limited work on hypothesis testing. This paper addresses this gap by proposing procedures that ensure sufficient power to statistically test whether the ICCa exceeds a predetermined value, utilizing confidence intervals for the ICCa. We compared the available confidence interval methods for the ICCa and proposed sample size procedures using the lower confidence limit of the best-performing methods. These procedures were evaluated by examining the empirical power of the hypothesis test under various parameter configurations. Furthermore, the procedures are implemented in an interactive R Shiny app, freely available to researchers for determining sample sizes.
Dipro Mondal, Alberto Cassese, Math J J M Candel, Sophie Vanbelle. Sample size determination for hypothesis testing on the intraclass correlation coefficient in a two-way analysis of variance model. British Journal of Mathematical & Statistical Psychology, published 14 November 2025. https://doi.org/10.1111/bmsp.70016
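For orientation, the sketch below computes the ICC for agreement from the mean squares of the two-way ANOVA, using the standard single-rating agreement formula ICC(A,1) = (MSR − MSE) / (MSR + (k−1)MSE + k(MSC − MSE)/n). The ratings are hypothetical; the sample size and power procedures proposed in the paper are not reproduced here.

```python
import numpy as np

def icc_agreement(data: np.ndarray) -> float:
    """ICC for agreement, ICC(A,1), from a two-way ANOVA with n participants
    (rows) all rated by the same k raters (columns)."""
    n, k = data.shape
    grand = data.mean()
    row_means = data.mean(axis=1)
    col_means = data.mean(axis=0)
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # participants
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # raters
    resid = data - row_means[:, None] - col_means[None, :] + grand
    mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))         # error
    return float((msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n))

# Hypothetical data: 6 participants each scored by the same 3 raters.
scores = np.array([
    [9, 2, 5],
    [6, 1, 3],
    [8, 4, 6],
    [7, 1, 2],
    [10, 5, 6],
    [6, 2, 4],
], dtype=float)
print(icc_agreement(scores))  # low here, since the rater effect is large
```

Because ICC(A,1) charges systematic rater differences (MSC) against reliability, the large between-rater gaps in this example pull the coefficient down even though participants are ordered consistently.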
This paper introduces two new item response theory (IRT) models based on the generalized extreme value (GEV) distribution. These new models have asymmetric item characteristic curves (ICCs), which have drawn growing interest because they may better model actual item response behaviours in specific scenarios. The models are analysed using a Bayesian approach, and their properties are examined and discussed. The validity of the models is verified through extensive simulation studies that evaluate the sensitivity of the models to the choice of priors on the newly introduced item parameter, the accuracy of parameter recovery, and the capacity of model comparison criteria to select the best model against other IRT models. The new models are exemplified using real data from two mathematics tests, one administered in Peruvian public schools and another administered to incoming university students in Chile. In both cases, the proposed models proved to be a promising alternative to existing asymmetric IRT models, offering new insights into item response modelling.
Jessica Alves, Jorge Bazán, Jorge González. Generalized extreme value IRT models. British Journal of Mathematical & Statistical Psychology, published 12 November 2025. https://doi.org/10.1111/bmsp.70015
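As a sketch of how a GEV distribution can induce an asymmetric ICC (an illustrative construction; the paper's exact parameterization may differ), the GEV distribution function with shape parameter ξ can serve as the link for the success probability:

```latex
P(Y_{ij} = 1 \mid \theta_i) \;=\;
\exp\!\left\{ -\bigl[\, 1 + \xi\, a_j(\theta_i - b_j) \,\bigr]_{+}^{-1/\xi} \right\}
```

As ξ → 0 this reduces to the Gumbel-type (complementary log-log style) link, and ξ governs the direction and degree of asymmetry of the ICC, in contrast to the symmetric logistic or probit curves of standard IRT models.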
Debora de Chiusole, Andrea Spoto, Umberto Granziol, Luca Stefanutti
Within the knowledge structure theory (KST) framework, this study evaluates the reliability of knowledge state estimation by introducing two key measures: the expected accuracy rate and the expected discrepancy. The accuracy rate quantifies the likelihood that the estimated knowledge state aligns with the true state, while the expected discrepancy measures the average deviation when misclassification occurs. To support the theoretical framework, we provide an in-depth discussion of these indices, supplemented by two simulation studies and an empirical example. The simulation results reveal a trade-off between the number of items and the size of the knowledge structure. Specifically, smaller structures exhibit consistent accuracy across different error levels, while larger structures show increasing discrepancies as error rates rise. Nevertheless, accuracy improves with a greater number of items in larger structures, mitigating the impact of errors. Additionally, the expected discrepancy analysis shows that, when misclassification occurs, the estimated state is generally close to the true one, minimizing the effect of errors in the assessment. Finally, an empirical application using real assessment data demonstrates the practical relevance of the proposed measures. This suggests that KST-based assessments provide reliable and meaningful diagnostic information, highlighting their potential for use in educational and psychological testing.
Debora de Chiusole, Andrea Spoto, Umberto Granziol, Luca Stefanutti. Reliability measures in knowledge structure theory. British Journal of Mathematical & Statistical Psychology, published 1 November 2025. https://doi.org/10.1111/bmsp.70013
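To illustrate the kind of state estimation whose reliability the paper measures, the sketch below uses a tiny hypothetical 3-item knowledge structure and BLIM-style careless-error (β) and lucky-guess (η) rates, assigning each response pattern the maximum-likelihood knowledge state. The structure, rates, and patterns are all illustrative, not taken from the paper.

```python
# Hypothetical 3-item knowledge structure: each state is a set of mastered items.
items = [0, 1, 2]
structure = [frozenset(), frozenset({0}), frozenset({0, 1}), frozenset({0, 1, 2})]
beta, eta = 0.1, 0.1  # careless-error and lucky-guess probabilities

def likelihood(response, state):
    """P(response pattern | knowledge state) under a BLIM-style model:
    an item in the state is solved unless a careless error occurs; an item
    outside the state is solved only by a lucky guess."""
    p = 1.0
    for i in items:
        if i in state:
            p *= (1 - beta) if response[i] else beta
        else:
            p *= eta if response[i] else (1 - eta)
    return p

def estimate_state(response):
    """Maximum-likelihood knowledge state for an observed response pattern."""
    return max(structure, key=lambda s: likelihood(response, s))

print(sorted(estimate_state((1, 1, 0))))  # [0, 1]
```

The expected accuracy rate then averages, over states and response patterns, the probability that this estimate equals the true state; the expected discrepancy averages the set distance between the two when it does not.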
Yong Zhang, Anja F Ernst, Ginette Lafit, Ward B Eiling, Laura F Bringmann
The stationary autoregressive model forms an important basis of time-series analysis in today's psychological research. Diverse nonstationary extensions of this model have been developed to capture different types of changing temporal dynamics. However, researchers do not always have a solid theoretical basis for deciding a priori which of these nonstationary models is the most appropriate for a given time series. In this case, correct model selection becomes a crucial step towards an accurate understanding of the temporal dynamics. This study consists of two main parts. First, in a simulation study, we investigated the performance of in-sample (information criteria) and out-of-sample (cross-validation, out-of-sample prediction) model selection techniques in identifying six different univariate nonstationary processes. We found that the Bayesian information criterion (BIC) has the best overall performance, whereas the performance of the other techniques depends largely on the length of the time series. Then, we re-analysed a 239-day time series of positive and negative affect to illustrate the model selection process. Combining the simulation results with practical considerations from the empirical analysis, we argue that model selection for nonstationary time series should not rely completely on data-driven approaches. Instead, theory-driven approaches in which researchers actively integrate their qualitative understanding should inform the data-driven approaches.
Yong Zhang, Anja F Ernst, Ginette Lafit, Ward B Eiling, Laura F Bringmann. An investigation into in-sample and out-of-sample model selection for nonstationary autoregressive models. British Journal of Mathematical & Statistical Psychology, published 28 October 2025. https://doi.org/10.1111/bmsp.70012
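As a minimal illustration of the in-sample route, the sketch below fits a stationary AR(1) model by ordinary least squares and computes its BIC on a simulated series; the true coefficient, series length, and the BIC convention for counting parameters are all illustrative choices, and the nonstationary competitor models from the study are not shown.

```python
import numpy as np

def ar1_bic(x):
    """Fit x_t = c + phi * x_{t-1} + e_t by OLS and return (phi_hat, bic),
    using BIC = n * log(RSS / n) + k * log(n) with k = 3 free parameters
    (intercept, phi, residual variance); conventions for k vary."""
    y, lag = x[1:], x[:-1]
    X = np.column_stack([np.ones_like(lag), lag])
    coef, res, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(res[0]) if res.size else float(np.sum((y - X @ coef) ** 2))
    n, k = len(y), 3
    bic = n * np.log(rss / n) + k * np.log(n)
    return float(coef[1]), float(bic)

# Simulate a stationary AR(1) series with true phi = 0.6.
rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.6 * x[t - 1] + rng.standard_normal()

phi_hat, bic = ar1_bic(x)
print(phi_hat)  # estimate lands close to the true value 0.6
```

Model selection then compares this BIC against the BICs of the candidate nonstationary models (e.g., time-varying mean or coefficient), choosing the lowest; the out-of-sample alternatives instead score each model's forecasts on held-out observations.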
Reinforcement learning (RL) powers the engine of adaptive learning systems, which recommend customized learning materials to individual learners in their varying learning states to optimize learning effectiveness. However, some argue that improving learning effectiveness alone may be insufficient, particularly if it overly extends learning effort and requires additional time to work on the recommended materials. Learners with different amounts of prior knowledge spend different amounts of time on the same material. Therefore, designers should consider both the usefulness of a material and the time that individual learners with a specific amount of prior knowledge dedicate to making sense of it. To fill this gap, this study proposes an RL-based adaptive learning system in which the reward is improved by considering both factors. We then conducted Monte Carlo simulation studies to verify the effects of the improved reward and uncover the mechanisms of the RL recommendation strategies. Results show that the improved reward substantially reduces learners' learning duration through interpretable recommendation strategies, resulting in growing learning efficiency for learners with varying prior knowledge.
Tongxin Zhang, Canxi Cao, Tao Xin, Xiaoming Zhai. Reinforcement learning-based adaptive learning: Rewards improvement considering learning duration. British Journal of Mathematical & Statistical Psychology, published 24 October 2025. https://doi.org/10.1111/bmsp.70014
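The reward trade-off described above can be sketched with a toy epsilon-greedy bandit over learning materials, where each attempt's reward is the learning gain minus a time penalty. This is an illustrative sketch, not the paper's actual system: the materials, gain and duration distributions, and the weight lam are all hypothetical.

```python
import random

random.seed(7)
# Hypothetical materials: (expected learning gain, expected study minutes).
materials = {
    "easy":   (0.2, 5.0),
    "medium": (0.5, 12.0),
    "hard":   (0.7, 30.0),
}
lam = 0.02                            # hypothetical penalty per study minute
q = {m: 0.0 for m in materials}       # running average reward per material
counts = {m: 0 for m in materials}

for step in range(5000):
    # Epsilon-greedy: occasionally explore, otherwise exploit the best estimate.
    if random.random() < 0.1:
        m = random.choice(list(materials))
    else:
        m = max(q, key=q.get)
    gain_mu, time_mu = materials[m]
    gain = max(0.0, random.gauss(gain_mu, 0.1))
    minutes = max(1.0, random.gauss(time_mu, 2.0))
    reward = gain - lam * minutes          # learning gain net of time cost
    counts[m] += 1
    q[m] += (reward - q[m]) / counts[m]    # incremental mean update

print(max(q, key=q.get))  # "medium": the best gain-per-time trade-off here
```

With a gain-only reward the agent would favour "hard" (highest raw gain); charging for duration shifts the recommendation towards materials that are efficient for the learner, which is the mechanism the improved reward exploits.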