
Latest Publications in Applied Psychological Measurement

Detecting Differential Item Functioning in Multidimensional Graded Response Models With Recursive Partitioning
IF 1.2 · CAS Tier 4 (Psychology) · Q2 Social Sciences · Pub Date: 2024-03-13 · DOI: 10.1177/01466216241238743
Franz Classe, Christoph Kern
Differential item functioning (DIF) is a common challenge when examining latent traits in large-scale surveys. In recent work, methods from the field of machine learning, such as model-based recursive partitioning, have been proposed to identify subgroups with DIF when little theoretical guidance is available and many potential subgroups exist. On this basis, we propose and compare recursive partitioning techniques for detecting DIF, focusing on measurement models with multiple latent variables and ordinal response data. We implement tree-based approaches for identifying subgroups that contribute to DIF in multidimensional latent variable modeling and propose a robust yet scalable extension inspired by random forests. The proposed techniques are applied and compared in simulations. We show that the proposed methods efficiently detect DIF and allow decision rules to be extracted that lead to subgroups with well-fitting models.
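To make the partitioning logic concrete, here is a minimal, self-contained sketch. It is not the authors' implementation: the measurement model is a crude dichotomous Rasch surrogate (logistic regression on a rest-score ability proxy plus item dummies) rather than a multidimensional graded response model, and the covariates (`sex`, `region`), simulated data, and chi-square stopping rule are invented for illustration.

```python
# Toy sketch of model-based recursive partitioning for DIF detection.
import numpy as np
from scipy.stats import chi2
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def simulate(n=800, k=6):
    theta = rng.normal(size=n)                  # latent ability
    covs = {"sex": rng.integers(0, 2, n), "region": rng.integers(0, 2, n)}
    b = np.linspace(-1.5, 1.5, k)               # item difficulties
    shift = np.zeros(k); shift[2] = 1.0         # item 3 shows DIF by sex
    logit = theta[:, None] - b - covs["sex"][:, None] * shift
    X = (rng.uniform(size=logit.shape) < 1 / (1 + np.exp(-logit))).astype(int)
    return X, covs

def loglik(X):
    """Fit the surrogate item model on one node; return log-likelihood, #params."""
    n, k = X.shape
    rest = (X.sum(1, keepdims=True) - X) / (k - 1)      # rest-score ability proxy
    feats = np.column_stack([rest.ravel(), np.tile(np.eye(k), (n, 1))])
    y = X.ravel()
    clf = LogisticRegression(C=1e6, fit_intercept=False, max_iter=2000).fit(feats, y)
    p = clf.predict_proba(feats)[:, 1]
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)), k + 1

def partition(X, covs, idx, depth=0, alpha=0.05, min_n=100):
    """Recursively split on the binary covariate with the strongest DIF signal."""
    ll_all, n_par = loglik(X[idx])
    best = None
    for name, c in covs.items():
        left, right = idx[c[idx] == 0], idx[c[idx] == 1]
        if min(len(left), len(right)) < min_n:
            continue
        lr = 2 * (loglik(X[left])[0] + loglik(X[right])[0] - ll_all)
        pval = chi2.sf(lr, df=n_par)            # extra params gained by splitting
        if pval < alpha and (best is None or pval < best[0]):
            best = (pval, name, left, right)
    if best is None:
        print("  " * depth + f"leaf: n={len(idx)}")
        return
    print("  " * depth + f"split on {best[1]} (p={best[0]:.4f})")
    partition(X, covs, best[2], depth + 1)
    partition(X, covs, best[3], depth + 1)

X, covs = simulate()
partition(X, covs, np.arange(len(X)))           # expected: a split on "sex"
```

A forest-style extension in the abstract's spirit would repeat this on bootstrap samples and tally how often each covariate is chosen for the first split.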
Citations: 0
Linking Methods for Multidimensional Forced Choice Tests Using the Multi-Unidimensional Pairwise Preference Model
IF 1.2 · CAS Tier 4 (Psychology) · Q2 Social Sciences · Pub Date: 2024-03-11 · DOI: 10.1177/01466216241238741
Naidan Tu, Lavanya S. Kumar, Sean Joo, Stephen Stark
Applications of multidimensional forced choice (MFC) testing have increased considerably over the last 20 years. Yet there has been little, if any, research on methods for linking the parameter estimates from different samples. This research addressed that important need by extending four widely used methods for unidimensional linking and comparing the efficacy of new estimation algorithms for MFC linking coefficients based on the Multi-Unidimensional Pairwise Preference model (MUPP). More specifically, we compared the efficacy of multidimensional test characteristic curve (TCC), item characteristic curve (ICC; Haebara, 1980), mean/mean (M/M), and mean/sigma (M/S) methods in a Monte Carlo study that also manipulated test length, test dimensionality, sample size, percentage of anchor items, and linking scenarios. Results indicated that the ICC method outperformed the M/M method, which was better than the M/S method, with the TCC method being the least effective. However, as the number of items “per dimension” and the percentage of anchor items increased, the differences between the ICC, M/M, and M/S methods decreased. Study implications and practical recommendations for MUPP linking, as well as limitations, are discussed.
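For readers unfamiliar with the linking families being extended, the sketch below shows their classical unidimensional 2PL versions: mean/mean, mean/sigma, and Haebara's item characteristic curve criterion (the TCC method replaces the per-item ICC loss with a loss on the summed test characteristic curve). The anchor parameters are made up, and none of the MUPP-specific machinery from the paper is reproduced.

```python
# Classical unidimensional 2PL linking: M/M, M/S, and Haebara ICC methods.
import numpy as np
from scipy.optimize import minimize

# Anchor-item parameters (discrimination a, location b) on two scales X and Y.
a_x = np.array([1.2, 0.8, 1.5, 1.0]); b_x = np.array([-0.5, 0.2, 0.9, -1.1])
a_y = np.array([1.0, 0.7, 1.3, 0.9]); b_y = np.array([-0.1, 0.6, 1.4, -0.7])

# Seek theta_Y = A * theta_X + B, so b* = A*b_x + B and a* = a_x / A.
A_mm = a_x.mean() / a_y.mean()                  # mean/mean: A from discriminations
A_ms = b_y.std(ddof=1) / b_x.std(ddof=1)        # mean/sigma: A from location spread
B_mm = b_y.mean() - A_mm * b_x.mean()
B_ms = b_y.mean() - A_ms * b_x.mean()

def haebara_loss(par, theta=np.linspace(-4, 4, 41)):
    """Sum of squared ICC differences after transforming scale X onto Y."""
    A, B = par
    p_y = 1 / (1 + np.exp(-a_y[:, None] * (theta - b_y[:, None])))
    p_x = 1 / (1 + np.exp(-(a_x / A)[:, None] * (theta - (A * b_x + B)[:, None])))
    return np.sum((p_y - p_x) ** 2)

A_icc, B_icc = minimize(haebara_loss, x0=[1.0, 0.0], method="Nelder-Mead").x
print(f"M/M: A={A_mm:.3f}, B={B_mm:.3f}")
print(f"M/S: A={A_ms:.3f}, B={B_ms:.3f}")
print(f"ICC: A={A_icc:.3f}, B={B_icc:.3f}")
```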
Citations: 0
Using Interpretable Machine Learning for Differential Item Functioning Detection in Psychometric Tests
IF 1.2 · CAS Tier 4 (Psychology) · Q2 Social Sciences · Pub Date: 2024-03-11 · DOI: 10.1177/01466216241238744
E. Kraus, Johannes Wild, Sven Hilbert
This study presents a novel method combining psychometrics and machine learning to investigate test fairness and differential item functioning. Test unfairness manifests itself in systematic, demographically imbalanced influences of confounding constructs on residual variances in psychometric modeling. Our method aims to account for the resulting complex relationships between response patterns and demographic attributes. Specifically, it measures the importance of individual test items and latent ability scores, relative to a random baseline variable, when predicting demographic characteristics. We conducted a simulation study to examine the method's behavior under various conditions, such as linear and complex impact, unfairness, varying numbers of factors and unfair items, and varying test lengths. We found that our method detects unfair items as reliably as Mantel–Haenszel statistics or logistic regression analyses but generalizes to multidimensional scales in a straightforward manner. To apply the method, we used random forests to predict migration backgrounds from ability scores and single items of an elementary school reading comprehension test. One item was found to be unfair according to all proposed decision criteria. Further analysis of the item's content provided plausible explanations for this finding. Analysis code is available at: https://osf.io/s57rw/?view_only=47a3564028d64758982730c6d9c6c547 .
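A minimal sketch of the core idea follows, under simplifying assumptions: responses are simulated from a toy dichotomous model with one "unfair" item, the true latent ability stands in for an estimated score, and a pure-noise column serves as the random baseline. The random forest and permutation importances mirror the abstract's logic, not the authors' exact pipeline.

```python
# Predict a demographic attribute from item responses + ability, then compare
# each feature's permutation importance against a pure-noise baseline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(7)
n, k = 1500, 8
theta = rng.normal(size=n)                         # latent ability
group = rng.integers(0, 2, n)                      # demographic attribute
b = np.linspace(-1.5, 1.5, k)
shift = np.zeros(k); shift[4] = 1.2                # item 5 leaks group info
p = 1 / (1 + np.exp(-(theta[:, None] - b - group[:, None] * shift)))
items = (rng.uniform(size=p.shape) < p).astype(int)

X = np.column_stack([items, theta, rng.normal(size=n)])  # items, ability, noise
names = [f"item{j + 1}" for j in range(k)] + ["ability", "baseline"]

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, group)
imp = permutation_importance(rf, X, group, n_repeats=20, random_state=0)
for name, m in sorted(zip(names, imp.importances_mean), key=lambda t: -t[1]):
    print(f"{name:8s} {m:6.3f}")   # item5 should clearly exceed the baseline
```

An item whose importance exceeds the noise baseline carries demographic information beyond the latent trait, which is the study's operationalization of unfairness.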
Citations: 0
Benefits of the Curious Behavior of Bayesian Hierarchical Item Response Theory Models—An in-Depth Investigation and Bias Correction
IF 1.2 · CAS Tier 4 (Psychology) · Q2 Social Sciences · Pub Date: 2024-01-20 · DOI: 10.1177/01466216241227547
Christoph König, Rainer W. Alexandrowicz
When using Bayesian hierarchical modeling, a popular approach for Item Response Theory (IRT) models, researchers typically face a tradeoff between the precision and accuracy of the item parameter estimates. Given the pooling principle and variance-dependent shrinkage, Bayesian hierarchical IRT models are expected to deliver more precise but biased item parameter estimates compared to those obtained from nonhierarchical models. Previous research, however, points to the possibility that, in the context of the two-parameter logistic IRT model, this tradeoff does not have to be made. With a comprehensive simulation study, we provide an in-depth investigation of this possibility. The results show superior performance, in terms of bias, RMSE, and precision, of the hierarchical specifications compared to their nonhierarchical counterparts. Under certain conditions, the bias in the item parameter estimates is independent of the bias in the variance components. Moreover, we provide a bias correction procedure for item discrimination parameter estimates. In sum, we show that IRT models create a unique situation in which the Bayesian hierarchical approach yields parameter estimates that are not only more precise but also more accurate than those of nonhierarchical approaches. We discuss this beneficial behavior from both theoretical and applied points of view.
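For orientation, one standard hierarchical two-parameter logistic (2PL) specification with variance-dependent shrinkage looks as follows; this is a plausible illustrative variant, not necessarily the exact priors used in the paper:

```latex
\begin{aligned}
P(X_{pi}=1 \mid \theta_p, a_i, b_i) &= \frac{1}{1+\exp\{-a_i(\theta_p - b_i)\}},
  & \theta_p &\sim \mathcal{N}(0,1),\\
b_i &\sim \mathcal{N}(\mu_b, \sigma_b^2),
  & a_i &\sim \operatorname{LogNormal}(\mu_a, \sigma_a^2),\\
\mu_b, \mu_a &\sim \mathcal{N}(0, 5^2),
  & \sigma_b, \sigma_a &\sim \operatorname{HalfCauchy}(1).
\end{aligned}
```

In the normal–normal case the posterior mean of b_i is roughly w_i * b̂_i + (1 − w_i) * μ_b with weight w_i = σ_b² / (σ_b² + se_i²): item estimates are pulled toward the pooled mean more strongly when their sampling error is large, which is the precision-for-bias tradeoff the abstract discusses.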
Citations: 0
Corrigendum to “irtplay: An R Package for Online Item Calibration, Scoring, Evaluation of Model Fit, and Useful Functions for Unidimensional IRT”
IF 1.2 · CAS Tier 4 (Psychology) · Q2 Social Sciences · Pub Date: 2024-01-18 · DOI: 10.1177/01466216231223043
Citations: 0
Detecting uniform differential item functioning for continuous response computerized adaptive testing
IF 1.2 · CAS Tier 4 (Psychology) · Q2 Social Sciences · Pub Date: 2024-01-17 · DOI: 10.1177/01466216241227544
Chun Wang, Ruoyi Zhu
Evaluating items for potential differential item functioning (DIF) is an essential step in ensuring measurement fairness. In this article, we focus on a specific scenario: continuous response, severely sparse, computerized adaptive testing (CAT). Continuous response items are increasingly used in performance-based tasks because they tend to generate more information than traditional dichotomous items. Severe sparsity arises when many items are automatically generated via machine learning algorithms. We propose two uniform DIF detection methods for this scenario. The first is a modified version of CAT-SIBTEST, a non-parametric method that does not depend on any specific item response theory model assumptions. The second is a regularization method, a parametric, model-based approach. Simulation studies show that both methods are effective in correctly identifying items with uniform DIF. A real data analysis is provided at the end to illustrate the utility and potential caveats of the two methods.
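The regularization idea can be sketched as follows, with heavy simplification: a linear model with item intercepts, an ability term, and L1-penalized item-by-group interactions, where interactions surviving the penalty flag uniform DIF. The true ability is used in place of an estimate, and the penalty level and flagging threshold are arbitrary illustrative choices, not the authors' estimator.

```python
# Lasso-penalized item-by-group effects for uniform DIF with continuous responses.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n, k = 1000, 10
theta = rng.normal(size=n)                         # latent ability
group = rng.integers(0, 2, n)                      # focal vs. reference group
b = rng.normal(scale=0.8, size=k)                  # item intercepts
gamma = np.zeros(k); gamma[[1, 6]] = 0.6           # items 2 and 7 carry DIF
Y = theta[:, None] + b + group[:, None] * gamma \
    + rng.normal(scale=0.5, size=(n, k))

# Long format: one row per person-item response.
person = np.repeat(np.arange(n), k)
item = np.tile(np.arange(k), n)
ability = theta[person]                            # in practice: an estimate
item_dum = np.eye(k)[item]                         # item intercept dummies
inter = item_dum * group[person][:, None]          # item-by-group interactions

# Group impact is absorbed by the ability term; the L1 penalty should zero
# out the interaction coefficients of non-DIF items.
X = np.column_stack([ability, item_dum, inter])
fit = Lasso(alpha=0.02, fit_intercept=False).fit(X, Y.ravel())
gamma_hat = fit.coef_[-k:]
flagged = np.flatnonzero(np.abs(gamma_hat) > 0.1) + 1
print("items flagged for uniform DIF:", flagged)   # expected: [2 7]
```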
Citations: 0
Location-Matching Adaptive Testing for Polytomous Technology-Enhanced Items
IF 1.2 · CAS Tier 4 (Psychology) · Q2 Social Sciences · Pub Date: 2024-01-16 · DOI: 10.1177/01466216241227548
Hyeon-Ah Kang, Gregory Arbet, Joe Betts, William Muntean
The article presents adaptive testing strategies for polytomously scored technology-enhanced innovative items. We investigate item selection methods that match items' locations to examinees' ability levels and explore ways to leverage test-taking speeds during item selection. Existing approaches to selecting polytomous items are mostly based on information measures and tend to suffer from an item pool usage problem. In this study, we introduce location indices for polytomous items and show that location-matched item selection significantly mitigates the usage problem and achieves more diverse item sampling. We also consider matching items' time intensities so that testing times can be regulated across examinees. A numerical experiment based on Monte Carlo simulation suggests that location-matched item selection achieves significantly better and more balanced item pool usage. Leveraging working speed in item selection markedly reduced average testing times as well as their variation across examinees. Both procedures incurred only marginal measurement costs (e.g., in precision and efficiency) yet showed significant improvements in administrative outcomes. An experiment in two test settings also suggested that the procedures can yield different administrative gains depending on the test design.
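A toy sketch of what location-matched selection with time regulation could look like is given below. The scalar location (mean of an item's graded-response thresholds), the lognormal-style response-time term, and all parameter values are assumptions for illustration, not the article's exact indices.

```python
# Location-matched item selection with a simple time-intensity penalty.
import numpy as np

rng = np.random.default_rng(11)
n_items = 200
thresholds = np.sort(rng.normal(size=(n_items, 4)), axis=1)   # GRM thresholds
location = thresholds.mean(axis=1)                 # scalar item location
time_intensity = rng.normal(4.0, 0.5, n_items)     # log-seconds per item

def next_item(theta_hat, administered, speed_tau, w_time=0.5):
    """Pick the unused item closest in location, penalizing slow items."""
    dist = np.abs(location - theta_hat)
    # Expected log-time under a simple lognormal response-time model:
    # E[log T] = time_intensity - speed_tau (higher tau = faster examinee).
    time_pen = w_time * np.maximum(time_intensity - speed_tau, 0.0)
    score = dist + time_pen
    score[list(administered)] = np.inf             # exclude used items
    return int(np.argmin(score))

administered = set()
theta_hat, tau = 0.3, 4.2
for _ in range(5):
    j = next_item(theta_hat, administered, tau)
    administered.add(j)
    print(f"item {j}: location={location[j]:+.2f}, "
          f"time intensity={time_intensity[j]:.2f}")
```

Because many items near a given location have comparable scores, selection spreads across the pool instead of repeatedly drawing the few maximum-information items.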
Citations: 0
Comparing Test-Taking Effort Between Paper-Based and Computer-Based Tests
IF 1.2 · CAS Tier 4 (Psychology) · Q2 Social Sciences · Pub Date: 2024-01-13 · DOI: 10.1177/01466216241227535
Sebastian Weirich, Karoline A. Sachse, Sofie Henschel, Carola Schnitzler
The article compares the trajectories of students’ self-reported test-taking effort between a paper-and-pencil assessment (PPA) and a computer-based assessment (CBA) during a 120-minute low-stakes large-scale assessment of English comprehension. Test-taking effort was measured four times during the test. Using a within-subject design, each of the N = 2,676 German ninth-grade students completed half of the test in PPA mode and half in CBA mode, with the sequence of modes balanced across students. Overall, students’ test-taking effort decreased considerably over the course of the test. On average, effort was lower in CBA than in PPA, but the decline during the test did not differ between the two modes. Moreover, students’ self-reported effort was higher when the items were easier (relative to students’ abilities). The consequences of these results for the further development of CBA tests and large-scale assessments in general are discussed.
Citations: 0
Using Auxiliary Item Information in the Item Parameter Estimation of a Graded Response Model for a Small to Medium Sample Size: Empirical Versus Hierarchical Bayes Estimation
CAS Tier 4 (Psychology) · Q2 Social Sciences · Pub Date: 2023-11-03 · DOI: 10.1177/01466216231209758
Matthew Naveiras, Sun-Joo Cho
Marginal maximum likelihood estimation (MMLE) is commonly used for item parameter estimation in item response theory. However, sufficiently large sample sizes are not always attainable when studying rare populations. In this paper, empirical Bayes and hierarchical Bayes are presented as alternatives to MMLE for small sample sizes, using auxiliary item information to estimate the item parameters of a graded response model with higher accuracy. Empirical Bayes and hierarchical Bayes methods are compared with MMLE to determine under what conditions these Bayes methods can outperform MMLE, and whether hierarchical Bayes can serve as an acceptable alternative to MMLE in conditions where MMLE is unable to converge. In addition, empirical Bayes and hierarchical Bayes methods are compared to show how hierarchical Bayes, by acknowledging the uncertainty of item parameter estimates, can estimate posterior variance with greater accuracy than empirical Bayes. The proposed methods were evaluated via a simulation study. Simulation results showed that hierarchical Bayes methods can be acceptable alternatives to MMLE under various testing conditions, and we provide a guideline indicating which methods are recommended in different research situations. R functions are provided to implement the proposed methods.
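The generic empirical Bayes mechanism behind this approach, shrinking noisy item estimates toward a prior estimated from the data itself, can be illustrated with a normal–normal toy example; the paper's GRM-specific estimator and its use of auxiliary item information are not reproduced here.

```python
# Empirical Bayes shrinkage of item parameter estimates (normal-normal toy).
import numpy as np

rng = np.random.default_rng(5)
true_b = rng.normal(0.0, 1.0, size=12)           # true item difficulties
se = rng.uniform(0.2, 0.6, size=12)              # sampling SEs (small sample)
b_hat = true_b + rng.normal(0.0, se)             # noisy MMLE-like estimates

# Empirical Bayes: method-of-moments estimates of the prior mean/variance.
mu_hat = b_hat.mean()
tau2_hat = max(b_hat.var(ddof=1) - np.mean(se**2), 1e-6)

w = tau2_hat / (tau2_hat + se**2)                # shrinkage weights in [0, 1]
b_eb = w * b_hat + (1 - w) * mu_hat              # approximate posterior means

rmse = lambda est: np.sqrt(np.mean((est - true_b) ** 2))
print(f"RMSE raw: {rmse(b_hat):.3f}   RMSE EB: {rmse(b_eb):.3f}")
# Hierarchical Bayes would instead place hyperpriors on (mu, tau2) and
# propagate their uncertainty into the item posteriors, e.g., via MCMC.
```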
Citations: 1
A Bayesian Random Weights Linear Logistic Test Model for Within-Test Practice Effects
CAS Tier 4 (Psychology) · Q2 Social Sciences · Pub Date: 2023-11-01 · DOI: 10.1177/01466216231209752
José H. Lozano, Javier Revuelta
The present paper introduces a random weights linear logistic test model for measuring individual differences in operation-specific practice effects within a single administration of a test. The proposed model is an extension of the linear logistic test model of learning developed by Spada (1977), in which the practice effects are treated as random effects varying across examinees. A Bayesian framework was used for model estimation and evaluation. A simulation study was conducted to examine the behavior of the model in combination with the Bayesian procedures. The results demonstrated the good performance of the estimation and evaluation methods. Additionally, an empirical study was conducted to illustrate the applicability of the model to real data. The model was applied to a sample of responses from a logical ability test, providing evidence of individual differences in operation-specific practice effects.
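In LLTM notation, one plausible reconstruction of such a model (inferred from the abstract; the paper's exact parameterization may differ) decomposes item difficulty into operation weights and adds an examinee-specific random practice effect per operation:

```latex
\operatorname{logit} P(X_{pi}=1 \mid \theta_p)
  = \theta_p - \sum_{k=1}^{K} q_{ik}\,\eta_k
    + \sum_{k=1}^{K} \left(\beta_k + u_{pk}\right) t_{pik},
  \qquad u_{pk} \sim \mathcal{N}(0, \sigma_k^2),
```

where q_ik indicates whether item i requires operation k, eta_k is that operation's difficulty, t_pik counts how often examinee p has already exercised operation k before reaching item i, and beta_k + u_pk is the examinee's operation-specific practice effect.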
Citations: 0