A Comparison of Robust Likelihood Estimators to Mitigate Bias From Rapid Guessing
Pub Date: 2022-05-01 | DOI: 10.1177/01466216221084371 | Applied Psychological Measurement, 46(3), 236-249
Joseph A Rios
Rapid guessing (RG) behavior can undermine measurement properties and score-based inferences. To mitigate this potential bias, practitioners have relied on response time information to identify and filter RG responses. However, response times may be unavailable in many testing contexts, such as paper-and-pencil administrations. When this is the case, self-report measures of effort and person-fit statistics have been used. These methods are limited in that inferences concerning motivation and aberrant responding are made at the examinee level. Because test takers can engage in a mixture of solution and RG behavior throughout a test administration, there is a need to limit the influence of potential aberrant responses at the item level. This can be done by employing robust estimation procedures. Since these estimators have received limited attention in the RG literature, the objective of this simulation study was to evaluate ability parameter estimation accuracy in the presence of RG by comparing maximum likelihood estimation (MLE) to two robust variants, the bisquare and Huber estimators. Two RG conditions were manipulated: RG percentage (10%, 20%, and 40%) and RG pattern (difficulty-based and changing state). Compared with the MLE procedure, both the bisquare and Huber estimators reduced bias in ability parameter estimates by as much as 94%. Given that the Huber estimator showed smaller standard deviations of error and performed as well as the bisquare approach under most conditions, it is recommended as a promising approach to mitigating bias from RG when response time information is unavailable.
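For context, the two robust estimators compared here downweight item-level likelihood contributions through a residual-based weight function. The following are the standard Huber and bisquare (Tukey biweight) forms from the general robust-estimation literature, not equations reproduced from this article, with $r_j$ a standardized item residual and $k$ a tuning constant:
$$ w_{\mathrm{Huber}}(r) = \begin{cases} 1, & |r| \le k \\ k/|r|, & |r| > k \end{cases} \qquad\qquad w_{\mathrm{bisquare}}(r) = \begin{cases} \left[1 - (r/k)^2\right]^2, & |r| \le k \\ 0, & |r| > k \end{cases} $$
The robust ability estimate then solves the weighted likelihood equation $\sum_j w(r_j)\, \partial \log L_j(\theta)/\partial \theta = 0$, so responses that are highly inconsistent with $\theta$ (e.g., rapid guesses on easy items) receive reduced influence.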
{"title":"A Comparison of Robust Likelihood Estimators to Mitigate Bias From Rapid Guessing.","authors":"Joseph A Rios","doi":"10.1177/01466216221084371","DOIUrl":"https://doi.org/10.1177/01466216221084371","url":null,"abstract":"<p><p>Rapid guessing (RG) behavior can undermine measurement property and score-based inferences. To mitigate this potential bias, practitioners have relied on response time information to identify and filter RG responses. However, response times may be unavailable in many testing contexts, such as paper-and-pencil administrations. When this is the case, self-report measures of effort and person-fit statistics have been used. These methods are limited in that inferences concerning motivation and aberrant responding are made at the examinee level. As test takers can engage in a mixture of solution and RG behavior throughout a test administration, there is a need to limit the influence of potential aberrant responses at the item level. This can be done by employing robust estimation procedures. Since these estimators have received limited attention in the RG literature, the objective of this simulation study was to evaluate ability parameter estimation accuracy in the presence of RG by comparing maximum likelihood estimation (MLE) to two robust variants, the bisquare and Huber estimators. Two RG conditions were manipulated, RG percentage (10%, 20%, and 40%) and pattern (difficulty-based and changing state). Contrasted to the MLE procedure, results demonstrated that both the bisquare and Huber estimators reduced bias in ability parameter estimates by as much as 94%. Given that the Huber estimator showed smaller standard deviations of error and performed equally as well as the bisquare approach under most conditions, it is recommended as a promising approach to mitigating bias from RG when response time information is unavailable.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 3","pages":"236-249"},"PeriodicalIF":1.2,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9073634/pdf/10.1177_01466216221084371.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9748240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
bayMDS: An R Package for Bayesian Multidimensional Scaling and Choice of Dimension
Pub Date: 2022-05-01 | DOI: 10.1177/01466216221084219 | Applied Psychological Measurement, 46(3), 250-251
Man-Suk Oh, Eun-Kyung Lee
MDSIC computes and plots the MDSIC values that can be used to select the optimal number of dimensions for a given data set. There are also a few plot functions. plotObj shows pairwise scatter plots of the object configuration in a Euclidean space for the first three dimensions. plotTrace provides trace plots of parameter samples for visual inspection of MCMC convergence. plotDelDist plots the observed dissimilarity measures versus the Euclidean distances computed from the BMDS object configuration. bayMDSApp shows the results of bayMDS in a web-based GUI (graphical user interface).
{"title":"BayMDS: An R Package for Bayesian Multidimensional Scaling and Choice of Dimension.","authors":"Man-Suk Oh, Eun-Kyung Lee","doi":"10.1177/01466216221084219","DOIUrl":"https://doi.org/10.1177/01466216221084219","url":null,"abstract":"MDSIC computes and plots MDSIC that can be used to select optimal number of dimensions for a given data set. There are also a few plot functions. plotObj shows pairwise scatter plots of object con fi guration in a Euclidean space for the fi rst three dimensions. plotTrace provides trace plots of parameter samples for visual inspection of MCMC convergence. plotDelDist plots the observed dissimilarity measures versus Euclidean distances computed from BMDS object con fi guration. bayMDSApp shows the results of bayMDS in a web-based GUI (graphical user","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 3","pages":"250-251"},"PeriodicalIF":1.2,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9073637/pdf/10.1177_01466216221084219.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9748237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Measurement of Ability in Adaptive Learning and Assessment Systems when Learners Use On-Demand Hints
Pub Date: 2022-04-18 | DOI: 10.1177/01466216221084208 | Applied Psychological Measurement, 46(1), 219-235
M. Bolsinova, Benjamin E. Deonovic, Meirav Arieli-Attali, Burr Settles, Masato Hagiwara, G. Maris
Adaptive learning and assessment systems support learners in acquiring knowledge and skills in a particular domain. Learners' progress is monitored as they solve items matched to their level and aimed at specific learning goals. Scaffolding and providing learners with hints are powerful tools for supporting the learning process. One way of introducing hints is to make hint use the choice of the student: when learners are certain of their response, they answer without hints, but if they are not certain or do not know how to approach the item, they can request a hint. We develop measurement models for applications where such on-demand hints are available. These models take into account that hint use may be informative about ability, but at the same time may be influenced by other individual characteristics. Two modeling strategies are considered: (1) the measurement model is based on a scoring rule for ability that includes both response accuracy and hint use; (2) the choice to use hints and response accuracy conditional on this choice are modeled jointly using Item Response Tree models. The properties of the different models and their implications are discussed. An application to data from Duolingo, an adaptive language learning system, is presented. Here, the best model is the scoring-rule-based model with full credit for correct responses without hints, partial credit for correct responses with hints, and no credit for all incorrect responses. The second dimension in the model accounts for individual differences in the tendency to use hints.
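To make the first strategy concrete, a hedged sketch of the kind of scoring rule described above (the partial-credit value $\lambda$ is illustrative, not a value reported in the article): the item score feeding the ability dimension could be defined as
$$ S_{pi} = \begin{cases} 1, & \text{correct without a hint} \\ \lambda, & \text{correct after requesting a hint}, \; 0 < \lambda < 1 \\ 0, & \text{incorrect (with or without a hint)}, \end{cases} $$
with a second latent dimension capturing the person-specific tendency to request hints; the IRTree alternative instead models the hint-use choice and the conditional accuracy as separate nodes of a response tree.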
{"title":"Measurement of Ability in Adaptive Learning and Assessment Systems when Learners Use On-Demand Hints","authors":"M. Bolsinova, Benjamin E. Deonovic, Meirav Arieli-Attali, Burr Settles, Masato Hagiwara, G. Maris","doi":"10.1177/01466216221084208","DOIUrl":"https://doi.org/10.1177/01466216221084208","url":null,"abstract":"Adaptive learning and assessment systems support learners in acquiring knowledge and skills in a particular domain. The learners’ progress is monitored through them solving items matching their level and aiming at specific learning goals. Scaffolding and providing learners with hints are powerful tools in helping the learning process. One way of introducing hints is to make hint use the choice of the student. When the learner is certain of their response, they answer without hints, but if the learner is not certain or does not know how to approach the item they can request a hint. We develop measurement models for applications where such on-demand hints are available. Such models take into account that hint use may be informative of ability, but at the same time may be influenced by other individual characteristics. Two modeling strategies are considered: (1) The measurement model is based on a scoring rule for ability which includes both response accuracy and hint use. (2) The choice to use hints and response accuracy conditional on this choice are modeled jointly using Item Response Tree models. The properties of different models and their implications are discussed. An application to data from Duolingo, an adaptive language learning system, is presented. Here, the best model is the scoring-rule-based model with full credit for correct responses without hints, partial credit for correct responses with hints, and no credit for all incorrect responses. The second dimension in the model accounts for the individual differences in the tendency to use hints.","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 1","pages":"219 - 235"},"PeriodicalIF":1.2,"publicationDate":"2022-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43517867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Impact of Sampling Variability When Estimating the Explained Common Variance
Pub Date: 2022-04-15 | DOI: 10.1177/01466216221084215 | Applied Psychological Measurement, 46(1), 338-341
Björn Andersson, Hao Luo
Assessing the multidimensionality of a scale or test is a staple of educational and psychological measurement. One approach to evaluating approximate unidimensionality is to fit a bifactor model, where the subfactors are determined by substantive theory, and estimate the explained common variance (ECV) of the general factor. The ECV indicates to what extent the explained variance is dominated by the general factor over the specific factors and has been used, together with other methods and statistics, to determine whether a single-factor model is sufficient for analyzing a scale or test (Rodriguez et al., 2016). In addition, the individual item-ECV (I-ECV) has been used to assess approximate unidimensionality of individual items (Carnovale et al., 2021; Stucky et al., 2013). However, the ECV and I-ECV are subject to random estimation error, which previous studies have not considered. Not accounting for the error in estimation can lead to inaccurate conclusions regarding the dimensionality of a scale or item, especially when an estimate of ECV or I-ECV is compared to a pre-specified cut-off value to evaluate unidimensionality. The objective of the present study is to derive standard errors of the estimators of ECV and I-ECV with linear confirmatory factor analysis (CFA) models to enable the assessment of random estimation error and the computation of confidence intervals for the parameters. We use Monte Carlo simulation to assess the accuracy of the derived standard errors and evaluate the impact of sampling variability on the estimation of the ECV and I-ECV. In a bifactor model for $J$ items, denote $X_j$, $j = 1, \ldots, J$, as the observed variables and let $G$ denote the general factor. We define the $S$ subfactors $F_s$, $s \in \{1, \ldots, S\}$, and $J_s$ as the set of indicators for each subfactor. Each observed indicator $X_j$ is then defined by the multiple factor model (McDonald, 2013).
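The model equation itself is not reproduced in this excerpt. As a hedged sketch using the standard bifactor and ECV definitions from the cited literature (not equations taken from this article), the model and the two indices can be written as
$$ X_j = \nu_j + \lambda_{jG} G + \lambda_{js} F_s + \varepsilon_j, \quad j \in J_s, $$
$$ \mathrm{ECV} = \frac{\sum_{j=1}^{J} \lambda_{jG}^2}{\sum_{j=1}^{J} \lambda_{jG}^2 + \sum_{s=1}^{S} \sum_{j \in J_s} \lambda_{js}^2}, \qquad \mathrm{I\text{-}ECV}_j = \frac{\lambda_{jG}^2}{\lambda_{jG}^2 + \lambda_{js}^2}, $$
where $\lambda_{jG}$ and $\lambda_{js}$ are the general-factor and specific-factor loadings of item $j$.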
{"title":"Impact of Sampling Variability When Estimating the Explained Common Variance","authors":"Björn Andersson, Hao Luo","doi":"10.1177/01466216221084215","DOIUrl":"https://doi.org/10.1177/01466216221084215","url":null,"abstract":"Assessing multidimensionality of a scale or test is a staple of educational and psychological measurement. One approach to evaluate approximate unidimensionality is to fit a bifactor model where the subfactors are determined by substantive theory and estimate the explained common variance (ECV) of the general factor. The ECV says to what extent the explained variance is dominated by the general factor over the specific factors, and has been used, together with other methods and statistics, to determine if a single factor model is sufficient for analyzing a scale or test (Rodriguez et al., 2016). In addition, the individual item-ECV (I-ECV) has been used to assess approximate unidimensionality of individual items (Carnovale et al., 2021; Stucky et al., 2013). However, the ECVand I-ECVare subject to random estimation error which previous studies have not considered. Not accounting for the error in estimation can lead to conclusions regarding the dimensionality of a scale or item that are inaccurate, especially when an estimate of ECVor I-ECV is compared to a pre-specified cut-off value to evaluate unidimensionality. The objective of the present study is to derive standard errors of the estimators of ECV and I-ECV with linear confirmatory factor analysis (CFA) models to enable the assessment of random estimation error and the computation of confidence intervals for the parameters. We use Monte-Carlo simulation to assess the accuracy of the derived standard errors and evaluate the impact of sampling variability on the estimation of the ECV and I-ECV. In a bifactor model for J items, denote Xj, j 1⁄4 1, ..., J , as the observed variable and let G denote the general factor. We define the S subfactors Fs, s2f1,..., Sg, and Js as the set of indicators for each subfactor. Each observed indicator Xj is then defined by the multiple factor model (McDonald, 2013)","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 1","pages":"338 - 341"},"PeriodicalIF":1.2,"publicationDate":"2022-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42137052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Standard Errors of Kernel Equating: Accounting for Bandwidth Estimation
Pub Date: 2022-03-07 | DOI: 10.1177/01466216211066601 | Applied Psychological Measurement, 46(1), 200-218
Kseniia Marcq, Björn Andersson
In standardized testing, equating is used to ensure comparability of test scores across multiple test administrations. One equipercentile observed-score equating method is kernel equating, where an essential step is to obtain continuous approximations to the discrete score distributions by applying a kernel with a smoothing bandwidth parameter. When estimating the bandwidth, additional variability is introduced which is currently not accounted for when calculating the standard errors of equating. This poses a threat to the accuracy of the standard errors of equating. In this study, the asymptotic variance of the bandwidth parameter estimator is derived, and a modified method for calculating the standard error of equating that accounts for the bandwidth estimation variability is introduced for the equivalent groups design. A simulation study is used to verify the derivations and confirm the accuracy of the modified method across several sample sizes and test lengths as compared to the existing method and the Monte Carlo standard error of equating estimates. The results show that the modified standard errors of equating are accurate under the considered conditions. Furthermore, the modified and the existing methods produce similar results, which suggests that the impact of bandwidth variability on the standard errors of equating is minimal.
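For context, the continuization step referred to above is typically carried out with a Gaussian kernel; the following is the standard form from the kernel-equating literature, not an equation reproduced from this article. With score probabilities $r_j$ at score points $x_j$, mean $\mu_X$, variance $\sigma_X^2$, bandwidth $h_X$, and $\Phi$ the standard normal distribution function,
$$ F_{h_X}(x) = \sum_j r_j \, \Phi\!\left( \frac{x - a_X x_j - (1 - a_X)\mu_X}{a_X h_X} \right), \qquad a_X^2 = \frac{\sigma_X^2}{\sigma_X^2 + h_X^2}, $$
and the equipercentile equating function is $e_Y(x) = G_{h_Y}^{-1}\!\left(F_{h_X}(x)\right)$. The additional variability studied here arises because $h_X$ (and $h_Y$) must themselves be estimated from the data.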
{"title":"Standard Errors of Kernel Equating: Accounting for Bandwidth Estimation","authors":"Kseniia Marcq, Björn Andersson","doi":"10.1177/01466216211066601","DOIUrl":"https://doi.org/10.1177/01466216211066601","url":null,"abstract":"In standardized testing, equating is used to ensure comparability of test scores across multiple test administrations. One equipercentile observed-score equating method is kernel equating, where an essential step is to obtain continuous approximations to the discrete score distributions by applying a kernel with a smoothing bandwidth parameter. When estimating the bandwidth, additional variability is introduced which is currently not accounted for when calculating the standard errors of equating. This poses a threat to the accuracy of the standard errors of equating. In this study, the asymptotic variance of the bandwidth parameter estimator is derived and a modified method for calculating the standard error of equating that accounts for the bandwidth estimation variability is introduced for the equivalent groups design. A simulation study is used to verify the derivations and confirm the accuracy of the modified method across several sample sizes and test lengths as compared to the existing method and the Monte Carlo standard error of equating estimates. The results show that the modified standard errors of equating are accurate under the considered conditions. Furthermore, the modified and the existing methods produce similar results which suggest that the bandwidth variability impact on the standard error of equating is minimal.","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 1","pages":"200 - 218"},"PeriodicalIF":1.2,"publicationDate":"2022-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49283258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SEMsens: An R Package for Sensitivity Analysis of Structural Equation Models With the Ant Colony Optimization Algorithm
Pub Date: 2022-03-01 | Epub Date: 2022-01-09 | DOI: 10.1177/01466216211063233 | Applied Psychological Measurement, 46(2), 159-161
Zuchao Shen, Walter L Leite
{"title":"SEMsens: An R Package for Sensitivity Analysis of Structural Equation Models With the Ant Colony Optimization Algorithm.","authors":"Zuchao Shen, Walter L Leite","doi":"10.1177/01466216211063233","DOIUrl":"10.1177/01466216211063233","url":null,"abstract":"","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 2","pages":"159-161"},"PeriodicalIF":1.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8908408/pdf/10.1177_01466216211063233.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10810177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Predictive Fit Metrics for Item Response Models
Pub Date: 2022-03-01 | Epub Date: 2022-02-13 | DOI: 10.1177/01466216211066603 | Applied Psychological Measurement, 46(2), 136-155
Benjamin A Stenhaug, Benjamin W Domingue
The fit of an item response model is typically conceptualized as whether a given model could have generated the data. In this study, an alternative view of fit, "predictive fit," based on the model's ability to predict new data, is advocated. The authors define two prediction tasks: "missing responses prediction," where the goal is to predict an in-sample person's response to an in-sample item, and "missing persons prediction," where the goal is to predict an out-of-sample person's string of responses. Based on these prediction tasks, two predictive fit metrics are derived for item response models that assess how well an estimated item response model fits the data-generating model. These metrics are based on long-run out-of-sample predictive performance (i.e., if the data-generating model produced infinite amounts of data, what is the quality of the model's predictions on average?). Simulation studies are conducted to identify the prediction-maximizing model across a variety of conditions. For example, when prediction is defined in terms of missing responses, greater average person ability and greater item discrimination are both associated with the 3PL model producing relatively worse predictions, and thus lead to greater minimum sample sizes for the 3PL model. In each simulation, the prediction-maximizing model is compared to the models selected by Akaike's information criterion (AIC), the Bayesian information criterion (BIC), and likelihood ratio tests. It is found that the performance of these methods depends on the prediction task of interest. In general, likelihood ratio tests often select overly flexible models, while BIC selects overly parsimonious models. The authors use Programme for International Student Assessment data to demonstrate how to use cross-validation to directly estimate the predictive fit metrics in practice. The implications for item response model selection in operational settings are discussed.
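As an illustration of the missing-responses prediction task, the following is a minimal sketch, not the authors' code: the 2PL data-generating parameters, the 20% hold-out fraction, and the stand-in parameter "estimates" are all hypothetical. It scores a candidate model by its out-of-sample log-likelihood on randomly held-out person-by-item responses.

import numpy as np

rng = np.random.default_rng(0)

def p_correct(theta, a, b):
    # 2PL probability of a correct response for every person-item pair
    return 1.0 / (1.0 + np.exp(-a[None, :] * (theta[:, None] - b[None, :])))

# Data-generating 2PL model (hypothetical parameter values)
n_persons, n_items = 500, 20
theta = rng.normal(size=n_persons)
a_true = rng.uniform(0.8, 2.0, size=n_items)
b_true = rng.normal(size=n_items)
responses = rng.binomial(1, p_correct(theta, a_true, b_true))

# Randomly hold out 20% of the person-by-item cells ("missing responses" task)
holdout = rng.random((n_persons, n_items)) < 0.2

# Stand-in estimates for a candidate model; in practice these would come from
# fitting, e.g., a 1PL or 2PL model to the non-held-out responses.
a_hat = np.ones(n_items)                        # Rasch-type constraint
b_hat = b_true + rng.normal(0, 0.1, size=n_items)
theta_hat = theta + rng.normal(0, 0.3, size=n_persons)

# Out-of-sample log-likelihood on the held-out cells: values closer to zero
# indicate better predictive fit for this task.
p_hat = p_correct(theta_hat, a_hat, b_hat)
eps = 1e-12
loglik = responses * np.log(p_hat + eps) + (1 - responses) * np.log(1 - p_hat + eps)
print("mean held-out log-likelihood:", loglik[holdout].mean())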
{"title":"Predictive Fit Metrics for Item Response Models.","authors":"Benjamin A Stenhaug, Benjamin W Domingue","doi":"10.1177/01466216211066603","DOIUrl":"10.1177/01466216211066603","url":null,"abstract":"<p><p>The fit of an item response model is typically conceptualized as whether a given model could have generated the data. In this study, for an alternative view of fit, \"predictive fit,\" based on the model's ability to predict new data is advocated. The authors define two prediction tasks: \"missing responses prediction\"-where the goal is to predict an in-sample person's response to an in-sample item-and \"missing persons prediction\"-where the goal is to predict an out-of-sample person's string of responses. Based on these prediction tasks, two predictive fit metrics are derived for item response models that assess how well an estimated item response model fits the data-generating model. These metrics are based on long-run out-of-sample predictive performance (i.e., if the data-generating model produced infinite amounts of data, what is the quality of a \"model's predictions on average?\"). Simulation studies are conducted to identify the prediction-maximizing model across a variety of conditions. For example, defining prediction in terms of missing responses, greater average person ability, and greater item discrimination are all associated with the 3PL model producing relatively worse predictions, and thus lead to greater minimum sample sizes for the 3PL model. In each simulation, the prediction-maximizing model to the model selected by Akaike's information criterion, Bayesian information criterion (BIC), and likelihood ratio tests are compared. It is found that performance of these methods depends on the prediction task of interest. In general, likelihood ratio tests often select overly flexible models, while BIC selects overly parsimonious models. The authors use Programme for International Student Assessment data to demonstrate how to use cross-validation to directly estimate the predictive fit metrics in practice. The implications for item response model selection in operational settings are discussed.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 2","pages":"136-155"},"PeriodicalIF":1.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8908407/pdf/10.1177_01466216211066603.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10810179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Considerations for Fitting Dynamic Bayesian Networks With Latent Variables: A Monte Carlo Study
Pub Date: 2022-03-01 | DOI: 10.1177/01466216211066609 | Applied Psychological Measurement, 46(2), 116-135
Ray E Reichenberg, Roy Levy, Adam Clark
Dynamic Bayesian networks (DBNs; Reye, 2004) are a promising tool for modeling student proficiency under rich measurement scenarios (Reichenberg, 2018). These scenarios often present assessment conditions far more complex than what is seen with more traditional assessments and require assessment arguments and psychometric models capable of integrating those complexities. Unfortunately, DBNs remain understudied and their psychometric properties relatively unknown. The current work aimed to explore the properties of DBNs under a variety of realistic psychometric conditions. A Monte Carlo simulation study was conducted in order to evaluate parameter recovery for DBNs using maximum likelihood estimation. Manipulated factors included sample size, measurement quality, test length, and the number of measurement occasions. Results suggested that measurement quality has the most prominent impact on estimation quality, with more distinct performance categories yielding better estimation. From a practical perspective, parameter recovery appeared to be sufficient with samples as low as N = 400 as long as measurement quality was not poor and at least three items were present at each measurement occasion. Tests consisting of only a single item required exceptional measurement quality in order to adequately recover model parameters.
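For orientation, a DBN with a single latent proficiency measured at occasions $t = 1, \ldots, T$ by items $X_{t1}, \ldots, X_{tK}$ is commonly factored as follows (a generic latent-transition structure, not necessarily the specific parameterization used in the study):
$$ p(\theta_1, \ldots, \theta_T, \mathbf{X}) = p(\theta_1) \prod_{t=2}^{T} p(\theta_t \mid \theta_{t-1}) \prod_{t=1}^{T} \prod_{k=1}^{K} p(X_{tk} \mid \theta_t), $$
so measurement quality enters through the conditional response distributions $p(X_{tk} \mid \theta_t)$ and growth through the transition distributions $p(\theta_t \mid \theta_{t-1})$.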
{"title":"Considerations for Fitting Dynamic Bayesian Networks With Latent Variables: A Monte Carlo Study.","authors":"Ray E Reichenberg, Roy Levy, Adam Clark","doi":"10.1177/01466216211066609","DOIUrl":"https://doi.org/10.1177/01466216211066609","url":null,"abstract":"<p><p>Dynamic Bayesian networks (DBNs; Reye, 2004) are a promising tool for modeling student proficiency under rich measurement scenarios (Reichenberg, 2018). These scenarios often present assessment conditions far more complex than what is seen with more traditional assessments and require assessment arguments and psychometric models capable of integrating those complexities. Unfortunately, DBNs remain understudied and their psychometric properties relatively unknown. The current work aimed at exploring the properties of DBNs under a variety of realistic psychometric conditions. A Monte Carlo simulation study was conducted in order to evaluate parameter recovery for DBNs using maximum likelihood estimation. Manipulated factors included sample size, measurement quality, test length, the number of measurement occasions. Results suggested that measurement quality has the most prominent impact on estimation quality with more distinct performance categories yielding better estimation. From a practical perspective, parameter recovery appeared to be sufficient with samples as low as <i>N</i> = 400 as long as measurement quality was not poor and at least three items were present at each measurement occasion. Tests consisting of only a single item required exceptional measurement quality in order to adequately recover model parameters.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 2","pages":"116-135"},"PeriodicalIF":1.2,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8908410/pdf/10.1177_01466216211066609.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10615071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian Approaches for Detecting Differential Item Functioning Using the Generalized Graded Unfolding Model
Pub Date: 2022-03-01 | DOI: 10.1177/01466216211066606 | Applied Psychological Measurement, 46(2), 98-115
Seang-Hwane Joo, Philseok Lee, Stephen Stark
Differential item functioning (DIF) analysis is one of the most important applications of item response theory (IRT) in psychological assessment. This study examined the performance of two Bayesian DIF methods, the Bayes factor (BF) and the deviance information criterion (DIC), with the generalized graded unfolding model (GGUM). Type I error and power were investigated in a Monte Carlo simulation that manipulated sample size, DIF source, DIF size, DIF location, subpopulation trait distribution, and type of baseline model. We also examined the performance of two likelihood-based methods, the likelihood ratio (LR) test and the Akaike information criterion (AIC), using marginal maximum likelihood (MML) estimation for comparison with past DIF research. The results indicated that the proposed BF and DIC methods provided well-controlled Type I error rates and high power using a free-baseline model implementation, and their performance was superior to LR and AIC in terms of Type I error rates when the reference and focal group trait distributions differed. The implications and recommendations for applied research are discussed.
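For reference, the two Bayesian criteria compared in the study are typically defined as follows (standard definitions from the Bayesian model-selection literature, not reproduced from this article). For competing models $M_0$ (no DIF) and $M_1$ (DIF), the Bayes factor is the ratio of marginal likelihoods,
$$ \mathrm{BF}_{10} = \frac{p(\mathbf{y} \mid M_1)}{p(\mathbf{y} \mid M_0)}, $$
and the deviance information criterion of a model is
$$ \mathrm{DIC} = \bar{D} + p_D, \qquad D(\vartheta) = -2 \log p(\mathbf{y} \mid \vartheta), \qquad p_D = \bar{D} - D(\bar{\vartheta}), $$
where $\bar{D}$ is the posterior mean deviance and $\bar{\vartheta}$ the posterior mean of the parameters; smaller DIC values indicate the preferred model.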
{"title":"Bayesian Approaches for Detecting Differential Item Functioning Using the Generalized Graded Unfolding Model.","authors":"Seang-Hwane Joo, Philseok Lee, Stephen Stark","doi":"10.1177/01466216211066606","DOIUrl":"https://doi.org/10.1177/01466216211066606","url":null,"abstract":"<p><p>Differential item functioning (DIF) analysis is one of the most important applications of item response theory (IRT) in psychological assessment. This study examined the performance of two Bayesian DIF methods, Bayes factor (BF) and deviance information criterion (DIC), with the generalized graded unfolding model (GGUM). The Type I error and power were investigated in a Monte Carlo simulation that manipulated sample size, DIF source, DIF size, DIF location, subpopulation trait distribution, and type of baseline model. We also examined the performance of two likelihood-based methods, the likelihood ratio (LR) test and Akaike information criterion (AIC), using marginal maximum likelihood (MML) estimation for comparison with past DIF research. The results indicated that the proposed BF and DIC methods provided well-controlled Type I error and high power using a free-baseline model implementation, their performance was superior to LR and AIC in terms of Type I error rates when the reference and focal group trait distributions differed. The implications and recommendations for applied research are discussed.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 2","pages":"98-115"},"PeriodicalIF":1.2,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8908411/pdf/10.1177_01466216211066606.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10800335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scale Linking for the Testlet Item Response Theory Model
Pub Date: 2022-03-01 | DOI: 10.1177/01466216211063234 | Applied Psychological Measurement, 46(2), 79-97
Seonghoon Kim, Michael J Kolen
In their 2005 paper, Li and her colleagues proposed a test response function (TRF) linking method for a two-parameter testlet model and used a genetic algorithm to find minimization solutions for the linking coefficients. In the present paper the linking task for a three-parameter testlet model is formulated from the perspective of bi-factor modeling, and three linking methods for the model are presented: the TRF, mean/least squares (MLS), and item response function (IRF) methods. Simulations are conducted to compare the TRF method using a genetic algorithm with the TRF and IRF methods using a quasi-Newton algorithm and the MLS method. The results indicate that the IRF, MLS, and TRF methods perform very well, well, and poorly, respectively, in estimating the linking coefficients associated with testlet effects, that the use of genetic algorithms offers little improvement to the TRF method, and that the minimization function for the TRF method is not as well-structured as that for the IRF method.
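For context, the linking coefficients referred to above are typically the slope and intercept of a linear transformation of the latent scale. In the standard (non-testlet) 3PL case they act on the item parameters as $a_j^{*} = a_j / A$, $b_j^{*} = A b_j + B$, $c_j^{*} = c_j$, and the IRF (characteristic-curve) method chooses $A$ and $B$ to minimize
$$ Q(A, B) = \sum_{j} \int \left[ P_j\!\left(\theta; a_j^{*}, b_j^{*}, c_j^{*}\right) - P_j\!\left(\theta; \tilde{a}_j, \tilde{b}_j, \tilde{c}_j\right) \right]^2 w(\theta)\, d\theta, $$
while the TRF method minimizes the squared difference of the summed (test) response functions rather than summing item-level squared differences. This is the standard form from the IRT linking literature; the testlet-model versions studied here additionally involve coefficients for the testlet-specific dimensions.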
{"title":"Scale Linking for the Testlet Item Response Theory Model.","authors":"Seonghoon Kim, Michael J Kolen","doi":"10.1177/01466216211063234","DOIUrl":"https://doi.org/10.1177/01466216211063234","url":null,"abstract":"<p><p>In their 2005 paper, Li and her colleagues proposed a test response function (TRF) linking method for a two-parameter testlet model and used a genetic algorithm to find minimization solutions for the linking coefficients. In the present paper the linking task for a three-parameter testlet model is formulated from the perspective of bi-factor modeling, and three linking methods for the model are presented: the TRF, mean/least squares (MLS), and item response function (IRF) methods. Simulations are conducted to compare the TRF method using a genetic algorithm with the TRF and IRF methods using a quasi-Newton algorithm and the MLS method. The results indicate that the IRF, MLS, and TRF methods perform very well, well, and poorly, respectively, in estimating the linking coefficients associated with testlet effects, that the use of genetic algorithms offers little improvement to the TRF method, and that the minimization function for the TRF method is not as well-structured as that for the IRF method.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 2","pages":"79-97"},"PeriodicalIF":1.2,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8908412/pdf/10.1177_01466216211063234.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10810181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}