
Educational and Psychological Measurement: Latest Publications

Item Parameter Recovery: Sensitivity to Prior Distribution
CAS Tier 3, Psychology | Q1 Social Sciences | Pub Date: 2023-10-30 | DOI: 10.1177/00131644231203688
Christine E. DeMars, Paulius Satkus
Marginal maximum likelihood, a common estimation method for item response theory models, is not inherently a Bayesian procedure. However, due to estimation difficulties, Bayesian priors are often applied to the likelihood when estimating 3PL models, especially with small samples. Little focus has been placed on choosing the priors for marginal maximum likelihood estimation. In this study, with sample sizes of 1,000 or smaller, estimation without priors often led to extreme, implausible parameter estimates. Applying prior distributions to the c-parameters alleviated the estimation problems with samples of 500 or more; for samples of 100, priors on both the a-parameters and c-parameters were needed. Estimates were biased when the mode of the prior did not match the true parameter value, but the degree of the bias did not depend on the strength of the prior unless it was extremely informative. The root mean squared error (RMSE) of the a-parameters and b-parameters did not depend greatly on either the mode or the strength of the prior unless it was extremely informative. The RMSE of the c-parameters, like the bias, depended on the mode of the prior for c.
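A minimal sketch of the penalized-likelihood idea behind such priors, with person abilities treated as known to keep the example short (actual marginal maximum likelihood integrates over the ability distribution); the Beta prior settings and simulated values are illustrative assumptions, not the authors' design:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta

rng = np.random.default_rng(0)

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

# Simulated responses to one item (true a=1.2, b=0.5, c=0.2), n=100
theta = rng.normal(size=100)
y = rng.binomial(1, p_3pl(theta, 1.2, 0.5, 0.2))

def neg_log_posterior(params, prior_alpha=5.0, prior_beta=17.0):
    a, b, c = params
    p = p_3pl(theta, a, b, c)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    # Beta prior on c; its mode (alpha-1)/(alpha+beta-2) = 0.2 here
    return -(loglik + beta.logpdf(c, prior_alpha, prior_beta))

fit = minimize(neg_log_posterior, x0=[1.0, 0.0, 0.15],
               bounds=[(0.2, 3.0), (-3.0, 3.0), (0.01, 0.45)])
print(fit.x)  # the c estimate is shrunk toward the prior mode
```

Without the prior term, small samples frequently push the unconstrained c estimate toward implausible extremes, which is the failure mode the study documents.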
Citations: 0
Linear Factor Analytic Thurstonian Forced-Choice Models: Current Status and Issues
CAS Tier 3, Psychology | Q1 Social Sciences | Pub Date: 2023-10-30 | DOI: 10.1177/00131644231205011
Markus T. Jansen, Ralf Schulze
Thurstonian forced-choice modeling is considered to be a powerful new tool to estimate item and person parameters while simultaneously testing the model fit. This assessment approach is associated with the aim of reducing faking and other response tendencies that plague traditional self-report trait assessments. As a result of major recent methodological developments, the estimation of normative trait scores has become possible in addition to the computation of only ipsative scores. This opened up the important possibility of comparisons between individuals with forced-choice assessment procedures. With item response theory (IRT) methods, a multidimensional forced-choice (MFC) format has also been proposed to estimate individual scores. Customarily, items to assess different traits are presented in blocks, often triplets, in applications of the MFC, which is an efficient form of item presentation but also a simplification of the original models. The present study provides a comprehensive review of the present status of Thurstonian forced-choice models and their variants. Critical features of the current models, especially the block models, are identified and discussed. It is concluded that MFC modeling with item blocks is highly problematic and yields biased results. In particular, the often-recommended presentation of blocks with items that are keyed in different directions of a trait proves to be counterproductive considering the goal to reduce response tendencies. The consequences and implications of the highlighted issues are further discussed.
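The block structure at issue can be made concrete with a short sketch (illustrative, not from the paper) of how one triplet block is recoded into the binary pairwise outcomes that Thurstonian models analyze:

```python
from itertools import combinations

def triplet_to_binary(ranking):
    """Recode a rank order over a block of items into Thurstonian
    pairwise outcomes: y[(i, j)] = 1 if item i is preferred to item j.

    `ranking` lists item indices from most to least preferred,
    e.g. (1, 0, 2) means item 1 > item 0 > item 2.
    """
    position = {item: pos for pos, item in enumerate(ranking)}
    return {(i, j): int(position[i] < position[j])
            for i, j in combinations(sorted(ranking), 2)}

# One triplet block; the respondent prefers item 1, then item 0, then item 2.
print(triplet_to_binary((1, 0, 2)))  # {(0, 1): 0, (0, 2): 1, (1, 2): 1}
```

Each triplet thus contributes only three mutually dependent binary comparisons, which illustrates the simplification relative to the original paired-comparison models that the review scrutinizes.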
Citations: 0
Thinking About Sum Scores Yet Again, Maybe the Last Time, We Don’t Know, Oh No . . .: A Comment on
CAS Tier 3, Psychology | Q1 Social Sciences | Pub Date: 2023-10-13 | DOI: 10.1177/00131644231205310
Keith F. Widaman, William Revelle
The relative advantages and disadvantages of sum scores and estimated factor scores are issues of concern for substantive research in psychology. Recently, while championing estimated factor scores over sum scores, McNeish offered a trenchant rejoinder to an article by Widaman and Revelle, which had critiqued an earlier paper by McNeish and Wolf. In the recent contribution, McNeish misrepresented a number of claims by Widaman and Revelle, rendering moot his criticisms of Widaman and Revelle. Notably, McNeish chose to avoid confronting a key strength of sum scores stressed by Widaman and Revelle—the greater comparability of results across studies if sum scores are used. Instead, McNeish pivoted to present a host of simulation studies to identify relative strengths of estimated factor scores. Here, we review our prior claims and, in the process, deflect purported criticisms by McNeish. We discuss briefly issues related to simulated data and empirical data that provide evidence of strengths of each type of score. In doing so, we identified a second strength of sum scores: superior cross-validation of results across independent samples of empirical data, at least for samples of moderate size. We close with consideration of four general issues concerning sum scores and estimated factor scores that highlight the contrasts between positions offered by McNeish and by us, issues of importance when pursuing applied research in our field.
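To make the two kinds of scores concrete, here is a minimal sketch on simulated one-factor data; the loadings and sample size are arbitrary assumptions, and sklearn's FactorAnalysis stands in for a full psychometric factor model:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
n, k = 500, 8
eta = rng.normal(size=n)                  # true latent trait
loadings = rng.uniform(0.5, 0.9, size=k)  # assumed one-factor loadings
X = np.outer(eta, loadings) + rng.normal(scale=0.6, size=(n, k))

sum_scores = X.sum(axis=1)
factor_scores = FactorAnalysis(n_components=1,
                               random_state=0).fit_transform(X).ravel()

# Both score types track the trait; their relative accuracy and
# cross-sample comparability are what the exchange debates.
# (Factor-score sign is arbitrary, hence the absolute correlations.)
print(abs(np.corrcoef(eta, sum_scores)[0, 1]))
print(abs(np.corrcoef(eta, factor_scores)[0, 1]))
```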
Citations: 0
The Impact and Detection of Uniform Differential Item Functioning for Continuous Item Response Models.
IF 2.1 | CAS Tier 3, Psychology | Q2 Mathematics, Interdisciplinary Applications | Pub Date: 2023-10-01 | Epub Date: 2022-07-21 | DOI: 10.1177/00131644221111993
W Holmes Finch

Psychometricians have devoted much research and attention to categorical item responses, leading to the development and widespread use of item response theory for the estimation of model parameters and identification of items that do not perform in the same way for examinees from different population subgroups (e.g., differential item functioning [DIF]). With the increasing use of computer-based measurement, use of items with a continuous response modality is becoming more common. Models for use with these items have been developed and refined in recent years, but less attention has been devoted to investigating DIF for these continuous response models (CRMs). Therefore, the purpose of this simulation study was to compare the performance of three potential methods for assessing DIF for CRMs, including regression, the MIMIC model, and factor invariance testing. Study results revealed that the MIMIC model provided a combination of Type I error control and relatively high power for detecting DIF. Implications of these findings are discussed.
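A minimal sketch of the regression approach for a continuous item response (variable names and the simulated DIF effect are illustrative, not the study's design): regress the item score on a matching variable and group membership, and read uniform DIF off the group coefficient:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 400
group = rng.integers(0, 2, size=n)             # 0 = reference, 1 = focal
theta = rng.normal(size=n)                     # latent ability
rest = theta + rng.normal(scale=0.5, size=n)   # matching criterion
# Continuous item with uniform DIF: focal group shifted down by 0.3
item = 0.8 * theta - 0.3 * group + rng.normal(scale=0.5, size=n)

df = pd.DataFrame({"item": item, "rest": rest, "group": group})
fit = smf.ols("item ~ rest + group", data=df).fit()
# A significant group coefficient (about -0.3 here) flags uniform DIF.
print(fit.summary().tables[1])
```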

Citations: 0
Detecting Cheating in Large-Scale Assessment: The Transfer of Detectors to New Tests.
IF 2.1 | CAS Tier 3, Psychology | Q2 Mathematics, Interdisciplinary Applications | Pub Date: 2023-10-01 | Epub Date: 2022-11-04 | DOI: 10.1177/00131644221132723
Jochen Ranger, Nico Schmidt, Anett Wolgast

Recent approaches to the detection of cheaters in tests employ detectors from the field of machine learning. Detectors based on supervised learning algorithms achieve high accuracy but require labeled data sets with identified cheaters for training. Labeled data sets are usually not available at an early stage of the assessment period. In this article, we discuss the approach of adapting a detector that was trained previously with a labeled training data set to a new unlabeled data set. The training and the new data set may contain data from different tests. The adaptation of detectors to new data or tasks is known as transfer learning in the field of machine learning. We first discuss the conditions under which a detector of cheating can be transferred. We then investigate whether the conditions are met in a real data set. We finally evaluate the benefits of transferring a detector of cheating. We find that a transferred detector has higher accuracy than an unsupervised detector of cheating. A naive transfer that consists of a simple reuse of the detector increases the accuracy considerably. A transfer via a self-labeling (SETRED) algorithm increases the accuracy slightly more than the naive transfer. The findings suggest that the detection of cheating might be improved by using existing detectors of cheating at an early stage of an assessment period.
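A minimal sketch of the two transfer strategies on generic features; sklearn's SelfTrainingClassifier serves here as a stand-in self-labeling scheme, since SETRED adds a data-editing step beyond plain self-training:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(3)
# Old test: labeled features (e.g., response times, score patterns)
X_old = rng.normal(size=(600, 5))
y_old = (X_old[:, 0] + 0.5 * X_old[:, 1]
         + rng.normal(size=600) > 1.5).astype(int)
# New test: same feature space, labels unknown
X_new = rng.normal(loc=0.2, size=(400, 5))

# Naive transfer: train on the old test, reuse unchanged on the new one.
naive = RandomForestClassifier(random_state=0).fit(X_old, y_old)
flags_naive = naive.predict(X_new)

# Self-labeling transfer: confident predictions on the new, unlabeled
# test (coded -1) are added to the training data iteratively.
X_pool = np.vstack([X_old, X_new])
y_pool = np.concatenate([y_old, -np.ones(len(X_new), dtype=int)])
adapted = SelfTrainingClassifier(RandomForestClassifier(random_state=0),
                                 threshold=0.9).fit(X_pool, y_pool)
flags_self = adapted.predict(X_new)
```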

Citations: 0
Multimodal Data Fusion to Detect Preknowledge Test-Taking Behavior Using Machine Learning
CAS Tier 3, Psychology | Q1 Social Sciences | Pub Date: 2023-09-19 | DOI: 10.1177/00131644231193625
Kaiwen Man
In various fields, including college admission, medical board certifications, and military recruitment, high-stakes decisions are frequently made based on scores obtained from large-scale assessments. These decisions necessitate precise and reliable scores that enable valid inferences to be drawn about test-takers. However, the ability of such tests to provide reliable, accurate inference on a test-taker’s performance could be jeopardized by aberrant test-taking practices, for instance, practicing real items prior to the test. As a result, it is crucial for administrators of such assessments to develop strategies that detect potential aberrant test-takers after data collection. The aim of this study is to explore the implementation of machine learning methods in combination with multimodal data fusion strategies that integrate bio-information technology, such as eye-tracking, and psychometric measures, including response times and item responses, to detect aberrant test-taking behaviors in technology-assisted remote testing settings.
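One plausible reading of the fusion step is early, feature-level fusion, sketched below; all feature names and labels are hypothetical toy data, not the study's variables:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n = 300
gaze = rng.normal(size=(n, 4))            # e.g., fixation counts, dwell times
rt = rng.lognormal(size=(n, 1))           # item response times
resp = rng.integers(0, 2, size=(n, 10))   # scored item responses
y = rng.integers(0, 2, size=n)            # 1 = flagged preknowledge (toy)

# Early fusion: concatenate all modalities into one feature matrix.
X = np.hstack([gaze, rt, resp])
clf = GradientBoostingClassifier(random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())
```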
Citations: 0
Fixed Effects or Mixed Effects Classifiers? Evidence From Simulated and Archival Data.
IF 2.1 | CAS Tier 3, Psychology | Q2 Mathematics, Interdisciplinary Applications | Pub Date: 2023-08-01 | Epub Date: 2022-06-30 | DOI: 10.1177/00131644221108180
Anthony A Mangino, Jocelyn H Bolin, W Holmes Finch

This study seeks to compare fixed and mixed effects models for the purposes of predictive classification in the presence of multilevel data. The first part of the study utilizes a Monte Carlo simulation to compare fixed and mixed effects logistic regression and random forests. An applied examination of the prediction of student retention in the public-use U.S. PISA data set was considered to verify the simulation findings. Results of this study indicate fixed effects models performed comparably with mixed effects models across both the simulation and PISA examinations. Results broadly suggest that researchers should be cognizant of the type of predictors and data structure being used, as these factors carried more weight than did the model type.
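A minimal sketch of the fixed-effects approach on simulated multilevel data, with cluster dummy variables in a logistic regression next to a random forest; the mixed-effects counterpart requires a dedicated random-intercept model and is omitted here:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n_clusters, per = 30, 40
cluster = np.repeat(np.arange(n_clusters), per)      # e.g., schools
u = rng.normal(scale=0.8, size=n_clusters)[cluster]  # cluster effects
x = rng.normal(size=n_clusters * per)                # student predictor
y = (rng.random(n_clusters * per) <
     1 / (1 + np.exp(-(0.7 * x + u)))).astype(int)   # e.g., retained or not

# Fixed effects: one dummy per cluster absorbs the cluster means.
X_fe = pd.get_dummies(pd.DataFrame({"x": x, "cluster": cluster}),
                      columns=["cluster"], drop_first=True)
print(cross_val_score(LogisticRegression(max_iter=1000), X_fe, y, cv=5).mean())
print(cross_val_score(RandomForestClassifier(random_state=0), X_fe, y, cv=5).mean())
```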

Citations: 0
Exploration of the Stacking Ensemble Machine Learning Algorithm for Cheating Detection in Large-Scale Assessment.
IF 2.1 | CAS Tier 3, Psychology | Q2 Mathematics, Interdisciplinary Applications | Pub Date: 2023-08-01 | Epub Date: 2022-08-13 | DOI: 10.1177/00131644221117193
Todd Zhou, Hong Jiao

Cheating detection in large-scale assessment has received considerable attention in the extant literature. However, none of the previous studies in this line of research investigated the stacking ensemble machine learning algorithm for cheating detection, and no study addressed the issue of class imbalance using resampling. This study explored the application of the stacking ensemble machine learning algorithm to analyze the item responses, response times, and augmented data of test-takers to detect cheating behaviors. The performance of the stacking method was compared with that of two other ensemble methods (bagging and boosting) as well as six base non-ensemble machine learning algorithms, and issues related to class imbalance and input features were addressed. The study results indicated that stacking, resampling, and feature sets including augmented summary data generally performed better than their counterparts in cheating detection. Among all the study conditions, and compared with the other machine learning algorithms investigated, the stacking meta-model using discriminant analysis on the top two base models (Gradient Boosting and Random Forest) generally performed best when item responses and the augmented summary statistics were used as input features with an under-sampling ratio of 10:1.
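A minimal sketch of the best-performing configuration the abstract describes, assuming scikit-learn plus the imbalanced-learn package; the simulated data and cheating rate are illustrative:

```python
import numpy as np
from imblearn.under_sampling import RandomUnderSampler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)

rng = np.random.default_rng(6)
X = rng.normal(size=(5000, 12))            # e.g., item responses + summaries
y = (rng.random(5000) < 0.02).astype(int)  # rare cheating flags

# Under-sample the majority class to a 10:1 majority:minority ratio.
X_res, y_res = RandomUnderSampler(sampling_strategy=0.1,
                                  random_state=0).fit_resample(X, y)

# Stacking: Gradient Boosting + Random Forest base models feed a
# discriminant-analysis meta-model.
stack = StackingClassifier(
    estimators=[("gb", GradientBoostingClassifier(random_state=0)),
                ("rf", RandomForestClassifier(random_state=0))],
    final_estimator=LinearDiscriminantAnalysis())
stack.fit(X_res, y_res)
print(stack.predict_proba(X[:5])[:, 1])    # flagging scores for new cases
```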

Citations: 0
Comparing the Psychometric Properties of a Scale Across Three Likert and Three Alternative Formats: An Application to the Rosenberg Self-Esteem Scale.
IF 2.7 | CAS Tier 3, Psychology | Q1 Social Sciences | Pub Date: 2023-08-01 | DOI: 10.1177/00131644221111402
Xijuan Zhang, Linnan Zhou, Victoria Savalei

Zhang and Savalei proposed an alternative scale format to the Likert format, called the Expanded format. In this format, response options are presented in complete sentences, which can reduce acquiescence bias and method effects. The goal of the current study was to compare the psychometric properties of the Rosenberg Self-Esteem Scale (RSES) in the Expanded format and in two other alternative formats, relative to several versions of the traditional Likert format. We conducted two studies to compare the psychometric properties of the RSES across the different formats. We found that compared with the Likert format, the alternative formats tend to have a unidimensional factor structure, less response inconsistency, and comparable validity. In addition, we found that the Expanded format resulted in the best factor structure among the three alternative formats. Researchers should consider the Expanded format, especially when creating short psychological scales such as the RSES.

Citations: 1
Relative Robustness of CDMs and (M)IRT in Measuring Growth in Latent Skills.
IF 2.1 | CAS Tier 3, Psychology | Q2 Mathematics, Interdisciplinary Applications | Pub Date: 2023-08-01 | Epub Date: 2022-08-18 | DOI: 10.1177/00131644221117194
Qi Helen Huang, Daniel M Bolt

Previous studies have demonstrated evidence of latent skill continuity even in tests intentionally designed for measurement of binary skills. In addition, the assumption of binary skills when continuity is present has been shown to potentially create a lack of invariance in item and latent ability parameters that may undermine applications. In this article, we examine measurement of growth as one such application, and consider multidimensional item response theory (MIRT) as a competing alternative. Motivated by prior findings concerning the effects of skill continuity, we study the relative robustness of cognitive diagnostic models (CDMs) and (M)IRT models in the measurement of growth under both binary and continuous latent skill distributions. We find CDMs to be a less robust way of quantifying growth under misspecification, and subsequently provide a real-data example suggesting underestimation of growth as a likely consequence. It is suggested that researchers should regularly attend to the assumptions associated with the use of latent binary skills and consider (M)IRT as a potentially more robust alternative if unsure of their discrete nature.

Citations: 0