
Latest Publications in Educational and Psychological Measurement

Obtaining a Bayesian Estimate of Coefficient Alpha Using a Posterior Normal Distribution.
IF 2.3 | CAS Zone 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-01-31 | eCollection Date: 2025-08-01 | DOI: 10.1177/00131644241311877
John Mart V DelosReyes, Miguel A Padilla

A new alternative to obtain a Bayesian estimate of coefficient alpha through a posterior normal distribution is proposed and assessed through percentile, normal-theory-based, and highest probability density credible intervals in a simulation study. The results indicate that the proposed Bayesian method to estimate coefficient alpha has acceptable coverage probability performance across the majority of investigated simulation conditions.
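The article's posterior-normal machinery is not reproduced in the abstract, but the quantity being interval-estimated is ordinary coefficient alpha. Below is a minimal numpy sketch of alpha together with a bootstrap percentile interval (one of the interval types compared in the study); the toy data, loadings, and all cutoffs are illustrative assumptions, not values from the article.

```python
import numpy as np

def coefficient_alpha(X):
    """Cronbach's coefficient alpha for an n x k matrix of item scores."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    return (k / (k - 1)) * (1 - X.var(axis=0, ddof=1).sum()
                            / X.sum(axis=1).var(ddof=1))

# Toy data: 200 respondents, 4 items sharing one common factor
# (loadings and error variance are arbitrary illustrative choices).
rng = np.random.default_rng(7)
factor = rng.normal(size=(200, 1))
X = factor + rng.normal(scale=1.0, size=(200, 4))
alpha = coefficient_alpha(X)

# Bootstrap percentile interval (one of the interval types in the study).
boot = [coefficient_alpha(X[rng.integers(0, 200, size=200)])
        for _ in range(500)]
ci = (np.percentile(boot, 2.5), np.percentile(boot, 97.5))
```

With these toy settings the per-item reliability is about .5, so alpha lands near .8; the percentile interval simply brackets the resampled estimates.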

Citations: 0
Examining the Instructional Sensitivity of Constructed-Response Achievement Test Item Scores.
IF 2.3 | CAS Zone 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-01-30 | eCollection Date: 2025-10-01 | DOI: 10.1177/00131644241313212
Anne Traynor, Cheng-Hsien Li, Shuqi Zhou

Inferences about student learning from large-scale achievement test scores are fundamental in education. For achievement test scores to provide useful information about student learning progress, differences in the content of instruction (i.e., the implemented curriculum) should affect test-takers' item responses. Existing research has begun to identify patterns in the content of instructionally sensitive multiple-choice achievement test items. To inform future test design decisions, this study identified instructionally (in)sensitive constructed-response achievement items, then characterized features of those items and their corresponding scoring rubrics. First, we used simulation to evaluate an item step difficulty difference index for constructed-response test items, derived from the generalized partial credit model. The statistical performance of the index was adequate, so we then applied it to data from 32 constructed-response eighth-grade science test items. We found that the instructional sensitivity (IS) index values varied appreciably across the category boundaries within an item as well as across items. Content analysis by master science teachers allowed us to identify general features of item score categories that show high, or negligible, IS.
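The abstract does not define its item step difficulty difference index, so the sketch below only illustrates the ingredients: generalized partial credit model (GPCM) category probabilities, and a naive per-step difficulty difference between hypothetical calibrations in instructed vs. uninstructed groups. All parameter values are invented.

```python
import numpy as np

def gpcm_probs(theta, a, b_steps):
    """Generalized partial credit model category probabilities for one
    item with discrimination a and step difficulties b_steps."""
    z = np.concatenate(([0.0], np.cumsum(a * (theta - np.asarray(b_steps)))))
    ez = np.exp(z - z.max())          # softmax over the cumulative terms
    return ez / ez.sum()

# Hypothetical step difficulties from separate calibrations in
# instructed vs. uninstructed groups (numbers are invented).
b_instructed = np.array([-0.8, 0.1, 0.9])
b_uninstructed = np.array([-0.2, 0.8, 1.1])
step_diff = b_uninstructed - b_instructed  # per-step shift; a large shift
                                           # suggests instructional sensitivity
p_inst = gpcm_probs(0.0, 1.2, b_instructed)
```

Note the per-step granularity: the abstract's finding that sensitivity varies across category boundaries within an item corresponds to `step_diff` having unequal entries.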

Citations: 0
The Impact of Attentiveness Interventions on Survey Data.
IF 2.1 | CAS Zone 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-01-29 | DOI: 10.1177/00131644241311851
Christie M Fuller, Marcia J Simmering, Brian Waterwall, Elizabeth Ragland, Douglas P Twitchell, Alison Wall

Social and behavioral science researchers who use survey data are vigilant about data quality, with an increasing emphasis on avoiding common method variance (CMV) and insufficient effort responding (IER). Each of these errors can inflate and deflate substantive relationships, and there are both a priori and post hoc means to address them. Yet, little research has investigated how both IER and CMV are affected with the use of these different procedural or statistical techniques used to address them. More specifically, if interventions to reduce IER are used, does this affect CMV in data? In an experiment conducted both in and out of the laboratory, we investigate the impact of attentiveness interventions, such as a Factual Manipulation Check (FMC) on both IER and CMV in same-source survey data. In addition to typical IER measures, we also track whether respondents play the instructional video and their mouse movement. The results show that while interventions have some impact on the level of participant attentiveness, these interventions do not appear to lead to differing levels of CMV.
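The abstract's specific IER measures (and the video-play and mouse-tracking indicators) are not spelled out, but common screening heuristics in this literature include long-string and response-time flags. A minimal sketch with assumed, illustrative cutoffs:

```python
def longstring(responses):
    """Length of the longest run of identical consecutive answers."""
    best = run = 1
    for prev, cur in zip(responses, responses[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

def flag_ier(responses, total_seconds, max_run=8, min_sec_per_item=2.0):
    """Crude insufficient-effort (IER) flag: overly uniform answers or an
    implausibly fast pace. Cutoffs are illustrative, not from the article."""
    too_uniform = longstring(responses) >= max_run
    too_fast = total_seconds / len(responses) < min_sec_per_item
    return too_uniform or too_fast
```

A respondent who straight-lines ten identical answers is flagged regardless of timing, while a varied, normally paced protocol is not.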

Citations: 0
"What If Applicants Fake Their Responses?": Modeling Faking and Response Styles in High-Stakes Assessments Using the Multidimensional Nominal Response Model.
IF 2.3 | CAS Zone 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-01-23 | eCollection Date: 2025-08-01 | DOI: 10.1177/00131644241307560
Timo Seitz, Maik Spengler, Thorsten Meiser

Self-report personality tests used in high-stakes assessments hold the risk that test-takers engage in faking. In this article, we demonstrate an extension of the multidimensional nominal response model (MNRM) to account for the response bias of faking. The MNRM is a flexible item response theory (IRT) model that allows modeling response biases whose effect patterns vary between items. In a simulation, we found good parameter recovery of the model accounting for faking under different conditions as well as good performance of model selection criteria. Also, we modeled responses from N = 3,046 job applicants taking a personality test under real high-stakes conditions. We thereby specified item-specific effect patterns of faking by setting scoring weights to appropriate values that we collected in a pilot study. Results indicated that modeling faking significantly increased model fit over and above response styles and improved divergent validity, while the faking dimension exhibited relations to several covariates. Additionally, applying the model to a sample of job incumbents taking the test under low-stakes conditions, we found evidence that the model can effectively capture faking and adjust estimates of substantive trait scores for the assumed influence of faking. We end the article with a discussion of implications for psychological measurement in high-stakes assessment contexts.
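The authors' full model and their pilot-derived scoring weights are not reproduced here; the sketch below only shows, with invented weights, the core MNRM mechanism the abstract describes: category probabilities driven by item-specific scoring weights, with an extra faking dimension that pulls responses toward the desirable endpoint.

```python
import numpy as np

def mnrm_probs(theta, weights, intercepts):
    """Multidimensional nominal response model: category probabilities for
    one item. theta is the person vector (trait, faking, ...); weights is a
    (categories x dimensions) matrix of item-specific scoring weights."""
    z = weights @ theta + intercepts
    ez = np.exp(z - z.max())
    return ez / ez.sum()

# Invented 5-category item: dimension 0 is the substantive trait,
# dimension 1 a faking dimension weighted toward the top categories.
W = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [2.0, 0.0],
              [3.0, 1.0],
              [4.0, 2.0]])
c = np.zeros(5)
honest = mnrm_probs(np.array([0.0, 0.0]), W, c)   # faking = 0
faker = mnrm_probs(np.array([0.0, 1.5]), W, c)    # faking = 1.5
```

Holding the trait fixed, the faking score shifts mass from the low categories to the desirable endpoint, which is exactly the response bias the model is meant to absorb.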

Citations: 0
A Comparison of the Next Eigenvalue Sufficiency Test to Other Stopping Rules for the Number of Factors in Factor Analysis.
IF 2.3 | CAS Zone 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-01-22 | eCollection Date: 2025-08-01 | DOI: 10.1177/00131644241308528
Pier-Olivier Caron

A plethora of techniques exist to determine the number of factors to retain in exploratory factor analysis. A recent and promising technique is the Next Eigenvalue Sufficiency Test (NEST), but it has not been systematically compared with well-established stopping rules. The present study proposes a simulation with synthetic factor structures to compare NEST, parallel analysis, the sequential χ² test, the Hull method, and the empirical Kaiser criterion. The structures were based on 24 variables containing one to eight factors, loadings ranged from .40 to .80, inter-factor correlations ranged from .00 to .30, and three sample sizes were used. In total, 360 scenarios were replicated 1,000 times. Performance was evaluated in terms of accuracy (correct identification of dimensionality) and bias (tendency to over- or underestimate dimensionality). Overall, NEST showed the best performance, especially in hard conditions where it had to detect small but meaningful factors. It had a tendency to underextract, but to a lesser extent than the other methods. The second-best method was parallel analysis, which was more liberal in the harder cases. The three other stopping rules had pitfalls: the sequential χ² test and the Hull method even in some easy conditions; the empirical Kaiser criterion in hard conditions.
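NEST itself is not specified in the abstract, but its best-known competitor above, Horn's parallel analysis, has a compact recipe: retain factors sequentially while observed eigenvalues exceed a percentile of eigenvalues from random data of the same shape. A numpy sketch on a small synthetic two-factor structure (6 variables rather than the article's 24, purely for speed):

```python
import numpy as np

def parallel_analysis(X, n_sims=100, q=95, seed=0):
    """Horn's parallel analysis: retain factors sequentially while the
    observed correlation-matrix eigenvalues exceed the q-th percentile of
    eigenvalues from random normal data of the same shape."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    sims = np.empty((n_sims, p))
    for s in range(n_sims):
        R = np.corrcoef(rng.normal(size=(n, p)), rowvar=False)
        sims[s] = np.sort(np.linalg.eigvalsh(R))[::-1]
    thresh = np.percentile(sims, q, axis=0)
    keep = 0
    while keep < p and obs[keep] > thresh[keep]:
        keep += 1
    return keep

# Synthetic two-factor toy data (loadings and error scale are assumptions).
rng = np.random.default_rng(1)
F = rng.normal(size=(500, 2))
load = np.array([[.7, 0], [.7, 0], [.7, 0], [0, .7], [0, .7], [0, .7]])
X = F @ load.T + rng.normal(scale=0.6, size=(500, 6))
n_factors = parallel_analysis(X)
```

With two well-separated factors and n = 500, the first two observed eigenvalues sit far above the random-data threshold and the third falls below it.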

Citations: 0
Factor Retention in Exploratory Multidimensional Item Response Theory.
IF 2.3 | CAS Zone 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-01-04 | eCollection Date: 2025-08-01 | DOI: 10.1177/00131644241306680
Changsheng Chen, Robbe D'hondt, Celine Vens, Wim Van den Noortgate

Multidimensional Item Response Theory (MIRT) is applied routinely in developing educational and psychological assessment tools, for instance, for exploring multidimensional structures of items using exploratory MIRT. A critical decision in exploratory MIRT analyses is the number of factors to retain. Unfortunately, the comparative properties of statistical methods and innovative Machine Learning (ML) methods for factor retention in exploratory MIRT analyses are still not clear. This study aims to fill this gap by comparing a selection of statistical and ML methods, including Kaiser Criterion (KC), Empirical Kaiser Criterion (EKC), Parallel Analysis (PA), scree plot (OC and AF), Very Simple Structure (VSS; C1 and C2), Minimum Average Partial (MAP), Exploratory Graph Analysis (EGA), Random Forest (RF), Histogram-based Gradient Boosted Decision Trees (HistGBDT), eXtreme Gradient Boosting (XGBoost), and Artificial Neural Network (ANN). The comparison was performed using 720,000 dichotomous response data sets simulated by the MIRT, for various between-item and within-item structures and considering characteristics of large-scale assessments. The results show that MAP, RF, HistGBDT, XGBoost, and ANN tremendously outperform other methods. Among them, HistGBDT generally performs better than other methods. Furthermore, including statistical methods' results as training features improves ML methods' performance. The methods' correct-factoring proportions decrease with an increase in missingness or a decrease in sample size. KC, PA, EKC, and scree plot (OC) are over-factoring, while EGA, scree plot (AF), and VSS (C1) are under-factoring. We recommend that practitioners use both MAP and HistGBDT to determine the number of factors when applying exploratory MIRT.
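Of the recommended pair, MAP is the one with a simple closed statistical recipe: remove m principal components from the correlation matrix, average the squared off-diagonal partial correlations of the residual, and keep the m that minimizes this average. A numpy sketch of Velicer's squared-partials version, checked on assumed two-component toy data:

```python
import numpy as np

def velicer_map(X):
    """Velicer's Minimum Average Partial (MAP) test, squared-partials
    version: the retained number of components minimizes the average
    squared partial correlation after those components are removed."""
    R = np.corrcoef(X, rowvar=False)
    p = R.shape[0]
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1]
    loadings = vecs[:, order] * np.sqrt(vals[order])   # component loadings
    off = ~np.eye(p, dtype=bool)
    avg_sq = []
    for m in range(p - 1):
        C = R if m == 0 else R - loadings[:, :m] @ loadings[:, :m].T
        d = np.sqrt(np.diag(C))
        avg_sq.append(np.mean((C / np.outer(d, d))[off] ** 2))
    return int(np.argmin(avg_sq))

# Assumed two-component toy data: 8 items, 4 markers per component.
rng = np.random.default_rng(2)
F = rng.normal(size=(1000, 2))
load = np.zeros((8, 2))
load[:4, 0] = load[4:, 1] = 0.8
X = F @ load.T + rng.normal(scale=0.6, size=(1000, 8))
n_comp = velicer_map(X)
```

Removing too few components leaves systematic partial correlations; removing too many reintroduces structure from noise components, so the average rises again on both sides of the true number.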

Citations: 0
Examination of ChatGPT's Performance as a Data Analysis Tool.
IF 2.3 | CAS Zone 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-01-03 | eCollection Date: 2025-08-01 | DOI: 10.1177/00131644241302721
Duygu Koçak

This study examines the performance of ChatGPT, developed by OpenAI and widely used as an AI-based conversational tool, as a data analysis tool through exploratory factor analysis (EFA). To this end, simulated data were generated under various data conditions, including normal distribution, response category, sample size, test length, factor loading, and measurement models. The generated data were analyzed using ChatGPT-4o twice with a 1-week interval under the same prompt, and the results were compared with those obtained using R code. In data analysis, the Kaiser-Meyer-Olkin (KMO) value, total variance explained, and the number of factors estimated using the empirical Kaiser criterion, Hull method, and Kaiser-Guttman criterion, as well as factor loadings, were calculated. The findings obtained from ChatGPT at two different times were found to be consistent with those obtained using R. Overall, ChatGPT demonstrated good performance for steps that require only computational decisions without involving researcher judgment or theoretical evaluation (such as KMO, total variance explained, and factor loadings). However, for multidimensional structures, although the estimated number of factors was consistent across analyses, biases were observed, suggesting that researchers should exercise caution in such decisions.
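The KMO value is one of the "purely computational" steps the abstract mentions: it has a closed form built from correlations and anti-image partial correlations. A numpy sketch (the toy factor data are an assumption for demonstration):

```python
import numpy as np

def kmo(X):
    """Kaiser-Meyer-Olkin measure of sampling adequacy: sum of squared
    correlations over that sum plus the sum of squared anti-image
    (partial) correlations, taken over the off-diagonal."""
    R = np.corrcoef(X, rowvar=False)
    S = np.linalg.inv(R)
    d = np.sqrt(np.diag(S))
    Q = -S / np.outer(d, d)                  # anti-image partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)
    r2 = (R[off] ** 2).sum()
    q2 = (Q[off] ** 2).sum()
    return r2 / (r2 + q2)

# Toy one-factor data (loading and error scale are assumptions).
rng = np.random.default_rng(4)
F = rng.normal(size=(500, 1))
X = 0.7 * F + rng.normal(scale=0.7, size=(500, 6))
kmo_value = kmo(X)
```

Because each step is deterministic given the data, this is the kind of computation where an AI tool's output can be verified exactly against R, as the study does.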

Citations: 0
Exploring the Evidence to Interpret Differential Item Functioning via Response Process Data.
IF 2.3 | CAS Zone 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-11-29 | eCollection Date: 2025-08-01 | DOI: 10.1177/00131644241298975
Ziying Li, Jinnie Shin, Huan Kuang, A Corinne Huggins-Manley

Evaluating differential item functioning (DIF) in assessments plays an important role in achieving measurement fairness across different subgroups, such as gender and native language. However, relying solely on the item response scores among traditional DIF techniques poses challenges for researchers and practitioners in interpreting DIF. Recently, response process data, which carry valuable information about examinees' response behaviors, offer an opportunity to further interpret DIF items by examining differences in response processes. This study aims to investigate the potential of response process data features in improving the interpretability of DIF items, with a focus on gender DIF using data from the Programme for International Assessment of Adult Competencies (PIAAC) 2012 computer-based numeracy assessment. We applied random forest and logistic regression with ridge regularization to investigate the association between process data features and DIF items, evaluating the important features to interpret DIF. In addition, we evaluated model performance across varying percentages of DIF items to reflect practical scenarios with different percentages of DIF items. The results demonstrate that the combination of timing features and action-sequence features is informative to reveal the response process differences between groups, thereby enhancing DIF item interpretability. Overall, this study introduces a feasible procedure to leverage response process data to understand and interpret DIF items, shedding light on potential reasons for the low agreement between DIF statistics and expert reviews and revealing potential irrelevant factors to enhance measurement equity.
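The study's PIAAC features and DIF labels are not available here; the sketch below only illustrates the ridge-regularized logistic regression step on synthetic process-style features (a timing contrast plus a noise feature, both invented), fit with plain gradient descent rather than any particular library.

```python
import numpy as np

def fit_ridge_logistic(X, y, lam=1.0, lr=0.1, steps=2000):
    """Ridge-penalized logistic regression fit by gradient descent
    (a stand-in for the regularized fit described in the abstract)."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(steps):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (prob - y) + lam * w) / n
        b -= lr * (prob - y).mean()
    return w, b

# Synthetic stand-in: feature 0 (say, a between-group timing contrast)
# drives the DIF label; feature 1 is noise. All values are invented.
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + 0.3 * rng.normal(size=300) > 0).astype(float)
w, b = fit_ridge_logistic(X, y)
acc = float(((X @ w + b > 0).astype(float) == y).mean())
```

Feature importance then falls out of the fitted weights: the informative timing feature dominates the noise feature, mirroring how the study ranks process-data features for interpreting DIF.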

引用次数: 0
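The ridge-regularized logistic regression step described in the abstract can be sketched as below. This is a minimal illustration, not the authors' PIAAC analysis: the synthetic data, the two hypothetical process features (total response time, number of actions), and the hyperparameters are all assumptions. The idea is that features receiving large absolute coefficients when predicting group membership are the ones whose response-process distributions differ between groups, and those features are the evidence used to interpret a DIF item.

```python
import numpy as np

def fit_ridge_logistic(X, y, lam=0.1, lr=0.1, n_iter=2000):
    """Logistic regression with an L2 (ridge) penalty, fit by gradient descent."""
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    for _ in range(n_iter):
        z = X @ w + b
        pr = 1.0 / (1.0 + np.exp(-z))           # predicted P(group = 1)
        grad_w = X.T @ (pr - y) / n + lam * w   # ridge-penalized gradient
        grad_b = np.mean(pr - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(0)
n = 400
group = rng.integers(0, 2, n)                   # 0/1 group indicator
# Hypothetical process features: response time differs by group, action count does not.
total_time = rng.normal(60 + 10 * group, 5, n)
n_actions = rng.normal(12, 3, n)
X = np.column_stack([total_time, n_actions])
X = (X - X.mean(0)) / X.std(0)                  # standardize before penalizing

w, b = fit_ridge_logistic(X, group)
# The group-discriminating feature (time) should carry the larger weight.
print(abs(w[0]) > abs(w[1]))  # → True
```

In practice one would compare such coefficients (or random-forest importances, as the study also does) across the candidate process features for each flagged item, rather than fit a single two-feature model.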
On Latent Structure Examination of Behavioral Measuring Instruments in Complex Empirical Settings.
IF 2.3 Tier 3 Psychology Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-10-07 eCollection Date: 2025-10-01 DOI: 10.1177/00131644241281049 pp. 983-999
Tenko Raykov, Khaled Alkherainej

A multiple-step procedure is outlined that can be used for examining the latent structure of behavior measurement instruments in complex empirical settings. The method permits one to study their latent structure after assessing the need to account for clustering effects and the necessity of its examination within individual levels of fixed factors, such as gender or group membership of substantive relevance. The approach is readily applicable with binary or binary-scored items using popular and widely available software. The described procedure is illustrated with empirical data from a student behavior screening instrument.

Cited: 0
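The abstract's first step — assessing the need to account for clustering effects — is commonly screened with an intraclass correlation. The sketch below is a generic illustration under assumptions of my own (synthetic data, equal cluster sizes, a one-way ANOVA ICC(1) estimator applied to a binary-scored item); the article's actual procedure is not reproduced here.

```python
import numpy as np

def icc1(scores, clusters):
    """ANOVA-based ICC(1): share of score variance attributable to clusters."""
    scores = np.asarray(scores, dtype=float)
    clusters = np.asarray(clusters)
    ids = np.unique(clusters)
    k = len(ids)
    n = len(scores) / k                          # average (here: equal) cluster size
    grand = scores.mean()
    group_means = np.array([scores[clusters == c].mean() for c in ids])
    msb = n * np.sum((group_means - grand) ** 2) / (k - 1)
    msw = sum(np.sum((scores[clusters == c] - scores[clusters == c].mean()) ** 2)
              for c in ids) / (len(scores) - k)
    return (msb - msw) / (msb + (n - 1) * msw)

rng = np.random.default_rng(1)
n_clusters, size = 50, 20
cluster_effect = rng.normal(0, 0.5, n_clusters)  # cluster-level shift in the latent trait
clusters = np.repeat(np.arange(n_clusters), size)
latent = cluster_effect[clusters] + rng.normal(0, 1, n_clusters * size)
item = (latent > 0).astype(int)                  # binary-scored item response

rho = icc1(item, clusters)
print(round(rho, 3))  # a clearly nonzero ICC suggests clustering should be modeled
```

A near-zero ICC would support analyzing the items without multilevel adjustments; a substantial one motivates the clustering-aware latent structure examination the article describes.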
On the Benefits of Using Maximal Reliability in Educational and Behavioral Research.
IF 2.3 Tier 3 Psychology Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-10-01 Epub Date: 2023-12-27 DOI: 10.1177/00131644231215771 Vol. 84(5), pp. 994-1011
Tenko Raykov

This note is concerned with the benefits that can result from the use of the maximal reliability and optimal linear combination concepts in educational and psychological research. Within the widely used framework of unidimensional multi-component measuring instruments, it is demonstrated that the linear combination of their components that possesses the highest possible reliability can exhibit a level of consistency considerably exceeding that of their overall sum score that is nearly routinely employed in contemporary empirical research. This optimal linear combination can be particularly useful in circumstances where one or more scale components are associated with relatively large error variances, but their removal from the instrument can lead to a notable loss in validity due to construct underrepresentation. The discussion is illustrated with a numerical example.

Cited: 0
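The contrast the abstract draws can be made concrete with the standard congeneric-model formulas: for components X_i = λ_i F + e_i with Var(F) = 1 and error variances θ_i, the optimal linear combination weights each component proportionally to λ_i/θ_i, and its reliability (maximal reliability) is Σ(λ_i²/θ_i) / (1 + Σ(λ_i²/θ_i)), whereas the unit-weighted sum score has composite (omega) reliability (Σλ_i)² / ((Σλ_i)² + Σθ_i). The loadings and error variances below are illustrative assumptions chosen so that one component is noisy.

```python
import numpy as np

lam = np.array([0.8, 0.7, 0.6, 0.4])    # factor loadings (Var(F) = 1 assumed)
theta = np.array([0.3, 0.4, 0.5, 1.5])  # error variances; the last item is noisy

# Reliability of the unit-weighted sum score (composite reliability, omega):
num = lam.sum() ** 2
omega_sum = num / (num + theta.sum())

# Maximal reliability of the optimally weighted composite (weights ∝ lam/theta):
r = np.sum(lam ** 2 / theta)
rho_max = r / (1 + r)

print(round(omega_sum, 3), round(rho_max, 3))  # → 0.698 0.807
```

The optimal composite down-weights the high-error component instead of dropping it, which is exactly the abstract's point: consistency improves markedly while the component's validity-relevant content is retained.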