
Educational and Psychological Measurement: Latest Publications

Exploring the Evidence to Interpret Differential Item Functioning via Response Process Data.
IF 2.3 CAS Zone 3 (Psychology) Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-11-29 eCollection Date: 2025-08-01 DOI: 10.1177/00131644241298975
Ziying Li, Jinnie Shin, Huan Kuang, A Corinne Huggins-Manley

Evaluating differential item functioning (DIF) in assessments plays an important role in achieving measurement fairness across subgroups, such as gender and native language. However, traditional DIF techniques rely solely on item response scores, which makes DIF difficult for researchers and practitioners to interpret. Response process data, which carry valuable information about examinees' response behaviors, offer an opportunity to further interpret DIF items by examining differences in response processes. This study investigates the potential of response process data features to improve the interpretability of DIF items, focusing on gender DIF in the Programme for the International Assessment of Adult Competencies (PIAAC) 2012 computer-based numeracy assessment. We applied random forest and logistic regression with ridge regularization to investigate the association between process data features and DIF items, and evaluated the most important features for interpreting DIF. In addition, we evaluated model performance across varying percentages of DIF items to reflect a range of practical scenarios. The results demonstrate that combining timing features with action-sequence features reveals response process differences between groups, thereby enhancing DIF item interpretability. Overall, this study introduces a feasible procedure for leveraging response process data to understand and interpret DIF items, shedding light on potential reasons for the low agreement between DIF statistics and expert reviews, and revealing potentially construct-irrelevant factors in the service of measurement equity.
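The analytic core of the study pairs a random forest with ridge-regularized logistic regression. A minimal sketch of that pairing in Python with scikit-learn follows; the simulated feature matrix (timing and action-sequence summaries) and DIF flags are hypothetical placeholders, not the PIAAC features.

```python
# Sketch: relating process-data features to DIF flags with a random forest
# and ridge-regularized logistic regression (scikit-learn). Feature names
# and the simulated data are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_units = 200                      # analysis units: items (or item-by-group cells)
X = rng.normal(size=(n_units, 4))  # e.g., response time, time-to-first-action,
                                   # action-sequence length, sequence similarity
y = rng.binomial(1, 1 / (1 + np.exp(-(1.2 * X[:, 0] - 0.8 * X[:, 2]))))  # DIF flag

rf = RandomForestClassifier(n_estimators=500, random_state=1).fit(X, y)
ridge = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X, y)

print("RF cross-validated accuracy:",
      cross_val_score(RandomForestClassifier(random_state=1), X, y, cv=5).mean())
print("RF feature importances:", rf.feature_importances_)
print("Ridge coefficients:", ridge.coef_.ravel())
```

Feature importances (random forest) and regularized coefficients (ridge) are the two complementary pieces of evidence the abstract describes for interpreting which process features separate groups on DIF items.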

Citations: 0
On Latent Structure Examination of Behavioral Measuring Instruments in Complex Empirical Settings.
IF 2.3 CAS Zone 3 (Psychology) Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-10-07 eCollection Date: 2025-10-01 DOI: 10.1177/00131644241281049
Tenko Raykov, Khaled Alkherainej

A multiple-step procedure is outlined that can be used for examining the latent structure of behavior measurement instruments in complex empirical settings. The method permits one to study an instrument's latent structure after assessing the need to account for clustering effects, as well as the necessity of examining that structure within individual levels of fixed factors, such as gender or group membership of substantive relevance. The approach is readily applicable to binary or binary-scored items using popular and widely available software. The described procedure is illustrated with empirical data from a student behavior screening instrument.
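One common screening check for the clustering question raised here is the intraclass correlation (ICC) and the resulting design effect. The sketch below illustrates that check under simulated data; it is an illustration of the general idea, not the authors' multiple-step procedure.

```python
# Sketch: one-way ANOVA estimate of the intraclass correlation (ICC) and the
# design effect, as a screening check for clustering. Data are simulated.
import numpy as np

rng = np.random.default_rng(7)
n_clusters, m = 50, 20                           # 50 clusters, 20 cases each
cluster_effect = rng.normal(0, 0.5, n_clusters)  # between-cluster SD of 0.5
scores = cluster_effect[:, None] + rng.normal(0, 1.0, (n_clusters, m))

grand = scores.mean()
msb = m * ((scores.mean(axis=1) - grand) ** 2).sum() / (n_clusters - 1)
msw = ((scores - scores.mean(axis=1, keepdims=True)) ** 2).sum() / (n_clusters * (m - 1))
icc = (msb - msw) / (msb + (m - 1) * msw)        # ANOVA ICC(1) estimator
deff = 1 + (m - 1) * icc                         # design effect for cluster size m

print(f"ICC = {icc:.3f}, design effect = {deff:.2f}")
# A design effect well above ~2 is a common rule of thumb indicating that a
# clustering-aware (e.g., two-level) analysis is needed.
```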

Citations: 0
On the Benefits of Using Maximal Reliability in Educational and Behavioral Research.
IF 2.3 CAS Zone 3 (Psychology) Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-10-01 Epub Date: 2023-12-27 DOI: 10.1177/00131644231215771
Tenko Raykov

This note is concerned with the benefits that can result from using the maximal reliability and optimal linear combination concepts in educational and psychological research. Within the widely used framework of unidimensional multi-component measuring instruments, it is demonstrated that the linear combination of components possessing the highest possible reliability can exhibit a level of consistency considerably exceeding that of the overall sum score routinely employed in contemporary empirical research. This optimal linear combination can be particularly useful when one or more scale components are associated with relatively large error variances, yet removing them from the instrument would lead to a notable loss in validity due to construct underrepresentation. The discussion is illustrated with a numerical example.
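For a unidimensional congeneric scale with standardized loadings, the reliability-maximizing weights are proportional to each component's loading-to-error-variance ratio, and the maximal reliability (coefficient H) has a closed form. A numeric sketch with hypothetical loadings, contrasted with omega for the unweighted sum:

```python
# Sketch: maximal reliability (coefficient H) of the optimal linear combination
# versus coefficient omega of the unweighted sum, for a unidimensional scale.
# Standardized loadings are hypothetical; one weak component has a large
# error variance, the case where the optimal combination helps most.
import numpy as np

lam = np.array([0.8, 0.75, 0.7, 0.3])   # standardized factor loadings
theta = 1 - lam**2                       # error variances (standardized items)

w = lam / theta                          # optimal weights: loading / error variance

s = np.sum(lam**2 / theta)               # coefficient H = s / (1 + s)
h = s / (1 + s)

omega = lam.sum()**2 / (lam.sum()**2 + theta.sum())  # reliability of the sum score

print(f"optimal weights: {np.round(w, 2)}")
print(f"maximal reliability H = {h:.3f}, omega of sum score = {omega:.3f}")
```

With these loadings, H is about .80 while omega of the sum score is about .75, illustrating the gain from optimal weighting without dropping the weak component.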

Citations: 0
Investigating the Ordering Structure of Clustered Items Using Nonparametric Item Response Theory
IF 2.7 CAS Zone 3 (Psychology) Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-09-06 DOI: 10.1177/00131644241274122
Letty Koopman, Johan Braeken
Educational and psychological tests with an ordered item structure enable efficient test administration procedures and allow for intuitive score interpretation and monitoring. The effectiveness of the measurement instrument relies to a large extent on the validated strength of its ordering structure. We define three increasingly strict types of ordering for the ordering structure of a measurement instrument with clustered items: a weak and a strong invariant cluster ordering, and a clustered invariant item ordering. Following a nonparametric item response theory (IRT) approach, we propose a procedure to evaluate the ordering structure of a clustered item set along this three-fold continuum of order invariance. The basis of the procedure is (a) the local assessment of pairwise conditional expectations at both cluster and item level and (b) the global assessment of the number of Guttman errors through new generalizations of the H-coefficient for this item-cluster context. The procedure, readily implemented in R, is illustrated and applied to an empirical example. Suggestions for test practice, further methodological developments, and future research are discussed.
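The Guttman-error count at the heart of H-type coefficients can be sketched for the simple dichotomous-item case: order items from easiest to hardest, then count, for each item pair, responses that are correct on the harder item but incorrect on the easier one. The sketch below uses simulated data and omits the article's cluster-level generalizations.

```python
# Sketch: counting pairwise Guttman errors for dichotomous items ordered by
# decreasing popularity (the building block of Mokken-type H coefficients).
# Data are simulated; the article's cluster-level extensions are not shown.
import numpy as np

rng = np.random.default_rng(3)
n_persons, n_items = 500, 6
ability = rng.normal(size=n_persons)
difficulty = np.linspace(-1.5, 1.5, n_items)
prob = 1 / (1 + np.exp(-(ability[:, None] - difficulty[None, :])))
X = rng.binomial(1, prob)                 # persons x items, 0/1 scores

order = np.argsort(-X.mean(axis=0))       # easiest (most popular) item first
Xo = X[:, order]

guttman_errors = 0
for i in range(n_items):
    for j in range(i + 1, n_items):
        # error: harder item j answered correctly while easier item i is not
        guttman_errors += np.sum((Xo[:, i] == 0) & (Xo[:, j] == 1))

print("total pairwise Guttman errors:", guttman_errors)
```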
Citations: 0
Added Value of Subscores for Tests With Polytomous Items
IF 2.7 CAS Zone 3 (Psychology) Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-08-07 DOI: 10.1177/00131644241268128
Kylie Gorney, Sandip Sinharay
Test-takers, policymakers, teachers, and institutions are increasingly demanding that testing programs provide more detailed feedback regarding test performance. As a result, there has been a growing interest in the reporting of subscores that potentially provide such detailed feedback. Haberman developed a method based on classical test theory for determining whether a subscore has added value over the total score. Sinharay conducted a detailed study using both real and simulated data and concluded that it is not common for subscores to have added value according to Haberman’s criterion. However, Sinharay almost exclusively dealt with data from tests with only dichotomous items. In this article, we show that it is more common for subscores to have added value in tests with polytomous items.
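Haberman's criterion compares two proportional reductions in mean squared error (PRMSE) for predicting the true subscore: one based on the observed subscore (which equals the subscore's reliability) and one based on the total score; the subscore has added value only if the first exceeds the second. Below is a hedged numeric sketch along one standard classical-test-theory route, assuming the total equals the subscore plus the remaining items with uncorrelated errors; all input values are hypothetical.

```python
# Sketch: Haberman's added-value check via PRMSE, from classical test theory
# summaries. Input values (variances, reliability, covariance) are hypothetical.
# Assumes x = s + rest, with measurement errors uncorrelated across parts, so
# cov(true_s, x) = cov(s, x) - error variance of s.

var_s = 25.0     # variance of observed subscore s
var_x = 100.0    # variance of observed total score x
cov_sx = 40.0    # covariance of s with x
rho_s = 0.75     # reliability of the subscore

var_ts = rho_s * var_s                    # true-subscore variance
cov_tsx = cov_sx - (1 - rho_s) * var_s    # covariance of true subscore with x

prmse_sub = rho_s                         # PRMSE from the observed subscore
prmse_tot = cov_tsx**2 / (var_ts * var_x) # PRMSE from the total score

print(f"PRMSE(subscore) = {prmse_sub:.3f}, PRMSE(total) = {prmse_tot:.3f}")
print("subscore adds value" if prmse_sub > prmse_tot else "no added value")
```

In this hypothetical case PRMSE(subscore) = .75 exceeds PRMSE(total) ≈ .61, so the subscore would be worth reporting under Haberman's criterion.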
Citations: 0
Evaluating The Predictive Reliability of Neural Networks in Psychological Research With Random Datasets
IF 2.7 CAS Zone 3 (Psychology) Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-07-25 DOI: 10.1177/00131644241262964
Yongtian Cheng, K. V. Petrides
Psychologists increasingly emphasize the importance of predictive conclusions. Machine learning methods, such as supervised neural networks, have been used in psychological studies because they naturally fit prediction tasks. However, we are concerned with whether neural networks fitted to random datasets (i.e., datasets in which there is no relationship between ordinal independent variables and continuous or binary dependent variables) can appear to provide an acceptable level of predictive performance from a psychologist's perspective. Through a Monte Carlo simulation study, we found that such an erroneous conclusion is unlikely to be drawn as long as the sample size is larger than 50 with continuous dependent variables. However, when the dependent variable is binary, the minimum sample size is 500 when the criterion is balanced accuracy ≥ .6 or ≥ .65, and 200 when the criterion is balanced accuracy ≥ .7, for a decision error of less than .05. When area under the curve (AUC) is used as the metric, sample sizes of 100, 200, and 500 are necessary when the minimum acceptable performance level is set at AUC ≥ .7, AUC ≥ .65, and AUC ≥ .6, respectively. These results can inform sample size planning for psychologists who wish to apply neural networks and draw qualitatively reliable conclusions. Further directions and limitations of the study are also discussed.
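The study's basic check can be reproduced in miniature: fit a small supervised network to pure-noise data and ask whether cross-validated balanced accuracy exceeds a criterion by chance. The sketch below is illustrative only; the architecture and sample sizes are not those of the reported Monte Carlo design.

```python
# Sketch: does a neural network fitted to pure-noise data exceed a balanced-
# accuracy criterion by chance? A miniature version of the Monte Carlo check;
# the network and sample sizes here are illustrative placeholders.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(11)
n, p = 200, 5
X = rng.integers(1, 6, size=(n, p)).astype(float)  # ordinal predictors, 1..5
y = rng.binomial(1, 0.5, size=n)                   # binary outcome, unrelated to X

net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=1)
bal_acc = cross_val_score(net, X, y, cv=5, scoring="balanced_accuracy").mean()

print(f"balanced accuracy on random data: {bal_acc:.3f}")
# Values near .5 are expected here; exceeding a criterion such as .6 would
# signal the kind of spurious "predictive" conclusion the study quantifies.
```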
Citations: 0
Studying Factorial Invariance With Nominal Items: A Note on a Latent Variable Modeling Procedure
IF 2.7 CAS Zone 3 (Psychology) Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-06-24 DOI: 10.1177/00131644241256626
Tenko Raykov
A latent variable modeling procedure for studying factorial invariance and differential item functioning for multi-component measuring instruments with nominal items is discussed. The method is based on a multiple testing approach utilizing the false discovery rate concept and likelihood ratio tests. The procedure complements the Revuelta, Franco-Martinez, and Ximenez approach to factorial invariance examination, and permits localization of individual invariance violations. The outlined method does not require the selection of a reference observed variable and is illustrated with empirical data.
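The multiple-testing backbone of such a procedure, the Benjamini-Hochberg false discovery rate step applied to per-item likelihood ratio test p-values, can be sketched compactly; the p-values below are placeholders rather than output from fitted latent variable models.

```python
# Sketch: Benjamini-Hochberg FDR control over a set of per-item likelihood
# ratio test p-values, flagging items whose invariance is rejected. The
# p-values are placeholders, not output from an actual latent variable model.
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean mask of rejected hypotheses at FDR level q."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order]
    # find the largest k with p_(k) <= (k / m) * q
    below = ranked <= (np.arange(1, m + 1) / m) * q
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True   # reject all hypotheses up to rank k
    return reject

lrt_pvals = [0.001, 0.004, 0.012, 0.020, 0.180, 0.410, 0.660, 0.740]
flags = benjamini_hochberg(lrt_pvals, q=0.05)
print("items flagged for invariance violation:", np.nonzero(flags)[0] + 1)
```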
Citations: 0
A Note on Evaluation of Polytomous Item Locations With the Rating Scale Model and Testing Its Fit
IF 2.7 CAS Zone 3 (Psychology) Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-06-24 DOI: 10.1177/00131644241259026
Tenko Raykov, Martin Pusic
A procedure is outlined for point and interval estimation of the location parameters associated with polytomous items, or with raters assessing subjects or cases, under the rating scale model. The method is developed within the framework of latent variable modeling and is readily applied in empirical research using popular software. The approach permits testing the goodness of fit of this widely used and rather parsimonious item response theory model as a means of describing and explaining an analyzed data set. The procedure allows examination of important aspects of the functioning of measuring instruments with polytomous ordinal items, which may also constitute person assessments furnished by teachers, counselors, judges, raters, or clinicians. The described method is illustrated using an empirical example.
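Under the rating scale model, all items share one set of step thresholds and differ only in a single location parameter, so an item's category probabilities take a simple closed form. A minimal sketch with hypothetical parameter values:

```python
# Sketch: category probabilities under the rating scale model (RSM), in which
# every item shares one set of thresholds tau and differs only in location b.
# Parameter values below are hypothetical.
import numpy as np

def rsm_probs(theta, b, tau):
    """P(X = 0..m) for one item: person theta, location b, shared thresholds tau."""
    # cumulative sum of (theta - b - tau_j) for x = 1..m; x = 0 has exponent 0
    exponents = np.concatenate(([0.0], np.cumsum(theta - b - np.asarray(tau))))
    num = np.exp(exponents)
    return num / num.sum()

theta = 0.5                  # person location
b = -0.2                     # item location
tau = [-1.0, 0.0, 1.0]       # shared step thresholds (four categories)
print(np.round(rsm_probs(theta, b, tau), 3))
```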
Citations: 0
Enhancing the Detection of Social Desirability Bias Using Machine Learning: A Novel Application of Person-Fit Indices
IF 2.7 CAS Zone 3 (Psychology) Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-05-30 DOI: 10.1177/00131644241255109
Sanaz Nazari, Walter L. Leite, A. Corinne Huggins-Manley
Social desirability bias (SDB) is a common threat to the validity of conclusions drawn from responses to a scale or survey. A wide range of person-fit statistics in the literature can be employed to detect SDB. In addition, machine learning classifiers, such as logistic regression and random forest, have the potential to distinguish between biased and unbiased responses. This study proposes a new application of these classifiers for detecting SDB that treats several person-fit indices as features, or predictors, in the machine learning methods. The results of a Monte Carlo simulation study showed that with a single feature, applying person-fit indices directly and applying logistic regression led to similar classification results. However, the random forest classifier improved the classification of biased and unbiased responses substantially. Classification improved further in both logistic regression and random forest when multiple features were considered simultaneously. Moreover, cross-validation indicated stable areas under the curve (AUCs) across machine learning classifiers. A didactic illustration of applying random forest to detect SDB is presented.
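A representative person-fit feature of the kind fed to these classifiers is the standardized log-likelihood statistic lz for dichotomous items; the sketch below computes it under a two-parameter logistic model with hypothetical item parameters and a single response vector.

```python
# Sketch: the standardized log-likelihood person-fit statistic (lz) for one
# examinee under a two-parameter logistic (2PL) model; indices like this one
# serve as classifier features. Item parameters and responses are hypothetical.
import numpy as np

def lz_statistic(u, theta, a, b):
    """lz for response vector u (0/1) at ability theta, 2PL items (a, b)."""
    p = 1 / (1 + np.exp(-a * (theta - b)))
    loglik = np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))
    expected = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    variance = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)
    return (loglik - expected) / np.sqrt(variance)

a = np.array([1.0, 1.2, 0.8, 1.5, 1.1])   # discriminations
b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0]) # difficulties
u = np.array([1, 1, 1, 0, 0])             # a Guttman-consistent pattern
print(f"lz = {lz_statistic(u, theta=0.2, a=a, b=b):.2f}")
# Large negative lz values indicate misfitting (e.g., biased) response
# patterns; several such indices together form the classifier's feature set.
```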
Citations: 0
Is Effort Moderated Scoring Robust to Multidimensional Rapid Guessing?
IF 2.7 CAS Zone 3 (Psychology) Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-04-28 DOI: 10.1177/00131644241246749
Joseph A. Rios, Jiayi Deng
To mitigate the potentially damaging consequences of rapid guessing (RG), a form of noneffortful responding, researchers have proposed a number of scoring approaches. The present simulation study examines the robustness of the most popular of these, the unidimensional effort-moderated (EM) scoring procedure, to multidimensional RG (i.e., RG that is linearly related to examinee ability). Specifically, EM scoring is compared with the Holman–Glas (HG) method, a multidimensional scoring approach, in terms of model fit distortion, ability parameter recovery, and omega reliability distortion. Test difficulty, the proportion of RG present within a sample, and the strength of association between ability and RG propensity were manipulated to create 80 conditions in total. Overall, the results showed that EM scoring provided improved model fit compared with HG scoring when RG comprised 12% or less of all item responses. Furthermore, no significant differences in ability parameter recovery or omega reliability distortion were noted between the two scoring approaches under moderate degrees of RG multidimensionality. These limited differences arose largely because RG had little impact on aggregated ability estimates (bias ranged from 0.00 to 0.05 logits) and reliability (distortion was ≤ .005 units) even when as much as 40% of item responses in the sample data reflected RG behavior.
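Effort-moderated scoring flags responses whose response time falls below an item-level threshold as rapid guesses and excludes them from scoring. The sketch below shows the flag-and-exclude logic with a simple proportion-correct score; in practice the flagged responses are treated as not administered within an IRT calibration, and the fixed 3-second threshold here is a placeholder for empirically derived thresholds.

```python
# Sketch: effort-moderated (EM) scoring. Responses faster than an item's
# rapid-guessing threshold are flagged and excluded, so the score reflects
# effortful responses only. Data and thresholds are simulated placeholders.
import numpy as np

rng = np.random.default_rng(5)
n_items = 20
responses = rng.binomial(1, 0.6, size=n_items)                 # 0/1 item scores
resp_time = rng.lognormal(mean=2.5, sigma=0.6, size=n_items)   # seconds
threshold = np.full(n_items, 3.0)   # per-item RG thresholds (placeholder: 3 s)

effortful = resp_time >= threshold
em_score = responses[effortful].mean() if effortful.any() else np.nan

print(f"flagged as rapid guesses: {np.sum(~effortful)} of {n_items}")
print(f"EM proportion-correct (effortful items only): {em_score:.2f}")
print(f"naive proportion-correct (all items): {responses.mean():.2f}")
```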
Citations: 0