
Latest Publications: Educational and Psychological Measurement

Investigating the Ordering Structure of Clustered Items Using Nonparametric Item Response Theory
IF 2.7 | CAS Tier 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-09-06 | DOI: 10.1177/00131644241274122
Letty Koopman, Johan Braeken
Educational and psychological tests with an ordered item structure enable efficient test administration and allow for intuitive score interpretation and monitoring. The effectiveness of such a measurement instrument relies to a large extent on the validated strength of its ordering structure. We define three increasingly strict types of ordering for a measurement instrument with clustered items: a weak invariant cluster ordering, a strong invariant cluster ordering, and a clustered invariant item ordering. Following a nonparametric item response theory (IRT) approach, we propose a procedure to evaluate the ordering structure of a clustered item set along this three-fold continuum of order invariance. The basis of the procedure is (a) the local assessment of pairwise conditional expectations at both cluster and item level and (b) the global assessment of the number of Guttman errors through new generalizations of the H-coefficient for this item-cluster context. The procedure, readily implemented in R, is illustrated and applied to an empirical example. Suggestions for test practice, further methodological developments, and future research are discussed.
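The Guttman-error logic behind the procedure can be sketched for the classic single-level, dichotomous case (the article generalizes this to item clusters and supplies the actual R implementation). The H-style coefficient below contrasts observed Guttman errors, passing a harder item while failing an easier one, with the count expected under marginal independence:

```python
import numpy as np

def guttman_H(X):
    """Mokken-style scalability coefficient H for dichotomous items.

    A Guttman error for an item pair is passing the harder (less
    popular) item while failing the easier one; H contrasts the
    observed number of such errors with the number expected under
    marginal independence of the two items.
    """
    X = np.asarray(X)
    n, k = X.shape
    p = X.mean(axis=0)
    order = np.argsort(-p)            # most popular (easiest) first
    X, p = X[:, order], p[order]
    obs = exp = 0.0
    for i in range(k - 1):
        for j in range(i + 1, k):     # item i is easier than item j
            obs += np.sum((X[:, i] == 0) & (X[:, j] == 1))
            exp += n * (1 - p[i]) * p[j]
    return 1 - obs / exp

# Perfectly Guttman-ordered data yield H = 1 (no errors observed).
perfect = np.repeat(np.array([[1, 1, 1], [1, 1, 0],
                              [1, 0, 0], [0, 0, 0]]), 5, axis=0)
print(guttman_H(perfect))  # → 1.0
```

Values near 1 indicate a strong ordering structure; violations of the ordering pull H toward 0.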
Citations: 0
Added Value of Subscores for Tests With Polytomous Items
IF 2.7 | CAS Tier 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-08-07 | DOI: 10.1177/00131644241268128
Kylie Gorney, Sandip Sinharay
Test-takers, policymakers, teachers, and institutions are increasingly demanding that testing programs provide more detailed feedback regarding test performance. As a result, there has been a growing interest in the reporting of subscores that potentially provide such detailed feedback. Haberman developed a method based on classical test theory for determining whether a subscore has added value over the total score. Sinharay conducted a detailed study using both real and simulated data and concluded that it is not common for subscores to have added value according to Haberman’s criterion. However, Sinharay almost exclusively dealt with data from tests with only dichotomous items. In this article, we show that it is more common for subscores to have added value in tests with polytomous items.
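Haberman's criterion compares two proportional-reduction-in-mean-squared-error (PRMSE) quantities: the subscore has added value if it predicts its own true score better than the total score does. A rough classical-test-theory sketch, using coefficient alpha as the reliability estimate and simulated data (the article's own estimators may differ):

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for a matrix of item scores (rows = persons)."""
    items = np.asarray(items, float)
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                          / items.sum(axis=1).var(ddof=1))

def subscore_added_value(sub_items, all_items):
    """Haberman-style check: PRMSE_s (subscore reliability) vs. PRMSE_x
    (squared correlation between the total score and the true subscore,
    with the covariance corrected for the error variance the subscore
    shares with the total)."""
    S = np.asarray(sub_items, float).sum(axis=1)
    X = np.asarray(all_items, float).sum(axis=1)
    rel_s = cronbach_alpha(sub_items)
    var_s, var_x = S.var(ddof=1), X.var(ddof=1)
    cov_xs = np.cov(X, S, ddof=1)[0, 1]
    cov_x_tau = cov_xs - (1 - rel_s) * var_s   # Cov(X, true subscore)
    prmse_s = rel_s
    prmse_x = cov_x_tau**2 / (var_x * rel_s * var_s)
    return prmse_s, prmse_x, prmse_s > prmse_x

# Two weakly correlated 4-item subscales (simulated): the first
# subscore should carry information the total score cannot recover.
rng = np.random.default_rng(0)
n = 2000
f1 = rng.normal(size=n)
f2 = 0.3 * f1 + np.sqrt(1 - 0.3**2) * rng.normal(size=n)
sub = 0.8 * f1[:, None] + 0.6 * rng.normal(size=(n, 4))
rest = 0.8 * f2[:, None] + 0.6 * rng.normal(size=(n, 4))
prmse_s, prmse_x, added = subscore_added_value(sub, np.hstack([sub, rest]))
print(round(prmse_s, 2), round(prmse_x, 2), added)
```

With highly correlated subscales, prmse_x approaches prmse_s and the added value disappears, which is the typical dichotomous-test finding the article revisits for polytomous items.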
Citations: 0
A Relative Normed Effect-Size Difference Index for Determining the Number of Common Factors in Exploratory Solutions.
IF 2.1 | CAS Tier 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-08-01 | Epub Date: 2023-09-07 | DOI: 10.1177/00131644231196482
Pere J Ferrando, David Navarro-González, Urbano Lorenzo-Seva

Descriptive fit indices that do not require a formal statistical basis and do not specifically depend on a given estimation criterion are useful as auxiliary devices for judging the appropriateness of unrestricted or exploratory factor analytical (UFA) solutions when the problem is to decide the most appropriate number of common factors. While overall indices of this type are well known in UFA applications, especially those intended for item analysis, difference indices are much scarcer. Recently, Raykov and collaborators proposed a family of effect-size-type descriptive difference indices that are promising for UFA applications. As a starting point, we considered the simplest measure of this family, which (a) can be viewed as absolute and (b) for which only tentative cutoffs and reference values have been provided so far. Against this background, this article has three aims. The first is to propose a relative version of Raykov's effect-size measure, intended to be used as a complement to the original measure, in which the increase in explained common variance is related to the overall prior estimated amount of common factor variance. The second is to establish reference values for both indices in item-analysis scenarios using simulation. The third, instrumental, aim is to implement the proposal in both the R language and a well-known non-commercial factor analysis program. The functioning and usefulness of the proposal are illustrated using an existing empirical dataset.
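The flavor of such a relative difference index can be illustrated with a simple analogue (this is not the authors' estimator): the gain in common variance explained when moving from k to k + 1 factors, expressed relative to an overall estimate of common variance.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

def relative_ecv_gain(X, k):
    """Illustrative analogue, not the authors' estimator: the gain in
    common variance explained when moving from k to k + 1 factors,
    relative to the common variance of the larger solution."""
    def common_variance(n_factors):
        fa = FactorAnalysis(n_components=n_factors, random_state=0).fit(X)
        return (fa.components_ ** 2).sum()   # sum of squared loadings
    cv_k, cv_k1 = common_variance(k), common_variance(k + 1)
    return (cv_k1 - cv_k) / cv_k1

# With clean one-factor data, the relative gain from a second factor
# should be close to zero, signaling that one factor suffices.
rng = np.random.default_rng(0)
scores = rng.normal(size=(400, 1))
X = scores @ np.full((1, 6), 0.8) + 0.6 * rng.normal(size=(400, 6))
print(round(relative_ecv_gain(X, 1), 3))
```

A large relative gain suggests the extra factor captures substantial common variance; the article's simulations are what establish usable cutoffs for the actual index.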

Citations: 0
An Illustration of an IRTree Model for Disengagement.
IF 2.1 | CAS Tier 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-08-01 | Epub Date: 2023-07-26 | DOI: 10.1177/00131644231185533
Brian C Leventhal, Dena Pastor

Low-stakes test performance commonly reflects both examinee ability and effort. Examinees exhibiting low effort may be identified through rapid guessing behavior throughout an assessment. A plethora of methods has been proposed to adjust scores once rapid guesses have been identified, but these methods have been plagued by strong assumptions or require the removal of examinees. In this study, we illustrate how an IRTree model can be used to adjust examinee ability estimates for rapid guessing behavior. Our approach is flexible: it does not assume independence between rapid guessing behavior and the trait of interest (e.g., ability), nor does it necessitate the removal of examinees who engage in rapid guessing. In addition, our method uniquely allows for the simultaneous modeling of a disengagement latent trait alongside the trait of interest. The results indicate the model is quite useful for estimating individual differences among examinees in the disengagement latent trait and for providing more precise measurement of examinee ability relative to models that ignore rapid guesses or accommodate them in other ways. A simulation study reveals that our model yields less biased estimates of the trait of interest for individuals with rapid responses, regardless of sample size and the rapid response rate in the sample. We conclude with a discussion of extensions of the model and directions for future research.
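A common way to set up such a model is to expand each response into tree-structured pseudo-items before calibration. A minimal data-preparation sketch, assuming rapid guesses are flagged by a response-time threshold (one simple operationalization; the article's model specification is richer):

```python
import numpy as np

def irtree_recode(correct, rt, threshold):
    """Expand each item response into two IRTree pseudo-items:
    node 1 -- engaged (1) vs. rapid guess (0), here flagged by a
    response-time threshold; node 2 -- correct (1) vs. incorrect (0),
    defined only for engaged responses and missing (NaN) otherwise.
    The node matrices can then be calibrated jointly, e.g., with a
    two-dimensional IRT model (disengagement trait + ability)."""
    correct = np.asarray(correct, float)
    rt = np.asarray(rt, float)
    engaged = (rt >= threshold).astype(float)
    node2 = np.where(engaged == 1, correct, np.nan)
    return engaged, node2

engaged, node2 = irtree_recode(correct=[1, 0, 1],
                               rt=[5.2, 0.4, 3.1], threshold=1.0)
print(engaged)  # → [1. 0. 1.]
```

Because node 2 is missing for rapid guesses, lucky rapid-guess successes do not inflate the ability estimate.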

Citations: 0
An Ensemble Learning Approach Based on TabNet and Machine Learning Models for Cheating Detection in Educational Tests.
IF 2.1 | CAS Tier 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-08-01 | Epub Date: 2023-08-21 | DOI: 10.1177/00131644231191298
Yang Zhen, Xiaoyan Zhu

The pervasive issue of cheating in educational tests has emerged as a paramount concern within the realm of education, prompting scholars to explore diverse methodologies for identifying potential transgressors. While machine learning models have been extensively investigated for this purpose, the untapped potential of TabNet, an intricate deep neural network model, remains uncharted territory. Within this study, a comprehensive evaluation and comparison of 12 base models (naive Bayes, linear discriminant analysis, Gaussian process, support vector machine, decision tree, random forest, Extreme Gradient Boosting (XGBoost), AdaBoost, logistic regression, k-nearest neighbors, multilayer perceptron, and TabNet) was undertaken to scrutinize their predictive capabilities. The area under the receiver operating characteristic curve (AUC) was employed as the performance metric for evaluation. Impressively, the findings underscored the supremacy of TabNet (AUC = 0.85) over its counterparts, signifying the profound aptitude of deep neural network models in tackling tabular tasks, such as the detection of academic dishonesty. Encouraged by these outcomes, we proceeded to synergistically amalgamate the two most efficacious models, TabNet (AUC = 0.85) and AdaBoost (AUC = 0.81), resulting in the creation of an ensemble model christened TabNet-AdaBoost (AUC = 0.92). The emergence of this novel hybrid approach exhibited considerable potential in research endeavors within this domain. Importantly, our investigation has unveiled fresh insights into the utilization of deep neural network models for the purpose of identifying cheating in educational tests.
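The ensemble idea can be sketched as simple probability averaging of two fitted classifiers. In this sketch, scikit-learn's MLPClassifier stands in for TabNet (the real model lives in the separate pytorch-tabnet package), the data are synthetic, and the combination rule is an assumption; the article's TabNet-AdaBoost ensemble may combine the models differently:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic binary "cheating vs. honest" labels for illustration only.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

nn = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                   random_state=0).fit(Xtr, ytr)      # TabNet stand-in
ada = AdaBoostClassifier(random_state=0).fit(Xtr, ytr)

# Average the two models' predicted probabilities, then score by AUC.
p_ens = (nn.predict_proba(Xte)[:, 1] + ada.predict_proba(Xte)[:, 1]) / 2
print(round(roc_auc_score(yte, p_ens), 3))
```

Averaging calibrated probabilities often outperforms either base model when their errors are not perfectly correlated, which is the intuition behind the reported AUC gain of the ensemble.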

Citations: 0
Evaluating Imputation-Based Fit Statistics in Structural Equation Modeling With Ordinal Data: The MI2S Approach
IF 2.1 | CAS Tier 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-07-27 | DOI: 10.1177/00131644241261271
Suppanut Sriutaisuk, Yu Liu, Seungwon Chung, Hanjoe Kim, Fei Gu
The multiple imputation two-stage (MI2S) approach holds promise for evaluating the model fit of structural equation models for ordinal variables with multiply imputed data. However, previous studies only examined the performance of MI2S-based residual-based test statistics. This study extends previous research by examining the performance of two alternative test statistics: the mean-adjusted test statistic (T_M) and the mean- and variance-adjusted test statistic (T_MV). Our results showed that the MI2S-based T_MV generally outperformed other test statistics examined in a wide range of conditions. The MI2S-based root mean square error of approximation also exhibited good performance. This article demonstrates the MI2S approach with an empirical data set and provides Mplus and R code for its implementation.
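Generically, mean and mean-and-variance adjustments rescale a test statistic so that its first (and second) moments match a reference chi-square distribution. A sketch of the standard Satterthwaite-type forms, taking the estimated null mean mu and variance sigma2 as given (the MI2S-specific moment estimators are what the article supplies):

```python
from scipy.stats import chi2

def mean_adjusted(T, df, mu):
    """Mean adjustment: rescale T so its estimated null mean mu matches
    the nominal df, then refer the result to chi-square(df)."""
    T_m = df * T / mu
    return T_m, chi2.sf(T_m, df)

def mean_var_adjusted(T, mu, sigma2):
    """Satterthwaite-type mean-and-variance adjustment: rescale T and
    the degrees of freedom so both null moments match a chi-square."""
    scale = sigma2 / (2 * mu)
    df_star = 2 * mu**2 / sigma2
    T_mv = T / scale
    return T_mv, df_star, chi2.sf(T_mv, df_star)

# When T already behaves like chi-square(5) (mu = 5, sigma2 = 10),
# both adjustments leave the statistic unchanged.
print(mean_adjusted(10.0, 5, 5.0)[0],
      mean_var_adjusted(10.0, 5.0, 10.0)[0])  # → 10.0 10.0
```

The variance adjustment matters most when the statistic's null distribution is heavier-tailed than the nominal chi-square, as is typical with ordinal data and imputation.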
Citations: 0
Can One Pool Over Site in a Multi-Site Study With Categorical Item Measuring Instruments?: A Multiple Testing Procedure
IF 2.1 | CAS Tier 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-07-27 | DOI: 10.1177/00131644241267010
T. Raykov, Khaled Alkherainej
We outline a procedure for examining collapsibility over site in multiple-location settings that are frequently utilized in contemporary educational and behavioral research. The method is based on a test of cross-site identity of the response distributions of polytomous items in multi-component measuring instruments, which implies the possibility of pooling over study location. The approach is readily applicable in empirical studies using popular and widely circulated software and is generalizable to various types of items. The described procedure is illustrated with data from a child development survey.
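The overall shape of such a multiple-testing procedure can be sketched as one per-item test of distributional identity across sites, followed by a false-discovery-rate correction. The article uses likelihood-ratio tests; the contingency-table chi-square below is a simple stand-in, and the count tables are hypothetical:

```python
import numpy as np
from scipy.stats import chi2_contingency

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: boolean array of
    rejections controlling the false discovery rate at level q."""
    p = np.asarray(pvals, float)
    m = len(p)
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()
        reject[order[:k + 1]] = True
    return reject

# One site-by-category count table per item (hypothetical data).
tables = [np.array([[30, 50, 20], [28, 52, 20]]),   # similar across sites
          np.array([[60, 30, 10], [20, 40, 40]])]   # clearly different
pvals = [chi2_contingency(t)[1] for t in tables]
print(benjamini_hochberg(pvals))  # → [False  True]
```

Items for which the null of cross-site identity survives the FDR correction are candidates for pooling over site; rejected items signal that pooling would distort the response distribution.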
Citations: 0
Evaluating The Predictive Reliability of Neural Networks in Psychological Research With Random Datasets
IF 2.7 | CAS Tier 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-07-25 | DOI: 10.1177/00131644241262964
Yongtian Cheng, K. V. Petrides
Psychologists are emphasizing the importance of predictive conclusions. Machine learning methods, such as supervised neural networks, have been used in psychological studies as they naturally fit prediction tasks. However, we are concerned about whether neural networks fitted to random datasets (i.e., datasets where there is no relationship between ordinal independent variables and a continuous or binary dependent variable) can provide an acceptable level of predictive performance from a psychologist's perspective. Through a Monte Carlo simulation study, we found that this kind of erroneous conclusion is not likely to be drawn as long as the sample size is larger than 50 when the dependent variable is continuous. However, when the dependent variable is binary, the minimum sample size is 500 when the criterion is balanced accuracy ≥ .6 or balanced accuracy ≥ .65, and 200 when the criterion is balanced accuracy ≥ .7, for a decision error less than .05. In the case where area under the curve (AUC) is used as the metric, a sample size of 100, 200, and 500 is necessary when the minimum acceptable performance level is set at AUC ≥ .7, AUC ≥ .65, and AUC ≥ .6, respectively. The results of this study can be used for sample size planning by psychologists who wish to apply neural networks for a qualitatively reliable conclusion. Further directions and limitations of the study are also discussed.
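One cell of such a Monte Carlo design can be sketched directly: fit a small neural network to ordinal predictors that are unrelated to a random binary outcome, and check the out-of-sample balanced accuracy. The network architecture and sample size here are illustrative choices, not the article's exact design:

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Ordinal predictors with no relationship to a random binary outcome:
# out-of-sample balanced accuracy should hover near the .5 chance level.
rng = np.random.default_rng(0)
n = 500
X = rng.integers(1, 6, size=(n, 5)).astype(float)   # 5-point ordinal items
y = rng.integers(0, 2, size=n)                      # unrelated outcome

Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=500,
                    random_state=0).fit(Xtr, ytr)
bacc = balanced_accuracy_score(yte, clf.predict(Xte))
print(round(bacc, 2))
```

Repeating this over many replications and sample sizes gives the probability that chance data clear a performance threshold such as balanced accuracy ≥ .6, which is the decision-error quantity the article tabulates.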
Citations: 0
Studying Factorial Invariance With Nominal Items: A Note on a Latent Variable Modeling Procedure
IF 2.7 | CAS Tier 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-06-24 | DOI: 10.1177/00131644241256626
Tenko Raykov
A latent variable modeling procedure for studying factorial invariance and differential item functioning for multi-component measuring instruments with nominal items is discussed. The method is based on a multiple testing approach utilizing the false discovery rate concept and likelihood ratio tests. The procedure complements the Revuelta, Franco-Martinez, and Ximenez approach to factorial invariance examination, and permits localization of individual invariance violations. The outlined method does not require the selection of a reference observed variable and is illustrated with empirical data.
本文讨论了一种潜变量建模程序,用于研究具有名义项目的多成分测量工具的因子不变量和差异项目功能。该方法基于利用误发现率概念和似然比检验的多重检验方法。该方法与 Revuelta、Franco-Martinez 和 Ximenez 的因子不变量检验方法相辅相成,并允许对个别不变量违规行为进行定位。所概述的方法无需选择参考观测变量,并用经验数据进行了说明。
{"title":"Studying Factorial Invariance With Nominal Items: A Note on a Latent Variable Modeling Procedure","authors":"Tenko Raykov","doi":"10.1177/00131644241256626","DOIUrl":"https://doi.org/10.1177/00131644241256626","url":null,"abstract":"A latent variable modeling procedure for studying factorial invariance and differential item functioning for multi-component measuring instruments with nominal items is discussed. The method is based on a multiple testing approach utilizing the false discovery rate concept and likelihood ratio tests. The procedure complements the Revuelta, Franco-Martinez, and Ximenez approach to factorial invariance examination, and permits localization of individual invariance violations. The outlined method does not require the selection of a reference observed variable and is illustrated with empirical data.","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2024-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141501613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
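The abstract above pairs likelihood ratio tests with the false discovery rate for localizing invariance violations. A minimal sketch of the Benjamini–Hochberg step-up procedure, the most common way to control FDR over a set of per-item p-values (the abstract does not specify this exact variant, and the p-values in the usage note are made up for illustration):

```python
def benjamini_hochberg(pvalues, q=0.05):
    # Step-up FDR control: sort the p-values, find the largest rank k
    # with p_(k) <= (k / m) * q, and reject the k smallest hypotheses.
    # Returns a boolean list aligned with the input order.
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * q:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject
```

For example, with per-item LR-test p-values [0.01, 0.02, 0.03, 0.5] and q = .05, the thresholds are .0125, .025, .0375, .05, so the three smallest hypotheses are rejected and the fourth item is retained as invariant.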
A Note on Evaluation of Polytomous Item Locations With the Rating Scale Model and Testing Its Fit
IF 2.7 | Psychology (Zone 3) | Q2 Mathematics, Interdisciplinary Applications | Pub Date: 2024-06-24 | DOI: 10.1177/00131644241259026
Tenko Raykov, Martin Pusic
A procedure is outlined for point and interval estimation of location parameters associated with polytomous items, or raters assessing studied subjects or cases, which follow the rating scale model. The method is developed within the framework of latent variable modeling, and is readily applied in empirical research using popular software. The approach permits testing the goodness of fit of this widely used model, which represents a rather parsimonious item response theory model as a means of description and explanation of an analyzed data set. The procedure allows examination of important aspects of the functioning of measuring instruments with polytomous ordinal items, which may also constitute person assessments furnished by teachers, counselors, judges, raters, or clinicians. The described method is illustrated using an empirical example.
{"title":"A Note on Evaluation of Polytomous Item Locations With the Rating Scale Model and Testing Its Fit","authors":"Tenko Raykov, Martin Pusic","doi":"10.1177/00131644241259026","DOIUrl":"https://doi.org/10.1177/00131644241259026","url":null,"abstract":"A procedure is outlined for point and interval estimation of location parameters associated with polytomous items, or raters assessing studied subjects or cases, which follow the rating scale model. The method is developed within the framework of latent variable modeling, and is readily applied in empirical research using popular software. The approach permits testing the goodness of fit of this widely used model, which represents a rather parsimonious item response theory model as a means of description and explanation of an analyzed data set. The procedure allows examination of important aspects of the functioning of measuring instruments with polytomous ordinal items, which may also constitute person assessments furnished by teachers, counselors, judges, raters, or clinicians. The described method is illustrated using an empirical example.","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2024-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141501614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
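In the rating scale model referenced above, each item has a single location parameter and all items share one set of category thresholds. A hedged sketch of the category-probability formula under one common parameterization (software packages differ in sign and centering conventions, so treat this as illustrative rather than the paper's exact setup):

```python
import math


def rsm_probs(theta, delta, taus):
    # Rating scale model category probabilities for one item:
    #   theta - person location, delta - item location,
    #   taus  - thresholds tau_1..tau_m shared across items.
    # Cumulative numerators: psi_0 = 0, psi_k = sum_{j<=k}(theta - delta - tau_j);
    # P(X = k) = exp(psi_k) / sum_k' exp(psi_k').
    psis = [0.0]
    total = 0.0
    for tau in taus:
        total += theta - delta - tau
        psis.append(total)
    exps = [math.exp(p) for p in psis]
    denom = sum(exps)
    return [e / denom for e in exps]
```

With theta equal to the item location and symmetric thresholds (e.g., taus = [-1, 1]), the middle category is the most probable, and raising theta shifts probability mass toward the higher categories, which is the ordering behavior the model's location parameters are meant to capture.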