
Measurement-Interdisciplinary Research and Perspectives: Latest Publications

Application of Network Analysis to Description and Prediction of Assessment Outcomes
IF 1 Q3 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date: 2022-07-03 DOI: 10.1080/15366367.2021.1971024
James J. Thompson
ABSTRACT With the use of computerized testing, ordinary assessments can capture both answer accuracy and answer response time. For the Canadian Programme for the International Assessment of Adult Competencies (PIAAC) numeracy and literacy subtests, person ability, person speed, question difficulty, question time intensity, fluency (rate), person fluency (skill), question fluency (load), pace (rank of response time within question), and person pace were assessed. Undirected Gaussian Graphical Model networks of the measures based on partial correlations were predictive of the measures as nodes. The population-based model extrapolated well to individual person estimations. Finally, it was shown that the “training” Canadian model generalized with minor differences to four other English-speaking PIAAC assessments (USA, Great Britain, Ireland, and New Zealand). Thus, the undirected network approach provides a heuristic that is both descriptive and predictive. However, the model is not causal and can be taken as an example of “mutualism.”
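The network machinery described in this abstract rests on two standard properties of Gaussian graphical models: edge weights are partial correlations, which can be read off the inverse covariance (precision) matrix, and each node can be predicted from its neighbors (the basis of node-level prediction). A minimal sketch of both steps on simulated data follows; the simulated "measures" are placeholders for the person and item indices named in the abstract, not PIAAC data or the author's code.

```python
import numpy as np

# Simulated stand-ins for correlated assessment measures (e.g., ability, speed,
# fluency, pace); these are NOT PIAAC data, just a toy illustration.
rng = np.random.default_rng(0)
n_persons, n_measures = 500, 4
latent = rng.normal(size=(n_persons, 1))
X = latent + 0.8 * rng.normal(size=(n_persons, n_measures))
X -= X.mean(axis=0)

# Partial correlations from the precision matrix: pcor_ij = -P_ij / sqrt(P_ii * P_jj)
P = np.linalg.inv(np.cov(X, rowvar=False))
d = np.sqrt(np.diag(P))
pcor = -P / np.outer(d, d)
np.fill_diagonal(pcor, 1.0)
print("partial correlation (edge weight) matrix:\n", np.round(pcor, 2))

# Node prediction: regress each measure on all of the others and report R^2,
# the usual "predictability" summary for a node in an undirected network.
for j in range(n_measures):
    others = [k for k in range(n_measures) if k != j]
    beta, *_ = np.linalg.lstsq(X[:, others], X[:, j], rcond=None)
    pred = X[:, others] @ beta
    r2 = np.corrcoef(pred, X[:, j])[0, 1] ** 2
    print(f"measure {j}: R^2 from its network neighbors = {r2:.2f}")
```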
Citations: 0
Using Think-aloud Interviews to Examine a Clinically Oriented Performance Assessment Rubric
IF 1 Q3 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date: 2022-07-03 DOI: 10.1080/15366367.2021.1991742
M. Roduta Roberts, Chad M. Gotch, Megan Cook, Karin Werther, I. Chao
ABSTRACT Performance-based assessment is a common approach to assess the development and acquisition of practice competencies among health professions students. Judgments related to the quality of performance are typically operationalized as ratings against success criteria specified within a rubric. The extent to which the rubric is understood, interpreted, and applied by assessors is critical to support valid score interpretations and their subsequent use. Therefore, the purpose of this study was to examine evidence to support a scoring inference related to assessor ratings on a clinically oriented performance-based examination. Think-aloud data showed that rubric dimensions generally informed assessors’ ratings, but specific performance descriptors were rarely invoked. These findings support revisions to the rubric (e.g., less subjective, rating-scale language) and highlight tensions and implications of using rubrics for student evaluation and making decisions in a learning context.
Citations: 0
Rater Connections and the Detection of Bias in Performance Assessment
IF 1 Q3 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date: 2022-04-03 DOI: 10.1080/15366367.2021.1942672
Stefanie A. Wind
ABSTRACT In many performance assessments, one or two raters from the complete rater pool score each performance, resulting in a sparse rating design, where there are limited observations of each rater relative to the complete sample of students. Although sparse rating designs can be constructed to facilitate estimation of student achievement, the relatively limited observations of each rater can pose challenges for identifying raters who may exhibit scoring idiosyncrasies specific to individual examinees or subgroups of examinees, such as differential rater functioning (DRF; i.e., rater bias). In particular, when raters who exhibit DRF are directly connected to other raters who exhibit the same type of DRF, there is limited information with which to detect this effect. On the other hand, if raters who exhibit DRF are connected to raters who do not exhibit DRF, this effect may be more readily detected. In this study, a simulation is used to systematically examine the degree to which the nature of connections among raters who exhibit common DRF patterns in sparse rating designs impacts the sensitivity of DRF indices. The use of additional “monitoring ratings” and variable rater assignment to student performances are considered strategies to improve DRF detection in sparse designs. The results indicate that the nature of connections among DRF raters has a substantial impact on the sensitivity of DRF indices, and that monitoring ratings and variable rater assignment to student performances can improve DRF detection.
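A central manipulation in this abstract is whether raters who share a scoring tendency are directly connected, i.e., whether they score some of the same performances. For any sparse design, that structure can be inspected by building a rater graph with an edge whenever two raters rate the same performance and checking its connected components. The sketch below does this for a simulated double-scored design; the design, rater labels, and random assignment rule are illustrative assumptions, not the article's simulation.

```python
import random
from collections import defaultdict

random.seed(1)
n_raters, n_performances = 12, 200
raters = list(range(n_raters))

# Sparse design: each performance is scored by exactly two randomly chosen raters.
assignments = [random.sample(raters, 2) for _ in range(n_performances)]

# Rater graph: connect two raters whenever they score the same performance.
adjacency = defaultdict(set)
for r1, r2 in assignments:
    adjacency[r1].add(r2)
    adjacency[r2].add(r1)

def connected_components(nodes, adjacency):
    """Find connected components of the rater graph by graph search."""
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, queue = set(), [start]
        while queue:
            node = queue.pop()
            if node in comp:
                continue
            comp.add(node)
            queue.extend(adjacency[node] - comp)
        seen |= comp
        comps.append(comp)
    return comps

comps = connected_components(raters, adjacency)
print(f"{len(comps)} connected component(s) among {n_raters} raters")
# A single component means every rater is linked, directly or indirectly, to
# every other rater -- the kind of connectivity that gives DRF indices more to work with.
```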
Citations: 1
The Comparison of Estimation Methods for the Four-Parameter Logistic Item Response Theory Model
IF 1 Q3 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date: 2022-04-03 DOI: 10.1080/15366367.2021.1897398
Ö. K. Kalkan
ABSTRACT The four-parameter logistic (4PL) Item Response Theory (IRT) model has recently been reconsidered in the literature due to advances in statistical modeling software and recent developments in the estimation of the 4PL IRT model parameters. The current simulation study evaluated the performance of the expectation-maximization (EM), Quasi-Monte Carlo EM (QMCEM), and Metropolis-Hastings Robbins-Monro (MH-RM) estimation methods for the item parameters of the 4PL IRT model under manipulated study conditions, including the number of factors, the correlation between factors, and test length. The results indicated that none of the three estimation algorithms could be recommended as the best for accurately estimating 4PL item parameters across all study conditions. However, the MH-RM algorithm can be suggested for 4PL item parameter estimation when the number of factors is 2 or 3. In addition, longer test lengths (n = 48) may be preferred over shorter ones (n = 24), as all three algorithms provide better item parameter estimates at the longer test length.
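For reference, the 4PL model being estimated here extends the 2PL form with a lower asymptote (guessing) and an upper asymptote below 1 (slipping). The abstract does not spell out a parameterization, so the formula below uses the common a/b/c/d convention:

```latex
P\bigl(X_{ij} = 1 \mid \theta_i\bigr)
  = c_j + \frac{d_j - c_j}{1 + \exp\!\bigl[-a_j(\theta_i - b_j)\bigr]},
  \qquad 0 \le c_j < d_j \le 1 ,
```

where \(a_j\), \(b_j\), \(c_j\), and \(d_j\) are the discrimination, difficulty, lower-asymptote, and upper-asymptote parameters of item \(j\), and \(\theta_i\) is the ability of person \(i\); setting \(d_j = 1\) recovers the 3PL model.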
Citations: 3
Now in JMP® Pro: Structural Equation Modeling
IF 1 Q3 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date: 2022-04-03 DOI: 10.1080/15366367.2022.2094847
{"title":"Now in JMP® Pro: Structual Equation Modeling","authors":"","doi":"10.1080/15366367.2022.2094847","DOIUrl":"https://doi.org/10.1080/15366367.2022.2094847","url":null,"abstract":"","PeriodicalId":46596,"journal":{"name":"Measurement-Interdisciplinary Research and Perspectives","volume":"31 1","pages":"1 - 1"},"PeriodicalIF":1.0,"publicationDate":"2022-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87111499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Sample Size Requirements for Parameter Recovery in the 4-Parameter Logistic Model
IF 1 Q3 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date: 2022-04-03 DOI: 10.1080/15366367.2021.1934805
Ismail Cuhadar
ABSTRACT In practice, some test items may display misfit at the upper asymptote of the item characteristic curve due to distraction, anxiety, or carelessness by the test takers (i.e., the slipping effect). Conventional item response theory (IRT) models do not take the slipping effect into consideration, which may violate the model-fit assumption in IRT. The 4-parameter logistic model (4PLM) includes a parameter for the misfit at the upper asymptote. Although the 4PLM has received more attention from researchers in recent years, there are few studies in the literature on its sample size requirements. The current study investigated the sample size requirements for parameter recovery in the 4PLM with a systematic simulation study design. Results indicated that the item parameters in the 4PLM can be estimated accurately when the sample size is at least 4,000, and the person parameters, excluding the extreme ends of the ability scale, can be estimated accurately under conditions with a sample size of at least 750.
Citations: 1
Using “metaSEM” Package in R
IF 1 Q3 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date: 2022-04-03 DOI: 10.1080/15366367.2021.1991759
C. Hoi, R. Schumacker
ABSTRACT Over the last few decades, researchers have shown increasing interest in synthesizing data using the meta-analysis approach. While this method has been able to provide new insights to the literature with findings drawn from secondary data, scholars in the fields of psychology and methodology have proposed integrating meta-analysis with the structural equation modeling approach. In this vein, the method of meta-analytic structural equation modeling (MASEM) with the two-step structural equation modeling (TSSEM) approach has been developed, implemented in the metaSEM package for R. Since its development in 2015, the metaSEM package, as well as the TSSEM approach, has been continually updated and modified. To promote its use, this study provides a software review of the metaSEM package and its code on the R platform. R code, figures, and initial interpretations of results are provided.
Citations: 0
Handling Extreme Scores in Vertically Scaled Fixed-Length Computerized Adaptive Tests
IF 1 Q3 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date: 2022-01-02 DOI: 10.1080/15366367.2021.1977583
Adam E. Wyse, J. Mcbride
ABSTRACT A common practical challenge is how to assign ability estimates to all-incorrect and all-correct response patterns when using item response theory (IRT) models and maximum likelihood estimation (MLE), since ability estimates for these types of responses equal −∞ or +∞. This article uses a simulation study and data from an operational K–12 computerized adaptive test (CAT) to compare how well several alternatives – including Bayesian maximum a posteriori (MAP) estimators, various MLE-based methods, and assigning constants – work as strategies for computing ability estimates for extreme scores in vertically scaled, fixed-length, Rasch-based CATs. Results suggested that the MLE-based methods, MAP estimators with prior standard deviations of 4 and above, and assigning constants achieved the desired outcomes of producing finite ability estimates for all-correct and all-incorrect responses that were more extreme than the MLE values of students who got one item correct or one item incorrect, as well as being more extreme than the difficulty of the items students saw during the CAT. Additional analyses showed that some methods can differ from the MLE comparison values, or from the b values of the CAT items, in magnitude and variability to different degrees for all-correct versus all-incorrect responses and across grades. Specific discussion is given to how one may select a strategy to assign ability estimates to extreme scores in vertically scaled fixed-length CATs that employ the Rasch model.
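The underlying problem is easy to see with the Rasch model: for an all-correct (or all-incorrect) pattern the log-likelihood is monotone in θ, so the MLE runs off to +∞ (or −∞), while adding a normal prior (a MAP estimator) keeps the posterior mode finite. The sketch below illustrates this with made-up item difficulties and a simple grid search; it is an illustration of the issue, not the operational estimation routine compared in the article.

```python
import numpy as np

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def log_likelihood(theta, responses, b):
    p = rasch_p(theta, b)
    return np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))

b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])   # made-up item difficulties
all_correct = np.ones_like(b)

thetas = np.linspace(-6, 6, 2401)
ll = np.array([log_likelihood(t, all_correct, b) for t in thetas])
print("MLE (grid):", thetas[np.argmax(ll)])   # hits the grid boundary; the true MLE is +infinity

# MAP: add the log of a N(0, sigma^2) prior; the posterior mode is finite.
sigma = 2.0
log_posterior = ll - 0.5 * (thetas / sigma) ** 2
print("MAP (grid):", thetas[np.argmax(log_posterior)])
```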
Citations: 0
On Measuring Adaptivity of an Adaptive Test
IF 1 Q3 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date: 2022-01-02 DOI: 10.1080/15366367.2021.1922232
Zhongmin Cui
ABSTRACT Although many educational and psychological tests are labeled as computerized adaptive tests (CAT), not all tests show the same level of adaptivity – some tests might not adapt much because of various constraints imposed by test developers. Researchers have proposed indices to measure the amount of adaptation in an adaptive test. This article shows some limitations of the existing indices and proposes a new index of adaptivity. Its performance was evaluated in a simulation. The results show that the new index was able to overcome some of the limitations of the existing indices in the simulated scenarios.
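The abstract does not state the proposed index, but a simple way to see what "amount of adaptation" means is to check how closely the difficulties of the items an examinee actually receives track that examinee's ability: a strongly adaptive test administers items near θ, a fixed form does not. The sketch below computes one such correlation-style summary for a simulated Rasch CAT with a "closest difficulty" (maximum-information) selection rule and a crude interim ability update; it illustrates the general idea only and is not the index proposed in the article.

```python
import numpy as np

rng = np.random.default_rng(7)
pool_b = rng.normal(0.0, 1.0, 300)     # item-pool difficulties (simulated)
thetas = rng.normal(0.0, 1.0, 200)     # simulated examinees
test_length = 20

def simulate_cat(theta, pool_b, test_length):
    """Rasch CAT: give the unused item whose difficulty is closest to the
    current interim estimate (the maximum-information rule for the Rasch model)."""
    used = []
    est = 0.0                          # start every examinee at the pool center
    for _ in range(test_length):
        available = [j for j in range(len(pool_b)) if j not in used]
        j = min(available, key=lambda k: abs(pool_b[k] - est))
        used.append(j)
        p = 1.0 / (1.0 + np.exp(-(theta - pool_b[j])))
        response = rng.random() < p
        est += response - p            # crude one-step update along the likelihood gradient
    return pool_b[used]

# Adaptivity summary: correlation between true ability and the mean difficulty of
# the items each examinee was administered (near 1 = strong adaptation,
# near 0 = essentially a fixed form).
mean_b = np.array([simulate_cat(t, pool_b, test_length).mean() for t in thetas])
print("corr(theta, mean administered difficulty):", round(np.corrcoef(thetas, mean_b)[0, 1], 2))
```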
Citations: 0
Psychometrics: An Introduction
IF 1 Q3 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date: 2022-01-02 DOI: 10.1080/15366367.2021.1976089
A. Huggins-Manley
{"title":"Psychometrics: An Introduction","authors":"A. Huggins-Manley","doi":"10.1080/15366367.2021.1976089","DOIUrl":"https://doi.org/10.1080/15366367.2021.1976089","url":null,"abstract":"","PeriodicalId":46596,"journal":{"name":"Measurement-Interdisciplinary Research and Perspectives","volume":"1 1","pages":"47 - 48"},"PeriodicalIF":1.0,"publicationDate":"2022-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85589079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0