
Latest Publications from the Journal of Educational Measurement

Validating Performance Standards via Latent Class Analysis
IF 1.3 | CAS Zone 4 (Psychology) | Q3 PSYCHOLOGY, APPLIED | Pub Date: 2022-05-05 | DOI: 10.1111/jedm.12325
Salih Binici, Ismail Cuhadar

Validity of performance standards is a key element in the defensibility of standard-setting results, and validating performance standards requires collecting multiple pieces of evidence at every step of the standard-setting process. This study employs a statistical procedure, latent class analysis, to set performance standards and, for cross-validation, compares the latent class analysis results with performance standards previously established via the modified-Angoff method. The context of the study is an operational large-scale science assessment administered in one of the southern states of the United States. Results show that the number of classes that emerged in the latent class analysis concurs with the number of existing performance levels. In addition, there is a substantial level of agreement between the latent class analysis results and the modified-Angoff method in terms of classifying students into the same performance levels. Overall, the findings establish evidence for the validity of the performance standards identified via the modified-Angoff method. Practical implications of the study findings are discussed.

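The cross-validation logic lends itself to a compact illustration. Below is a minimal sketch, not the authors' code: it fits a latent class model to dichotomous responses with a small EM routine and quantifies agreement between the resulting class memberships and hypothetical modified-Angoff levels using Cohen's kappa. All data, class counts, and variable names are illustrative assumptions.

```python
# Minimal sketch of LCA-based cross-validation of performance standards.
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)

def fit_lca(X, n_classes, n_iter=200, tol=1e-6):
    """EM for a latent class model with Bernoulli items.
    X: (n_persons, n_items) 0/1 response matrix."""
    n, J = X.shape
    pi = np.full(n_classes, 1.0 / n_classes)          # class proportions
    p = rng.uniform(0.2, 0.8, size=(n_classes, J))    # P(correct | class)
    ll_old = -np.inf
    for _ in range(n_iter):
        # E-step: posterior class probabilities per person (log scale)
        logpost = np.log(pi) + X @ np.log(p).T + (1 - X) @ np.log(1 - p).T
        m = logpost.max(axis=1, keepdims=True)
        post = np.exp(logpost - m)
        post /= post.sum(axis=1, keepdims=True)
        ll = (m.ravel() + np.log(np.exp(logpost - m).sum(axis=1))).sum()
        # M-step: update proportions and conditional probabilities
        nk = post.sum(axis=0)
        pi = nk / n
        p = np.clip((post.T @ X) / nk[:, None], 1e-4, 1 - 1e-4)
        if ll - ll_old < tol:
            break
        ll_old = ll
    return pi, p, post

# Hypothetical data: 1,000 examinees in 4 latent classes of increasing ability.
true_class = rng.integers(0, 4, size=1000)
p_true = np.linspace(0.3, 0.9, 4)[true_class]
X = (rng.uniform(size=(1000, 30)) < p_true[:, None]).astype(int)
angoff_level = true_class.copy()   # placeholder for modified-Angoff levels

pi, p, post = fit_lca(X, n_classes=4)
# Order classes by expected score so labels align with performance levels.
order = np.argsort(p.mean(axis=1))
lca_level = order.argsort()[post.argmax(axis=1)]
print("agreement (kappa):", cohen_kappa_score(angoff_level, lca_level))
```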
{"title":"Validating Performance Standards via Latent Class Analysis","authors":"Salih Binici,&nbsp;Ismail Cuhadar","doi":"10.1111/jedm.12325","DOIUrl":"10.1111/jedm.12325","url":null,"abstract":"<p>Validity of performance standards is a key element for the defensibility of standard setting results, and validating performance standards requires collecting multiple pieces of evidence at every step during the standard setting process. This study employs a statistical procedure, latent class analysis, to set performance standards and compares latent class analysis results with previously established performance standards via the modified-Angoff method for cross-validation. The context of the study is an operational large-scale science assessment administered in one of the southern states in the United States. Results show that the number of classes that emerged in the latent class analysis concurs with the number of existing performance levels. In addition, there is a substantial level of agreement between latent class analysis results and modified-Angoff method in terms of classifying students into the same performance levels. Overall, the findings establish evidence for the validity of the performance standards identified via the modified-Angoff method. Practical implications of the study findings are discussed.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"59 4","pages":"502-516"},"PeriodicalIF":1.3,"publicationDate":"2022-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43539035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Score Comparability Issues with At-Home Testing and How to Address Them
IF 1.3 | CAS Zone 4 (Psychology) | Q3 PSYCHOLOGY, APPLIED | Pub Date: 2022-05-04 | DOI: 10.1111/jedm.12324
Gautam Puhan, Sooyeon Kim

As a result of the COVID-19 pandemic, at-home testing has become a popular delivery mode in many testing programs. When programs offer at-home testing to expand their service, score comparability between test takers testing remotely and those testing in a test center is critical. This article summarizes statistical procedures that could be used to evaluate potential mode effects at both the item level and the total-score level. Using operational data from a licensure test, we also compared linking relationships between the test-center and at-home testing groups to determine the reporting score conversion from a subpopulation-invariance perspective.

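As one concrete instance of the kind of item-level procedure the abstract describes, the sketch below computes a standardized difference in item proportion-correct between at-home and test-center examinees, conditioning on total score (in the spirit of Dorans and Kulick's standardization method). It uses simulated placeholder data and an assumed flagging threshold, not the article's actual procedure.

```python
# Hedged sketch: standardized p-difference as an item-level mode-effect check.
import numpy as np

def std_p_diff(item, total, mode):
    """Standardized p-difference for one item.
    item: 0/1 responses; total: total scores; mode: 0 = test center, 1 = at home."""
    diffs, weights = [], []
    for s in np.unique(total):
        at_home = item[(total == s) & (mode == 1)]
        center = item[(total == s) & (mode == 0)]
        if len(at_home) == 0 or len(center) == 0:
            continue
        diffs.append(at_home.mean() - center.mean())
        weights.append(len(at_home))          # weight by at-home group size
    return np.average(diffs, weights=weights)

rng = np.random.default_rng(1)
n = 2000
mode = rng.integers(0, 2, n)                  # simulated delivery mode
items = (rng.uniform(size=(n, 40)) < 0.65).astype(int)
total = items.sum(axis=1)

# |0.10| is a commonly cited flagging threshold for standardized p-differences.
flagged = [j for j in range(items.shape[1])
           if abs(std_p_diff(items[:, j], total, mode)) > 0.10]
print("items flagged for possible mode effects:", flagged)
```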
{"title":"Score Comparability Issues with At-Home Testing and How to Address Them","authors":"Gautam Puhan,&nbsp;Sooyeon Kim","doi":"10.1111/jedm.12324","DOIUrl":"10.1111/jedm.12324","url":null,"abstract":"<p>As a result of the COVID-19 pandemic, at-home testing has become a popular delivery mode in many testing programs. When programs offer at-home testing to expand their service, the score comparability between test takers testing remotely and those testing in a test center is critical. This article summarizes statistical procedures that could be used to evaluate potential mode effects at both the item level and the total score levels. Using operational data from a licensure test, we also compared linking relationships between the test center and at-home testing groups to determine the reporting score conversion from a subpopulation invariance perspective.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"59 2","pages":"161-179"},"PeriodicalIF":1.3,"publicationDate":"2022-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43479468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
The Impact of Cheating on Score Comparability via Pool-Based IRT Pre-equating
IF 1.3 | CAS Zone 4 (Psychology) | Q3 PSYCHOLOGY, APPLIED | Pub Date: 2022-05-01 | DOI: 10.1111/jedm.12321
Jinghua Liu, Kirk Becker

For any testing program that administers multiple forms across multiple years, maintaining score comparability via equating is essential. With continuous testing and high-stakes results, especially with less secure online administrations, testing programs must consider the potential for cheating on their exams. This study used empirical and simulated data to examine the impact of item exposure and prior knowledge on the estimation of item difficulty and test-taker ability via pool-based IRT pre-equating. Raw-to-theta transformations were derived from two groups of test takers, with and without possible prior knowledge of exposed items, and these were compared to a criterion raw-to-theta transformation. Results indicated that item exposure has a large impact on item difficulty, altering the difficulty not only of exposed items but also of unexposed items. Item exposure makes test takers with prior knowledge appear more able. Further, theta estimation bias for test takers without prior knowledge increases when more test takers with possible prior knowledge are in the calibration population. Score inflation occurs for test takers both with and without prior knowledge, especially for those with lower abilities.

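A raw-to-theta transformation of the kind compared in this study can be built by inverting the test characteristic curve (TCC). The sketch below does this under an assumed 3PL model with hypothetical item parameters; it is illustrative only.

```python
# Minimal sketch: raw-to-theta conversion by numerically inverting the TCC.
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(2)
a = rng.uniform(0.8, 2.0, 50)     # discrimination
b = rng.normal(0.0, 1.0, 50)      # difficulty
c = rng.uniform(0.1, 0.25, 50)    # pseudo-guessing

def tcc(theta):
    """Expected raw score at theta under the 3PL."""
    p = c + (1 - c) / (1 + np.exp(-1.7 * a * (theta - b)))
    return p.sum()

def raw_to_theta(raw, lo=-6.0, hi=6.0):
    """Solve TCC(theta) = raw; only defined between sum(c) and n_items."""
    return brentq(lambda t: tcc(t) - raw, lo, hi)

# Raw scores below the chance floor sum(c) have no TCC inverse, so skip them.
conversion = {r: round(raw_to_theta(r), 3)
              for r in range(int(np.ceil(c.sum())) + 1, 50)}
print("theta at raw score 30:", conversion[30])
```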
{"title":"The Impact of Cheating on Score Comparability via Pool-Based IRT Pre-equating","authors":"Jinghua Liu,&nbsp;Kirk Becker","doi":"10.1111/jedm.12321","DOIUrl":"10.1111/jedm.12321","url":null,"abstract":"<p>For any testing programs that administer multiple forms across multiple years, maintaining score comparability via equating is essential. With continuous testing and high-stakes results, especially with less secure online administrations, testing programs must consider the potential for cheating on their exams. This study used empirical and simulated data to examine the impact of item exposure and prior knowledge on the estimation of item difficulty and test taker's ability via pool-based IRT preequating. Raw-to-theta transformations were derived from two groups of test takers with and without possible prior knowledge of exposed items, and these were compared to a criterion raw to theta transformation. Results indicated that item exposure has a large impact on item difficulty, not only altering the difficulty of exposed items, but also altering the difficulty of unexposed items. Item exposure makes test takers with prior knowledge appear more able. Further, theta estimation bias for test takers without prior knowledge increases when more test takers with possible prior knowledge are in the calibration population. Score inflation occurs for test takers with and without prior knowledge, especially for those with lower abilities.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"59 2","pages":"208-230"},"PeriodicalIF":1.3,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46066972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Score Comparability between Online Proctored and In-Person Credentialing Exams
IF 1.3 | CAS Zone 4 (Psychology) | Q3 PSYCHOLOGY, APPLIED | Pub Date: 2022-04-27 | DOI: 10.1111/jedm.12320
Paul Jones, Ye Tong, Jinghua Liu, Joshua Borglum, Vince Primoli

This article studied two methods to detect mode effects in two credentialing exams. In Study 1, we used a “modal scale comparison approach,” in which the same pool of items was calibrated separately, without transformation, within two test-center cohorts (TC1 and TC2) and one online-proctored cohort (OP1) matched on their pool-based scale-score distributions. The calibrations from all three groups were used to score the TC2 cohort, designated the validation sample. The TC1 item parameters and the TC1-based thetas and pass rates were more like the native TC2 values than the OP1-based values, indicating mode effects, but the score and pass/fail decision differences were small. In Study 2, we used a “cross-modal repeater approach,” in which test takers who failed their first attempt in one modality took the test again in either the same or a different modality. The two pairs of repeater groups (TC → TC : TC → OP, and OP → OP : OP → TC) were matched exactly on their first-attempt scores. Results showed increased pass rates and greater score variability in all conditions involving OP, with mode effects noticeable in the TC → OP condition and, less strongly, in the OP → TC condition. Limitations of the study and implications for exam developers are discussed.

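Computationally, the repeater design reduces to exact matching on first-attempt scores followed by a comparison of second-attempt pass rates. The sketch below shows one way to implement that with simulated placeholder data and a simple two-proportion z-test; the study's actual matching and analyses may differ.

```python
# Hedged sketch: exact score matching of repeater groups and pass-rate comparison.
import numpy as np

rng = np.random.default_rng(3)

def match_exact(scores_a, scores_b):
    """Indices of an exact score-matched subsample from each group."""
    idx_a, idx_b = [], []
    pool_b = {s: list(np.flatnonzero(scores_b == s)) for s in np.unique(scores_b)}
    for i, s in enumerate(scores_a):
        if pool_b.get(s):
            idx_a.append(i)
            idx_b.append(pool_b[s].pop())
    return np.array(idx_a), np.array(idx_b)

# Simulated first-attempt (failing) scores and second-attempt pass outcomes.
score_tc = rng.integers(40, 60, 500); pass_tc_tc = rng.uniform(size=500) < .35
score_op = rng.integers(40, 60, 480); pass_tc_op = rng.uniform(size=480) < .42

ia, ib = match_exact(score_tc, score_op)
p1, p2 = pass_tc_tc[ia].mean(), pass_tc_op[ib].mean()
n1, n2 = len(ia), len(ib)
p_pool = (pass_tc_tc[ia].sum() + pass_tc_op[ib].sum()) / (n1 + n2)
z = (p1 - p2) / np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
print(f"TC->TC pass rate {p1:.3f}, TC->OP pass rate {p2:.3f}, z = {z:.2f}")
```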
{"title":"Score Comparability between Online Proctored and In-Person Credentialing Exams","authors":"Paul Jones,&nbsp;Ye Tong,&nbsp;Jinghua Liu,&nbsp;Joshua Borglum,&nbsp;Vince Primoli","doi":"10.1111/jedm.12320","DOIUrl":"10.1111/jedm.12320","url":null,"abstract":"<p>This article studied two methods to detect mode effects in two credentialing exams. In Study 1, we used a “modal scale comparison approach,” where the same pool of items was calibrated separately, without transformation, within two TC cohorts (TC1 and TC2) and one OP cohort (OP1) matched on their pool-based scale score distributions. The calibrations from all three groups were used to score the TC2 cohort, designated the validation sample. The TC1 item parameters and TC1-based thetas and pass rates were more like the native TC2 values than the OP1-based values, indicating mode effects, but the score and pass/fail decision differences were small. In Study 2, we used a “cross-modal repeater approach” in which test takers who failed their first attempt in one modality took the test again in either the same or different modality. The two pairs of repeater groups (TC → TC: TC → OP, and OP → OP: OP → TC) were matched exactly on their first attempt scores. Results showed increased pass rate and greater score variability in all conditions involving OP, with mode effects noticeable in both the TC → OP condition and less-strongly in the OP → TC condition. Limitations of the study and implications for exam developers were discussed.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"59 2","pages":"180-207"},"PeriodicalIF":1.3,"publicationDate":"2022-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43064453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Random Responders in the TIMSS 2015 Student Questionnaire: A Threat to Validity?
IF 1.3 | CAS Zone 4 (Psychology) | Q3 PSYCHOLOGY, APPLIED | Pub Date: 2022-04-26 | DOI: 10.1111/jedm.12317
Saskia van Laar, Johan Braeken

The low-stakes character of international large-scale educational assessments implies that a participating student might at times provide unrelated answers, as if s/he were not even reading the items but choosing a response option at random throughout. Depending on the severity of this invalid response behavior, interpretations of the assessment results are at risk of being invalidated. Little is known about either the prevalence or the impact of such random responders in the context of international large-scale educational assessments. Following a mixture item response theory (IRT) approach, an initial investigation of both issues is conducted for the Confidence in and Value of Mathematics/Science (VoM/VoS) scales in the Trends in International Mathematics and Science Study (TIMSS) 2015 student questionnaire. We end with a call to facilitate further mapping of invalid response behavior in this context by including instructed response items and survey completion-speed indicators in the assessments, and by making sensitivity checks a habit in all secondary data studies.

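The mixture logic behind flagging random responders can be illustrated with Bayes' rule: a respondent is either attentive (responding per the fitted IRT model) or random (uniform over the K response options). The sketch below computes posterior responder probabilities from placeholder model likelihoods; it is an assumed illustration, not the TIMSS mixture-IRT estimation.

```python
# Hedged sketch: posterior probability of being a random responder.
import numpy as np

rng = np.random.default_rng(4)
n, J, K = 1000, 10, 4                 # respondents, items, Likert options

# Placeholder: P(observed response | attentive model), per person and item.
# In a real analysis these come from the fitted mixture IRT model.
p_model = rng.uniform(0.15, 0.7, size=(n, J))
pi_random = 0.05                      # assumed mixing proportion

ll_attentive = np.log(p_model).sum(axis=1)   # per-person log-likelihoods
ll_random = J * np.log(1.0 / K)              # uniform random responding
log_num = np.log(pi_random) + ll_random
log_den = np.logaddexp(log_num, np.log(1 - pi_random) + ll_attentive)
post_random = np.exp(log_num - log_den)      # Bayes' rule on the log scale

print("flagged as likely random responders:", (post_random > 0.5).sum())
```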
{"title":"Random Responders in the TIMSS 2015 Student Questionnaire: A Threat to Validity?","authors":"Saskia van Laar,&nbsp;Johan Braeken","doi":"10.1111/jedm.12317","DOIUrl":"https://doi.org/10.1111/jedm.12317","url":null,"abstract":"<p>The low-stakes character of international large-scale educational assessments implies that a participating student might at times provide unrelated answers as if s/he was not even reading the items and choosing a response option randomly throughout. Depending on the severity of this invalid response behavior, interpretations of the assessment results are at risk of being invalidated. Not much is known about the prevalence nor impact of such <i>random responders</i> in the context of international large-scale educational assessments. Following a mixture item response theory (IRT) approach, an initial investigation of both issues is conducted for the Confidence in and Value of Mathematics/Science (VoM/VoS) scales in the Trends in International Mathematics and Science Study (TIMSS) 2015 student questionnaire. We end with a call to facilitate further mapping of invalid response behavior in this context by the inclusion of instructed response items and survey completion speed indicators in the assessments and a habit of sensitivity checks in all secondary data studies.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"59 4","pages":"470-501"},"PeriodicalIF":1.3,"publicationDate":"2022-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/jedm.12317","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"137552821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Detecting Differential Item Functioning Using Posterior Predictive Model Checking: A Comparison of Discrepancy Statistics
IF 1.3 | CAS Zone 4 (Psychology) | Q3 PSYCHOLOGY, APPLIED | Pub Date: 2022-04-25 | DOI: 10.1111/jedm.12316
Seang-Hwane Joo, Philseok Lee

This study proposes a new Bayesian differential item functioning (DIF) detection method using posterior predictive model checking (PPMC). Item-fit measures, including infit, outfit, the observed score distribution (OSD), and Q1, were considered as discrepancy statistics for the PPMC DIF methods. The performance of the PPMC DIF method was evaluated via a Monte Carlo simulation manipulating sample size, DIF size, DIF type, DIF percentage, and subpopulation trait distribution. Parametric DIF methods, such as Lord's chi-square and Raju's area approaches, were also included in the simulation design to compare the performance of the proposed PPMC DIF methods with that of existing ones. Based on Type I error and power analyses, we found that the PPMC DIF methods showed better-controlled Type I error rates than the existing methods and comparable power to detect uniform DIF. Implications and recommendations for applied researchers are discussed.

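The PPMC loop itself is straightforward: for each posterior draw, compare a discrepancy statistic computed on the observed data with the same statistic computed on data replicated from that draw; the posterior predictive p-value is the proportion of draws in which the replicated discrepancy exceeds the observed one. The sketch below illustrates this for an outfit discrepancy under a Rasch model, with simulated stand-ins for the MCMC draws; the DIF versions in the study compare group-specific discrepancies.

```python
# Hedged sketch: posterior predictive p-values for an outfit discrepancy.
import numpy as np

rng = np.random.default_rng(5)
n, J, D = 500, 20, 200                      # persons, items, posterior draws

theta_true = rng.normal(0, 1, n)
b_true = rng.normal(0, 1, J)
prob = 1 / (1 + np.exp(-(theta_true[:, None] - b_true[None, :])))
X_obs = (rng.uniform(size=(n, J)) < prob).astype(int)

def outfit(X, theta, b):
    p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
    z2 = (X - p) ** 2 / (p * (1 - p))       # squared standardized residuals
    return z2.mean(axis=0)                  # one outfit value per item

ppp = np.zeros(J)
for _ in range(D):
    # stand-in posterior draws (a real run would use MCMC output)
    theta_d = theta_true + rng.normal(0, 0.15, n)
    b_d = b_true + rng.normal(0, 0.08, J)
    p_d = 1 / (1 + np.exp(-(theta_d[:, None] - b_d[None, :])))
    X_rep = (rng.uniform(size=(n, J)) < p_d).astype(int)
    ppp += outfit(X_rep, theta_d, b_d) > outfit(X_obs, theta_d, b_d)
ppp /= D
print("items with extreme ppp-values:", np.where((ppp < .05) | (ppp > .95))[0])
```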
{"title":"Detecting Differential Item Functioning Using Posterior Predictive Model Checking: A Comparison of Discrepancy Statistics","authors":"Seang-Hwane Joo,&nbsp;Philseok Lee","doi":"10.1111/jedm.12316","DOIUrl":"https://doi.org/10.1111/jedm.12316","url":null,"abstract":"<p>This study proposes a new Bayesian differential item functioning (DIF) detection method using posterior predictive model checking (PPMC). Item fit measures including infit, outfit, observed score distribution (OSD), and Q1 were considered as discrepancy statistics for the PPMC DIF methods. The performance of the PPMC DIF method was evaluated via a Monte Carlo simulation manipulating sample size, DIF size, DIF type, DIF percentage, and subpopulation trait distribution. Parametric DIF methods, such as Lord's chi-square and Raju's area approaches, were also included in the simulation design in order to compare the performance of the proposed PPMC DIF methods to those previously existing. Based on Type I error and power analysis, we found that PPMC DIF methods showed better-controlled Type I error rates than the existing methods and comparable power to detect uniform DIF. The implications and recommendations for applied researchers are discussed.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"59 4","pages":"442-469"},"PeriodicalIF":1.3,"publicationDate":"2022-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"137981441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Two IRT Characteristic Curve Linking Methods Weighted by Information
IF 1.3 | CAS Zone 4 (Psychology) | Q3 PSYCHOLOGY, APPLIED | Pub Date: 2022-04-17 | DOI: 10.1111/jedm.12315
Shaojie Wang, Minqiang Zhang, Won-Chan Lee, Feifei Huang, Zonglong Li, Yixing Li, Sufang Yu

Traditional IRT characteristic curve linking methods ignore parameter estimation errors, which may undermine the accuracy of estimated linking constants. Two new linking methods are proposed that take parameter estimation errors into account. The item-information-weighted (IWCC) and test-information-weighted (TWCC) characteristic curve methods weight components of the loss function from the traditional methods by the corresponding item and test information, respectively. A Monte Carlo simulation was conducted to evaluate the performance of the new linking methods and compare them with the traditional ones. Ability differences between linking groups, sample size, and test length were manipulated under the common-item nonequivalent groups design. Results showed that the two information-weighted characteristic curve methods generally outperformed the traditional methods. TWCC was found to be more accurate and stable than IWCC. A pseudo-form, pseudo-group analysis was also performed, and similar results were observed. Finally, guidelines for practice and future directions are discussed.

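A test-information-weighted criterion can be written down directly: take the Stocking-Lord loss and weight each quadrature point by the test information of the common items. The sketch below is an assumed TWCC-style implementation under the 2PL with illustrative parameters, not the authors' exact formulation.

```python
# Hedged sketch: Stocking-Lord linking with test-information weights (TWCC-style).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
a_new = rng.uniform(0.8, 2.0, 20); b_new = rng.normal(0, 1, 20)
A_true, B_true = 1.1, 0.3                     # generating transformation
a_old, b_old = a_new / A_true, A_true * b_new + B_true

theta = np.linspace(-4, 4, 41)                # quadrature points

def p2pl(a, b, t):
    return 1 / (1 + np.exp(-a[None, :] * (t[:, None] - b[None, :])))

def twcc_loss(x):
    A, B = x
    p_old = p2pl(a_old, b_old, theta)                 # old-scale curves
    p_tr = p2pl(a_new / A, A * b_new + B, theta)      # transformed new-scale
    info = (a_old[None, :] ** 2 * p_old * (1 - p_old)).sum(axis=1)
    w = info / info.sum()                             # test-information weights
    return (w * (p_tr.sum(axis=1) - p_old.sum(axis=1)) ** 2).sum()

res = minimize(twcc_loss, x0=[1.0, 0.0], method="Nelder-Mead")
print("estimated A, B:", res.x)               # should recover ~1.1, ~0.3
```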
{"title":"Two IRT Characteristic Curve Linking Methods Weighted by Information","authors":"Shaojie Wang,&nbsp;Minqiang Zhang,&nbsp;Won-Chan Lee,&nbsp;Feifei Huang,&nbsp;Zonglong Li,&nbsp;Yixing Li,&nbsp;Sufang Yu","doi":"10.1111/jedm.12315","DOIUrl":"10.1111/jedm.12315","url":null,"abstract":"<p>Traditional IRT characteristic curve linking methods ignore parameter estimation errors, which may undermine the accuracy of estimated linking constants. Two new linking methods are proposed that take into account parameter estimation errors. The item- (IWCC) and test-information-weighted characteristic curve (TWCC) methods employ weighting components in the loss function from traditional methods by their corresponding item and test information, respectively. Monte Carlo simulation was conducted to evaluate the performances of the new linking methods and compare them with traditional ones. Ability difference between linking groups, sample size, and test length were manipulated under the common-item nonequivalent groups design. Results showed that the two information-weighted characteristic curve methods outperformed traditional methods, in general. TWCC was found to be more accurate and stable than IWCC. A pseudo-form pseudo-group analysis was also performed, and similar results were observed. Finally, guidelines for practice and future directions are discussed.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"59 4","pages":"423-441"},"PeriodicalIF":1.3,"publicationDate":"2022-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48483173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Evaluation of Factors Affecting the Performance of the $S-X^2$ Item-Fit Index
IF 1.3 | CAS Zone 4 (Psychology) | Q3 PSYCHOLOGY, APPLIED | Pub Date: 2022-03-29 | DOI: 10.1111/jedm.12312
Hyungjin Kim, Won‐Chan Lee
{"title":"Evaluation of Factors Affecting the Performance of the S−X2$S-X^{2}$ Item‐Fit Index","authors":"Hyungjin Kim, Won‐Chan Lee","doi":"10.1111/jedm.12312","DOIUrl":"https://doi.org/10.1111/jedm.12312","url":null,"abstract":"","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":" ","pages":""},"PeriodicalIF":1.3,"publicationDate":"2022-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44650878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Residual-Based Differential Item Functioning Detection Framework in Item Response Theory
IF 1.3 | CAS Zone 4 (Psychology) | Q3 PSYCHOLOGY, APPLIED | Pub Date: 2022-03-28 | DOI: 10.1111/jedm.12313
Hwanggyu Lim, Edison M. Choe, K. T. Han
{"title":"A Residual‐Based Differential Item Functioning Detection Framework in Item Response Theory","authors":"Hwanggyu Lim, Edison M. Choe, K. T. Han","doi":"10.1111/jedm.12313","DOIUrl":"https://doi.org/10.1111/jedm.12313","url":null,"abstract":"","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":" ","pages":""},"PeriodicalIF":1.3,"publicationDate":"2022-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45051140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Assessing the Impact of Equating Error on Group Means and Group Mean Differences
IF 1.3 | CAS Zone 4 (Psychology) | Q3 PSYCHOLOGY, APPLIED | Pub Date: 2022-03-16 | DOI: 10.1111/jedm.12311
Dongmei Li
{"title":"Assessing the Impact of Equating Error on Group Means and Group Mean Differences","authors":"Dongmei Li","doi":"10.1111/jedm.12311","DOIUrl":"https://doi.org/10.1111/jedm.12311","url":null,"abstract":"","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":" ","pages":""},"PeriodicalIF":1.3,"publicationDate":"2022-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48509406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0