
Educational Measurement-Issues and Practice: Latest Articles

Digital Module 35: Through-Year Assessment
IF 2; Zone 4 (Education); Q2 Social Sciences; Pub Date: 2024-02-28; DOI: 10.1111/emip.12595
Nathan Dadey, Brian Gong, Yun-Kyung Kim, Edynn Sato

Module Abstract

Through-year assessments are assessments that are administered in multiple parts and at different times over the course of a school year that also produce summative scores that can be used with state accountability systems (Lorié et al., 2021; Dadey & Gong, 2023). These assessments are alternatively known as instructionally embedded, through-course, or periodic assessments. There are a number of possible through-year assessment models, and they have recently been the subject of much policy interest as they have the potential to inform subsequent instruction, be more closely aligned with and responsive to curricula and instruction, provide more proximal measures of learning, and be a more sensitive measure of student progress or growth than typical year-end summative assessments (Clark & Karvonen, 2021; Gong, 2021; NWEA, 2021; Wise, 2011). More research is needed, however, to substantiate these potential uses.

Citations: 0
A Workflow for Minimizing Errors in Template-Based Automated Item-Generation Development
IF 2; Zone 4 (Education); Q2 Social Sciences; Pub Date: 2024-02-12; DOI: 10.1111/emip.12600
Yanyan Fu

The template-based automated item-generation (TAIG) approach that involves template creation, item generation, item selection, field-testing, and evaluation has more steps than the traditional item development method. Consequentially, there is more margin for error in this process, and any template errors can be cascaded to the generated items. Therefore, it is essential to eliminate the source of errors and ensure the quality of the template so items can be problem-free. The article introduces a process to reduce template errors at the early stage of template development, minimize the impact of template errors on generated items, and increase the survival rates of generated items. The article also discusses a statistical method to establish confidence in the quality of the template by systematically examining the quality of the generated items. The proposed method can reduce the review process for some items generated from a template.
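As a rough illustration of why template errors cascade to generated items, the following Python sketch fills one hypothetical template with every combination of slot values. The template text, slot names, and the `generate_items` and `speed_key` helpers are assumptions made for this example and are not taken from the article; the article's workflow and statistical check are not reproduced here.

```python
from itertools import product

# Hypothetical item template; the stem and slot names are illustrative only.
TEMPLATE = "A train travels {distance} km in {hours} hours. What is its average speed in km/h?"

# Allowed values for each slot (assumed for illustration).
SLOTS = {
    "distance": [120, 180, 240],
    "hours": [2, 3],
}


def speed_key(distance, hours):
    """Keying rule for this template: the correct answer for one slot binding."""
    return distance / hours


def generate_items(template, slots, key_fn):
    """Fill every combination of slot values into the template."""
    names = list(slots)
    items = []
    for values in product(*(slots[name] for name in names)):
        binding = dict(zip(names, values))
        items.append({
            "stem": template.format(**binding),
            "key": key_fn(**binding),
        })
    return items


items = generate_items(TEMPLATE, SLOTS, speed_key)
for item in items:
    print(item["stem"], "->", item["key"])
```

Because all six items share one stem and one keying rule, a mistake in either would appear in every generated item, which is why reviewing the template early matters.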

Citations: 0
The University of California Was Wrong to Abolish the SAT: Admissions When Affirmative Action Was Banned
IF 2; Zone 4 (Education); Q2 Social Sciences; Pub Date: 2024-02-09; DOI: 10.1111/emip.12598
Donald Wittman

I study student characteristics and academic performance at the University of California, where consideration of an applicant's ethnicity has been banned since 1996 and SAT scores were used in admitting students to the university until fall 2021. I show the following: (1) SAT scores were more important than high school grades in predicting first-year university GPA; (2) the use of SAT scores alone or with high school grades in determining admission is biased in favor of admitting underrepresented minorities and students who are socioeconomically disadvantaged; (3) SAT scores are more important and high school grades are less important in predicting GPA for underrepresented minorities and/or those students from low-income families than they are for those students who are white and/or from high-income families; and (4) the University of California found ways to admit a significant number of underrepresented minorities despite many of them having low SAT scores.

Citations: 0
An Automated Item Pool Assembly Framework for Maximizing Item Utilization for CAT
IF 2; Zone 4 (Education); Q2 Social Sciences; Pub Date: 2024-02-09; DOI: 10.1111/emip.12589
Hwanggyu Lim, Kyung (Chris) T. Han

Computerized adaptive testing (CAT) has gained deserved popularity in the administration of educational and professional assessments, but continues to face test security challenges. To ensure sustained quality assurance and testing integrity, it is imperative to establish and maintain multiple stable item pools that are consistent in terms of psychometric characteristics and content specifications. This study introduces the Honeycomb Pool Assembly (HPA) framework, an innovative solution for the construction of multiple parallel item pools for CAT that maximizes item utilization in the item bank. The HPA framework comprises two stages—cell assembly and pool assembly—and uses a mixed integer programming modeling approach. An empirical study demonstrated HPA's effectiveness in creating a large number of parallel pools using a real-world high-stakes CAT assessment item bank. The HPA framework offers several advantages, including (a) simultaneous creation of multiple parallel pools, (b) simplification of item pool maintenance, and (c) flexibility in establishing statistical and operational constraints. Moreover, it can help testing organizations efficiently manage and monitor the health of their item banks. Thus, the HPA framework is expected to be a valuable tool for testing professionals and organizations to address test security challenges and maintain the integrity of high-stakes CAT assessments.

Citations: 0
MxML (Exploring the Relationship between Measurement and Machine Learning): Current State of the Field
IF 2; Zone 4 (Education); Q2 Social Sciences; Pub Date: 2024-01-29; DOI: 10.1111/emip.12593
Yi Zheng, Steven Nydick, Sijia Huang, Susu Zhang

The recent surge of machine learning (ML) has impacted many disciplines, including educational and psychological measurement (hereafter shortened as measurement). The measurement literature has seen rapid growth in applications of ML to solve measurement problems. However, as we emphasize in this article, it is imperative to critically examine the potential risks associated with involving ML in measurement. The MxML project aims to explore the relationship between measurement and ML, so as to identify and address the risks and better harness the power of ML to serve measurement missions. This paper describes the first study of the MxML project, in which we summarize the state of the field of applications, extensions, and discussions about ML in measurement contexts with a systematic review of the recent 10 years’ literature. We provide a snapshot of the literature in (1) areas of measurement where ML is discussed, (2) types of articles (e.g., applications, conceptual, etc.), (3) ML methods discussed, and (4) potential risks associated with involving ML in measurement, which result from the differences between what measurement tasks need versus what ML techniques can provide.

Citations: 0
Expected Classification Accuracy for Categorical Growth Models
IF 2; Zone 4 (Education); Q2 Social Sciences; Pub Date: 2024-01-29; DOI: 10.1111/emip.12599
Daniel Murphy, Sarah Quesen, Matthew Brunetti, Quintin Love

Categorical growth models describe examinee growth in terms of performance-level category transitions, which implies that some percentage of examinees will be misclassified. This paper introduces a new procedure for estimating the classification accuracy of categorical growth models, based on Rudner's classification accuracy index for item response theory–based assessments. Results of a simulation study are presented to provide evidence for the accuracy and validity of the approach. Also, an empirical example is presented to demonstrate the approach using data from the Indiana Student Performance Readiness and Observation of Understanding Tool growth model, which classifies examinees into growth categories used by the Office of Special Education Programs to monitor the progress of preschool children who receive special education services.
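For orientation, the sketch below shows a Rudner-style expected classification accuracy index for a single test occasion, which is the starting point the abstract names, not the categorical growth extension itself. It assumes each examinee's true ability is normally distributed around the ability estimate with the reported standard error; the function names, cut scores, and sample values are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def category_of(theta, cuts):
    """Index of the performance level containing theta (cuts sorted ascending)."""
    return sum(theta >= cut for cut in cuts)


def expected_accuracy(theta_hats, standard_errors, cuts):
    """Average probability that true ability lies in the assigned category."""
    bounds = [-math.inf] + list(cuts) + [math.inf]
    total = 0.0
    for theta_hat, se in zip(theta_hats, standard_errors):
        k = category_of(theta_hat, cuts)
        lower, upper = bounds[k], bounds[k + 1]
        total += normal_cdf(upper, theta_hat, se) - normal_cdf(lower, theta_hat, se)
    return total / len(theta_hats)


# Hypothetical estimates, standard errors, and two cut scores (three levels).
accuracy = expected_accuracy([-1.2, 0.1, 1.5], [0.3, 0.3, 0.4], [-0.5, 0.8])
print(round(accuracy, 3))
```

Examinees whose estimates sit close to a cut score contribute probabilities near 0.5, so the index falls as more estimates cluster around the cuts or as standard errors grow.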

Citations: 0
Knowledge Integration in Science Learning: Tracking Students' Knowledge Development and Skill Acquisition with Cognitive Diagnosis Models
IF 2; Zone 4 (Education); Q2 Social Sciences; Pub Date: 2024-01-24; DOI: 10.1111/emip.12592
Xin Xu, Shixiu Ren, Danhui Zhang, Tao Xin

In scientific literacy, knowledge integration (KI) is a scaffolding-based theory to assist students' scientific inquiry learning. To drive students to be self-directed, many courses have been developed based on KI framework. However, few efforts have been made to evaluate the outcome of students' learning under KI instruction. Moreover, finer-grained information has been pursued to better understand students' learning and how it progresses over time. In this article, a normative procedure of building and choosing cognitive diagnosis models (CDMs) and attribute hierarchies was formulated under KI theory. We examined the utility of CDMs for evaluating students' knowledge status in KI learning. The results of the data analysis confirmed an intuitive assumption of the hierarchical structure of KI components. Furthermore, analysis of pre- and posttests using a higher-order, hidden Markov model tracked students' skill acquisition while integrating knowledge. Results showed that students make significant progress after using the web-based inquiry science environment (WISE) platform.

Citations: 0
Measuring Variability in Proctor Decision Making on High-Stakes Assessments: Improving Test Security in the Digital Age
IF 2; Zone 4 (Education); Q2 Social Sciences; Pub Date: 2024-01-24; DOI: 10.1111/emip.12591
William Belzak, J. R. Lockwood, Yigal Attali

Remote proctoring, or monitoring test takers through internet-based, video-recording software, has become critical for maintaining test security on high-stakes assessments. The main role of remote proctors is to make judgments about test takers' behaviors and decide whether these behaviors constitute rule violations. Variability in proctor decision making, or the degree to which humans/proctors make different decisions about the same test-taking behaviors, can be problematic for both test takers and test users (e.g., universities). In this paper, we measure variability in proctor decision making over time on a high-stakes English language proficiency test. Our results show that (1) proctors systematically differ in their decision making and (2) these differences are trait-like (i.e., ranging from lenient to strict), but (3) systematic variability in decisions can be reduced. Based on these findings, we recommend that test security providers conduct regular measurements of proctors’ judgments and take actions to reduce variability in proctor decision making.

Citations: 0
Using OpenAI GPT to Generate Reading Comprehension Items
IF 2; Zone 4 (Education); Q2 Social Sciences; Pub Date: 2024-01-24; DOI: 10.1111/emip.12590
Ayfer Sayin, Mark Gierl

The purpose of this study is to introduce and evaluate a method for generating reading comprehension items using template-based automatic item generation. To begin, we describe a new model for generating reading comprehension items called the text analysis cognitive model assessing inferential skills across different reading passages. Next, the text analysis cognitive model is used to generate reading comprehension items where examinees are required to read a passage and identify the irrelevant sentence. The sentences for the generated passages were created using OpenAI GPT-3.5. Finally, the quality of the generated items was evaluated. The generated items were reviewed by three subject-matter experts. The generated items were also administered to a sample of 1,607 Grade-8 students. The correct options for the generated items produced a similar level of difficulty and yielded strong discrimination power while the incorrect options served as effective distractors. Implications of augmented intelligence for item development are discussed.

Citations: 0
Achievement and Growth on English Language Proficiency and Content Assessments for English Learners in Elementary Grades
IF 2 | Zone 4, Education | Q2 Social Sciences | Pub Date: 2024-01-10 | DOI: 10.1111/emip.12588
Heather M Buzick, Mikyung Kim Wolf, Laura Ballard

English language proficiency (ELP) assessment scores are used by states to make high-stakes decisions related to linguistic support in instruction and assessment for English learner (EL) students and for EL student reclassification. Changes to both academic content standards and ELP academic standards within the last decade have resulted in increased academic rigor and language demands. In this study, we explored the association between EL student performance over time on content (English language arts and mathematics) and ELP assessments, generally finding evidence of positive associations. Modeling the simultaneous association between changes over time in both content and ELP assessment performance contributes empirical evidence about the role of language in ELA and mathematics development and provides contextual information to serve as validity evidence for score inferences for EL students.

Citations: 0