
Latest Articles in Educational Measurement: Issues and Practice

Classroom Assessment Validation: Proficiency Claims and Uses
IF 1.9 · CAS Tier 4, Education · Q1 EDUCATION & EDUCATIONAL RESEARCH · Pub Date: 2026-02-04 · DOI: 10.1111/emip.70014
James H. McMillan

Unlike standardized testing applications of validity, teachers need a simple and efficient way to reflect on the accuracy of claims based on student performance and then to consider whether the uses of those claims are appropriate. A two-phase reasoning process of validation, consisting of a proficiency claim/argument and a use/argument, is presented as a way for teachers to understand and apply the central tenets of validation to their classroom assessments. Since classroom assessment is contextualized and serves multiple purposes, each teacher is obligated to apply validation to their own situation. The accuracy of teachers’ conclusions about proficiency claims and uses will depend on their skill in gathering supportive evidence and considering alternative explanations. Examples of the proposed classroom assessment validation process are presented.

Educational Measurement: Issues and Practice, 45(1). Citations: 0
AI-Generated Essays: Characteristics and Implications on Automated Scoring and Academic Integrity
IF 1.9 · CAS Tier 4, Education · Q1 EDUCATION & EDUCATIONAL RESEARCH · Pub Date: 2026-01-29 · DOI: 10.1111/emip.70013
Yang Zhong, Jiangang Hao, Michael Fauss, Chen Li, Yuan Wang

The rapid advancement of large language models (LLMs) has enabled the generation of coherent essays, making AI-assisted writing increasingly common in educational and professional settings. Using large-scale empirical data, we examine and benchmark the characteristics and quality of essays generated by popular LLMs and discuss their implications for two key components of writing assessments: automated scoring and academic integrity. Our findings highlight limitations in existing automated scoring systems when applied to essays generated or heavily influenced by AI, and identify areas for improvement, including the development of new features to capture deeper thinking and recalibrating feature weights. Despite growing concerns that the increasing variety of LLMs may undermine the feasibility of detecting AI-generated essays, our results show that detectors trained on essays generated from one model can often identify texts from others with high accuracy, suggesting that effective detection could remain manageable in practice.
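The cross-model detection finding can be illustrated with a toy sketch: build a detector from human essays and essays generated by one source model, then apply it to text from a different model. This is not the authors' method; the word-frequency log-odds detector and all texts below are invented for illustration, and real experiments would use large corpora and far richer features.

```python
# Toy illustration (not the paper's pipeline): classify text as human- vs. AI-like
# by summing per-word log-odds estimated from tiny, made-up training corpora.
import math
from collections import Counter

def word_counts(texts):
    c = Counter()
    for t in texts:
        c.update(t.lower().split())
    return c

# Hypothetical training data: human essays vs. essays from one LLM ("model A").
human_train = [
    "i think summer is fun because we swim and laugh a lot",
    "my grandmother tells stories that make me laugh",
]
ai_train_model_a = [
    "in conclusion the multifaceted implications warrant further consideration",
    "moreover it is imperative to acknowledge the salient underlying factors",
]

h_counts, a_counts = word_counts(human_train), word_counts(ai_train_model_a)
h_total, a_total = sum(h_counts.values()), sum(a_counts.values())
vocab = set(h_counts) | set(a_counts)

def ai_score(text):
    """Sum of Laplace-smoothed per-word log-odds; positive => AI-like style."""
    s = 0.0
    for w in text.lower().split():
        p_ai = (a_counts[w] + 1) / (a_total + len(vocab))
        p_h = (h_counts[w] + 1) / (h_total + len(vocab))
        s += math.log(p_ai / p_h)
    return s

# Text from a different "model B" that the detector never saw during training:
model_b_text = "furthermore the aforementioned considerations warrant careful consideration"
print(ai_score(model_b_text) > 0)  # prints True: shared stylistic cues still flag it
```

The point of the sketch is the abstract's observation: stylistic regularities shared across LLMs can let a detector trained on one model's output transfer to another's.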

Educational Measurement: Issues and Practice, 45(1). Citations: 0
Issue Cover
IF 1.9 · CAS Tier 4, Education · Q1 EDUCATION & EDUCATIONAL RESEARCH · Pub Date: 2025-09-08 · DOI: 10.1111/emip.70005
Educational Measurement: Issues and Practice, 44(3). Citations: 0
Automated Scoring in Learning Progression-Based Assessment: A Comparison of Researcher and Machine Interpretations
IF 1.9 · CAS Tier 4, Education · Q1 EDUCATION & EDUCATIONAL RESEARCH · Pub Date: 2025-08-13 · DOI: 10.1111/emip.70003
Hui Jin, Cynthia Lima, Limin Wang

Although AI transformer models have demonstrated notable capability in automated scoring, it is difficult to examine how and why these models fall short in scoring some responses. This study investigated how transformer models’ language processing and quantification processes can be leveraged to enhance the accuracy of automated scoring. Automated scoring was applied to five science items. Results indicate that including item descriptions prior to student responses provides additional contextual information to the transformer model, allowing it to generate automated scoring models with improved performance. These automated scoring models achieved scoring accuracy comparable to human raters. However, they struggle to evaluate responses that contain complex scientific terminology and to interpret responses that contain unusual symbols, atypical language errors, or logical inconsistencies. These findings underscore the importance of the efforts from both researchers and teachers in advancing the accuracy, fairness, and effectiveness of automated scoring.
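The claim of "scoring accuracy comparable to human raters" is commonly quantified in automated-scoring work with quadratic weighted kappa (QWK), which measures agreement on ordinal rubrics beyond chance. The abstract does not name the paper's metric; the sketch below is a generic, self-contained QWK implementation with made-up scores.

```python
# Quadratic weighted kappa (QWK): chance-corrected agreement between two raters
# on an ordinal scale, with disagreements penalized by squared distance.
from collections import Counter

def quadratic_weighted_kappa(human, machine, min_score, max_score):
    """QWK in [-1, 1]: 1 = perfect agreement, 0 = chance-level agreement."""
    n_cats = max_score - min_score + 1
    n = len(human)
    # Observed confusion counts and per-rater score-category marginals.
    obs = Counter((h - min_score, m - min_score) for h, m in zip(human, machine))
    hist_h = Counter(h - min_score for h in human)
    hist_m = Counter(m - min_score for m in machine)
    num = den = 0.0
    for i in range(n_cats):
        for j in range(n_cats):
            w = (i - j) ** 2 / (n_cats - 1) ** 2   # quadratic disagreement weight
            num += w * obs[(i, j)]                  # observed weighted disagreement
            den += w * hist_h[i] * hist_m[j] / n    # expected count under chance
    return 1.0 - num / den

# Hypothetical human and machine scores on a 1-4 rubric (illustrative only).
human_scores   = [1, 2, 3, 4, 2, 3, 3, 1]
machine_scores = [1, 2, 3, 3, 2, 3, 4, 2]
print(round(quadratic_weighted_kappa(human_scores, machine_scores, 1, 4), 3))  # prints 0.786
```

Identical score vectors give QWK = 1; values in the high 0.7s to 0.8s are often read as human-comparable agreement.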

Educational Measurement: Issues and Practice, 44(3), 25-37. Citations: 0
Exploring the Effect of Human Error When Using Expert Judgments to Train an Automated Scoring System
IF 1.9 · CAS Tier 4, Education · Q1 EDUCATION & EDUCATIONAL RESEARCH · Pub Date: 2025-08-07 · DOI: 10.1111/emip.70002
Stephanie Iaccarino, Brian E. Clauser, Polina Harik, Peter Baldwin, Yiyun Zhou, Michael T. Kane
Educational Measurement: Issues and Practice, 44(3), 15-24. Citations: 0
Digital Module 39: Introduction to Generalizability Theory
IF 1.9 · CAS Tier 4, Education · Q1 EDUCATION & EDUCATIONAL RESEARCH · Pub Date: 2025-08-03 · DOI: 10.1111/emip.70001
Won-Chan Lee, Stella Y. Kim, Qiao Liu, Seungwon Shin

Module Abstract

Generalizability theory (GT) is a widely used framework in the social and behavioral sciences for assessing the reliability of measurements. Unlike classical test theory, which treats measurement error as a single undifferentiated term, GT enables the decomposition of error into multiple distinct components. This module introduces the core principles and applications of GT, with a focus on the univariate framework. The first four sections cover foundational concepts, including key terminology, common design structures, and the estimation of variance components. The final two sections offer hands-on examples using real data, implemented in R and GENOVA software. By the end of the module, participants will have a solid understanding of GT and the ability to conduct basic GT analyses using statistical software.
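The module's hands-on examples use R and GENOVA; as a hedged illustration of the decomposition GT performs, here is a minimal Python sketch of a fully crossed person-by-item (p × i) G-study: variance components are estimated from two-way ANOVA mean squares, and a generalizability coefficient for relative decisions is formed. The 4 × 3 score matrix is invented for illustration.

```python
# Univariate p x i G-study sketch: decompose score variance into person,
# item, and residual (interaction + error) components.
scores = [  # rows = persons (p), columns = items (i); invented toy data
    [7, 6, 8],
    [5, 4, 6],
    [9, 8, 9],
    [4, 5, 5],
]
n_p, n_i = len(scores), len(scores[0])
grand = sum(sum(row) for row in scores) / (n_p * n_i)
p_means = [sum(row) / n_i for row in scores]
i_means = [sum(scores[p][i] for p in range(n_p)) / n_p for i in range(n_i)]

# Two-way ANOVA mean squares for the fully crossed design.
ms_p = n_i * sum((m - grand) ** 2 for m in p_means) / (n_p - 1)
ms_i = n_p * sum((m - grand) ** 2 for m in i_means) / (n_i - 1)
ms_res = sum(
    (scores[p][i] - p_means[p] - i_means[i] + grand) ** 2
    for p in range(n_p) for i in range(n_i)
) / ((n_p - 1) * (n_i - 1))

# Estimated variance components (negative estimates truncated at zero).
var_res = ms_res                           # p x i interaction confounded with error
var_p = max((ms_p - ms_res) / n_i, 0.0)    # universe-score (person) variance
var_i = max((ms_i - ms_res) / n_p, 0.0)    # item-difficulty variance

# Generalizability coefficient for relative decisions based on n_i items.
g_coef = var_p / (var_p + var_res / n_i)
print(round(g_coef, 3))  # prints 0.965 for this toy matrix
```

Note how the single "error" of classical test theory splits here into item and interaction/residual terms, which is exactly the contrast the abstract draws.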

Educational Measurement: Issues and Practice, 44(3), 38-39. Citations: 0
On the Cover: Illustrating Collusion Networks with Graph Theory
IF 1.9 · CAS Tier 4, Education · Q1 EDUCATION & EDUCATIONAL RESEARCH · Pub Date: 2025-07-27 · DOI: 10.1111/emip.70000
Yuan-Ling Liaw
Educational Measurement: Issues and Practice, 44(3). Citations: 0
A Case for Reimagining Universal Design of Assessment Systems
IF 1.9 · CAS Tier 4, Education · Q1 EDUCATION & EDUCATIONAL RESEARCH · Pub Date: 2025-07-20 · DOI: 10.1111/emip.12674
Cara Cahalan Laitusis, Meagan Karvonen

The 2014 Standards for Educational and Psychological Testing describe universal design as an approach that offers promise for improving the fairness of educational assessments. As the field reconsiders questions of fairness in assessments, we propose a new framework that addresses the entire assessment lifecycle: universal design of assessment systems. This framework is rooted in the original Universal Design principles but extends beyond test design and administration to the entire assessment lifecycle, from construct definition to score interpretation and use. Another core tenet within this framework is the integration of psychological theory on universal human needs for autonomy, competence, and relatedness with flexibility based on our contemporary understandings of neurodiversity, culture, and multilingualism. Finally, the framework integrates the original Universal Design principle of tolerance for error, which promotes assessment designs that anticipate unintended actions and mitigate potential harms. After describing how the principles and guidelines might apply in contexts ranging from classroom assessments to statewide assessments and licensure exams, we conclude with practical implications and next steps. We hope future versions of the Standards for Educational and Psychological Testing incorporate this broader, systems-wide approach to universal design.

Educational Measurement: Issues and Practice, 44(3), 5-14. Citations: 0
Issue Cover
IF 2.7 · CAS Tier 4, Education · Q1 EDUCATION & EDUCATIONAL RESEARCH · Pub Date: 2025-05-23 · DOI: 10.1111/emip.12614
Educational Measurement: Issues and Practice, 44(2). Citations: 0
On the Cover: Sequential Progression and Item Review in Timed Tests: Patterns in Process Data
IF 2.7 · CAS Tier 4, Education · Q1 EDUCATION & EDUCATIONAL RESEARCH · Pub Date: 2025-05-20 · DOI: 10.1111/emip.12670
Yuan-Ling Liaw
We are excited to announce the winners of the 12th EM:IP Cover Graphic/Data Visualization Competition. Each year, we invite our readers to submit visualizations that are not only accurate and insightful but also visually compelling and easy to understand. This year's submissions explored key topics in educational measurement, including process data, item characteristics, test design, and score interpretation. We extend our sincere thanks to everyone who submitted their work, and we are especially grateful to the EM:IP editorial board for their thoughtful review and feedback in the selection process.

Winning entries may be featured on the cover of a future EM:IP issue. Previous winners who have not yet appeared on a cover remain eligible for upcoming issues.

This issue's cover features "Sequential Progression and Item Review in Timed Tests: Patterns in Process Data," a compelling visualization created by Christian Meyer from the Association of American Medical Colleges and the University of Maryland, along with Ying Jin and Marc Kroopnick, both from the Association of American Medical Colleges.

The visualization, developed using R, presents smoothed density plots derived from process data collected during a high-stakes admissions test. It illustrates how examinees navigated one section of the test within a 95-minute time limit. The x-axis represents elapsed time in minutes. The y-axis segments item positions into five groups: 1 to 15, 16 to 25, 26 to 35, 36 to 45, and 46 to 59. Meyer and his colleagues explain that, for each item group, the height of the plot indicates density. The supports of the estimated densities extend beyond the start and end of the test to allow the plots to approach zero smoothly at the extremes.

Color is used effectively to distinguish between initial engagement and item review. Blue areas indicate when items were first viewed, while red areas show when examinees revisited those same items. The authors describe, "The figure illustrates a common test-taking strategy: examinees initially progress sequentially through the test, as shown by the early blue density peaks for each group. Toward the end of the session, they frequently revisit earlier items, as evidenced by the red peaks clustering near the time limit." This pattern reflects deliberate time management, with examinees dividing their approach into two distinct phases.

They continue, "In the first phase, they assess each item, either attempting a response or skipping it for later review. In the second phase, they revisit skipped or uncertain items, providing more considered answers when time permits or resorting to random guessing if necessary."

According to Meyer and his colleagues, the visualization offers valuable insight into examinees' time management and engagement strategies during timed tests. They conclude, "It captures temporal strategies, such as sequential progression and end-of-session review, offering valuable insight into how examinees interact with the test's structure and constraints." Although the figure does not include information about individual items and is limited to item-position ranges, it demonstrates how temporal behavioral data can be represented in an accessible and interpretable format. Its clarity and design make it a useful tool for communicating test-taking patterns through process data.

If you are interested in learning more about this data visualization, please contact Christian Meyer at [email protected]. We also invite you to enter the annual EM:IP Cover Graphic/Data Visualization Competition; details are available on the NCME and journal websites, and your entry may appear on the cover of a future issue. For questions or feedback, please contact Yuan-Ling Liaw at [email protected].
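The two-phase pattern the authors describe (early first views, late revisits) can be sketched with kernel density estimates over event timestamps. The data below are simulated stand-ins for one item group; the original figure was produced in R from real process data.

```python
# Sketch of the cover graphic's idea: smoothed densities of first-view vs.
# revisit times within a 95-minute section, peaking early and late respectively.
import numpy as np

def kde(samples, grid, bandwidth=2.0):
    """Gaussian kernel density estimate on a grid (relative shape is what matters)."""
    diffs = grid[:, None] - np.asarray(samples)[None, :]
    return np.exp(-0.5 * (diffs / bandwidth) ** 2).mean(axis=1)

rng = np.random.default_rng(0)
TIME_LIMIT = 95  # minutes, matching the described test section

# Simulated event times for one item group (e.g., items 1-15):
first_views = rng.normal(loc=10, scale=3, size=200)  # early "blue" phase
revisits = rng.normal(loc=90, scale=3, size=80)      # late "red" phase near the limit

grid = np.linspace(0, TIME_LIMIT, 500)
peak_first = grid[kde(first_views, grid).argmax()]
peak_revisit = grid[kde(revisits, grid).argmax()]
print(f"first-view peak ~{peak_first:.0f} min, revisit peak ~{peak_revisit:.0f} min")
```

Plotting one such blue/red density pair per item-position group, stacked along the y-axis, reproduces the layout the authors describe.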
Educational Measurement: Issues and Practice, 44(2). Citations: 0