
Latest Publications in Educational Measurement: Issues and Practice

Measurement Must Be Qualitative, then Quantitative, then Qualitative Again
IF 2.7 | CAS Tier 4 (Education) | Q1 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2024-11-21 | DOI: 10.1111/emip.12662
Andrew D. Ho

Educational measurement is a social science that requires both qualitative and quantitative competencies. Qualitative competencies in educational measurement include developing and applying theories of learning, designing instruments, and identifying the social, cultural, historical, and political contexts of measurement. Quantitative competencies include statistical inference, computational fluency, and psychometric modeling. I review 12 commentaries authored by past presidents of the National Council on Measurement in Education (NCME) published in a special issue prompting them to reflect on the past, present, and future of educational measurement. I explain how a perspective on both qualitative and quantitative competencies yields common themes across the commentaries. These include the appeal and challenge of personalization, the necessity of contextualization, and the value of communication and collaboration. I conclude that elevation of both qualitative and quantitative competencies underlying educational measurement provides a clearer sense of how NCME can advance its mission, “to advance theory and applications of educational measurement to benefit society.”

Citations: 0
Admission Testing in Higher Education: Changing Landscape and Outcomes from Test-Optional Policies
IF 2.7 | CAS Tier 4 (Education) | Q1 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2024-11-14 | DOI: 10.1111/emip.12651
Wayne Camara

Access to admission tests was greatly restricted during the COVID-19 pandemic, resulting in widespread adoption of test-optional policies by colleges and universities. Many institutions adopted such policies on an interim or trial basis, while many others signaled that the change would be long term. Several Ivy League institutions and selective public flagship universities have returned to requiring test scores from all applicants, citing their own research indicating that diversity and the academic success of applicants are best served by including test scores in the admissions process. This paper reviews recent research on the impact of test-optional policies on applicants' score-sending behaviors and on differential outcomes in college and in score sending. Ultimately, test-optional policies are neither the panacea for diversity that proponents suggested nor do they produce the decay in academic outcomes that opponents forecast, but they do have consequences, which colleges will need to weigh going forward.

Citations: 0
Leading ITEMS: A Retrospective on Progress and Future Goals
IF 2.7 | CAS Tier 4 (Education) | Q1 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2024-11-14 | DOI: 10.1111/emip.12661
Brian C. Leventhal
As this issue marks the conclusion of my tenure as editor of the Instructional Topics in Educational Measurement Series (ITEMS), I take this opportunity to reflect on the progress made during my term and to outline potential future directions for the publication.

First, I extend my gratitude to the National Council on Measurement in Education (NCME) and the publications committee for entrusting me with the role of editor and for their unwavering support of my vision for ITEMS. I am also deeply appreciative of Richard Feinberg, who served as associate editor throughout my tenure, and Zhongmin Cui, editor of Educational Measurement: Issues and Practice (EM:IP), for their invaluable collaboration. Additionally, I thank all the authors who contributed modules and the dedicated readership that has engaged with the content.

ITEMS stands as a distinctive publication, bridging the gap between research and education by offering learning modules on both emerging and established practice in educational measurement. I saw the primary objective of ITEMS as providing accessible learning resources to a diverse audience, including practitioners, students, partners, stakeholders, and the general public. These modules serve various purposes: practitioners may seek to research or expand their skills, students and professors may use them to complement classroom learning, partners and stakeholders may develop foundational knowledge to enhance collaboration with measurement professionals, and the public may gain insights into tests they encounter in their daily lives. Addressing the needs of such a broad audience is challenging, yet it underscores the essential role that ITEMS plays.

When I assumed the role of editor three years ago, ITEMS had recently transitioned from static articles to interactive digital modules. My efforts focused on furthering this transformation by enhancing the engagement of digital publications and streamlining the development process. Although much of this work occurred behind the scenes, the benefits are evident to learners. The modules are now easily accessible on the NCME website, available in both digital and print formats. Newer modules include downloadable videos for offline use or course integration. Content is now accessible across multiple devices, including computers, phones, and tablets. Authors also benefit from the updated development process, which now uses familiar software such as Microsoft PowerPoint or Google Slides. Comprehensive documentation, including timelines, deliverables, and templates, supports authors throughout the development process, allowing them to focus on content creation rather than formatting and logistics.

Reflecting on my tenure, I am proud of the modules published, yet I recognize areas for improvement and future growth. Recruiting authors and maintaining content development posed significant challenges, with some modules remaining incomplete. I am hopeful that the streamlined procedures will alleviate these issues. Additionally, although efforts were made to recruit authors from related disciplines, there remains room for improvement in this area. I envision ITEMS publishing more modules from emerging scholars, both within and beyond the traditional scope of educational measurement. As the field continues to engage with foundational competencies, ITEMS can play a key role in reinforcing, teaching, and extending them. Furthermore, the accessibility of ITEMS should be enhanced by following universal design principles and offering modules in multiple languages, which would broaden the publication's reach and strengthen NCME's leadership in educational measurement. Finally, I advocate for more content on culturally responsive assessment, equitable assessment practices, and assessment for social justice. These methods and frameworks are gaining traction in the field, and ITEMS can make them more accessible to graduate students and practitioners who lack guidance in these areas; although few graduate programs address these topics, emerging scholars are keenly interested in them, and ITEMS can serve as a valuable resource. As I conclude my editorship, I look forward to the continued success and expanding reach of ITEMS.
Citations: 0
An Application of Text Embeddings to Support Alignment of Educational Content Standards
IF 2.7 | CAS Tier 4 (Education) | Q1 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2024-11-14 | DOI: 10.1111/emip.12641
Reese Butterfuss, Harold Doran

Large language models are increasingly used in educational and psychological measurement activities. Their rapidly evolving sophistication and ability to detect language semantics make them viable tools to supplement subject matter experts and their reviews of large amounts of text statements, such as educational content standards. This paper presents an application of text embeddings to find relationships between different sets of educational content standards in a content mapping process. Content mapping is routinely used by state education agencies and is often a requirement of the United States Department of Education peer review process. We discuss the educational measurement problem, propose a formal methodology, demonstrate an application of our proposed approach, and provide measures of its accuracy and potential to support real-world activities.
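The core computation behind embedding-based content mapping is easy to prototype. Below is a minimal sketch using the open-source sentence-transformers library and two invented lists of standards statements; the paper's actual embedding model, preprocessing, and matching rules are not described in this listing, so every detail here is an illustrative assumption rather than the authors' method.

```python
# Minimal sketch: match each state standard to its most similar
# counterpart standard by cosine similarity of text embeddings.
# Model choice and the example standards are assumptions.
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

state_standards = [
    "Determine the main idea of a text and explain how key details support it.",
    "Add and subtract fractions with unlike denominators.",
]
reference_standards = [
    "Identify the central idea of a passage and cite supporting evidence.",
    "Solve word problems involving addition and subtraction of fractions.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder would do
state_vecs = model.encode(state_standards)          # (2, 384) array
reference_vecs = model.encode(reference_standards)  # (2, 384) array

sims = cosine_similarity(state_vecs, reference_vecs)
for i, row in enumerate(sims):
    j = row.argmax()  # best-matching reference standard for state standard i
    print(f"state[{i}] -> reference[{j}] (cosine = {row[j]:.2f})")
```

In practice the similarity matrix only ranks candidate matches; subject matter experts would still vet each proposed pairing, consistent with the paper's framing of embeddings as a supplement to expert review.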

Citations: 0
What Should Psychometricians Know about the History of Testing and Testing Policy?
IF 2.7 | CAS Tier 4 (Education) | Q1 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2024-11-14 | DOI: 10.1111/emip.12650
Lorrie A. Shepard

In 2023, a National Council on Measurement in Education Presidential Task Force developed a consensus framework for foundational competencies in educational measurement to guide graduate programs and subsequent professional development. This article elaborates on the social, cultural, historical, and political context subdomain from that framework. A graduate course on the history of testing and testing policy in the United States is proposed to help measurement professionals develop an understanding of historic belief systems and theories of action that affect every aspect of testing applications—definition of constructs, instrument design, respondents’ interactions, interpretations and use of results, and both intended and unintended consequences. Two accessible key readings are proposed for each of 14 weeks, addressing the following topics: IQ testing and deficit perspectives; special education placements, disproportionality, and accommodations; grade retention and tracking; college admissions testing; standards-based reforms; 1990s performance assessment innovations; NCLB and school accountability; achievement gaps and opportunity to learn; NAEP and international assessments; standard setting and NAEP achievement levels; Common Core State Standards and ESSA; formative assessment and research on learning; culturally responsive assessment.

Citations: 0
In the beginning, there was an item…
IF 2.7 | CAS Tier 4 (Education) | Q1 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2024-11-14 | DOI: 10.1111/emip.12647
Deborah J. Harris, Catherine J. Welch, Stephen B. Dunbar

As educational researchers, we take scored item responses, create data sets to analyze, draw inferences from those analyses, and make decisions about students’ educational knowledge and future success, judge how successful educational programs are, determine what to teach tomorrow, and so on. It is good to remind ourselves that the basis for all our analyses, from simple means to complex multilevel, multidimensional modeling, interpretations of those analyses, and decisions we make based on the analyses are at the core based on a test taker responding to an item. With all the emphasis on modeling, analyses, big data, machine learning, etc., we need to remember it all starts with the items we collect information on. If we get those wrong, then the results of subsequent analyses are unlikely to provide the information we are seeking.

It is true that how students and educators interact with items has changed, and continues to change. More and more of the student-item interactions are happening online, and the days when an educator had relatively easy access to the actual test items, often after test administration, are in the past. This lack of access is also true for the researchers analyzing the response data: instead of a single test booklet aligned to a data file of test taker responses, there are large pools of items, and while the researcher may know a test taker was administered, say, item #SK-65243-0273A and what the response was, they do not know what the text of the item actually was, which can make it challenging to interpret analysis results at times.

From having a test author write the items for an assessment, to contracting with content specialists to draft items, to cloning items from a template, to having large language models/artificial intelligence produce items, item development has morphed over the past and present, and will continue to morph into the future. Item tryouts once pretested the quality and functioning of an item, including gathering data to generate item statistics that aid in forms construction and, in some instances, scoring; researchers now attempt to develop algorithms that can accurately predict item characteristics, including item statistics, without gathering item data in advance of operational use (or at all). We are developing more innovative item types, and collecting more data, such as latencies, click streams, and other process data on student responses to those items.

Sometimes we are so enamored of what we can do with the data that the analyses seem distant from the actual experience: a test taker responding to an item. And this makes it challenging at times to interpret analysis results in terms of actionable steps. Our aim here is to examine the evolution of how items are developed and considered, concentrating on large-scale, K–12 educational assessments.

The Standards for Educational and Psychological Testing (Standards; American Educational Research Association [AERA], American Psychological Association [APA], and National Council on Measurement in Education [NCME], 1966, 1974, 1985, 1999, 2014) have served for decades as guiding principles for test developers and educational measurement specialists. The Standards have evolved over time. They require consensus among three principal organizations, APA, AERA, and NCME, and they incorporate considerations from multiple perspectives. The earliest editions appear to have somewhat neglected the treatment of individual items, concentrating instead on collections of items, that is, test forms. In keeping with the theme of the past, present, and future of educational measurement, we use the five editions of the Standards to examine how attention to items has changed over the years, and we look ahead to the future edition of the Standards now under development and how it may conceptualize issues related to "the item." Our purpose is to focus attention squarely on items, in all their forms, as the basis of our measurement decisions. The questions test takers respond to are central to the data we collect, analyze, and interpret; yet at times the items themselves seem far removed from our focus.

Graduate students in educational measurement typically take a general measurement course early in their training that covers foundational concepts such as reliability and validity and usually includes some treatment of item writing, often addressing multiple-choice items and good and poor item-writing techniques. Constructed-response items and rubrics may also be covered. As students move into more advanced courses, however, the emphasis appears to shift to item statistics: p-values, point-biserials, and IRT parameter estimates. The actual text of an item may never even be presented while "good" statistical properties, item bias, and item fit statistics are discussed, and items are retained in or discarded from an assessment solely on the basis of statistical properties, the item text ignored. As students generate data for simulation studies and grow accustomed to working with 0s and 1s, the distance from the actual items, as opposed to test takers' responses to them, can grow greater still.

The Standards have developed to reflect and address changes in the testing field. With respect to item development processes, one can see an expansion of issues and content as well as a reorganization and repositioning of important areas. This article explores that evolution in the context of item development. In the 1966 edition, topics related to validity and reliability were emphasized and considered essential. In the chapter devoted to validity, issues related to content validity included standards addressing item representativeness in test construction, the role of experts in selecting and reviewing items, and the match between items and test specifications. For large-scale achievement tests, item writing was assumed to be carried out by subject matter experts, who "designed" and selected items judged to cover the topics and processes relevant to the assessment being built. Agreement among independent judgments when experts selected individual items was also emphasized (Standard C3.11). The details of the process item writing should follow, however, were not emphasized.

Similar issues related to item writing were included in the 1974 edition. That edition, however, also emphasized the importance of test fairness, bias, and sensitivity (see Standard E12.1.2). Regarding item writers, it stated that documentation of the qualifications of item writers and editors was needed for achievement tests. In addition, the concept of construct validity and the alignment of test content with theoretical constructs were discussed. The Standards addressed the practice of using experts to judge the appropriateness of items as they relate to the "universe of tasks" the test represents. Documentation of expert qualifications and expert agreement in item selection can be seen as precursors of many of today's item-writing and alignment activities.

The 1985 document introduced a new organizational structure, grouping technical standards under "Technical Standards for Test Construction and Evaluation" and devoting a chapter to test development and revision. That chapter comprised 25 standards addressing test specifications, test development, fairness, item analysis, and design related to intended uses, and it classified each standard as primary, secondary, or conditional. The 1999 edition retained the 1985 structure, grouping technical measurement issues under "Test Construction, Evaluation, and Documentation" and again devoting a chapter to test development and revision. The 27 standards included in 1999 were similar to those of earlier editions but offered a more detailed introduction to test development. The introduction examined different item formats (selected-response, extended-response, combination, and performance items) and discussed their design implications. The significance of federal education law related to assessment and accountability was also discussed in greater detail in that edition.

Reflecting changes in education between 1999 and 2014, such as the No Child Left Behind Act passed in 2001, the most recent edition of the Standards attends more closely to accountability issues associated with educational testing. It elevates fairness to one of three "foundations" chapters, placing it on the same level as validity and reliability. The 2014 chapter on test design and development moved to become one of six chapters under "Operations." Its introduction was expanded and addresses item development and review specifically, articulating content quality, clarity, freedom from construct-irrelevant features, sensitivity, and appropriateness. The 2014 Standards continue to be used by testing organizations to identify and shape the processes and procedures they follow in developing items.

Current procedures used by testing organizations operationalize an item-writing process that begins with the design of the assessment and continues through operational testing. Standard 3.1 states: "Those responsible for test development, revision, and administration should design all steps of the testing process to promote valid score interpretations for intended score uses for the widest possible range of individuals and relevant subgroups in the intended population." In this way, the 2014 Standards incorporate basic concepts of evidence-centered design (ECD) that have underpinned good development practice for decades, and they emphasize that item development is critical to delivering quality and consistency in large-scale K–12 assessment. Most state and local education agencies responsible for large-scale assessment programs use the current Standards to structure their processes and provide the necessary documentation. Although processes vary, most agencies articulate procedures for item development and for item writing and review.

Across the core concepts of the Standards as they have developed over the years, item quality, broadly conceived, can be regarded as foundational to validity, reliability, and fairness. In K–12 assessment, the human element looms large in its contribution to, and evaluation of, item quality. Teachers and other subject matter experts (SMEs) write and edit items, and they serve as panelists who review items drafted for large-scale assessments to evaluate their appropriateness, accessibility, …
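To keep the item statistics mentioned above concrete, the following minimal sketch computes two of the classical quantities named there, the item p-value (proportion correct) and a corrected point-biserial, on a simulated 0/1 response matrix. The data, sample sizes, and Rasch-style generating model are invented for illustration; the sketch only demonstrates the calculations, not any procedure from this editorial.

```python
# Minimal classical item analysis on a simulated persons x items 0/1 matrix.
import numpy as np

rng = np.random.default_rng(1)
n_persons, n_items = 200, 8
theta = rng.normal(0.0, 1.0, (n_persons, 1))  # invented person abilities
b = rng.uniform(-1.0, 1.0, n_items)           # invented item difficulties
prob = 1 / (1 + np.exp(-(theta - b)))         # Rasch-style response probabilities
scores = rng.binomial(1, prob)                # scored responses, persons x items

p_values = scores.mean(axis=0)  # classical difficulty: proportion correct per item
total = scores.sum(axis=1)
for j in range(n_items):
    rest = total - scores[:, j]  # rest score removes the item-total overlap
    r_pb = np.corrcoef(scores[:, j], rest)[0, 1]  # corrected point-biserial
    print(f"item {j}: p = {p_values[j]:.2f}, r_pb = {r_pb:.2f}")
```

Even this small example echoes the editorial's caution: the 0/1 matrix and its summary statistics say nothing about the text of the items that produced them.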
{"title":"In the beginning, there was an item…","authors":"Deborah J. Harris,&nbsp;Catherine J. Welch,&nbsp;Stephen B. Dunbar","doi":"10.1111/emip.12647","DOIUrl":"https://doi.org/10.1111/emip.12647","url":null,"abstract":"<p>As educational researchers, we take scored item responses, create data sets to analyze, draw inferences from those analyses, and make decisions, about students’ educational knowledge and future success, judge how successful educational programs are, determine what to teach tomorrow, and so on. It is good to remind ourselves that the basis for all our analyses, from simple means to complex multilevel, multidimensional modeling, interpretations of those analyses, and decisions we make based on the analyses are at the core based on a test taker responding to an item. With all the emphasis on modeling, analyses, big data, machine learning, etc., we need to remember it all starts with the items we collect information on. If we get those wrong, then the results of subsequent analyses are unlikely to provide the information we are seeking.</p><p>It is true that how students and educators interact with items has changed, and continues to change. More and more of the student-item interactions are happening online, and the days when an educator had relatively easy access to the actual test items, often after test administration, are in the past. This lack of access is also true for the researchers analyzing the response data: instead of a single test booklet aligned to a data file of test taker responses, there are large pools of items, and while the researcher may know a test taker was administered, say, item #SK-65243-0273A and what the response was, they do not know what the text of the item actually was, which can make it challenging to interpret analysis results at times.</p><p>From having a test author write the items for an assessment, to contracting with content specialists to draft items, to cloning items from a template, to having large language models/artificial intelligence produce items, item development has morphed over the past and present, and will continue to morph into the future. Item tryouts for pretesting the quality and functioning of an item, including gathering data for generating item statistics to aid in forms construction and in some instances scoring, now attempt to develop algorithms that can accurately predict item characteristics, including item statistics, without gathering item data in advance of operational use (or at all). We are developing more innovative item types, and collecting more data, such as latencies, click streams, and other process data on student responses to those items.</p><p>Sometimes we are so enamored of what we can do with the data, the analyses seem distant from the actual experience: a test taker responding to an item. And this makes it challenging at times to interpret analysis results in terms of actionable steps. 
Our aim here is to examine the evolution of how items are developed and considered, concentrating on large-scale, K–12 educational assessments.</p><p>The <i>Standards for Educational and Psychological Testing</i> (<i>Standards</i>; American Educational Research Association [AERA], the ","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 4","pages":"40-45"},"PeriodicalIF":2.7,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12647","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143252693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Measurement Invariance for Multilingual Learners Using Item Response and Response Time in PISA 2018
IF 2.7 | CAS Tier 4 (Education) | Q1 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2024-11-10 | DOI: 10.1111/emip.12640
Jung Yeon Park, Sean Joo, Zikun Li, Hyejin Yoon

This study examines potential assessment bias based on students' primary language status in PISA 2018. Specifically, multilingual (MLs) and nonmultilingual (non-MLs) students in the United States are compared with regard to their response time as well as scored responses across three cognitive domains (reading, mathematics, and science). Differential item functioning (DIF) analysis reveals that 7–14% of items exhibit DIF-related problems in scored responses between the two groups, aligning with PISA technical report results. While MLs generally spend more time on the test than non-MLs across cognitive levels, differential response time (DRT) functioning identifies significant time differences in 7–10% of items for students with similar cognitive levels. Notably, items with DIF and DRT issues show limited overlap, suggesting diverse reasons for student struggles in the assessment. A deeper examination of item characteristics is recommended for test developers and teachers to gain a better understanding of these nuances.
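For readers who want to see the mechanics of a DIF screen, the sketch below runs a logistic-regression DIF check on simulated data. Logistic regression is one common DIF method, not necessarily the procedure the authors used, and the group coding, ability proxy, and effect sizes here are all invented for illustration.

```python
# Minimal logistic-regression DIF screen on one simulated item:
# regress item correctness on ability, group, and their interaction.
# A significant group term suggests uniform DIF; a significant
# interaction suggests nonuniform DIF. All numbers are invented.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, n)    # 0 = non-ML, 1 = ML (hypothetical coding)
theta = rng.normal(0.0, 1.0, n)  # ability proxy, e.g., a rest score
logit = 1.2 * theta - 0.3 - 0.5 * group  # item with built-in uniform DIF
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(np.column_stack([theta, group, theta * group]))
fit = sm.Logit(y, X).fit(disp=False)
print(fit.params)   # const, theta, group (uniform DIF), theta*group (nonuniform DIF)
print(fit.pvalues)  # flag the item when the group terms are significant
```

The response-time side could be screened analogously, for example by regressing log response time on the same predictors with ordinary least squares, which mirrors the DRT idea of comparing times for students at similar cognitive levels.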

Citations: 0
You Win Some, You Lose Some
IF 2.7 | CAS Tier 4 (Education) | Q1 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2024-11-10 | DOI: 10.1111/emip.12643
Gregory J. Cizek

In a 1993 EM:IP article, I made six predictions related to measurement policy issues for the approaching millennium. In this article, I evaluate the accuracy of those predictions (Spoiler: I was only modestly accurate) and I proffer a mix of seven contemporary predictions, recommendations, and aspirations regarding assessment generally, NCME as an association, and specific psychometric practices.

Citations: 0
The 2025 EM:IP Cover Graphic/Data Visualization Competition
IF 2.7 | CAS Tier 4 (Education) | Q1 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2024-11-10 | DOI: 10.1111/emip.12658
Yuan-Ling Liaw
{"title":"The 2025 EM:IP Cover Graphic/Data Visualization Competition","authors":"Yuan-Ling Liaw","doi":"10.1111/emip.12658","DOIUrl":"https://doi.org/10.1111/emip.12658","url":null,"abstract":"","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 4","pages":"8"},"PeriodicalIF":2.7,"publicationDate":"2024-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143245274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Introduction to the Special Section on the Past, Present, and Future of Educational Measurement
IF 2.7 | CAS Tier 4 (Education) | Q1 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2024-11-10 | DOI: 10.1111/emip.12660
Zhongmin Cui
{"title":"Introduction to the Special Section on the Past, Present, and Future of Educational Measurement","authors":"Zhongmin Cui","doi":"10.1111/emip.12660","DOIUrl":"https://doi.org/10.1111/emip.12660","url":null,"abstract":"","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 4","pages":"38-39"},"PeriodicalIF":2.7,"publicationDate":"2024-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143252460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0