Enacting a Process for Developing Culturally Relevant Classroom Assessments
Pub Date: 2023-05-25 | DOI: 10.1080/08957347.2023.2214652
Eowyn P. O’Dwyer, Jesse R. Sparks, Leslie Nabors Oláh
ABSTRACT A critical aspect of the development of culturally relevant classroom assessments is the design of tasks that affirm students’ racial and ethnic identities and community cultural practices. This paper describes the process we followed to build a shared understanding of what culturally relevant assessments are, to pursue ways of bringing more diverse voices and perspectives into the development process to generate new ideas and further our understanding, and finally to integrate those understandings and findings into the design of scenario-based tasks (ETS Testlets). We describe our engagement with the research literature and with employee-led affinity groups, students, and external consultants. In synthesizing their advice and feedback, we identified five design principles that scenario-based assessment developers can incorporate into their own work. These principles are then applied to the development of a scenario-based assessment task. Finally, we reflect on our process and the challenges we faced to inform future advancements in the field.
{"title":"Enacting a Process for Developing Culturally Relevant Classroom Assessments","authors":"Eowyn P. O’Dwyer, Jesse R. Sparks, Leslie Nabors Oláh","doi":"10.1080/08957347.2023.2214652","DOIUrl":"https://doi.org/10.1080/08957347.2023.2214652","url":null,"abstract":"ABSTRACT A critical aspect of the development of culturally relevant classroom assessments is the design of tasks that affirm students’ racial and ethnic identities and community cultural practices. This paper describes the process we followed to build a shared understanding of what culturally relevant assessments are, to pursue ways of bringing more diverse voices and perspectives into the development process to generate new ideas and further our understanding, and finally to integrate those understandings and findings into the design of scenario-based tasks (ETS Testlets). This paper describes our engagement with research literature and employee-led affinity groups, students, and external consultants. In synthesizing their advice and feedback, we identified five design principles that scenario-based assessment developers can incorporate into their own work. These principles are then applied to the development of a scenario-based assessment task. Finally, we reflect on our process and challenges faced to inform future advancements in the field.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"36 1","pages":"286 - 303"},"PeriodicalIF":1.5,"publicationDate":"2023-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49204043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Applying a Culturally Responsive Pedagogical Framework to Design and Evaluate Classroom Performance-Based Assessments in Hawai‘i
Pub Date: 2023-05-20 | DOI: 10.1080/08957347.2023.2214655
Carla M. Evans
ABSTRACT Previous writings focus on why centering assessment design around students’ cultural, social, and/or linguistic diversity is important and how performance-based assessment can support such aims. This article extends previous work by describing how a culturally responsive classroom assessment framework was created from a culturally responsive education (CRE) pedagogical framework. The goal of the framework was to guide the design and evaluation of curriculum-embedded, classroom performance assessments. Components discussed include: modification of evidence-centered design processes, teacher and/or student adaptation of construct-irrelevant aspects of task prompts, addition of cultural meaningfulness questions to think alouds, and revision of task quality review protocols to promote CRE design features. Future research is needed to explore the limitations of the framework as applied, and the extent to which students perceive that the classroom summative assessments designed in this way do indeed allow them to better show all they know and can do in ways related to their cultural, social, and/or linguistic identities.
{"title":"Applying a Culturally Responsive Pedagogical Framework to Design and Evaluate Classroom Performance-Based Assessments in Hawai‘i","authors":"Carla M. Evans","doi":"10.1080/08957347.2023.2214655","DOIUrl":"https://doi.org/10.1080/08957347.2023.2214655","url":null,"abstract":"ABSTRACT Previous writings focus on why centering assessment design around students’ cultural, social, and/or linguistic diversity is important and how performance-based assessment can support such aims. This article extends previous work by describing how a culturally responsive classroom assessment framework was created from a culturally responsive education (CRE) pedagogical framework. The goal of the framework was to guide the design and evaluation of curriculum-embedded, classroom performance assessments. Components discussed include: modification of evidence-centered design processes, teacher and/or student adaptation of construct irrelevant aspects of task prompts, addition of cultural meaningfulness questions to think alouds, and revision of task quality review protocols to promote CRE design features. Future research is needed to explore the limitations of the framework applied, and the extent to which students perceive the classroom summative assessments designed do indeed allow them to better show all they know and can do in ways related to their cultural, social, and/or linguistic identities.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"36 1","pages":"269 - 285"},"PeriodicalIF":1.5,"publicationDate":"2023-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46027666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Validity and Racial Justice in Educational Assessment
Pub Date: 2023-05-20 | DOI: 10.1080/08957347.2023.2214654
Josh Lederman
Abstract Because validity is central to assessment, until the concept of validity includes concern for racial justice, such matters will be seen as residing outside the “real” work of validation, rendering them powerless to count against the apparent scientific merit of a test. As the definition of validity has evolved, however, it holds great potential to centralize matters like racial (in)justice, positioning them as necessary validity evidence. This article reviews a history of debates over what validity should and shouldn’t encompass; we then look toward the more centralized stances on validity – the book series Standards and Educational Measurement – where we see that test use, and the social impact of test use, have been mounting concerns over the years within these publications. Finally, we explore Kane’s argument-based approach to validation, which I argue could elevate racial justice concerns by centralizing them within the very notion of what makes assessment valid or invalid.
{"title":"Validity and Racial Justice in Educational Assessment","authors":"Josh Lederman","doi":"10.1080/08957347.2023.2214654","DOIUrl":"https://doi.org/10.1080/08957347.2023.2214654","url":null,"abstract":"Abstract Given its centrality to assessment, until the concept of validity includes concern for racial justice, such matters will be seen as residing outside the “real” work of validation, rendering them powerless to count against the apparent scientific merit of the test. As the definition of validity has evolved, however, it holds great potential to centralize matters like racial (in)justice, positioning them as necessary validity evidence. This article reviews a history of debates over what validity should and shouldn’t encompass; we then look toward the more centralized stances on validity – the book series Standards and Educational Measurement – where we see that test use, and the social impact of test use, has been a mounting concern over the years within these publications. Finally, we explore Kane’s argument-based approach to validation, which I argue could impact racial justice concerns by centralizing them within the very notion of what makes assessment valid or invalid.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"36 1","pages":"242 - 254"},"PeriodicalIF":1.5,"publicationDate":"2023-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41535738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
College Admissions and Testing in a Time of Transformational Change
Pub Date: 2023-04-03 | DOI: 10.1080/08957347.2023.2201705
Ross E. Markle
{"title":"College Admissions and Testing in a Time of Transformational Change","authors":"Ross E. Markle","doi":"10.1080/08957347.2023.2201705","DOIUrl":"https://doi.org/10.1080/08957347.2023.2201705","url":null,"abstract":"conversation","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"36 1","pages":"132 - 136"},"PeriodicalIF":1.5,"publicationDate":"2023-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42510663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Keeping Up the PACE: Evaluating Grade 8 Student Achievement Outcomes for New Hampshire’s Innovative Assessment System
Pub Date: 2023-04-03 | DOI: 10.1080/08957347.2023.2201700
Alexandra Lane Perez, Carla M. Evans
ABSTRACT New Hampshire’s Performance Assessment of Competency Education (PACE) innovative assessment system uses student scores from classroom performance assessments as well as other classroom tests for school accountability purposes. One concern is that not having annual state testing may weaken schools’ and teachers’ incentive to teach the full breadth of the state content standards. This study examined the effects of PACE on Grade 8 test scores after 5 years of implementation using propensity score matching followed by hierarchical linear modeling. The results suggest that PACE students perform about the same, on average, in mathematics and ELA as non-PACE students on the state assessment. There was no evidence of differential effects for students who had an individualized education program or who received free or reduced-price lunch (FRL). Findings for this limited sample suggest schools and teachers did not sacrifice the breadth of students’ opportunity to learn the state content standards while piloting a state performance assessment reform.
{"title":"Keeping Up the PACE: Evaluating Grade 8 Student Achievement Outcomes for New Hampshire’s Innovative Assessment System","authors":"Alexandra Lane Perez, Carla M. Evans","doi":"10.1080/08957347.2023.2201700","DOIUrl":"https://doi.org/10.1080/08957347.2023.2201700","url":null,"abstract":"ABSTRACT New Hampshire’s Performance Assessment of Competency Education (PACE) innovative assessment system uses student scores from classroom performance assessments as well as other classroom tests for school accountability purposes. One concern is that not having annual state testing may incentivize schools and teachers away from teaching the breadth of the state content standards. This study examined the effects of PACE on Grade 8 test scores after 5 years of implementation using propensity score matching followed by hierarchical linear modeling. The results suggest that PACE students perform about the same, on average, in mathematics and ELA as non-PACE students on the state assessment. There was no evidence of differential effects for students who had an individualized education program or were granted FRL. Findings for this limited sample suggest schools and teachers did not sacrifice the breadth of students’ opportunity to learn the state content standards while piloting a state performance assessment reform.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"36 1","pages":"137 - 156"},"PeriodicalIF":1.5,"publicationDate":"2023-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48459890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comparing Drift Detection Methods for Accurate Rasch Equating in Different Sample Sizes
Pub Date: 2023-04-03 | DOI: 10.1080/08957347.2023.2201704
Sarah Alahmadi, Andrew T. Jones, Carol L. Barry, Beatriz Ibáñez
ABSTRACT Rasch common-item equating is often used in high-stakes testing to maintain equivalent passing standards across test administrations. If unaddressed, item parameter drift poses a major threat to the accuracy of Rasch common-item equating. We compared the performance of well-established and newly developed drift detection methods in small and large sample sizes, varying the proportion of test items used as anchor (common) items and the proportion of drifted anchors. In the simulated-data study, the most accurate equating was obtained in large-sample conditions with a small-moderate number of drifted anchors using the mINFIT/mOUTFIT methods. However, when any drift was present in small-sample conditions and when a large number of drifted anchors were present in large-sample conditions, all methods performed ineffectively. In the operational-data study, percent-correct standards and failure rates varied across the methods in the large-sample exam but not in the small-sample exam. Different recommendations for high- and low-volume testing programs are provided.
{"title":"Comparing Drift Detection Methods for Accurate Rasch Equating in Different Sample Sizes","authors":"Sarah Alahmadi, Andrew T. Jones, Carol L. Barry, Beatriz Ibáñez","doi":"10.1080/08957347.2023.2201704","DOIUrl":"https://doi.org/10.1080/08957347.2023.2201704","url":null,"abstract":"ABSTRACT Rasch common-item equating is often used in high-stakes testing to maintain equivalent passing standards across test administrations. If unaddressed, item parameter drift poses a major threat to the accuracy of Rasch common-item equating. We compared the performance of well-established and newly developed drift detection methods in small and large sample sizes, varying the proportion of test items used as anchor (common) items and the proportion of drifted anchors. In the simulated-data study, the most accurate equating was obtained in large-sample conditions with a small-moderate number of drifted anchors using the mINFIT/mOUTFIT methods. However, when any drift was present in small-sample conditions and when a large number of drifted anchors were present in large-sample conditions, all methods performed ineffectively. In the operational-data study, percent-correct standards and failure rates varied across the methods in the large-sample exam but not in the small-sample exam. Different recommendations for high- and low-volume testing programs are provided.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"36 1","pages":"157 - 170"},"PeriodicalIF":1.5,"publicationDate":"2023-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42571395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-Group Generalizations of SIBTEST and Crossing-SIBTEST
Pub Date: 2023-04-03 | DOI: 10.1080/08957347.2023.2201703
R. P. Chalmers, Guoguo Zheng
ABSTRACT This article presents generalizations of the SIBTEST and crossing-SIBTEST statistics for differential item functioning (DIF) investigations involving more than two groups. After reviewing the original two-group setup for these statistics, a set of multigroup generalizations that support contrast matrices for joint tests of DIF is presented. To investigate the Type I error and power behavior of these generalizations, a Monte Carlo simulation study was conducted. Results indicated that the proposed generalizations are reasonably effective at recovering their respective population parameter definitions, maintain optimal Type I error control, have suitable power to detect uniform and non-uniform DIF, and in shorter tests are competitive with the generalized logistic regression and generalized Mantel–Haenszel tests for DIF.
{"title":"Multi-Group Generalizations of SIBTEST and Crossing-SIBTEST","authors":"R. P. Chalmers, Guoguo Zheng","doi":"10.1080/08957347.2023.2201703","DOIUrl":"https://doi.org/10.1080/08957347.2023.2201703","url":null,"abstract":"ABSTRACT This article presents generalizations of SIBTEST and crossing-SIBTEST statistics for differential item functioning (DIF) investigations involving more than two groups. After reviewing the original two-group setup for these statistics, a set of multigroup generalizations that support contrast matrices for joint tests of DIF are presented. To investigate the Type I error and power behavior of these generalizations, a Monte Carlo simulation study was then explored. Results indicated that the proposed generalizations are reasonably effective at recovering their respective population parameter definitions, maintain optimal Type I error control, have suitable power to detect uniform and non-uniform DIF, and in shorter tests are competitive with the generalized logistic regression and generalized Mantel–Haenszel tests for DIF.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"36 1","pages":"171 - 191"},"PeriodicalIF":1.5,"publicationDate":"2023-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44337447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tracking Ordinal Development of Skills with a Longitudinal DINA Model with Polytomous Attributes
Pub Date: 2023-04-03 | DOI: 10.1080/08957347.2023.2201702
P. Zhan, Yao-sen Liu, Zhaohui Yu, Yanfang Pan
ABSTRACT Many educational and psychological studies have shown that students generally develop step by step (i.e., ordinal development) toward a specific level. This study proposed a novel longitudinal learning diagnosis model with polytomous attributes to track students’ ordinal development in learning. Using the concept of polytomous attributes in the proposed model, the learning process for a specific skill, from non-mastery to mastery, can be divided into multiple ordinal steps in order to better characterize the learning trajectory. The results of an empirical study conducted to explore the performance of the proposed model indicated that it could adequately diagnose the ordinal development of skills in longitudinal assessments. A simulation study was also conducted to examine the estimation accuracy of general ability and the classification accuracy of attributes under the proposed model in different simulated conditions.
{"title":"Tracking Ordinal Development of Skills with a Longitudinal DINA Model with Polytomous Attributes","authors":"P. Zhan, Yao-sen Liu, Zhaohui Yu, Yanfang Pan","doi":"10.1080/08957347.2023.2201702","DOIUrl":"https://doi.org/10.1080/08957347.2023.2201702","url":null,"abstract":"ABSTRACT Many educational and psychological studies have shown that the development of students is generally step-by-step (i.e. ordinal development) to a specific level. This study proposed a novel longitudinal learning diagnosis model with polytomous attributes to track students’ ordinal development in learning. Using the concept of polytomous attributes in the proposed model, the learning process of a specific skill, from non-mastery to mastery, can be divided into multiple ordinal steps in order to better characterize the learning trajectory. The results of an empirical study conducted to explore the performance of the proposed model indicated that it could adequately diagnose the ordinal development of skills in longitudinal assessments. A simulation study was also conducted to examine the estimation accuracy of general ability and the classification accuracy of attributes of the proposed model in different simulated conditions.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"36 1","pages":"99 - 114"},"PeriodicalIF":1.5,"publicationDate":"2023-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47235911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Measurement Invariance in Relation to First Language: An Evaluation of German Reading and Spelling Tests
Pub Date: 2023-04-03 | DOI: 10.1080/08957347.2023.2201701
L. Visser, Friederike Cartschau, Ariane von Goldammer, Janin Brandenburg, M. Timmerman, M. Hasselhorn, C. Mähler
ABSTRACT The growing number of children in primary schools in Germany who have German as their second language (L2) has raised questions about the fairness of performance assessment. Fair tests are a prerequisite for distinguishing between an L2 learning delay and a specific learning disability. We evaluated five commonly used reading and spelling tests for measurement invariance (MI) as a function of first language (German vs. other). Multi-group confirmatory factor analyses revealed strict MI for the Weingarten Basic Vocabulary Spelling Tests (WRTs) 3+ and 4+ and the Salzburger Reading (SLT) and Spelling (SRT) Tests, suggesting that these instruments are suitable for assessing reading and spelling skills regardless of first language. MI for A Reading Comprehension Test for First to Seventh Graders – 2nd Edition (ELFE II) was only partly strict, with unequal intercepts for the text subscale. We discuss the implications of this finding for assessing the reading performance of children with German as their L2.
{"title":"Measurement Invariance in Relation to First Language: An Evaluation of German Reading and Spelling Tests","authors":"L. Visser, Friederike Cartschau, Ariane von Goldammer, Janin Brandenburg, M. Timmerman, M. Hasselhorn, C. Mähler","doi":"10.1080/08957347.2023.2201701","DOIUrl":"https://doi.org/10.1080/08957347.2023.2201701","url":null,"abstract":"ABSTRACT The growing number of children in primary schools in Germany who have German as their second language (L2) has raised questions about the fairness of performance assessment. Fair tests are a prerequisite for distinguishing between L2 learning delay and a specific learning disability. We evaluated five commonly used reading and spelling tests for measurement invariance (MI) as a function of first language (German vs. other). Multi-group confirmatory factor analyses revealed strict MI for the Weingarten Basic Vocabulary Spelling Tests (WRTs) 3+ and 4+ and the Salzburger Reading (SLT) and Spelling (SRT) Tests, suggesting these instruments are suitable for assessing reading and spelling skills regardless of first language. The MI for A Reading Comprehension Test for First to Seventh Graders – 2nd Edition (ELFE II) was partly strict with unequal intercepts for the text subscale. We discuss the implications of this finding for assessing reading performance of children with L2.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"36 1","pages":"115 - 131"},"PeriodicalIF":1.5,"publicationDate":"2023-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"59806259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Census-Level, Multi-Grade Analysis of the Association Between Testing Time, Breaks, and Achievement
Pub Date: 2023-01-02 | DOI: 10.1080/08957347.2023.2172019
David Rutkowski, Leslie Rutkowski, Dubravka Svetina Valdivia, Yusuf Canbolat, Stephanie Underhill
ABSTRACT Several states in the US have removed time limits on their state assessments. In Indiana, where this study takes place, the state assessment is untimed during the testing window and allows unlimited breaks during the testing session. Using grade 3 and 8 math and English state assessment data, we focus on the time used for testing and examine whether students who take more time tend to outperform their peers. We also examine whether the number of breaks students take is associated with student achievement scores. Findings suggest that even in an untimed setting, there remains a strong association between time spent on the assessment and achievement at both the student and school level. The number of breaks, on the other hand, shows little to no association with achievement after controlling for time. The paper concludes with a discussion of the policy implications of the findings.
{"title":"A Census-Level, Multi-Grade Analysis of the Association Between Testing Time, Breaks, and Achievement","authors":"david. rutkowski, Leslie Rutkowski, Dubravka Svetina Valdivia, Yusuf Canbolat, Stephanie Underhill","doi":"10.1080/08957347.2023.2172019","DOIUrl":"https://doi.org/10.1080/08957347.2023.2172019","url":null,"abstract":"ABSTRACT Several states in the US have removed time limits on their state assessments. In Indiana, where this study takes place, the state assessment is both untimed during the testing window and allows unlimited breaks during the testing session. Using grade 3 and 8 math and English state assessment data, in this paper we focus on time used for testing and examine whether students who take more time tend to outperform their peers. Further, we also examine if the number of breaks students take is associated with student achievement scores. Findings suggest that even in an untimed setting, there remains a strong association between time spent on the assessment and achievement at both the student and school level. The number of breaks, on the other hand, show little to no association with achievement after controlling for time. The paper concludes with a discussion of the policy implications of the findings.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"36 1","pages":"14 - 30"},"PeriodicalIF":1.5,"publicationDate":"2023-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41397365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}