
Latest Publications in Applied Measurement in Education

Enacting a Process for Developing Culturally Relevant Classroom Assessments
IF 1.5, CAS Q4 (Education), Q3 EDUCATION & EDUCATIONAL RESEARCH, Pub Date: 2023-05-25, DOI: 10.1080/08957347.2023.2214652
Eowyn P. O’Dwyer, Jesse R. Sparks, Leslie Nabors Oláh
ABSTRACT A critical aspect of the development of culturally relevant classroom assessments is the design of tasks that affirm students’ racial and ethnic identities and community cultural practices. This paper describes the process we followed to build a shared understanding of what culturally relevant assessments are, to pursue ways of bringing more diverse voices and perspectives into the development process to generate new ideas and further our understanding, and finally to integrate those understandings and findings into the design of scenario-based tasks (ETS Testlets). This paper describes our engagement with research literature and employee-led affinity groups, students, and external consultants. In synthesizing their advice and feedback, we identified five design principles that scenario-based assessment developers can incorporate into their own work. These principles are then applied to the development of a scenario-based assessment task. Finally, we reflect on our process and challenges faced to inform future advancements in the field.
Citations: 2
Applying a Culturally Responsive Pedagogical Framework to Design and Evaluate Classroom Performance-Based Assessments in Hawai‘i
IF 1.5, CAS Q4 (Education), Q3 EDUCATION & EDUCATIONAL RESEARCH, Pub Date: 2023-05-20, DOI: 10.1080/08957347.2023.2214655
Carla M. Evans
ABSTRACT Previous writings focus on why centering assessment design around students’ cultural, social, and/or linguistic diversity is important and how performance-based assessment can support such aims. This article extends previous work by describing how a culturally responsive classroom assessment framework was created from a culturally responsive education (CRE) pedagogical framework. The goal of the framework was to guide the design and evaluation of curriculum-embedded, classroom performance assessments. Components discussed include: modification of evidence-centered design processes, teacher and/or student adaptation of construct irrelevant aspects of task prompts, addition of cultural meaningfulness questions to think alouds, and revision of task quality review protocols to promote CRE design features. Future research is needed to explore the limitations of the framework applied, and the extent to which students perceive the classroom summative assessments designed do indeed allow them to better show all they know and can do in ways related to their cultural, social, and/or linguistic identities.
Citations: 0
Validity and Racial Justice in Educational Assessment
IF 1.5, CAS Q4 (Education), Q3 EDUCATION & EDUCATIONAL RESEARCH, Pub Date: 2023-05-20, DOI: 10.1080/08957347.2023.2214654
Josh Lederman
Abstract Given its centrality to assessment, until the concept of validity includes concern for racial justice, such matters will be seen as residing outside the “real” work of validation, rendering them powerless to count against the apparent scientific merit of the test. As the definition of validity has evolved, however, it holds great potential to centralize matters like racial (in)justice, positioning them as necessary validity evidence. This article reviews a history of debates over what validity should and shouldn’t encompass; we then look toward the more centralized stances on validity – the book series Standards and Educational Measurement – where we see that test use, and the social impact of test use, has been a mounting concern over the years within these publications. Finally, we explore Kane’s argument-based approach to validation, which I argue could impact racial justice concerns by centralizing them within the very notion of what makes assessment valid or invalid.
Citations: 1
College Admissions and Testing in a Time of Transformational Change
IF 1.5, CAS Q4 (Education), Q3 EDUCATION & EDUCATIONAL RESEARCH, Pub Date: 2023-04-03, DOI: 10.1080/08957347.2023.2201705
Ross E. Markle
Citations: 0
Keeping Up the PACE: Evaluating Grade 8 Student Achievement Outcomes for New Hampshire’s Innovative Assessment System
IF 1.5, CAS Q4 (Education), Q3 EDUCATION & EDUCATIONAL RESEARCH, Pub Date: 2023-04-03, DOI: 10.1080/08957347.2023.2201700
Alexandra Lane Perez, Carla M. Evans
ABSTRACT New Hampshire’s Performance Assessment of Competency Education (PACE) innovative assessment system uses student scores from classroom performance assessments as well as other classroom tests for school accountability purposes. One concern is that not having annual state testing may incentivize schools and teachers away from teaching the breadth of the state content standards. This study examined the effects of PACE on Grade 8 test scores after 5 years of implementation using propensity score matching followed by hierarchical linear modeling. The results suggest that PACE students perform about the same, on average, in mathematics and ELA as non-PACE students on the state assessment. There was no evidence of differential effects for students who had an individualized education program or were granted FRL. Findings for this limited sample suggest schools and teachers did not sacrifice the breadth of students’ opportunity to learn the state content standards while piloting a state performance assessment reform.
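For readers who want a concrete picture of the analytic approach the abstract names, the sketch below pairs propensity score matching with a random-intercept (hierarchical linear) model. It is a minimal illustration on synthetic data; all variable names (pace, score, school, and the covariates) are hypothetical placeholders and do not come from the PACE study.

```python
# Minimal sketch (not the study's actual analysis): 1:1 nearest-neighbor
# propensity score matching followed by a random-intercept model with
# schools as the grouping factor. All names and the synthetic data are
# hypothetical placeholders.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "school": rng.integers(0, 40, n),        # 40 hypothetical schools
    "prior_score": rng.normal(0, 1, n),
    "frl": rng.integers(0, 2, n),
    "iep": rng.integers(0, 2, n),
})
# Treatment (PACE participation) depends on covariates; outcome has no true effect here.
p_treat = 1 / (1 + np.exp(-(0.5 * df["prior_score"] - 0.3 * df["frl"])))
df["pace"] = rng.binomial(1, p_treat)
df["score"] = 0.8 * df["prior_score"] + rng.normal(0, 1, n)

# Step 1: estimate propensity scores and match without replacement under a caliper.
covs = ["prior_score", "frl", "iep"]
df["pscore"] = LogisticRegression(max_iter=1000).fit(df[covs], df["pace"]).predict_proba(df[covs])[:, 1]
treated, control = df[df["pace"] == 1], df[df["pace"] == 0].copy()
rows = []
for _, t in treated.iterrows():
    d = (control["pscore"] - t["pscore"]).abs()
    j = d.idxmin()
    if d.loc[j] <= 0.05:                     # caliper on the propensity score distance
        rows += [t, control.loc[j]]
        control = control.drop(index=j)
matched = pd.DataFrame(rows)

# Step 2: random intercept for schools; the "pace" coefficient is the estimated effect.
print(smf.mixedlm("score ~ pace", matched, groups=matched["school"]).fit().summary())
```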
Citations: 0
Comparing Drift Detection Methods for Accurate Rasch Equating in Different Sample Sizes
IF 1.5, CAS Q4 (Education), Q3 EDUCATION & EDUCATIONAL RESEARCH, Pub Date: 2023-04-03, DOI: 10.1080/08957347.2023.2201704
Sarah Alahmadi, Andrew T. Jones, Carol L. Barry, Beatriz Ibáñez
ABSTRACT Rasch common-item equating is often used in high-stakes testing to maintain equivalent passing standards across test administrations. If unaddressed, item parameter drift poses a major threat to the accuracy of Rasch common-item equating. We compared the performance of well-established and newly developed drift detection methods in small and large sample sizes, varying the proportion of test items used as anchor (common) items and the proportion of drifted anchors. In the simulated-data study, the most accurate equating was obtained in large-sample conditions with a small-moderate number of drifted anchors using the mINFIT/mOUTFIT methods. However, when any drift was present in small-sample conditions and when a large number of drifted anchors were present in large-sample conditions, all methods performed ineffectively. In the operational-data study, percent-correct standards and failure rates varied across the methods in the large-sample exam but not in the small-sample exam. Different recommendations for high- and low-volume testing programs are provided.
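As a simplified illustration of anchor screening for Rasch common-item equating (not the specific mINFIT/mOUTFIT methods evaluated in the article), the sketch below flags anchors with outlying difficulty shifts using a robust-z criterion and computes a mean-shift equating constant from the remaining anchors; the difficulty values are invented.

```python
# Simplified sketch of anchor-item drift screening for Rasch common-item
# equating: flag anchors whose difficulty shift is an outlier under a
# robust-z criterion, then compute the mean-shift equating constant from
# the non-drifted anchors. Difficulty values are illustrative only.
import numpy as np

b_old = np.array([-1.20, -0.55, -0.10, 0.25, 0.60, 1.05, 1.40])  # bank (old) difficulties
b_new = np.array([-1.15, -0.50, -0.05, 0.85, 0.65, 1.10, 1.45])  # new-form estimates

def robust_z(diffs):
    """Robust z-scores of difficulty shifts: (d - median) / (0.74 * IQR)."""
    med = np.median(diffs)
    iqr = np.percentile(diffs, 75) - np.percentile(diffs, 25)
    return (diffs - med) / (0.74 * iqr)

d = b_new - b_old
z = robust_z(d)
keep = np.abs(z) < 1.645          # one common cutoff; stricter or looser values are also used
print("flagged as drifted:", np.where(~keep)[0])

# Mean-shift constant from retained anchors puts the new form on the old scale.
shift = np.mean(b_old[keep] - b_new[keep])
b_new_equated = b_new + shift
print("equating constant:", round(shift, 3))
```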
Citations: 0
Multi-Group Generalizations of SIBTEST and Crossing-SIBTEST
IF 1.5, CAS Q4 (Education), Q3 EDUCATION & EDUCATIONAL RESEARCH, Pub Date: 2023-04-03, DOI: 10.1080/08957347.2023.2201703
R. P. Chalmers, Guoguo Zheng
ABSTRACT This article presents generalizations of SIBTEST and crossing-SIBTEST statistics for differential item functioning (DIF) investigations involving more than two groups. After reviewing the original two-group setup for these statistics, a set of multigroup generalizations that support contrast matrices for joint tests of DIF are presented. To investigate the Type I error and power behavior of these generalizations, a Monte Carlo simulation study was then explored. Results indicated that the proposed generalizations are reasonably effective at recovering their respective population parameter definitions, maintain optimal Type I error control, have suitable power to detect uniform and non-uniform DIF, and in shorter tests are competitive with the generalized logistic regression and generalized Mantel–Haenszel tests for DIF.
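The following is a heavily simplified two-group sketch of the core SIBTEST idea: match examinees on the valid-subtest score and average the group difference on the studied item across matching strata. It omits SIBTEST's regression correction, its standard error, and the multi-group contrast machinery that is the article's contribution; the data below are random.

```python
# Heavily simplified illustration of the SIBTEST idea (uncorrected beta):
# match on the remaining (valid subtest) items, then take a weighted mean
# of the reference-minus-focal difference on the studied item.
import numpy as np

def simple_beta(responses, group, item):
    """responses: 0/1 matrix (examinees x items); group: 0 = reference, 1 = focal."""
    valid = np.delete(responses, item, axis=1).sum(axis=1)   # matching subtest score
    beta, total = 0.0, 0
    for k in np.unique(valid):
        ref = responses[(valid == k) & (group == 0), item]
        foc = responses[(valid == k) & (group == 1), item]
        if len(ref) and len(foc):
            n_k = len(ref) + len(foc)
            beta += n_k * (ref.mean() - foc.mean())
            total += n_k
    return beta / total                                      # > 0 favors the reference group

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(500, 20))       # random responses, so beta should be near 0
g = rng.integers(0, 2, size=500)
print(round(simple_beta(X, g, item=0), 3))
```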
Citations: 0
Tracking Ordinal Development of Skills with a Longitudinal DINA Model with Polytomous Attributes
IF 1.5, CAS Q4 (Education), Q3 EDUCATION & EDUCATIONAL RESEARCH, Pub Date: 2023-04-03, DOI: 10.1080/08957347.2023.2201702
P. Zhan, Yao-sen Liu, Zhaohui Yu, Yanfang Pan
ABSTRACT Many educational and psychological studies have shown that the development of students is generally step-by-step (i.e. ordinal development) to a specific level. This study proposed a novel longitudinal learning diagnosis model with polytomous attributes to track students’ ordinal development in learning. Using the concept of polytomous attributes in the proposed model, the learning process of a specific skill, from non-mastery to mastery, can be divided into multiple ordinal steps in order to better characterize the learning trajectory. The results of an empirical study conducted to explore the performance of the proposed model indicated that it could adequately diagnose the ordinal development of skills in longitudinal assessments. A simulation study was also conducted to examine the estimation accuracy of general ability and the classification accuracy of attributes of the proposed model in different simulated conditions.
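As one plausible way to write a DINA-type item response function with polytomous attributes (not necessarily the exact parameterization used in the article), the sketch below sets the ideal response to 1 only when every attribute meets or exceeds the level required by the Q-matrix entry, with slip and guess parameters applied afterwards.

```python
# Illustrative DINA-style response probability with polytomous attributes:
# the ideal response eta is 1 only if every attribute meets or exceeds the
# level the Q-matrix requires for the item; slip (s) and guess (g) then give
# P(X = 1) = g + (1 - s - g) * eta. This is one plausible generalization,
# not necessarily the article's parameterization.
import numpy as np

def p_correct(alpha, q_row, slip, guess):
    """alpha, q_row: integer attribute levels (e.g., 0 = none, 1 = partial, 2 = full)."""
    eta = int(np.all(alpha >= q_row))
    return guess + (1.0 - slip - guess) * eta

q_row = np.array([2, 1, 0])   # item requires attribute 1 at level 2 and attribute 2 at level 1
print(p_correct(np.array([2, 1, 0]), q_row, slip=0.1, guess=0.2))  # ideal responder -> 0.9
print(p_correct(np.array([1, 1, 2]), q_row, slip=0.1, guess=0.2))  # level too low   -> 0.2
```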
Citations: 0
Measurement Invariance in Relation to First Language: An Evaluation of German Reading and Spelling Tests
IF 1.5, CAS Q4 (Education), Q3 EDUCATION & EDUCATIONAL RESEARCH, Pub Date: 2023-04-03, DOI: 10.1080/08957347.2023.2201701
L. Visser, Friederike Cartschau, Ariane von Goldammer, Janin Brandenburg, M. Timmerman, M. Hasselhorn, C. Mähler
ABSTRACT The growing number of children in primary schools in Germany who have German as their second language (L2) has raised questions about the fairness of performance assessment. Fair tests are a prerequisite for distinguishing between L2 learning delay and a specific learning disability. We evaluated five commonly used reading and spelling tests for measurement invariance (MI) as a function of first language (German vs. other). Multi-group confirmatory factor analyses revealed strict MI for the Weingarten Basic Vocabulary Spelling Tests (WRTs) 3+ and 4+ and the Salzburger Reading (SLT) and Spelling (SRT) Tests, suggesting these instruments are suitable for assessing reading and spelling skills regardless of first language. The MI for A Reading Comprehension Test for First to Seventh Graders – 2nd Edition (ELFE II) was partly strict with unequal intercepts for the text subscale. We discuss the implications of this finding for assessing reading performance of children with L2.
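For reference, the invariance hierarchy tested in multi-group confirmatory factor analysis can be summarized as follows; this is the standard formulation of the nested constraints across groups, not the specific models fit to the German reading and spelling data.

```latex
% Multi-group CFA model for group g and the nested invariance constraints.
% x_{ig}: observed item scores, \eta_{ig}: latent reading/spelling ability.
\[
  \mathbf{x}_{ig} = \boldsymbol{\tau}_g + \boldsymbol{\Lambda}_g \eta_{ig} + \boldsymbol{\varepsilon}_{ig},
  \qquad \boldsymbol{\varepsilon}_{ig} \sim N(\mathbf{0}, \boldsymbol{\Theta}_g)
\]
\[
  \begin{aligned}
  \text{configural:} &\ \text{same factor structure in each group} \\
  \text{metric:}     &\ \boldsymbol{\Lambda}_g = \boldsymbol{\Lambda} \\
  \text{scalar:}     &\ \boldsymbol{\Lambda}_g = \boldsymbol{\Lambda},\ \boldsymbol{\tau}_g = \boldsymbol{\tau} \\
  \text{strict:}     &\ \boldsymbol{\Lambda}_g = \boldsymbol{\Lambda},\ \boldsymbol{\tau}_g = \boldsymbol{\tau},\ \boldsymbol{\Theta}_g = \boldsymbol{\Theta}
  \end{aligned}
\]
```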
Citations: 0
Maintaining Score Scales Over Time: A Comparison of Five Scoring Methods
IF 1.5, CAS Q4 (Education), Q3 EDUCATION & EDUCATIONAL RESEARCH, Pub Date: 2023-01-02, DOI: 10.1080/08957347.2023.2172015
S. Y. Kim, Won‐Chan Lee
ABSTRACT This study evaluates various scoring methods including number-correct scoring, IRT theta scoring, and hybrid scoring in terms of scale-score stability over time. A simulation study was conducted to examine the relative performance of five scoring methods in terms of preserving the first two moments of scale scores for a population in a chain of linking with multiple test forms. Simulation factors included 1) the number of forms linked back to the initial form, 2) the pattern in mean shift, and 3) the proportion of common items. Results showed that scoring methods that operate with number-correct scores generally outperform those that are based on IRT proficiency estimators (estimated θ) in terms of reproducing the mean and standard deviation of scale scores. Scoring methods performed differently as a function of the pattern of group proficiency change.
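To make the contrast between the scoring families concrete, the sketch below computes a number-correct score and an EAP estimate of θ under a 2PL model for one response pattern. The item parameters are invented, and the article's hybrid scoring methods are not shown.

```python
# Contrast between number-correct scoring and IRT theta (EAP) scoring under
# a 2PL model, using made-up item parameters and a single response pattern.
import numpy as np

a = np.array([1.2, 0.8, 1.5, 1.0, 0.9])      # discriminations (hypothetical)
b = np.array([-1.0, -0.3, 0.2, 0.8, 1.5])    # difficulties (hypothetical)
x = np.array([1, 1, 1, 0, 0])                # observed item responses

# Number-correct score: simply the sum of item scores.
print("number-correct:", x.sum())

# EAP theta: posterior mean over a quadrature grid with a N(0, 1) prior.
theta = np.linspace(-4, 4, 81)
p = 1 / (1 + np.exp(-a * (theta[:, None] - b)))          # P(correct) at each grid point
like = np.prod(p**x * (1 - p)**(1 - x), axis=1)          # likelihood of the response pattern
prior = np.exp(-0.5 * theta**2)
post = like * prior / np.sum(like * prior)
print("EAP theta:", round(float(np.sum(theta * post)), 3))
```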
Citations: 0