
Latest Publications in Educational Assessment

Beyond Agreement: Exploring Rater Effects in Large-Scale Mixed Format Assessments
IF 1.5 Q1 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2021-08-17 DOI: 10.1080/10627197.2021.1962277
Stefanie A. Wind, Wenjing Guo
ABSTRACT Scoring procedures for the constructed-response (CR) items in large-scale mixed-format educational assessments often involve checks for rater agreement or rater reliability. Although these analyses are important, researchers have documented rater effects that persist despite rater training and that are not always detected in rater agreement and reliability analyses, such as severity/leniency, centrality/extremism, and biases. Left undetected, these effects pose threats to fairness. We illustrate how rater effects analyses can be incorporated into scoring procedures for large-scale mixed-format assessments. We used data from the National Assessment of Educational Progress (NAEP) to illustrate relatively simple analyses that can provide insight into patterns of rater judgment that may warrant additional attention. Our results suggested that the NAEP raters exhibited generally defensible psychometric properties, while also exhibiting some idiosyncrasies that could inform scoring procedures. Similar procedures could be used operationally to inform the interpretation and use of rater judgments in large-scale mixed-format assessments.
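The severity/leniency and centrality/extremism effects named in this abstract can be screened with simple descriptive indices before any formal measurement model is fit (the study itself uses psychometric analyses of NAEP scoring data; the sketch below is not its method). A minimal sketch in Python, assuming a long-format score table with hypothetical columns rater_id and score:

```python
import pandas as pd

def rater_screen(df: pd.DataFrame) -> pd.DataFrame:
    """Flag potential rater effects from a long-format score table.

    Expects columns 'rater_id' and 'score' (hypothetical names).
    - Severity/leniency: rater mean far from the overall mean.
    - Centrality/extremism: rater SD far below/above the overall SD.
    """
    overall_mean = df["score"].mean()
    overall_sd = df["score"].std()
    out = df.groupby("rater_id")["score"].agg(["mean", "std", "count"])
    out["severity"] = out["mean"] - overall_mean    # negative = severe rater
    out["centrality"] = out["std"] / overall_sd     # well below 1 = central
    return out.sort_values("severity")

# Example usage: scores = pd.read_csv("ratings.csv"); print(rater_screen(scores))
```

Indices like these only flag candidates for follow-up; disentangling rater effects from true differences in the essays each rater happened to score requires the kind of model-based analysis the paper illustrates.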
Citations: 1
Investigating the Use of Assessment Data by Primary School Teachers: Insights from a Large-scale Survey in Ireland
IF 1.5 Q1 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2021-07-03 DOI: 10.1080/10627197.2021.1917358
Vasiliki Pitsia, Anastasios Karakolidis, P. Lehane
ABSTRACT Evidence suggests that the quality of teachers’ instructional practices can be improved when those practices are informed by relevant assessment data. Drawing on a sample of 1,300 primary school teachers in Ireland, this study examined the extent to which teachers use standardized test results for instructional purposes, as well as the role of several factors in predicting this use. Specifically, the study analyzed data from a cross-sectional survey that gathered information about teachers’ use of, experiences with, and attitudes toward assessment data from standardized tests. After taking other teacher and school characteristics into consideration, the analysis revealed that teachers with more positive attitudes toward standardized tests, and those who often engaged in some form of professional development on standardized testing, tended to use assessment data to inform their teaching more frequently. Based on the findings, policy and practice implications are discussed.
Citations: 3
Using Full-information Item Analysis to Improve Item Quality
IF 1.5 Q1 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2021-07-03 DOI: 10.1080/10627197.2021.1946390
T. Haladyna, Michael C. Rodriguez
ABSTRACT Full-information item analysis provides item developers and reviewers comprehensive empirical evidence of item quality, including option response frequency, point-biserial index (PBI) for distractors, mean-scores of respondents selecting each option, and option trace lines. The multi-serial index (MSI) is introduced as a more informative item-total correlation, accounting for variable distractor performance. The overall item PBI is empirically compared to the MSI. For items from an operational mathematics and reading test, poorly performing distractors are systematically removed to recompute the MSI, indicating improvements in item quality. Case studies for specific items with different characteristics are described to illustrate a variety of outcomes, focused on improving item discrimination. Full-information item analyses are presented for each case study item, providing clear examples of interpretation and use of item analyses. A summary of recommendations for item analysts is provided.
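The option-level statistics named in this abstract are straightforward to compute directly: correlate a 0/1 indicator for choosing each option with the rest-score (total score excluding the studied item). A minimal sketch of that part of a full-information item analysis, with hypothetical array names; the MSI is the authors’ own index and is not reproduced here:

```python
import numpy as np

def option_point_biserials(responses: np.ndarray, key: np.ndarray) -> list:
    """Option-level analysis: proportion choosing, point-biserial with the
    rest-score, and mean rest-score of examinees selecting each option.

    responses: (n_examinees, n_items) array of chosen options, e.g. 0..3.
    key:       (n_items,) array of correct options.
    """
    correct = (responses == key).astype(float)   # 0/1 scored matrix
    total = correct.sum(axis=1)
    results = []
    for j in range(responses.shape[1]):
        rest = total - correct[:, j]             # total score minus item j
        stats = {}
        for opt in np.unique(responses[:, j]):
            chose = (responses[:, j] == opt).astype(float)
            if chose.std() > 0 and rest.std() > 0:
                pbi = float(np.corrcoef(chose, rest)[0, 1])
            else:
                pbi = float("nan")               # option chosen by all/none
            stats[int(opt)] = {"prop": round(float(chose.mean()), 3),
                               "pbi": round(pbi, 3),
                               "mean_rest": round(float(rest[chose == 1].mean()), 2)}
        results.append(stats)
    return results
```

A well-functioning distractor shows a negative point-biserial and a below-average mean rest-score; distractors violating that pattern are the ones the paper’s procedure would flag for revision or removal.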
Citations: 4
The Impact of Disengaged Test Taking on a State’s Accountability Test Results
IF 1.5 Q1 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2021-07-03 DOI: 10.1080/10627197.2021.1956897
S. Wise, Sukkeun Im, Jay Lee
ABSTRACT This study investigated test-taking engagement on the Spring 2019 administration of a large-scale state summative assessment. Through the identification of rapid-guessing behavior – which is a validated indicator of disengagement – the percentage of Grade 8 test events with meaningful amounts of rapid guessing was 5.5% in mathematics, 6.7% in English Language Arts (ELA), and 3.5% in science. Disengagement rates on the state summative test were also found to vary materially across gender, ethnicity, Individualized Educational Plan (IEP) status, Limited English Proficient (LEP) status, free and reduced lunch (FRL) status, and disability status. However, school mean performance, proficiency rates, and relative ranking were only minimally affected by disengagement. Overall, results of this study indicate that disengagement has a material impact on individual state summative test scores, though its impact on score aggregations may be relatively minor.
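Rapid-guessing identification is typically based on per-item response-time thresholds; one common family of rules sets the threshold at a fraction of the item’s typical response time. A minimal sketch of flagging and subgroup disengagement rates under that assumption (the study’s operational rule and its “meaningful amount” criterion may differ), with hypothetical names:

```python
import numpy as np

def disengagement_rates(rt: np.ndarray, groups: np.ndarray,
                        frac: float = 0.10, min_flags: int = 1) -> dict:
    """Flag rapid guesses and summarize disengagement by subgroup.

    rt:        (n_examinees, n_items) response times in seconds.
    groups:    (n_examinees,) subgroup labels (e.g., gender, IEP status).
    frac:      threshold as a fraction of each item's median response time
               (a normative-threshold rule; an assumption, not necessarily
               the study's operational rule).
    min_flags: how many rapid guesses make a test event count as
               disengaged (stand-in for the paper's criterion).
    """
    thresholds = frac * np.median(rt, axis=0)    # per-item threshold
    rapid = rt < thresholds                      # item-level rapid-guess flags
    disengaged = rapid.sum(axis=1) >= min_flags  # event-level flag
    return {g: float(disengaged[groups == g].mean())
            for g in np.unique(groups)}
```

Comparing these rates across the subgroups listed in the abstract is exactly the kind of tabulation the reported gender/ethnicity/IEP/LEP/FRL differences come from.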
Citations: 8
Assessing Quality of Teaching from Different Perspectives: Measurement Invariance across Teachers and Classes
IF 1.5 Q1 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2021-04-03 DOI: 10.1080/10627197.2020.1858785
G. Krammer, Barbara Pflanzl, Gerlinde Lenske, Johannes Mayr
ABSTRACT Comparing teachers’ self-assessments to classes’ assessments of quality of teaching can offer insights for educational research and be a valuable resource for teachers’ continuous professional development. However, for this comparison to be meaningful, quality of teaching needs to be measured in the same way across perspectives. We used data from 622 teachers who self-assessed aspects of their quality of teaching and from their classes (12,229 students), who assessed the same aspects. Perspectives were compared using measurement invariance analyses. Teachers and classes agreed on the average level of instructional clarity but disagreed over teacher-student relationship and performance monitoring, suggesting that mean differences across perspectives may not be as consistent as the literature claims. Results showed a nonuniform measurement bias for only one item of instructional clarity, while measurement of the other aspects was directly comparable. We conclude that comparing teachers’ and classes’ perspectives on aspects of quality of teaching is viable.
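A full invariance analysis requires multi-group latent variable models, but the mean-level comparison reported here has a simple descriptive counterpart once invariance is established. A minimal sketch comparing paired teacher and class-mean scores on one construct; names are hypothetical and this is not the paper’s actual modeling:

```python
import numpy as np
from scipy import stats

def compare_perspectives(teacher: np.ndarray, class_mean: np.ndarray) -> dict:
    """Descriptive comparison of teacher self-ratings with class-mean
    ratings on one construct (e.g., instructional clarity).

    teacher, class_mean: (n_teachers,) paired scores on the same scale.
    Note: this mean-difference check is only interpretable if measurement
    invariance across perspectives holds, which the paper tests formally;
    this sketch is the descriptive step that follows such a test.
    """
    diff = teacher - class_mean
    t, p = stats.ttest_rel(teacher, class_mean)   # paired t-test
    d = diff.mean() / diff.std(ddof=1)            # standardized mean diff (d_z)
    return {"mean_diff": float(diff.mean()), "t": float(t),
            "p": float(p), "d_z": float(d)}
```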
Citations: 4
Predicting Retention in Higher Education from high-stakes Exams or School GPA
IF 1.5 Q1 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2021-02-06 DOI: 10.1080/10627197.2022.2130748
M. Meeter, M. V. van Brederode
ABSTRACT The transition from secondary to tertiary education varies from country to country. In many countries, secondary school is concluded with high-stakes national exams, or high-stakes entry tests are used for admissions to tertiary education. In other countries, secondary-school grade point average (GPA) is the determining factor. In the Netherlands, both play a role. With administrative data on close to 180,000 students, we investigated whether national exam scores or secondary school GPA was a better predictor of tertiary first-year retention. For both university education and higher professional education, secondary school GPA was the better predictor of retention, to the extent that national exams did not explain any additional variance. Moreover, for students who failed their exams, were held back by the secondary school for an additional year, and entered tertiary education one year later, GPA in the year of failure remained as predictive as it was for students who had passed their exams and started tertiary education immediately. National exam scores, on the other hand, had no predictive value at all for these students. It is concluded that secondary school GPA measures aspects of student performance that are not included in high-stakes national exams, but that are predictive of subsequent success in tertiary education.
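The core claim, that exam scores explain no variance beyond GPA, maps onto a standard nested-model comparison. A minimal sketch using logistic regression and in-sample AUC for illustration (the paper’s modeling choices may differ; variable names are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def incremental_auc(gpa, exam, retained) -> dict:
    """Compare retention prediction from GPA alone vs GPA + exam score.

    gpa, exam: (n,) predictor arrays; retained: (n,) 0/1 outcome.
    Returns AUC for the two nested models; if adding the exam score
    barely moves the AUC, it carries little incremental information,
    mirroring the paper's variance-explained argument.
    """
    gpa, exam, y = np.asarray(gpa), np.asarray(exam), np.asarray(retained)
    X1 = gpa.reshape(-1, 1)
    X2 = np.column_stack([gpa, exam])
    m1 = LogisticRegression().fit(X1, y)
    m2 = LogisticRegression().fit(X2, y)
    return {"gpa_only": round(roc_auc_score(y, m1.predict_proba(X1)[:, 1]), 3),
            "gpa_plus_exam": round(roc_auc_score(y, m2.predict_proba(X2)[:, 1]), 3)}
```

In practice one would evaluate on held-out data and compare nested models with a likelihood-ratio test; in-sample AUC is used here only to keep the sketch short.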
Citations: 2
Anchors Aweigh: How the Choice of Anchor Items Affects the Vertical Scaling of 3PL Data with the Rasch Model
IF 1.5 Q1 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2021-01-20 DOI: 10.1080/10627197.2020.1858782
Glenn Thomas Waterbury, Christine E. DeMars
ABSTRACT Vertical scaling is used to put tests of different difficulty onto a common metric. The Rasch model is often used to perform vertical scaling, despite its strict functional form. Few, if any, studies have examined anchor item choice when using the Rasch model to vertically scale data that do not fit the model. The purpose of this study was to investigate the implications of anchor item choice on bias in growth estimates when data do not fit the Rasch model. Data were generated with varying levels of true difference between grades and levels of the lower asymptote. When true growth or the lower asymptote were zero, estimates were unbiased and anchor item choice was not consequential. As true growth and the lower asymptote both increased, growth was underestimated and choice of anchor items had an impact. Easy anchor items led to less biased estimates of growth than hard anchor items.
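Under the Rasch model, anchor-based vertical scaling can proceed by calibrating each grade separately and then shifting one metric onto the other using the anchor items; mean/mean linking is the simplest variant. A minimal sketch under that assumption (the study’s simulation design is more elaborate than this):

```python
import numpy as np

def rasch_mean_mean_link(b_lower: dict, b_upper: dict, anchors: list) -> float:
    """Linking constant that places the upper grade's Rasch metric onto
    the lower grade's metric, from two separate calibrations.

    b_lower, b_upper: item ID -> estimated difficulty in each calibration.
    anchors:          item IDs administered in both grades.
    Under the Rasch model the two metrics differ only by a shift;
    mean/mean linking estimates that shift from the anchors. As the
    paper shows, when the data actually follow a 3PL with a nonzero
    lower asymptote, hard anchors can bias the resulting growth
    estimates downward, so which anchors enter this mean matters.
    """
    shift = np.mean([b_lower[i] - b_upper[i] for i in anchors])
    return float(shift)

# Hypothetical workflow: growth = (mean_theta_upper + shift) - mean_theta_lower,
# where the theta means come from the two separate calibrations.
```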
Citations: 1
Model meets reality: Validating a new behavioral measure for test-taking effort
IF 1.5 Q1 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2021-01-12 DOI: 10.1080/10627197.2020.1858786
Esther Ulitzsch, Christiane Penk, Matthias von Davier, S. Pohl
ABSTRACT Identifying and considering test-taking effort is of utmost importance for drawing valid inferences on examinee competency in low-stakes tests. Different approaches exist for doing so. The speed-accuracy+engagement model aims at identifying non-effortful test-taking behavior in terms of nonresponse and rapid guessing based on responses and response times. The model allows for identifying rapid-guessing behavior on the item-by-examinee level whilst jointly modeling the processes underlying rapid guessing and effortful responding. To assess whether the model indeed provides a valid measure of test-taking effort, we investigate (1) convergent validity with previously developed behavioral as well as self-report measures on guessing behavior and effort, (2) fit within the nomological network of test-taking motivation derived from expectancy-value theory, and (3) ability to detect differences between groups that can be assumed to differ in test-taking effort. Results suggest that the model captures central aspects of non-effortful test-taking behavior. While it does not cover the whole spectrum of non-effortful test-taking behavior, it provides a measure for some aspects of it, in a manner that is less subjective than self-reports. The article concludes with a discussion of implications for the development of behavioral measures of non-effortful test-taking behavior.
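A widely used behavioral effort index in this literature, related to though distinct from the model validated here, is response-time effort (RTE): the proportion of an examinee’s responses classified as effortful. A minimal sketch, assuming per-item rapid-guessing thresholds are already available (e.g., from a rule like the one sketched earlier in this listing):

```python
import numpy as np

def response_time_effort(rt: np.ndarray, thresholds: np.ndarray) -> np.ndarray:
    """Response-time effort (RTE): per examinee, the proportion of items
    answered more slowly than the item's rapid-guessing threshold.

    rt:         (n_examinees, n_items) response times.
    thresholds: (n_items,) per-item rapid-guessing thresholds, however
                derived (an input assumption of this sketch).
    RTE near 1.0 suggests effortful test taking; low RTE flags examinees
    whose scores may not reflect their competency, which is what
    validity-threat screening in low-stakes testing is after.
    """
    effortful = rt >= thresholds
    return effortful.mean(axis=1)
```

The speed-accuracy+engagement model goes beyond such an index by flagging rapid guessing item by examinee while jointly modeling the guessing and effortful-response processes; RTE is the simpler behavioral baseline it is naturally compared against.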
Citations: 17
Do They See What I See? Toward a Better Understanding of the 7Cs Framework of Teaching Effectiveness
IF 1.5 Q1 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2021-01-05 DOI: 10.1080/10627197.2020.1858784
S. Phillips, Ronald F. Ferguson, Jacob F. S. Rowley
ABSTRACT School systems are increasingly incorporating student perceptions of teaching effectiveness into educator accountability systems. Using Tripod’s 7Cs™ Framework of Teaching Effectiveness, this study examines key issues in validating student perception data for use in this manner. Analyses examine the internal structure of 7Cs scores and the extent to which scores predict key criteria. Results offer the first empirical evidence that 7Cs scores capture seven distinct dimensions of teaching effectiveness even as they also confirm prior research concluding 7Cs scores are largely unidimensional. At the same time, results demonstrate a modest relationship between 7Cs scores and teacher self-assessments of their own effectiveness. Together, findings suggest 7Cs scores can be used to collect meaningful information about over-arching effectiveness. However, additional evidence is warranted before giving 7Cs scores as much weight in high-stakes contexts as value-added test-score gains or expert classroom observations.
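The internal-structure question raised here (one dominant dimension versus seven distinct ones) is often screened descriptively before fitting full factor models. A minimal sketch computing Cronbach’s alpha and the first-to-second eigenvalue ratio of the item correlation matrix; illustrative only, not the paper’s analysis:

```python
import numpy as np

def internal_structure_screen(X: np.ndarray) -> dict:
    """Quick internal-structure screen for a survey scale.

    X: (n_respondents, n_items) item-response matrix.
    - Cronbach's alpha: internal consistency of the total score,
      alpha = k/(k-1) * (1 - sum(item variances) / var(total)).
    - Eigenvalue ratio: first/second eigenvalue of the item correlation
      matrix; a large ratio is consistent with one dominant dimension,
      the 'largely unidimensional' pattern the abstract describes.
    """
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    alpha = (k / (k - 1)) * (1 - item_vars / total_var)
    eigs = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    return {"alpha": round(float(alpha), 3),
            "eig_ratio": round(float(eigs[0] / eigs[1]), 2)}
```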
Citations: 6
The Effect of Linguistic Factors on Assessment of English Language Learners’ Mathematical Ability: A Differential Item Functioning Analysis
IF 1.5 Q1 EDUCATION & EDUCATIONAL RESEARCH Pub Date: 2020-12-17 DOI: 10.1080/10627197.2020.1858783
Stephanie Buono, E. Jang
ABSTRACT Increasing linguistic diversity in classrooms has led researchers to examine the validity and fairness of standardized achievement tests, specifically concerning whether test score interpretations are free of bias and score use is fair for all students. This study examined whether mathematics achievement test items that contain complex language function differently between two language subgroups: native English speakers (EL1, n = 1,000) and English language learners (ELL, n = 1,000). Confirmatory Differential Item Functioning (DIF) analyses using SIBTEST were performed on 28 mathematics assessment items. Eleven items were identified as having complex language features, and DIF analyses revealed that seven of these items (63%) favored EL1s over ELLs. Effect sizes were moderate (0.05 ≤ β̂ < 0.10) for six items and marginal (β̂ < 0.05) for one item. This paper discusses validity issues with math achievement test items assessing ELLs and calls for careful test development and instructional accommodation in the classroom.
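SIBTEST computes a regression-corrected standardized difference between matched groups; implementing it faithfully takes some care, so as a lighter-weight illustration of the same DIF logic, here is the Mantel-Haenszel common odds ratio, a related standard DIF method (not the one used in this paper), matching on total score:

```python
import numpy as np

def mantel_haenszel_dif(correct: np.ndarray, group: np.ndarray,
                        total: np.ndarray) -> float:
    """Mantel-Haenszel common odds ratio for one studied item.

    correct: (n,) 0/1 scores on the studied item.
    group:   (n,) 0 = reference (e.g., EL1), 1 = focal (e.g., ELL).
    total:   (n,) matching variable (total test score strata).
    alpha_MH > 1 means the item favors the reference group at matched
    ability levels; SIBTEST's beta statistic addresses the same question
    with a regression correction on the matching score.
    """
    num = den = 0.0
    for s in np.unique(total):
        m = total == s
        a = np.sum((group[m] == 0) & (correct[m] == 1))  # reference correct
        b = np.sum((group[m] == 0) & (correct[m] == 0))  # reference incorrect
        c = np.sum((group[m] == 1) & (correct[m] == 1))  # focal correct
        d = np.sum((group[m] == 1) & (correct[m] == 0))  # focal incorrect
        t = a + b + c + d
        num += a * d / t
        den += b * c / t
    return num / den if den > 0 else float("nan")
```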
Citations: 3