Providing Validity Evidence to Improve the Assessment of English Language Learners. CRESST Report 738. August 2008. doi:10.1037/e643102011-001
M. Wolf, J. Herman, Jinok S. Kim, J. Abedi, Seth Leon, Noelle C. Griffin, Patina L. Bachman, Sandy Chang, Tim Farnsworth, Hyekyung Jung, J. Nollner, H. Shin
This research project addresses the validity of assessments used to measure the performance of English language learners (ELLs), such as those mandated by the No Child Left Behind Act of 2001 (NCLB, 2002). The goals of the research are to help educators understand and improve ELL performance by investigating the validity of their current assessments, and to provide states with much-needed guidance for improving the validity of their English language proficiency (ELP) and academic achievement assessments for ELL students. The research has three phases. In the first phase, the researchers analyze existing data and documents to understand the nature and validity of states' current practices and their priority needs. This first phase is exploratory in that the researchers identify key validity issues by examining the existing data and formulate research areas where further investigation is needed in the second phase. In the second phase, the researchers will deepen their analysis of the areas identified from Phase I findings. In the third phase, the researchers will develop specific guidelines on which states may base their ELL assessment policy and practice. The present report focuses on the researchers' Phase I research activities and results. The report also discusses preliminary implications and recommendations for improving ELL assessment systems.
1 We would like to thank Lyle Bachman, Alison Bailey, Frances Butler, Diane August, and Guillermo Solano-Flores for their valuable comments on earlier drafts of this report. We are also very grateful to our three participating states for their willingness to share their data and for their support of our work.
{"title":"Providing Validity Evidence to Improve the Assessment of English Language Learners. CRESST Report 738.","authors":"M. Wolf, J. Herman, Jinok S. Kim, J. Abedi, Seth Leon, Noelle C. Griffin, Patina L. Bachman, Sandy Chang, Tim Farnsworth, Hyekyung Jung, J. Nollner, H. Shin","doi":"10.1037/e643102011-001","DOIUrl":"https://doi.org/10.1037/e643102011-001","url":null,"abstract":"This research project addresses the validity of assessments used to measure the performance of English language learners (ELLs), such as those mandated by the No Child Left Behind Act of 2001 (NCLB, 2002). The goals of the research are to help educators understand and improve ELL performance by investigating the validity of their current assessments, and to provide states with much needed guidance to improve the validity of their English language proficiency (ELP) and academic achievement assessments for ELL students. The research has three phases. In the first phase, the researchers analyze existing data and documents to understand the nature and validity of states’ current practices and their priority needs. This first phase is exploratory in that the researchers identify key validity issues by examining the existing data and formulate research areas where further investigation is needed for the second phase. In the second phase of the research, the researchers will deepen their analysis of the areas identified from Phase I findings. In the third phase of the research, the researchers will develop specific guidelines on which states may base their ELL assessment policy and practice. The present report focuses on the researchers’ Phase I research activities and results. The report also discusses preliminary implications and recommendations for improving ELL assessment systems. 1 We would like to thank Lyle Bachman, Alison Bailey, Frances Butler, Diane August, and Guillermo SolanoFlores for their valuable comments on earlier drafts of this report. We are also very grateful to our three participating states for their willingness to share their data and support of our work.","PeriodicalId":19116,"journal":{"name":"National Center for Research on Evaluation, Standards, and Student Testing","volume":"49 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2008-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75331741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recommendations for Assessing English Language Learners: English Language Proficiency Measures and Accommodation Uses. Recommendations Report (Part 3 of 3). CRESST Report 737. July 2008. doi:10.1037/e643112011-001
M. Wolf, J. Herman, Lyle F. Bachman, A. Bailey, Noelle C. Griffin
The No Child Left Behind Act of 2001 (NCLB, 2002) has had a great impact on states' policies in assessing English language learner (ELL) students. The legislation requires states to develop or adopt sound assessments in order to validly measure ELL students' English language proficiency, as well as their content knowledge and skills. While states have moved rapidly to meet these requirements, they face challenges in validating their current assessment and accountability systems for ELL students, partly due to a lack of resources. Considering the significant role of assessment in guiding decisions about organizations and individuals, validity is a paramount concern. In light of this, we reviewed the current literature and policy regarding ELL assessment in order to inform practitioners of the key issues to consider in their validation process. Drawing on our review of literature and practice, we developed a set of guidelines and recommendations for practitioners to use as a resource to improve their ELL assessment systems. The present report is the last component of the series, providing recommendations for state policy and practice in assessing ELL students. It also discusses areas for future research and development.
Introduction and Background
English language learners (ELLs) are the fastest growing subgroup in the nation. Over the 10-year period between the 1994–1995 and 2004–2005 school years, ELL enrollment grew by over 60%, while total K–12 enrollment grew by just over 2% (Office of English Language Acquisition [OELA], n.d.). The growth rate is even more striking in some states. For instance, North Carolina and Nevada reported ELL population growth rates of 500% and 200%, respectively, over the same 10-year period (Batalova, Fix, & Murray, 2005, as cited in Short & Fitzsimmons, 2007). Not only is the ELL population growing in size, but it is also becoming more diverse. Over 400 different languages are reported among these students, and schooling experience varies depending on the students'
1 We would like to thank the following for their valuable comments and suggestions on earlier drafts of this report: Jamal Abedi, Diane August, Margaret Malone, Robert J. Mislevy, Charlene Rivera, Lourdes Rovira, Robert Rueda, Guillermo Solano-Flores, and Lynn Shafer Willner. Our sincere thanks also go to Jenny Kao, Patina L. Bachman, and Sandy M. Chang for their useful suggestions and invaluable research assistance.
{"title":"Recommendations for Assessing English Language Learners: English Language Proficiency Measures and Accommodation Uses. Recommendations Report (Part 3 of 3). CRESST Report 737.","authors":"M. Wolf, J. Herman, Lyle F. Bachman, A. Bailey, Noelle C. Griffin","doi":"10.1037/e643112011-001","DOIUrl":"https://doi.org/10.1037/e643112011-001","url":null,"abstract":"The No Child Left Behind Act of 2001 (NCLB, 2002) has had a great impact on states’ policies in assessing English language learner (ELL) students. The legislation requires states to develop or adopt sound assessments in order to validly measure the ELL students’ English language proficiency, as well as content knowledge and skills. While states have moved rapidly to meet these requirements, they face challenges to validate their current assessment and accountability systems for ELL students, partly due to the lack of resources. Considering the significant role of assessment in guiding decisions about organizations and individuals, validity is a paramount concern. In light of this, we reviewed the current literature and policy regarding ELL assessment in order to inform practitioners of the key issues to consider in their validation process. Drawn from our review of literature and practice, we developed a set of guidelines and recommendations for practitioners to use as a resource to improve their ELL assessment systems. The present report is the last component of the series, providing recommendations for state policy and practice in assessing ELL students. It also discusses areas for future research and development. Introduction and Background English language learners (ELLs) are the fastest growing subgroup in the nation. Over a 10-year period between the 1994–1995 and 2004–2005 school years, the enrollment of ELL students grew over 60%, while the total K–12 growth was just over 2% (Office of English Language Acquisition [OELA], n.d.). The increased rate is more astounding for some states. For instance, North Carolina and Nevada have reported their ELL population growth rate as 500% and 200% respectively for the past 10-year period (Batlova, Fix, & Murray, 2005, as cited in Short & Fitzsimmons, 2007). Not only is the size of the ELL population is growing, but the diversity of these students is becoming more extensive. Over 400 different languages are reported among these students; schooling experience is varied depending on the students’ 1 We would like to thank the following for their valuable comments and suggestions on earlier drafts of this report: Jamal Abedi, Diane August, Margaret Malone, Robert J. Mislevy, Charlene Rivera, Lourdes Rovira, Robert Rueda, Guillermo Solano-Flores, and Lynn Shafer Willner. Our sincere thanks also go to Jenny Kao, Patina L. Bachman, and Sandy M. Chang for their useful suggestions and invaluable research assistance.","PeriodicalId":19116,"journal":{"name":"National Center for Research on Evaluation, Standards, and Student Testing","volume":"14 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2008-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83351697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Templates and Objects in Authoring Problem-Solving Assessments. CRESST Report 735. May 2008. doi:10.4324/9781315096773-16
Terry P. Vendlinski, E. Baker, D. Niemi
{"title":"Templates and Objects in Authoring Problem-Solving Assessments. CRESST Report 735.","authors":"Terry P. Vendlinski, E. Baker, D. Niemi","doi":"10.4324/9781315096773-16","DOIUrl":"https://doi.org/10.4324/9781315096773-16","url":null,"abstract":"","PeriodicalId":19116,"journal":{"name":"National Center for Research on Evaluation, Standards, and Student Testing","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2008-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81988123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Issues in Assessing English Language Learners: English Language Proficiency Measures and Accommodation Uses. Literature Review (Part 1 of 3). CRESST Report 731. January 2008. doi:10.1037/e643592011-001
M. Wolf, Jenny C. Kao, J. Herman, Lyle F. Bachman, A. Bailey, Patina L. Bachman, Tim Farnsworth, Sandy Chang
The No Child Left Behind (NCLB) Act has had a great impact on states' policies in assessing English language learner (ELL) students. The legislation requires states to develop or adopt sound assessments in order to validly measure ELL students' English language proficiency (ELP), as well as their content knowledge and skills. Although states have moved rapidly to meet these requirements, they face challenges in validating their current assessment and accountability systems for ELL students, partly due to a lack of resources. Considering the significant role of assessments in guiding decisions about organizations and individuals, it is of paramount importance to establish a valid assessment system. In light of this, we reviewed the current literature and policy regarding ELL assessment in order to inform practitioners of the key issues to consider in their validation processes. Drawing on our review of literature and practice, we developed a set of guidelines and recommendations for practitioners to use as a resource to improve their ELL assessment systems. We have compiled a series of three reports. The present report is the first component of the series, containing pertinent literature related to assessing ELL students. The areas reviewed include validity theory, the construct of ELP assessments, and the effects of accommodations in the assessment of ELL students' content knowledge.
{"title":"Issues in Assessing English Language Learners: English Language Proficiency Measures and Accommodation Uses. Literature Review (Part 1 of 3). CRESST Report 731.","authors":"M. Wolf, Jenny C. Kao, J. Herman, Lyle F. Bachman, A. Bailey, Patina L. Bachman, Tim Farnsworth, Sandy Chang","doi":"10.1037/e643592011-001","DOIUrl":"https://doi.org/10.1037/e643592011-001","url":null,"abstract":"The No Child Left Behind (NCLB) Act has made a great impact on states’ policies in assessing English language learner (ELL) students. The legislation requires states to develop or adopt sound assessments in order to validly measure the ELL students’ English language proficiency (ELP), as well as content knowledge and skills. Although states have moved rapidly to meet these requirements, they face challenges to validate their current assessment and accountability systems for ELL students, partly due to the lack of resources. Considering the significant role of assessments in guiding decisions about organizations and individuals, it is of paramount importance to establish a valid assessment system. In light of this, we reviewed the current literature and policy regarding ELL assessment in order to inform practitioners of the key issues to consider in their validation processes. Drawn from our review of literature and practice, we developed a set of guidelines and recommendations for practitioners to use as a resource to improve their ELL assessment systems. We have compiled a series of three reports. The present report is the first component of the series, containing pertinent literature related to assessing ELL students. The areas being reviewed include validity theory, the construct of ELP assessments, and the effects of accommodations in the assessment of ELL students’ content knowledge.","PeriodicalId":19116,"journal":{"name":"National Center for Research on Evaluation, Standards, and Student Testing","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2008-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80074697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Creating Accurate Science Benchmark Assessments to Inform Instruction. CSE Technical Report 730. October 2007. doi:10.1037/e643602011-001
Terry P. Vendlinski, Sam O. Nagashima, J. Herman
Current educational policy highlights the important role that assessment can play in improving education. State standards and the assessments that are aligned with them establish targets for learning and promote school accountability for helping all students succeed; at the same time, feedback from assessment results is expected to provide districts, schools, and teachers with important information for guiding instructional planning and decision making. Yet even as No Child Left Behind (NCLB) and its requirements for adequate yearly progress put unprecedented emphasis on state tests, educators have discovered that annual state tests are too little and too late to guide teaching and learning. Recognizing the need for more frequent assessments to support student learning, many districts and schools have turned to benchmark testing—periodic assessments through which districts can monitor students' progress, and schools and teachers can refine curriculum and teaching—to help students succeed. We report in this document a collaborative effort of teachers, district administrators, professional developers, and assessment researchers to develop benchmark assessments for elementary school science. In the sections which follow we provide the rationale for our work and its research question, describe our collaborative assessment development process and its results, and present conclusions.
{"title":"Creating Accurate Science Benchmark Assessments to Inform Instruction. CSE Technical Report 730.","authors":"Terry P. Vendlinski, Sam O. Nagashima, J. Herman","doi":"10.1037/e643602011-001","DOIUrl":"https://doi.org/10.1037/e643602011-001","url":null,"abstract":"Current educational policy highlights the important role that assessment can play in improving education. State standards and the assessments that are aligned with them establish targets for learning and promote school accountability for helping all students succeed; at the same time, feedback from assessment results is expected to provide districts, schools, and teachers with important information for guiding instructional planning and decision making. Yet even as No Child Left Behind (NCLB) and its requirements for adequate yearly progress put unprecedented emphasis on state tests, educators have discovered that annual state tests are too little and too late to guide teaching and learning. Recognizing the need for more frequent assessments to support student learning, many districts and schools have turned to benchmark testing—periodic assessments through which districts can monitor students’ progress, and schools and teachers can refine curriculum and teaching—to help students succeed. We report in this document a collaborative effort of teachers, district administrators, professional developers, and assessment researchers to develop benchmark assessments for elementary school science. In the sections which follow we provide the rationale for our work and its research question, describe our collaborative assessment development process and its results, and present conclusions.","PeriodicalId":19116,"journal":{"name":"National Center for Research on Evaluation, Standards, and Student Testing","volume":"107 9‐12","pages":""},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91418813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Eliciting Student Thinking in Elementary School Mathematics Classrooms. CRESST Report 725. August 2007. doi:10.1037/e643702011-001
M. Franke, N. Webb, Angela G. Chan, Dan Battey, Marsha Ing, Deanna Freund, Tondra De
{"title":"Eliciting Student Thinking in Elementary School Mathematics Classrooms. CRESST Report 725.","authors":"M. Franke, N. Webb, Angela G. Chan, Dan Battey, Marsha Ing, Deanna Freund, Tondra De","doi":"10.1037/e643702011-001","DOIUrl":"https://doi.org/10.1037/e643702011-001","url":null,"abstract":"","PeriodicalId":19116,"journal":{"name":"National Center for Research on Evaluation, Standards, and Student Testing","volume":"14 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2007-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84694907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Examining the Generalizability of Direct Writing Assessment Tasks. CSE Technical Report 718. June 2007. doi:10.1037/e643812011-001
Eva Chen, D. Niemi, Jia Wang, Haiwen Wang, J. Mirocha
This study investigated the generalizability of performance across a small number of high-quality assessment tasks and the validity of measuring student writing ability with a limited number of essay tasks. More specifically, the research team explored how well writing prompts could measure students' general writing ability and whether performance on one writing task could be generalized to other, similar writing tasks. A total of four writing prompts were used in the study, three of them literature-based and one based on a short story. A total of 397 students participated in the study, and each student was randomly assigned to complete two of the four tasks. The research team found that three to five essays were required to make a reliable judgment of student writing performance.
Examining the Generalizability of Direct Writing Assessment Tasks
Performance assessment can serve to measure important and complex learning outcomes (Resnick & Resnick, 1989), provide a more direct measurement of student ability (Frederiksen, 1984; Glaser, 1991; Guthrie, 1984), and help guide improvement in instructional practices (Baron, 1991; Bennett, 1993). Of the various types of performance assessment, direct tests of writing ability have gained the widest acceptance in state and national assessment programs (Afflerbach, 1985; Applebee, Langer, Jenkins, Mullis, & Foertsch, 1990; Applebee, Langer, & Mullis, 1995). Advocates of direct writing assessment point out that students need more exposure to writing in the form of instruction and more frequent examinations (Breland, 1983). However, there are problems associated with using essays to measure students' writing abilities, such as the objectivity of ratings and the generalizability of scores across raters and tasks (Crehan, 1997).
Previous generalizability studies of direct writing assessment
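The "three to five essays" finding can be illustrated with a standard generalizability-theory decision study. The sketch below assumes a fully crossed persons-by-tasks (p x t) design with illustrative variance components; the symbols and numbers are assumptions for exposition, not values reported in the study. The generalizability coefficient for a score averaged over $n_t$ essay tasks is

$$E\rho^{2}(n_t) \;=\; \frac{\sigma^{2}_{p}}{\sigma^{2}_{p} + \sigma^{2}_{pt,e}/n_t},$$

where $\sigma^{2}_{p}$ is universe-score (person) variance and $\sigma^{2}_{pt,e}$ is the person-by-task interaction confounded with residual error. With hypothetical components $\sigma^{2}_{p} = 0.50$ and $\sigma^{2}_{pt,e} = 0.50$, one essay yields $E\rho^{2} = 0.50$, three essays yield $0.50/(0.50 + 0.167) \approx 0.75$, and five essays yield $0.50/(0.50 + 0.10) \approx 0.83$, crossing a conventional 0.80 threshold somewhere between three and five tasks. This is the kind of decision-study calculation that underlies recommendations like the one reported above.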
{"title":"Examining the Generalizability of Direct Writing Assessment Tasks. CSE Technical Report 718.","authors":"Eva Chen, D. Niemi, Jia Wang, Haiwen Wang, J. Mirocha","doi":"10.1037/e643812011-001","DOIUrl":"https://doi.org/10.1037/e643812011-001","url":null,"abstract":"This study investigated the level of generalizability across a few high quality assessment tasks and the validity of measuring student writing ability using a limited number of essay tasks. More specifically, the research team explored how well writing prompts could measure student general writing ability and if student performance from one writing task could be generalized to other similar writing tasks. A total of four writing prompts were used in the study, with three tasks being literature-based and one task based on a short story. A total of 397 students participated in the study and each student was randomly assigned to complete two of the four tasks. The research team found that three to five essays were required to evaluate and make a reliable judgment of student writing performance. Examining the Generalizability of Direct Writing Assessment Tasks Performance assessment can serve to measure important and complex learning outcomes (Resnick & Resnick, 1989), provide a more direct measurement of student ability (Frederiksen, 1984; Glaser, 1991; Guthrie, 1984), and help guide improvement in instructional practices (Baron, 1991; Bennett, 1993). Of the various types of performance assessment, direct tests of writing ability have experienced the most acceptance in state and national assessment programs (Afflebach, 1985; Applebee, Langer, Jenkins, Mullins & Foertsch, 1990; Applebee, Langer, & Mullis, 1995). Advocates of direct writing assessment point out that students need more exposure to writing in the form of instruction and more frequent examinations (Breland, 1983). However, there are problems associated with using essays to measure students’ writing abilities, like objectivity of ratings, generalizability of scores across raters and tasks (Crehan, 1997). Previous generalizability studies of direct writing assessment","PeriodicalId":19116,"journal":{"name":"National Center for Research on Evaluation, Standards, and Student Testing","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2007-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85164680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Developing Academic English Language Proficiency Prototypes for 5th Grade Reading: Psychometric and Linguistic Profiles of Tasks. An Extended Executive Summary. CSE Report 720. June 2007. doi:10.1037/e643792011-001
A. Bailey, Becky H. Huang, H. Shin, Tim Farnsworth, Frances A. Butler
{"title":"Developing Academic English Language Proficiency Prototypes for 5th Grade Reading: Psychometric and Linguistic Profiles of Tasks. An Extended Executive Summary. CSE Report 720.","authors":"A. Bailey, Becky H. Huang, H. Shin, Tim Farnsworth, Frances A. Butler","doi":"10.1037/e643792011-001","DOIUrl":"https://doi.org/10.1037/e643792011-001","url":null,"abstract":"","PeriodicalId":19116,"journal":{"name":"National Center for Research on Evaluation, Standards, and Student Testing","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2007-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84338137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
School Improvement under Test-Driven Accountability: A Comparison of High- and Low-Performing Middle Schools in California. CSE Report 717. May 2007. doi:10.1037/e643832011-001
H. Mintrop, Tina Trujillo
Based on in-depth data from nine demographically similar schools, the study asks five questions that address key aspects of the improvement process and speak to the consequential validity of accountability indicators: Do schools that differ widely according to system performance criteria also differ in the quality of the educational experience they provide to students? Are schools that have posted high growth on the state's performance index more effective organizationally? Do high-performing schools respond more productively to the messages of their state accountability system? Do high- and low-performing schools exhibit different approaches to organizational learning and teacher professionalism? Is district instructional management in an aligned state accountability system related to performance? We report our findings in three results papers (Mintrop & Trujillo, 2007a, 2007b; Trujillo & Mintrop, 2007) and this technical report. In a nutshell, the results papers show that, across the nine case-study schools, the one positive performance outlier did indeed differ in quality of teaching, organizational effectiveness, response to accountability, and patterns of organizational learning. Across the other eight schools, however, the patterns blurred. We conclude that, save for performance differences at the extreme positive and negative margins, relationships between system-designated performance levels and improvement processes on the ground are uncertain and far from solid. The papers try to elucidate why this may be so. This final technical report summarizes the major components of the study design and methodology, including case selection, instrumentation, data collection, and data analysis techniques. We describe the context of the study as well as descriptive data on our cases and procedures. School improvement is an intricate business. Whether a school succeeds in improving depends on a host of factors, both internal and external to the organization. The motivation and capacity of the workforce, the
1 The three results papers are entitled Accountability Urgency, Organizational Learning, and Educational Outcomes: A Comparative Analysis of California Middle Schools; The Practical Relevance of Accountability Systems for School Improvement: A Descriptive Analysis of California Schools; and Centralized Instructional Management: District Control, Organizational Culture, and School Performance.
{"title":"School Improvement under Test-Driven Accountability: A Comparison of High- and Low-Performing Middle Schools in California. CSE Report 717.","authors":"H. Mintrop, Tina Trujillo","doi":"10.1037/e643832011-001","DOIUrl":"https://doi.org/10.1037/e643832011-001","url":null,"abstract":"Based on in-depth data from nine demographically similar schools, the study asks five questions in regard to key aspects of the improvement process and that speak to the consequential validity of accountability indicators: Do schools that differ widely according to system performance criteria also differ on the quality of the educational experience they provide to students? Are schools that have posted high growth on the state’s performance index more effective organizationally? Do high-performing schools respond more productively to the messages of their state accountability system? Do highand low-performing schools exhibit different approaches to organizational learning and teacher professionalism? Is district instructional management in an aligned state accountability system related to performance? We report our findings in three results papers1 (Mintrop & Trujillo, 2007a, 2007b; Trujillo & Mintrop, 2007) and this technical report. The results papers, in a nutshell, show that, across the nine case study schools, one positive performance outlier differed indeed in the quality of teaching, organizational effectiveness, response to accountability, and patterns of organizational learning. Across the other eight schools, however, the patterns blurred. We conclude that, save for performance differences on the extreme positive and negative margins, relationships between system-designated performance levels and improvement processes on the ground are uncertain and far from solid. The papers try to elucidate why this may be so. This final technical report summarizes the major components of the study design and methodology, including case selection, instrumentation, data collection, and data analysis techniques. We describe the context of the study as well as descriptive data on our cases and procedures. School improvement is an intricate business. Whether a school succeeds in improving is dependent on a host of factors. Factors come into play that are internal and external to the organization. The motivation and capacity of the workforce, the 1 The three reports are entitled Accountability Urgency, Organizational Learning, and Educational Outcomes: A Comparative Analysis of California Middle Schools; The Practical Relevance of Accountability Systems for School Improvement: A Descriptive Analysis of California Schools; and Centralized Instructional Management: District Control, Organizational Culture, and School Performance.","PeriodicalId":19116,"journal":{"name":"National Center for Research on Evaluation, Standards, and Student Testing","volume":"112 2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2007-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91024258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Changes in the Black-White Test Score Gap in the Elementary School Grades. CSE Report 715. April 2007. doi:10.1037/e643902011-001
D. Koretz, Y. Kim
In a pair of recent studies, Fryer and Levitt (2004a, 2004b) analyzed the Early Childhood Longitudinal Study – Kindergarten Cohort (ECLS-K) to explore the characteristics of the Black-White test score gap in young children. They found that the gap grew markedly between kindergarten and the third grade and that they could predict the gap from measured characteristics in kindergarten but not in the third grade. In addition, they found that the widening of the gap was differential across areas of knowledge and skill, with Blacks falling behind in all areas other than the most basic. They raised the possibility that Blacks and Whites may not be on "parallel trajectories" and that Blacks, as they go through school, may never master some skills mastered by Whites. This study re-analyzes the ECLS-K data to address this last question. We find that the scores used by Fryer and Levitt (proficiency probability scores, or PPS) do not support the hypothesis of differential growth of the gap. The patterns they found reflect the nonlinear relationships between overall proficiency, θ, and the PPS variables, as well as ceiling effects in the PPS distributions. Moreover, θ is a sufficient statistic for the PPS variables; therefore, the PPS variables merely re-express the overall mean difference between groups and contain no information about qualitative differences in performance between Black and White students at similar levels of θ. We therefore carried out differential item functioning (DIF) analyses of all items in all rounds of the ECLS-K through grade 5 (Round 6), excluding only the fall-of-grade-1 round (which had a very small sample) and subsamples in which there were too few Black students for reasonable analysis. We found no relevant patterns in the distribution of the DIF statistics, or in the characteristics of the items showing DIF, that would support the notion of differential divergence, other than in kindergarten and the first grade, where DIF favoring Blacks tended to appear on items tapping simple skills taught outside of school (e.g., number recognition), while DIF disfavoring Blacks tended to appear on material taught more in school (e.g., arithmetic). However, there were exceptions to this. Moreover, because of its construction and reporting, the ECLS-K data were not ideal for addressing this question.
1 Young-Suk Kim is currently at the Florida Center for Reading Research (FCRR) and the Department of Childhood Education, Reading, and Disability Services, College of Education, Florida State University.
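The central nonlinearity argument can be made concrete with a minimal sketch. Suppose, purely for illustration (the logistic form and the parameters below are assumptions, not the ECLS-K scoring model), that each proficiency probability score is a function of overall proficiency θ with a skill-specific threshold $b_k$:

$$\mathrm{PPS}_k(\theta) \;=\; \frac{1}{1 + e^{-(\theta - b_k)}}.$$

Hold the group difference in θ fixed at 0.5, with group means at θ = 0 and θ = -0.5. For a basic skill with $b_k = -2$, the group PPS values are about 0.88 and 0.82, a gap of roughly 0.06 that is compressed by the ceiling; for an advanced skill with $b_k = +1$, they are about 0.27 and 0.18, a gap of roughly 0.09. The same mean difference in θ thus produces apparently "differential" gaps across skill areas, which is the pattern the reanalysis attributes to the nonlinear PPS metric and to ceiling effects rather than to qualitative differences in what Black and White students have mastered.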
{"title":"Changes in the Black-White Test score Gap in the Elementary School Grades. CSE Report 715.","authors":"D. Koretz, Y. Kim","doi":"10.1037/e643902011-001","DOIUrl":"https://doi.org/10.1037/e643902011-001","url":null,"abstract":"In a pair of recent studies, Fryer and Levitt (2004a, 2004b) analyzed the Early Childhood Longitudinal Study – Kindergarten Cohort (ECLS-K) to explore the characteristics of the Black-White test score gap in young children. They found that the gap grew markedly between kindergarten and the third grade and that they could predict the gap from measured characteristics in kindergarten but not in the third grade. In addition, they found that the widening of the gap was differential across areas of knowledge and skill, with Blacks falling behind in all areas other than the most basic. They raised the possibility that Black and Whites may not be on “parallel trajectories” and that Blacks, as they go through school, may never master some skills mastered by Whites. This study re-analyzes the ECLS-K data to address this last question. We find that the scores used by Fryer and Levitt (proficiency probability scores, or PPS) do not support the hypothesis of differential growth of the gap. The patterns they found reflect the nonlinear relationships between overall proficiency, θ , and the PPS variables, as well as ceiling effects in the PPS distributions. Moreover, θ is a sufficient statistic for the PPS variables, and therefore, PPS variables merely re-express the overall mean difference between groups and contain no information about qualitative differences in performance between Black and White students at similar levels of θ . We therefore carried out differential item functioning (DIF) analyses of all items in all rounds of the ECLS-K through grade 5 (Round 6), excluding only the fall of grade 1 (which was a very small sample) and subsamples in which there were too few Black students for reasonable analysis. We found no relevant patterns in the distribution of the DIF statistics or in the characteristics of the items showing DIF that support the notion of differential divergence, other than in kindergarten and the first grade, where DIF favoring Blacks tended to be on items tapping simple skills taught outside of school (e.g., number recognition), while DIF disfavoring Blacks tended to be on material taught more in school (e.g., arithmetic). However, there were exceptions to this. Moreover, because of its construction and reporting, the ECLS-K data were not ideal for addressing this 1Young-Suk Kim is currently at the Florida Center for Reading Research (FCRR) and Department of Childhood Education, Reading, and Disability Services, College of Education, Florida State University","PeriodicalId":19116,"journal":{"name":"National Center for Research on Evaluation, Standards, and Student Testing","volume":"89 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2007-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85838700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}