
ETS Research Report Series: Latest Publications

Building a Validity Argument for the TOEFL Junior® Tests
Q3 Social Sciences | Pub Date: 2024-05-15 | DOI: 10.1002/ets2.12379
Ching‐Ni Hsieh
The TOEFL Junior® tests are designed to evaluate young language students' English reading, listening, speaking, and writing skills in an English‐medium secondary instructional context. This paper articulates a validity argument constructed to support the use and interpretation of the TOEFL Junior test scores for the purpose of placement, progress monitoring, and evaluation of a test taker's English skills. The validity argument is built within an argument‐based approach to validation and consists of six validity inferences that provide a coherent narrative about the measurement quality and intended uses of the TOEFL Junior test scores. Each validity inference is underpinned by specific assumptions and corresponding evidential support. The claims and supporting evidence presented in the validity argument demonstrate how the TOEFL Junior research program takes a rigorous approach to supporting the uses of the tests. The compilation of validity evidence serves as a resource for score users and stakeholders, guiding them to make informed decisions regarding the use and interpretation of TOEFL Junior test scores within their educational contexts.
Citations: 0
Validity, Reliability, and Fairness Evidence for the JD‐Next Exam
Q3 Social Sciences | Pub Date: 2024-04-10 | DOI: 10.1002/ets2.12378
Steven Holtzman, Jonathan Steinberg, Jonathan Weeks, Christopher Robertson, Jessica Findley, David M Klieger
At a time when institutions of higher education are exploring alternatives to traditional admissions testing, institutions are also seeking to better support students and prepare them for academic success. Under such an engaged model, one may seek to measure not just the accumulated knowledge and skills that students would bring to a new academic program but also their ability to grow and learn through the academic program. To help prepare students for law school before they matriculate, the JD‐Next is a fully online, noncredit, 7‐ to 10‐week course to train potential juris doctor students in case reading and analysis skills. This study builds on the work presented for previous JD‐Next cohorts by introducing new scoring and reliability estimation methodologies based on a recent redesign of the assessment for the 2021 cohort, and it presents updated validity and fairness findings using first‐year grades, rather than merely first‐semester grades as in prior cohorts. Results support the claim that the JD‐Next exam is reliable and valid for predicting law school success, providing a statistically significant increase in predictive power over baseline models, including entrance exam scores and grade point averages. In terms of fairness across racial and ethnic groups, smaller score disparities are found with JD‐Next than with traditional admissions assessments, and the assessment is shown to be equally predictive for students from underrepresented minority groups and for first‐generation students. These findings, in conjunction with those from previous research, support the use of the JD‐Next exam for both preparing and admitting future law school students.
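The incremental predictive power reported here is conventionally established by comparing nested regression models: a baseline with entrance exam scores and grade point averages, and an augmented model that adds the JD‐Next score. Below is a minimal sketch of that comparison in Python on simulated stand-in data; the variable names (lsat, ugpa, jdnext, l1_gpa) and effect sizes are illustrative assumptions, not the study's actual data or model.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500

# Simulated stand-in data (hypothetical; the study used real admissions records).
df = pd.DataFrame({
    "lsat": rng.normal(155.0, 8.0, n),   # entrance exam score
    "ugpa": rng.normal(3.3, 0.4, n),     # undergraduate GPA
})
df["jdnext"] = 0.4 * (df["lsat"] - 155.0) / 8.0 + rng.normal(0.0, 1.0, n)
df["l1_gpa"] = (3.0 + 0.10 * (df["lsat"] - 155.0) / 8.0
                + 0.10 * df["jdnext"]
                + rng.normal(0.0, 0.3, n))  # first-year grades (criterion)

# Baseline model: entrance exam score + undergraduate GPA.
base = sm.OLS(df["l1_gpa"], sm.add_constant(df[["lsat", "ugpa"]])).fit()

# Augmented model: baseline predictors plus the JD-Next exam score.
full = sm.OLS(df["l1_gpa"], sm.add_constant(df[["lsat", "ugpa", "jdnext"]])).fit()

# Incremental validity: change in R^2, with an F-test for the added predictor.
print(f"Baseline R^2:  {base.rsquared:.3f}")
print(f"Augmented R^2: {full.rsquared:.3f}")
print(full.compare_f_test(base))  # -> (F statistic, p-value, df difference)
```

A significant F statistic for the nested comparison (equivalently, a significant coefficient on the added score) is what "a statistically significant increase in predictive power over baseline models" refers to.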
Citations: 0
Practical Considerations in Item Calibration With Small Samples Under Multistage Test Design: A Case Study
Q3 Social Sciences | Pub Date: 2024-02-04 | DOI: 10.1002/ets2.12376
Hongwen Guo, Matthew S. Johnson, Daniel F. McCaffrey, Lixong Gu
The multistage testing (MST) design has been gaining attention and popularity in educational assessments. For testing programs with small test‐taker samples, it is challenging to calibrate new items to replenish the item pool. In the current research, we used the item pools from an operational MST program to illustrate how research studies can build on the literature and program‐specific data to help fill the gaps between research and practice and to make sound psychometric decisions that address small‐sample issues. The studies covered the choice of item calibration methods, data collection designs to increase sample sizes, and item response theory models for producing the score conversion tables. Our results showed that, with small samples, the fixed item parameter calibration (FIPC) method consistently performed best for calibrating new items, compared to the traditional separate calibration with scaling method and a newer calibration approach based on minimum discriminant information adjustment. In addition, concurrent FIPC calibration with data from multiple administrations also improved parameter estimation for new items. However, because of program‐specific settings, a simpler model may not improve current practice when the sample size is small and the initial item pools were well calibrated using a two‐parameter logistic model with large field‐trial data.
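For reference, the two‐parameter logistic (2PL) model mentioned above gives the probability that an examinee with ability θ answers item i correctly, and FIPC calibrates new items by holding the anchor items' parameters at their item‐bank values so the new items land on the existing scale. A minimal sketch of both ideas follows; the notation is mine, not the report's.

```latex
% 2PL item response function: a_i = discrimination, b_i = difficulty.
P_i(\theta) = \frac{1}{1 + \exp\left[-a_i(\theta - b_i)\right]}

% FIPC (sketch): maximize the marginal likelihood over the new items'
% parameters only; anchor-item parameters stay fixed at their bank values,
% which anchors the theta scale. Here u_{ij} is person j's scored response
% to item i, and g(theta) is the ability distribution.
(\hat{a}_i, \hat{b}_i)_{i \in \text{new}} =
  \arg\max \sum_{j=1}^{N} \log \int
    \prod_{i \in \text{new} \,\cup\, \text{anchor}}
      P_i(\theta)^{u_{ij}} \left[1 - P_i(\theta)\right]^{1 - u_{ij}}
    \, g(\theta) \, d\theta
```

Separate calibration with scaling, by contrast, estimates all parameters freely and then applies a linear transformation (e.g., Stocking–Lord) to place them on the bank scale.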
Citations: 0
Modeling Writing Traits in a Formative Essay Corpus
Q3 Social Sciences | Pub Date: 2024-01-17 | DOI: 10.1002/ets2.12377
Paul Deane, Duanli Yan, Katherine Castellano, Y. Attali, Michelle Lamar, Mo Zhang, Ian Blood, James V. Bruno, Chen Li, Wenju Cui, Chunyi Ruan, Colleen Appel, Kofi James, Rodolfo Long, Farah Qureshi
This paper presents a multidimensional model of variation in writing quality, register, and genre in student essays, trained and tested via confirmatory factor analysis of 1.37 million essay submissions to ETS' digital writing service, Criterion®. The model was also validated with several other corpora, which indicated that it provides a reasonable fit for essay data from 4th grade to college. It includes an analysis of the test‐retest reliability of each trait, longitudinal trends by trait, both within the school year and from 4th to 12th grades, and analysis of genre differences by trait, using prompts from the Criterion topic library aligned with the major modes of writing (exposition, argumentation, narrative, description, process, comparison and contrast, and cause and effect). It demonstrates that many of the traits are about as reliable as overall e‐rater® scores, that the trait model can be used to build models somewhat more closely aligned with human scores than standard e‐rater models, and that there are large, significant trait differences by genre, consistent with genre differences in trait patterns described in the larger literature. Some of the traits demonstrated clear trends between successive revisions. Students using Criterion appear to have consistently improved grammar, usage, and spelling after getting Criterion feedback and to have marginally improved essay organization. Many of the traits also demonstrated clear grade level trends. These features indicate that the trait model could be used to support more detailed scoring and reporting for writing assessments and learning tools.
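The confirmatory factor analysis underlying the trait model can be stated compactly. As a generic reference sketch (standard CFA notation, not the report's specific trait structure):

```latex
% Measurement model: observed essay features x_j load on latent writing
% traits xi_j through the loading matrix Lambda, with unique errors epsilon_j.
\mathbf{x}_j = \boldsymbol{\Lambda}\,\boldsymbol{\xi}_j + \boldsymbol{\varepsilon}_j

% Implied covariance structure fit to the sample covariance of the features:
% Phi = covariance matrix of the latent traits (traits may correlate);
% Theta = (typically diagonal) covariance of the unique errors.
\boldsymbol{\Sigma} = \boldsymbol{\Lambda}\,\boldsymbol{\Phi}\,\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Theta}
```

Model fit is judged by how closely the implied Σ reproduces the observed covariance matrix of the essay features, which is how a factor structure can be trained on one corpus and validated against others.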
Citations: 0
The Use of TOEFL iBT® in Admissions Decisions: Stakeholder Perceptions of Policies and Practices
Q3 Social Sciences | Pub Date: 2024-01-16 | DOI: 10.1002/ets2.12375
Sara T. Cushing, Haoshan Ren, Yi Tan
This paper reports partial results from a larger study of how three different groups of stakeholders—university admissions officers, faculty in graduate programs involved in admissions decisions, and Intensive English Program (IEP) faculty—interpret and use TOEFL iBT® scores in making admissions decisions or preparing students to meet minimum test score requirements. Our overall goal was to gain a better understanding of the perceived role of English language proficiency in admissions decisions and the internal and external factors that inform decisions about acceptable ways to demonstrate proficiency and minimal standards. To that end, we designed surveys for each stakeholder group that contained questions for all groups and questions specific to each group. This report focuses on the questions that were common to all three groups across two areas: (1) understandings of and participation in institutional policy making around English language proficiency tests and (2) knowledge of and attitudes toward the TOEFL iBT test itself. Our results suggested that, as predicted, university admissions staff were the most aware of and involved in policy making but frequently consulted with ESL experts such as IEP faculty when setting policies. This stakeholder group was also the most knowledgeable about the TOEFL iBT test. Faculty in graduate programs varied in their understanding of and involvement in policy making and reported the least familiarity with the test. However, they reported that more information about many aspects of the test would help them make better admissions decisions. The results of the study add to the growing literature on language assessment literacy among various stakeholder groups, especially in terms of identifying aspects of assessment literacy that are important to different groups of stakeholders.
Citations: 1
Culturally Responsive Assessment: Provisional Principles
Q3 Social Sciences | Pub Date: 2023-09-06 | DOI: 10.1002/ets2.12374
Michael E. Walker, Margarita Olivera-Aguilar, Blair Lehman, Cara Laitusis, Danielle Guzman-Orth, Melissa Gholson

Recent criticisms of large-scale summative assessments have claimed that the assessments are biased against historically excluded groups because of the assessments' lack of cultural representation. Accompanying these criticisms is a call for more culturally responsive assessments—assessments that take into account the background characteristics of the students; their beliefs, values, and ethics; their lived experiences; and everything that affects how they learn and behave and communicate. In this paper, we present provisional principles, based on a review of research, that we deem necessary for fostering cultural responsiveness in assessment. We believe the application of these principles can address the criticisms of current assessments.

Citations: 0
Interpretation and Use of a Workplace English Language Proficiency Test Score Report: Perspectives of TOEIC® Test Takers and Score Users in Taiwan
Q3 Social Sciences | Pub Date: 2023-08-23 | DOI: 10.1002/ets2.12373
Ching-Ni Hsieh

Research in validity suggests that stakeholders' interpretation and use of test results should be an aspect of validity. Claims about the meaningfulness of test score interpretations and consequences of test use should be backed by evidence that stakeholders understand the definition of the construct assessed and the score report information. The current study explored stakeholders' uses and interpretations of the score report of a workplace English language proficiency test, the TOEIC® Listening and Reading (TOEIC L&R) test. Online surveys were administered to TOEIC L&R test takers and institutional and corporate score users in Taiwan to collect data about their uses and interpretations of the test score report. Eleven survey respondents participated in follow-up interviews to further elaborate on their uses of the different score reporting information within the stakeholders' respective contexts. Results indicated that the participants used the TOEIC L&R test scores largely as intended by the test developer although some elements of the score report appeared to be less useful and could be confusing for stakeholders. Findings from this study highlight the importance of providing score reporting information with clarity and ease to enhance appropriate use and interpretation.

Citations: 0
Culturally Responsive Personalized Learning: Recommendations for a Working Definition and Framework
Q3 Social Sciences | Pub Date: 2023-08-07 | DOI: 10.1002/ets2.12372
Teresa M. Ober, Blair A. Lehman, Reginald Gooch, Olasumbo Oluwalana, Jaemarie Solyst, Geoffrey Phelps, Laura S. Hamilton

Culturally responsive personalized learning (CRPL) emphasizes the importance of aligning personalized learning approaches with previous research on culturally responsive practices to consider social, cultural, and linguistic contexts for learning. In the present discussion, we briefly summarize two bodies of literature considered in defining and developing a framework for CRPL: technology-enabled personalized learning and culturally relevant, responsive, and sustaining pedagogy. We then provide a definition and framework consisting of six key principles of CRPL, along with a brief discussion of theories and empirical evidence to support these principles. These six principles include agency, dynamic adaptation, connection to lived experiences, consideration of social movements, opportunities for collaboration, and shared power. These principles fall into three domains: fostering flexible student-centered learning experiences, leveraging relevant content and practices, and supporting meaningful interactions within a community. Finally, we conclude with some implications of this framework for researchers, policymakers, and practitioners working to ensure that all students receive high-quality learning opportunities that are both personalized and culturally responsive.

Citations: 0
Using Performance Tasks to Provide Feedback and Assess Progress in Teacher Preparation
Q3 Social Sciences | Pub Date: 2023-07-21 | DOI: 10.1002/ets2.12371
Geoffrey Phelps, Devon Kinsey, Thomas Florek, Nathan Jones

This report presents results from a survey of 64 elementary mathematics and reading language arts teacher educators providing feedback on a new type of short performance task. The performance tasks each present a brief teaching scenario and then require a short performance as if teaching actual students. Teacher educators participating in the study first reviewed six performance tasks, followed by a more in-depth review of two of the tasks. After reviewing the tasks, teacher educators completed an online survey providing input on the value of the tasks and on potential uses to support teacher preparation. The survey responses were positive with the majority of teacher educators supporting a variety of different uses of the performance tasks to support teacher preparation. The report concludes by proposing a larger theory for how the performance tasks can be used as both formative assessment tools to support teacher learning and summative assessments to guide decisions about candidates' readiness for the classroom.

Citations: 0