Developing an Innovative Elicited Imitation Task for Efficient English Proficiency Assessment

ETS Research Report Series (JCR Q3, Social Sciences) | Pub Date: 2021-11-17 | DOI: 10.1002/ets2.12338
Larry Davis, John Norris
ETS Research Report Series, volume 2021, issue 1, pages 1-30. Publication type: Journal Article. Full text: https://onlinelibrary.wiley.com/doi/10.1002/ets2.12338
Citation count: 7

Abstract

The elicited imitation task (EIT), in which language learners listen to a series of spoken sentences and repeat each one verbatim, is a commonly used measure of language proficiency in second language acquisition research. The TOEFL® Essentials™ test includes an EIT as a holistic measure of speaking proficiency, referred to as the “Listen and Repeat” task type. In this report, we describe the design considerations that informed the development of the EIT for TOEFL Essentials. We also report the results of a series of investigations conducted during the prototyping and pilot phases of test development, which were undertaken with the goal of confirming task design specifications, evaluating scoring performance, and obtaining initial validity evidence to support score interpretation and use of the EIT in the TOEFL Essentials test. We found that task design variables generally performed as expected. The length of input sentence was strongly associated with performance (Pearson r = .88), consistent with the construct measured by the EIT, while other task variables not directly related to the EIT construct did not impact performance (e.g., graphics, speaker accent, and response time). Scorers drawn from TOEFL iBT test raters were able to score responses consistently with over 98% exact or adjacent interrater agreement on a 6-point scale, and scores on the pilot version of the EIT were highly reliable (Cronbach's α = .93 on the 15-item pilot version). Correlations between EIT scores and other measures were generally as expected: Correlations with other speaking tasks were high (.78–.84) and slightly to somewhat lower for other language measures (.73 for writing, .68 for listening, and .57 for reading). Correlation with an independent measure of holistic language proficiency (C-test) was moderately high (.69), as expected. 
We discuss the study findings in terms of the TOEFL Essentials test validity argument and point out limitations to the current results along with future research needs. Overall, we believe that the findings provide initial support to warrant the use of the EIT as operationalized in the TOEFL Essentials test.
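
The agreement, reliability, and correlation statistics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function names, the assumed 0-5 coding of the 6-point scale, and all toy data below are hypothetical and chosen only for illustration, so the printed values say nothing about the study's results.

```python
from statistics import mean, pvariance


def exact_or_adjacent_agreement(rater_a, rater_b):
    """Share of responses where two raters' scores differ by at most 1 point."""
    pairs = list(zip(rater_a, rater_b))
    return sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores[i][j] is test taker i's score on item j."""
    k = len(item_scores[0])
    # Variance of each item across test takers.
    item_vars = [pvariance([row[j] for row in item_scores]) for j in range(k)]
    # Variance of the total (summed) score across test takers.
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical scores from two raters (assumed 0-5 scale), not study data.
    rater_a = [0, 1, 2, 3, 4, 5, 3, 2]
    rater_b = [0, 1, 3, 3, 4, 3, 3, 2]
    print("exact or adjacent agreement:", exact_or_adjacent_agreement(rater_a, rater_b))

    # Hypothetical item-level scores: 4 test takers by 3 items.
    items = [[1, 2, 1], [2, 2, 3], [4, 3, 4], [5, 5, 4]]
    print("Cronbach's alpha:", cronbach_alpha(items))

    # Hypothetical sentence lengths (syllables) and mean item scores.
    lengths = [8, 10, 12, 14, 16]
    scores = [4.6, 4.1, 3.5, 3.2, 2.4]
    print("Pearson r:", pearson_r(lengths, scores))
```

Population variances are used for both the item and total-score terms in alpha; using sample variances for both would give the same value, because the correction factor cancels in the ratio.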

Source journal: ETS Research Report Series (Social Sciences-Education)
CiteScore: 1.20
Self-citation rate: 0.00%
Articles published: 17