
Latest Publications in Language Testing

Assessment of fluency in the Test of English for Educational Purposes
IF 4.1 | Q1 (Literature) | LANGUAGE & LINGUISTICS | Pub Date: 2023-03-13 | DOI: 10.1177/02655322231151384
P. Tavakoli, Gill Kendon, Svetlana Mazhurnaya, A. Ziomek
The main aim of this study was to investigate how oral fluency is assessed across different levels of proficiency in the Test of English for Educational Purposes (TEEP). Working with data from 56 test-takers performing a monologic task at a range of proficiency levels (equivalent to approximately levels 5.0, 5.5, 6.5, and 7.5 in the IELTS scoring system), we used PRAAT analysis to measure speed, breakdown, and repair fluency. A multivariate analysis of variance and a series of analyses of variance were used to examine the differences between fluency measures at these different levels of proficiency. The results largely replicate previous research in this area suggesting that (a) speed measures distinguish between lower levels (5.0 and 5.5) and higher levels of proficiency (6.5 and 7.5), (b) breakdown measures of silent pauses distinguish between 5.0 and higher levels of 6.5 or 7.5, and (c) repair measures and filled pauses do not distinguish between any of the proficiency levels. Using the results, we have proposed changes that can help refine the fluency rating descriptors and rater training materials in the TEEP.
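For readers unfamiliar with how such fluency measures are operationalized, the sketch below shows one plausible way to compute speed, breakdown, and repair measures from pause-annotated performance data and compare them across proficiency bands; the numbers, column names, and the univariate ANOVAs are illustrative assumptions, not the authors' PRAAT scripts or their MANOVA.

```python
# Illustration only: computes utterance fluency measures of the three kinds
# named in the abstract (speed, breakdown, repair) from pause-annotated data
# and compares them across proficiency bands with one-way ANOVAs. All values
# and field names are invented.
import pandas as pd
from scipy.stats import f_oneway

# One row per test-taker (two per band here, purely for illustration).
data = pd.DataFrame({
    "band":          [5.0, 5.0, 5.5, 5.5, 6.5, 6.5, 7.5, 7.5],
    "syllables":     [180, 195, 200, 210, 255, 260, 300, 310],
    "speaking_time": [52.0, 54.0, 53.0, 55.0, 57.0, 58.0, 60.0, 61.0],  # seconds
    "total_time":    [70.0, 71.0, 69.0, 70.0, 66.0, 65.0, 64.0, 63.0],  # seconds
    "silent_pauses": [16, 14, 13, 12, 8, 7, 6, 5],
    "repairs":       [6, 5, 6, 4, 4, 5, 3, 4],
})

# Speed fluency: articulation rate (syllables per second of speaking time).
data["articulation_rate"] = data["syllables"] / data["speaking_time"]
# Breakdown fluency: silent pauses per minute of total performance time.
data["silent_pauses_per_min"] = data["silent_pauses"] / (data["total_time"] / 60)
# Repair fluency: repairs per minute of total performance time.
data["repairs_per_min"] = data["repairs"] / (data["total_time"] / 60)

for measure in ["articulation_rate", "silent_pauses_per_min", "repairs_per_min"]:
    groups = [grp[measure].to_numpy() for _, grp in data.groupby("band")]
    f_stat, p_val = f_oneway(*groups)
    print(f"{measure}: F = {f_stat:.2f}, p = {p_val:.3f}")
```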
Citations: 0
The relationship among accent familiarity, shared L1, and comprehensibility: A path analysis perspective
IF 4.1 | Q1 (Literature) | LANGUAGE & LINGUISTICS | Pub Date: 2023-03-13 | DOI: 10.1177/02655322231156105
Yongzhi Miao
Scholars have argued for the inclusion of different spoken varieties of English in high-stakes listening tests to better represent the global use of English. However, doing so may introduce additional construct-irrelevant variance due to accent familiarity and the shared first language (L1) advantage, which could threaten test fairness. Yet it is unclear to what extent accent familiarity and a shared L1 are related to or conflated with each other. The present study investigates the relationship between accent familiarity, a shared L1, and comprehensibility. Results from descriptive statistics and a Mann–Whitney U test based on 302 second language (L2) English listeners’ responses to an online questionnaire suggested that a shared L1 meant high accent familiarity, but not vice versa. A path analysis revealed a complex relationship between accent familiarity, a shared L1, and comprehensibility. While a shared L1 had a direct effect on accent familiarity, and accent familiarity had a direct effect on comprehensibility, a shared L1 did not predict comprehensibility when accent familiarity was controlled for. These results disentangle accent familiarity from a shared L1. Researchers should consider both constructs when investigating fairness in relation to World Englishes for listening assessment.
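To make the reported group contrast concrete, the sketch below compares invented accent-familiarity ratings between listeners who share the speaker's L1 and those who do not, using the Mann–Whitney U test named in the abstract; it does not reproduce the study's path analysis.

```python
# Illustration only: compares accent-familiarity ratings between listeners
# who share the speaker's L1 and those who do not. Ratings are invented.
from scipy.stats import mannwhitneyu

# Hypothetical 1-6 familiarity ratings.
shared_l1    = [6, 5, 6, 6, 5, 6, 4, 6]   # listeners sharing the speaker's L1
no_shared_l1 = [2, 5, 3, 6, 1, 4, 2, 3]   # listeners with a different L1

u_stat, p_val = mannwhitneyu(shared_l1, no_shared_l1, alternative="greater")
print(f"U = {u_stat}, p = {p_val:.4f}")
# A significant one-sided result would be consistent with the finding that a
# shared L1 implies high familiarity, while the reverse need not hold.
```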
Citations: 2
Proficiency at the lexis–grammar interface: Comparing oral versus written French exam tasks
IF 4.1 | Q1 (Literature) | LANGUAGE & LINGUISTICS | Pub Date: 2023-03-07 | DOI: 10.1177/02655322231153543
Nathan Vandeweerd, Alex Housen, M. Paquot
This study investigates whether re-thinking the separation of lexis and grammar in language testing could lead to more valid inferences about proficiency across modes. As argued by Römer, typical scoring rubrics ignore important information about proficiency encoded at the lexis–grammar interface, in particular how the co-selection of lexical and grammatical features is mediated by communicative function. This is especially evident when assessing oral versus written exam tasks, where the modality of a task may intersect with register-induced variation in linguistic output. This article presents the results of an empirical study in which we measured the diversity and sophistication of four-word lexical bundles extracted from a corpus of French proficiency exams. Analysis revealed that the diversity of noun-based bundles was a significant predictor of written proficiency scores and the sophistication of verb-based bundles was a significant predictor of proficiency scores across both modes, suggesting that communicative function as well as the constraints of online planning mediated the effect of lexicogrammatical phenomena on proficiency scores. Importantly, lexicogrammatical measures were better predictors of proficiency than solely lexical-based measures, which speaks to the potential utility of considering lexicogrammatical competence on scoring rubrics.
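For concreteness, the sketch below extracts four-word lexical bundles (recurring 4-grams) from a toy corpus and computes a simple diversity proxy; the corpus, recurrence threshold, and measure are assumptions, not the authors' extraction or scoring procedure.

```python
# Illustration only: extracts four-word lexical bundles (recurring 4-grams)
# from a toy corpus. The corpus and the recurrence threshold are invented.
from collections import Counter

corpus = [
    "on the other hand the results of the study show that",
    "the results of the study show that on the other hand",
]

def four_grams(text):
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + 4]) for i in range(len(tokens) - 3)]

counts = Counter(g for sentence in corpus for g in four_grams(sentence))

# Keep only bundles that recur (here: appear at least twice across the corpus).
bundles = {g: n for g, n in counts.items() if n >= 2}

# Simple diversity proxy: distinct bundle types per total bundle tokens.
diversity = len(bundles) / max(sum(bundles.values()), 1)
print(bundles)
print(f"bundle type/token ratio: {diversity:.2f}")
```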
Citations: 1
Strategy use in a spoken dialog system–delivered paired discussion task: A stimulated recall study
IF 4.1 | Q1 (Literature) | LANGUAGE & LINGUISTICS | Pub Date: 2023-03-07 | DOI: 10.1177/02655322231152620
Nazlinur Gokturk, E. Chukharev-Hudilainen
With recent technological advances, researchers have begun to explore the potential use of spoken dialog systems (SDSs) for L2 oral communication assessment. While several studies support the feasibility of building these systems for various types of oral tasks, research on the construct validity of SDS-delivered tasks is still limited. Thus, this study examines the cognitive processes engaged by an SDS-delivered paired discussion task, focusing on strategic competence, an essential component of L2 oral communication ability. Thirty adult test-takers completed a paired discussion task with an SDS acting as an interlocutor and provided stimulated recalls about their strategy use in the task. Three trained raters independently evaluated the test-takers’ oral task responses using a holistic rating scale devised for the task. Findings revealed the use of six categories of construct-relevant strategies during task performance. While no statistically significant differences were found in the use of these categories between high- and low-ability test-takers, marked differences were observed in the use of individual strategies within the categories between the test-takers at the two levels. These findings provide insight into how test-takers at different ability levels cognitively interact with SDS-delivered paired discussion tasks and offer implications for the design and validation of such tasks.
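The group comparison described above could, for example, be run as a contingency-table test over strategy-category counts; the sketch below is a hypothetical illustration of that kind of analysis, with invented category labels and counts, and is not the authors' coding scheme or statistics.

```python
# Illustration only: compares strategy-category counts between higher- and
# lower-scoring test-takers with a chi-square test of independence. The six
# category labels and all counts are invented.
from scipy.stats import chi2_contingency

categories = ["planning", "monitoring", "clarifying", "paraphrasing",
              "turn-taking", "repairing"]
high_ability = [12, 9, 7, 10, 8, 6]
low_ability  = [10, 8, 9, 7, 9, 8]

chi2, p_val, dof, expected = chi2_contingency([high_ability, low_ability])
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p_val:.3f}")
# A non-significant result would mirror the reported absence of group-level
# differences in category use, even if individual strategies differ.
```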
Citations: 0
Investigating the impact of self-pacing on the L2 listening performance of young learner candidates with differing L1 literacy skills
IF 4.1 | Q1 (Literature) | LANGUAGE & LINGUISTICS | Pub Date: 2023-03-02 | DOI: 10.1177/02655322221149642
K. Eberharter, Judit Kormos, Elisa Guggenbichler, Viktoria S. Ebner, Shungo Suzuki, Doris Moser-Frötscher, Eva Konrad, B. Kremmel
In online environments, listening involves being able to pause or replay the recording as needed. Previous research indicates that control over the listening input could improve the measurement accuracy of listening assessment. Self-pacing also supports the second language (L2) comprehension processes of test-takers with specific learning difficulties (SpLDs) or, more specifically, of learners with reading-related learning difficulties who might have slower processing speed and limited working memory capacity. Our study examined how L1 literacy skills influence L2 listening performance in the standard single-listening and self-paced administration mode of the listening section of the Test of English as a Foreign Language (TOEFL) Junior Standard test. In a counterbalanced design, 139 Austrian learners of English completed 15 items in a standard single-listening condition and another 15 in a self-paced condition. L1 literacy skills were assessed via a standard reading, non-word reading, word-naming, and non-word repetition test. Generalized Linear Mixed-Effects Modelling revealed that self-pacing had no statistically significant effect on listening scores nor did it boost the performance of test-takers with lower L1 literacy scores indicative of reading-related SpLDs. The results indicate that young test-takers might require training in self-pacing or that self-paced conditions may need to be carefully implemented when they are offered to candidates with SpLDs.
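As a rough illustration of the modelling approach, the sketch below fits a linear mixed-effects model in statsmodels — a simpler stand-in for the generalized linear mixed-effects models reported in the study — with a condition-by-literacy interaction as fixed effects and a random intercept per student; the simulated data and all variable names are assumptions.

```python
# Illustration only: a linear mixed-effects model standing in for the GLMM
# reported in the study. Data, effect sizes, and variable names are invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_students, n_items = 40, 15
df = pd.DataFrame({
    "student":   np.repeat(np.arange(n_students), n_items * 2),
    "condition": np.tile(np.repeat(["single", "self_paced"], n_items), n_students),
    "literacy":  np.repeat(rng.normal(0, 1, n_students), n_items * 2),
})
# Simulated scores with no condition effect, echoing the reported null result
# for self-pacing, but a positive association with L1 literacy.
df["score"] = 0.6 + 0.1 * df["literacy"] + rng.normal(0, 0.15, len(df))

# Fixed effects: condition, literacy, and their interaction;
# random intercept for each student.
model = smf.mixedlm("score ~ condition * literacy", data=df, groups=df["student"])
result = model.fit()
print(result.summary())
```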
Citations: 1
Book Review: The sociology of assessment: Comparative and policy perspectives: The selected works of Patricia Broadfoot
IF 4.1 | Q1 (Literature) | LANGUAGE & LINGUISTICS | Pub Date: 2023-03-01 | DOI: 10.1177/02655322231158554
B. Deygers
Education is an emancipatory force in society, and centralized testing offers an objective way to select talented pupils, identify performant schools within an educational system, and compare educational systems on a global scale. Such is the traditional view of educational assessment. This view, however, is rooted in 19th-century positivistic thinking, is naïve in its belief in objective measurement and agnostic toward evidence to the contrary, so argues educational sociologist Patricia Broadfoot in her book The Sociology of Assessment. A collection of essays, chapters, and articles that span the esteemed educational sociologist’s career, this book is a testament to her interest in sociology and in comparative education. The volume has two central themes that weave its four sections together. First, it is a defense of comparative policy analysis. Broadfoot contends that education is a cultural project first and foremost and shows herself to be a fierce opponent of educational policies that serve a neoliberal agenda. Because education is embedded in a specific culture, comparative analysis helps to identify which aspects of an educational policy are context-specific and which are relatively constant across contexts. In other words, identifying idiosyncratic educational policies provokes questions about practices that may seem self-evident for people within a certain educational culture, but are not universal. One aspect of education that comparative analysis shows to be rather constant across systems is the central role of standardized testing as a driver of education. Exploring and critiquing the use of such tests as a driver of educational policy is the second central theme and the backbone of the book. The first section is the most conceptual and philosophical one. It establishes the core concepts of Broadfoot’s thinking and outlines what she sees as the primary functions of assessment: attesting competence, regulating competition and selection, determining and shaping educational content, and controlling educational quality. She also explains how these functions are linked to Durkheim’s work and zooms in on Weber, Bernstein, Bourdieu, Gramsci, and Foucault to lay the foundation of an argument that recurs throughout the book. This argument positions standardized educational testing as a […]
Citations: 1
Book Review: Looking Like a Language, Sounding Like a Race: Raciolinguistic Ideologies and the Learning of Latinidad
IF 4.1 | Q1 (Literature) | LANGUAGE & LINGUISTICS | Pub Date: 2023-02-15 | DOI: 10.1177/02655322221143928
K. Khan
{"title":"Book Review: Looking Like a Language, Sounding Like a Race: Raciolinguistic Ideologies and the Learning of Latinidad","authors":"K. Khan","doi":"10.1177/02655322221143928","DOIUrl":"https://doi.org/10.1177/02655322221143928","url":null,"abstract":"","PeriodicalId":17928,"journal":{"name":"Language Testing","volume":"40 1","pages":"457 - 460"},"PeriodicalIF":4.1,"publicationDate":"2023-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41375701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
L2 and L1 semantic context indices as automated measures of lexical sophistication
IF 4.1 | Q1 (Literature) | LANGUAGE & LINGUISTICS | Pub Date: 2023-02-02 | DOI: 10.1177/02655322221147924
Kátia Monteiro, S. Crossley, Robert-Mihai Botarleanu, M. Dascalu
Lexical frequency benchmarks have been extensively used to investigate second language (L2) lexical sophistication, especially in language assessment studies. However, indices based on semantic co-occurrence, which may be a better representation of the experience language users have with lexical items, have not been sufficiently tested as benchmarks of lexical sophistication. To address this gap, we developed and tested indices based on semantic co-occurrence from two computational methods, namely, Latent Semantic Analysis and Word2Vec. The indices were developed from one L2 written corpus (i.e., EF Cambridge Open Language Database [EF-CAMDAT]) and one first language (L1) written corpus (i.e., Corpus of Contemporary American English [COCA] Magazine). Available L1 semantic context indices (i.e., Touchstone Applied Sciences Associates [TASA] indices) were also assessed. To validate the indices, they were used to predict L2 essay quality scores as judged by human raters. The models suggested that the semantic context indices developed from EF-CAMDAT and TASA, but not the COCA Magazine indices, explained unique variance in the presence of lexical sophistication measures. This study suggests that semantic context indices based on multi-level corpora, including L2 corpora, may provide a useful representation of the experience L2 writers have with input, which may assist with automatic scoring of L2 writing.
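To illustrate what a semantic co-occurrence index might look like in practice, the sketch below trains a toy Word2Vec model with gensim and scores a word by its mean similarity to the words it co-occurs with; the corpus, parameters, and scoring function are assumptions and do not reproduce the EF-CAMDAT, COCA, or TASA indices developed in the study.

```python
# Illustration only: trains a small Word2Vec model on a toy corpus and scores
# a word by its mean cosine similarity to the other words in its sentence,
# a rough analogue of a semantic co-occurrence index. Corpus and parameters
# are invented.
from gensim.models import Word2Vec

corpus = [
    ["the", "students", "completed", "the", "writing", "task"],
    ["raters", "scored", "the", "essays", "for", "quality"],
    ["the", "essays", "were", "written", "by", "learners"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, seed=1)

def semantic_context_score(word, sentence, wv):
    """Mean similarity of `word` to the other in-vocabulary words around it."""
    neighbors = [w for w in sentence if w != word and w in wv]
    if not neighbors or word not in wv:
        return float("nan")
    return sum(wv.similarity(word, w) for w in neighbors) / len(neighbors)

print(semantic_context_score("essays", corpus[1], model.wv))
```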
Citations: 1
Universal tools activation in English language proficiency assessments: A comparison of Grades 1–12 English learners with and without disabilities
IF 4.1 | Q1 (Literature) | LANGUAGE & LINGUISTICS | Pub Date: 2023-02-02 | DOI: 10.1177/02655322221149009
Ahyoung Alicia Kim, Meltem Yumsek, J. Kemp, Mark Chapman, H. Gary Cook
English learners (ELs) comprise approximately 10% of kindergarten to Grade 12 students in US public schools, with about 15% of ELs identified as having disabilities. English language proficiency (ELP) assessments must adhere to universal design principles and incorporate universal tools, designed to increase accessibility for all ELs, including those with disabilities. This two-phase mixed methods study examined the extent to which Grades 1–12 ELs with and without disabilities activated universal tools during an online ELP assessment: Color Overlay, Color Contrast, Help Tools, Line Guide, Highlighter, Magnifier, and Sticky Notes. In Phase 1, analyses were conducted on 1.25 million students’ test and telemetry data (record of keystrokes and clicks). Phase 2 involved interviewing 55 ELs after test administration. Findings show that ELs activated the Line Guide, Highlighter, and Magnifier more frequently than others. The tool activation rate was higher in listening and reading domains than in speaking and writing. A significantly higher percentage of ELs with disabilities activated the tools than ELs without disabilities, but effect sizes were small; interview findings further revealed students’ rationale for tool use. Results indicate differences in ELs’ activation of universal tools depending on their disability category and language domain, providing evidence for the usefulness of these tools.
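As an illustration of how activation rates might be compared between the two groups at this scale, the sketch below runs a two-proportion z-test on invented counts; the figures are not the study's telemetry data.

```python
# Illustration only: compares the activation rate of one universal tool
# (e.g., the Line Guide) between ELs with and without disabilities using a
# two-proportion z-test. All counts are invented.
from statsmodels.stats.proportion import proportions_ztest

activated = [1800, 9500]        # students who activated the tool at least once
totals    = [15000, 110000]     # ELs with disabilities, ELs without disabilities

z_stat, p_val = proportions_ztest(count=activated, nobs=totals)
rates = [a / n for a, n in zip(activated, totals)]
print(f"rates: {rates[0]:.3f} vs {rates[1]:.3f}, z = {z_stat:.2f}, p = {p_val:.4f}")
# With samples this large, even a modest rate difference is significant,
# which is why the abstract also notes small effect sizes.
```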
Citations: 1
Linking scores from two written receptive English academic vocabulary tests—The VLT-Ac and the AVT
IF 4.1 | Q1 (Literature) | LANGUAGE & LINGUISTICS | Pub Date: 2023-01-12 | DOI: 10.1177/02655322221145643
Marcus Warnby, Hans Malmström, Kajsa Yang Hansen
The academic section of the Vocabulary Levels Test (VLT-Ac) and the Academic Vocabulary Test (AVT) both assess meaning-recognition knowledge of written receptive academic vocabulary, deemed central for engagement in academic activities. Depending on the purpose and context of the testing, either of the tests can be appropriate, but for research and pedagogical purposes, it is important to be able to compare scores achieved on the two tests between administrations and within similar contexts. Based on a sample of 385 upper secondary school students in university-preparatory programs (independent CEFR B2-level users of English), this study presents a comparison model by linking the VLT-Ac and the AVT using concurrent calibration procedures in Item Response Theory. The key outcome of the study is a score comparison table providing a means for approximate score comparisons. Additionally, the study showcases a viable and valid method of comparing vocabulary scores from an older test with those from a newer one.
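As a simplified illustration of what a score comparison table does, the sketch below links two simulated score distributions by matching percentile ranks (equipercentile-style linking). This is a deliberate simplification, not the IRT concurrent calibration the authors used, and the test lengths and score distributions are assumptions.

```python
# Illustration only: builds an approximate score comparison table by matching
# percentiles of two simulated score distributions (equipercentile-style
# linking). The distributions and assumed test lengths are invented.
import numpy as np

rng = np.random.default_rng(42)
vlt_ac_scores = np.clip(rng.normal(20, 4, 400).round(), 0, 30)   # test A (assumed 30 items)
avt_scores    = np.clip(rng.normal(38, 8, 400).round(), 0, 57)   # test B (assumed 57 items)

table = []
for vlt_score in range(0, 31, 5):
    pct = (vlt_ac_scores <= vlt_score).mean() * 100   # percentile rank on test A
    avt_equiv = np.percentile(avt_scores, pct)        # score at the same rank on test B
    table.append((vlt_score, round(float(avt_equiv), 1)))

for vlt, avt in table:
    print(f"VLT-Ac {vlt:>2} -> AVT {avt}")
```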
Citations: 1