
Latest publications in Language Testing

L2 and L1 semantic context indices as automated measures of lexical sophistication
IF 4.1 | CAS Tier 1 (Literature) | Q1 Arts and Humanities | Pub Date: 2023-02-02 | DOI: 10.1177/02655322221147924
Kátia Monteiro, S. Crossley, Robert-Mihai Botarleanu, M. Dascalu
Lexical frequency benchmarks have been extensively used to investigate second language (L2) lexical sophistication, especially in language assessment studies. However, indices based on semantic co-occurrence, which may be a better representation of the experience language users have with lexical items, have not been sufficiently tested as benchmarks of lexical sophistication. To address this gap, we developed and tested indices based on semantic co-occurrence from two computational methods, namely, Latent Semantic Analysis and Word2Vec. The indices were developed from one L2 written corpus (i.e., EF Cambridge Open Language Database [EF-CAMDAT]) and one first language (L1) written corpus (i.e., Corpus of Contemporary American English [COCA] Magazine). Available L1 semantic context indices (i.e., Touchstone Applied Sciences Associates [TASA] indices) were also assessed. To validate the indices, they were used to predict L2 essay quality scores as judged by human raters. The models suggested that the semantic context indices developed from EF-CAMDAT and TASA, but not the COCA Magazine indices, explained unique variance in the presence of lexical sophistication measures. This study suggests that semantic context indices based on multi-level corpora, including L2 corpora, may provide a useful representation of the experience L2 writers have with input, which may assist with automatic scoring of L2 writing.
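As an illustration of how an index of this kind might be computed, the sketch below trains a small Word2Vec model and scores a word by its mean cosine similarity to the words it co-occurs with. The toy corpus, window size, and exact index definition are illustrative assumptions, not the authors' procedure, which was developed on EF-CAMDAT and COCA Magazine.

```python
# Illustrative sketch (not the authors' exact procedure): score each word by its
# mean cosine similarity to the words it co-occurs with in a small corpus.
from gensim.models import Word2Vec

# Hypothetical toy corpus; the paper uses EF-CAMDAT and COCA Magazine.
sentences = [
    ["students", "write", "essays", "in", "class"],
    ["raters", "score", "essays", "for", "quality"],
    ["learners", "acquire", "vocabulary", "through", "input"],
    ["frequent", "words", "appear", "in", "many", "contexts"],
]

# Train a small skip-gram model on the corpus (gensim 4.x API).
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, seed=1)

def semantic_context_index(word, window=2):
    """Mean cosine similarity of `word` to its co-occurring words (one assumed
    operationalization of a semantic context index)."""
    sims = []
    for sent in sentences:
        for i, w in enumerate(sent):
            if w != word:
                continue
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            sims += [model.wv.similarity(word, c) for c in sent[lo:hi] if c != word]
    return sum(sims) / len(sims) if sims else float("nan")

print(round(semantic_context_index("essays"), 3))
```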
Citations: 1
Universal tools activation in English language proficiency assessments: A comparison of Grades 1–12 English learners with and without disabilities
IF 4.1 | CAS Tier 1 (Literature) | Q1 Arts and Humanities | Pub Date: 2023-02-02 | DOI: 10.1177/02655322221149009
Ahyoung Alicia Kim, Meltem Yumsek, J. Kemp, Mark Chapman, H. Gary Cook
English learners (ELs) comprise approximately 10% of kindergarten to Grade 12 students in US public schools, with about 15% of ELs identified as having disabilities. English language proficiency (ELP) assessments must adhere to universal design principles and incorporate universal tools, designed to increase accessibility for all ELs, including those with disabilities. This two-phase mixed methods study examined the extent to which Grades 1–12 ELs with and without disabilities activated universal tools during an online ELP assessment: Color Overlay, Color Contrast, Help Tools, Line Guide, Highlighter, Magnifier, and Sticky Notes. In Phase 1, analyses were conducted on 1.25 million students’ test and telemetry data (records of keystrokes and clicks). Phase 2 involved interviewing 55 ELs after test administration. Findings show that ELs activated the Line Guide, Highlighter, and Magnifier more frequently than the other tools. The tool activation rate was higher in the listening and reading domains than in speaking and writing. A significantly higher percentage of ELs with disabilities activated the tools than ELs without disabilities, but effect sizes were small; interview findings further revealed students’ rationale for tool use. Results indicate differences in ELs’ activation of universal tools depending on their disability category and language domain, providing evidence for the usefulness of these tools.
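As an illustration of the kind of group comparison reported here, the sketch below cross-tabulates invented activation counts by disability status and applies a chi-square test with Cramér's V as the effect size; the counts and the choice of test are assumptions for illustration, not the authors' analysis.

```python
# Illustrative sketch with invented counts: compare universal-tool activation
# between ELs with and without disabilities via chi-square, with Cramer's V
# as an effect-size check (the paper reports small effect sizes).
import numpy as np
from scipy.stats import chi2_contingency

# Rows: ELs with / without disabilities; columns: activated / did not activate.
table = np.array([
    [1200, 8800],   # with disabilities (hypothetical counts)
    [9000, 91000],  # without disabilities (hypothetical counts)
])

chi2, p, dof, expected = chi2_contingency(table)

n = table.sum()
k = min(table.shape)  # smaller table dimension
cramers_v = np.sqrt(chi2 / (n * (k - 1)))

rates = table[:, 0] / table.sum(axis=1)
print(f"activation rates: {rates.round(3)}")
print(f"chi2={chi2:.1f}, p={p:.2g}, Cramer's V={cramers_v:.3f}")
```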
Citations: 1
Linking scores from two written receptive English academic vocabulary tests—The VLT-Ac and the AVT
IF 4.1 | CAS Tier 1 (Literature) | Q1 Arts and Humanities | Pub Date: 2023-01-12 | DOI: 10.1177/02655322221145643
Marcus Warnby, Hans Malmström, Kajsa Yang Hansen
The academic section of the Vocabulary Levels Test (VLT-Ac) and the Academic Vocabulary Test (AVT) both assess meaning-recognition knowledge of written receptive academic vocabulary, deemed central for engagement in academic activities. Depending on the purpose and context of the testing, either of the tests can be appropriate, but for research and pedagogical purposes, it is important to be able to compare scores achieved on the two tests between administrations and within similar contexts. Based on a sample of 385 upper secondary school students in university-preparatory programs (independent CEFR B2-level users of English), this study presents a comparison model by linking the VLT-Ac and the AVT using concurrent calibration procedures in Item Response Theory. The key outcome of the study is a score comparison table providing a means for approximate score comparisons. Additionally, the study showcases a viable and valid method of comparing vocabulary scores from an older test with those from a newer one.
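The linking logic can be sketched in simplified form. Concurrent calibration places the items of both tests on one ability scale; each test then has a Rasch test characteristic curve (TCC), and a score comparison table can be read off by inverting one TCC to an ability estimate and evaluating the other. The item difficulties below are invented; the study's actual calibration was estimated from real response data.

```python
# Simplified illustration of score linking after concurrent calibration:
# invert test A's Rasch test characteristic curve (TCC) to an ability value,
# then read off the expected score on test B. Item difficulties are invented.
import numpy as np

def tcc(theta, b):
    """Rasch expected raw score at ability theta for items with difficulties b."""
    return (1 / (1 + np.exp(-(theta - b)))).sum()

b_A = np.linspace(-2.0, 2.0, 30)   # hypothetical difficulties, test A (VLT-Ac-like)
b_B = np.linspace(-1.5, 2.5, 40)   # hypothetical difficulties, test B (AVT-like)

thetas = np.linspace(-4, 4, 2001)
tcc_A = np.array([tcc(t, b_A) for t in thetas])

print("score A -> approx. score B")
for score_a in range(2, 29, 4):
    theta_hat = thetas[np.argmin(np.abs(tcc_A - score_a))]  # invert TCC numerically
    score_b = tcc(theta_hat, b_B)
    print(f"{score_a:7d} -> {score_b:6.1f}")
```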
Citations: 1
Measuring bilingual language dominance: An examination of the reliability of the Bilingual Language Profile
IF 4.1 | CAS Tier 1 (Literature) | Q1 Arts and Humanities | Pub Date: 2023-01-12 | DOI: 10.1177/02655322221139162
Daniel J. Olson
Measuring language dominance, broadly defined as the relative strength of each of a bilingual’s two languages, remains a crucial methodological issue in bilingualism research. While various methods have been proposed, the Bilingual Language Profile (BLP) has been one of the most widely used tools for measuring language dominance. While previous studies have begun to establish its validity, the BLP has yet to be systematically evaluated with respect to reliability. Addressing this methodological gap, the current study examines the reliability of the BLP, employing a test–retest methodology with a large (N = 248), varied sample of Spanish–English bilinguals. Analysis focuses on the test–retest reliability of the overall dominance score, the dominant and non-dominant global language scores, and the subcomponent scores. The results demonstrate that the language dominance score produced by the BLP shows “excellent” levels of test–retest reliability. In addition, while some differences were found between the reliability of global language scores for the dominant and non-dominant languages, and for the different subcomponent scores, all components of the BLP display strong reliability. Taken as a whole, this study provides evidence for the reliability of BLP as a measure of bilingual language dominance.
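As a generic illustration of the reliability statistics involved, the sketch below simulates two administrations of a dominance score for N = 248 and computes Pearson's r together with ICC(3,1) from the two-way ANOVA mean squares. The simulated data and the specific ICC variant are assumptions for illustration, not the authors' analysis.

```python
# Illustrative sketch with simulated data: test-retest reliability of a
# dominance score via Pearson's r and ICC(3,1) (two-way mixed, consistency).
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
true_dom = rng.normal(0, 50, 248)            # N = 248, as in the study
t1 = true_dom + rng.normal(0, 10, 248)       # session 1 scores (simulated)
t2 = true_dom + rng.normal(0, 10, 248)       # session 2 scores (simulated)

r, p = pearsonr(t1, t2)

# ICC(3,1) from the two-way ANOVA mean squares (Shrout & Fleiss notation).
scores = np.column_stack([t1, t2])           # n subjects x k sessions
n, k = scores.shape
grand = scores.mean()
ms_rows = k * ((scores.mean(axis=1) - grand) ** 2).sum() / (n - 1)
ss_err = ((scores - scores.mean(axis=1, keepdims=True)
                  - scores.mean(axis=0, keepdims=True) + grand) ** 2).sum()
ms_err = ss_err / ((n - 1) * (k - 1))
icc31 = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

print(f"Pearson r = {r:.3f}, ICC(3,1) = {icc31:.3f}")  # ~.90+ is often "excellent"
```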
Citations: 1
Book Review: Reflecting on the Common European Framework of Reference for Languages and its companion volume
IF 4.1 | CAS Tier 1 (Literature) | Q1 Arts and Humanities | Pub Date: 2023-01-04 | DOI: 10.1177/02655322221144788
Claudia Harsch
Citations: 0
Construct validity and fairness of an operational listening test with World Englishes
IF 4.1 | CAS Tier 1 (Literature) | Q1 Arts and Humanities | Pub Date: 2023-01-04 | DOI: 10.1177/02655322221137869
H. Nishizawa
In this study, I investigate the construct validity and fairness pertaining to the use of a variety of Englishes in listening test input. I obtained data from a post-entry English language placement test administered at a public university in the United States. In addition to American English, which test takers can be expected to find familiar, the test features Hawai’i, Filipino, and Indian English, which are likely less familiar to our test takers but are justified by the context. I used confirmatory factor analysis to test whether the category of unfamiliar English items formed a latent factor distinct from the other category of more familiar American English items. I used Rasch-based differential item functioning analysis to examine item biases as a function of examinees’ place of origin. The results from the confirmatory factor analysis suggested that the unfamiliar English items tapped into the same underlying construct as the familiar English items. The Rasch-based differential item functioning analysis revealed many instances of item bias among unfamiliar English items, with higher proportions of item bias for items targeting narrow comprehension than for those targeting broad comprehension. However, at the test level, the unfamiliar English items did not substantially influence raw total scores. These findings offer support for using a variety of Englishes in listening tests.
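The study's DIF analysis is Rasch-based; the sketch below instead illustrates the closely related logistic-regression DIF procedure, which flags an item when group membership (or its interaction with ability) predicts the item response beyond a matching total score. All data are simulated and the variable names are hypothetical.

```python
# Illustrative sketch with simulated data: logistic-regression DIF, a common
# alternative to the Rasch-based DIF used in the study. Uniform DIF shows up
# as a significant `group` coefficient after controlling for the total score.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2000
ability = rng.normal(0, 1, n)
group = rng.integers(0, 2, n)               # 0/1: two places of origin (hypothetical)

# Simulate one item with uniform DIF: harder for group 1 at equal ability.
logit = ability - 0.2 - 0.6 * group
y = rng.random(n) < 1 / (1 + np.exp(-logit))

total = ability + rng.normal(0, 0.5, n)     # stand-in for the matching score

X = sm.add_constant(np.column_stack([total, group, total * group]))
fit = sm.Logit(y.astype(float), X).fit(disp=0)

print(fit.params.round(3))   # [const, total, group, total*group]
print(fit.pvalues.round(4))  # small p for `group` -> uniform DIF flagged
```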
Citations: 1
Test design and validity evidence of interactive speaking assessment in the era of emerging technologies
IF 4.1 | CAS Tier 1 (Literature) | Q1 Arts and Humanities | Pub Date: 2023-01-01 | DOI: 10.1177/02655322221126606
Soo Jung Youn
As access to smartphones and emerging technologies has become ubiquitous in our daily lives and in language learning, technology-mediated social interaction has become common in teaching and assessing L2 speaking. The changing ecology of L2 spoken interaction provides language educators and testers with opportunities for renewed test design and the gathering of context-sensitive validity evidence of interactive speaking assessment. First, I review the current research on interactive speaking assessment focusing on commonly used test formats and types of validity evidence. Second, I discuss recent research that reports the use of artificial intelligence and technologies in teaching and assessing speaking in order to understand how and what evidence of interactive speaking is elicited. Based on the discussion, I argue that it is critical to identify what features of interactive speaking are elicited depending on the types of technology-mediated interaction for valid assessment decisions in relation to intended uses. I further discuss opportunities and challenges for future research on test design and eliciting validity evidence of interactive speaking using technology-mediated interaction.
Citations: 0
The vexing problem of validity and the future of second language assessment
IF 4.1 | CAS Tier 1 (Literature) | Q1 Arts and Humanities | Pub Date: 2023-01-01 | DOI: 10.1177/02655322221125204
Vahid Aryadoust
Construct validity and building validity arguments are some of the main challenges facing the language assessment community. The notion of construct validity and validity arguments arose from research in psychological assessment and developed into the gold standard of validation/validity research in language assessment. At a theoretical level, construct validity and validity arguments conflate the scientific reasoning in assessment and policy matters of ethics. Thus, a test validator is expected to simultaneously serve the role of conducting scientific research and examining the consequential basis of assessments. I contend that validity investigations should be decoupled from the ethical and social aspects of assessment. In addition, the near-exclusive focus of empirical construct validity research on cognitive processing has not resulted in sufficient accuracy and replicability in predicting test takers’ performance in real language use domains. Accordingly, I underscore the significance of prediction in validation, in contrast to explanation, and propose that the question to ask might not so much be about what a test measures as what type of methods and tools can better generate language use profiles. Finally, I suggest that interdisciplinary alliances with cognitive and computational neuroscience and artificial intelligence (AI) fields should be forged to meet the demands of language assessment in the 21st century.
Citations: 5
Forty years of Language Testing, and the changing paths of publishing
IF 4.1 | CAS Tier 1 (Literature) | Q1 Arts and Humanities | Pub Date: 2023-01-01 | DOI: 10.1177/02655322221136802
Paula M. Winke
{"title":"Forty years of Language Testing, and the changing paths of publishing","authors":"Paula M. Winke","doi":"10.1177/02655322221136802","DOIUrl":"https://doi.org/10.1177/02655322221136802","url":null,"abstract":"","PeriodicalId":17928,"journal":{"name":"Language Testing","volume":null,"pages":null},"PeriodicalIF":4.1,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46083663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Epilogue—Note from an outgoing editor
IF 4.1 | CAS Tier 1 (Literature) | Q1 Arts and Humanities | Pub Date: 2023-01-01 | DOI: 10.1177/02655322221138339
L. Harding
In this brief epilogue, outgoing editor Luke Harding reflects on his time as editor and considers the future of Language Testing.
Citations: 0