Knowledge of multiword expressions (MWEs) is key to learning a foreign language, but teaching it remains hindered by the lack of lists of expressions tied to pedagogical aims. In this paper, we present an extended version of the PolylexFLE database, containing 4,525 French MWEs of three types: idioms, collocations, and fixed expressions. In order to propose exercises that follow the difficulty scale of the Common European Framework of Reference for Languages (CEFR), we used a mixed approach (manual and automatic) to annotate 1,186 expressions with CEFR levels. The paper focuses mostly on the automatic procedure, which first identifies the expressions from the PolylexFLE database (and their variants) in a corpus of pedagogical texts (with CEFR labels) using a pattern-based system. In a second step, their distribution in this corpus is estimated and transformed into a single CEFR level. Finally, the proposed automatic approach is evaluated by 52 learners of French as a foreign language.
{"title":"PolylexFLE","authors":"A. Todirascu, T. François, Marion Cargill","doi":"10.1075/itl.22031.tod","DOIUrl":"https://doi.org/10.1075/itl.22031.tod","PeriodicalId":510772,"journal":{"name":"ITL - International Journal of Applied Linguistics","volume":"19 13"},"publicationDate":"2024-04-05","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"140739170"}
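The two-step automatic procedure described in the PolylexFLE abstract above (pattern-based identification in CEFR-labelled texts, then reduction of the distribution to a single level) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy corpus, the regular expressions, and in particular the rule that picks the lowest level at which the cumulative share of occurrences reaches a threshold. The abstract does not state the authors' actual patterns or transformation rule.

```python
import re
from collections import Counter

CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Hypothetical toy corpus: (CEFR label of the text, text).
CORPUS = [
    ("A2", "Il fait beau aujourd'hui, alors nous allons faire du vélo."),
    ("B1", "Elle a pris une décision difficile hier."),
    ("B1", "Ils prennent la décision ensemble."),
    ("B2", "Il a fini par prendre la décision qui s'imposait."),
]

# Hypothetical patterns: one regex per expression, covering a few variants.
PATTERNS = {
    "prendre une décision": re.compile(
        r"\b(?:prendre|prend|prends|prennent|pris)\s+(?:une|la|cette)\s+décision\b",
        re.IGNORECASE,
    ),
    "faire du vélo": re.compile(
        r"\b(?:faire|fait|font|fais)\s+du\s+vélo\b", re.IGNORECASE
    ),
}


def level_distribution(pattern, corpus):
    """Count matches of one expression pattern in the texts of each CEFR level."""
    counts = Counter()
    for level, text in corpus:
        counts[level] += len(pattern.findall(text))
    return [counts[level] for level in CEFR_LEVELS]


def assign_level(distribution, threshold=0.5):
    """Assumed rule: lowest level whose cumulative share reaches the threshold."""
    total = sum(distribution)
    if total == 0:
        return None  # expression never attested in the corpus
    cumulative = 0
    for level, count in zip(CEFR_LEVELS, distribution):
        cumulative += count
        if cumulative / total >= threshold:
            return level
    return None


if __name__ == "__main__":
    for expression, pattern in PATTERNS.items():
        dist = level_distribution(pattern, CORPUS)
        print(expression, dist, assign_level(dist))
```

A real system would also normalise counts by the amount of text available at each level; the sketch skips this to keep the two steps easy to follow.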
The study aims to demonstrate the procedure for constructing the CEFR-based Sentence Profile (CEFR-SP), a dataset of sentences annotated with CEFR levels, and to identify the characteristics of each level. Basic statistics such as word length and sentence length are presented for each CEFR level for 7,511 carefully selected sentences, and statistical tests are conducted between adjacent levels to identify criterial features. The findings reveal significant differences in word length between adjacent levels, while word difficulty does not significantly discriminate levels at either end of the scale (A1–A2, C1–C2). Sentence length and depth are also not significant discriminators for higher levels (B2–C1, C1–C2). Notably, sentence-level data generally discriminate between levels better than text-level statistics, indicating that they capture the characteristics of each CEFR level more directly.
{"title":"Profiling English sentences based on CEFR levels","authors":"Satoru Uchida, Yuki Arase, Tomoyuki Kajiwara","doi":"10.1075/itl.22018.uch","DOIUrl":"https://doi.org/10.1075/itl.22018.uch","PeriodicalId":510772,"journal":{"name":"ITL - International Journal of Applied Linguistics","volume":" 7"},"publicationDate":"2024-03-22","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"140217807"}
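As an illustration of testing a feature between adjacent CEFR levels, as the CEFR-SP abstract above describes, the sketch below runs a two-sided permutation test on mean sentence length. The choice of test is an assumption (the abstract does not name the tests used), and the sentence lengths are invented toy values, not data from CEFR-SP.

```python
import random
from statistics import mean


def permutation_test(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the absolute difference in means."""
    rng = random.Random(seed)  # seeded for reproducibility
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical sentence lengths (in words) per level, for illustration only.
SENTENCE_LENGTHS = {
    "A1": [4, 5, 5, 6, 4, 5, 6, 5],
    "A2": [9, 10, 11, 9, 10, 12, 11, 10],
    "B1": [10, 11, 9, 12, 10, 11, 10, 9],
}

if __name__ == "__main__":
    levels = list(SENTENCE_LENGTHS)
    for lower, upper in zip(levels, levels[1:]):
        p = permutation_test(SENTENCE_LENGTHS[lower], SENTENCE_LENGTHS[upper])
        print(f"{lower}-{upper}: p = {p:.4f}")
```

A feature would count as a candidate criterial feature for a pair of adjacent levels only when the test separates them; with several features and level pairs, a correction for multiple comparisons would normally be applied.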
Rocío Cuberos Vicente, E. Rosado Villegas, Iban Mañas Navarrete
Collocations have become increasingly important in our understanding of foreign language learning. When it comes to setting vocabulary learning goals, concerns about how to address collocations still arise today. This article explores the distribution of collocations in L1 and L2 Spanish production with the ultimate goal of informing the design of graded lexical inventories of multi-word combinations. To do so, we explore three defining properties of collocations in L1 and L2 production data, and across different levels of L2 proficiency: syntactic structure, semantic transparency, and strength of association. Results indicate that there is an increase in collocational density and diversity, but that isolated features of collocations fail to predict L2 proficiency. Findings suggest the need to evaluate collocation use at a high level of granularity.
{"title":"Towards a graded lexical inventory of multi-word combinations","authors":"Rocío Cuberos Vicente, E. Rosado Villegas, Iban Mañas Navarrete","doi":"10.1075/itl.22021.cub","DOIUrl":"https://doi.org/10.1075/itl.22021.cub","PeriodicalId":510772,"journal":{"name":"ITL - International Journal of Applied Linguistics","volume":"47 17"},"publicationDate":"2024-02-26","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"140431236"}
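The "strength of association" property mentioned in the abstract above is commonly quantified with measures such as pointwise mutual information (PMI). The sketch below computes PMI from raw corpus counts. It is offered only as an example of such a measure: the abstract does not say which association measure the authors used, and the counts are invented.

```python
import math


def pmi(pair_count, first_count, second_count, corpus_size):
    """Pointwise mutual information: log2( p(x, y) / (p(x) * p(y)) ).

    With all probabilities estimated as count / corpus_size, the ratio
    simplifies to pair_count * corpus_size / (first_count * second_count).
    """
    if min(pair_count, first_count, second_count, corpus_size) <= 0:
        raise ValueError("all counts must be positive")
    return math.log2(pair_count * corpus_size / (first_count * second_count))


if __name__ == "__main__":
    # Hypothetical counts for a verb + noun pair in a 10,000-token corpus.
    score = pmi(pair_count=10, first_count=100, second_count=50, corpus_size=10_000)
    print(f"PMI = {score:.2f}")
```

A positive score means the two words co-occur more often than expected if they were independent. PMI is known to favour rare pairs, so it is often combined with a minimum frequency cut-off.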
Elena Volodina, Yousuf Ali Mohammed, Therese Lindström Tiedemann
The article introduces a novel lexical resource for Swedish based on word family principles. The development of the Swedish Word Family (SweWF) resource is set in the context of linguistic complexity in second language acquisition. The SweWF is particularly suited to this purpose, given that it contains lexical items used in second language corpora, namely a corpus of coursebook texts and a corpus of learner essays. The main focus of the article is on the construction of the resource with its user interface and on its applicability for research, although it also opens up vast possibilities for practical applications in language learning, testing and assessment. We demonstrate the value of the resource through several case studies.
{"title":"Swedish word family resource","authors":"Elena Volodina, Yousuf Ali Mohammed, Therese Lindström Tiedemann","doi":"10.1075/itl.22026.vol","DOIUrl":"https://doi.org/10.1075/itl.22026.vol","PeriodicalId":510772,"journal":{"name":"ITL - International Journal of Applied Linguistics","volume":"216 S685"},"publicationDate":"2024-02-26","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"140428213"}
Michael Flor, Steven Holtzman, Paul Deane, Isaac Bejar
We describe a large-scale effort to map English-language vocabulary by U.S. school grade levels. Our motivation is to rapidly expand graded vocabulary resources for work with native English speakers in the USA, while taking into consideration school-related influences rather than relying solely on corpus-frequency approaches. We report on the initial data collection effort, which mapped about 22K word forms. We compare this mapping to some other recent vocabulary mapping efforts, such as age-of-acquisition. We then describe the efforts to automatically expand this resource by using linguistically motivated variables and corpus-based methods. Our current resource maps more than 126K English word forms to U.S. school grade levels. We also compare a subset of our L1 mapped data to English L2 vocabulary levels, as expressed on the CEFR scale, and find that there is a considerable overlap in the order of vocabulary learning in L1 and L2 English.
{"title":"Mapping of American English vocabulary by grade levels","authors":"Michael Flor, Steven Holtzman, Paul Deane, Isaac Bejar","doi":"10.1075/itl.22025.flo","DOIUrl":"https://doi.org/10.1075/itl.22025.flo","PeriodicalId":510772,"journal":{"name":"ITL - International Journal of Applied Linguistics","volume":"204 S619"},"publicationDate":"2024-02-26","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"140428283"}
This study investigated the extent to which two recall test formats – contextualized and decontextualized tests – affected productive recall of derivatives, and how the effects of token frequencies of derivatives and L2 receptive vocabulary knowledge on recalling derivatives were moderated by test format. Mixed effects logistic regression models examined the derivatives elicited from L1 (n = 21) and L2 (n = 107) English speakers on the two recall tests. Results indicated that contextual cues significantly facilitated recalling derivatives, and that these facilitative effects were larger for native speakers and for L2 learners with greater vocabulary knowledge. Furthermore, token frequency affected responses on the decontextualized test to a greater degree than on the contextualized test. Results suggest that test format influences test-takers’ ability to recall knowledge to produce derivatives.
{"title":"The effect of test format on productive recall of derivatives","authors":"Emi Iwaizumi, Stuart Webb","doi":"10.1075/itl.23002.iwa","DOIUrl":"https://doi.org/10.1075/itl.23002.iwa","PeriodicalId":510772,"journal":{"name":"ITL - International Journal of Applied Linguistics","volume":"38 2"},"publicationDate":"2023-12-12","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"139182859"}