Categorising speakers’ language background: Theoretical assumptions and methodological challenges for learner corpus research

Research Methods in Applied Linguistics. Pub Date: 2025-04-01. Epub Date: 2024-11-25. DOI: 10.1016/j.rmal.2024.100170

Olga Lopopolo, Arianna Bienati, Jennifer-Carmen Frey, Aivars Glaznieks, Stefania Spina

Volume 4, Issue 1, Article 100170. Full text: https://www.sciencedirect.com/science/article/pii/S2772766124000764


Abstract

In this article, we investigate how speakers can be categorised by language background in Learner Corpus Research (LCR). Specifically, we discuss three key aspects: first, the theoretical assumptions and methodological choices made in learner corpus design; second, the integration of a holistic perspective on speaker categorisation in LCR; and third, the consequences that different categorisations might have for study outcomes. Through a comprehensive review of corpora used in the field, we identify the most common terms, definitions and categorisation criteria used to describe a speaker's language background. Focusing on the most central metadata encoding language background, the L1 metadata, we inspect the different operationalisations in use and scrutinise the theoretical assumptions underlying them. Drawing on research on plurilingualism, we propose a holistic view of speakers' language backgrounds for Learner Corpus Research, combining various aspects of speakers' language use with methods inspired by the Dominant Language Constellation framework. We apply this methodology to re-evaluate the language categorisation system in LEONIDE, a multilingual corpus of Italian, German and English texts from secondary school students of diverse language backgrounds. We use the same corpus to evaluate how different categorisations of the students affect the outcomes of possible linguistic studies. Despite a generally high overlap between study results across categorisations, we observe that variables combining multiple aspects of the speakers' language backgrounds seem to explain group differences for more of the linguistic features investigated.

