
International Journal of Learner Corpus Research: Latest Articles

Announcing changes to our editorial team and editorial board
IF 1.1 Pub Date : 2021-10-11 DOI: 10.1075/ijlcr.00020.edi
Citations: 0
fsca
IF 1.1 Pub Date : 2021-10-11 DOI: 10.1075/ijlcr.20018.van
Nathan Vandeweerd
This article reports on an open-source R package for the extraction of syntactic units from dependency-parsed French texts. To evaluate the reliability of the package, syntactic units were extracted from a corpus of L2 French and were compared to units extracted manually from the same corpus. The f-score of the extracted units ranged from 0.53–0.97. Although units were not always identical between the two methods, manual and automatically-derived syntactic complexity measures were strongly and significantly correlated (ρ = 0.62–0.97, p < 0.001), suggesting that this package may be a suitable replacement for manual annotation in some cases where manual annotation is not possible but that care should be used in interpreting the measures based on these units.
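The abstract does not say how extracted units were matched against the manual ones, so the following is only a minimal sketch of how an f-score for one unit type could be computed. It assumes exact string matching of units; the function name `prf` and the example clauses are hypothetical and are not taken from the package or its corpus.

```python
from collections import Counter

def prf(auto_units, manual_units):
    """Precision, recall and F1 of automatically extracted units
    against manually extracted ones.

    Assumption: two units match only if their strings are identical;
    repeated units are matched as a multiset.
    """
    auto, gold = Counter(auto_units), Counter(manual_units)
    matched = sum((auto & gold).values())  # units found by both methods
    precision = matched / sum(auto.values()) if auto else 0.0
    recall = matched / sum(gold.values()) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1

# Hypothetical clause segmentations of one learner sentence
manual = ["je pense", "que le film est bon", "parce qu'il est drôle"]
auto = ["je pense", "que le film est bon parce qu'il est drôle"]
p, r, f = prf(auto, manual)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Under exact matching, a single boundary error costs both precision and recall, which is one way units can differ between methods while text-level complexity measures still correlate.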
Citations: 3
The interphonology of contemporary English corpus (IPCE-IPAC)
IF 1.1 Pub Date : 2021-10-11 DOI: 10.1075/ijlcr.20010.her
Nadine Herry-Bénit, Stéphanie Lopez, Takeki Kamiyama, J. Tennant
This article presents the IPCE-IPAC corpus, an ongoing project, which has been collected in France, Italy, Spain and China since 2014. The data is collected to investigate the acquisition of segmental and suprasegmental phenomena by L2 learners of English, with a focus on phonemes. The article discusses the methods for the compilation of this original spoken learner corpus, designed to study L2 “interphonology” (Detey, Racine, Kawaguchi, & Zay, 2016), or interlanguage phonology.
Citations: 2
Applying phraseological complexity measures to L2 French
IF 1.1 Pub Date : 2021-10-11 DOI: 10.1075/ijlcr.20015.van
Nathan Vandeweerd, Alex Housen, M. Paquot
This study partially replicates Paquot’s (2018, 2019) study of phraseological complexity in L2 English by investigating how phraseological complexity compares across proficiency levels as well as how phraseological complexity measures relate to lexical, syntactic and morphological complexity measures in a corpus of L2 French argumentative essays. Phraseological complexity is operationalized as the diversity (root type-token ratio; RTTR) and sophistication (pointwise mutual information; PMI) of three types of grammatical dependencies: adjectival modifiers, adverbial modifiers and direct objects. Results reveal a significant increase in the mean PMI of direct objects and the RTTR of adjectival modifiers across proficiency levels. In addition to phraseological sophistication, important predictors of proficiency include measures of lexical diversity, lexical sophistication, syntactic (phrasal) complexity and morphological complexity. The results provide cross-linguistic validation for the results of Paquot (2018, 2019) and further highlight the importance of including phraseological measures in the current repertoire of L2 complexity measures.
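The two operationalizations named in the abstract can be illustrated with a short sketch. RTTR is the number of types divided by the square root of the number of tokens, and PMI is log2 of the observed probability of a pair over the product of the probabilities of its parts. The function names, the toy reference counts and the (head, dependent) pair format below are assumptions for illustration; the abstract does not describe the reference corpus or the exact implementation.

```python
import math
from collections import Counter

def rttr(items):
    """Root type-token ratio: number of types / sqrt(number of tokens)."""
    return len(set(items)) / math.sqrt(len(items)) if items else 0.0

def pmi(pair, pair_counts, head_counts, dep_counts):
    """Pointwise mutual information (base 2) of one (head, dependent) pair,
    estimated from counts in a reference corpus."""
    n = sum(pair_counts.values())
    p_xy = pair_counts[pair] / n
    p_x = head_counts[pair[0]] / n
    p_y = dep_counts[pair[1]] / n
    return math.log2(p_xy / (p_x * p_y))

# Hypothetical reference counts for adjectival modifiers (noun, adjective)
reference = ([("rôle", "important")] * 2
             + [("rôle", "grand")] * 2
             + [("maison", "grand")] * 4)
pair_counts = Counter(reference)
head_counts = Counter(head for head, _ in reference)
dep_counts = Counter(dep for _, dep in reference)

# Hypothetical pairs extracted from one learner text
learner = [("rôle", "important"), ("rôle", "important"),
           ("maison", "grand"), ("rôle", "grand")]

diversity = rttr(learner)
mean_pmi = sum(pmi(p, pair_counts, head_counts, dep_counts)
               for p in learner) / len(learner)
print(diversity, mean_pmi)
```

This sketch assumes every learner pair also occurs in the reference counts; a real pipeline would need a policy for unseen pairs, which the abstract does not specify.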
Citations: 5
Lexical diversity in an L2 Spanish learner corpus
IF 1.1 Pub Date : 2021-10-11 DOI: 10.1075/ijlcr.20017.fer
Paloma Fernández-Mira, Emily Morgan, Sam Davidson, Aaron Yamada, Agustina Carando, Kenji Sagae, C. Sánchez-Gutiérrez
This study examines the impact of two topic-related variables (i.e., valence polarity and everyday-life closeness) on the lexical diversity scores (i.e., MTLD) of learners of L2 Spanish at different proficiency levels. The analysis included 3,045 texts written in response to two pairs of prompts by 1,165 students enrolled in an L2 Spanish program. The first pair of prompts asked learners to narrate an event: prompt 1 focused on a perfect vacation (positive event), while prompt 2 asked participants to tell a terrible story (negative event). The second pair asked to describe a person: prompt 1 required that the subject be famous, thus not close to the writer, whereas prompt 2 required that the subject be special and close to the writer. Results indicate that lexical diversity scores were higher for the texts written about the positive event and the famous subject across all proficiency levels.
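For readers unfamiliar with MTLD (measure of textual lexical diversity), the sketch below shows the general idea: count how many times the running type-token ratio drops below a threshold, add a partial factor for the remainder, divide the token count by the factor count, and average a forward and a backward pass. The threshold of 0.72 is the conventional default; the strict `<` comparison, the handling of texts with no factors, and the function names are assumptions, and the study's own tooling may differ.

```python
def _mtld_pass(tokens, threshold=0.72):
    """One directional MTLD pass over a list of tokens."""
    factors = 0.0
    types, count = set(), 0
    for tok in tokens:
        types.add(tok)
        count += 1
        if len(types) / count < threshold:
            # TTR fell below the threshold: close a full factor and restart
            factors += 1
            types, count = set(), 0
    if count:
        # Partial factor for the leftover segment
        ttr = len(types) / count
        factors += (1 - ttr) / (1 - threshold)
    # Assumption: a text whose TTR never drops has no finite score
    return len(tokens) / factors if factors else float("inf")

def mtld(tokens, threshold=0.72):
    """Mean of the forward and backward passes."""
    forward = _mtld_pass(tokens, threshold)
    backward = _mtld_pass(tokens[::-1], threshold)
    return (forward + backward) / 2

# Hypothetical tokenised learner sentence
text = "mis vacaciones perfectas son en la playa con mi familia y mis amigos".split()
print(mtld(text))
```

Because the score depends on how quickly words repeat, prompts that invite a narrower vocabulary can lower it regardless of proficiency, which is the kind of topic effect the study examines.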
Citations: 4
Review of Schilk (2020): Language Processing in Advanced Learners of English: A Multi-method Approach to Collocation Based on Corpus Linguistics and Experimental Data
IF 1.1 Pub Date : 2021-10-11 DOI: 10.1075/ijlcr.00021.gar
J. Garner
Citations: 0
Review of Götz & Mukherjee (2019): Learner Corpora and Language Teaching
IF 1.1 Pub Date : 2021-10-11 DOI: 10.1075/ijlcr.00022.ran
T. Rankin
Citations: 0
Natural language processing for learner corpus research
IF 1.1 Pub Date : 2021-02-15 DOI: 10.1075/ijlcr.00019.int
K. Kyle
Citations: 7
Machine learning for learner English
IF 1.1 Pub Date : 2020-04-14 DOI: 10.1075/ijlcr.18012.bal
Nicolas Ballier, S. Canu, C. Petitjean, G. Gasso, C. Balhana, T. Alexopoulou, Thomas Gaillat
This paper discusses machine learning techniques for the prediction of Common European Framework of Reference (CEFR) levels in a learner corpus. We summarise the CAp 2018 Machine Learning (ML) competition, a classification task of the six CEFR levels, which map linguistic competence in a foreign language onto six reference levels. The goal of this competition was to produce a machine learning system to predict learners’ competence levels from written productions comprising between 20 and 300 words and a set of characteristics computed for each text extracted from the French component of the EFCAMDAT data (Geertzen et al., 2013). Together with the description of the competition, we provide an analysis of the results and methods proposed by the participants and discuss the benefits of this kind of competition for the learner corpus research (LCR) community. The main findings address the methods used and lexical bias introduced by the task.
Citations: 8
Corpus-based Approaches to Spoken L2 Production
IF 1.1 Pub Date : 2019-09-24 DOI: 10.1075/ijlcr.00008.int
V. Brezina, Dana Gablasova, Tony McEnery
From the perspective of the compilers, a corpus is a journey. This particular journey – the process of the design and compilation of the Trinity Lancaster Corpus (TLC), the largest spoken learner corpus of (interactive) English to date – took over five years. It involved more than 3,500 hours of transcription time with many more hours spent on quality checking and post-processing of the data. This simple statistic shows why learner corpora of spoken language are still relatively rare, despite the fact that they provide a unique insight into spontaneous language production (McEnery, Brezina, Gablasova & Banerjee 2019). While the advances in computational technology allow better data processing and more efficient analysis, the starting point of a spoken (learner) corpus is still the recording of speech and its manual transcription. This method is considerably more reliable in capturing the details of spoken language than any existing voice recognition system. This is true for spoken L1 (McEnery 2018) as well as spoken L2 data (Gilquin 2015). The difference between the performance of an experienced transcriber and a state-of-the-art automated system is immediately obvious from the comparison shown in Table 1. For meaningful linguistic analysis, only the sample transcript shown on the left (from the TLC) is suitable as it represents an accurate account of the spoken production. Building a spoken learner corpus is thus a resource-intensive project. The compilation of the TLC was made possible by research collaboration between Lancaster University and Trinity College London, a major international testing board. The project was supported by the Economic and Social Research Council (ESRC) and Trinity College London.
Citations: 6