
Latest publications in Applied Corpus Linguistics

AI-generated vs human-authored texts: A multidimensional comparison
Pub Date : 2023-12-20 DOI: 10.1016/j.acorp.2023.100083
Tony Berber Sardinha

The goal of this study is to assess the degree of resemblance between texts generated by artificial intelligence (GPT) and (written and spoken) texts produced by human individuals in real-world settings. A comparative analysis was conducted along the five main dimensions of variation that Biber (1988) identified. The findings revealed significant disparities between AI-generated and human-authored texts, with the AI-generated texts generally failing to exhibit resemblance to their human counterparts. Furthermore, a linear discriminant analysis, performed to measure the predictive potential of dimension scores for identifying the authorship of texts, demonstrated that AI-generated texts could be identified with relative ease based on their multidimensional profile. Collectively, the results underscore the current limitations of AI text generation in emulating natural human communication. This finding counters popular fears that AI will replace humans in textual communication. Rather, our findings suggest that, at present, AI's ability to capture the intricate patterns of natural language remains limited.
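The discriminant step described above can be sketched as a two-class Fisher discriminant over dimension scores. This is a minimal illustration, not the study's actual procedure or data: the dimension scores below are invented toy values.

```python
import numpy as np

# Toy dimension scores (rows = texts, cols = Biber's five dimensions).
# These numbers are invented for illustration only.
human = np.array([[1.2, -0.5, 0.3, 0.8, -0.2],
                  [0.9, -0.8, 0.5, 1.1, 0.1],
                  [1.4, -0.2, 0.2, 0.6, -0.4]])
ai    = np.array([[-0.7, 1.1, -0.3, -0.9, 0.6],
                  [-0.5, 0.9, -0.6, -1.2, 0.4],
                  [-0.9, 1.3, -0.1, -0.8, 0.7]])

def fisher_lda(a, b):
    """Two-class Fisher discriminant: weight vector and decision midpoint."""
    mu_a, mu_b = a.mean(0), b.mean(0)
    # Pooled within-class scatter; small ridge keeps the solve stable.
    sw = np.cov(a, rowvar=False) + np.cov(b, rowvar=False) + 1e-6 * np.eye(a.shape[1])
    w = np.linalg.solve(sw, mu_a - mu_b)
    midpoint = ((a @ w).mean() + (b @ w).mean()) / 2
    return w, midpoint

w, mid = fisher_lda(human, ai)
# Texts projecting above the midpoint classify as human, below as AI.
pred_human = (human @ w) > mid
pred_ai = (ai @ w) > mid
print(pred_human.all(), (~pred_ai).all())  # True True on this toy data
```

With well-separated multidimensional profiles like these, the discriminant recovers authorship perfectly, mirroring the paper's finding that AI-generated texts "could be identified with relative ease."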

Citations: 0
Prototype-by-component analysis: A corpus-based, intensional approach to ordinary meaning in statutory interpretation
Pub Date : 2023-12-20 DOI: 10.1016/j.acorp.2023.100078
Jesse Egbert , Thomas R. Lee

When faced with a word or phrase that is not defined in a statute, judges generally interpret the language of the law as it is likely to be understood by an ordinary user of the language. However, there is little agreement about what ordinary meaning is and how it can be determined. Proponents of corpus-based legal interpretation argue that corpora provide scientific rigor and increased validity and transparency, but there is currently no consensus on best practices for legal corpus linguistics. Our objective in this paper is to propose some refinements to the theory of ordinary meaning and corpus-based methods of analyzing it. We argue that the scope of legal language is established by conceptual (intensional) meaning, and not limited to attested referents. Yet, most current corpus-based approaches are purely referential (extensional). Therefore, we introduce a new methodology, prototype-by-component (PBC) analysis, in which we bring together aspects of the componential approach and prototype theory by assuming that categories are gradient entities characterized by gradient semantic components. We introduce the analytical steps in PBC analysis and apply them to Nix v. Hedden (1893) to determine whether tomato is a member of the category vegetable. We conclude that conceptual categories have a prototypical reality and a componential reality. As a result, attested referents in a corpus can provide insights into the conceptual meaning of terms and the degree to which concepts are members of categories.
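The idea of gradient membership built from gradient components can be sketched as a weighted score. The components, weights, and values below are invented for illustration; the paper's actual component inventory and cue weights differ.

```python
# Schematic prototype-by-component score: each candidate gets a gradient
# value (0-1) on each semantic component, and each component carries a
# weight reflecting how strongly it cues the category. All figures here
# are hypothetical, not taken from the paper.
VEGETABLE_COMPONENTS = {            # component: assumed cue weight
    "edible_plant_part": 0.40,
    "savory_use": 0.35,
    "eaten_in_main_course": 0.25,
}

def prototypicality(values, components=VEGETABLE_COMPONENTS):
    """Weighted sum of gradient component values (weights sum to 1)."""
    return sum(components[c] * values[c] for c in components)

tomato = {"edible_plant_part": 1.0, "savory_use": 0.9, "eaten_in_main_course": 0.8}
apple  = {"edible_plant_part": 1.0, "savory_use": 0.1, "eaten_in_main_course": 0.1}

print(round(prototypicality(tomato), 3))  # 0.915
print(round(prototypicality(apple), 3))   # 0.46
```

On this toy scale, tomato scores high for vegetable while apple does not, illustrating how a gradient, component-based score can adjudicate disputed category membership of the Nix v. Hedden kind.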

Citations: 0
Corpus-linguistic approaches to lexical statutory meaning: Extensionalist vs. intensionalist approaches
Pub Date : 2023-12-19 DOI: 10.1016/j.acorp.2023.100079
Stefan Th. Gries, Brian G. Slocum, Kevin Tobia

Scholars and practitioners interested in legal interpretation have become increasingly interested in corpus-linguistic methodology. Lee and Mouritsen (2018) developed and helped popularize the use of concordancing and collocate displays (of mostly COCA and COHA) to operationalize a central notion in legal interpretation, the ordinary meaning of expressions. This approach provides a good first approximation but is ultimately limited. Here, we outline an approach to ordinary meaning that is intensionalist (i.e., 'feature-based'), top-down, and informed by the notion of cue validity in prototype theory. The key advantages of this approach are that (i) it avoids the which-value-on-a-dimension problem of extensionalist approaches, (ii) it provides quantifiable prototypicality values for things whose membership status in a category is in question, and (iii) it can be extended even to cases for which no textual data are yet available. We exemplify the approach with two case studies that offer the option of utilizing survey data and/or word embeddings trained on corpora, deriving cue validities from word similarities. We exemplify this latter approach with the word vehicle, first on the basis of (i) an embedding model trained on 840 billion words crawled from the web, and then with the more realistic application (in terms of corpus size and time frame) of (ii) an embedding model trained on the 1950s time slice of COHA, to address the question of the degree to which Segways, which did not exist in the 1950s, qualify as vehicles under this intensional approach.
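The similarity computation underlying such embedding-based cue validities can be sketched with cosine similarity. The three-dimensional vectors below are invented stand-ins; a real application would load embeddings trained on a large corpus (e.g. a COHA time slice), as the authors do.

```python
import numpy as np

# Toy embedding vectors (real embeddings have hundreds of dimensions
# and are trained on corpus co-occurrence data; these are illustrative).
vecs = {
    "vehicle": np.array([0.9, 0.1, 0.3]),
    "car":     np.array([0.8, 0.2, 0.35]),
    "segway":  np.array([0.5, 0.6, 0.2]),
}

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Similarity to the category term is one proxy from which cue validities
# can be derived for an item whose membership is in question.
print(cosine(vecs["car"], vecs["vehicle"]) > cosine(vecs["segway"], vecs["vehicle"]))  # True
```

In this toy space, car is closer to vehicle than segway is, which is the kind of graded evidence the intensional approach quantifies rather than a yes/no membership verdict.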

Citations: 0
Generative AI for corpus approaches to discourse studies: A critical evaluation of ChatGPT
Pub Date : 2023-12-19 DOI: 10.1016/j.acorp.2023.100082
Niall Curry , Paul Baker , Gavin Brookes

This paper explores the potential of generative artificial intelligence technology, specifically ChatGPT, for advancing corpus approaches to discourse studies. The contribution of artificial intelligence technologies to linguistics research has been transformational, both in the contexts of corpus linguistics and discourse analysis. However, shortcomings in the efficacy of such technologies for conducting automated qualitative analysis have limited their utility for corpus approaches to discourse studies. Acknowledging that new technologies in data analysis can replace and supplement existing approaches, and in view of the potential affordances of ChatGPT for automated qualitative analysis, this paper presents three replication case studies designed to investigate the applicability of ChatGPT for supporting automated qualitative analysis within studies using corpus approaches to discourse analysis.

The findings indicate that, generally, ChatGPT performs reasonably well when semantically categorising keywords; however, as the categorisation is based on decontextualised keywords, the categories can appear quite generic, limiting the value of such an approach for analysing corpora representing specialised genres and/or contexts. For concordance analysis, ChatGPT performs poorly, as the results include false inferences about the concordance lines and, at times, modifications of the input data. Finally, for function-to-form analysis, ChatGPT also performs poorly, as it fails to identify and analyse direct and indirect questions. Overall, the results raise questions about the affordances of ChatGPT for supporting automated qualitative analysis within corpus approaches to discourse studies, signalling issues of repeatability and replicability, ethical challenges surrounding data integrity, and the challenges associated with using non-deterministic technology for empirical linguistic research.
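A minimal sketch of the kind of keyword-categorisation prompt such a replication might send to ChatGPT; the wording and the category cap are assumptions, not the authors' actual prompt. Because the keywords arrive decontextualised, and because the model is non-deterministic, repeated calls can return different (and typically generic) groupings, which is exactly the repeatability problem the paper reports.

```python
# Hypothetical prompt builder for semantic keyword categorisation; the
# prompt text and n_categories default are illustrative assumptions.
def build_categorisation_prompt(keywords, n_categories=5):
    kw_list = ", ".join(keywords)
    return (
        f"Group the following keywords into at most {n_categories} "
        f"semantic categories and name each category. Keywords: {kw_list}. "
        "Return one 'category: keyword, keyword' line per category."
    )

prompt = build_categorisation_prompt(["nurse", "ward", "vaccine", "goal", "referee"])
print(prompt)
# A plausible (generic) response might group these as 'healthcare' and
# 'sport' -- categories too broad to be analytically useful for a
# specialised corpus, per the paper's findings.
```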

Citations: 0
Erratum regarding missing Declaration of Competing Interest statements in previously published articles
Pub Date : 2023-12-01 DOI: 10.1016/j.acorp.2023.100071
Citations: 0
Exploring early L2 writing development through the lens of grammatical complexity
Pub Date : 2023-10-30 DOI: 10.1016/j.acorp.2023.100077
Tove Larsson , Tony Berber Sardinha , Bethany Gray , Douglas Biber

The present study explores the development of grammatical complexity in L2 English writing at the beginner, lower intermediate, and upper intermediate levels to see (i) to what extent the developmental stages proposed in Biber et al. (2011) are evident in low-proficiency L2 writing, and if so, what the patterns of progression are, and (ii) whether students gradually move away from speech-like production toward more advanced written production. We use data from COBRA, a corpus of L1 Brazilian Portuguese learner production, along with BR-ICLE and BR-LINDSEI. All the data were tagged using the Biber tagger (Biber, 1988) and the Developmental Complexity tagger (Gray et al., 2019), and subsequently analyzed using a technique developed in Staples et al. (2022) to quantify developmental profiles across levels. The technique considers not only overall change in frequency across levels, but also the incremental variation across each adjacent level (based on % frequency changes). The results show that the features were infrequent overall, with a majority of both clausal and phrasal features exhibiting an increase in frequency across the levels, albeit to varying degrees. This general pattern is contrary to predictions based on findings from previous studies, which found phrasal features increasing in use and clausal features decreasing in use. Nonetheless, for the features associated with each developmental stage, the frequencies generally increased, becoming more similar to advanced written production and more dissimilar to spoken production, as hypothesized in Biber et al. (2011).
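The incremental-variation idea, percent change of a feature's normalised frequency between each pair of adjacent proficiency levels, can be sketched briefly. The frequencies below are invented; the study's actual technique (Staples et al., 2022) also weighs overall change across all levels.

```python
# Sketch of incremental variation across adjacent proficiency levels.
def pct_changes(freqs):
    """Percent change between each pair of adjacent levels."""
    return [round(100 * (b - a) / a, 1) for a, b in zip(freqs, freqs[1:])]

# Hypothetical rate per 1,000 words of some phrasal feature at the
# beginner, lower-intermediate, and upper-intermediate levels.
freqs = [12.0, 15.0, 19.5]
print(pct_changes(freqs))  # [25.0, 30.0]
```

A profile of consistently positive percent changes like this one is what the study interprets as a feature increasing in frequency across levels.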

Citations: 0
Effective corpus use in second language learning: A meta-analytic approach
Pub Date : 2023-10-21 DOI: 10.1016/j.acorp.2023.100076
Shotaro Ueno , Osamu Takeuchi

Data-driven learning (DDL) refers to the use of corpora by second and foreign language (L2) learners to explore and inductively discover patterns of their target language use from authentic language data without interventions from others. Although previous meta-analyses have demonstrated the positive effects of DDL on L2 learning (Boulton and Cobb, 2017), the number of empirical studies has been increasing since then. Therefore, this study included more recent studies and used meta-analyses to examine the extent to which: (1) DDL exerts an effect on L2 learning; and (2) moderator variables affect DDL's influence on L2 learning. The results demonstrated small to medium effect sizes for experimental/control group comparisons and pre/post and pre/delayed designs. Moreover, the moderator analyses found that moderator variables, such as publication types, learners’ factors, and research designs, influence the magnitude of DDL effectiveness in L2 learning.
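The pooling step behind such effect-size estimates can be sketched as a fixed-effect meta-analysis with inverse-variance weighting. The per-study Cohen's d values and sample sizes below are invented; the actual meta-analysis used more elaborate models and moderator analyses.

```python
# Minimal fixed-effect meta-analysis sketch over hypothetical studies.
def d_variance(d, n1, n2):
    """Approximate sampling variance of Cohen's d for two groups."""
    return (n1 + n2) / (n1 * n2) + d * d / (2 * (n1 + n2))

def pooled_effect(studies):
    """Inverse-variance weighted mean of (d, n1, n2) tuples."""
    weights = [1 / d_variance(d, n1, n2) for d, n1, n2 in studies]
    return sum(w * d for w, (d, _, _) in zip(weights, studies)) / sum(weights)

studies = [(0.45, 30, 30), (0.60, 25, 25), (0.30, 40, 40)]
print(round(pooled_effect(studies), 2))  # 0.42
```

A pooled d in the 0.2-0.5 range is what is conventionally read as a "small to medium" effect, matching the magnitudes the study reports for DDL.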

Citations: 0
Using corpus linguistics to create tasks for teaching and assessing Aeronautical English
Pub Date : 2023-10-11 DOI: 10.1016/j.acorp.2023.100075
Aline Pacheco , Angela Carolina de Moraes Garcia , Ana Lúcia Tavares Monteiro , Malila Carvalho de Almeida Prado , Patrícia Tosqui-Lucks

This article presents the theoretical basis for corpus linguistics applied to Aeronautical English teaching and assessment, followed by practical examples of how to use corpora to develop tasks for both purposes. It originates from the design of two webinars held remotely at the end of 2020 and promoted by the International Civil Aviation English Association. The webinars were targeted at Aeronautical English teachers, material designers, and test developers with little or no previous knowledge of corpus linguistics, with the aim of guiding the audience in preparing step-by-step tasks using corpora. We share the work involved in the suggested task design, bridging the gap between research and practice. We conclude by outlining limitations and suggesting prospects for future research.

Citations: 0
Lexical change and stability in 100 years of English in US newspapers
Pub Date : 2023-09-08 DOI: 10.1016/j.acorp.2023.100073
Robert Poole , Qudus Ayinde Adebayo

This study explores diachronic variation across approximately one hundred years of the newspaper register in US American English from 1920 to 2019 as captured in the Corpus of Historical American English (Davies, 2010). Informed by a similar study of lexical change in British English (Baker, 2011), the analysis identified high-frequency words exhibiting the greatest increases and decreases in use, as well as those words demonstrating stability across the four sampling periods: 1920–29, 1950–59, 1980–89, and 2010–19. The process to identify words of change and stability began with the application of a cumulative frequency threshold; the coefficient of variation and Kendall's tau correlation coefficient were then calculated to aid in identification. In other words, the process targeted high-frequency words whose use has demonstrated the greatest change or stability. The discussion presents the three resulting word lists (increasing, decreasing, stable) and reports concordance and collocation analysis of select words from each list to gain insight into the underlying factors informing lexical change and stability.
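The screening described above can be sketched with the two statistics named in the abstract: the coefficient of variation flags dispersion across periods, and Kendall's tau flags monotonic trends. The per-million frequencies below are invented for illustration.

```python
import statistics

# Coefficient of variation: dispersion of a word's frequency across periods.
def coeff_variation(freqs):
    return statistics.stdev(freqs) / statistics.mean(freqs)

# Kendall's tau-a (no tie correction), adequate for four sampling periods.
def kendall_tau(xs, ys):
    n = len(xs)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    sign = lambda v: (v > 0) - (v < 0)
    score = sum(sign(xs[j] - xs[i]) * sign(ys[j] - ys[i]) for i, j in pairs)
    return score / len(pairs)

periods = [1, 2, 3, 4]               # 1920s, 1950s, 1980s, 2010s
rising  = [10.0, 25.0, 60.0, 140.0]  # hypothetical steadily increasing word
stable  = [50.0, 52.0, 49.0, 51.0]   # hypothetical stable word

print(kendall_tau(periods, rising))    # 1.0: perfectly monotonic increase
print(coeff_variation(stable) < 0.05)  # True: low dispersion, i.e. stable
```

A word with tau near +1 or -1 lands on the increasing or decreasing list, while a low coefficient of variation marks a candidate for the stable list.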

{"title":"Lexical change and stability in 100 years of English in US newspapers","authors":"Robert Poole ,&nbsp;Qudus Ayinde Adebayo","doi":"10.1016/j.acorp.2023.100073","DOIUrl":"10.1016/j.acorp.2023.100073","url":null,"abstract":"<div><p>This study explores diachronic variation across approximately one hundred years of the newspaper register in US American English from 1920 to 2019 as captured in the Corpus of Historical American English (Davies, 2010). Informed by a similar study of lexical change in British English (Baker, 2011), the analysis identified high-frequency words exhibiting the greatest increases and decreases in use as well as those words demonstrating stability across the four sampling periods: 1920–29, 1950–59, 1980–89, 2010–19. The process to identify words of change and stability began first with the application of a cumulative frequency threshold; coefficient of variance and Kendall's Tau correlation coefficient were then calculated to aid in identification. In other words, the process targeted high-frequency words whose use has demonstrated the greatest change or stability. The discussion presents the three resulting word lists (increasing, decreasing, stable) and reports concordance and collocation analysis of select words from each list to gain insight into the underlying factors informing lexical change and stability.</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"3 3","pages":"Article 100073"},"PeriodicalIF":0.0,"publicationDate":"2023-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46738896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Data-driven Learning Meets Generative AI: Introducing the Framework of Metacognitive Resource Use
Pub Date : 2023-09-07 DOI: 10.1016/j.acorp.2023.100074
Atsushi Mizumoto

This paper explores the intersection of data-driven learning (DDL) and generative AI (GenAI), represented by technologies like ChatGPT, in the realm of language learning and teaching. It presents two complementary perspectives on how to integrate these approaches. The first viewpoint advocates for a blended methodology that synergizes DDL and GenAI, capitalizing on their complementary strengths while offsetting their individual limitations. The second introduces the Metacognitive Resource Use (MRU) framework, a novel paradigm that positions DDL within an expansive ecosystem of language resources, which also includes GenAI tools. Anchored in the foundational principles of metacognition, the MRU framework centers on two pivotal dimensions: metacognitive knowledge and metacognitive regulation. The paper proposes pedagogical recommendations designed to enable learners to strategically utilize a wide range of language resources, from corpora to GenAI technologies, guided by their self-awareness, the specifics of the task, and relevant strategies. The paper concludes by highlighting promising avenues for future research, notably the empirical assessment of both the integrated DDL-GenAI approach and the MRU framework.

{"title":"Data-driven Learning Meets Generative AI: Introducing the Framework of Metacognitive Resource Use","authors":"Atsushi Mizumoto","doi":"10.1016/j.acorp.2023.100074","DOIUrl":"10.1016/j.acorp.2023.100074","url":null,"abstract":"<div><p>This paper explores the intersection of data-driven learning (DDL) and generative AI (GenAI), represented by technologies like ChatGPT, in the realm of language learning and teaching. It presents two complementary perspectives on how to integrate these approaches. The first viewpoint advocates for a blended methodology that synergizes DDL and GenAI, capitalizing on their complementary strengths while offsetting their individual limitations. The second introduces the Metacognitive Resource Use (MRU) framework, a novel paradigm that positions DDL within an expansive ecosystem of language resources, which also includes GenAI tools. Anchored in the foundational principles of metacognition, the MRU framework centers on two pivotal dimensions: metacognitive knowledge and metacognitive regulation. The paper proposes pedagogical recommendations designed to enable learners to strategically utilize a wide range of language resources, from corpora to GenAI technologies, guided by their self-awareness, the specifics of the task, and relevant strategies. 
The paper concludes by highlighting promising avenues for future research, notably the empirical assessment of both the integrated DDL-GenAI approach and the MRU framework.</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"3 3","pages":"Article 100074"},"PeriodicalIF":0.0,"publicationDate":"2023-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48929007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0