AI-generated vs human-authored texts: A multidimensional comparison
Pub Date: 2023-12-20 | DOI: 10.1016/j.acorp.2023.100083 | Applied Corpus Linguistics, 4(1), Article 100083
Tony Berber Sardinha
The goal of this study is to assess the degree of resemblance between texts generated by artificial intelligence (GPT) and (written and spoken) texts produced by humans in real-world settings. A comparative analysis was conducted along the five main dimensions of variation identified by Biber (1988). The findings revealed significant disparities between AI-generated and human-authored texts, with the AI-generated texts generally failing to resemble their human counterparts. Furthermore, a linear discriminant analysis, performed to measure the predictive potential of dimension scores for identifying the authorship of texts, demonstrated that AI-generated texts could be identified with relative ease based on their multidimensional profile. Collectively, the results underscore the current limitations of AI text generation in emulating natural human communication, countering popular fears that AI will replace humans in textual communication. Rather, the findings suggest that, at present, AI's ability to capture the intricate patterns of natural language remains limited.
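As a rough illustration of the classification step described above, the sketch below runs a cross-validated linear discriminant analysis over a hypothetical table of five dimension scores per text. It is not the author's pipeline: the scores, group offsets, and sample sizes are all invented.

```python
# Sketch: classifying AI- vs human-authored texts from Biber dimension scores
# with linear discriminant analysis. The data are synthetic placeholders; the
# paper's actual corpora and tagging pipeline are not reproduced here.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# X: one row per text, one column per dimension (Dim1..Dim5); y: 0 = human, 1 = AI
X_human = rng.normal(loc=0.0, scale=1.0, size=(200, 5))
X_ai = rng.normal(loc=0.8, scale=0.6, size=(200, 5))  # invented group offset
X = np.vstack([X_human, X_ai])
y = np.array([0] * 200 + [1] * 200)

lda = LinearDiscriminantAnalysis()
scores = cross_val_score(lda, X, y, cv=10)  # 10-fold cross-validated accuracy
print(f"mean classification accuracy: {scores.mean():.2f}")
```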
{"title":"AI-generated vs human-authored texts: A multidimensional comparison","authors":"Tony Berber Sardinha","doi":"10.1016/j.acorp.2023.100083","DOIUrl":"10.1016/j.acorp.2023.100083","url":null,"abstract":"<div><p>The goal of this study is to assess the degree of resemblance between texts generated by artificial intelligence (GPT) and (written and spoken) texts produced by human individuals in real-world settings. A comparative analysis was conducted along the five main dimensions of variation that Biber (1988) identified. The findings revealed significant disparities between AI-generated and human-authored texts, with the AI-generated texts generally failing to exhibit resemblance to their human counterparts. Furthermore, a linear discriminant analysis, performed to measure the predictive potential of dimension scores for identifying the authorship of texts, demonstrated that AI-generated texts could be identified with relative ease based on their multidimensional profile. Collectively, the results underscore the current limitations of AI text generation in emulating natural human communication. This finding counters popular fears that AI will replace humans in textual communication. Rather, our findings suggest that, at present, AI's ability to capture the intricate patterns of natural language remains limited.</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"4 1","pages":"Article 100083"},"PeriodicalIF":0.0,"publicationDate":"2023-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666799123000436/pdfft?md5=eec63f0662cd28b0d80ac041ac33eae7&pid=1-s2.0-S2666799123000436-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139026627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prototype-by-component analysis: A corpus-based, intensional approach to ordinary meaning in statutory interpretation
Pub Date: 2023-12-20 | DOI: 10.1016/j.acorp.2023.100078 | Applied Corpus Linguistics, 4(1), Article 100078
Jesse Egbert, Thomas R. Lee
When faced with a word or phrase that is not defined in a statute, judges generally interpret the language of the law as it is likely to be understood by an ordinary user of the language. However, there is little agreement about what ordinary meaning is and how it can be determined. Proponents of corpus-based legal interpretation argue that corpora provide scientific rigor and increased validity and transparency, but there is currently no consensus on best practices for legal corpus linguistics. Our objective in this paper is to propose some refinements to the theory of ordinary meaning and corpus-based methods of analyzing it. We argue that the scope of legal language is established by conceptual (intensional) meaning, and not limited to attested referents. Yet, most current corpus-based approaches are purely referential (extensional). Therefore, we introduce a new methodology—prototype by component (PBC) analysis—in which we bring together aspects of the componential approach and prototype theory by assuming that categories are gradient entities that are characterized by gradient semantic components. We introduce the analytical steps in PBC analysis and apply them to Nix v. Hedden (1893) to determine whether tomato is a member of the category vegetable. We conclude that conceptual categories have a prototypical reality and a componential reality. As a result, attested referents in a corpus can provide insights into the conceptual meaning of terms and the degree to which concepts are members of categories.
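The core assumption above, that categories are gradient entities characterized by gradient semantic components, lends itself to a simple weighted-sum reading. The sketch below is a minimal illustration under that assumption; the component names, weights, and scores are invented for illustration and are not the paper's PBC values.

```python
# Sketch of the arithmetic behind a prototype-by-component (PBC) reading:
# a category is modeled as weighted semantic components, and a candidate's
# membership is the weighted mean of its gradient component scores.
# All names, weights, and scores below are invented, not the paper's data.
VEGETABLE_COMPONENTS = {          # component: weight (importance to the category)
    "edible_plant_part": 0.9,
    "served_in_savory_dishes": 0.7,
    "low_sugar_content": 0.5,
}

def membership(candidate_scores, components):
    """Weighted mean of gradient component scores (each in [0, 1])."""
    total_weight = sum(components.values())
    weighted = sum(w * candidate_scores.get(c, 0.0) for c, w in components.items())
    return weighted / total_weight

tomato = {"edible_plant_part": 1.0, "served_in_savory_dishes": 0.9,
          "low_sugar_content": 0.6}
print(f"tomato in VEGETABLE: {membership(tomato, VEGETABLE_COMPONENTS):.2f}")
```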
{"title":"Prototype-by-component analysis: A corpus-based, intensional approach to ordinary meaning in statutory interpretation","authors":"Jesse Egbert , Thomas R. Lee","doi":"10.1016/j.acorp.2023.100078","DOIUrl":"10.1016/j.acorp.2023.100078","url":null,"abstract":"<div><p>When faced with a word or phrase that is not defined in a statute, judges generally interpret the language of the law as it is likely to be understood by an ordinary user of the language. However, there is little agreement about what ordinary meaning is and how it can be determined. Proponents of corpus-based legal interpretation argue that corpora provide scientific rigor and increased validity and transparency, but there is currently no consensus on best practices for legal corpus linguistics. Our objective in this paper is to propose some refinements to the theory of ordinary meaning and corpus-based methods of analyzing it. We argue that the scope of legal language is established by conceptual (<em>intensional</em>) meaning, and not limited to attested referents. Yet, most current corpus-based approaches are purely referential (<em>extensional</em>). Therefore, we introduce a new methodology—<em>prototype by component (PBC)</em> analysis<em>—</em>in which we bring together aspects of the componential approach and prototype theory by assuming that categories are gradient entities that are characterized by gradient semantic components. We introduce the analytical steps in PBC analysis and apply them to <em>Nix v. Hedden</em> (1893) to determine whether <em>tomato</em> is a member of the category vegetable. We conclude that conceptual categories have a prototypical reality and a componential reality. As a result, attested referents in a corpus can provide insights into the conceptual meaning of terms and the degree to which concepts are members of categories.</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"4 1","pages":"Article 100078"},"PeriodicalIF":0.0,"publicationDate":"2023-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666799123000382/pdfft?md5=f402bdd08e64a2ca946fa7003eabe040&pid=1-s2.0-S2666799123000382-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139014688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Corpus-linguistic approaches to lexical statutory meaning: Extensionalist vs. intensionalist approaches
Pub Date: 2023-12-19 | DOI: 10.1016/j.acorp.2023.100079 | Applied Corpus Linguistics, 4(1), Article 100079
Stefan Th. Gries, Brian G. Slocum, Kevin Tobia
Scholars and practitioners interested in legal interpretation have become increasingly interested in corpus-linguistic methodology. Lee and Mouritsen (2018) developed and helped popularize the use of concordancing and collocate displays (mostly of COCA and COHA) to operationalize a central notion in legal interpretation, the ordinary meaning of expressions. This approach provides a good first approximation but is ultimately limited. Here, we outline an approach to ordinary meaning that is intensionalist (i.e., 'feature-based'), top-down, and informed by the notion of cue validity in prototype theory. The key advantages of this approach are that (i) it avoids the which-value-on-a-dimension problem of extensionalist approaches, (ii) it provides quantifiable prototypicality values for things whose membership status in a category is in question, and (iii) it can be extended even to cases for which no textual data are yet available. We exemplify the approach with two case studies that offer the option of utilizing survey data and/or word embeddings trained on corpora, deriving cue validities from word similarities. We illustrate the latter approach with the word vehicle on the basis of (i) an embedding model trained on 840 billion words crawled from the web and (ii) a more realistic application in terms of corpus size and time frame: an embedding model trained on the 1950s time slice of COHA, used to address the question of the degree to which Segways, which did not exist in the 1950s, qualify as vehicles under this intensional approach.
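To make the embedding step concrete, here is a minimal sketch that uses cosine similarity in a small pretrained GloVe model as a rough prototypicality proxy. The gensim dataset name is a real downloadable model, but it stands in for the web-scale and COHA-trained models the paper actually uses, and the candidate word list is invented.

```python
# Sketch: approximating prototypicality via embedding similarity. A small
# pretrained GloVe model stands in for the 840B-word web model and the
# COHA-trained model described in the paper.
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-100")  # ~128 MB download on first run

category = "vehicle"
candidates = ["car", "truck", "bicycle", "segway", "elevator"]
for word in candidates:
    if word in model:
        # cosine similarity as a rough, corpus-derived prototypicality proxy
        print(f"{word:10s} sim(vehicle) = {model.similarity(category, word):.3f}")
    else:
        # mirrors the Segway problem: no attestations in a 1950s corpus
        print(f"{word:10s} not in vocabulary")
```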
{"title":"Corpus-linguistic approaches to lexical statutory meaning: Extensionalist vs. intensionalist approaches","authors":"Stefan Th. Gries, Brian G. Slocum, Kevin Tobia","doi":"10.1016/j.acorp.2023.100079","DOIUrl":"https://doi.org/10.1016/j.acorp.2023.100079","url":null,"abstract":"<div><p>Scholars and practitioners interested in legal interpretation have become increasingly interested in corpus-linguistic methodology. <span>Lee and Mouritsen (2018)</span> developed and helped popularize the use of concordancing and collocate displays (of mostly COCA and COHA) to operationalize a central notion in legal interpretation, the <strong>ordinary meaning</strong> of expressions. This approach provides a good first approximation but is ultimately limited. Here, we outline an approach to ordinary meaning that is <strong>intensionalist</strong> (i.e., 'feature-based'), top-down, and informed by the notion of <strong>cue validity in prototype theory</strong>. The key advantages of this approach are that (i) it avoids the which-value-on-a-dimension problem of extensionalist approaches, (ii) it provides quantifiable prototypicality values for things whose membership status in a category is in question, and (iii) it can be extended even to cases for which no textual data are yet available. We exemplify the approach with two case studies that offer the option of utilizing survey data and/or word embeddings trained on corpora by deriving cue validities from word similarities. We exemplify this latter approach with the word <em>vehicle</em> on the basis of (i) an embedding model trained on 840 billion words crawled from the web, but now also with the more realistic application (in terms of corpus size and time frame) of (ii) an embedding model trained on the 1950s time slice of COHA to address the question to what degree Segways, which didn't exist in the 1950s, qualify as vehicles in this intensional approach.</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"4 1","pages":"Article 100079"},"PeriodicalIF":0.0,"publicationDate":"2023-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666799123000394/pdfft?md5=fffa64c5cf04e01a22d462ddb9e4441e&pid=1-s2.0-S2666799123000394-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139099518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generative AI for corpus approaches to discourse studies: A critical evaluation of ChatGPT
Pub Date: 2023-12-19 | DOI: 10.1016/j.acorp.2023.100082 | Applied Corpus Linguistics, 4(1), Article 100082
Niall Curry, Paul Baker, Gavin Brookes
This paper explores the potential of generative artificial intelligence technology, specifically ChatGPT, for advancing corpus approaches to discourse studies. The contribution of artificial intelligence technologies to linguistics research has been transformational, both in the contexts of corpus linguistics and discourse analysis. However, shortcomings in the efficacy of such technologies for conducting automated qualitative analysis have limited their utility for corpus approaches to discourse studies. Acknowledging that new technologies in data analysis can replace and supplement existing approaches, and in view of the potential affordances of ChatGPT for automated qualitative analysis, this paper presents three replication case studies designed to investigate the applicability of ChatGPT for supporting automated qualitative analysis within studies using corpus approaches to discourse analysis.
The findings indicate that, generally, ChatGPT performs reasonably well when semantically categorising keywords; however, as the categorisation is based on decontextualised keywords, the categories can appear quite generic, limiting the value of such an approach for analysing corpora representing specialised genres and/or contexts. For concordance analysis, ChatGPT performs poorly, as the results include false inferences about the concordance lines and, at times, modifications of the input data. Finally, for function-to-form analysis, ChatGPT also performs poorly, as it fails to identify and analyse direct and indirect questions. Overall, the results raise questions about the affordances of ChatGPT for supporting automated qualitative analysis within corpus approaches to discourse studies, signalling issues of repeatability and replicability, ethical challenges surrounding data integrity, and the challenges associated with using non-deterministic technology for empirical linguistic research.
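For readers who want to approximate the keyword-categorisation case study in spirit, the sketch below sends decontextualised keywords to a chat model via the OpenAI Python client. The model name, prompt wording, and keyword list are placeholders, not the paper's materials; note that even at temperature 0 the output is not guaranteed to be deterministic, which is one of the replicability issues the study raises.

```python
# Sketch: semantic categorisation of decontextualised keywords with a chat
# model. Prompt, model name, and keywords are placeholders, not the study's.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

keywords = ["vaccine", "lockdown", "furlough", "keyworker", "zoom"]
prompt = (
    "Group the following keywords into labelled semantic categories "
    "and return one 'category: words' pair per line:\n" + ", ".join(keywords)
)
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model choice
    messages=[{"role": "user", "content": prompt}],
    temperature=0,        # reduces, but does not eliminate, non-determinism
)
print(response.choices[0].message.content)
```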
{"title":"Generative AI for corpus approaches to discourse studies: A critical evaluation of ChatGPT","authors":"Niall Curry , Paul Baker , Gavin Brookes","doi":"10.1016/j.acorp.2023.100082","DOIUrl":"10.1016/j.acorp.2023.100082","url":null,"abstract":"<div><p>This paper explores the potential of generative artificial intelligence technology, specifically ChatGPT, for advancing corpus approaches to discourse studies. The contribution of artificial intelligence technologies to linguistics research has been transformational, both in the contexts of corpus linguistics and discourse analysis. However, shortcomings in the efficacy of such technologies for conducting automated qualitative analysis have limited their utility for corpus approaches to discourse studies. Acknowledging that new technologies in data analysis can replace and supplement existing approaches, and in view of the potential affordances of ChatGPT for automated qualitative analysis, this paper presents three replication case studies designed to investigate the applicability of ChatGPT for supporting automated qualitative analysis within studies using corpus approaches to discourse analysis.</p><p>The findings indicate that, generally, ChatGPT performs reasonably well when semantically categorising keywords; however, as the categorisation is based on decontextualised keywords, the categories can appear quite generic, limiting the value of such an approach for analysing corpora representing specialised genres and/or contexts. For concordance analysis, ChatGPT performs poorly, as the results include false inferences about the concordance lines and, at times, modifications of the input data. Finally, for function-to-form analysis, ChatGPT also performs poorly, as it fails to identify and analyse direct and indirect questions. Overall, the results raise questions about the affordances of ChatGPT for supporting automated qualitative analysis within corpus approaches to discourse studies, signalling issues of repeatability and replicability, ethical challenges surrounding data integrity, and the challenges associated with using non-deterministic technology for empirical linguistic research.</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"4 1","pages":"Article 100082"},"PeriodicalIF":0.0,"publicationDate":"2023-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666799123000424/pdfft?md5=ae9708bc5113ac915574372c9ad6a9d7&pid=1-s2.0-S2666799123000424-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139023094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Erratum regarding missing Declaration of Competing Interest statements in previously published articles
Pub Date: 2023-12-01 | DOI: 10.1016/j.acorp.2023.100071 | Applied Corpus Linguistics, 3(3), Article 100071
{"title":"Erratum regarding missing Declaration of Competing Interest statements in previously published articles","authors":"","doi":"10.1016/j.acorp.2023.100071","DOIUrl":"https://doi.org/10.1016/j.acorp.2023.100071","url":null,"abstract":"","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"3 3","pages":"Article 100071"},"PeriodicalIF":0.0,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S266679912300031X/pdfft?md5=b062715ba46158ca342b354088c8e319&pid=1-s2.0-S266679912300031X-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138484866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploring early L2 writing development through the lens of grammatical complexity
Pub Date: 2023-10-30 | DOI: 10.1016/j.acorp.2023.100077 | Applied Corpus Linguistics, 3(3), Article 100077
Tove Larsson, Tony Berber Sardinha, Bethany Gray, Douglas Biber
The present study explores the development of grammatical complexity in L2 English writing at the beginner, lower intermediate, and upper intermediate levels to see (i) to what extent the developmental stages proposed in Biber et al. (2011) are evident in low-proficiency L2 writing, and if so, what the patterns of progression are, and (ii) whether students gradually move away from speech-like production toward more advanced written production. We use data from COBRA, a corpus of L1 Brazilian Portuguese learner production, along with BR-ICLE and BR-LINDSEI. All the data were tagged using the Biber tagger (Biber, 1988) and the Developmental Complexity tagger (Gray et al., 2019), and subsequently analyzed using a technique developed in Staples et al. (2022) to quantify developmental profiles across levels. The technique considers not only overall change in frequency across levels but also the incremental variation between adjacent levels (based on percentage changes in frequency). The results show that the features were infrequent overall, with a majority of both clausal and phrasal features exhibiting an increase in frequency across the levels, albeit to varying degrees. This general pattern runs contrary to predictions based on previous studies, which found phrasal features increasing in use and clausal features decreasing in use. Nonetheless, for the features associated with each developmental stage, the frequencies generally increased, becoming more similar to advanced written production and more dissimilar to spoken production, as hypothesized in Biber et al. (2011).
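The profile technique referenced above (Staples et al., 2022) rests on normalised frequencies and percentage changes between adjacent levels. The sketch below shows that arithmetic on invented counts; the feature names, counts, and corpus sizes are placeholders, not COBRA data.

```python
# Sketch of the profile logic: normalised feature frequencies per proficiency
# level, plus the percentage change between adjacent levels. All numbers are
# invented placeholders, not the study's tagged corpus data.
levels = ["beginner", "lower-intermediate", "upper-intermediate"]
counts = {  # raw feature counts per level (invented)
    "that_complement_clause": [120, 210, 260],
    "premodifying_noun": [80, 150, 340],
}
tokens = [400_000, 450_000, 500_000]  # corpus size per level (invented)

for feature, raw in counts.items():
    per_million = [c / t * 1_000_000 for c, t in zip(raw, tokens)]
    # percentage change between each pair of adjacent levels
    changes = [(b - a) / a * 100 for a, b in zip(per_million, per_million[1:])]
    profile = " -> ".join(f"{f:.0f}" for f in per_million)
    print(f"{feature}: {profile} per million; adjacent-level change: "
          + ", ".join(f"{c:+.0f}%" for c in changes))
```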
{"title":"Exploring early L2 writing development through the lens of grammatical complexity","authors":"Tove Larsson , Tony Berber Sardinha , Bethany Gray , Douglas Biber","doi":"10.1016/j.acorp.2023.100077","DOIUrl":"https://doi.org/10.1016/j.acorp.2023.100077","url":null,"abstract":"<div><p>The present study explores the development of grammatical complexity in L2 English writing at the beginner, lower intermediate, and upper intermediate levels to see (i) to what extent the developmental stages proposed in Biber et al. (2011) are evident in low-proficiency L2 writing, and if so, what the patterns of progression are, and (ii) whether students gradually move away from speech-like production toward more advanced written production. We use data from COBRA, a corpus of L1 Brazilian Portuguese learner production, along with BR-ICLE and BR-LINDSEI. All the data were tagged using the Biber tagger (Biber, 1988) and the Developmental Complexity tagger (Gray et al., 2019), and subsequently analyzed using a technique developed in Staples et al. (2022) to quantify developmental profiles across levels. The technique considers not only overall change in frequency across levels, but also the incremental variation across each adjacent level (based on % frequency changes). The results show that the features were infrequent overall, with a majority of both clausal and phrasal features exhibiting an increase in frequency across the levels, albeit to varying degrees. This general pattern is contrary to predictions based on findings from previous studies, which found phrasal features increasing in use and clausal features <em>decreasing</em> in use. Nonetheless, for the features associated with each developmental stage, the frequencies generally increased, becoming more similar to advanced written production and more dissimilar to spoken production, as hypothesized in Biber et al. (2011).</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"3 3","pages":"Article 100077"},"PeriodicalIF":0.0,"publicationDate":"2023-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91989988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Effective corpus use in second language learning: A meta-analytic approach
Pub Date: 2023-10-21 | DOI: 10.1016/j.acorp.2023.100076 | Applied Corpus Linguistics, 3(3), Article 100076
Shotaro Ueno, Osamu Takeuchi
Data-driven learning (DDL) refers to the use of corpora by second and foreign language (L2) learners to explore and inductively discover patterns of their target language use from authentic language data, without intervention from others. Although previous meta-analyses have demonstrated the positive effects of DDL on L2 learning (Boulton and Cobb, 2017), the number of empirical studies has been growing since then. This study therefore included more recent studies and used meta-analyses to examine (1) the extent to which DDL exerts an effect on L2 learning and (2) the extent to which moderator variables affect DDL's influence on L2 learning. The results demonstrated small to medium effect sizes for experimental/control group comparisons and for pre/post and pre/delayed designs. Moreover, the moderator analyses found that variables such as publication type, learner factors, and research design influence the magnitude of DDL effectiveness in L2 learning.
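As a sketch of the aggregation arithmetic behind such a meta-analysis, the code below pools invented effect sizes with a DerSimonian-Laird random-effects model; the study's actual estimator, data, and moderator models are not reproduced here.

```python
# Sketch: DerSimonian-Laird random-effects pooling of study effect sizes.
# Effect sizes and variances are invented; they are not the study's data.
import numpy as np

d = np.array([0.35, 0.60, 0.20, 0.75, 0.45])  # per-study effect sizes (e.g., Hedges' g)
v = np.array([0.04, 0.06, 0.03, 0.08, 0.05])  # per-study sampling variances

w = 1 / v                                      # fixed-effect weights
mean_fixed = np.sum(w * d) / np.sum(w)
Q = np.sum(w * (d - mean_fixed) ** 2)          # heterogeneity statistic
tau2 = max(0.0, (Q - (len(d) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

w_star = 1 / (v + tau2)                        # random-effects weights
mean_random = np.sum(w_star * d) / np.sum(w_star)
se = np.sqrt(1 / np.sum(w_star))
print(f"pooled effect = {mean_random:.2f} "
      f"(95% CI {mean_random - 1.96*se:.2f} to {mean_random + 1.96*se:.2f}), "
      f"tau^2 = {tau2:.3f}")
```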
{"title":"Effective corpus use in second language learning: A meta-analytic approach","authors":"Shotaro Ueno , Osamu Takeuchi","doi":"10.1016/j.acorp.2023.100076","DOIUrl":"https://doi.org/10.1016/j.acorp.2023.100076","url":null,"abstract":"<div><p>Data-driven learning (DDL) refers to the use of corpora by second and foreign language (L2) learners to explore and inductively discover patterns of their target language use from authentic language data without interventions from others. Although previous meta-analyses have demonstrated the positive effects of DDL on L2 learning (Boulton and Cobb, 2017), the number of empirical studies has been increasing since then. Therefore, this study included more recent studies and used meta-analyses to examine the extent to which: (1) DDL exerts an effect on L2 learning; and (2) moderator variables affect DDL's influence on L2 learning. The results demonstrated small to medium effect sizes for experimental/control group comparisons and pre/post and pre/delayed designs. Moreover, the moderator analyses found that moderator variables, such as publication types, learners’ factors, and research designs, influence the magnitude of DDL effectiveness in L2 learning.</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"3 3","pages":"Article 100076"},"PeriodicalIF":0.0,"publicationDate":"2023-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91957142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using corpus linguistics to create tasks for teaching and assessing Aeronautical English
Pub Date: 2023-10-11 | DOI: 10.1016/j.acorp.2023.100075 | Applied Corpus Linguistics, 3(3), Article 100075
Aline Pacheco, Angela Carolina de Moraes Garcia, Ana Lúcia Tavares Monteiro, Malila Carvalho de Almeida Prado, Patrícia Tosqui-Lucks
This article presents the theoretical basis for corpus linguistics applied to Aeronautical English teaching and assessment, followed by practical examples of how to use corpora to develop tasks for both purposes. It originates from two webinars held remotely at the end of 2020 and promoted by the International Civil Aviation English Association. The webinars were targeted at Aeronautical English teachers, material designers, and test developers with little or no previous knowledge of corpus linguistics, with the aim of guiding the audience in preparing step-by-step tasks using corpora. We share the work involved in the suggested task design, bridging the gap between research and practice. We conclude by outlining limitations and suggesting prospects for future research.
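A concordance (KWIC) display is the usual starting point for the kind of corpus-based task design described here. The sketch below generates KWIC lines from a toy text with a plain-Python function; the sample phraseology and search term are invented, not material from the webinars.

```python
# Sketch: key-word-in-context (KWIC) lines from a plain-text corpus, the raw
# material a teacher might turn into gap-fill or pattern-noticing tasks.
# The sample text and node word are invented for illustration.
def kwic(text, node, width=4):
    """Return KWIC lines for every occurrence of `node` (case-insensitive)."""
    tokens = text.lower().split()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>35} [{node}] {right}")
    return lines

sample = ("Cleared for takeoff runway two seven. Unable to comply due traffic. "
          "Request clearance to climb flight level three five zero. "
          "Readback correct, cleared to land runway two seven.")
for line in kwic(sample, "cleared"):
    print(line)
```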
{"title":"Using corpus linguistics to create tasks for teaching and assessing Aeronautical English","authors":"Aline Pacheco , Angela Carolina de Moraes Garcia , Ana Lúcia Tavares Monteiro , Malila Carvalho de Almeida Prado , Patrícia Tosqui-Lucks","doi":"10.1016/j.acorp.2023.100075","DOIUrl":"https://doi.org/10.1016/j.acorp.2023.100075","url":null,"abstract":"<div><p><span>This article presents the theoretical basis for corpus linguistics applied to Aeronautical English teaching and assessment followed by practical examples on how to use corpora to develop tasks for both purposes. It originates from the design of two webinars held remotely at the end of 2020, and promoted by the International </span>Civil Aviation English Association. The webinars were targeted at Aeronautical English teachers, material designers, and test developers with little or no previous knowledge of corpus linguistics with the aim of guiding the audience in preparing step–by–step tasks using corpora. We share the work involved in the task design suggested, bridging the gap between research and practice. We conclude by outlining limitations, and suggesting prospects for future research.</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"3 3","pages":"Article 100075"},"PeriodicalIF":0.0,"publicationDate":"2023-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49863545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lexical change and stability in 100 years of English in US newspapers
Pub Date: 2023-09-08 | DOI: 10.1016/j.acorp.2023.100073 | Applied Corpus Linguistics, 3(3), Article 100073
Robert Poole, Qudus Ayinde Adebayo
This study explores diachronic variation across approximately one hundred years of the newspaper register in US American English, from 1920 to 2019, as captured in the Corpus of Historical American English (Davies, 2010). Informed by a similar study of lexical change in British English (Baker, 2011), the analysis identified high-frequency words exhibiting the greatest increases and decreases in use, as well as words demonstrating stability across the four sampling periods: 1920–29, 1950–59, 1980–89, and 2010–19. The process to identify words of change and stability began with the application of a cumulative frequency threshold; the coefficient of variation and Kendall's tau correlation coefficient were then calculated to aid in identification. In other words, the process targeted high-frequency words whose use has demonstrated the greatest change or stability. The discussion presents the three resulting word lists (increasing, decreasing, stable) and reports concordance and collocation analyses of selected words from each list to gain insight into the underlying factors informing lexical change and stability.
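The two screening statistics named above are straightforward to compute. The sketch below applies the coefficient of variation and Kendall's tau to one invented per-decade frequency series; the study's threshold values and actual frequencies are not reproduced here.

```python
# Sketch: the two screening statistics applied to one word's per-decade
# frequencies (invented numbers, normalised per million words).
import numpy as np
from scipy.stats import kendalltau

periods = [1925, 1955, 1985, 2015]            # midpoints of the sampling periods
freq = np.array([310.0, 240.0, 150.0, 90.0])  # e.g., a steadily declining word

cv = freq.std(ddof=1) / freq.mean()           # coefficient of variation: spread
tau, p = kendalltau(periods, freq)            # monotonic trend across periods
print(f"CV = {cv:.2f}; Kendall's tau = {tau:.2f} (p = {p:.3f})")
# High CV with a strongly negative tau -> candidate for the 'decreasing' list;
# low CV across periods -> candidate for the 'stable' list.
```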
{"title":"Lexical change and stability in 100 years of English in US newspapers","authors":"Robert Poole , Qudus Ayinde Adebayo","doi":"10.1016/j.acorp.2023.100073","DOIUrl":"10.1016/j.acorp.2023.100073","url":null,"abstract":"<div><p>This study explores diachronic variation across approximately one hundred years of the newspaper register in US American English from 1920 to 2019 as captured in the Corpus of Historical American English (Davies, 2010). Informed by a similar study of lexical change in British English (Baker, 2011), the analysis identified high-frequency words exhibiting the greatest increases and decreases in use as well as those words demonstrating stability across the four sampling periods: 1920–29, 1950–59, 1980–89, 2010–19. The process to identify words of change and stability began first with the application of a cumulative frequency threshold; coefficient of variance and Kendall's Tau correlation coefficient were then calculated to aid in identification. In other words, the process targeted high-frequency words whose use has demonstrated the greatest change or stability. The discussion presents the three resulting word lists (increasing, decreasing, stable) and reports concordance and collocation analysis of select words from each list to gain insight into the underlying factors informing lexical change and stability.</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"3 3","pages":"Article 100073"},"PeriodicalIF":0.0,"publicationDate":"2023-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46738896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data-driven Learning Meets Generative AI: Introducing the Framework of Metacognitive Resource Use
Pub Date: 2023-09-07 | DOI: 10.1016/j.acorp.2023.100074 | Applied Corpus Linguistics, 3(3), Article 100074
Atsushi Mizumoto
This paper explores the intersection of data-driven learning (DDL) and generative AI (GenAI), represented by technologies like ChatGPT, in the realm of language learning and teaching. It presents two complementary perspectives on how to integrate these approaches. The first viewpoint advocates for a blended methodology that synergizes DDL and GenAI, capitalizing on their complementary strengths while offsetting their individual limitations. The second introduces the Metacognitive Resource Use (MRU) framework, a novel paradigm that positions DDL within an expansive ecosystem of language resources, which also includes GenAI tools. Anchored in the foundational principles of metacognition, the MRU framework centers on two pivotal dimensions: metacognitive knowledge and metacognitive regulation. The paper proposes pedagogical recommendations designed to enable learners to strategically utilize a wide range of language resources, from corpora to GenAI technologies, guided by their self-awareness, the specifics of the task, and relevant strategies. The paper concludes by highlighting promising avenues for future research, notably the empirical assessment of both the integrated DDL-GenAI approach and the MRU framework.
{"title":"Data-driven Learning Meets Generative AI: Introducing the Framework of Metacognitive Resource Use","authors":"Atsushi Mizumoto","doi":"10.1016/j.acorp.2023.100074","DOIUrl":"10.1016/j.acorp.2023.100074","url":null,"abstract":"<div><p>This paper explores the intersection of data-driven learning (DDL) and generative AI (GenAI), represented by technologies like ChatGPT, in the realm of language learning and teaching. It presents two complementary perspectives on how to integrate these approaches. The first viewpoint advocates for a blended methodology that synergizes DDL and GenAI, capitalizing on their complementary strengths while offsetting their individual limitations. The second introduces the Metacognitive Resource Use (MRU) framework, a novel paradigm that positions DDL within an expansive ecosystem of language resources, which also includes GenAI tools. Anchored in the foundational principles of metacognition, the MRU framework centers on two pivotal dimensions: metacognitive knowledge and metacognitive regulation. The paper proposes pedagogical recommendations designed to enable learners to strategically utilize a wide range of language resources, from corpora to GenAI technologies, guided by their self-awareness, the specifics of the task, and relevant strategies. The paper concludes by highlighting promising avenues for future research, notably the empirical assessment of both the integrated DDL-GenAI approach and the MRU framework.</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"3 3","pages":"Article 100074"},"PeriodicalIF":0.0,"publicationDate":"2023-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48929007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}