Applied Corpus Linguistics: Latest Articles

Talking across the interdisciplinary aisle: A guide for legal and corpus-linguistic scholars and practitioners
Pub Date: 2024-01-26 | DOI: 10.1016/j.acorp.2024.100086
Stefan Th. Gries, Tammy Gales

In this paper, we discuss a variety of misunderstandings that have arisen – and still linger – in the field of Law and Corpus Linguistics (LCL). Many of them stem from the interdisciplinary encounter between legal scholarship and practice on the one hand and corpus linguistics (CL) on the other. Our goals are to explicate these misunderstandings, to illuminate the assumptions that co-motivated them in the first place, and to offer advice on how to discuss, possibly refute, and avoid them going forward, especially given the progress made to date. To illustrate our discussion, we have divided the critiques into two major stages of the collaborative process: (i) a legal stage and (ii) a corpus-linguistic stage. In stage (i), we address issues such as the desire to involve a corpus linguist, the question of whether using CL outsources a judicial task, and the role CL plays in legal theories of interpretation. In stage (ii), we discuss common critiques of applying CL to legal interpretation, such as the claim that the method is inherently subjective, the potential arbitrariness of corpus compilation and selection, and the variable role that context plays in such applications. The final section offers a set of recommendations connecting the two stages to allow for the iterative fine-tuning that we think successful collaboration in academic and applied legal settings requires; we conclude with our view on who should do corpus linguistics in legal contexts, in the hope of facilitating further talk across the interdisciplinary aisle.

A corpus-based developmental investigation of linguistic complexity in children's writing
Pub Date: 2024-01-07 | DOI: 10.1016/j.acorp.2024.100084
Yaling Hsiao, Nicola J. Dawson, Nilanjana Banerji, Kate Nation

Writing proficiency is associated with linguistic complexity. We used measures of linguistic complexity to investigate the development of children's narrative writing in a large corpus of short stories (N > 100,000) written by children aged 5–13 in the UK. Linguistic complexity was assessed with both lexical (N = 30) and syntactic (N = 14) measures. Most measures were associated with age: writing by older children showed greater lexical density, sophistication, and diversity than writing by younger children. Older children also used longer sentences, T-units, and clauses, and their writing showed a higher density of smaller syntactic units within larger ones. Principal component analysis identified several dimensions associated with complexity, with the first two dimensions capturing nearly 50% of the variance. Lexical diversity was represented mainly on the first dimension and syntactic complexity on the second. Across the age range, syntactic complexity varied more widely than lexical diversity, suggesting that syntactic development is subject to more individual differences than the ability to use a diverse set of lexical items. Our findings quantify the nature and content of children's writing through mid-childhood, and we discuss the utility of analysing children's writing with a computational, data-driven approach.
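
The abstract does not list its 30 lexical and 14 syntactic measures. As a rough illustration only, the Python sketch below computes two simple stand-ins: the type-token ratio (a basic lexical-diversity index) and mean sentence length in words. The regex tokenizer, the naive sentence splitter, and the two example stories are assumptions invented for this sketch, not the authors' pipeline or data.

```python
import re

def tokenize(text: str) -> list[str]:
    # Crude lowercase word tokenizer (an assumption; real studies use proper NLP tooling).
    return re.findall(r"[a-z']+", text.lower())

def type_token_ratio(text: str) -> float:
    # Distinct word types divided by total tokens; 0.0 for an empty text.
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def mean_sentence_length(text: str) -> float:
    # Average tokens per sentence, splitting naively on ".", "!" and "?".
    sentences = [s for s in re.split(r"[.!?]+", text) if tokenize(s)]
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)

# Hypothetical stories written for this sketch, not drawn from the corpus.
story_younger = "The dog ran. The dog ran fast. The dog was happy."
story_older = "Although the storm had passed, the exhausted dog kept running toward the distant hills."

for label, story in [("younger", story_younger), ("older", story_older)]:
    print(label, round(type_token_ratio(story), 2), round(mean_sentence_length(story), 2))
```

The raw type-token ratio falls as texts get longer, which is one reason developmental studies usually prefer length-corrected variants; the sketch keeps the raw ratio for brevity.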

Corpus-based tool: A digital science collocation list for multilingual middle school learners
Pub Date: 2024-01-05 | DOI: 10.1016/j.acorp.2024.100085
Rebeca Arndt

Collocational competence is essential for all learners, particularly multilingual learners. This corpus-driven study compiled a list of 474 collocations from a digital science corpus built from several thousand middle school science resources. The corpus contains more than 2.7 million tokens, and the list was extracted for more than 400 node words by combining two approaches: frequency-based selection and expert judgment. The Digital Science Collocations List (DSCL) provides middle school learners and teachers with an unprecedented resource covering Life Science, Physical Science, and Earth and Space Science. The list may be especially useful to multilingual learners, as most of its collocations follow patterns they struggle with (e.g., adjective + noun and verb + noun).
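
The abstract names a frequency-based step followed by expert judgment but gives no window size or cut-off. The sketch below assumes a +/-2-token window around each node word, a minimum co-occurrence frequency, and a stoplist; all three settings and the toy sentences are illustrative assumptions. Its output is only a candidate shortlist that an expert would then review.

```python
from collections import Counter

def window_collocates(tokens, node_words, span=2):
    # Count (node, collocate) pairs occurring within +/- span tokens of each node word.
    counts = Counter()
    nodes = set(node_words)
    for i, tok in enumerate(tokens):
        if tok not in nodes:
            continue
        for j in range(max(0, i - span), min(len(tokens), i + span + 1)):
            if j != i:
                counts[(tok, tokens[j])] += 1
    return counts

def frequency_shortlist(counts, min_freq=2, stoplist=frozenset()):
    # Keep pairs at or above min_freq whose collocate is not stoplisted,
    # most frequent first; the result is a list of candidates for expert review.
    kept = [(pair, n) for pair, n in counts.items() if n >= min_freq and pair[1] not in stoplist]
    return sorted(kept, key=lambda item: (-item[1], item[0]))

# Toy "science text" invented for this sketch.
tokens = ("green plants absorb sunlight . green plants store energy . "
          "the cell membrane protects the cell .").split()
counts = window_collocates(tokens, node_words={"plants", "cell"})
shortlist = frequency_shortlist(counts, min_freq=2, stoplist=frozenset({"the", "."}))
print(shortlist)
```

A fuller version would also filter candidates by part-of-speech pattern (for example the adjective + noun and verb + noun patterns the abstract highlights) and might rank them by an association measure rather than raw frequency.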

Applied corpus linguistics and legal interpretation: A rapidly developing field of interdisciplinary scholarship
Pub Date: 2023-12-21 | DOI: 10.1016/j.acorp.2023.100080
Ute Römer-Barron, Clark D. Cunningham

This article offers an overview of developments in a newly emerging interdisciplinary research field: legal corpus linguistics. The field brings together corpus research and legal theory by applying corpus-analytic techniques and linguistic concepts to facilitate the interpretation of legal texts. Despite the field's short history, it has already contributed important new insights into the meaning of statutory texts and parts of the U.S. Constitution, insights that may have significant practical implications for the American legal system. The article traces relevant developments in legal corpus linguistics, from early success stories to recent and ongoing collaborative work between corpus linguists and legal scholars. It aims to highlight the benefits and illustrate the potential of this type of interdisciplinary work by summarizing three recent case studies, each of which deals with an important topic in American constitutional law. The case studies focus in turn on the following parts of the U.S. Constitution: (1) Article III and the meaning of “cases,” (2) the Appointments Provision in Article II, Section 2 and the meaning of “such inferior officers,” and (3) the Impeachment Provision in Article II, Section 4 and the meaning of “misdemeanors.” All three case studies use corpus analysis to explore phraseological patterns in large collections of Founding Era texts, shedding light on what the selected words and phrases meant in context when the Constitution was drafted and ratified. The article discusses the practical relevance of the results of these case studies and the potential implications of this and related work in legal corpus linguistics for contemporary and future litigation.
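
The three case studies rely on concordance-style searches for words and phrases in Founding Era texts. As a minimal illustration, not the authors' tooling, the key-word-in-context (KWIC) sketch below finds a multi-word query in a token list and returns its left and right context. Its only input is the clause of Article II, Section 2 that contains “such inferior Officers”; the tokenizer and the three-token context width are assumptions.

```python
import re

def tokenize(text):
    # Words and punctuation marks as separate tokens (an illustrative choice).
    return re.findall(r"\w+|[^\w\s]", text)

def kwic(tokens, query, width=3):
    """Return (left context, match, right context) for each case-insensitive hit."""
    target = [w.lower() for w in query.split()]
    n = len(target)
    hits = []
    for i in range(len(tokens) - n + 1):
        if [t.lower() for t in tokens[i:i + n]] == target:
            left = " ".join(tokens[max(0, i - width):i])
            match = " ".join(tokens[i:i + n])
            right = " ".join(tokens[i + n:i + n + width])
            hits.append((left, match, right))
    return hits

clause = ("but the Congress may by Law vest the Appointment of such inferior Officers, "
          "as they think proper, in the President alone")
for left, match, right in kwic(tokenize(clause), "such inferior officers"):
    print(f"{left:>30} | {match} | {right}")
```

Run over a large historical corpus rather than one clause, the same left/right contexts are what an analyst would sort and inspect for recurring phraseological patterns.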

Linguistic variation in functional types of statutory law
Pub Date: 2023-12-20 | DOI: 10.1016/j.acorp.2023.100081
Margaret Wood

When the meaning of an ambiguous word, phrase, or grammatical structure in a statutory provision is disputed, courts are tasked with identifying the best meaning of the contested language. A common method of resolving linguistic ambiguities is to investigate the meaning of the contested word or structure in statutory provisions with similar subject matter. While the subject matter of a text has a demonstrated effect on language use, register variation research shows that the function of a text is also highly influential in predicting linguistic variation. Thus far, the function of a statutory provision (e.g., obligation to act, authorization to act) has not been considered in legal interpretative research. In the present study, I investigate the extent to which function influences the lexico-grammatical characteristics of statutory texts. In total, 2,573 statutory provisions from the Arizona State Code are individually assigned to one of seven categories representing their function: Duties, Permissions, Impersonal Rules, Operational Definitions, Prohibitions, Procedural Guidelines, and Criminal Offenses. Key feature analysis is used to identify and describe patterns of lexico-grammatical variation among the seven functional types. Results reveal a great deal of lexico-grammatical variation associated with function in the register of statutory law. Furthermore, some functional types of statutory provisions are more linguistically distinct than others. These findings suggest that it may be beneficial to consider communicative function when investigating legal interpretative questions.
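
One common way to operationalize key feature analysis, and the assumption made here, is to compute an effect size (Cohen's d) for each feature's normalized rate in one text category against a reference set, keeping features whose effect is large. The sketch compares one functional type against all other types pooled together; the 0.8 cut-off, the feature names, and the rates are invented for illustration and are not values from the study.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(target, reference):
    # Standardized mean difference using the pooled sample standard deviation
    # (each group needs at least two observations).
    n1, n2 = len(target), len(reference)
    pooled = sqrt(((n1 - 1) * stdev(target) ** 2 + (n2 - 1) * stdev(reference) ** 2) / (n1 + n2 - 2))
    return (mean(target) - mean(reference)) / pooled

def key_features(rates, target_type, threshold=0.8):
    """rates: {feature: {functional_type: [normalized rate per provision, ...]}}.

    Keeps features whose |d| for target_type vs. all other types meets the
    threshold, largest effects first.
    """
    results = []
    for feature, by_type in rates.items():
        target = by_type[target_type]
        reference = [r for t, vals in by_type.items() if t != target_type for r in vals]
        d = cohens_d(target, reference)
        if abs(d) >= threshold:
            results.append((feature, d))
    return sorted(results, key=lambda item: -abs(item[1]))

# Hypothetical rates per 1,000 words for two modal verbs in two functional types.
rates = {
    "shall": {"Duties": [12, 10, 11], "Permissions": [2, 3, 1]},
    "may": {"Duties": [1, 0, 2], "Permissions": [9, 8, 10]},
}
print(key_features(rates, "Duties"))
```

A positive d marks a feature that is more frequent in the target type and a negative d one that is less frequent, so both "shall" and "may" come out as key for Duties in this toy data.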

AI-generated vs human-authored texts: A multidimensional comparison
Pub Date: 2023-12-20 | DOI: 10.1016/j.acorp.2023.100083
Tony Berber Sardinha

The goal of this study is to assess the degree of resemblance between texts generated by artificial intelligence (GPT) and written and spoken texts produced by humans in real-world settings. A comparative analysis was conducted along the five main dimensions of variation identified by Biber (1988). The findings revealed significant disparities between AI-generated and human-authored texts, with the AI-generated texts generally failing to resemble their human counterparts. Furthermore, a linear discriminant analysis, performed to measure how well dimension scores predict the authorship of texts, demonstrated that AI-generated texts could be identified with relative ease from their multidimensional profile. Collectively, the results underscore the current limitations of AI text generation in emulating natural human communication. These findings counter popular fears that AI will replace humans in textual communication; rather, they suggest that AI's ability to capture the intricate patterns of natural language currently remains limited.
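
The abstract reports a linear discriminant analysis over dimension scores. The sketch below is a two-class Fisher discriminant in plain Python: it computes the direction w = S_W^-1 (m_A - m_B) from the within-class scatter S_W and the class means, then assigns a text to whichever class lies on its side of the midpoint between the projected means (equal priors assumed). The two-dimensional scores are invented for illustration; the study used five dimensions and real texts.

```python
def mean_vector(rows):
    # Column-wise mean of a list of equal-length score vectors.
    return [sum(col) / len(rows) for col in zip(*rows)]

def within_class_scatter(groups):
    # Sum over classes of (x - class_mean)(x - class_mean)^T.
    dim = len(groups[0][0])
    scatter = [[0.0] * dim for _ in range(dim)]
    for rows in groups:
        m = mean_vector(rows)
        for row in rows:
            diff = [x - mu for x, mu in zip(row, m)]
            for i in range(dim):
                for j in range(dim):
                    scatter[i][j] += diff[i] * diff[j]
    return scatter

def solve(matrix, rhs):
    # Gaussian elimination with partial pivoting; assumes a non-singular matrix.
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def fit_fisher_lda(class_a, class_b):
    # Discriminant direction w and the midpoint threshold between projected class means.
    m_a, m_b = mean_vector(class_a), mean_vector(class_b)
    w = solve(within_class_scatter([class_a, class_b]), [a - b for a, b in zip(m_a, m_b)])

    def project(v):
        return sum(wi * vi for wi, vi in zip(w, v))

    return w, (project(m_a) + project(m_b)) / 2

def predict(w, threshold, scores, labels=("A", "B")):
    # Class A's mean projects above the threshold by construction.
    value = sum(wi * si for wi, si in zip(w, scores))
    return labels[0] if value > threshold else labels[1]

# Invented two-dimensional "dimension scores" for illustration only.
human = [(2.0, 1.0), (3.0, 2.5), (1.5, 2.0), (2.5, 0.5)]
ai = [(-6.0, 4.0), (-5.0, 5.5), (-7.0, 4.5), (-5.5, 3.0)]
w, threshold = fit_fisher_lda(human, ai)
print(predict(w, threshold, (2.0, 1.5), labels=("human", "AI")))
print(predict(w, threshold, (-6.0, 4.5), labels=("human", "AI")))
```

In practice a statistics library would be used, with cross-validation to estimate how accurately authorship is predicted; the hand-written solver only keeps the sketch dependency-free.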

Prototype-by-component analysis: A corpus-based, intensional approach to ordinary meaning in statutory interpretation
Pub Date: 2023-12-20 | DOI: 10.1016/j.acorp.2023.100078
Jesse Egbert, Thomas R. Lee

When faced with a word or phrase that is not defined in a statute, judges generally interpret the language of the law as it is likely to be understood by an ordinary user of the language. However, there is little agreement about what ordinary meaning is and how it can be determined. Proponents of corpus-based legal interpretation argue that corpora provide scientific rigor and increased validity and transparency, but there is currently no consensus on best practices for legal corpus linguistics. Our objective in this paper is to propose some refinements to the theory of ordinary meaning and to corpus-based methods of analyzing it. We argue that the scope of legal language is established by conceptual (intensional) meaning and is not limited to attested referents. Yet most current corpus-based approaches are purely referential (extensional). We therefore introduce a new methodology, prototype-by-component (PBC) analysis, which brings together aspects of the componential approach and prototype theory by assuming that categories are gradient entities characterized by gradient semantic components. We introduce the analytical steps of PBC analysis and apply them to Nix v. Hedden (1893) to determine whether “tomato” is a member of the category “vegetable.” We conclude that conceptual categories have both a prototypical reality and a componential reality. As a result, attested referents in a corpus can provide insights into the conceptual meaning of terms and the degree to which concepts are members of categories.
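
The abstract does not spell out how PBC scores are computed. Purely as a hypothetical simplification of the idea that category membership is graded across semantic components, the sketch below rates each candidate's degree (0 to 1) on a few weighted components and reports the weighted mean. The component names, weights, and degrees are invented for this example and are not taken from Egbert and Lee's analysis of Nix v. Hedden.

```python
def pbc_score(component_weights, degrees):
    """Weighted mean of a candidate's degree (0-1) on each semantic component.

    component_weights: {component: importance of the component to the category}
    degrees: {component: how strongly the candidate exhibits it}; missing counts as 0.
    """
    total = sum(component_weights.values())
    return sum(w * degrees.get(c, 0.0) for c, w in component_weights.items()) / total

# Hypothetical components of the category "vegetable" (invented for illustration).
vegetable = {"edible plant part": 1.0, "savory taste": 0.8, "served with the main course": 0.6}
candidates = {
    "carrot": {"edible plant part": 1.0, "savory taste": 0.9, "served with the main course": 0.9},
    "tomato": {"edible plant part": 1.0, "savory taste": 0.7, "served with the main course": 0.8},
    "apple": {"edible plant part": 1.0, "savory taste": 0.1, "served with the main course": 0.1},
}
for name, degrees in candidates.items():
    print(name, round(pbc_score(vegetable, degrees), 3))
```

The graded output lets a candidate such as "tomato" come out as a less typical but still strong member, instead of forcing a yes/no answer.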

Corpus-linguistic approaches to lexical statutory meaning: Extensionalist vs. intensionalist approaches
Pub Date: 2023-12-19 | DOI: 10.1016/j.acorp.2023.100079
Stefan Th. Gries, Brian G. Slocum, Kevin Tobia

Scholars and practitioners of legal interpretation have shown growing interest in corpus-linguistic methodology. Lee and Mouritsen (2018) developed and helped popularize the use of concordances and collocate displays (mostly from COCA and COHA) to operationalize a central notion in legal interpretation, the ordinary meaning of expressions. This approach provides a good first approximation but is ultimately limited. Here, we outline an approach to ordinary meaning that is intensionalist (i.e., 'feature-based'), top-down, and informed by the notion of cue validity in prototype theory. The key advantages of this approach are that (i) it avoids the which-value-on-a-dimension problem of extensionalist approaches, (ii) it provides quantifiable prototypicality values for things whose membership in a category is in question, and (iii) it can be extended even to cases for which no textual data are yet available. We exemplify the approach with two case studies showing how cue validities can be obtained from survey data and/or derived from word similarities in embeddings trained on corpora. We illustrate the embedding-based approach with the word 'vehicle', using (i) an embedding model trained on 840 billion words of web-crawled text and (ii), as a more realistic application in terms of corpus size and time frame, an embedding model trained on the 1950s time slice of COHA. The latter lets us ask to what degree Segways, which did not exist in the 1950s, qualify as vehicles under this intensional approach.
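
In prototype theory, the cue validity of a feature f for a category c is usually defined as the conditional probability P(c | f), and an item's prototypicality can be scored by summing the cue validities of its features. The sketch below estimates P(c | f) from a handful of invented, hand-coded exemplars. Gries, Slocum, and Tobia instead obtain cue validities from survey data or embedding similarities, so the exemplars, feature names, and counting scheme here are assumptions.

```python
from collections import Counter

def cue_validities(exemplars):
    """Estimate P(category | feature) from (category, feature_set) exemplars."""
    feature_totals = Counter()
    joint = Counter()
    for category, features in exemplars:
        for feature in features:
            feature_totals[feature] += 1
            joint[(category, feature)] += 1
    return {(c, f): n / feature_totals[f] for (c, f), n in joint.items()}

def prototypicality(features, category, validities):
    # Sum of the item's cue validities for the category; unseen features add 0.
    return sum(validities.get((category, f), 0.0) for f in features)

# Invented, hand-coded exemplars (not survey or embedding data).
exemplars = [
    ("vehicle", {"wheels", "motor", "carries a person"}),  # car
    ("vehicle", {"wheels", "motor", "carries a person"}),  # bus
    ("vehicle", {"wheels", "carries a person"}),           # bicycle
    ("not vehicle", {"wheels"}),                           # shopping cart
    ("not vehicle", {"motor"}),                            # blender
]
validities = cue_validities(exemplars)
segway = {"wheels", "motor", "carries a person"}
print(round(prototypicality(segway, "vehicle", validities), 3))
```

In the embedding-based variant the abstract describes, similarity scores between words would take the place of these exemplar counts, which is what allows the method to rate items such as Segways that are absent from the corpus.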

Citations: 0
Generative AI for corpus approaches to discourse studies: A critical evaluation of ChatGPT
Pub Date : 2023-12-19 DOI: 10.1016/j.acorp.2023.100082
Niall Curry , Paul Baker , Gavin Brookes

This paper explores the potential of generative artificial intelligence technology, specifically ChatGPT, for advancing corpus approaches to discourse studies. The contribution of artificial intelligence technologies to linguistics research has been transformational in the contexts of both corpus linguistics and discourse analysis. However, shortcomings in the efficacy of these technologies for automated qualitative analysis have limited their utility for corpus approaches to discourse studies. Acknowledging that new data-analysis technologies can both replace and supplement existing approaches, and in view of ChatGPT's potential affordances for automated qualitative analysis, this paper presents three replication case studies. These studies investigate how well ChatGPT can support automated qualitative analysis in research that uses corpus approaches to discourse analysis.

The findings indicate that ChatGPT generally performs reasonably well when semantically categorising keywords. However, because the categorisation is based on decontextualised keywords, the categories can appear quite generic. This limits the value of the approach for analysing corpora that represent specialised genres and/or contexts. For concordance analysis, ChatGPT performs poorly: its results include false inferences about the concordance lines and, at times, modifications of the input data. For function-to-form analysis, ChatGPT also performs poorly, as it fails to identify and analyse direct and indirect questions. Overall, the results raise questions about ChatGPT's affordances for supporting automated qualitative analysis within corpus approaches to discourse studies. They signal issues of repeatability and replicability, ethical challenges surrounding data integrity, and the challenges of using non-deterministic technology for empirical linguistic research.
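Concordance analysis works on keyword-in-context (KWIC) lines. The finding that ChatGPT at times modified the input data suggests a simple safeguard: check every line a model quotes back against the lines it was given.

The Python sketch below is an illustration under stated assumptions, not the procedure used in the study. Note the following:

- `kwic`, `altered_lines` and `label_agreement` are hypothetical helpers.
- The example sentences are invented.
- No model is called; the model output and the repeated categorisation runs are simulated.
- `label_agreement` is a crude repeatability measure: the share of keywords that receive the same label in every repeated run of the same prompt.

```python
import re


def kwic(texts: list[str], node: str, width: int = 30) -> list[str]:
    """Keyword-in-context lines: right-aligned left context | node | right context."""
    pattern = re.compile(rf"\b{re.escape(node)}\b", re.IGNORECASE)
    lines = []
    for text in texts:
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            lines.append(f"{left:>{width}} | {m.group(0)} | {right}")
    return lines


def normalize(line: str) -> str:
    # Assumption: whitespace-only differences do not count as modifications.
    return " ".join(line.split())


def altered_lines(model_lines: list[str], source_lines: list[str]) -> list[str]:
    """Lines a model returns that do not occur in the input concordance (after whitespace normalization)."""
    source = {normalize(s) for s in source_lines}
    return [line for line in model_lines if normalize(line) not in source]


def label_agreement(runs: list[dict[str, str]]) -> float:
    """Share of keywords (present in every run) given the same category label in every run."""
    if not runs:
        return 0.0
    common = set.intersection(*(set(r) for r in runs))
    if not common:
        return 0.0
    stable = sum(1 for k in common if len({r[k] for r in runs}) == 1)
    return stable / len(common)


texts = [
    "The court asked whether a Segway is a vehicle under the statute.",
    "No vehicle may enter the park.",
]
concordance = kwic(texts, "vehicle", width=20)
for line in concordance:
    print(line)

# Simulated model output in which one concordance line has been silently changed.
model_output = [concordance[0], concordance[1].replace("park", "garden")]
print(altered_lines(model_output, concordance))

# Two hypothetical runs of the same keyword-categorisation prompt.
runs = [{"court": "LAW", "park": "PLACE"}, {"court": "LAW", "park": "NATURE"}]
print(label_agreement(runs))
```

Exact matching after whitespace normalization is deliberately strict, so any paraphrase of a concordance line is flagged. That is arguably the right default when data integrity is the concern.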

Citations: 0
Erratum regarding missing Declaration of Competing Interest statements in previously published articles
Pub Date : 2023-12-01 DOI: 10.1016/j.acorp.2023.100071
Citations: 0
Journal
Applied Corpus Linguistics