
ICAME journal : computers in English linguistics: latest articles

Ole Schützler and Julia Schlüter (eds.). Data and methods in corpus linguistics. Comparative approaches. Cambridge: Cambridge University Press, 2022. 357 pp. ISBN 978-1-10849964-4
Pub Date : 2023-05-01 DOI: 10.2478/icame-2023-0010
Matthias Eitelmann
Citations: 0
From I am, with sincere regard, your most obedient servant to Yours sincerely: The simplification of leavetaking formulae in 18th-century Scottish and Irish English letters
Pub Date : 2023-05-01 DOI: 10.2478/icame-2023-0001
C. Elsweiler, P. Ronan
Abstract The study in hand investigates the impact of social status on the use and change of pragmatic formulae in historical varieties of English. The study asks which leavetaking formulae are used between writers of equal social status in varieties of English in the later 18th century. Working on a corpus of letters compiled from two subsets of letters each from 18th-century Scottish and Irish English, the study illustrates pragmatic change on the basis of the investigation of leavetakings involving the servant formula. By doing so, the study also helps to widen the hitherto predominating narrow focus on mainly English English. The study shows that the use of formulae is situationally dependant. It suggests that pragmatic change takes place amongst writers of equal social status in the private domain, which then leads to the use of such formulae in the public domain and to the use between writers of different status groups.
Citations: 0
Compiling a corpus of South Asian online Englishes: A report, some reflections and a pilot study
Pub Date : 2023-05-01 DOI: 10.2478/icame-2023-0007
Muhammad Shakir, Dagmar Deuber
Abstract In this research article we introduce the South Asian Online Englishes (SAOnE) corpus representing four South Asian countries, i.e. Bangladesh, India, Pakistan, and Sri Lanka, and two native English-speaking countries, i.e. the UK and the USA. We have used semi-automatic and manual methods to collect data from three internet registers, i.e. newspaper comments, web forums and tweets, and a collection of internet sub-registers which we label as blogs and websites. Additionally, we have collected text messages using online freelance hiring platforms from each of the South Asian countries mentioned above. Each register category in the corpus consists of approximately 1 million words per register per country, except text messages, which contains around 500,000 words per country and only includes the four South Asian countries. We have verified the origin of website and blog links, authors of Twitter, and where possible of commenters and web forum users to make sure that only local content of each country is included. The corpus features some indigenous language content, which is tagged. In addition to the description of this dataset, we also present a pilot study analysing three discourse particles, namely na, neh, and yaar. The discourse particles na and yaar are native to Hindi/Urdu, while neh is based on a Sinhala negation marker. Our analysis indicates that na and neh have similarities in terms of their position in the clause/utterance. However, neh is confined to Sri Lanka while the Hindi/Urdu based discourse particles are also used in our Twitter data from Sri Lanka and Bangladesh. The use of these discourse particles in Bangladeshi tweets shows the influence of Indian culture through Bollywood celebrities. Of the Hindi/Urdu discourse particles yaar and na, yaar is preferred in Pakistan while na is preferred in India; additionally, yaar is used at the start of the clause more often in our Pakistani data. 
Lastly, we discuss the implications of the pilot study, the advantages of the type of data used for the pilot study, and future research directions.
Citations: 0
TV series as disseminators of emerging vocabulary: Non-codified expressions in the TV Corpus
Pub Date : 2023-05-01 DOI: 10.2478/icame-2023-0004
Daniela Landert, Tanja Säily, Mika Hämäläinen
Abstract This study presents a method for identifying words that appear in corpus data earlier than their first date of attestation in dictionaries. We demonstrate the application of this method based on a large diachronic corpus, the TV Corpus, and the Oxford English Dictionary (OED). Combining automatic extraction of candidate terms from the TV Corpus with comprehensive manual analysis and verification, the method identifies 32 words that were used in TV series before their first attestation in the OED. We present a detailed discussion of these words, analysing their distribution across decades and genres of the TV Corpus, their origins, semantic domains and word-formation processes. We also present extracts with their first uses in the TV Corpus and analyse how the words were presented to the large and anonymous mass audience. Our study shows that the method we present is suitable for identifying early attestations of words in large corpora, even though in the case of the TV Corpus, a great deal of manual analysis and verification is needed. In addition, we argue that TV series and other types of fictional texts are an important resource for studying the coinage and spread of terms, due to their function and the fact that they address a mass audience.
Citations: 1
A comparative corpus-based investigation of results sections of research articles in Applied Linguistics and Physics
Pub Date : 2023-05-01 DOI: 10.2478/icame-2023-0005
Muhammed Parviz
Abstract The present study sought to identify the generic structures of the results sections of scientific research articles (RAs) in Applied Linguistics and Physics. Following a manual search approach, a total of 200 RAs in the fields of Applied Linguistics and Physics were randomly selected from leading journals and analyzed. In addition to offering a tentative template for the rhetorical organization of results sections, the findings revealed shared and non-shared rhetorical units as well as obligatory and optional steps in the results sections (RSs) of research articles in the two disciplines. The findings also indicated that RA writers organize the contents of the RSs around certain rhetorical resources (i.e., M1, M2, M3, M4, and M5) to present key experimental and factual analytical results of their studies. The findings further suggested the existence of a common core of rhetorical resources in writing RSs across the disciplines, although a set of specific steps plays an essential part in distinguishing the textual features of each discipline and in depicting how the RSs of each discipline are developed. The findings generated from the study can offer a number of important pedagogical implications for teaching EAP and ESP courses, especially for Applied Linguistics and Physics teachers and students.
Citations: 1
Gender and evaluation in contemporary American English: A corpus study based on pronominal and nominal expressions with male and female reference
Pub Date : 2023-05-01 DOI: 10.2478/icame-2023-0003
Md Nazmus Saqueb Kathon
Abstract This study of contemporary American English examines how males and females are evaluated in terms of their personality, physical appearance, societal importance, etc. across various registers. In this study, evaluation is defined as an expression of a speaker or writer’s attitude toward, viewpoint on, or feelings about a male or female referent, which generally carries a positive or a negative meaning. The evaluative tokens analyzed in the study include noun phrases (e.g., a real jerk) and adjectival modification (e.g., congenial) co-occurring with gender-specific nominal expressions (e.g., boy, lady) or pronominal expressions (e.g., he, she). The findings imply a distinct gender patterning in the evaluation: whereas males are evaluated in terms of their skills, abilities, acuities and importance in society, females are typically assessed in terms of their looks and appearance. Males occupy considerably more evaluative space than females, particularly in the Newspaper register. The preponderance of the evaluation of males even in twenty-first-century American English is surprising, considering changes in gender role attitudes in U.S. society in recent decades.
Citations: 0
Pascual Pérez-Paredes and Geraldine Mark (eds.). Beyond concordance lines: Corpora in language education. Amsterdam/Philadelphia: John Benjamins Publishing Company, 2021. ix. 255 pp. ISBN: 978-9-02720989-4 (HB)
Pub Date : 2023-05-01 DOI: 10.2478/icame-2023-0009
Peter Crosthwaite
Citations: 0
The Corpus of Contemporary English Legal Decisions, 1950–2021 (CoCELD): A new tool for analysing recent changes in English legal discourse
Pub Date : 2023-05-01 DOI: 10.2478/icame-2023-0006
Paula Rodríguez-Puente, David Hernández-Coalla
Abstract Legal discourse is widely assumed to be resistant to change, and indeed legislative documents are extremely conservative with fixed and formulaic structures. However, recent research has shown that changes can be observed in the lexico-grammatical features of some legal documents when examined diachronically, particularly since the emergence in the 1970s of the Plain Language Movement, which sought to draw attention to the unnecessary complexity of the official language, this including legal discourse. Despite the crucial changes in legal language in recent years, research in that direction is scarce to date, particularly in the British English variety, probably due, in part, to the shortage of specialised corpora that allow this kind of studies. In order to bridge this gap, we have embarked on the compilation of the Corpus of Contemporary English Legal Decisions, 1950–2021 (CoCELD), a corpus of British judicial decisions produced between 1950 and 2021. In this paper we present the structure and characteristics of CoCELD, as well as the methodology used for its compilation. The new corpus, which was released in February 2022, contains sample texts of roughly 2,500 words for each year from 1950 to 2021, which adds up to more than 730,000 words. The corpus contains files in raw text and with POS-annotation, and is freely available for the research community under signed consent. With CoCELD we hope to contribute with a new, useful resource for linguists with an interest in legal language, from both a synchronic and a diachronic perspective.
Citations: 0
Tony McEnery and Vaclav Brezina. Fundamental principles of corpus linguistics. Cambridge: Cambridge University Press, 2022. 313 pp. ISBN 978-1-1071-1062-5
Pub Date : 2023-05-01 DOI: 10.2478/icame-2023-0008
Magnus Levin
To date, only a few studies have been carried out on how to position the corpus linguistic approach within the study of language and within scientific approaches in general. Among the notable exceptions, McEnery and Brezina’s introduction (pp. 1–2) mentions Leech (1992), Stubbs (2001) and Teubert (2005). There is thus a considerable gap to be filled by the present volume. The focus is not, however, to contrast the authors’ stance with that of previous linguistic studies, but instead to take a renewed look at corpus linguistics through Karl Popper’s work on the philosophy of science, as this “provokes new ways of looking at old problems and practices” (p. 2). Across the chapters, McEnery and Brezina formulate 48 principles of corpus linguistics, some of which are partly modified in the course of the discussion. These principles constitute the theoretical foundations of corpus linguistics – three of the central ones are given below:
Citations: 0
Meaning differences between English clippings and their source words: A corpus-based study
Pub Date : 2023-05-01 DOI: 10.2478/icame-2023-0002
M. Hilpert, D. Saavedra, Jennifer Rains
Abstract This paper uses corpus data and methods of distributional semantics in order to study English clippings such as dorm (< dormitory), memo (< memorandum), or quake (< earthquake). We investigate whether systematic meaning differences between clippings and their source words can be detected. The analysis is based on a sample of 50 English clippings. Each of the clippings is represented by a concordance of 100 examples in context that were gathered from the Corpus of Contemporary American English. We compare clippings and their source words both at the aggregate level and in terms of comparisons between individual clippings and their source words. The data show that clippings tend to be used in contexts that represent involved text production, which aligns with the idea that clipped words signal familiarity with their referents. It is further observed that individual clippings and their source words partly diverge in their distributional profiles, reflecting both overlap and differences with regard to their meanings. We interpret these findings against the theoretical background of Construction Grammar and specifically the Principle of No Synonymy.
Citations: 0