
Applied Corpus Linguistics: Latest Articles

Stative verbs and perceptions of intensity: The case of ‘believe’ in simple and progressive aspect
Pub Date : 2023-09-04 DOI: 10.1016/j.acorp.2023.100072
Naoko Taguchi , Marianna Gracheva

This study assessed the validity of descriptive findings from corpus linguistics research by analyzing human participants’ performance and perception data. While the stative verb believe usually occurs in the simple aspect, a corpus-based analysis has revealed that believe also occurs in the progressive form in communicative situations conveying a heightened degree of intensity and marked with specific linguistic features such as intensifying adjectives, adverbs of certainty, direct addresses, and others (Gracheva, in press). This study adopted an experimental approach to further assess the link between the progressive form in situations of use conducive to assertive stance and emotional involvement and its surrounding linguistic characteristics. Eighty-six native English speakers were presented with 24 naturally occurring texts from corpora. Half of the texts involved linguistic features of intensity (progressive aspect condition), while half involved no such features (simple aspect condition). Participants read the texts and selected the form of believe (simple or progressive aspect) which they thought was appropriate in each text. Results showed that participants selected the progressive aspect 47% of the time for the texts featuring language of intensity, while they selected that aspect less than 3% of the time in the simple condition texts. Follow-up interviews revealed that participants sensed the intensity conveyed by the texts (e.g., strong emotion, urgency, emphasis), leading to their choice of the progressive over the simple aspect.

Applied Corpus Linguistics, Volume 3, Issue 3, Article 100072
Citations: 0
The DAIS-C: A small, specialised, spoken, schizophrenia corpus
Pub Date : 2023-08-23 DOI: 10.1016/j.acorp.2023.100069
Oliver Delgaram-Nejad , Dawn Archer , Gerasimos Chatzidamianos , Louise Robinson , Alex Bartha

This paper describes the design and development of the DAIS-C (Discussing Abstract Ideas in Schizophrenia Corpus), a small, specialised corpus of spoken language in which speakers with a diagnosis of schizophrenia and those with no self-reported psychiatric or neuroleptic history were interviewed on the same topics. The corpus was constructed to allow for comparative analyses of speech behaviour in relation to linguistic creativity and formal thought disorder (FTD), but additional steps were taken to ensure that the corpus could be of use to other researchers and research questions. The present paper covers design decisions relevant to the construction of clinical corpora, alongside information about the corpus that may be useful to researchers interested in working with it.

Applied Corpus Linguistics, Volume 3, Issue 3, Article 100069
Citations: 0
Development of word count data corpus for Hindi and Marathi literature
Pub Date : 2023-08-23 DOI: 10.1016/j.acorp.2023.100070
Vivek Belhekar, Radhika Bhargava

India has a huge diversity of languages, and Hindi and Marathi are the most spoken languages in the northern and western parts of India. Hindi and Marathi have more than 528 million and 83 million speakers, respectively. The paper describes the development of the Hindi Word Corpus (Hindi WordCorp) and the Marathi Word Corpus (Marathi WordCorp), reporting the frequency of single words (1-gram) used in written texts of the respective languages using the bag-of-words model (BoW). The word frequencies are provided for eleven decades (pre-1920, 1920 to 2020). These texts include books (fiction, non-fiction, history, autobiographies, etc.) and magazines. Academic and reference books were not used. The Hindi WordCorp and Marathi WordCorp used 640 and 712 texts, respectively. An analysis was employed to check whether the texts used were enough to stabilize the rank-order of the total frequencies of the words. Zipf's and Heaps’ law coefficients indicated the sufficiency of the texts. Researchers in various areas like linguistics, social sciences, text mining, machine learning, etc., can use the dataset to answer research questions about language and culture. Some demonstrative examples are provided for using the datasets in the two languages. The dataset is made available on an open data repository. The paper is an account of dataset creation for Hindi and Marathi WordCorp. Hence, no empirical results or conclusions are made based on the data created. A WebApp named Indian Languages Word Corpus (ILWC) has been developed for users. Future directions for text mining and language models are discussed.
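
The abstract above mentions Zipf's and Heaps' law coefficients as a check on whether the texts were sufficient. The paper's own procedure is not given here, so the following is only a minimal sketch of one common way to estimate such coefficients: an ordinary least-squares fit on log-log axes. The function names and the choice of fitting method are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def fit_loglog_slope(xs, ys):
    """Least-squares slope and intercept of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def zipf_exponent(tokens):
    """Estimate s in frequency ~ C / rank**s from a list of tokens."""
    # Frequencies sorted from most to least common define the ranks 1..V.
    freqs = sorted(Counter(tokens).values(), reverse=True)
    ranks = range(1, len(freqs) + 1)
    slope, _ = fit_loglog_slope(ranks, freqs)
    return -slope


def heaps_exponent(tokens):
    """Estimate beta in vocabulary_size ~ K * n**beta as tokens are read."""
    seen = set()
    sizes = []
    for tok in tokens:
        seen.add(tok)
        sizes.append(len(seen))  # vocabulary size after each token
    positions = range(1, len(tokens) + 1)
    slope, _ = fit_loglog_slope(positions, sizes)
    return slope
```

Both functions need at least two distinct data points (two word types for the Zipf fit, two tokens for the Heaps fit). On real corpora, a simple log-log regression is sensitive to the long low-frequency tail, so results from a sketch like this should be read as rough indicators only.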

Applied Corpus Linguistics, Volume 3, Issue 3, Article 100070
Citations: 0
The identification of YouTube videos that feature the linguistic features of English informal speech
Pub Date : 2023-08-20 DOI: 10.1016/j.acorp.2023.100068
Christopher R. Cooper

YouTube is becoming an increasingly popular entertainment platform, with videos catering to a wide range of interests. If L2 users are to become proficient in the primary form of language, conversation, then the affordances created by YouTube videos containing informal speech could be very useful. In the current study a near-random corpus of 2602 YouTube video transcripts was compiled and 200 randomly selected texts from the Spoken BNC2014 (Love et al., 2017) were used as a reference corpus representing informal spoken English. The texts were tagged with 67 linguistic features as part of an additive multi-dimensional analysis. The dimension scores for each text were used in a cluster analysis to investigate which texts clustered with the Spoken BNC2014 texts. A two-cluster solution was chosen with 666 YouTube texts and 171 Spoken BNC2014 texts in one cluster, and the remaining texts in the other cluster. A small sample of texts from each cluster was analysed in detail. It is shown that this method has the potential to identify videos featuring informal speech and that some videos with similar categories have a very different linguistic style.

Applied Corpus Linguistics, Volume 3, Issue 3, Article 100068
Citations: 1
Review of McEnery and Brezina (2022) Fundamental Principles of Corpus Linguistics
Pub Date : 2023-08-01 DOI: 10.1016/j.acorp.2023.100055
Rickey Lu
Applied Corpus Linguistics, Volume 3, Issue 2, Article 100055
Citations: 0
Corpus to curriculum: Developing word lists for adult learners of Welsh
Pub Date : 2023-08-01 DOI: 10.1016/j.acorp.2023.100052
Dawn Knight , Tess Fitzpatrick , Steve Morris , Bethan Tovey-Walsh , Helen Prosser , Emyr Davies

The launch of a language's first comprehensive general corpus promises a sea-change in teaching and learning resources. Effective transition from corpus to classroom is not necessarily straightforward, though; expert and end-user input is essential for the potential of the corpus resource to be realised. This paper outlines the process by which fit-for-purpose vocabulary lists were derived from the new National Corpus of Contemporary Welsh (Corpws Cenedlaethol Cymraeg Cyfoes – CorCenCC). The immediate purpose in this case was to inform the revision of A1 and A2 level course materials for adult learners. A longer-term aim was to put in place a method by which vocabulary lists for more advanced level learners and learners of different ages could be extracted and developed from the corpus. The new corpus means that for the first time, the Welsh language curriculum is able to use word frequency information; teaching and assessment materials in major languages have been informed by word frequencies for several decades. Raw frequency lists, though, include troublesome content, and can exclude items with high relevance to learners. This paper demonstrates how, by working in partnership, Welsh language curriculum writers, assessors, language experts and corpus linguists can effectively manipulate corpus data into curriculum content. The methods and approaches reported here are replicable for use in other language contexts.

Applied Corpus Linguistics, Volume 3, Issue 2, Article 100052
Citations: 0
The interface between specialized translation and institutional translation: A selection of candidate terms validated by Aeronautical Meteorology corpora
Pub Date : 2023-08-01 DOI: 10.1016/j.acorp.2023.100051
Rafaela Araújo Jordão Rigaud Peixoto

The purpose of this work is to revise and expand an aeronautical meteorology glossary, available at REDEMET, a homepage hosted on the Department of Airspace Control website, taking into consideration corpus data in the field. For that, to best meet the needs of institutions and users, data were compiled from some segments of the Aeronautical Meteorology domain. During the compilation of this corpus, it was noticed that there was a great scarcity of specialized sources of this Aviation subdomain in English and, mainly, in Portuguese, including material by the Department of Airspace Control (DECEA), the only official Brazilian institution with the role of regulating standards relevant to Aeronautical Meteorology. By taking into account that a given government institution is considered an authoritative source concerning terms used in a specialized domain, it would be advisable to align professional and academic expertise, and institutional interests. Therefore, based on contributions of corpus linguistics theories, terminology, and institutional translation, this work relied on established parameters for the compilation and processing of information for inclusion in the corpus, and focused, in this first stage, on the selection of candidate terms, according to corpus analysis. The first results showed that institutional and academic segments present some subtleties regarding terminology, as, on the one hand, some words are more specific to the academic register and, on the other hand, there are different uses of terms in the institutional setting, by ICAO, WMO, or FAA.

Applied Corpus Linguistics, Volume 3, Issue 2, Article 100051
Citations: 1
“I will say the picture of the background is not related to the words”: using corpus linguistics and focus groups to reveal how speakers of English as an additional language perceive the effectiveness of the phraseology and imagery in UK public health tweets during COVID-19
Pub Date : 2023-08-01 DOI: 10.1016/j.acorp.2023.100053
Christian Jones, David Oakey, Kay L. O'Halloran

This paper reports on an application of a multimodal corpus-based study into the effectiveness of public health information about COVID-19 for speakers of English as an additional language (EAL) in the UK. A corpus of information tweets from 13 UK public health agencies totalling 560,000 words, with concomitant images and videos, was collected between March 2020 and February 2021. The most frequent n-grams occurring across all 13 public health agencies, and sample images occurring alongside these, were identified. In this study, we examine how images and videos combine with the phraseology to shape these COVID-19 public health information messages. Following this, six illustrative tweets were used as prompts for three focus groups of EAL participants based in the UK representing a range of first languages and occupations. Data from the focus groups was analysed in order to identify how common public health phraseology and images were received, understood and responded to by participants and how they felt they could be amended to increase their effectiveness for EAL speakers. We conclude with suggestions for making the language of public health messages simpler and more direct, aligning images more clearly with the language used and removing linguistic ambiguity. These recommendations for how such messaging could be improved in future public health campaigns could ensure a more effective and inclusive public health response.
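
The abstract above refers to identifying the most frequent n-grams that occur across all 13 agencies. How the authors tokenised tweets or set thresholds is not stated here, so the following is a minimal sketch under simple assumptions: lowercased whitespace tokenisation, n-grams that do not cross tweet boundaries, and a hypothetical `tweets_by_agency` mapping from agency name to a list of tweet texts.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-token sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def shared_ngrams(tweets_by_agency, n=3, top=10):
    """Most frequent n-grams that occur in every agency's tweets."""
    totals = Counter()  # counts pooled over all agencies
    shared = None       # n-grams seen in every agency so far
    for tweets in tweets_by_agency.values():
        counts = Counter()
        for text in tweets:
            # Assumption: lowercase + whitespace split stands in for
            # whatever tokenisation the study actually used.
            counts.update(ngrams(text.lower().split(), n))
        totals.update(counts)
        shared = set(counts) if shared is None else shared & set(counts)
    ranked = [(gram, totals[gram]) for gram in (shared or set())]
    # Sort by descending frequency, then alphabetically for stable output.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked[:top]
```

Requiring an n-gram to appear in every source is a strict dispersion filter; a looser variant could keep n-grams found in at least some minimum number of agencies.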

Applied Corpus Linguistics, Volume 3, Issue 2, Article 100053
Citations: 0
Review of Deignan, Candarli, & Oxley (2023). The linguistic challenge of the transition to secondary school: A corpus study of academic language
Pub Date : 2023-08-01 DOI: 10.1016/j.acorp.2023.100049
Philip Durrant
Applied Corpus Linguistics, Volume 3, Issue 2, Article 100049
Citations: 0
The corpus of United States state statutes—design, construction and use
Pub Date : 2023-08-01 DOI: 10.1016/j.acorp.2023.100047
Jesse Egbert, Margaret Wood

There is a need for more publicly available corpora of legal language. To help fill this gap, we have developed the Corpus of U.S. State Statutes, or CorUSSS, a new corpus comprising the statutory code from all 50 U.S. states. In total the corpus contains 1,785,742 texts, each of which represents the statutory text associated with a unique Universal Citation in one of the 50 U.S. states’ codes. This corpus provides us with the ability to explore language use in statutes within or across all 50 states. After motivating the need for this corpus, we describe its design and the methods we used to collect, clean and store the texts. We then report on a case study that illustrates the utility of this corpus for addressing important questions in statutory interpretation by investigating whether the word “information” can be used to refer to statements that are non-factual. We conclude with a call for researchers in law and corpus linguistics to rely on both legal and ordinary language when investigating questions of interpretation.

Citations: 0