Corpora: latest articles

Twenty-first century ideological discourses about US migrant education that transcend registers
IF 0.5 Q3 LINGUISTICS Pub Date : 2023-08-01 DOI: 10.3366/cor.2023.0280
Shannon Fitzsimmons‐Doolan
Widely distributed and often repeated discursive patterns which represent migrants can influence the education of migrant students (Calavita, 1996; Santa Ana, 2002; Cutler, 2017; and Dabach et al., 2017). Ideological discourses (e.g., ‘immigrants are threats’) are particularly potent structures that mediate language, cognition and social life. Whilst there has been a recent increase in studies of texts on the topic of migration generally, there are few that focus on the intersection of migration and education or on discursive patterns that transcend registers. This study introduces a multi-dimensional analysis approach for the identification of ideological discourses from a 9 million-word corpus of twenty-first century US texts about migrant education from multiple registers (online comments, national and regional newspaper texts, and federal and state government webpages) using the distribution of lexical variables that characterise variants of migrant/migration. Eleven ideological discourses (e.g., ‘US immigration policies are problematic, but there is no consensus for solutions’) were found. Of these, several had not been previously identified, one confirmed a previously identified discourse, and several complemented and extended previously identified discursive patterns on this topic. Together, these findings reveal the highly naturalised ideologically discursive landscape that shapes educational opportunities for US migrant students.
Citations: 0
Review: Islentyeva. 2020. Corpus-based Analysis of Ideological Bias: Migration in the British Press. London: Routledge
IF 0.5 Q3 LINGUISTICS Pub Date : 2023-08-01 DOI: 10.3366/cor.2023.0285
A. Black
Citations: 0
Towards increased reliability and transparency in projects with manual linguistic coding
IF 0.5 Q3 LINGUISTICS Pub Date : 2023-08-01 DOI: 10.3366/cor.2023.0284
Nicole Hober, Tülay Dixon, Tove Larsson
Manually coded data form the basis of many of our analyses in corpus linguistics. It is thus imperative that we work towards increased reliability and enhanced transparency in our coding practices, since failing to do so may ultimately lead us to draw erroneous conclusions about language. Using spoken data from a study on adverb usage for illustration, this methods paper discusses some strategies for identifying threats to the reliability of our coding and offers suggestions for how to mitigate these and ensure that our coding can be assessed and replicated. The paper also includes suggestions for best practices for manual linguistic coding and concludes with a discussion of the benefits of such practices. With this paper, we expand on the ongoing discussions in the field on issues of reliability and transparency as they relate to manual coding. We argue that while tests of inter-rater reliability offer a helpful starting point, further steps are needed to ensure increased reliability and transparency.
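The inter-rater reliability tests that the abstract above names as a starting point can be illustrated with Cohen's kappa, one common agreement statistic. This is a minimal sketch only: the abstract does not say which statistic the authors use, and the function name, the category labels and the sample codes below are all hypothetical.

```python
from collections import Counter


def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Proportion of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical adverb codes assigned by two coders to six tokens.
coder_a = ["manner", "degree", "manner", "stance", "degree", "manner"]
coder_b = ["manner", "degree", "stance", "stance", "degree", "manner"]
kappa = cohen_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.2f}")
```

A single coefficient like this summarises agreement but does not show where coders diverge, which is in line with the abstract's point that such tests are only a first step.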
Citations: 1
Front matter
Q3 LINGUISTICS Pub Date : 2023-08-01 DOI: 10.3366/cor.2023.0279
Citations: 0
The Corpus of Historical Mapudungun: morpho-phonological parsing and the history of a Native American language
IF 0.5 Q3 LINGUISTICS Pub Date : 2023-08-01 DOI: 10.3366/cor.2023.0281
Benjamin Molineaux
The Corpus of Historical Mapudungun (CHM), which I present here, is a lemmatised, part-of-speech and grapho-phonologically parsed collection of texts in the ancestral language of the Mapuche people. This paper gives an overview of the corpus materials (spanning 1606 to 1930), their processing and search capabilities. The TEI XML tags at the word and morpheme levels are shown to be suitable to account for the abundant agglutinative morphology of the language. The advantages of visualising sound–spelling equivalences across the various spelling systems in the corpus are also emphasised. Some uses and limitations of the corpus are surveyed too, with a particular emphasis on the contribution of typologically diverse languages to understanding language change and the importance of making heritage materials available to native speaker communities for revitalisation purposes.
Citations: 0
A comparable corpus-based study of phrasal verbs in academic writing by English and Chinese scholars across disciplines
IF 0.5 Q3 LINGUISTICS Pub Date : 2023-08-01 DOI: 10.3366/cor.2023.0283
Xianwei Gao
This paper reports on a comparative investigation into the differences and similarities in the use of phrasal verbs (PVs) by L1 English and L1 Chinese scholars (ESs and CSs) in academic English writing. Using a corpus of research articles from the fields of Physics, Computer Science, Linguistics and Management written by ESs and CSs, we present data to reveal that: (i) PVs are used in both CSs’ and ESs’ research articles across disciplines; (ii) there are significant differences in the use of PVs between CSs and ESs, with CSs employing PVs less frequently than ESs in both types and tokens; (iii) disciplinary variations have been detected – research articles in soft science disciplines (Linguistics and Management) deploy significantly more PVs and the tendency is particularly so in ESs’ research articles; (iv) both CSs and ESs use the ‘Verb + Adverbial particle + NP’ or ‘Verb + NP + Adverbial particle’ pattern and the ‘Verb + Preposition + NP’ pattern most frequently; and (v) the majority of the most frequent PVs are shared by CSs and ESs and used in their metaphorical senses. Qualitative analyses of the four selected items demonstrate that the co-selection between the collocating nouns and the structural patterns of PVs decides the senses being realised. These findings shed light on teaching academic writing and provide writers with some guidance on verb choices.
Citations: 0
A corpus-based study of the discourse functions of English tense: the co-occurrence of tense and lexical aspect at various textual positions of news reports
IF 0.5 Q3 LINGUISTICS Pub Date : 2023-08-01 DOI: 10.3366/cor.2023.0282
Liying Zhang
This study analyses the discourse functions of tense in the New York Times Corpus under a three-dimensional framework – the three dimensions being tense, verb and textual position. The texts are divided into ten 10 percent sub-sections and the distribution of the tenses along textual positions, the distribution of verb categories along textual positions as well as the distribution of tense-verb construction along textual positions are calculated to examine the features of tense use in news reports. The association between tense and verb categories is calculated using WordSmith log-likelihood statistics. Quantitative distribution analysis of the tenses reveals their distribution patterns. The distribution of the present and the present perfect follows a multi-peaked curve while there is a steady increase of preterit from the beginning to the end. The association between tense and verb shows that different tenses have attractions for different verb categories. The present tense attracts state verbs, the past tense attracts achievement verbs, and the present perfect prefers achievement and activity verbs. Analysis of tense-verb constructions along textual positions reveals that tense-verb constructions have localised functions – within different textual positions, tense-verb constructions take on various features and focus on different functions. All these findings constitute the stylistic use of tenses in news reports and reveal modern news values in the journalistic community.
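Two steps described in the abstract above, assigning each verb to one of ten positional sub-sections and measuring tense-verb association with a log-likelihood statistic, can be sketched as follows. This is an illustration under stated assumptions, not the study's procedure: it uses the generic log-likelihood ratio (G2) for a 2x2 contingency table rather than WordSmith's own implementation, and the function names and counts are hypothetical.

```python
import math


def decile(token_index, n_tokens):
    """Map a 0-based token position to one of ten equal sub-sections (0-9)."""
    return 10 * token_index // n_tokens


def log_likelihood(a, b, c, d):
    """G2 for a 2x2 table.

    a: target verb category in the target tense
    b: target verb category in other tenses
    c: other verb categories in the target tense
    d: other verb categories in other tenses
    """
    total = a + b + c + d
    observed = [a, b, c, d]
    # Expected counts if tense and verb category were independent.
    expected = [
        (a + b) * (a + c) / total,
        (a + b) * (b + d) / total,
        (c + d) * (a + c) / total,
        (c + d) * (b + d) / total,
    ]
    # Cells with an observed count of zero contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


# Hypothetical counts: state verbs vs. other verbs, present vs. other tenses.
g2 = log_likelihood(40, 10, 10, 40)
print(f"G2 = {g2:.2f}, sub-section of token 57 in a 200-token text: {decile(57, 200)}")
```

With one degree of freedom, G2 values above 3.84 correspond to p < 0.05, a conventional threshold for treating an association as statistically significant.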
Citations: 0
Corpus of Founding Era American English: designing a corpus for interpreting the United States Constitution
IF 0.5 Q3 LINGUISTICS Pub Date : 2023-04-01 DOI: 10.3366/cor.2023.0270
Brett Hashimoto
The original meaning of words or phrases is often in dispute in Founding Era legislation, especially the US Constitution. The Corpus of Founding Era American English (COFEA) accurately provides evidence for the meaning of contested terms during the Founding Era. COFEA consists of 126,394 texts and over 136 million words. This corpus has been and is being used by legal researchers and interpreters in scholarly research as well as various courts, including the Supreme Court. This paper describes the motivation for the creation of COFEA and describes the process of designing and collecting the corpus.
Citations: 0
Key feature analysis: a simple, yet powerful method for comparing text varieties
IF 0.5 Q3 LINGUISTICS Pub Date : 2023-04-01 DOI: 10.3366/cor.2023.0275
Jesse Egbert, D. Biber
To date, corpus-based methods for comparing language varieties have fallen into one of two camps: (1) MD analysis – a complicated multi-variate approach based on analysis of functionally motivated linguistic features in each text of a corpus, or (2) keyword/key POS analysis – simple, univariate techniques to identify any feature with a statistically skewed distribution in a corpus. In this paper, we introduce a complementary technique – key feature analysis – which is a simple quantitative approach to compare the texts in two varieties with respect to a set of functionally motivated lexico-grammatical features. We introduce the methods of key feature analysis, contrast them with other approaches for comparing text varieties, and present case studies from the domains of online registers and US presidential debates.
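The idea of comparing two varieties text by text on a fixed set of features can be sketched as below. The abstract does not name the comparison statistic, so this sketch assumes a standardised mean difference (Cohen's d with a pooled standard deviation) computed over per-text normalised rates. The feature names, the rates and both function names are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(rates_a, rates_b):
    """Standardised mean difference between two sets of per-text rates."""
    n_a, n_b = len(rates_a), len(rates_b)
    pooled_var = (
        (n_a - 1) * variance(rates_a) + (n_b - 1) * variance(rates_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("rates do not vary within either variety")
    return (mean(rates_a) - mean(rates_b)) / math.sqrt(pooled_var)


def key_features(variety_a, variety_b):
    """Return (feature, d) pairs sorted by absolute effect size, largest first."""
    scores = {
        feature: cohens_d(variety_a[feature], variety_b[feature])
        for feature in variety_a
        if feature in variety_b
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical rates per 1,000 words, one value per text.
debates = {
    "first_person_pronouns": [52.0, 48.0, 50.0],
    "nominalisations": [20.0, 22.0, 24.0],
}
comments = {
    "first_person_pronouns": [30.0, 34.0, 32.0],
    "nominalisations": [21.0, 23.0, 25.0],
}
ranked = key_features(debates, comments)
for feature, d in ranked:
    print(f"{feature}: d = {d:+.2f}")
```

Because each text contributes its own rate, the comparison reflects variation across texts within a variety, which pooled corpus-level frequency counts do not.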
Citations: 0
Review: Egbert, Larsson and Biber. 2020. Doing Linguistics with a Corpus: Methodological Considerations for the Everyday User
IF 0.5 Q3 LINGUISTICS Pub Date : 2023-04-01 DOI: 10.3366/cor.2023.0277
Veysel Altunel
Citations: 0