
International Journal of Corpus Linguistics: Latest Articles

Beyond base and collocate
IF 1 | Zone 2, Literature | Q1 Arts and Humanities | Pub Date: 2021-05-15 | DOI: 10.1075/IJCL.18072.CAN
P. Cantos, Moisés Almela-Sánchez
Support verb constructions figure among the most frequently investigated topics in the literature on collocation. So far, most studies of this kind have focused on bipartite structures, consisting of a verbal collocate and a nominal base. Accordingly, the analysis of how support verbs are distributed has concentrated almost exclusively on the lexical control exerted by the base. In this article, we draw attention towards the influence exerted by the participation of verb and noun in more complex patterns of lexical co-occurrence. We contend that the distribution of the support verb collocate is contingent not only on the base noun but also on other elements of the lexical context. This highlights the need to enrich the theoretical framework of collocation analysis with the additional descriptive category of ‘second-order collocate’. The proposal is illustrated with two case studies using a large-scale web corpus of English.
Citations: 0
Two subjunctives or three?
IF 1 | Zone 2, Literature | Q1 Arts and Humanities | Pub Date: 2021-05-10 | DOI: 10.1075/IJCL.19130.GUA
Gustavo Guajardo
This paper examines the use of the three non-periphrastic subjunctives in Spanish in embedded clauses under obligatory subjunctive predicates in the past tense in three Spanish varieties: Argentinean, Mexican and Peninsular Spanish. By means of random forest and logistic regression analyses, I demonstrate that a grammar where the two “past” subjunctives make up one group, such that the variation can be modeled on a binary opposition between (morphologically) past vs. (morphologically) present, achieves better prediction accuracy and goodness-of-fit parameters than a grammar with a three-way split. The results suggest that, at least in complement clauses of obligatory subjunctive predicates, there appear to be no semantic differences between the two past subjunctives but there are still relatively large differences in how the three subjunctive forms are used across the three Spanish varieties studied.
Citations: 0
Shell nouns as register-specific discourse devices
IF 1 | Zone 2, Literature | Q1 Arts and Humanities | Pub Date: 2021-05-06 | DOI: 10.1075/IJCL.19059.FAN
A. Fang, Min Dong
This article provides a corpus-based investigation into shell nouns. Shell nouns perform a variety of referential functions and express speaker stance. The investigation was motivated by the fact that past research in this area has been primarily based on written texts. Very little is known about the use of shell nouns in speech. The study used the ICE-GB corpus of contemporary British English and investigated cataphoric shell nouns complemented by appositive that-clauses across fine-grained spoken and written registers. It has revealed that the deployment of shell nouns is governed by the principle of register formality definable in terms of contextual configurations of the Field-Tenor-Mode complex rather than the mode of production. Additionally, the study has uncovered the frequent use of a small core set of shell nouns common across speech and writing. Hence it argues that shell nouns are part and parcel of spoken and written discourse and that they pertain more to grammar than to lexis.
Citations: 1
The Coronavirus Corpus
IF 1 | Zone 2, Literature | Q1 Arts and Humanities | Pub Date: 2021-05-03 | DOI: 10.1075/IJCL.21044.DAV
Mark Davies
This paper discusses the creation and use of the Coronavirus Corpus, which is currently (March 2021) 900 million words in size, and which will probably be about one billion words in size by May–June 2021. The Coronavirus Corpus is a subset of the NOW Corpus (News on the Web), which is currently about 12.1 billion words in size and which grows by about two billion words each year. These two corpora are updated every night, with about 6–10 million words for NOW and 2–3 million words for the Coronavirus Corpus. The Coronavirus Corpus allows users to see the frequency of words and phrases over time (even by individual day), and users can find all words that are more frequent in one time period than another. Users can also see the collocates for words and phrases, and compare the collocates to see what is being said about particular topics over time.
Citations: 17
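The Coronavirus Corpus entry mentions finding all words that are more frequent in one time period than another. One common way to do this, shown here as an assumption rather than as how English-Corpora.org actually computes it, is to normalize each word's count to occurrences per million words in each period and rank words by the ratio of those normalized frequencies. The function names, the `min_count` cutoff and the add-0.5 smoothing are hypothetical choices for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def per_million(count: float, total: int) -> float:
    """Normalize a (possibly smoothed) raw frequency to occurrences per million words."""
    return count * 1_000_000 / total


def more_frequent_in(period_a: Iterable[str], period_b: Iterable[str],
                     min_count: int = 5, smoothing: float = 0.5) -> List[Tuple[str, float]]:
    """Return (word, ratio) pairs for words relatively more frequent in period A than in B.

    Hypothetical sketch: the ratio compares per-million frequencies, and `smoothing` is
    added to raw counts so that words absent from period B do not divide by zero.
    """
    freq_a, freq_b = Counter(period_a), Counter(period_b)
    total_a, total_b = sum(freq_a.values()), sum(freq_b.values())
    if not total_a or not total_b:
        raise ValueError("both periods must contain at least one token")
    results = []
    for word, count_a in freq_a.items():
        if count_a < min_count:
            continue  # skip rare words, whose ratios are unstable
        ratio = (per_million(count_a + smoothing, total_a)
                 / per_million(freq_b[word] + smoothing, total_b))
        if ratio > 1:
            results.append((word, ratio))
    # Largest relative increase first.
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

For example, if period A contains 10 tokens of "lockdown" in 100 words and period B contains none in 100 words, the smoothed ratio is (10.5/100) / (0.5/100) = 21, so "lockdown" is reported as more frequent in A, while a word with similar rates in both periods falls below a ratio of 1 and is left out.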
Concordance line sorting in The Prime Machine
IF 1 | Zone 2, Literature | Q1 Arts and Humanities | Pub Date: 2021-04-07 | DOI: 10.1075/IJCL.18056.JEA
Stephen Jeaco
Corpus data provide evidence of the patterning of language, and one way word usage can be analysed is through the study of concordance lines. While popular concordancers provide different sorting methods, they are typically only able to display lines in the order in which they occur in the corpus, randomly, or alphabetically by words in slots to the left or right of the word of interest. Less sophisticated users may find recognising patterns from these orderings quite challenging. This paper considers possible needs of language learners in terms of concordance ranking and introduces two methods which have been adopted and developed for The Prime Machine. The first method uses repeated patterns, measuring the number of matches made with other lines in the set. The second method incorporates collocation scores, providing examples with strong collocations from the entire corpus at the top of sampled concordance lines.
Citations: 0
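The first sorting method in the Jeaco entry (ranking concordance lines by the number of matches each makes with other lines in the set) can be approximated as below. This is a hypothetical sketch, not The Prime Machine's actual code: it assumes a "match" means the same word in the same slot (L2, L1, R1, R2 by default) and simply sums matches against every other line. The names `slot_words` and `rank_by_repeated_patterns` are illustrative.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: (left-context tokens, node word, right-context tokens).
Line = Tuple[List[str], str, List[str]]


def slot_words(line: Line, span: int) -> List[str]:
    """Words in slots L<span>..L1 and R1..R<span>; missing slots are padded with ''."""
    left, _node, right = line
    left_slots = ([""] * span + left)[-span:]   # the `span` words nearest the node on the left
    right_slots = (right + [""] * span)[:span]  # the `span` words nearest the node on the right
    return left_slots + right_slots


def rank_by_repeated_patterns(lines: Sequence[Line], span: int = 2) -> List[Line]:
    """Order lines so that those sharing the most slot-words with other lines come first."""
    if span < 1:
        raise ValueError("span must be at least 1")
    slots = [slot_words(line, span) for line in lines]
    scores = []
    for i, mine in enumerate(slots):
        matches = 0
        for j, other in enumerate(slots):
            if i != j:
                # A match is the same non-empty word in the same slot of both lines.
                matches += sum(1 for a, b in zip(mine, other) if a and a == b)
        scores.append(matches)
    # sorted() is stable, so lines with equal scores keep their original corpus order.
    order = sorted(range(len(lines)), key=lambda k: -scores[k])
    return [lines[k] for k in order]
```

With lines such as "they made a decision to leave", "we took a decision on it", "she made a decision to stay" and "the final decision was hers", the two "made a ... to" lines share three slot-words and rise to the top, while "the final decision was hers" shares none and sinks to the bottom. Because the sort is stable, tied lines keep the order in which they occur in the corpus.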
Multi-Unit Directional Measures of Association: Moving Beyond Pairs of Words
IF 1 | Zone 2, Literature | Q1 Arts and Humanities | Pub Date: 2021-04-03 | DOI: 10.1075/ijcl.16098.dun
J. Dunn
This paper formulates and evaluates a series of multi-unit measures of directional association, building on the pairwise ΔP measure, that are able to quantify association in sequences of varying length and type of representation. Multi-unit measures face an additional segmentation problem: once the implicit length constraint of pairwise measures is abandoned, association measures must also identify the borders of meaningful sequences. This paper takes a vector-based approach to the segmentation problem by using 18 unique measures to describe different aspects of multi-unit association. An examination of these measures across eight languages shows that they are stable across languages and that each provides a unique rank of associated sequences. Taken together, these measures expand corpus-based approaches to association by generalizing across varying lengths and types of representation.
Citations: 1
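The pairwise ΔP measure that the Dunn entry builds on can be illustrated with a small sketch. Assuming the usual 2×2 contingency table for a cue word C and an outcome word O (a = C with O, b = C without O, c = O without C, d = neither), ΔP(O|C) = P(O|C) − P(O|¬C) = a/(a+b) − c/(c+d), and the reverse direction is ΔP(C|O) = a/(a+c) − b/(b+d). The function name `delta_p` and the counts in the usage note are hypothetical, and the sketch covers only this pairwise baseline, not the paper's 18 multi-unit measures.

```python
from typing import NamedTuple


class DeltaP(NamedTuple):
    """Directional association scores for a (cue, outcome) word pair."""
    outcome_given_cue: float  # ΔP(O|C)
    cue_given_outcome: float  # ΔP(C|O)


def delta_p(a: int, b: int, c: int, d: int) -> DeltaP:
    """Compute both directions of ΔP from a 2x2 contingency table (all margins assumed > 0).

    a: cue with outcome        b: cue with some other word
    c: outcome with other cue  d: neither cue nor outcome
    """
    # ΔP(O|C) = P(O|C) - P(O|not C): how much seeing the cue raises the chance of the outcome.
    o_given_c = a / (a + b) - c / (c + d)
    # ΔP(C|O) = P(C|O) - P(C|not O): the same question asked in the other direction.
    c_given_o = a / (a + c) - b / (b + d)
    return DeltaP(o_given_c, c_given_o)
```

For instance, `delta_p(8, 2, 40, 950)` gives about 0.76 for ΔP(O|C) but only about 0.16 for ΔP(C|O): the cue strongly predicts the outcome, whereas the outcome mostly occurs without the cue. This asymmetry is what makes ΔP directional, unlike symmetric scores such as mutual information.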
The Covid infodemic
IF 1 | Zone 2, Literature | Q1 Arts and Humanities | Pub Date: 2021-02-25 | DOI: 10.1075/IJCL.20160.HYL
Ken Hyland, F. Jiang
Covid-19, the greatest global health crisis for a century, brought a new immediacy and urgency to international bio-medical research. The pandemic generated intense competition to produce a vaccine and contain the virus, creating what the World Health Organization referred to as an ‘infodemic’ of published output. In this frantic atmosphere, researchers were keen to get their research noticed. In this paper, we explore whether this enthusiasm influenced the rhetorical presentation of research and encouraged scientists to “sell” their studies. Examining a corpus of the most highly cited SCI articles on the virus published in the first seven months of 2020, we explore authors’ use of hyperbolic and promotional language to boost aspects of their research. Our results show a significant increase in hype to stress certainty, contribution, novelty and potential, especially regarding research methods, outcomes and primacy. Our study sheds light on scientific persuasion at a time of intense social anxiety.
Citations: 10
The TV and Movies corpora
IF 1 | Zone 2, Literature | Q1 Arts and Humanities | Pub Date: 2021-02-09 | DOI: 10.1075/IJCL.00035.DAV
Mark Davies
This paper discusses the creation and use of the TV Corpus (subtitles from 75,000 episodes, 325 million words, 6 English-speaking countries, 1950s–2010s) and the Movies Corpus (subtitles from 25,000 movies, 200 million words, 6 English-speaking countries, 1930s–2010s), which are available at English-Corpora.org. The corpora compare well to the BNC-Conversation data in terms of informality, lexis, phraseology, and syntax. But at 525 million words in total size, they are more than 30 times as large as BNC-Conversation (both BNC1994 and BNC2014 combined), which means that they can be used to look at a wide range of linguistic phenomena. The TV and Movies corpora also allow useful comparisons of very informal language across time (containing texts from the 1930s and later for the movies, and from the 1950s onwards for TV shows) and between dialects of English (such as British and American English).
Citations: 18
Innovation on screen
IF 1 | Zone 2, Literature | Q1 Arts and Humanities | Pub Date: 2020-12-08 | DOI: 10.1075/IJCL.00038.REI
Susan A. Reichelt
This study explores marked affixation as a possible cue for characterization in scripted television dialogue. The data used here is the newly compiled TV Corpus, which encompasses over 265 million words in its North American English context. An initial corpus-based analysis quantifies the innovative use of affixes in word-formation processes across the corpus to allow for comparison with a following character analysis, which investigates how derivational word-formation supports characterization patterns within a specific series, Buffy the Vampire Slayer. For this, a list of productive prefixes (e.g. de-, un-) and suffixes (e.g. -y, -ish) is used to elicit relevant contexts. The study thus combines two approaches to word-formation processes in scripted contexts. On a large scale, it shows how derivational neologisms are spread across TV dialogue and on a much smaller scale, it highlights particular instances where these neologisms are used to aid character construction.
Citations: 5
Subcategorization frame identification for learner English
IF 1 | Zone 2, Literature | Q1 Arts and Humanities | Pub Date: 2020-12-08 | DOI: 10.1075/ijcl.18097.hua
Yi-Feng Huang, Akira Murakami, T. Alexopoulou, A. Korhonen
As large-scale learner corpora become increasingly available, it is vital that natural language processing (NLP) technology is developed to provide rich linguistic annotations necessary for second language (L2) research. We present a system for automatically analyzing subcategorization frames (SCFs) for learner English. SCFs link lexis with morphosyntax, shedding light on the interplay between lexical and structural information in learner language. Meanwhile, SCFs are crucial to the study of a wide range of phenomena including individual verbs, verb classes and varying syntactic structures. To illustrate the usefulness of our system for learner corpus research and second language acquisition (SLA), we investigate how L2 learners diversify their use of SCFs in text and how this diversity changes with L2 proficiency.
Citations: 0
Copyright © 2023 Book学术 All rights reserved.