
International Journal of Corpus Linguistics: Latest Articles

Review of Durrant (2023): Corpus linguistics for writing development
IF 1 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2024-06-13 DOI: 10.1075/ijcl.00059.lim
Joyce Lim
This article reviews Durrant (2023), Corpus linguistics for writing development.
Citations: 0
Review of Dunn (2022): Natural Language Processing for Corpus Linguistics
IF 1 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2023-12-22 DOI: 10.1075/ijcl.00057.sch
Hanna Schmück
Citations: 0
Review of Viana (2022): Teaching English with Corpora: A Resource Book
IF 1 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2023-12-21 DOI: 10.1075/ijcl.00056.per
P. Pérez-Paredes
Citations: 0
Framing the path to net zero
IF 1 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2023-12-07 DOI: 10.1075/ijcl.22123.fuo
Matteo Fuoli, Annika Beelitz
Big corporations are a leading contributor to global carbon emissions and their investment decisions have a significant impact on the world’s ability to tackle climate change. This study combines corpus and discourse approaches to examine how major corporate emitters have responded to the Paris Agreement, how they legitimize their practices amid mounting public pressure, and how companies operating in high- and middle-income countries differ in their framing of climate change. The results show that carbon majors place increasing focus on climate issues, widely support the goals of the Paris Agreement, and are increasingly making net-zero pledges. However, close inspection of linguistic patterns reveals a troubling disconnect between proclaimed goals, the solutions advocated for, and the radical steps needed to address the escalating climate crisis. Companies from middle-income countries devote comparatively less attention to climate change, which points to the need for better coordinated global efforts to address this problem.
Citations: 0
Political framing of Covid-19
Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2023-11-14 DOI: 10.1075/ijcl.22087.moh
Ariana N Mohammadi
The present study is a corpus-based discourse analysis of the metaphorical framing of Covid-19 in American political discourse. Drawing on data from a corpus of the White House briefings and statements, the study investigates the corpus profile of war and virus and illustrates how the Coronavirus is primarily represented as an enemy to go to war with, rather than a public health crisis to control and mitigate. The study further situates the militaristic framing of Covid-19 within the theoretical framework of moral panic and examines the discursive features that ultimately bridge the metaphorical representation of the pandemic and the construction of moral panic. The study points to nuanced discourse strategies used in the White House press briefings that reconstruct the enemy and regroup the Coronavirus with other so-called enemies of the United States, such as the Communists, as well as the Islamic radicals and the Latin gangs and cartels.
Citations: 0
Advancing Sino-Philippine linguistics and sociolinguistics using the Lannang Corpus (LanCorp)
Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2023-10-26 DOI: 10.1075/ijcl.22096.gon
Wilkinson Daniel Wong Gonzales
This paper introduces the Lannang Corpus (LanCorp), a public 375,000-word collection of raw and transcribed recordings of Lannang languages spoken in metropolitan Manila, which have been annotated with part-of-speech tags and linked to 40 types of sociolinguistic metadata. It begins by providing an overview of LanCorp (e.g. design, formats, accessibility). It then shows various examples of how the corpus can be used for variationist sociolinguistic research, using Lánnang-uè data as a case study. The findings from the exploratory studies indicate that Lannang languages are influenced by sociolinguistic factors, demonstrating the intricate nature of the Sino-Philippine sociolinguistic ecology. Due to its large size, sociolinguistic metadata, and various formats, LanCorp can be used to study Lannang languages in general and how they are used by specific social groups. It enables scholars to investigate multilingual interactions across a wide range of sociolinguistic factors, furthering the field of Sino-Philippine (socio)linguistics.
Citations: 0
The inverse frequency effect
Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2023-10-09 DOI: 10.1075/ijcl.22080.tem
David Temperley
Rare syntactic constructions show an especially strong tendency to be repeated, but some rare constructions exhibit this tendency much more strongly than others. The reasons for this variation are not well understood. This exploratory study examines five rare noun-phrase (NP) expansions in English: (the rich), (a Bob Gates), (architect Julia Morgan), (the jobs data), and (home electronic equipment). Repetition tendencies are very strong in the first and second of these and somewhat strong in the third; in the fourth and fifth they are much weaker, only slightly higher than those of common NP expansions such as (the black dog). To explain this variation, we suggest that constructions may be associated with different types of discourse: constructions with high repetition tendencies tend to occur in persuasive rather than informative discourse.
Citations: 0
Pinpointing prescriptive impact
Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2023-09-26 DOI: 10.1075/ijcl.22001.mal
Beth Malory
This paper presents a single-author case study which demonstrates that the statistical modelling technique change point analysis (CPA) can provide compelling evidence of prescriptive impact at an idiolectal level. It has been hypothesized that Late Modern English review periodicals consistently pushed a prescriptive agenda, and that this impacted language use (McIntosh, 1998; Percy, 2009). A lack of empirical research has, however, left these claims unsubstantiated, partly because evaluating prescriptivist endeavours has proven challenging. Using a purpose-built 3-million-token idiolectal corpus spanning 7 decades, this paper reports that it is possible to discern a striking change in usage. Use of CPA enables this change to be located precisely, and correlated to the author’s exposure to a prescriptive review of her work. In demonstrating how effectively CPA can provide a sophisticated correlation indicative of causality, this paper showcases the suitability of this technique to the study of prescriptivism.
Citations: 0
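As a rough illustration of what change point analysis does in the Malory abstract above, the following is a minimal sketch that locates a single shift in the mean of a time series by least squares. It is a simplification chosen for illustration, not the statistical model used in the paper; the function names and the `rates` values are hypothetical.

```python
def sum_squared_error(segment):
    """Sum of squared deviations of a segment from its own mean."""
    mean = sum(segment) / len(segment)
    return sum((x - mean) ** 2 for x in segment)


def single_change_point(values):
    """Return the index k (1 <= k < len(values)) at which splitting the
    series into values[:k] and values[k:] minimises the total squared error.
    Assumes at most one change in the mean (an illustrative simplification)."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    best_index, best_cost = 1, float("inf")
    for k in range(1, len(values)):
        cost = sum_squared_error(values[:k]) + sum_squared_error(values[k:])
        if cost < best_cost:
            best_index, best_cost = k, cost
    return best_index


# Hypothetical per-period usage rates of some linguistic variant.
rates = [0.9, 1.1, 1.0, 1.0, 3.0, 3.1, 2.9, 3.0]
change_index = single_change_point(rates)
print(change_index)
```

In a study of this kind, the located index would then be compared with the date of an external event, such as the publication of a review.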
Keywords of the manosphere
IF 1 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2023-08-14 DOI: 10.1075/ijcl.22053.mcg
M. McGlashan, A. Krendel
This paper examines language used in five of the largest manosphere communities on Reddit (r/TheRedPill, r/braincels, r/MensRights, r/seduction, and r/MGTOW) to identify idiosyncratic language use within these communities. To do so, a novel methodology which combines key-key-word analysis with notions from set theory was used to identify and compare keywords between corpora and to find keywords that are used uniquely within – and thus are distinctive to – these five separate communities. The paper achieves the following: it (i) presents a novel method for identifying what we term ‘complement keywords’ (keywords that are not shared between multiple different corpora when compared against the same reference corpus), and (ii) explores idiosyncratic language use in five separate manosphere communities. The analysis first examines interdiscursive relationships between communities emerging from the complement keywords identified before discussing community-specific preoccupations emergent in the idiosyncratic language use found in these five communities.
Citations: 0
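The "complement keywords" notion in the McGlashan and Krendel abstract above (keywords not shared between corpora that were each compared against the same reference corpus) can be sketched with plain set operations. This is only a sketch under the assumption that a keyword set has already been extracted for each corpus; the key-key-word step itself is not shown, and the corpus names and keywords below are hypothetical placeholders.

```python
def complement_keywords(keyword_sets):
    """For each corpus, keep only the keywords that appear in no other corpus.

    keyword_sets maps a corpus name to the set of keywords obtained by
    comparing that corpus against a shared reference corpus (assumed done
    beforehand)."""
    result = {}
    for name, keywords in keyword_sets.items():
        # Union of the keywords of every other corpus.
        others = set().union(
            *(kws for other, kws in keyword_sets.items() if other != name)
        )
        result[name] = set(keywords) - others
    return result


# Hypothetical keyword sets for three corpora.
keyword_sets = {
    "corpus_a": {"alpha", "beta", "gamma"},
    "corpus_b": {"beta", "delta"},
    "corpus_c": {"gamma", "delta", "epsilon"},
}
unique = complement_keywords(keyword_sets)
print(unique)
```

A corpus whose keywords all recur elsewhere ends up with an empty set, which is itself informative about overlap between communities.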
Association measures for collocation extraction
IF 1 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2023-08-14 DOI: 10.1075/ijcl.21056.su
Qi Su, Chen Gu, Pengyuan Liu
In this study, we propose a new evaluation scheme to assess the strengths and limitations of collocation extraction measures and explore type-sensitive methods for extracting collocations. We introduced the pooling strategy widely used in Information Retrieval and automated the evaluation process using online dictionaries. Sixteen well-known metrics are evaluated based on their effectiveness and then compared distributionally and linguistically. The results show that Group A methods (e.g. z-score, Dice, PMI) are more effective in extracting low-frequency collocations with relatively small extraction scales. In contrast, Group B methods (e.g. t-test, LMI, LLR) perform better at finding high-frequency collocations, and most of them outperform Group A methods as the extraction scale increases. Moreover, Group A prefers NN collocations, while Group B identifies collocations with a wide range of syntactic structures. This study provides suggestions for studies to identify hybrid extraction methods as well as for language educators and dictionary compilers.
Citations: 0
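To make the measures named in the Su, Gu and Liu abstract above more concrete, here is a minimal sketch of four of them (PMI, Dice, z-score, t-score) computed from co-occurrence counts. The formulas are the common textbook formulations and are an assumption here; the paper's exact definitions may differ. The function name and the example counts are hypothetical.

```python
import math


def association_scores(pair_count, word1_count, word2_count, total):
    """Compute common association measures for a word pair.

    pair_count:  observed co-occurrences of the two words (O)
    word1_count: frequency of the first word
    word2_count: frequency of the second word
    total:       corpus size in tokens (N)
    """
    if pair_count <= 0:
        raise ValueError("pair_count must be positive")
    # Expected co-occurrence frequency under independence (E).
    expected = word1_count * word2_count / total
    return {
        "pmi": math.log2(pair_count / expected),
        "dice": 2 * pair_count / (word1_count + word2_count),
        "z_score": (pair_count - expected) / math.sqrt(expected),
        "t_score": (pair_count - expected) / math.sqrt(pair_count),
    }


# Hypothetical counts: a pair seen 30 times in a 100,000-token corpus.
scores = association_scores(pair_count=30, word1_count=100,
                            word2_count=200, total=100_000)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

The z-score divides by the square root of the expected frequency while the t-score divides by the square root of the observed frequency, which is one reason the two families rank rare and frequent pairs differently.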