Varada Kolhatkar, Hanhan Wu, Luca Cavasso, Emilie Francis, Kavan Shukla, Maite Taboada
Corpus Pragmatics, 2020 (published online 2 November 2019)
DOI: 10.1007/s41701-019-00065-w
Citations: 4
The SFU Opinion and Comments Corpus: A Corpus for the Analysis of Online News Comments.
We present the SFU Opinion and Comments Corpus (SOCC), a collection of opinion articles and the comments posted in response to them. The articles include all the opinion pieces published in the Canadian newspaper The Globe and Mail in the 5-year period between 2012 and 2016, a total of 10,339 articles and 663,173 comments. SOCC is part of a project that investigates the linguistic characteristics of online comments. The corpus can be used to study a host of pragmatic phenomena. Among other aspects, researchers can explore: the connections between articles and comments; the connections of comments to each other; the types of topics discussed in comments; the nice (constructive) or mean (toxic) ways in which commenters respond to each other; how language is used to convey very specific types of evaluation; and how negation affects the interpretation of evaluative meaning in discourse. Our current focus is the study of constructiveness and evaluation in the comments. To that end, we have annotated a subset of the corpus (1,043 comments) with four layers of annotation: constructiveness, toxicity, negation and Appraisal (Martin and White, The language of evaluation, Palgrave, New York, 2005). This paper describes the corpus, the data collection process, the characteristics of the corpus, and the annotations. While our focus is comments posted in response to opinion news articles, the phenomena in this corpus are likely to be present in many commenting platforms: other news comments, comments and replies in fora such as Reddit, feedback on blogs, or YouTube comments.
About the journal:
Corpus Pragmatics offers a forum for theoretical and applied linguists who carry out research in the new linguistic discipline at the interface between corpus linguistics and pragmatics. The journal promotes the combination of the two approaches through research on new topics in linguistics, with a particular focus on interdisciplinary studies. It also aims to extend and apply current pragmatic theories that have so far lacked empirical corpus support. Authors are encouraged to describe the statistical analyses used in their research and to supply their data and R scripts when possible. The objective of Corpus Pragmatics is to develop pragmatics with the aid of quantitative corpus methodology. The journal accepts original research papers, short research notes, and occasional thematic issues, and follows a double-blind peer review system.