The InsightsNet Climate Change Corpus (ICCC)

Datenbank-Spektrum: Zeitschrift für Datenbanktechnologie: Organ der Fachgruppe Datenbanken der Gesellschaft für Informatik e.V. Pub Date: 2023-09-11 DOI: 10.1007/s13222-023-00454-1

Elena Volkanovska, Sherry Tan, Changxu Duan, Sabine Bartsch, Wolfgang Stille



Abstract

The discourse on climate change has become a centerpiece of public debate, creating a pressing need to analyze the multitude of messages produced by the participants in this communication process. In addition to text, information on this topic is conveyed multimodally, through images, videos, tables and other data objects that are embedded within documents and accompany the text. This paper presents the process of building a multimodal pilot corpus for the InsightsNet Climate Change Corpus (ICCC) and of using natural language processing (NLP) tools to enrich the corpus (meta)data, resulting in a dataset that lends itself to exploring the interplay between the various modalities that constitute the discourse on climate change. We demonstrate how the pilot corpus can be queried for relevant information in two types of databases, and how the proposed data model promotes a more comprehensive approach to sentiment analysis.
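To make the idea of querying a multimodal corpus concrete, the following is a minimal sketch and not the authors' data model. It assumes a relational store (here SQLite, chosen only for illustration; the abstract does not name the two database types), a hypothetical schema with `document` and `data_object` tables, and made-up toy sentiment scores. It shows how storing each modality as its own data object linked to a document allows sentiment to be compared across modalities rather than over text alone.

```python
import sqlite3


def build_demo_corpus() -> sqlite3.Connection:
    """Create an in-memory toy corpus. Schema and values are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE document (
            doc_id INTEGER PRIMARY KEY,
            title  TEXT NOT NULL
        );
        CREATE TABLE data_object (
            obj_id    INTEGER PRIMARY KEY,
            doc_id    INTEGER NOT NULL REFERENCES document(doc_id),
            modality  TEXT NOT NULL,   -- e.g. 'text', 'image', 'table'
            content   TEXT NOT NULL,   -- text, caption, or serialized table
            sentiment REAL             -- toy score in [-1, 1], not real data
        );
        """
    )
    conn.executemany(
        "INSERT INTO document VALUES (?, ?)",
        [(1, "Example report A"), (2, "Example report B")],
    )
    conn.executemany(
        "INSERT INTO data_object VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, "text", "placeholder paragraph", -0.5),
            (2, 1, "text", "placeholder paragraph", -0.25),
            (3, 1, "image", "placeholder caption", 0.5),
            (4, 2, "table", "placeholder table", 0.0),
        ],
    )
    conn.commit()
    return conn


def sentiment_by_modality(conn: sqlite3.Connection, doc_id: int) -> dict:
    """Return the average toy sentiment per modality for one document."""
    rows = conn.execute(
        """
        SELECT modality, AVG(sentiment)
        FROM data_object
        WHERE doc_id = ?
        GROUP BY modality
        ORDER BY modality
        """,
        (doc_id,),
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    corpus = build_demo_corpus()
    print(sentiment_by_modality(corpus, 1))
```

Under these assumptions, a document whose text and images carry different sentiment becomes visible in a single grouped query, which is one way a data model of this kind could support a more comprehensive sentiment analysis.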

