BD2TSumm: A Benchmark Dataset for Abstractive Disaster Tweet Summarization

Journal: Online Social Networks and Media (IF 2.9, JCR Q1, Social Sciences)
Publication date: 2025-01-01 (online: 2025-01-10)
DOI: 10.1016/j.osnem.2024.100299
Piyush Kumar Garg, Roshni Chakraborty, Sourav Kumar Dandapat
Volume 45, Article 100299. https://www.sciencedirect.com/science/article/pii/S2468696424000247
Citations: 0

Abstract

Online social media platforms such as Twitter are valuable sources of updates during disasters. However, the sheer volume of available information makes it difficult for humans to identify what is relevant. Automatically summarizing these tweets makes relevant information easy to identify and provides a holistic overview of a disaster event, which helps coordinate aid during disaster response. Based on the format of the output summary, the literature contains two types of abstractive disaster tweet summarization approaches: key-phrase-based (where the summary is a set of key phrases) and sentence-based (where the summary is a paragraph of sentences). Existing sentence-based abstractive approaches are either unsupervised or supervised. However, both types require a sizable number of ground-truth summaries for training and/or evaluation so that they work on disaster events irrespective of type and location. The lack of abstractive ground-truth summaries for disasters, and of annotation guidelines for producing them, motivates us to develop a systematic procedure for creating sentence-based abstractive ground-truth summaries of disaster events. To this end, this paper presents a two-step systematic annotation procedure for sentence-based abstractive summary creation. Additionally, we release BD2TSumm, a benchmark ground-truth dataset for evaluating sentence-based abstractive summarization approaches on disaster events. BD2TSumm consists of 15 ground-truth summaries of disaster events spanning 5 continents and covering both natural and man-made disasters. Furthermore, to ensure the high quality of the generated ground-truth summaries, we evaluate them qualitatively (using five metrics) and quantitatively (using two metrics). Finally, we compare 12 existing state-of-the-art (SOTA) abstractive summarization approaches on these ground-truth summaries using the ROUGE-N F1-score.
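The abstract reports comparisons using the ROUGE-N F1-score. This metric scores a system summary by its n-gram overlap with a ground-truth summary. Precision is the overlapping n-gram count divided by the number of n-grams in the system summary. Recall divides the same count by the number of n-grams in the reference. F1 is the harmonic mean of the two.

The Python sketch below illustrates the computation under stated assumptions:

- lowercased whitespace tokenization,
- no stemming or stopword removal,
- a single reference summary.

The function names `ngrams` and `rouge_n_f1` are hypothetical, and this is not the authors' evaluation code. Standard ROUGE toolkits apply their own preprocessing, so their scores will differ. The abstract also does not state which values of N were used.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N F1 between one candidate summary and one reference summary.

    Assumption: lowercased whitespace tokenization, no stemming or stopword removal.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: Counter '&' keeps min(candidate count, reference count) per n-gram.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())  # overlap / candidate n-grams
    recall = overlap / sum(ref.values())      # overlap / reference n-grams
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    system = "flood waters rose in the city"
    gold = "flood waters rose across the city"
    print(round(rouge_n_f1(system, gold, n=1), 4))  # 0.8333 (5 of 6 unigrams shared)
    print(round(rouge_n_f1(system, gold, n=2), 4))  # 0.6 (3 of 5 bigrams shared)
```

Clipping the overlap through the `Counter` intersection means a repeated n-gram in the system summary is credited at most as often as it appears in the reference. This prevents a summary from inflating its score by repetition.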
Source journal
Online Social Networks and Media (Social Sciences: Communication)
CiteScore: 10.60
Self-citation rate: 0.00%
Articles published: 32
Review time: 44 days