I. Bilal, Bo Wang, A. Tsakalidis, Dong Nguyen, R. Procter, M. Liakata
{"title":"Template-based Abstractive Microblog Opinion Summarization","authors":"I. Bilal, Bo Wang, A. Tsakalidis, Dong Nguyen, R. Procter, M. Liakata","doi":"10.1162/tacl_a_00516","DOIUrl":null,"url":null,"abstract":"Abstract We introduce the task of microblog opinion summarization (MOS) and share a dataset of 3100 gold-standard opinion summaries to facilitate research in this domain. The dataset contains summaries of tweets spanning a 2-year period and covers more topics than any other public Twitter summarization dataset. Summaries are abstractive in nature and have been created by journalists skilled in summarizing news articles following a template separating factual information (main story) from author opinions. Our method differs from previous work on generating gold-standard summaries from social media, which usually involves selecting representative posts and thus favors extractive summarization models. To showcase the dataset’s utility and challenges, we benchmark a range of abstractive and extractive state-of-the-art summarization models and achieve good performance, with the former outperforming the latter. We also show that fine-tuning is necessary to improve performance and investigate the benefits of using different sample sizes.","PeriodicalId":33559,"journal":{"name":"Transactions of the Association for Computational Linguistics","volume":"10 1","pages":"1229-1248"},"PeriodicalIF":4.2000,"publicationDate":"2022-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Transactions of the Association for Computational Linguistics","FirstCategoryId":"98","ListUrlMain":"https://doi.org/10.1162/tacl_a_00516","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 3
Abstract
Abstract We introduce the task of microblog opinion summarization (MOS) and share a dataset of 3100 gold-standard opinion summaries to facilitate research in this domain. The dataset contains summaries of tweets spanning a 2-year period and covers more topics than any other public Twitter summarization dataset. Summaries are abstractive in nature and have been created by journalists skilled in summarizing news articles following a template separating factual information (main story) from author opinions. Our method differs from previous work on generating gold-standard summaries from social media, which usually involves selecting representative posts and thus favors extractive summarization models. To showcase the dataset’s utility and challenges, we benchmark a range of abstractive and extractive state-of-the-art summarization models and achieve good performance, with the former outperforming the latter. We also show that fine-tuning is necessary to improve performance and investigate the benefits of using different sample sizes.
期刊介绍:
The highly regarded quarterly journal Computational Linguistics has a companion journal called Transactions of the Association for Computational Linguistics. This open access journal publishes articles in all areas of natural language processing and is an important resource for academic and industry computational linguists, natural language processing experts, artificial intelligence and machine learning investigators, cognitive scientists, speech specialists, as well as linguists and philosophers. The journal disseminates work of vital relevance to these professionals on an annual basis.