CATS: Customizable Abstractive Topic-based Summarization

ACM Transactions on Information Systems (TOIS), Pub Date: 2021-10-25, DOI: 10.1145/3464299

Seyed Ali Bahrainian, George Zerveas, F. Crestani, Carsten Eickhoff

Citations: 9

Abstract

Neural sequence-to-sequence models are the state-of-the-art approach used in abstractive summarization of textual documents, useful for producing condensed versions of source text narratives without being restricted to using only words from the original text. Despite the advances in abstractive summarization, custom generation of summaries (e.g., towards a user’s preference) remains unexplored. In this article, we present CATS, an abstractive neural summarization model that summarizes content in a sequence-to-sequence fashion while also introducing a new mechanism to control the underlying latent topic distribution of the produced summaries. We empirically illustrate the efficacy of our model in producing customized summaries and present findings that facilitate the design of such systems. We use the well-known CNN/DailyMail dataset to evaluate our model. Furthermore, we present a transfer-learning method and demonstrate the effectiveness of our approach in a low-resource setting, i.e., abstractive summarization of meeting minutes, where combining the main available meeting transcript datasets, AMI and International Computer Science Institute (ICSI), results in merely a few hundred training documents.
