IMPROVING COVERAGE AND NOVELTY OF ABSTRACTIVE TEXT SUMMARIZATION USING TRANSFER LEARNING AND DIVIDE AND CONQUER APPROACHES

IF 1.1 | Zone 4 (Computer Science) | JCR Q4: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Malaysian Journal of Computer Science | Pub Date: 2023-07-31 | DOI: 10.22452/mjcs.vol36no3.4
Ayham Alomari, N. Idris, Aznul Qalid, I. Alsmadi

Abstract

Automatic Text Summarization (ATS) models tend to produce summaries that cover too few of the crucial details and show low novelty. The first issue stems from lengthy input, while the second stems from the characteristics of the training dataset itself. This research addresses the first issue with a divide-and-conquer approach: the lengthy input is broken into smaller chunks, each chunk is summarized, and the chunk summaries are then combined so that more of the significant details are covered. For the second issue, the chunks are summarized by models trained on datasets with higher novelty levels, which yields more human-like, concise summaries containing more novel words, that is, words that do not appear in the input article. The results demonstrate an improvement in both coverage and novelty. We also define a new metric to measure the novelty of a summary. Finally, we examine the findings to determine whether novelty is influenced more by the dataset itself, as in CNN/DM, or by the model and its training objective, as in Pegasus.
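
The chunk-then-combine idea and the notion of novel words can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the fixed word-count chunking, the concatenation used as the combine step, and the `first_words` stand-in summarizer are all hypothetical and are not taken from the paper. In particular, `novel_unigram_ratio` is the simple share of summary words absent from the source, not the new metric the authors define.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int) -> List[str]:
    """Divide step: consecutive chunks of at most max_words words (assumed scheme)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long_text(text: str, summarize: Callable[[str], str], max_words: int = 400) -> str:
    """Summarize each chunk independently, then combine by simple concatenation."""
    chunks = split_into_chunks(text, max_words)
    return " ".join(summarize(chunk) for chunk in chunks)


def novel_unigram_ratio(source: str, summary: str) -> float:
    """Share of summary words that never appear in the source (case-insensitive).

    Illustrative only; this is not the metric proposed in the paper.
    """
    source_vocab = set(source.lower().split())
    summary_words = summary.lower().split()
    if not summary_words:
        return 0.0
    novel = [w for w in summary_words if w not in source_vocab]
    return len(novel) / len(summary_words)


def first_words(chunk: str, n: int = 3) -> str:
    """Hypothetical stand-in for a trained summarizer: keep the first n words."""
    return " ".join(chunk.split()[:n])
```

In practice the `summarize` argument would wrap a trained abstractive model; passing it as a callable keeps the divide and combine steps independent of whichever model is used.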
Source journal
Malaysian Journal of Computer Science (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, THEORY & METHODS)
CiteScore: 2.20
Self-citation rate: 33.30%
Articles published: 35
Review time: 7.5 months
About the journal: The Malaysian Journal of Computer Science (ISSN 0127-9084) has been published since 1985 by the Faculty of Computer Science and Information Technology, University of Malaya. It appears four times a year, in January, April, July and October. Over the years, the journal has gained popularity and the number of paper submissions has increased steadily. Rigorous reviews from the referees have helped to maintain the high standard of the journal. Its objectives are to promote the exchange of information and knowledge in research work and new inventions and developments in Computer Science, to promote the use of Information Technology towards the structuring of an information-rich society, and to assist academic staff from local and foreign universities, business and industrial sectors, government departments and academic institutions in publishing research results and studies in Computer Science and Information Technology through a scholarly publication. The journal is indexed and abstracted by Clarivate Analytics' Web of Science and Elsevier's Scopus.