{"title":"IMPROVING COVERAGE AND NOVELTY OF ABSTRACTIVE TEXT SUMMARIZATION USING TRANSFER LEARNING AND DIVIDE AND CONQUER APPROACHES","authors":"Ayham Alomari, N. Idris, Aznul Qalid, I. Alsmadi","doi":"10.22452/mjcs.vol36no3.4","DOIUrl":null,"url":null,"abstract":"Automatic Text Summarization (ATS) models yield outcomes with insufficient coverage of crucial details and poor degrees of novelty. The first issue resulted from the lengthy input, while the second problem resulted from the characteristics of the training dataset itself. This research employs the divide-and-conquer approach to address the first issue by breaking the lengthy input into smaller pieces to be summarized, followed by the conquest of the results in order to cover more significant details. For the second challenge, these chunks are summarized by models trained on datasets with higher novelty levels in order to produce more human-like and concise summaries with more novel words that do not appear in the input article. The results demonstrate an improvement in both coverage and novelty levels. Moreover, we defined a new metric to measure the novelty of the summary. Finally, we investigated the findings to discover whether the novelty is influenced more by the dataset itself, as in CNN/DM, or by the training model and its training objective, as in Pegasus.","PeriodicalId":49894,"journal":{"name":"Malaysian Journal of Computer Science","volume":" ","pages":""},"PeriodicalIF":1.1000,"publicationDate":"2023-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Malaysian Journal of Computer Science","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.22452/mjcs.vol36no3.4","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Automatic Text Summarization (ATS) models yield outcomes with insufficient coverage of crucial details and poor degrees of novelty. The first issue resulted from the lengthy input, while the second problem resulted from the characteristics of the training dataset itself. This research employs the divide-and-conquer approach to address the first issue by breaking the lengthy input into smaller pieces to be summarized, followed by the conquest of the results in order to cover more significant details. For the second challenge, these chunks are summarized by models trained on datasets with higher novelty levels in order to produce more human-like and concise summaries with more novel words that do not appear in the input article. The results demonstrate an improvement in both coverage and novelty levels. Moreover, we defined a new metric to measure the novelty of the summary. Finally, we investigated the findings to discover whether the novelty is influenced more by the dataset itself, as in CNN/DM, or by the training model and its training objective, as in Pegasus.
期刊介绍:
The Malaysian Journal of Computer Science (ISSN 0127-9084) is published four times a year in January, April, July and October by the Faculty of Computer Science and Information Technology, University of Malaya, since 1985. Over the years, the journal has gained popularity and the number of paper submissions has increased steadily. The rigorous reviews from the referees have helped in ensuring that the high standard of the journal is maintained. The objectives are to promote exchange of information and knowledge in research work, new inventions/developments of Computer Science and on the use of Information Technology towards the structuring of an information-rich society and to assist the academic staff from local and foreign universities, business and industrial sectors, government departments and academic institutions on publishing research results and studies in Computer Science and Information Technology through a scholarly publication. The journal is being indexed and abstracted by Clarivate Analytics'' Web of Science and Elsevier''s Scopus