Multi-document Summarization by using TextRank and Maximal Marginal Relevance for Text in Bahasa Indonesia
D. Gunawan, Siti Hazizah Harahap, Romi Fadillah Rahmat
2019 International Conference on ICT for Smart Society (ICISS), published 2019-11-01
DOI: 10.1109/ICISS48059.2019.8969785 (https://doi.org/10.1109/ICISS48059.2019.8969785)
Citations: 7
Abstract
A text summarizer removes unnecessary information by selecting only the important sentences. In multi-document summarization, two or more important sentences may carry similar information, and including all of them makes the summary redundant. This research aims to remove such near-duplicate sentences from multi-document input to obtain a more concise summary. To that end, it uses several online news articles, combined and divided into six groups. The combined articles are pre-processed to produce clean text. The TextRank algorithm is then applied to the clean text to extract the important sentences based on a similarity measurement, which yields a summarized text. This summarized text still contains similar sentences, so the next step computes Maximal Marginal Relevance (MMR) to reduce them. The result of this step is the final text summary. Evaluated with ROUGE-1 and ROUGE-2, the summaries achieve average F-scores of 0.5103 and 0.4257, respectively.
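
The two stages described in the abstract (TextRank ranking, then MMR to drop near-duplicates) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the similarity measure, the damping factor, the MMR weight, or how relevance is defined, so everything here is an assumption: Jaccard word overlap stands in for the similarity measurement, `damping=0.85` and `lam=0.5` are illustrative values, the normalized TextRank score is used as the MMR relevance term, and the sample sentences and function names are hypothetical.

```python
import re


def tokenize(sentence):
    """Lowercased word set; a stand-in for the paper's pre-processing."""
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(a, b):
    """Jaccard word overlap in [0, 1] (assumed similarity measure)."""
    wa, wb = tokenize(a), tokenize(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    """Weighted PageRank over the sentence-similarity graph."""
    n = len(sentences)
    sim = [[jaccard(sentences[i], sentences[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores, sim


def mmr_select(scores, sim, k, lam=0.5):
    """Greedily pick k sentence indices, trading relevance for novelty."""
    top = max(scores, default=1.0)
    relevance = [s / top for s in scores]  # scale TextRank scores to [0, 1]
    selected = []
    candidates = set(range(len(scores)))
    while candidates and len(selected) < k:
        def mmr(i):
            # Penalize similarity to the closest already-selected sentence.
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(sorted(candidates), key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical input: sentences 0 and 1 are near-duplicates.
sentences = [
    "Floods closed many roads in Medan on Monday",
    "On Monday floods closed many roads in Medan city",
    "The governor promised aid for victims of the floods in Medan",
    "Schools will reopen next week",
]
scores, sim = textrank(sentences)
picked = mmr_select(scores, sim, k=2)
summary = [sentences[i] for i in picked]
```

With these sample sentences, ranking by TextRank score alone would place both near-duplicate sentences at the top, while the MMR step keeps only one of them and fills the second slot with the sentence about the governor. The balance depends heavily on `lam`: values closer to 1 favor relevance and let duplicates back in, values closer to 0 favor novelty even when the sentence is weakly related to the rest.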