Text Summarization in Multi Document Using Genetic Algorithm

IJCCS Indonesian Journal of Computing and Cybernetics Systems. Pub Date: 2021-10-31. DOI: 10.22146/IJCCS.66026

Nirwana Hendrastuty, Azhari Sn

Abstract

Automatic text summarization produces a condensed representation of a document that preserves its essence or main focus. In this study, summarization is performed with the extraction method, which builds a summary by copying the text considered most important or most informative from the source text [1]. Input documents fall into two types: single documents and multi documents. A multi-document input comes from many documents, from one or more sources, that have more than one main idea. This study aims to summarize text using a Genetic Algorithm that takes into account the text features extracted for each chromosome. The features used are sentence position, positive keywords, negative keywords, similarity between sentences, sentences containing entity words, sentences containing numbers, sentence length, connections between sentences, and the number of connections between sentences. The number of chromosomes is half the number of public complaints. The data consist of public complaints against the DIY government from February 2018 to July 2020, obtained from the e-lapor DIY website. In testing, the average precision is 1, the average recall is 0.71, and the average F-measure is 0.79.
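
The abstract gives no implementation details, so the following is only a minimal sketch of how a Genetic Algorithm could select sentences for an extractive summary. It is not the authors' method. It assumes a chromosome is a binary mask over the sentences, scores each sentence with just three of the listed features (sentence position, sentences containing numbers, sentence length) weighted equally, and uses tournament selection, one-point crossover, and bit-flip mutation. All function names, formulas, and parameters here are hypothetical.

```python
import random
import re


def sentence_scores(sentences):
    """Score sentences with three of the features named in the abstract.

    The formulas and the equal weighting are assumptions for illustration.
    """
    n = len(sentences)
    max_len = max(len(s.split()) for s in sentences)
    scores = []
    for i, s in enumerate(sentences):
        position = (n - i) / n                      # earlier sentences score higher
        has_number = 1.0 if re.search(r"\d", s) else 0.0
        length = len(s.split()) / max_len           # length relative to the longest
        scores.append((position + has_number + length) / 3)
    return scores


def fitness(chromosome, scores, max_sentences):
    """Sum the scores of selected sentences; empty or over-long summaries get 0."""
    selected = sum(chromosome)
    if selected == 0 or selected > max_sentences:
        return 0.0
    return sum(sc for gene, sc in zip(chromosome, scores) if gene)


def random_chromosome(rng, n, max_sentences):
    """Create a valid chromosome that selects between 1 and max_sentences sentences."""
    k = rng.randint(1, min(max_sentences, n))
    chosen = set(rng.sample(range(n), k))
    return [1 if i in chosen else 0 for i in range(n)]


def summarize(sentences, pop_size, max_sentences=2, generations=50, seed=0):
    """Evolve a binary sentence mask and return the selected sentences in order."""
    rng = random.Random(seed)
    n = len(sentences)
    scores = sentence_scores(sentences)

    def fit(chromosome):
        return fitness(chromosome, scores, max_sentences)

    population = [random_chromosome(rng, n, max_sentences) for _ in range(pop_size)]
    best = max(population, key=fit)
    for _ in range(generations):
        next_pop = [best[:]]                        # elitism: keep the best so far
        while len(next_pop) < pop_size:
            p1 = max(rng.sample(population, 2), key=fit)   # tournament selection
            p2 = max(rng.sample(population, 2), key=fit)
            cut = rng.randint(1, n - 1)             # one-point crossover
            child = p1[:cut] + p2[cut:]
            if rng.random() < 0.1:                  # bit-flip mutation
                j = rng.randrange(n)
                child[j] = 1 - child[j]
            next_pop.append(child)
        population = next_pop
        best = max(population, key=fit)
    return [s for gene, s in zip(best, sentences) if gene]


# Hypothetical complaints, each treated as a single sentence.
complaints = [
    "The street lights on the main road have been off for 3 weeks.",
    "Garbage is not collected on time.",
    "The bus schedule is unclear.",
    "Road repairs near the market started in 2019 and are still unfinished.",
    "The online permit form is hard to use.",
    "Water supply stops every evening.",
]
# Population size is half the number of complaints, as stated in the abstract.
summary = summarize(complaints, pop_size=max(2, len(complaints) // 2))
```

Because the initial chromosomes are valid and the best one is always carried over, the returned summary never exceeds `max_sentences`. A fuller version would add the remaining features (keywords, sentence similarity, entity words, connections between sentences) as further terms in `sentence_scores`.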
