Text Summarization in Multi Document Using Genetic Algorithm

Nirwana Hendrastuty, Azhari Sn
IJCCS Indonesian Journal of Computing and Cybernetics Systems. Published 2021-10-31. DOI: 10.22146/IJCCS.66026 (https://doi.org/10.22146/IJCCS.66026)

Abstract

Automatic text summarization produces a condensed representation of a document that preserves its essence or main focus. This study performs summarization automatically using the extraction method, which builds a summary by copying the text considered most important or most informative from the source text [1]. Input can be either a single document or multiple documents. Multi-document input comes from many documents, drawn from one or more sources, that together contain more than one main idea. This study summarizes such text with a Genetic Algorithm that takes into account the text features extracted for each chromosome. Nine features are used: sentence position, positive keywords, negative keywords, similarity between sentences, presence of entity words, presence of numbers, sentence length, connections between sentences, and the number of connections between sentences. The number of chromosomes is half the number of public complaints. The data are public complaints submitted to the DIY (Daerah Istimewa Yogyakarta, the Special Region of Yogyakarta) government from February 2018 to July 2020, obtained from the e-lapor DIY website. In testing, the average precision is 1, the average recall is 0.71, and the average F-measure is 0.79.
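The abstract does not specify the chromosome encoding, the genetic operators, or the fitness function, so the following is only a minimal sketch of how a genetic algorithm can drive extractive summarization and how the reported metrics are computed. Everything in it is an assumption made for illustration: the binary-mask chromosome (one bit per sentence), the three toy features (position, keywords, numbers) standing in for the nine listed above, the length penalty, truncation selection, one-point crossover, single-bit mutation, and all function names and sample sentences.

```python
import random

def sentence_score(index, sentence, total, keywords):
    """Toy score built from three of the features named in the abstract."""
    words = sentence.lower().split()
    position = 1.0 - index / total                  # earlier sentences score higher
    hits = sum(w.strip(".,") in keywords for w in words)
    keyword = hits / max(len(words), 1)             # share of positive keywords
    has_number = 1.0 if any(ch.isdigit() for ch in sentence) else 0.0
    return position + keyword + has_number

def fitness(chromosome, scores, max_sentences):
    """Sum of selected sentence scores, penalised for exceeding the budget."""
    selected = sum(s for bit, s in zip(chromosome, scores) if bit)
    excess = max(0, sum(chromosome) - max_sentences)
    return selected - 10.0 * excess                 # penalty weight is arbitrary

def evolve(scores, max_sentences, pop_size=6, generations=40, seed=0):
    """Evolve binary masks over sentences and return the fittest one."""
    rng = random.Random(seed)                       # seeded for repeatable runs
    n = len(scores)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]

    def key(chromosome):
        return fitness(chromosome, scores, max_sentences)

    for _ in range(generations):
        population.sort(key=key, reverse=True)
        parents = population[: pop_size // 2]       # truncation selection, elitist
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)               # one-point crossover
            child = a[:cut] + b[cut:]
            flip = rng.randrange(n)                 # mutate by flipping one bit
            child[flip] = 1 - child[flip]
            children.append(child)
        population = parents + children
    return max(population, key=key)

def precision_recall_f(selected, reference):
    """Compare selected sentence indices against a reference summary."""
    tp = len(selected & reference)
    precision = tp / len(selected) if selected else 0.0
    recall = tp / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Hypothetical complaint sentences, not taken from the e-lapor DIY data.
sentences = [
    "The road near the market has been damaged since 2019.",
    "Residents report that the water supply stops every night.",
    "Thank you for your attention.",
    "Street lights on the main road are also broken.",
]
scores = [sentence_score(i, s, len(sentences), {"road", "water"})
          for i, s in enumerate(sentences)]
best = evolve(scores, max_sentences=2)
summary = [s for bit, s in zip(best, sentences) if bit]
print(summary)
print(precision_recall_f({i for i, bit in enumerate(best) if bit}, {0, 1}))
```

With this metric, a precision of 1 alongside a recall below 1 means every selected sentence appears in the reference summary while some reference sentences were not selected. The penalty-based fitness is one design choice among several; discarding over-length chromosomes outright or repairing them after crossover would also work.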