Relevance Feedback using Genetic Algorithm on Information Retrieval for Indonesian Language Documents

Journal of Information Systems Engineering and Business Intelligence. Pub Date: 2019-10-24. DOI: 10.20473/jisebi.5.2.171-182

Salman Dziyaul Azmi, R. Kusumaningrum


Citations: 5

Abstract

Background: The rapid growth of technology in Indonesia has produced a growing volume of information. A new information retrieval environment is therefore needed to find documents that match the user's information needs.

Objective: This study compares Relevance Feedback (RF) based on a genetic algorithm with a standard information retrieval system without relevance feedback, for Indonesian-language documents.

Methods: The standard Information Retrieval (IR) system uses the Sastrawi stemmer and the Vector Space Model, while the Genetic Algorithm-based (GA-based) relevance feedback uses roulette-wheel selection and crossover recombination. The evaluation metrics are Mean Average Precision (MAP) and average recall based on user judgments.

Results: On two Indonesian-language document datasets, a thesis-abstract dataset and a news dataset, GA-based relevance feedback increased MAP by 15.2% and 28.6%, respectively, over the standard IR system. Recall at the 10th position improved by 7.1% and 10.5%, respectively. The best genetic algorithm parameters for the thesis-abstract dataset were a population size of 20, a crossover probability of 0.7, and a mutation probability of 0.2; for the news dataset, they were a population size of 10, a crossover probability of 0.5, and a mutation probability of 0.2.

Conclusion: GA-based relevance feedback increases both MAP and average recall at the 10th position of the retrieved documents. In general, the best mutation probability is 0.2, whereas the best population size and crossover probability depend on the size of the dataset and the length of the query.

Keywords: Genetic Algorithm, Information Retrieval, Indonesian language document, Mean Average Precision, Relevance Feedback
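The abstract names its building blocks (roulette-wheel selection, crossover, MAP, and recall at the 10th position) without giving their details. The following is a minimal Python sketch of those pieces under stated assumptions, not the authors' implementation: the chromosome is assumed to be a list of query-term weights, crossover is assumed to be single-point, and fitness values are taken as given because the abstract does not define the fitness function. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Set, Tuple


def roulette_wheel_select(population: Sequence[List[float]],
                          fitnesses: Sequence[float],
                          rng: random.Random) -> List[float]:
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total <= 0:
        # Assumption: fall back to a uniform choice when no fitness is positive.
        return rng.choice(list(population))
    pick = rng.random() * total  # a point on the wheel, in [0, total)
    running = 0.0
    for individual, fitness in zip(population, fitnesses):
        running += fitness
        if pick < running:
            return individual
    return population[-1]  # guard against floating-point round-off


def single_point_crossover(a: List[float], b: List[float],
                           rng: random.Random) -> Tuple[List[float], List[float]]:
    """Swap the tails of two equal-length parents (length >= 2) at a random cut."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: Sequence[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average of average_precision over (ranking, relevant set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant documents found in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)
```

For example, for the ranking d1, d2, d3, d4 with d1 and d3 judged relevant, average_precision gives (1/1 + 2/3) / 2, about 0.833, and recall_at_k with k=2 gives 0.5. The crossover and mutation probabilities reported in the abstract would control how often operators such as single_point_crossover are applied in each generation; whether mutation acts per gene or per individual is not stated in the abstract.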


Source journal

Journal of Information Systems Engineering and Business Intelligence

CiteScore

0.30

Self-citation rate

0.00%

Publication count