An Efficient Information Retrieval System Using Evolutionary Algorithms

Network Neuroscience, pp. 583-605. IF 3.6; Zone 3 (Medicine); JCR Q2 (Neurosciences). Pub Date: 2022-10-28. DOI: 10.3390/network2040034

Doaa N. Mhawi, Haider W. Oleiwi, N. Saeed, Heba L. Al-Taie

Citations: 5

Abstract

Information retrieval (IR) is a critical technique for web search as the number of web pages keeps growing. However, web users face several major problems: retrieved documents that are unrelated to the query (i.e., low precision), relevant documents that are not retrieved (i.e., low recall), and the need for acceptable retrieval time and minimal storage space. This paper proposes a novel advanced document-indexing method (ADIM) with an integrated evolutionary algorithm. The proposed information retrieval system (IRS) includes three main stages. The first stage is preprocessing, which consists of two steps: reading the dataset documents and applying ADIM, resulting in a set of two tables. The second stage is the query searching algorithm, which produces a set of words or keywords and retrieves the related documents. The third stage (i.e., the searching algorithm) consists of two steps. It uses a modified genetic algorithm (MGA) with new fitness functions, a cross-point operator with dynamic-length chromosomes, and the adaptive function of the cultural algorithm (CA). The system ranks the documents most relevant to the user query by adding a simple parameter (∝) to the fitness function to guarantee convergence of the solution, and it integrates the MGA with the CA to retrieve the most relevant documents with the best accuracy. The system was simulated using a free dataset called WebKB, which contains World Wide Web pages from computer science departments at multiple universities. The dataset comprises 8280 semi-structured HTML documents. Experimental results showed 100% average precision and 98.5236% average recall over 50 test queries, with an average response time reported as 00.46.74.78 milliseconds and 18.8 MB of memory used for document indexing. According to the authors, the proposed work outperforms the approaches it was compared against in the literature, representing a remarkable leap in the studied field.
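The abstract reports precision and recall averaged over 50 test queries. As a minimal sketch of how such figures are commonly computed, the Python below derives per-query precision and recall from sets of document IDs and then takes their arithmetic mean across queries. The function names, the toy document IDs, and the assumption that "average" means a plain mean over queries are illustrative only and are not taken from the paper.

```python
from typing import Iterable, List, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Return (precision, recall) for a single query.

    precision = relevant retrieved documents / all retrieved documents
    recall    = relevant retrieved documents / all relevant documents
    """
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Guard against empty sets so the ratios are always defined.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def average_over_queries(
    results: List[Tuple[Iterable[str], Iterable[str]]]
) -> Tuple[float, float]:
    """Average precision and recall over (retrieved, relevant) pairs, one per query."""
    if not results:
        raise ValueError("at least one query is required")
    pairs = [precision_recall(retrieved, relevant) for retrieved, relevant in results]
    avg_precision = sum(p for p, _ in pairs) / len(pairs)
    avg_recall = sum(r for _, r in pairs) / len(pairs)
    return avg_precision, avg_recall


# Hypothetical toy data: two queries with made-up document IDs.
toy_queries = [
    (["d1", "d2"], ["d1", "d2", "d3", "d4"]),  # everything retrieved is relevant, half is missed
    (["d5"], ["d5"]),                          # perfect retrieval
]
avg_precision, avg_recall = average_over_queries(toy_queries)
print(f"average precision = {avg_precision:.2f}, average recall = {avg_recall:.2f}")
```

In this toy case every retrieved document is relevant, so precision stays at its maximum while the missed documents in the first query pull recall down. The reported results show the same pattern: full precision with recall slightly below 100%.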

Source journal: Network Neuroscience (NEUROSCIENCES)

CiteScore: 6.40

Self-citation rate: 6.40%

Review time: 16 weeks