An Efficient Information Retrieval System Using Evolutionary Algorithms

IF 3.6 | Zone 3 (Medicine) | Q2 NEUROSCIENCES | Network Neuroscience | Pub Date: 2022-10-28 | DOI: 10.3390/network2040034
Doaa N. Mhawi, Haider W. Oleiwi, N. Saeed, Heba L. Al-Taie
Citations: 5

Abstract

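Information retrieval (IR) is a critical technique for web search as the number of web pages keeps growing. Web users nevertheless face four major problems: retrieved documents that are unrelated to the query (low precision), relevant documents that are not retrieved (low recall), the need for an acceptable retrieval time, and the need for minimal storage space. This paper proposes a novel advanced document-indexing method (ADIM) with an integrated evolutionary algorithm. The proposed information retrieval system (IRS) has three main stages. The first stage is preprocessing, which consists of two steps, reading the dataset documents and applying ADIM, and produces a set of two tables. The second stage is the query searching algorithm, which produces a set of words or keywords and retrieves the related documents. The third stage (the searching algorithm) consists of two steps. The modified genetic algorithm (MGA) introduces new fitness functions, using a cross-point operator with dynamic-length chromosomes together with the adaptive function of the culture algorithm (CA). The system ranks the documents most relevant to the user query by adding a simple parameter (∝) to the fitness function to guarantee convergence, and it integrates MGA with CA to achieve the best accuracy. The system was simulated on a free dataset called WebKb, which contains World Wide Web pages from computer science departments at multiple universities and comprises 8280 semi-structured HTML documents. Experimental results showed 100% average precision and 98.5236% average recall over 50 test queries, with an average response time reported as 00.46.74.78 milliseconds and 18.8 MB of memory for document indexing. The authors report that the proposed work outperforms the approaches it was compared against in the literature.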
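The abstract reports precision and recall averaged over 50 test queries. As a minimal sketch of how such figures are typically computed, the Python below takes per-query sets of retrieved and relevant document IDs and averages the two metrics across queries. It assumes that "average" means a plain mean over queries (the abstract does not say), and the function names, query IDs, and document IDs are all hypothetical; none of this is the authors' evaluation code.

```python
from typing import Dict, Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for a single query.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all relevant
    An empty denominator yields 0.0 (a convention chosen for this sketch).
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision_recall(
    runs: Dict[str, Tuple[Set[str], Set[str]]]
) -> Tuple[float, float]:
    """Mean precision and mean recall over all queries in `runs`.

    `runs` maps a query ID to (retrieved IDs, relevant IDs).
    """
    if not runs:
        return 0.0, 0.0
    scores = [precision_recall(ret, rel) for ret, rel in runs.values()]
    avg_p = sum(p for p, _ in scores) / len(scores)
    avg_r = sum(r for _, r in scores) / len(scores)
    return avg_p, avg_r


# Hypothetical toy data: two queries, not taken from the paper.
runs = {
    "q1": ({"d1", "d2"}, {"d1", "d2"}),  # everything relevant was retrieved
    "q2": ({"d3"}, {"d3", "d4"}),        # one relevant document was missed
}
avg_p, avg_r = average_precision_recall(runs)
print(f"average precision = {avg_p:.2f}, average recall = {avg_r:.2f}")
```

In this toy run every retrieved document is relevant, so average precision is 1.0, while the missed document in the second query pulls average recall down to 0.75. The same pattern, perfect precision with recall slightly below 100%, is what the abstract reports.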