Enhancing the Capabilities of Solr Information Retrieval System: Arabic Language

Aminah Alqahtani, Manal Alnefaie, Nourah Alamri, Ahmad Khorsi
DOI: 10.1109/ICCAIS48893.2020.9096810
Published in: 2020 3rd International Conference on Computer Applications & Information Security (ICCAIS), 2020-03-01
Source: https://doi.org/10.1109/ICCAIS48893.2020.9096810
Citations: 3

Abstract

Arabic is one of the most complex languages in Natural Language Processing (NLP). Solr is an Information Retrieval System (IRS) that is widely known for its accurate results and high performance in English. However, the Arabic stemmer currently used by Solr, Light-10, has some deficiencies. In this work, we evaluated two light stemmers (Assem, Tashaphyne) and two root stemmers (Khoja, ISRI) and selected the two that performed best in our experiments, Assem and Khoja, in addition to Light-10. We then used these three stemmers to evaluate the search retrieval accuracy of Solr in Arabic, and evaluated them again with synonyms. The evaluation uses two metrics: Precision and Normalized Discounted Cumulative Gain (NDCG). The Assem stemmer achieved the highest accuracy at 86%, followed by Light-10 at 83% and Khoja at 81%. Finally, the Assem stemmer was adopted as the stemmer for Almufed, a search engine we developed on top of Solr for more than 6000 Arabic books from the Alshamela Library.
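
The two metrics named in the abstract can be illustrated with a short sketch. The abstract does not give the exact formulation used (rank cutoff, relevance scale, gain function), so everything below is an assumption: the function names are hypothetical, the gain is linear with a log2 rank discount, and the ideal ranking is built only from the judged results in the list being scored.

```python
import math
from typing import Sequence


def precision_at_k(relevances: Sequence[float], k: int) -> float:
    """Fraction of the top-k results judged relevant (relevance > 0)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for rel in relevances[:k] if rel > 0) / k


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: sum of rel_i / log2(i + 1), ranks start at 1."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG divided by the DCG of the best possible ordering of the same judgments."""
    # Assumption: the ideal ordering is derived from this list alone,
    # not from a full pool of judged documents for the query.
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(relevances, k) / ideal_dcg


# Hypothetical graded judgments (0 = not relevant, 3 = highly relevant)
# for the top four results of a single query.
judged = [3, 2, 0, 1]
p_at_4 = precision_at_k(judged, 4)
ndcg_at_4 = ndcg_at_k(judged, 4)
```

Averaging such per-query scores across a query set is one common way to compare stemmers; whether the paper aggregates its scores this way is not stated in the abstract.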