Intelligence Information Retrieval System Modeling for Afaan Oromo

2021 International Conference on Computational Performance Evaluation (ComPE), Pub Date: 2021-12-01, DOI: 10.1109/ComPE53109.2021.9752270

Amin Tuni Gure, D. P. Sharma, J. K. Verma

Abstract

Information overload has made it increasingly difficult for users to locate and retrieve the data or information they need. It is now the norm rather than the exception for data and information to exist in multiple languages, and this is a pressing issue in Ethiopia, where large amounts of Afaan Oromo data are produced daily. Massive document collections bring archiving and search problems that test how such data and information can be managed. With a hybrid information retrieval system developed for Afaan Oromo, Afaan Oromo users can easily search for and retrieve data relevant to their needs and interests. The performance of information retrieval systems (IRS) has been under-researched in the Afaan Oromo linguistic domain, and the efficiency of existing systems has been found to be significantly lacking. To address these issues, the performance of the Afaan Oromo information retrieval (AOIR) system was improved by integrating several IR approaches. The prototype includes indexing and searching subsystems. For the experiment, 1,000 Afaan Oromo text documents were gathered from online news articles (Oromia Broadcasting Network, VOA Afaan Oromo), websites, books, and the Afaan Oromo Bible. To identify content-bearing terms and vocabulary, the text was processed with tokenization, normalization, stop-word removal, and stemming, and terms were weighted using the tf-idf scheme. In the experimental analysis, carried out in Python 3.7, precision, recall, and F-measure all exceeded 96.6 percent. Polysemy continued to limit the system's overall performance. The paper also calls for further work on system performance optimization to advance AOIR.
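The text operations, tf-idf weighting, and evaluation measures named in the abstract can be illustrated with a minimal sketch in standard-library Python (compatible with Python 3.7). It is not the authors' implementation. The stop-word list `STOP_WORDS`, the suffix list `SUFFIXES`, the naive suffix-stripping stemmer, and all function names (`preprocess`, `build_index`, `search`, `evaluate`) are hypothetical illustrations. Ranking by cosine similarity over tf-idf vectors is an assumed, commonly used choice that the abstract does not specify.

```python
import math
import re
from collections import Counter

# Hypothetical, illustrative stop words; NOT the list used in the paper.
STOP_WORDS = {"fi", "kan", "kana", "sana", "akka", "keessa", "irratti"}
# Hypothetical suffixes for a toy stemmer; a real Afaan Oromo stemmer needs
# a much richer, linguistically validated rule set.
SUFFIXES = ("oota", "wwan", "lee", "icha", "tti")


def normalize(text):
    """Lowercase and map typographic apostrophes to the ASCII apostrophe.

    The apostrophe is part of Afaan Oromo (Qubee) spelling, so it is kept.
    """
    return text.lower().replace("\u2019", "'").replace("\u2018", "'")


def tokenize(text):
    """Split normalized text into alphabetic tokens, keeping in-word apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text)


def stem(token):
    """Strip the first matching suffix if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Normalization -> tokenization -> stop-word removal -> stemming."""
    return [stem(t) for t in tokenize(normalize(text)) if t not in STOP_WORDS]


def build_index(docs):
    """Return tf-idf vectors per document and the idf table.

    docs maps a document ID to its raw text; weight = tf * log(N / df).
    """
    tfs = {doc_id: Counter(preprocess(text)) for doc_id, text in docs.items()}
    df = Counter(term for tf in tfs.values() for term in tf)
    n_docs = len(docs)
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = {
        doc_id: {term: freq * idf[term] for term, freq in tf.items()}
        for doc_id, tf in tfs.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, vectors, idf, top_k=10):
    """Rank documents by cosine similarity to the tf-idf query vector."""
    q_tf = Counter(preprocess(query))
    q_vec = {term: freq * idf[term] for term, freq in q_tf.items() if term in idf}
    scored = [(cosine(q_vec, vec), doc_id) for doc_id, vec in vectors.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked if score > 0][:top_k]


def evaluate(retrieved, relevant):
    """Precision, recall and F-measure (harmonic mean) for one query."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    docs = {  # toy corpus, not the paper's 1,000-document collection
        "d1": "Oduu qonnaa fi bishaan Oromiyaa",
        "d2": "Oduu ispoortii kubbaa miilaa",
        "d3": "Qonnaa fi bishaan baadiyyaa",
    }
    vectors, idf = build_index(docs)
    results = search("qonnaa bishaan", vectors, idf)
    print(results)  # ['d3', 'd1']
    print(evaluate(results, relevant=["d1", "d3"]))  # (1.0, 1.0, 1.0)
```

In this toy run, d3 ranks above d1 because both share the same query terms but d3 has fewer other terms, so its vector is closer in direction to the query. A real evaluation would average these per-query measures over a set of test queries with human relevance judgments.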
