Information Retrieval System in Bangla Document Ranking using Latent Semantic Indexing

Md. Nesarul Hoque, Rabiul Islam, Md. Sajidul Karim
DOI: 10.1109/ICASERT.2019.8934837
Published in: 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT), pp. 1-5
Publication date: 2019-05-01
Link: https://doi.org/10.1109/ICASERT.2019.8934837
Citations: 1

Abstract

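Like English and other languages, Bangla now contributes significantly to the content of the web, and the volume of stored Bangla information grows daily. With so many documents on the World Wide Web, it is difficult for a user to retrieve the desired information, and finding useful documents is time-consuming and tedious. These demands motivate the development of an Information Retrieval (IR) system for document ranking in Bangla. In this paper, we build such a system, in which users find the documents that correspond to their query strings through a ranked index. Although much work has been done on document ranking for English and other languages, we found very few contributions for Bangla. Many methods have been proposed for document ranking, including the Boolean model, Maximal Marginal Relevance (MMR), Portfolio Theory (PR), Quantum Probability Ranking Principle (QPRP), Query Directed Clustering (QDC), and vector-based TF-IDF. Here, we apply a different approach, Latent Semantic Indexing (LSI), to the same task for Bangla documents. LSI relies on Singular Value Decomposition (SVD); after the decomposition, we use cosine similarity to rank all the documents. We believe the performance of the proposed system has reached a trustworthy level.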
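
The pipeline named in the abstract (a term-document matrix, a truncated SVD, then cosine similarity against the query) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function name `lsi_rank`, the whitespace tokenizer, the raw-count weighting, the choice of `k`, the query fold-in variant, and the toy English corpus standing in for Bangla documents.

```python
import numpy as np

def lsi_rank(docs, query, k=2):
    """Rank docs against query with a truncated SVD and cosine similarity.

    Hypothetical helper: whitespace tokenization and raw term counts are
    placeholders; real Bangla text would need proper tokenization,
    stop-word removal, and possibly stemming or TF-IDF weighting.
    """
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}

    # Term-document matrix of raw counts, shape (n_terms, n_docs).
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            A[index[w], j] += 1.0

    # Thin SVD, A = U @ diag(s) @ Vt, truncated to the k largest singular values.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]

    # Documents in the latent space: rows of (diag(sk) @ Vtk).T, shape (n_docs, k).
    doc_vecs = (sk[:, None] * Vtk).T

    # Project the query into the same space (one common fold-in variant).
    q = np.zeros(len(vocab))
    for w in query.split():
        if w in index:
            q[index[w]] += 1.0
    q_vec = Uk.T @ q

    # Cosine similarity; documents or queries with zero norm score 0.
    denom = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec)
    scores = np.divide(doc_vecs @ q_vec, denom,
                       out=np.zeros(len(docs)), where=denom > 0)

    order = np.argsort(-scores, kind="stable")
    return [(int(i), float(scores[i])) for i in order]

# Toy corpus with two topics (an English stand-in for Bangla documents).
docs = [
    "cricket match score",
    "cricket team win match",
    "rice price market",
    "market price rise",
]
ranking = lsi_rank(docs, "cricket score", k=2)
for doc_id, score in ranking:
    print(doc_id, round(score, 3), docs[doc_id])
```

With `k=2` the two latent dimensions in this toy corpus line up with the two topics, so the second document ranks alongside the first even though it does not contain the word "score". This is the kind of term-association effect LSI is meant to capture; how the paper's system chooses `k` or weights terms is not stated in the abstract.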