Word Embeddings-Based Pseudo Relevance Feedback Using Deep Averaging Networks for Arabic Document Retrieval

Yasir Hadi Farhan, S. Noah, M. Mohd, Jaffar Atwan
Journal of Information Science Theory and Practice, pp. 1-17. Published 2021-01-01. DOI: 10.1633/JISTAP.2021.9.2.1 (https://doi.org/10.1633/JISTAP.2021.9.2.1). Citations: 1.

Abstract

Pseudo relevance feedback (PRF) is a powerful query expansion (QE) technique that reformulates a query by selecting expansion terms from the top k pseudo-relevant documents. Traditional PRF frameworks handle the vocabulary mismatch between user queries and relevant documents robustly; nevertheless, they choose expansion terms without regard to their similarity to the original query terms. Word embedding (WE) techniques are of significant interest for QE in the information retrieval domain. Deep averaging networks (DANs) define a framework in which the average of the input word vectors is passed through multiple linear layers. The complete query can thus be represented by the average vector of its terms, and this vector can be used to determine expansion terms pertinent to the entire query. In this study, we propose a DAN-based technique that augments PRF frameworks by integrating WE similarities to facilitate Arabic information retrieval. The technique rests on the principle that the top pseudo-relevant document set is assessed to determine the distribution of candidate terms, and expansion terms are then selected according to their similarity to the average vector representing the initial query terms. The Word2Vec model is used to run the experiments on the standard Arabic TREC 2001/2002 collection. The majority of the evaluations indicate that the PRF implementation in the present study offers a significant performance improvement over the baseline PRF frameworks.
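The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it covers only the averaging and similarity ranking, omits the linear layers of a DAN, and replaces a trained Word2Vec model with hand-made two-dimensional toy vectors over English words. The function names, the `n_terms` parameter, and the data are all hypothetical.

```python
import math


def average_vector(terms, embeddings):
    """Mean of the embedding vectors of the terms that have one."""
    vectors = [embeddings[t] for t in terms if t in embeddings]
    if not vectors:
        raise ValueError("no query term has an embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_expansion_terms(query_terms, feedback_docs, embeddings, n_terms=2):
    """Rank terms from the pseudo-relevant documents by similarity to the
    averaged query vector and return the n_terms best ones."""
    query_vec = average_vector(query_terms, embeddings)
    # Candidates: terms of the feedback documents that are not already
    # in the query and for which an embedding is available.
    candidates = {
        term
        for doc in feedback_docs
        for term in doc
        if term in embeddings and term not in query_terms
    }
    # Sort by descending similarity; ties are broken alphabetically.
    ranked = sorted(
        candidates, key=lambda t: (-cosine(query_vec, embeddings[t]), t)
    )
    return ranked[:n_terms]


# Hypothetical toy embeddings; a real system would load trained vectors.
TOY_EMBEDDINGS = {
    "river": [1.0, 0.0],
    "water": [0.9, 0.1],
    "flood": [0.8, 0.2],
    "dam": [0.7, 0.3],
    "bank": [0.5, 0.5],
    "loan": [0.0, 1.0],
}

query = ["river", "water"]
# Stand-in for the top k documents returned by an initial retrieval run.
top_k_docs = [["flood", "dam", "river"], ["bank", "loan", "water"]]

expansion = select_expansion_terms(query, top_k_docs, TOY_EMBEDDINGS)
expanded_query = query + expansion
print(expanded_query)
```

Because each candidate is scored against the averaged query vector, a term is favoured when it is close to the query as a whole, not merely to one of its terms. Here "loan" co-occurs with a query term in a feedback document but ranks last because its vector points away from the query average.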
Source journal: Journal of Information Science Theory and Practice (Social Sciences, Library and Information Sciences)