Diversifying Search Result Leveraging Aspect-based Query Expansion

Shajalal, Muhammad Anwarul Azim, Masaki Aono
International Journal of New Computer Architectures and Their Applications · DOI: 10.17781/P002433 (https://doi.org/10.17781/P002433)
Citations: 0

Abstract

Web search queries are short, ambiguous, and tend to have multiple underlying interpretations. Query expansion is a prominent method for reformulating such queries so that a set of relevant documents can be retrieved. In this paper, we propose an aspect-based query expansion technique for diversified document retrieval. First, query suggestions and completions are retrieved from major commercial search engines. A frequent phrase-based soft clustering algorithm is then applied to group similar retrieved candidates into clusters, each of which represents a different query aspect. The expansion terms for each cluster are selected from its generated cluster label. To estimate the relevance between the expanded query and the documents, we introduce multiple new lexical and semantic features, derived from the document content and from a word-embedding model, respectively. Finally, a linear ranking approach uses the extracted features to re-rank the documents retrieved for the original query. We conduct experiments on the ClueWeb09 document collection using the TREC 2012 Web Track queries. The experimental results demonstrate that the proposed aspect-based query expansion method effectively diversifies the retrieved documents. It also outperforms the baseline and several known related methods on the diversity metrics ERR-IA, α-nDCG, and NRBP at a cutoff of 20.
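The abstract does not specify how the frequent phrase-based soft clustering or the label-based term selection works. The Python sketch below illustrates one plausible reading of those two steps and is not the authors' implementation. Every detail here is a hypothetical assumption chosen for illustration:

* the function names;
* treating word n-grams (up to `max_n`) as "phrases";
* the `min_support` threshold;
* the rule that merges clusters with identical members.

A candidate may land in several clusters, which is what makes the clustering "soft".

```python
from collections import defaultdict


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_phrase_clusters(query, candidates, max_n=2, min_support=2):
    """Hypothetical soft clustering of query suggestions/completions.

    Every phrase (an n-gram not made up solely of original query terms) that
    occurs in at least `min_support` candidates becomes a cluster label, and
    every candidate containing it joins that cluster, so clusters may overlap.
    """
    query_terms = set(query.lower().split())
    phrase_to_members = defaultdict(set)
    for cand in candidates:
        tokens = cand.lower().split()
        for n in range(1, max_n + 1):
            for phrase in set(ngrams(tokens, n)):
                # Skip phrases that add nothing beyond the original query.
                if set(phrase.split()) <= query_terms:
                    continue
                phrase_to_members[phrase].add(cand)

    # Merge clusters with identical membership, keeping the longest label
    # (ties broken alphabetically so the result is deterministic).
    by_members = {}
    for label, members in phrase_to_members.items():
        if len(members) < min_support:
            continue
        key = frozenset(members)
        best = by_members.get(key)
        if best is None or (len(label), label) > (len(best), best):
            by_members[key] = label
    return {label: sorted(members) for members, label in by_members.items()}


def expanded_queries(query, clusters):
    """Build one expanded query per cluster by appending the label's new terms."""
    query_terms = query.lower().split()
    result = {}
    for label in clusters:
        new_terms = [t for t in label.split() if t not in query_terms]
        result[label] = " ".join(query_terms + new_terms)
    return result


if __name__ == "__main__":
    suggestions = [
        "jaguar car price", "jaguar car dealers", "jaguar animal facts",
        "jaguar animal habitat", "jaguar xf price",
    ]
    clusters = frequent_phrase_clusters("jaguar", suggestions)
    print(clusters)
    print(expanded_queries("jaguar", clusters))
```

With the toy suggestions above, "jaguar car price" falls into both the "jaguar car" and "price" clusters. This illustrates the overlap that soft clustering allows. In the pipeline described in the abstract, each expanded query would then feed lexical and embedding-based features into a linear re-ranker. The actual features and weights are not given here, so they are left out of this sketch.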