Search result diversification in resource selection for federated search

Dzung Hong, Luo Si
{"title":"Search result diversification in resource selection for federated search","authors":"Dzung Hong, Luo Si","doi":"10.1145/2484028.2484091","DOIUrl":null,"url":null,"abstract":"Prior research in resource selection for federated search mainly focused on selecting a small number of information sources that are most relevant to a user query. However, result novelty and diversification are largely unexplored, which does not reflect the various kinds of information needs of users in real world applications. This paper proposes two general approaches to model both result relevance and diversification in selecting sources, in order to provide more comprehensive coverage of multiple aspects of a user query. The first approach focuses on diversifying the document ranking on a centralized sample database before selecting information sources under the framework of Relevant Document Distribution Estimation (ReDDE). The second approach first evaluates the relevance of information sources with respect to each aspect of the query, and then ranks the sources based on the novelty and relevance that they offer. Both approaches can be applied with a wide range of existing resource selection algorithms such as ReDDE, CRCS, CORI and Big Document. Moreover, this paper proposes a learning based approach to combine multiple resource selection algorithms for result diversification, which can further improve the performance. We propose a set of new metrics for resource selection in federated search to evaluate the diversification performance of different approaches. To our best knowledge, this is the first piece of work that addresses the problem of search result diversification in federated search. The effectiveness of the proposed approaches has been demonstrated by an extensive set of experiments on the federated search testbed of the Clueweb dataset.","PeriodicalId":178818,"journal":{"name":"Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2484028.2484091","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 17

Abstract

Prior research in resource selection for federated search mainly focused on selecting a small number of information sources that are most relevant to a user query. However, result novelty and diversification are largely unexplored, which does not reflect the various kinds of information needs of users in real world applications. This paper proposes two general approaches to model both result relevance and diversification in selecting sources, in order to provide more comprehensive coverage of multiple aspects of a user query. The first approach focuses on diversifying the document ranking on a centralized sample database before selecting information sources under the framework of Relevant Document Distribution Estimation (ReDDE). The second approach first evaluates the relevance of information sources with respect to each aspect of the query, and then ranks the sources based on the novelty and relevance that they offer. Both approaches can be applied with a wide range of existing resource selection algorithms such as ReDDE, CRCS, CORI and Big Document. Moreover, this paper proposes a learning based approach to combine multiple resource selection algorithms for result diversification, which can further improve the performance. We propose a set of new metrics for resource selection in federated search to evaluate the diversification performance of different approaches. To our best knowledge, this is the first piece of work that addresses the problem of search result diversification in federated search. The effectiveness of the proposed approaches has been demonstrated by an extensive set of experiments on the federated search testbed of the Clueweb dataset.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
联邦搜索资源选择中的搜索结果多样化
先前关于联邦搜索资源选择的研究主要集中在选择与用户查询最相关的少数信息源。然而,结果的新颖性和多样性在很大程度上尚未得到开发,这并不能反映现实世界应用中用户的各种信息需求。为了更全面地覆盖用户查询的多个方面,本文提出了两种通用的方法来建模结果相关性和选择来源的多样性。第一种方法侧重于在相关文档分布估计(ReDDE)框架下,在选择信息源之前,在集中的样本数据库上多样化文档排名。第二种方法首先根据查询的每个方面评估信息源的相关性,然后根据它们提供的新颖性和相关性对信息源进行排序。这两种方法都可以广泛应用于现有的资源选择算法,如ReDDE、CRCS、CORI和Big Document。此外,本文提出了一种基于学习的方法,将多种资源选择算法结合起来实现结果多样化,进一步提高了性能。我们提出了一套新的联邦搜索资源选择指标,以评估不同方法的多样化性能。据我们所知,这是解决联邦搜索中搜索结果多样化问题的第一部分工作。在Clueweb数据集的联邦搜索测试平台上进行了大量的实验,证明了所提出方法的有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Search engine switching detection based on user personal preferences and behavior patterns Workshop on benchmarking adaptive retrieval and recommender systems: BARS 2013 A test collection for entity search in DBpedia Sentiment analysis of user comments for one-class collaborative filtering over ted talks A document rating system for preference judgements
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1