Diversification-Aware Learning to Rank using Distributed Representation

Le Yan, Zhen Qin, Rama Kumar Pasumarthi, Xuanhui Wang, Michael Bendersky
Published in: Proceedings of the Web Conference 2021
Publication date: 2021-04-19
DOI: 10.1145/3442381.3449831 (https://doi.org/10.1145/3442381.3449831)
Citations: 23

Abstract

Existing work on search result diversification typically falls into the “next document” paradigm, that is, selecting the next document based on the ones already chosen. Learning-based approaches naturally model this as a sequential process that selects documents one by one. However, such a process makes learning difficult because there is an exponential number of ranking lists to consider. Sampling is usually used to reduce the computational complexity, but this makes the learning less effective. In this paper, we propose a soft version of the “next document” paradigm in which we associate each document with an approximate rank, so that the subtopics covered prior to a document can also be estimated. We show that, based on these estimates, we can derive differentiable diversification-aware losses, which are smooth approximations of diversity metrics like α-NDCG. We further propose to optimize the losses in the learning-to-rank setting using neural distributed representations of queries and documents. Experiments are conducted on the public benchmark TREC datasets. By comparing with an extensive list of baseline methods, we show that our Diversification-Aware LEarning-TO-Rank (DALETOR) approaches outperform them by a large margin, while being much simpler during learning and inference.
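The idea of replacing hard ranks with approximate ones can be illustrated with a small sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the sigmoid-based approximate rank, the soft count of previously covered subtopics, the `temperature` parameter, and the function names `alpha_dcg` and `soft_alpha_dcg` are hypothetical and are not taken from DALETOR. Normalization by the ideal ranking (the "N" in α-NDCG) is left out to keep the example short.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def alpha_dcg(scores, subtopics, alpha=0.5):
    """Exact (non-differentiable) alpha-DCG of the ranking obtained by
    sorting documents by descending score.

    subtopics[i][k] is 1 if document i covers subtopic k, else 0.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_topics = len(subtopics[0])
    covered = [0] * n_topics  # how often each subtopic was already shown
    total = 0.0
    for rank, i in enumerate(order, start=1):
        # A subtopic seen c times before contributes (1 - alpha)^c.
        gain = sum(subtopics[i][k] * (1 - alpha) ** covered[k]
                   for k in range(n_topics))
        total += gain / math.log2(1 + rank)
        for k in range(n_topics):
            covered[k] += subtopics[i][k]
    return total


def soft_alpha_dcg(scores, subtopics, alpha=0.5, temperature=0.1):
    """Smooth surrogate (illustrative assumption, not the paper's loss).

    sigmoid((s_j - s_i) / temperature) acts as a soft indicator that
    document j is ranked ahead of document i. Summing it gives an
    approximate rank; weighting it by subtopic labels gives an estimate
    of the subtopics covered before document i.
    """
    n = len(scores)
    n_topics = len(subtopics[0])
    total = 0.0
    for i in range(n):
        ahead = [sigmoid((scores[j] - scores[i]) / temperature) if j != i else 0.0
                 for j in range(n)]
        soft_rank = 1.0 + sum(ahead)
        gain = 0.0
        for k in range(n_topics):
            soft_covered = sum(ahead[j] * subtopics[j][k] for j in range(n))
            gain += subtopics[i][k] * (1 - alpha) ** soft_covered
        total += gain / math.log2(1 + soft_rank)
    return total


if __name__ == "__main__":
    # Three documents, two subtopics; documents 0 and 1 are redundant.
    subtopics = [[1, 0], [1, 0], [0, 1]]
    redundant_first = [3.0, 2.0, 1.0]   # ranks the two redundant docs on top
    diversified = [3.0, 1.0, 2.0]       # moves the novel doc to position 2
    print(alpha_dcg(redundant_first, subtopics))
    print(alpha_dcg(diversified, subtopics))
    print(soft_alpha_dcg(redundant_first, subtopics, temperature=0.01))
```

Because the surrogate depends on the scores only through smooth functions, its negative could serve as a loss for gradient-based training in an autodiff framework. As the temperature shrinks the soft value approaches the exact metric, at the cost of vanishing gradients, so the temperature is a trade-off rather than a value the source prescribes.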