Direct optimization of ranking measures for learning to rank models

Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Pub Date: 2013-08-11. DOI: 10.1145/2487575.2487630

Ming Tan, Tian Xia, L. Guo, Shaojun Wang


Citations: 29

Abstract



We present a novel learning algorithm, DirectRank, which directly and exactly optimizes ranking measures without resorting to any upper bounds or approximations. Our approach is essentially an iterative coordinate ascent method. In each iteration, we choose one coordinate and only update the corresponding parameter, with all others remaining fixed. Since the ranking measure is a stepwise function of a single parameter, we propose a novel line search algorithm that can locate the interval with the best ranking measure along this coordinate quite efficiently. In order to stabilize our system in small datasets, we construct a probabilistic framework for document-query pairs to maximize the likelihood of the objective permutation of top-$\tau$ documents. This iterative procedure ensures convergence. Furthermore, we integrate regression trees as our weak learners in order to consider the correlation between the different features. Experiments on LETOR datasets and two large datasets, Yahoo challenge data and Microsoft 30K web data, show an improvement over state-of-the-art systems.
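The coordinate-wise line search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear scoring model, a single query, and NDCG as the ranking measure, and it enumerates breakpoints by brute force rather than with the efficient procedure the paper proposes. The names `ndcg`, `line_search`, and `coordinate_ascent` are hypothetical. The idea is that when only one weight changes by `step`, each document's score is a line in `step`, so the ranking (and hence the measure) can change only where two lines cross; evaluating one point inside each interval between crossings finds the exact optimum along that coordinate.

```python
import math
from itertools import combinations


def dcg(ranked_labels, k=None):
    """Discounted cumulative gain of relevance labels listed in rank order."""
    top = ranked_labels[:k] if k else ranked_labels
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(top))


def ndcg(scores, labels, k=None):
    """NDCG for one query; ties in score keep the original document order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ideal = dcg(sorted(labels, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg([labels[i] for i in order], k) / ideal


def line_search(base_scores, feature, labels, k=None):
    """Exact search over `step` for scores base_scores[i] + step * feature[i].

    Returns (best_step, best_ndcg). Brute force: O(n^2) breakpoints per query.
    """
    # Two documents swap rank only where their score lines intersect.
    breaks = set()
    for i, j in combinations(range(len(labels)), 2):
        diff = feature[i] - feature[j]
        if diff != 0:
            breaks.add((base_scores[j] - base_scores[i]) / diff)
    breaks = sorted(breaks)

    # step = 0.0 comes first so the measure never gets worse and ties keep
    # the current weight; then one probe point per interval.
    candidates = [0.0]
    if breaks:
        candidates.append(breaks[0] - 1.0)
        candidates += [(a + b) / 2 for a, b in zip(breaks, breaks[1:])]
        candidates.append(breaks[-1] + 1.0)

    def measure(step):
        shifted = [s + step * f for s, f in zip(base_scores, feature)]
        return ndcg(shifted, labels, k)

    best = max(candidates, key=measure)
    return best, measure(best)


def coordinate_ascent(X, labels, sweeps=3):
    """Cycle through the weights, updating one at a time with the others fixed."""
    w = [0.0] * len(X[0])
    for _ in range(sweeps):
        for k in range(len(w)):
            scores = [sum(wj * xj for wj, xj in zip(w, x)) for x in X]
            step, _ = line_search(scores, [x[k] for x in X], labels)
            w[k] += step
    return w
```

Calling `coordinate_ascent(X, labels)` with `X` as a list of per-document feature lists and `labels` as graded relevance values returns a weight vector. Because `step = 0.0` is always a candidate, each update leaves the measure unchanged or improves it. A multi-query version would average the measure over queries and pool the breakpoints from all of them; the probabilistic top-$\tau$ framework and the regression-tree weak learners mentioned in the abstract are not covered by this sketch.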
