Balance learning to rank in big data

2014 22nd European Signal Processing Conference (EUSIPCO). Pub Date: 2014-11-13. DOI: 10.5281/ZENODO.44026

G. Cao, I. Ahmad, Honglei Zhang, Weiyi Xie, M. Gabbouj


Citations: 2

Abstract



We propose a distributed learning-to-rank method and demonstrate its effectiveness in web-scale image retrieval. As data volumes grow, training a single centralized ranking model becomes impractical for large-scale learning problems. In distributed learning, the discrepancy between each training subset and the whole data set is non-trivial when building the models, yet previous work has overlooked it. In this paper, we first add a cost factor to boosting algorithms to balance the individual models toward the whole data. We then decompose the original algorithm into multiple layers, whose aggregation forms a superior ranker that scales easily to billions of images. Extensive experiments show that the proposed method outperforms straightforward aggregation of boosting algorithms.
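The abstract does not define the cost factor or the layered decomposition, so the following is only an illustrative sketch under stated assumptions, not the authors' algorithm. It assumes the cost factor takes the form of initial boosting weights that make each subset's weighted label distribution match the global one, uses plain AdaBoost with one-dimensional decision stumps as the boosting algorithm, and aggregates the per-subset models by averaging their scores. All function names (`cost_weights`, `train_stump`, `adaboost`, `aggregate`) and the toy data are hypothetical.

```python
import math
from collections import Counter

def cost_weights(shard_labels, global_fraction):
    """Assumed cost factor: initial weights that make the shard's weighted
    label distribution equal to the global label distribution."""
    counts = Counter(shard_labels)
    n = len(shard_labels)
    raw = [global_fraction[y] / (counts[y] / n) for y in shard_labels]
    total = sum(raw)
    return [w / total for w in raw]

def train_stump(xs, ys, w):
    """Return (weighted error, threshold, polarity) of the best stump.
    Labels are +1/-1; the stump predicts polarity when x >= threshold."""
    best = None
    for t in sorted(set(xs)):
        for pol in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if (pol if xi >= t else -pol) != yi)
            if best is None or err < best[0]:
                best = (err, t, pol)
    return best

def adaboost(xs, ys, w, rounds=5):
    """Plain AdaBoost over stumps, started from the given sample weights."""
    w = list(w)
    model = []
    for _ in range(rounds):
        err, t, pol = train_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, t, pol))
        w = [wi * math.exp(-alpha * yi * (pol if xi >= t else -pol))
             for xi, yi, wi in zip(xs, ys, w)]
        s = sum(w)
        w = [wi / s for wi in w]
    return model

def score(model, x):
    """Ranking score of one boosted model for feature value x."""
    return sum(a * (p if x >= t else -p) for a, t, p in model)

def aggregate(models, x):
    """Average the scores of the per-shard models."""
    return sum(score(m, x) for m in models) / len(models)

# Toy data: two shards whose label mix differs from the 50/50 global mix.
shards = [
    ([0.1, 0.2, 0.3, 0.9], [-1, -1, -1, 1]),
    ([0.4, 0.7, 0.8, 0.95], [-1, 1, 1, 1]),
]
global_fraction = {1: 0.5, -1: 0.5}
weights = [cost_weights(ys, global_fraction) for _, ys in shards]
models = [adaboost(xs, ys, w) for (xs, ys), w in zip(shards, weights)]
print(aggregate(models, 0.95) > 0, aggregate(models, 0.1) < 0)
```

In this toy setup each shard sees a skewed label mix, and the reweighting restores the global 50/50 balance before boosting starts. Averaging scores corresponds to the "straightforward aggregation" baseline the abstract mentions; the paper's multi-layer aggregation is not reproduced here.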
