Efficient Learning to Learn a Robust CTR Model for Web-scale Online Sponsored Search Advertising

Xin Wang, Peng Yang, S. Chen, Lin Liu, Liang Zhao, Jiacheng Guo, Mingming Sun, Ping Li
{"title":"Efficient Learning to Learn a Robust CTR Model for Web-scale Online Sponsored Search Advertising","authors":"Xin Wang, Peng Yang, S. Chen, Lin Liu, Liang Zhao, Jiacheng Guo, Mingming Sun, Ping Li","doi":"10.1145/3459637.3481912","DOIUrl":null,"url":null,"abstract":"Click-through rate (CTR) prediction is crucial for online sponsored search advertising. Several successful CTR models have been adopted in the industry, including the regularized logistic regression (LR). Nonetheless, the learning process suffers from two limitations: 1) Feature crosses for high-order information may generate trillions of features, which are sparse for online learning examples; 2) Rapid changing of data distribution brings challenges to the accurate learning since the model has to perform a fast adaptation on the new data. Moreover, existing adaptive optimizers are ineffective in handling the sparsity issue for high-dimensional features. In this paper, we propose to learn an optimizer in a meta-learning scenario, where the optimizer is learned on prior data and can be easily adapted to the new data. We firstly build a low-dimensional feature embedding on prior data to encode the association among features. Then, the gradients on new data can be decomposed into the low-dimensional space, enabling the parameter update smoothed and relieving the sparsity. Note that this technology could be deployed into a distributed system to ensure efficient online learning on the trillions-level parameters. We conduct extensive experiments to evaluate the algorithm in terms of prediction accuracy and actual revenue. Experimental results demonstrate that the proposed framework achieves a promising prediction on the new data. The final online revenue is noticeably improved compared to the baseline. This framework was initially deployed in Baidu Search Ads (a.k.a. Phoenix Nest) in 2014 and is currently still being used in certain modules of Baidu's ads systems.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"64 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3459637.3481912","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Click-through rate (CTR) prediction is crucial for online sponsored search advertising. Several successful CTR models have been adopted in industry, including regularized logistic regression (LR). Nonetheless, the learning process suffers from two limitations: 1) feature crosses that capture high-order information may generate trillions of features, which appear only sparsely in online learning examples; 2) the rapidly changing data distribution challenges accurate learning, since the model must adapt quickly to new data. Moreover, existing adaptive optimizers are ineffective at handling the sparsity of high-dimensional features. In this paper, we propose to learn an optimizer in a meta-learning scenario, where the optimizer is learned on prior data and can be easily adapted to new data. We first build a low-dimensional feature embedding on prior data to encode associations among features. The gradients on new data can then be decomposed into this low-dimensional space, which smooths the parameter updates and relieves the sparsity. This technique can be deployed in a distributed system to ensure efficient online learning over trillions of parameters. We conduct extensive experiments to evaluate the algorithm in terms of prediction accuracy and actual revenue. Experimental results demonstrate that the proposed framework achieves promising predictions on new data, and the final online revenue is noticeably improved over the baseline. The framework was initially deployed in Baidu Search Ads (a.k.a. Phoenix Nest) in 2014 and is still being used in certain modules of Baidu's ads systems.
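The abstract only sketches the gradient-decomposition idea at a high level. The snippet below is a minimal, illustrative reading of that step: a feature embedding learned on prior data projects a sparse high-dimensional gradient into a low-dimensional space and back, so the update also reaches features associated with the observed ones. All names (smoothed_update, E, lr) and the dense NumPy implementation are assumptions for illustration; the paper's actual system is distributed and operates on trillions of sparse parameters.

```python
import numpy as np

def smoothed_update(w, sparse_grad, E, lr=0.01):
    """One hypothetical parameter update via low-dimensional gradient decomposition.

    w           : (d,) current model weights (e.g., LR weights)
    sparse_grad : (d,) gradient on new data, mostly zeros (sparse)
    E           : (d, k) feature embedding learned on prior data, k << d
    """
    # Decompose the sparse gradient into the k-dimensional embedding space.
    low_dim_grad = E.T @ sparse_grad      # shape (k,)
    # Map it back to feature space: features correlated with the active ones
    # also receive a (smoothed) signal, relieving the sparsity of the update.
    dense_grad = E @ low_dim_grad         # shape (d,)
    return w - lr * dense_grad

# Toy usage: d = 10 features, k = 3 dimensional embedding.
rng = np.random.default_rng(0)
d, k = 10, 3
E = rng.standard_normal((d, k)) / np.sqrt(k)   # stand-in for a learned embedding
w = np.zeros(d)
g = np.zeros(d)
g[[1, 7]] = [0.5, -0.2]                        # only two features observed online
w = smoothed_update(w, g, E)
print(w.round(3))
```

Note that in this sketch the update touches all coordinates of w even though only two features fired, which is the smoothing effect the abstract describes; how the embedding E is learned on prior data and distributed across workers is outside the scope of this toy example.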