Kraken: Memory-Efficient Continual Learning for Large-Scale Real-Time Recommendations
Minhui Xie, Kai Ren, Youyou Lu, Guangxu Yang, Qingxing Xu, Bihai Wu, Jiazhen Lin, H. Ao, Wanhong Xu, J. Shu
SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, November 2020. DOI: 10.1109/SC41405.2020.00025
Modern recommendation systems in industry often use deep learning (DL) models that achieve better accuracy with more data and more model parameters. However, current open-source DL frameworks, such as TensorFlow and PyTorch, scale poorly when training recommendation models with terabytes of parameters. To efficiently learn large-scale recommendation models from data streams that generate hundreds of terabytes of training data daily, we introduce a continual learning system called Kraken. Kraken contains a special parameter server implementation that dynamically adapts to the rapidly changing set of sparse features for the continual training and serving of recommendation models. Kraken also provides a sparsity-aware training system that uses different learning optimizers for dense and sparse parameters to reduce memory overhead. Extensive experiments on real-world datasets confirm the effectiveness and scalability of Kraken: with the same memory resources, Kraken improves the accuracy of recommendation tasks; alternatively, it can cut memory usage to roughly one third while preserving model performance.
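The abstract names two mechanisms: a parameter server whose sparse-parameter storage grows and shrinks with the live feature set, and a sparsity-aware split that assigns different optimizers to dense and sparse parameters. The sketch below is a minimal, hypothetical Python illustration of that general pattern, not Kraken's actual implementation: the dict-backed table, the Adam-for-dense/SGD-for-sparse choice, and all names (DynamicEmbeddingTable, DenseAdam) are assumptions made for exposition.

```python
# Illustrative sketch only (not Kraken's code): (1) an embedding store that
# admits and evicts sparse feature rows on the fly, and (2) an optimizer
# split -- stateful Adam for the small dense part, stateless SGD for the
# huge sparse part, so no per-row moment estimates are kept.
import numpy as np

EMB_DIM = 8

class DynamicEmbeddingTable:
    """Hash-map-backed embeddings: rows are created on first touch and can
    be evicted, so capacity tracks the changing sparse feature set."""
    def __init__(self, dim=EMB_DIM, lr=0.01):
        self.dim = dim
        self.lr = lr
        self.rows = {}  # feature id -> np.ndarray; no preallocated matrix

    def lookup(self, fid):
        if fid not in self.rows:                       # admit a new feature
            self.rows[fid] = np.random.randn(self.dim) * 0.01
        return self.rows[fid]

    def sgd_update(self, fid, grad):
        # Stateless SGD: memory cost is the embedding row itself, nothing more.
        self.rows[fid] -= self.lr * grad

    def evict(self, fid):
        self.rows.pop(fid, None)                       # drop a stale feature

class DenseAdam:
    """Adam for the dense layers only; its two parameter-sized state arrays
    are affordable because dense parameters are comparatively few."""
    def __init__(self, params, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.p = params
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m = np.zeros_like(params)   # first-moment estimate
        self.v = np.zeros_like(params)   # second-moment estimate
        self.t = 0

    def step(self, grad):
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad
        self.v = self.b2 * self.v + (1 - self.b2) * grad ** 2
        m_hat = self.m / (1 - self.b1 ** self.t)      # bias correction
        v_hat = self.v / (1 - self.b2 ** self.t)
        self.p -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

# One hypothetical training step: sparse rows touched by a sample get a
# cheap stateless update; the dense weights get a full Adam update.
table = DynamicEmbeddingTable()
dense_w = np.zeros(16)
opt = DenseAdam(dense_w)
emb = table.lookup(42)                           # new feature admitted on the fly
table.sgd_update(42, grad=np.full(EMB_DIM, 0.1))
opt.step(grad=np.full(16, 0.05))
```

The memory argument is visible in the sketch: Adam keeps two extra arrays per dense weight, while the stateless sparse update stores nothing beyond the embedding rows themselves, which is where almost all of a terabyte-scale recommendation model's parameters live.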