Exploiting Structured Feature and Runtime Isolation for High-Performant Recommendation Serving

IF 3.6 · Zone 2 (Computer Science) · JCR Q2 (Computer Science, Hardware & Architecture) · IEEE Transactions on Computers · Pub Date: 2024-08-28 · DOI: 10.1109/TC.2024.3449749
Xin You;Hailong Yang;Siqi Wang;Tao Peng;Chen Ding;Xinyuan Li;Bangduo Chen;Zhongzhi Luan;Tongxuan Liu;Yong Li;Depei Qian
IEEE Transactions on Computers, vol. 73, no. 11, pp. 2474-2487. https://ieeexplore.ieee.org/document/10654386/
Citations: 0

Abstract

Recommendation serving with deep learning models is one of the most valuable services of modern e-commerce companies. In production, high-performant recommendation serving systems are essential for handling billions of recommendation queries under stringent service level agreements. Unfortunately, existing model serving frameworks fail to serve efficiently because of two challenges specific to recommendation: 1) the input format mismatch between service needs and the model's ability, and 2) heavy software contention when concurrently executing the constrained operations. To address these challenges, we propose RecServe, a high-performant serving system for recommendation built on the optimized design of structured features and SessionGroups. With structured features, RecServe packs single-user-multiple-candidates inputs by semi-automatically transforming computation graphs with annotated input tensors, which can significantly reduce redundant network transmission, data movements, and useless computations. With SessionGroups, RecServe further adopts resource isolation for multiple compute streams and a cost-aware operator scheduler with a critical-path-based scheduling policy to enable concurrent kernel execution, further improving serving throughput. The experiment results demonstrate that RecServe can achieve maximum performance speedups of 12.3× and 22.0× compared to the state-of-the-art serving system on CPU and GPU platforms, respectively.
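The single-user-multiple-candidates packing described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names (`user_tower`, `score`, `serve_flat`, `serve_structured`) and the toy arithmetic are hypothetical and are not RecServe's API or model. RecServe is described as transforming computation graphs semi-automatically from annotated input tensors, whereas this sketch splits the user-only part out by hand, purely to show where the redundant transmission and computation come from.

```python
from typing import List, Tuple

Vector = List[float]


def user_tower(user: Vector) -> Vector:
    """Toy user-side sub-graph (hypothetical): depends only on user features."""
    return [2.0 * x + 1.0 for x in user]


def score(user_hidden: Vector, item: Vector) -> float:
    """Toy interaction layer (hypothetical): dot product of the two sides."""
    return sum(u * i for u, i in zip(user_hidden, item))


def serve_flat(rows: List[Tuple[Vector, Vector]]) -> List[float]:
    """Baseline input format: every row repeats the same user features,
    so the user-side sub-graph is re-evaluated once per candidate."""
    return [score(user_tower(user), item) for user, item in rows]


def serve_structured(user: Vector, items: List[Vector]) -> List[float]:
    """Packed input format: one user vector shared by all candidates,
    so the user-side sub-graph is evaluated once per request."""
    hidden = user_tower(user)
    return [score(hidden, item) for item in items]


def payload_flat(rows: List[Tuple[Vector, Vector]]) -> int:
    """Number of floats sent when the user features are duplicated per row."""
    return sum(len(user) + len(item) for user, item in rows)


def payload_structured(user: Vector, items: List[Vector]) -> int:
    """Number of floats sent when the user features are sent once."""
    return len(user) + sum(len(item) for item in items)


if __name__ == "__main__":
    user = [1.0, 2.0]
    items = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    rows = [(user, item) for item in items]

    print(serve_flat(rows))                   # [3.0, 5.0, 8.0]
    print(serve_structured(user, items))      # [3.0, 5.0, 8.0]
    print(payload_flat(rows))                 # 12
    print(payload_structured(user, items))    # 8
```

Both paths return the same scores. The packed form sends the user features once and evaluates the user-only computation once, and the saving grows with the number of candidates per request.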
Source journal
IEEE Transactions on Computers (Engineering & Technology: Electrical and Electronic Engineering)
CiteScore: 6.60
Self-citation rate: 5.40%
Articles published: 199
Review time: 6.0 months
About the journal: The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.
Latest articles from this journal
CUSPX: Efficient GPU Implementations of Post-Quantum Signature SPHINCS+
Chiplet-Gym: Optimizing Chiplet-based AI Accelerator Design with Reinforcement Learning
FLALM: A Flexible Low Area-Latency Montgomery Modular Multiplication on FPGA
Novel Lagrange Multipliers-Driven Adaptive Offloading for Vehicular Edge Computing
Leveraging GPU in Homomorphic Encryption: Framework Design and Analysis of BFV Variants