Xinchen Lyu;Yuewei Li;Ying He;Chenshan Ren;Wei Ni;Ren Ping Liu;Pengcheng Zhu;Qimei Cui
IEEE Transactions on Machine Learning in Communications and Networking, vol. 2, pp. 1178-1192. Published 2024-08-26. DOI: 10.1109/TMLCN.2024.3449831. URL: https://ieeexplore.ieee.org/document/10646623/
Objective-Driven Differentiable Optimization of Traffic Prediction and Resource Allocation for Split AI Inference Edge Networks
Split AI inference partitions an artificial intelligence (AI) model into multiple parts, enabling the offloading of computation-intensive AI services. Resource allocation is critical to the performance of split AI inference. The challenge arises because many services are time-sensitive while traffic arrivals and network conditions vary over time. Conventional prediction-based resource allocation frameworks adopt separate traffic prediction and resource optimization modules, which can be inefficient because the prediction module is trained for prediction accuracy rather than for the resource optimization objective. This paper proposes a new, objective-driven, differentiable optimization framework that integrates traffic prediction and resource allocation for split AI inference. The resource optimization problem (which aims to maximize network revenue while adhering to service and network constraints) is embedded as the output layer following the traffic prediction module. As such, the traffic prediction module can be trained directly on the network revenue instead of the prediction accuracy, significantly outperforming the conventional design with separate prediction and optimization. Employing Lagrange duality and the Karush-Kuhn-Tucker (KKT) conditions, we achieve an efficient forward pass (obtaining resource allocation decisions) and backpropagation (deriving the objective-driven gradients for joint model training) through the output layer. Extensive experiments on different traffic datasets validate the superiority of the proposed approach, which achieves up to 38.85% higher network revenue than the conventional predictive baselines.
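The idea of an optimization problem acting as a differentiable output layer can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration and not the paper's formulation: the toy objective (a demand-weighted log utility under a single capacity constraint), the function names `allocate`, `revenue`, and `revenue_grad`, and the use of bisection on the dual variable. The forward pass solves the allocation from the KKT stationarity condition, and the backward pass differentiates the achieved revenue with respect to the predicted traffic by differentiating the KKT solution on the active set.

```python
import math


def allocate(pred, capacity, iters=100):
    """Forward pass of the toy layer.

    Solves: maximize sum_i pred[i] * log(1 + x[i])
            subject to sum(x) == capacity, x >= 0   (pred[i] > 0 assumed).
    KKT stationarity gives x_i = max(0, pred[i] / lam - 1); bisection on the
    dual variable lam enforces the capacity constraint.
    """
    lo, hi = 1e-12, max(pred)  # at lam = max(pred) every x_i is zero
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        used = sum(max(0.0, d / lam - 1.0) for d in pred)
        if used > capacity:
            lo = lam  # over-allocated: the resource price must rise
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return [max(0.0, d / lam - 1.0) for d in pred], lam


def revenue(x, true_demand):
    """Toy revenue actually realized once the true traffic is known."""
    return sum(t * math.log1p(xi) for t, xi in zip(true_demand, x))


def revenue_grad(pred, true_demand, capacity):
    """Backward pass: gradient of the realized revenue w.r.t. the prediction.

    On the active set A (x_i > 0) the KKT system gives
        x_i = pred_i * k / s - 1,  k = capacity + |A|,  s = sum_{j in A} pred_j,
    hence dx_i/dpred_j = k * (delta_ij / s - pred_i / s**2).
    """
    x, _ = allocate(pred, capacity)
    active = [i for i, xi in enumerate(x) if xi > 0.0]
    s = sum(pred[i] for i in active)
    k = capacity + len(active)
    # Upstream gradient d(revenue)/dx_i for the active services.
    g = {i: true_demand[i] / (1.0 + x[i]) for i in active}
    weighted = sum(g[i] * pred[i] for i in active)
    grad = [0.0] * len(pred)  # inactive services receive zero gradient
    for j in active:
        grad[j] = k * (g[j] / s - weighted / s ** 2)
    return grad
```

In a full pipeline the vector `pred` would be the output of a trainable traffic predictor, and `revenue_grad` would be chained with the predictor's own gradients so that training follows the revenue rather than the prediction error. This sketch only checks the layer itself, for example by comparing `revenue_grad` against finite differences of `revenue(allocate(...)[0], true_demand)`.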