Latency-aware service placement for GenAI at the edge

Bipul Thapa, Lena Mashayekhy
Defense + Commercial Sensing, pp. 130580G - 130580G-14, published 2024-06-06. DOI: 10.1117/12.3013437

Abstract

In the rapidly evolving landscape of artificial intelligence, Large Language Models (LLMs) and Generative AI (GenAI) have emerged as front-runners in shaping the next generation of intelligent applications that require human-like data generation. While these models have shown transformative potential in centralized computing environments, there is a growing shift towards decentralized edge AI, where computation is orchestrated closer to data sources. This provides immediate insights, faster response times, and localized intelligence without the overhead of cloud communication. For latency-critical applications such as autonomous driving, GenAI at the edge is vital: it allows vehicles to generate and adapt driving strategies in real time as road conditions and traffic patterns change. In this paper, we propose a latency-aware service placement approach for deploying GenAI services on cloudlets, the small-scale computing resources located at the network edge. We represent a GenAI service as a Directed Acyclic Graph (DAG), in which nodes are GenAI operations and edges are the dependencies between them. We then propose an Ant Colony Optimization (ACO) approach that guides the placement of GenAI services at the edge based on cloudlet capabilities and network conditions. Experimental results show that the approach delivers GenAI services at the edge with lower latency and efficient resource utilization. We expect this work to drive further innovation in GenAI and pave the way for more efficient and transformative applications at the edge.
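The abstract does not give the authors' latency model, constraints, or ACO parameters. What follows is only a minimal Python sketch of the general idea, built on several assumptions. It uses a toy four-operation GenAI DAG, hypothetical cloudlet speeds and memory capacities, and a uniform inter-cloudlet bandwidth. It also uses a basic Ant System update rule, with pheromone scaled by 1/latency and a heuristic of 1/execution time. All names (`OPS`, `EDGES`, `CLOUDLETS`, `end_to_end_latency`, `aco_place`) and all numbers are illustrative and are not taken from the paper.

```python
import random
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical GenAI service DAG: operation -> (compute demand in GFLOP, memory in GB).
OPS = {
    "tokenize": (1.0, 0.5),
    "encoder": (40.0, 4.0),
    "decoder": (60.0, 6.0),
    "detokenize": (1.0, 0.5),
}
# Hypothetical DAG edges: (src, dst) -> data passed along the dependency, in MB.
EDGES = {
    ("tokenize", "encoder"): 2.0,
    ("encoder", "decoder"): 8.0,
    ("decoder", "detokenize"): 2.0,
}
# Hypothetical cloudlets: name -> (speed in GFLOP/s, memory capacity in GB).
CLOUDLETS = {"c1": (20.0, 8.0), "c2": (50.0, 8.0), "c3": (10.0, 16.0)}
BANDWIDTH_MB_S = 100.0  # assumed uniform inter-cloudlet bandwidth

# Predecessor lists and a topological order of the DAG.
PREDS = {op: [src for (src, dst) in EDGES if dst == op] for op in OPS}
ORDER = list(TopologicalSorter(PREDS).static_order())


def end_to_end_latency(placement):
    """Finish time of the last operation; ignores queuing between ops sharing a cloudlet."""
    finish = {}
    for op in ORDER:
        ready = 0.0
        for p in PREDS[op]:
            same = placement[p] == placement[op]
            transfer = 0.0 if same else EDGES[(p, op)] / BANDWIDTH_MB_S
            ready = max(ready, finish[p] + transfer)
        finish[op] = ready + OPS[op][0] / CLOUDLETS[placement[op]][0]
    return max(finish.values())


def aco_place(n_ants=10, n_iters=50, alpha=1.0, beta=2.0, rho=0.1, q=1.0, seed=0):
    """Basic Ant System: pheromone tau[(op, cloudlet)], heuristic = 1 / execution time."""
    rng = random.Random(seed)
    tau = {(op, c): 1.0 for op in OPS for c in CLOUDLETS}
    best, best_lat = None, float("inf")
    for _ in range(n_iters):
        solutions = []
        for _ in range(n_ants):
            used = {c: 0.0 for c in CLOUDLETS}  # memory already assigned per cloudlet
            placement = {}
            for op in ORDER:  # place operations in dependency (topological) order
                demand, mem = OPS[op]
                feasible = [c for c in CLOUDLETS if used[c] + mem <= CLOUDLETS[c][1]]
                if not feasible:
                    placement = None  # this ant hit a dead end; discard it
                    break
                weights = [tau[(op, c)] ** alpha * (CLOUDLETS[c][0] / demand) ** beta
                           for c in feasible]
                choice = rng.choices(feasible, weights=weights)[0]
                placement[op] = choice
                used[choice] += mem
            if placement is not None:
                solutions.append((end_to_end_latency(placement), placement))
        for key in tau:  # evaporation
            tau[key] *= 1.0 - rho
        for lat, placement in solutions:  # deposit more pheromone on faster placements
            for op, c in placement.items():
                tau[(op, c)] += q / lat
            if lat < best_lat:
                best, best_lat = placement, lat
    return best, best_lat


if __name__ == "__main__":
    placement, latency = aco_place()
    print(placement, round(latency, 3))
```

In this toy setup the memory limits forbid putting both the encoder and the decoder on the fastest cloudlet. The ants therefore have to trade compute speed against the cost of moving data across DAG edges. A real deployment would also need to model contention, queuing, and time-varying network conditions, which this sketch leaves out.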