
Latest Publications: IEEE Transactions on Parallel and Distributed Systems

MUCVR: Edge Computing-Enabled High-Quality Multi-User Collaboration for Interactive MVR
IF 6.0 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-08-04 | DOI: 10.1109/TPDS.2025.3595801
Weimin Li;Qin Li;Weihong Tian;Jie Gao;Fan Wu;Jianxun Liu;Ju Ren
Mobile Virtual Reality (MVR), which aims to provide high-quality VR services on end users’ mobile devices, has become the latest trend in virtual reality development. The current MVR solution renders frame data remotely on a cloud server, while the potential of edge computing in MVR remains underexploited. In this paper, we propose a new approach named MUCVR to achieve high-quality interactive MVR collaboration for multiple users by exploiting edge computing. First, we design “vertical” edge–cloud collaboration for VR task rendering, in which foreground interaction is offloaded to an edge server for rendering, while the background environment is rendered by the cloud server. Correspondingly, the VR device of a user is only responsible for decoding and displaying. Second, we propose “horizontal” multi-user collaboration based on edge–edge cooperation, which synchronizes data among edge servers. Finally, we implement the proposed MUCVR on an MVR device and the Unity VR application engine. The results show that MUCVR effectively reduces MVR service latency, improves rendering performance, reduces the computing load on the VR device, and, ultimately, improves users’ quality of experience.
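The “vertical” split can be pictured as a one-line routing rule: latency-sensitive foreground interaction goes to the edge, the background environment to the cloud, and the headset only decodes. A toy sketch; `RenderTask` and the tier names are illustrative, not MUCVR’s actual interface.

```python
# Hypothetical sketch of the vertical edge-cloud split described in the
# abstract. All names here are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class RenderTask:
    layer: str        # "foreground" or "background"
    frame_id: int

def route_render_task(task: RenderTask) -> str:
    """Return the tier that should render this task."""
    if task.layer == "foreground":
        return "edge"    # latency-sensitive interaction rendered at the edge server
    return "cloud"       # background environment rendered remotely in the cloud

tasks = [RenderTask("foreground", 1), RenderTask("background", 1)]
print({t.layer: route_render_task(t) for t in tasks})
# {'foreground': 'edge', 'background': 'cloud'}
```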
Citations: 0
Decentralized QoS-Aware Model Inference Using Federated Split Learning for Cloud-Edge Medical Detection
IF 6.0 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-08-01 | DOI: 10.1109/TPDS.2025.3594694
Yishan Chen;Xiangwei Zeng;Huashuai Cai;Qing Xu;Zhiquan Liu
The application of federated learning (FL) has been widely extended to medical domains, including medical image analysis and health monitoring. With the increasing computation power demand on edge devices, split federated learning has emerged as a promising FL architecture. In this work, a home healthcare monitoring scenario is explored. Unlike existing split federated learning studies that primarily focus on model-level optimization, this study considers system-level optimization involving latency, packet error rate, and federated training time. Specifically, a k-means algorithm is presented to select inference nodes, participating training clients, and aggregation servers based on network conditions and data quality. Furthermore, a reinforcement learning method is utilized to allocate computation and bandwidth resources during inference, training, and aggregation, thereby further improving the quality of service (QoS) and training efficiency. Simulation results demonstrate that the proposed architecture achieves the target accuracy while offering enhanced QoS and reduced FL training time.
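A minimal sketch of the k-means role-selection step, assuming clients are described by latency, packet error rate, and a data-quality score (the exact features are not specified here): cluster the clients and pick the member nearest each centroid as a representative.

```python
# Sketch of k-means-based client selection. Feature choices, cluster count,
# and the role mapping are assumptions, not the paper's exact procedure.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# columns: [latency_ms, packet_error_rate, data_quality_score]
clients = rng.random((12, 3)) * [100.0, 0.1, 1.0]

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(clients)
for k, center in enumerate(km.cluster_centers_):
    members = np.where(km.labels_ == k)[0]
    # representative = cluster member closest to the centroid
    rep = members[np.argmin(np.linalg.norm(clients[members] - center, axis=1))]
    print(f"cluster {k}: clients {members.tolist()}, representative {rep}")
```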
Citations: 0
Dynamic Multiresource Fair Allocation With Time Discount Utility
IF 6.0 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-08-01 | DOI: 10.1109/TPDS.2025.3594741
Bin Deng;Weidong Li
Multiresource allocation mechanisms have been studied in many scenarios. A new dynamic multiresource fair allocation model with time discount utility is proposed in this article, where users can arrive and depart at different time slots. We propose a new any price share time discount (APS-TD) mechanism for this model, which accounts for the users’ time discount utility while maintaining desirable properties. We prove that the APS-TD mechanism satisfies cumulative incentive sharing (CSI), i.e., that the cumulative utility of each user is not lower than the cumulative utility generated by evenly allocating the available resources in each time slot; cumulative strategyproofness (CSP), where users cannot increase their cumulative utility by falsely reporting their demands in any time slot; cumulative Pareto optimality (CPO), i.e., where no allocation can increase the cumulative utility of one user without reducing the cumulative utility of another user in any time slot; cumulative envy-freeness (CEF), where users who arrive later should not prefer allocations from other users who arrive first in any time slot; time discount share fairness (TDSF), where users with higher time discount values occupy larger resource shares in each time slot unless the utility levels of both users are generated by evenly allocating resources; and bottleneck fairness (BF), where the allocation should satisfy max-min fairness with respect to the bottleneck resources contained in each time slot. We run the APS-TD mechanism on Alibaba trace-driven data to demonstrate the performance enhancement achieved by our proposed mechanism over the existing mechanism extensions. The results show that the APS-TD mechanism is superior to hybrid multiresource fairness (H-MRF) and stateful dominant resource fairness (SDRF) in many ways.
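To make the time-discount structure concrete, the snippet below computes a user’s cumulative discounted utility and numerically checks a CSI-style condition against an even-split baseline. The geometric discount delta**t and all numbers are illustrative assumptions; the paper’s exact utility model may differ.

```python
# Toy illustration of cumulative time-discount utility underlying the CSI
# property: each user's cumulative utility should be at least that of an
# even split in every slot. Discount form and values are assumptions.
def cumulative_discounted_utility(per_slot_utility, delta):
    """Sum of delta**t * u_t over the user's arrival-to-departure slots."""
    return sum((delta ** t) * u for t, u in enumerate(per_slot_utility))

even_share = [2.0, 2.0, 2.0]   # utility from evenly allocating resources each slot
aps_td     = [3.0, 2.0, 1.5]   # utility under a hypothetical mechanism allocation
delta = 0.9                    # the user's time discount value

assert cumulative_discounted_utility(aps_td, delta) >= \
       cumulative_discounted_utility(even_share, delta)  # CSI-style check passes
```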
Citations: 0
FedEFsz: Fair Cross-Silo Federated Learning System With Error-Bounded Lossy Compression
IF 6.0 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-07-31 | DOI: 10.1109/TPDS.2025.3593896
Zhaorui Zhang;Sheng Di;Benben Liu;Zhuoran Ji;Guanpeng Li;Xiaoyi Lu;Amelie Chi Zhou;Khalid Ayed Alharthi;Jiannong Cao
Cross-silo federated learning systems have been identified as an efficient approach to scaling DNN training across geographically distributed data silos while preserving the privacy of the training data. Communication efficiency and fairness are two major issues that must both be satisfied when federated learning systems are deployed in practice. Simultaneously guaranteeing both, however, is exceptionally difficult because simply combining communication-reduction and fairness-optimization approaches often causes non-converged training or drastic accuracy degradation. To bridge this gap, we propose FedEFsz. On the one hand, it integrates the state-of-the-art error-bounded lossy compressor SZ3 into cross-silo federated learning systems to significantly reduce communication traffic during training. On the other hand, it achieves high fairness (i.e., rather consistent model accuracy and performance across different clients) through a carefully designed heuristic algorithm that can tune the error bound of SZ3 for different clients during training. Extensive experimental results based on a GPU cluster with 65 GPU cards show that FedEFsz improves fairness across different benchmarks by up to 60.88% and meanwhile reduces communication traffic by up to 315×.
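To see what “error-bounded lossy compression” buys, the sketch below uses plain uniform quantization with absolute error bound eb, so every reconstructed value is within eb of the original. SZ3 itself is far more sophisticated (prediction, quantization, entropy coding); only the error-bound contract the abstract relies on is illustrated here.

```python
# Minimal stand-in for error-bounded lossy compression of model updates:
# uniform quantization that guarantees |reconstructed - original| <= eb.
# This is NOT SZ3; it only demonstrates the error-bound contract.
import numpy as np

def compress(update: np.ndarray, eb: float) -> np.ndarray:
    return np.round(update / (2.0 * eb)).astype(np.int32)  # quantization codes

def decompress(codes: np.ndarray, eb: float) -> np.ndarray:
    return codes.astype(np.float64) * (2.0 * eb)

grads = np.random.default_rng(1).normal(size=10_000)
codes = compress(grads, eb=1e-3)
assert np.max(np.abs(decompress(codes, 1e-3) - grads)) <= 1e-3  # bound holds
```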
Citations: 0
Parallelization of Network Dynamics Computations in Heterogeneous Distributed Environment
IF 6.0 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-07-28 | DOI: 10.1109/TPDS.2025.3593154
Oleksandr Sudakov;Volodymyr Maistrenko
This paper addresses the problem of parallelizing computations to study nonlinear dynamics in large networks of non-locally coupled oscillators using heterogeneous computing resources. The proposed approach can be applied to a variety of nonlinear dynamics models with runtime specification of parameters and network topologies. Parallelizing the solution of equations for different network elements is performed transparently and, in contrast to available tools, does not require parallel programming from end-users. The runtime scheduler takes into account the performance of computing and communication resources to reduce downtime and to achieve a quasi-optimal parallelizing speed-up. The proposed approach was implemented, and its efficiency is proven by numerous applications for simulating large dynamical networks with 10³–10⁸ elements described by Hodgkin–Huxley, FitzHugh–Nagumo, and Kuramoto models, for investigating pathological synchronization during Parkinson’s disease, analyzing multi-stability, for studying chimera and solitary states in 3D networks, etc. All the above computations may be performed using symmetrical multiprocessors, graphic processing units, and a network of workstations within the same run, and it was demonstrated that near-linear speed-up can be achieved for large networks. The proposed approach is promising for extension to new hardware like edge-computing devices.
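As an illustration of the kind of model the framework parallelizes, here is a vectorized explicit-Euler step of the Kuramoto model dθ_i/dt = ω_i + (K/N) Σ_j sin(θ_j − θ_i); in a distributed run each worker would integrate its own slice of oscillators. Parameters are illustrative.

```python
# One explicit-Euler step of the Kuramoto model, vectorized with NumPy.
# Network size, coupling K, and dt are illustrative assumptions.
import numpy as np

def kuramoto_step(theta, omega, K, dt):
    # pairwise coupling term: mean over j of sin(theta_j - theta_i)
    coupling = np.sin(theta[None, :] - theta[:, None]).mean(axis=1)
    return theta + dt * (omega + K * coupling)

rng = np.random.default_rng(2)
N = 1000
theta = rng.uniform(0, 2 * np.pi, N)   # initial phases
omega = rng.normal(0.0, 1.0, N)        # natural frequencies
for _ in range(100):
    theta = kuramoto_step(theta, omega, K=2.0, dt=0.01)
```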
Citations: 0
ELICA: Efficient and Load Balanced I/O Cache Architecture for Hyperconverged Infrastructures
IF 6.0 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-07-24 | DOI: 10.1109/TPDS.2025.3592275
Mostafa Kishani;Sina Ahmadi;Saba Ahmadian;Reza Salkhordeh;Zdenek Becvar;Onur Mutlu;André Brinkmann;Hossein Asadi
Hyperconverged Infrastructures (HCIs) combine processing and storage elements to meet the requirements of data-intensive applications in performance, scalability, and quality of service. As an emerging paradigm, HCI should couple with a variety of traditional performance improvement approaches such as I/O caching in virtualized platforms. Contemporary I/O caching schemes are optimized for traditional single-node storage architectures and suffer from two major shortcomings for multi-node architectures: a) imbalanced cache space requirement and b) imbalanced I/O traffic and load. This makes existing schemes inefficient in distributing cache resources over an array of separate physical nodes. In this paper, we propose an Efficient and Load Balanced I/O Cache Architecture (ELICA), managing the solid-state drive (SSD) cache resources across HCI nodes to enhance I/O performance. ELICA dynamically reconfigures and distributes the SSD cache resources throughout the array of HCI nodes and also balances the network traffic and I/O cache load by dynamic reallocation of cache resources. To maximize the performance, we further present an optimization problem defined by Integer Linear Programming to efficiently distribute cache resources and balance the network traffic and I/O cache relocations. Our experimental results on a real platform show that ELICA improves quality of service in terms of average and worst-case latency in HCIs by 3.1× and 23%, respectively, compared to the state-of-the-art.
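ELICA poses cache distribution as an Integer Linear Programming problem; the toy model below, built with the pulp package (an assumption — the abstract names no solver), captures the flavor: binary placement variables, a traffic-cost objective, and per-node SSD capacity constraints. The real formulation also balances load and penalizes cache relocations.

```python
# Toy ILP in the spirit of ELICA's formulation: place each VM's cache share
# on exactly one node, minimizing remote-access traffic, within SSD capacity.
# Costs, demands, and the pulp dependency are illustrative assumptions.
import pulp

vms, nodes = range(4), range(2)
traffic = [[1, 5], [4, 2], [3, 3], [6, 1]]   # cost of serving vm i from node j
demand = [2, 3, 1, 2]                        # cache GB needed per vm
capacity = [5, 5]                            # SSD GB per node

prob = pulp.LpProblem("cache_placement", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", [(i, j) for i in vms for j in nodes], cat="Binary")
prob += pulp.lpSum(traffic[i][j] * x[(i, j)] for i in vms for j in nodes)
for i in vms:                                # each vm cached on exactly one node
    prob += pulp.lpSum(x[(i, j)] for j in nodes) == 1
for j in nodes:                              # per-node SSD capacity constraint
    prob += pulp.lpSum(demand[i] * x[(i, j)] for i in vms) <= capacity[j]
prob.solve(pulp.PULP_CBC_CMD(msg=0))
print([(i, j) for i in vms for j in nodes if x[(i, j)].value() == 1])
```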
Citations: 0
Performance Portability Assessment in Gaia
IF 6.0 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-07-22 | DOI: 10.1109/TPDS.2025.3591452
Giulio Malenza;Valentina Cesare;Marco Edoardo Santimaria;Robert Birke;Alberto Vecchiato;Ugo Becciani;Marco Aldinucci
Modern scientific experiments produce ever-increasing amounts of data, soon requiring ExaFLOPs computing capacities for analysis. Reaching such performance requires purpose-built supercomputers with O(10³) nodes, each hosting multicore CPUs and multiple GPUs, and applications designed to exploit this hardware optimally. Given that each supercomputer is generally a one-off project, the need for computing frameworks portable across diverse CPU and GPU architectures without performance losses is increasingly compelling. We investigate the performance portability of a real-world application: the solver module of the AVU–GSR pipeline for the ESA Gaia mission. This code finds the astrometric parameters of ~10⁸ stars in the Milky Way using the LSQR iterative algorithm. LSQR is widely used to solve linear systems of equations across a wide range of high-performance computing applications, elevating the study beyond its astrophysical relevance. The code is memory-bound, with six main compute kernels implementing sparse matrix-by-vector products. We optimize the previous CUDA implementation and port the code to a further six GPU-acceleration frameworks: C++ PSTL, SYCL, OpenMP, HIP, KOKKOS, and OpenACC. We evaluate each framework’s performance portability across multiple GPUs (NVIDIA and AMD) and problem sizes in terms of application and architectural efficiency. Architectural efficiency is estimated through the roofline model of the six most computationally expensive GPU kernels. Our results show that C++ library-based (C++ PSTL and KOKKOS), pragma-based (OpenMP and OpenACC), and language-specific (CUDA, HIP, and SYCL) frameworks achieve increasingly better performance portability across the supported platforms, with larger problem sizes providing better performance-portability scores due to higher GPU occupancies.
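LSQR is available off the shelf in SciPy; the snippet shows textbook usage on a small random sparse system, purely to illustrate the sparse matrix-by-vector products that dominate the solver’s runtime. It is not the AVU–GSR code, which implements custom GPU kernels.

```python
# Textbook LSQR on a small, consistent sparse least-squares system.
# Problem sizes and tolerances are illustrative.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(3)
A = sp.random(5000, 500, density=0.01, format="csr", random_state=3)
x_true = rng.normal(size=500)
b = A @ x_true                     # consistent right-hand side

x, istop, itn = lsqr(A, b, atol=1e-10, btol=1e-10)[:3]
print(f"istop={istop}, iterations={itn}, residual={np.linalg.norm(A @ x - b):.2e}")
```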
Citations: 0
Task Scheduling in Geo-Distributed Computing: A Survey
IF 6.0 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-07-21 | DOI: 10.1109/TPDS.2025.3591010
Yujian Wu;Shanjiang Tang;Ce Yu;Bin Yang;Chao Sun;Jian Xiao;Hutong Wu;Jinghua Feng
Geo-distributed computing, a paradigm that assigns computational tasks to globally distributed nodes, has emerged as a promising approach in cloud computing, edge computing, cloud-edge computing, and supercomputer computing (SC). It enables low-latency services, ensures data locality, and handles large-scale applications. As global computing capacity and task demands increase rapidly, scheduling tasks for efficient execution in geo-distributed computing systems has become an increasingly critical research challenge. It arises from the inherent characteristics of geographic distribution, including heterogeneous network conditions, region-specific resource pricing, and varying computational capabilities across locations. Researchers have developed diverse task scheduling methods tailored to geo-distributed scenarios, aiming to achieve objectives such as performance enhancement, fairness assurance, and fault-tolerance improvement. This survey provides a comprehensive and systematic review of task scheduling techniques across four major distributed computing environments, with an in-depth analysis of these approaches based on their core scheduling objectives. Through our analysis, we identify key research challenges and outline promising directions for advancing task scheduling in geo-distributed computing.
Citations: 0
Doing More With Less: Balancing Probing Costs and Task Offloading Efficiency At the Network Edge
IF 6.0 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-07-18 | DOI: 10.1109/TPDS.2025.3590368
Xishuo Li;Shan Zhang;Tie Ma;Zhiyuan Wang;Hongbin Luo
In decentralized edge computing environments, user devices need to perceive the status of neighboring devices, including computational availability and communication delays, to optimize task offloading decisions. However, probing the real-time status of all devices introduces significant overhead, and probing only a few devices can lead to suboptimal decision-making, considering the massive connectivity and non-stationarity of edge networks. Aiming to balance the status probing cost and task offloading performance, we study the joint transmission and computation status probing problem, where the status and offloading delay on edge devices are characterized by general, bounded, and non-stationary distributions. The problem is proved to be NP-hard, even with known offloading delay distributions. To handle this case, we design an efficient offline method that guarantees a (1 − 1/e) approximation ratio by leveraging the submodularity of the expected offloading delay function. Furthermore, for scenarios with unknown and non-stationary offloading delay distributions, we reformulate the problem using the piecewise-stationary combinatorial multi-armed bandit framework and develop a change-point detection-based online status probing (CD-OSP) algorithm. CD-OSP can timely detect environmental changes and update probing strategies using the proposed offline method and estimated offloading delay distributions. We prove that CD-OSP achieves a regret of O(NV√(T ln T)), where N, V, and T denote the numbers of stationary periods, edge devices, and time slots, respectively. Extensive simulations and testbed experiments demonstrate that CD-OSP significantly outperforms state-of-the-art baselines, which can reduce the probing cost by up to 16.18× with a 2.14× increase in the offloading delay.
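As a hedged sketch of the piecewise-stationary bandit idea (not the paper’s CD-OSP algorithm), the snippet runs a UCB-style probing policy and resets an arm’s statistics when its recent empirical mean drifts from its older mean, mimicking change-point detection. Window sizes, thresholds, and the reward model are illustrative.

```python
# UCB probing with a naive mean-shift change-point reset, approximating a
# piecewise-stationary environment. All constants are illustrative.
import math, random
from collections import deque

class CDUCBArm:
    def __init__(self, window=50, drift=0.3):
        self.rewards, self.window, self.drift = deque(maxlen=200), window, drift

    def update(self, r):
        self.rewards.append(r)
        old = list(self.rewards)[:-self.window]
        recent = list(self.rewards)[-self.window:]
        if old and abs(sum(recent)/len(recent) - sum(old)/len(old)) > self.drift:
            self.rewards = deque(recent, maxlen=200)   # change detected: reset stats

    def ucb(self, t):
        n = len(self.rewards)
        if n == 0:
            return float("inf")                        # probe unexplored arms first
        return sum(self.rewards)/n + math.sqrt(2 * math.log(t + 1) / n)

arms = [CDUCBArm() for _ in range(4)]
for t in range(2000):
    i = max(range(4), key=lambda k: arms[k].ucb(t))
    mean = 0.5 + 0.1*i if t < 1000 else 0.9 - 0.1*i    # environment shifts at t=1000
    arms[i].update(random.gauss(mean, 0.05))           # reward = negated-delay proxy
```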
Citations: 0
Cannikin: No Lagger of SLO in Concurrent Multiple LoRA LLM Serving
IF 6.0 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-07-17 | DOI: 10.1109/TPDS.2025.3590014
Ruidong Zhu;Ziyue Jiang;Zhi Zhang;Xin Liu;Xuanzhe Liu;Xin Jin
Low-rank adaptation (LoRA) is widely used to efficiently fine-tune large language models (LLMs), leading to multiple models fine-tuned from the same pre-trained LLM. State-of-the-art LLM serving systems colocate these LoRA models on the same GPU instances for concurrent serving, which decreases memory usage and boosts efficiency. However, the unawareness of the SLO requirements of each LoRA service and the interference between requests from different LoRA services can cause significant SLO violations. This paper presents Cannikin, a multi-LoRA inference serving system that optimizes the minimum of the SLO attainments of all LoRA services in the serving system, denoted as lagger-SLO attainment. We obtain insights from the characterization of a real-world multi-LoRA serving trace, which reveals the stable input/output lengths of the most popular LoRA services. This motivates Cannikin to propose an SLO-aware scheduling algorithm that prioritizes requests based on efficient deadline estimation. Cannikin further detects the influence of interference between different LoRA services on SLO violations and eliminates the bias between these services. The evaluation using real-world traces demonstrates that compared to the state-of-the-art multi-LoRA serving systems, Cannikin can handle up to 3.6× higher rates or 2.8× more burstiness while maintaining the SLO attainment of each LoRA service above 90%.
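The deadline-driven prioritization can be sketched as an earliest-deadline-first queue: estimate each request’s effective deadline from its SLO and expected decode length, then always serve the most urgent one. The deadline formula and all numbers are assumptions, not Cannikin’s calibrated estimator.

```python
# Earliest-deadline-first sketch of SLO-aware request prioritization.
# The deadline model (arrival + slo - est_tokens * per_token_ms) is assumed.
import heapq

def deadline(arrival, slo, est_output_tokens, per_token_ms):
    # slack remaining once expected decoding time is accounted for
    return arrival + slo - est_output_tokens * per_token_ms

queue = []
for req_id, (arrival, slo, tokens) in enumerate(
        [(0, 500, 120), (5, 300, 40), (8, 800, 300)]):
    heapq.heappush(queue, (deadline(arrival, slo, tokens, 2.0), req_id))

while queue:
    d, req_id = heapq.heappop(queue)             # pop the most urgent request
    print(f"serve request {req_id} (effective deadline {d} ms)")
```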
Citations: 0