
Latest Publications in IEEE Transactions on Parallel and Distributed Systems

Slark: A Performance Robust Decentralized Inter-Datacenter Deadline-Aware Coflows Scheduling Framework With Local Information
IF 5.6, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2024-11-28. DOI: 10.1109/TPDS.2024.3508275
Xiaodong Dong;Lihai Nie;Zheli Liu;Yang Xiang
Inter-datacenter network applications generate massive numbers of coflows with deadline requirements for purposes such as backup, synchronization, and analytics. Decentralized coflow scheduling frameworks are desirable for their scalability in cross-domain deployment, but they grapple with information agnosticism because they lack cross-domain privileges. Current information-agnostic coflow scheduling methods are incompatible with decentralized frameworks because they rely on centralized controllers to continuously monitor and learn from global coflow transmission states. Alternative methods propose mechanisms for decentralized gathering and synchronization of global coflow information, but they require dedicated physical hardware or control logic, which can be impractical for incremental deployment. This article proposes Slark, a decentralized deadline-aware coflow scheduling framework that meets coflows' soft and hard deadline requirements using only local traffic information. Slark avoids global coflow transmission states and dedicated hardware or control logic: multiple software-implemented scheduling agents work independently on each node, and each agent folds the missing information into its node-specific bandwidth allocation by modeling allocation as a robust optimization problem in which flow information on the other nodes appears as uncertain parameters. We then validate Slark's performance robustness by investigating how perturbations in the optimal objective value and the associated optimal solution are affected by those uncertain parameters. Finally, we propose a firebug-swarm-optimization-based heuristic algorithm to tackle the non-convexity of the problem. Experimental results demonstrate that Slark significantly enhances transmission revenue and increases soft and hard deadline guarantee ratios by 10.52% and 7.99% on average.
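To make the formulation concrete, here is a minimal Python sketch (an illustration, not Slark itself): a single node allocates its link bandwidth among local coflows while the demand contributed by other nodes is known only to lie in an interval, i.e., a box uncertainty set, and a generic perturbation-based swarm search stands in for the paper's firebug-swarm heuristic. All capacities, demands, and deadlines below are invented.

```python
# Robust local bandwidth allocation sketch: optimize the worst case over an
# interval of unknown remote demand (hypothetical numbers throughout).
import random

LINK_CAP = 10.0                      # local link capacity, assumed
coflows = [(8.0, 2.0), (4.0, 1.0), (6.0, 3.0)]   # (remaining size, s to deadline)
remote_demand = (2.0, 5.0)           # uncertain load from other nodes: [lo, hi]

def worst_case_revenue(alloc):
    """Count local coflows that finish by their deadline under the most
    adversarial remote demand (the interval's upper endpoint)."""
    usable = LINK_CAP - remote_demand[1]            # pessimistic residual capacity
    scale = min(1.0, usable / (sum(alloc) + 1e-9))  # shrink over-allocations
    return sum(1 for (size, ddl), a in zip(coflows, alloc)
               if a * scale * ddl >= size)

def swarm_search(iters=2000, agents=20):
    """Generic swarm-style random search standing in for the firebug-swarm
    heuristic; enough to cope with the objective's non-convexity here."""
    swarm = [[random.uniform(0, LINK_CAP) for _ in coflows] for _ in range(agents)]
    best = max(swarm, key=worst_case_revenue)
    for _ in range(iters):
        cand = [max(0.0, b + random.gauss(0, 0.5)) for b in best]
        if worst_case_revenue(cand) >= worst_case_revenue(best):
            best = cand
    return best, worst_case_revenue(best)

if __name__ == "__main__":
    alloc, met = swarm_search()
    print("allocation:", [round(a, 2) for a in alloc],
          "| worst-case deadlines met:", met)
```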
{"title":"Slark: A Performance Robust Decentralized Inter-Datacenter Deadline-Aware Coflows Scheduling Framework With Local Information","authors":"Xiaodong Dong;Lihai Nie;Zheli Liu;Yang Xiang","doi":"10.1109/TPDS.2024.3508275","DOIUrl":"https://doi.org/10.1109/TPDS.2024.3508275","url":null,"abstract":"Inter-datacenter network applications generate massive coflows for purposes, e.g., backup, synchronization, and analytics, with deadline requirements. Decentralized coflow scheduling frameworks are desirable for their scalability in cross-domain deployment but grappling with the challenge of information agnosticism for lack of cross-domain privileges. Current information-agnostic coflow scheduling methods are incompatible with decentralized frameworks for relying on centralized controllers to continuously monitor and learn from coflow global transmission states to infer global coflow information. Alternative methods propose mechanisms for decentralized global coflow information gathering and synchronization. However, they require dedicated physical hardware or control logic, which could be impractical for incremental deployment. This article proposes Slark, a decentralized deadline-aware coflow scheduling framework, which meets coflows’ soft and hard deadline requirements using only local traffic information. It eschews requiring global coflow transmission states and dedicated hardware or control logic by leveraging multiple software-implemented scheduling agents working independently on each node and integrating such information agnosticism into node-specific bandwidth allocation by modeling it as a robust optimization problem with flow information on the other nodes represented as uncertain parameters. Subsequently, we validate the performance robustness of Slark by investigating how perturbations in the optimal objective function value and the associated optimal solution are affected by uncertain parameters. Finally, we propose a firebug-swarm-optimization-based heuristic algorithm to tackle the non-convexity in our problem. Experimental results demonstrate that Slark can significantly enhance transmission revenue and increase soft and hard deadline guarantee ratios by 10.52% and 7.99% on average.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 2","pages":"197-211"},"PeriodicalIF":5.6,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Joint Dynamic Data and Model Parallelism for Distributed Training of DNNs Over Heterogeneous Infrastructure
IF 5.6, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2024-11-27. DOI: 10.1109/TPDS.2024.3506588
Zhi Ling;Xiaofeng Jiang;Xiaobin Tan;Huasen He;Shiyin Zhu;Jian Yang
Distributed training of deep neural networks (DNNs) suffers efficiency declines in dynamic heterogeneous environments because of the resource wastage caused by the straggler problem in data parallelism (DP) and by pipeline bubbles in model parallelism (MP). In addition, limited resource availability forces a trade-off between training performance and long-term cost, particularly in online settings. To address these challenges, this article presents a novel online approach that maximizes long-term training efficiency in heterogeneous environments through uneven data assignment and communication-aware model partitioning. A group-based hierarchical architecture combining DP and MP is developed to balance disparate computation and communication capabilities and to offer a flexible parallel mechanism. To jointly optimize the performance and long-term cost of the online DL training process, we formulate the problem as a stochastic optimization with time-averaged constraints. Using Lyapunov's stochastic network optimization theory, we decompose it into a sequence of instantaneous sub-optimizations and devise an effective online solution based on tentative searching and linear solving. We have implemented a prototype system and evaluated the effectiveness of our solution in realistic experiments, reducing batch training time by up to 68.59% over state-of-the-art methods.
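The reduction from a time-averaged constraint to per-round sub-problems is the classic Lyapunov drift-plus-penalty pattern, which a few lines of Python can illustrate; the candidate plans, their costs, and the trade-off knob V below are assumptions, not values from the paper.

```python
# Drift-plus-penalty sketch: a virtual queue Q tracks violation of the
# long-term cost budget; each round picks the plan minimizing V*latency + Q*cost.
V = 2.0           # performance/cost trade-off weight (assumed)
BUDGET = 1.0      # allowed long-run average cost per round (assumed)
Q = 0.0           # Lyapunov virtual queue

# candidate (data-assignment, model-partition) plans: (step latency, cost)
plans = {"dp_heavy": (1.0, 1.6), "balanced": (1.3, 1.0), "mp_heavy": (1.8, 0.5)}

for t in range(6):
    name, (latency, cost) = min(plans.items(),
                                key=lambda kv: V * kv[1][0] + Q * kv[1][1])
    Q = max(Q + cost - BUDGET, 0.0)   # queue grows while we overspend
    print(f"round {t}: pick {name:10s} Q={Q:.2f}")
# Q rises until the cheaper plan wins, steering the average cost to the budget.
```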
{"title":"Joint Dynamic Data and Model Parallelism for Distributed Training of DNNs Over Heterogeneous Infrastructure","authors":"Zhi Ling;Xiaofeng Jiang;Xiaobin Tan;Huasen He;Shiyin Zhu;Jian Yang","doi":"10.1109/TPDS.2024.3506588","DOIUrl":"https://doi.org/10.1109/TPDS.2024.3506588","url":null,"abstract":"Distributed training of deep neural networks (DNNs) suffers from efficiency declines in dynamic heterogeneous environments, due to the resource wastage brought by the straggler problem in data parallelism (DP) and pipeline bubbles in model parallelism (MP). Additionally, the limited resource availability requires a trade-off between training performance and long-term costs, particularly in online settings. To address these challenges, this article presents a novel online approach to maximize long-term training efficiency in heterogeneous environments through uneven data assignment and communication-aware model partitioning. A group-based hierarchical architecture combining DP and MP is developed to balance discrepant computation and communication capabilities, and offer a flexible parallel mechanism. In order to jointly optimize the performance and long-term cost of the online DL training process, we formulate this problem as a stochastic optimization with time-averaged constraints. By utilizing Lyapunov’s stochastic network optimization theory, we decompose it into several instantaneous sub-optimizations, and devise an effective online solution to address them based on tentative searching and linear solving. We have implemented a prototype system and evaluated the effectiveness of our solution based on realistic experiments, reducing batch training time by up to 68.59% over state-of-the-art methods.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 2","pages":"150-167"},"PeriodicalIF":5.6,"publicationDate":"2024-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Cost-Effective and Low-Latency Data Placement in Edge Environment Based on PageRank-Inspired Regional Value
IF 5.6, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2024-11-25. DOI: 10.1109/TPDS.2024.3506625
Pengwei Wang;Junye Qiao;Yuying Zhao;Zhijun Ding
Edge storage offers low-latency services to users. However, because edge resources are strained and costly, enterprises must choose the data that most warrant placement at the edge and place them in the right locations. In practice, data exhibit temporal and spatial properties and variability, which have a significant impact on placement but have been largely ignored in research. To address this, we introduce the concept of data temperature, which captures data characteristics over time and space. To account for the spatial relevance among regions when placing data, and inspired by PageRank, we present a model that uses data temperature to assess the regional value of data, effectively leveraging collaboration within the edge storage system. We also propose a regional value-based algorithm (RVA) that minimizes cost while meeting user response time requirements. By taking the correlation between regions into account, the RVA achieves lower latency than current methods while creating an equal or even smaller number of replicas. Experimental results validate the efficacy of the proposed method in terms of latency, success rate, and cost efficiency.
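As a rough illustration of a PageRank-style regional value, the sketch below runs a power iteration over a small region graph, with per-region data temperature playing the role of the personalization vector; the graph, the temperatures, and the damping factor are all assumed.

```python
# Regional value via personalized PageRank (toy instance).
import numpy as np

A = np.array([[0, 1, 1, 0],            # adjacency of 4 edge regions (assumed)
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
temp = np.array([0.7, 0.1, 0.15, 0.05])  # normalized data temperature per region
d = 0.85                                  # damping factor, as in classic PageRank

P = A / A.sum(axis=1, keepdims=True)      # row-stochastic transition matrix
v = np.full(4, 0.25)                      # uniform initial value
for _ in range(100):                      # power iteration to convergence
    v = (1 - d) * temp + d * (P.T @ v)    # temperature biases the random walk
print("regional value:", np.round(v, 3))  # higher value => stronger replica site
```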
{"title":"Cost-Effective and Low-Latency Data Placement in Edge Environment Based on PageRank-Inspired Regional Value","authors":"Pengwei Wang;Junye Qiao;Yuying Zhao;Zhijun Ding","doi":"10.1109/TPDS.2024.3506625","DOIUrl":"https://doi.org/10.1109/TPDS.2024.3506625","url":null,"abstract":"Edge storage offers low-latency services to users. However, due to strained edge resources and high costs, enterprises must choose the data that most warrant placement at the edge and place it in the right location. In practice, data exhibit temporal and spatial properties, and variability, which have a significant impact on their placement, but have been largely ignored in research. To address this, we introduce the concept of data temperature, which considers data characteristics over time and space. To consider the influence of spatial relevance among different regions for placing data, inspired by PageRank, we present a model using data temperature to assess the regional value of data, which effectively leverages collaboration within the edge storage system. We also propose a regional value-based algorithm (RVA) that minimizes cost while meeting user response time requirements. By taking into account the correlation between regions, the RVA can achieve lower latency than current methods when creating an equal or even smaller number of replicas. Experimental results validate the efficacy of the proposed method in terms of latency, success rate, and cost efficiency.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 2","pages":"185-196"},"PeriodicalIF":5.6,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
DegaFL: Decentralized Gradient Aggregation for Cross-Silo Federated Learning
IF 5.6, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2024-11-18. DOI: 10.1109/TPDS.2024.3501581
Jialiang Han;Yudong Han;Xiang Jing;Gang Huang;Yun Ma
Federated learning (FL) is an emerging and promising paradigm of privacy-preserving machine learning (ML). An important type of FL is cross-silo FL, which enables a moderate number of organizations to cooperatively train a shared model by keeping confidential data local and aggregating gradients on a central parameter server. In practice, however, the central server may be vulnerable to malicious attacks or software failures. To address this issue, we propose DegaFL, a novel decentralized gradient aggregation approach for cross-silo FL. DegaFL eliminates the central server by aggregating gradients on each participant, and it maintains and synchronizes gradients of only the current training round. Besides, we propose AdaAgg to adaptively aggregate correct gradients from honest nodes, and we use HotStuff to ensure the consistency of the training round number and gradients among all nodes. Experimental results show that DegaFL defends against common threat models with minimal accuracy loss and achieves up to a 50× reduction in storage overhead and up to a 13× reduction in network overhead compared with state-of-the-art decentralized FL approaches.
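The abstract does not spell out AdaAgg's rule, so the sketch below uses a coordinate-wise trimmed mean, a common robust-aggregation stand-in, to show what server-less per-round aggregation of peer gradients looks like; the gradient values and the single faulty node are invented.

```python
# Decentralized robust aggregation sketch: every node runs this locally on the
# current round's peer gradients, so no central parameter server is needed.
import numpy as np

def trimmed_mean(grads, trim=1):
    """Drop the `trim` smallest and largest values per coordinate, then
    average; a minority of corrupted gradients cannot skew the result."""
    g = np.sort(np.stack(grads), axis=0)       # sort each coordinate separately
    return g[trim:len(grads) - trim].mean(axis=0)

# round r: gradients received from 5 silos, one of them faulty (assumed)
peer_grads = [np.array([0.10, -0.20]), np.array([0.12, -0.18]),
              np.array([0.09, -0.22]), np.array([0.11, -0.19]),
              np.array([9.00, 9.00])]          # outlier from the faulty node
print("aggregated gradient:", trimmed_mean(peer_grads))  # outlier is discarded
```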
{"title":"DegaFL: Decentralized Gradient Aggregation for Cross-Silo Federated Learning","authors":"Jialiang Han;Yudong Han;Xiang Jing;Gang Huang;Yun Ma","doi":"10.1109/TPDS.2024.3501581","DOIUrl":"https://doi.org/10.1109/TPDS.2024.3501581","url":null,"abstract":"Federated learning (FL) is an emerging promising paradigm of privacy-preserving machine learning (ML). An important type of FL is cross-silo FL, which enables a moderate number of organizations to cooperatively train a shared model by keeping confidential data locally and aggregating gradients on a central parameter server. However, the central server may be vulnerable to malicious attacks or software failures in practice. To address this issue, in this paper, we propose \u0000<inline-formula><tex-math>$mathtt{DegaFL} $</tex-math></inline-formula>\u0000, a novel decentralized gradient aggregation approach for cross-silo FL. \u0000<inline-formula><tex-math>$mathtt{DegaFL} $</tex-math></inline-formula>\u0000 eliminates the central server by aggregating gradients on each participant, and maintains and synchronizes gradients of only the current training round. Besides, we propose \u0000<inline-formula><tex-math>$mathtt{AdaAgg} $</tex-math></inline-formula>\u0000 to adaptively aggregate correct gradients from honest nodes and use HotStuff to ensure the consistency of the training round number and gradients among all nodes. Experimental results show that \u0000<inline-formula><tex-math>$mathtt{DegaFL} $</tex-math></inline-formula>\u0000 defends against common threat models with minimal accuracy loss, and achieves up to \u0000<inline-formula><tex-math>$50times$</tex-math></inline-formula>\u0000 reduction in storage overhead and up to \u0000<inline-formula><tex-math>$13times$</tex-math></inline-formula>\u0000 reduction in network overhead, compared to state-of-the-art decentralized FL approaches.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 2","pages":"212-225"},"PeriodicalIF":5.6,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Two-Dimensional Balanced Partitioning and Efficient Caching for Distributed Graph Analysis
IF 5.6, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2024-11-18. DOI: 10.1109/TPDS.2024.3501292
Shuai Lin;Rui Wang;Yongkun Li;Yinlong Xu;John C. S. Lui
Distributed graph analysis usually partitions a large graph into multiple small subgraphs and distributes them across a cluster of machines for computing, so graph partitioning plays a crucial role in its performance. However, widely used existing graph partitioning schemes either balance in only one dimension (the number of edges or of vertices) or incur a large number of edge cuts, degrading the performance of distributed graph analysis. In this article, we propose a novel graph partition scheme, BPart, and two enhanced algorithms, BPart-C and BPart-S, which achieve a balanced partition for both vertices and edges while also reducing the number of edge cuts. In addition, we propose a neighbor-aware caching scheme that further reduces the number of edge cuts to improve the efficiency of distributed graph analysis. Our experimental results show that BPart-C and BPart-S achieve a better balance in both dimensions (the numbers of vertices and edges) while reducing the number of edge cuts, compared to multiple existing graph partitioning algorithms, i.e., Chunk-V, Chunk-E, Fennel, and Hash. We also integrate these partitioning algorithms into two popular distributed graph systems, KnightKing and Gemini, to validate their impact on graph analysis efficiency. Results show that BPart-C and BPart-S reduce the total running time of various graph applications by up to 60% and 70%, respectively. In addition, the neighbor-aware caching scheme further improves performance by up to 24%.
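BPart's exact placement rule is not given in the abstract, but the general shape of a two-dimensionally balanced streaming partitioner is easy to sketch: each vertex goes to the part whose score rewards co-located neighbors (fewer edge cuts) and penalizes both vertex load and edge load. The weights and the toy graph below are assumptions.

```python
# Greedy streaming partitioner balancing vertices and edges (toy sketch).
K = 2                                            # number of parts
parts = [{"v": set(), "e": 0} for _ in range(K)]
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}  # toy graph

def score(p, nbrs):
    neighbor_gain = len(p["v"] & set(nbrs))      # co-located neighbors cut fewer edges
    return neighbor_gain - 0.5 * len(p["v"]) - 0.1 * p["e"]  # balance penalties

for v, nbrs in adj.items():                      # vertices arrive as a stream
    best = max(parts, key=lambda p: score(p, nbrs))
    best["v"].add(v)
    best["e"] += len(nbrs)                       # degree counted as edge load

for i, p in enumerate(parts):
    print(f"part {i}: vertices={sorted(p['v'])} edge-load={p['e']}")
```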
{"title":"Two-Dimensional Balanced Partitioning and Efficient Caching for Distributed Graph Analysis","authors":"Shuai Lin;Rui Wang;Yongkun Li;Yinlong Xu;John C. S. Lui","doi":"10.1109/TPDS.2024.3501292","DOIUrl":"https://doi.org/10.1109/TPDS.2024.3501292","url":null,"abstract":"Distributed graph analysis usually partitions a large graph into multiple small-sized subgraphs and distributes them into a cluster of machines for computing. Therefore, graph partitioning plays a crucial role in distributed graph analysis. However, the widely used existing graph partitioning schemes balance only in one dimension (number of edges or vertices) or incur a large number of edge cuts, so they degrade the performance of distributed graph analysis. In this article, we propose a novel graph partition scheme BPart and two enhanced algorithms BPart-C and BPart-S to achieve a balanced partition for both vertices and edges, and also reduce the number of edge cuts. Besides, we also propose a neighbor-aware caching scheme to further reduce the number of edge cuts so as to improve the efficiency of distributed graph analysis. Our experimental results show that BPart-C and BPart-S can achieve a better balance in both dimensions (the number of vertices and edges), and meanwhile reducing the number of edge cuts, compared to multiple existing graph partitioning algorithms, i.e., Chunk-V, Chunk-E, Fennel, and Hash. We also integrate these partitioning algorithms into two popular distributed graph systems, KnightKing and Gemini, to validate their impact on graph analysis efficiency. Results show that both BPart-C and BPart-S can significantly reduce the total running time of various graph applications by up to 60% and 70%, respectively. In addition, the neighbor-aware caching scheme can further improve the performance by up to 24%.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 2","pages":"133-149"},"PeriodicalIF":5.6,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Spreeze: High-Throughput Parallel Reinforcement Learning Framework
IF 5.6, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2024-11-14. DOI: 10.1109/TPDS.2024.3497986
Jing Hou;Guang Chen;Ruiqi Zhang;Zhijun Li;Shangding Gu;Changjun Jiang
The promotion of large-scale applications of reinforcement learning (RL) requires efficient training computation. While existing parallel RL frameworks encompass a variety of RL algorithms and parallelization techniques, their excessively burdensome communication frameworks prevent a single desktop from reaching its hardware limits in throughput and training effect. In this article, we propose Spreeze, a lightweight parallel framework for RL that efficiently utilizes a single desktop's hardware resources to approach the throughput limit. We asynchronously parallelize the experience sampling, network update, performance evaluation, and visualization operations, and we employ multiple efficient data transmission techniques to transfer various types of data between processes. The framework automatically adjusts the parallelization hyperparameters based on the computing ability of the hardware device in order to perform efficient large-batch updates. Exploiting the structure of the actor-critic family of RL algorithms, our framework uses dual GPUs to update the actor and critic networks independently, further improving throughput. Simulation results show that our framework achieves up to 15,000 Hz experience sampling and a 370,000 Hz network update frame rate using only a personal desktop computer, an order of magnitude higher than other mainstream parallel RL frameworks, resulting in a 73% reduction of training time. Our work on fully utilizing the hardware resources of a single desktop computer is fundamental to enabling efficient large-scale distributed RL training.
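The core architectural idea, decoupling experience sampling from network updates so neither waits on the other, comes down to asynchronous processes around a queue. The skeleton below is a generic producer/consumer sketch, not Spreeze's implementation; worker counts, batch size, and the fake transitions are arbitrary.

```python
# Asynchronous sampling/update skeleton: samplers never block on the learner.
import multiprocessing as mp
import random

def sampler(q, worker_id):
    for step in range(100):                        # fake experience generation
        q.put((worker_id, step, random.random()))  # a "transition" record

def learner(q, n_workers):
    consumed = 0
    while consumed < 100 * n_workers:
        batch = [q.get() for _ in range(50)]       # large-batch update, Spreeze-style
        consumed += len(batch)
        # the actor/critic network update would run here (e.g., on its own GPU)
    print("learner consumed", consumed, "transitions")

if __name__ == "__main__":
    q = mp.Queue()
    workers = [mp.Process(target=sampler, args=(q, i)) for i in range(4)]
    for w in workers:
        w.start()
    learner(q, n_workers=4)
    for w in workers:
        w.join()
```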
{"title":"Spreeze: High-Throughput Parallel Reinforcement Learning Framework","authors":"Jing Hou;Guang Chen;Ruiqi Zhang;Zhijun Li;Shangding Gu;Changjun Jiang","doi":"10.1109/TPDS.2024.3497986","DOIUrl":"https://doi.org/10.1109/TPDS.2024.3497986","url":null,"abstract":"The promotion of large-scale applications of reinforcement learning (RL) requires efficient training computation. While existing parallel RL frameworks encompass a variety of RL algorithms and parallelization techniques, the excessively burdensome communication frameworks hinder the attainment of the hardware's limit for final throughput and training effects on a single desktop. In this article, we propose Spreeze, a lightweight parallel framework for RL that efficiently utilizes a single desktop hardware resource to approach the throughput limit. We asynchronously parallelize the experience sampling, network update, performance evaluation, and visualization operations, and employ multiple efficient data transmission techniques to transfer various types of data between processes. The framework can automatically adjust the parallelization hyperparameters based on the computing ability of the hardware device in order to perform efficient large-batch updates. Based on the characteristics of the “Actor-Critic” RL algorithm, our framework uses dual GPUs to independently update the network of actors and critics in order to further improve throughput. Simulation results show that our framework can achieve up to 15,000 Hz experience sampling and 370,000 Hz network update frame rate using only a personal desktop computer, which is an order of magnitude higher than other mainstream parallel RL frameworks, resulting in a 73% reduction of training time. Our work on fully utilizing the hardware resources of a single desktop computer is fundamental to enabling efficient large-scale distributed RL training.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 2","pages":"282-292"},"PeriodicalIF":5.6,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142890380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
The Impact of Service Demand Variability on Data Center Performance
IF 5.6, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2024-11-14. DOI: 10.1109/TPDS.2024.3497792
Diletta Olliaro;Adityo Anggraito;Marco Ajmone Marsan;Simonetta Balsamo;Andrea Marin
Modern data centers feature extensive arrays of cores that handle a diverse range of jobs. Recent traces shared by leading cloud data center enterprises such as Google and Alibaba reveal that the constant increase in data center services and computational power is accompanied by growing variability in service demand requirements: the number of cores needed by a job ranges from one to several thousand, and the number of seconds a core is held by a job spans more than five orders of magnitude. In this context of extreme variability, the policies governing the allocation of cores to jobs play a crucial role in data center performance. It is widely acknowledged that the First-In First-Out (FIFO) policy tends to underutilize available computing capacity because core requests vary so widely in magnitude. However, the impact of extreme service demand variability on job waiting and response times, deeply investigated in traditional queueing models, is not as well understood for data centers, as we will show. To address this issue, we investigate the dynamics of a data center cluster through analytical models in simple cases and discrete-event simulations based on real data. Our findings emphasize the significant impact of service demand variability, both in requested cores and in service times, and provide insight for enhancing data center performance. In particular, we show how data center performance can be improved by controlling the interplay between service and waiting times through the assignment of cores to jobs.
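A three-job toy run makes the FIFO underutilization point concrete: a wide job at the head of the queue idles cores that smaller jobs behind it could use. The core counts and service times below are invented; with backfilling, the 1-core job would start at t=0 instead of t=10.

```python
# FIFO multiserver-job toy: head-of-line blocking delays small jobs.
import heapq

CORES = 8
queue = [(5, 10.0), (6, 10.0), (1, 1.0)]   # (cores needed, service time), FIFO order
free, t, running, starts = CORES, 0.0, [], []

while queue or running:
    while queue and queue[0][0] <= free:   # start jobs strictly in FIFO order
        c, s = queue.pop(0)
        free -= c
        starts.append(t)                   # record when each job starts
        heapq.heappush(running, (t + s, c))
    finish, c = heapq.heappop(running)     # jump to the next completion
    t, free = finish, free + c

# The 6-core job cannot start at t=0 (only 3 cores free), and FIFO makes the
# 1-core job behind it wait too: start times come out as [0.0, 10.0, 10.0].
print("job start times:", starts)
```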
{"title":"The Impact of Service Demand Variability on Data Center Performance","authors":"Diletta Olliaro;Adityo Anggraito;Marco Ajmone Marsan;Simonetta Balsamo;Andrea Marin","doi":"10.1109/TPDS.2024.3497792","DOIUrl":"https://doi.org/10.1109/TPDS.2024.3497792","url":null,"abstract":"Modern data centers feature an extensive array of cores that handle quite a diverse range of jobs. Recent traces, shared by leading cloud data center enterprises like Google and Alibaba, reveal that the constant increase in data center services and computational power is accompanied by a growing variability in service demand requirements. The number of cores needed for a job can vary widely, ranging from one to several thousands, and the number of seconds a core is held by a job can span more than five orders of magnitude. In this context of extreme variability, the policies governing the allocation of cores to jobs play a crucial role in the performance of data centers. It is widely acknowledged that the First-In First-Out (FIFO) policy tends to underutilize available computing capacity due to the varying magnitudes of core requests. However, the impact of the extreme variability in service demands on job waiting and response times, that has been deeply investigated in traditional queuing models, is not as well understood in the case of data centers, as we will show. To address this issue, we investigate the dynamics of a data center cluster through analytical models in simple cases, and discrete event simulations based on real data. Our findings emphasize the significant impact of service demand variability, both in terms of requested cores and service times, and allow us to provide insight for enhancing data center performance. In particular, we show how data center performance can be improved thanks to the control of the interplay between service and waiting times through the assignment of cores to jobs.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 2","pages":"120-132"},"PeriodicalIF":5.6,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10753043","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Retrospecting Available CPU Resources: SMT-Aware Scheduling to Prevent SLA Violations in Data Centers
IF 5.6, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2024-11-08. DOI: 10.1109/TPDS.2024.3494879
Haoyu Liao;Tong-yu Liu;Jianmei Guo;Bo Huang;Dingyu Yang;Jonathan Ding
The article focuses on an understudied yet fundamental problem: existing methods typically average the utilization of multiple hardware threads to evaluate the available CPU resources. On Simultaneous Multi-Threading (SMT) processors, however, this approach can underestimate the actual usage of the underlying physical cores, leading to an overestimation of the remaining resources. The overestimation propagates from the microarchitecture to operating systems and cloud schedulers, where it may misguide scheduling decisions, exacerbate CPU overcommitment, and increase Service Level Agreement (SLA) violations. To address this potential overestimation, we propose an SMT-aware and purely data-driven approach named Remaining CPU (RCPU) that reserves more CPU resources to restrict CPU overcommitment and prevent SLA violations. RCPU requires only a few modifications to existing cloud infrastructures and scales up to large data centers. Extensive evaluations in the data center showed that RCPU reduces SLA violations by 18% on average for 98% of all latency-sensitive applications. In a benchmarking experiment, we show that RCPU improves accuracy by 69% in terms of Mean Absolute Error (MAE) compared to the state of the art.
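The measurement gap is easy to reproduce in a few lines: four busy hardware threads spread across four physical cores read as "50% CPU available" under naive averaging, yet not a single physical core is fully free. The eight-thread, two-way-SMT topology below is an assumption for the example.

```python
# Naive vs. SMT-aware view of available CPU (toy topology).
threads_busy = [1, 0, 1, 0, 1, 0, 1, 0]       # 8 hw threads; 1 = busy
siblings = [(0, 1), (2, 3), (4, 5), (6, 7)]   # SMT sibling pairs per physical core

naive_free = 1 - sum(threads_busy) / len(threads_busy)
busy_cores = sum(1 for a, b in siblings if threads_busy[a] or threads_busy[b])
smt_aware_free = 1 - busy_cores / len(siblings)

print(f"naive estimate:     {naive_free:.0%} CPU available")         # 50%
print(f"SMT-aware estimate: {smt_aware_free:.0%} fully free cores")  # 0%
# A job placed on the SMT sibling of a busy thread gets far less than a free
# core would deliver, which is exactly the overestimation RCPU guards against.
```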
{"title":"Retrospecting Available CPU Resources: SMT-Aware Scheduling to Prevent SLA Violations in Data Centers","authors":"Haoyu Liao;Tong-yu Liu;Jianmei Guo;Bo Huang;Dingyu Yang;Jonathan Ding","doi":"10.1109/TPDS.2024.3494879","DOIUrl":"https://doi.org/10.1109/TPDS.2024.3494879","url":null,"abstract":"The article focuses on an understudied yet fundamental problem: existing methods typically average the utilization of multiple hardware threads to evaluate the available CPU resources. However, the approach could underestimate the actual usage of the underlying physical core for Simultaneous Multi-Threading (SMT) processors, leading to an overestimation of remaining resources. The overestimation propagates from microarchitecture to operating systems and cloud schedulers, which may misguide scheduling decisions, exacerbate CPU overcommitment, and increase Service Level Agreement (SLA) violations. To address the potential overestimation problem, we propose an SMT-aware and purely data-driven approach named \u0000<italic>Remaining CPU</i>\u0000 (RCPU) that reserves more CPU resources to restrict CPU overcommitment and prevent SLA violations. RCPU requires only a few modifications to the existing cloud infrastructures and can be scaled up to large data centers. Extensive evaluations in the data center proved that RCPU contributes to a reduction of SLA violations by 18% on average for 98% of all latency-sensitive applications. Under a benchmarking experiment, we prove that RCPU increases the accuracy by 69% in terms of Mean Absolute Error (MAE) compared to the state-of-the-art.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 1","pages":"67-83"},"PeriodicalIF":5.6,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142736347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Balanced Splitting: A Framework for Achieving Zero-Wait in the Multiserver-Job Model
IF 5.6, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2024-11-07. DOI: 10.1109/TPDS.2024.3493631
Jonatha Anselmi;Josu Doncel
We present a new framework for designing nonpreemptive and job-size-oblivious scheduling policies in the multiserver-job queueing model. The main requirement is to identify a static and balanced sub-partition of the server set and to ensure that the servers in each set of that sub-partition handle only jobs of a given class, in first-come first-served order. A job class is determined by the number of servers to which a job has exclusive access during its entire execution and by the probability distribution of its service time. This approach aims to reduce delays by preventing small jobs from being blocked by larger ones that arrived first, and it is particularly beneficial when job-size variability within a class is small and between classes is large. In this setting, we propose a new scheduling policy, Balanced-Splitting. In our main results, we provide a sufficient condition for the stability of Balanced-Splitting and show that the resulting queueing probability, i.e., the probability that an arriving job must wait for processing, vanishes in both the subcritical (the load is kept fixed at a constant less than one) and critical (the load approaches one from below) many-server limiting regimes. Crucial to our analysis is a connection with the M/GI/s/s queue and Erlang's loss formula, which lets our analysis rely on fundamental results from queueing theory. Numerical simulations show that the proposed policy performs better than several preemptive/nonpreemptive size-aware/oblivious policies in various practical scenarios. This is also confirmed by simulations on real traces of High Performance Computing (HPC) workloads. The delays induced by Balanced-Splitting are also competitive with those of state-of-the-art policies such as First-Fit-SRPT and ServerFilling-SRPT, although our approach requires neither preemption nor knowledge of job sizes.
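Since the analysis rests on the M/GI/s/s loss system, the central computable quantity is Erlang's B formula: the probability that an arriving job of a class finds all s servers reserved for that class busy. The snippet below evaluates it with the standard stable recurrence; the server count and offered load are example values only.

```python
# Erlang B (loss) probability via the standard recurrence
# B(0, a) = 1,  B(k, a) = a*B(k-1, a) / (k + a*B(k-1, a)).
def erlang_b(s, a):
    b = 1.0
    for k in range(1, s + 1):
        b = a * b / (k + a * b)
    return b

# e.g., a job class owning s=10 servers facing an offered load of a=8 Erlangs:
print(f"blocking/queueing probability: {erlang_b(10, 8.0):.4f}")
# Balanced-Splitting sizes each class's server set so that this probability
# vanishes in the many-server regimes described above.
```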
{"title":"Balanced Splitting: A Framework for Achieving Zero-Wait in the Multiserver-Job Model","authors":"Jonatha Anselmi;Josu Doncel","doi":"10.1109/TPDS.2024.3493631","DOIUrl":"https://doi.org/10.1109/TPDS.2024.3493631","url":null,"abstract":"We present a new framework for designing nonpreemptive and job-size oblivious scheduling policies in the multiserver-job queueing model. The main requirement is to identify a \u0000<i>static and balanced sub-partition</i>\u0000 of the server set and ensure that the servers in each set of that sub-partition can only handle jobs of a given \u0000<i>class</i>\u0000 and in a first-come first-served order. A job class is determined by the number of servers to which it has exclusive access during its entire execution and the probability distribution of its service time. This approach aims to reduce delays by preventing small jobs from being blocked by larger ones that arrived first, and it is particularly beneficial when the job size variability intra resp. inter classes is small resp. large. In this setting, we propose a new scheduling policy, Balanced-Splitting. In our main results, we provide a sufficient condition for the stability of Balanced-Splitting and show that the resulting queueing probability, i.e., the probability that an arriving job needs to wait for processing upon arrival, vanishes in both the subcritical (the load is kept fixed to a constant less than one) and critical (the load approaches one from below) many-server limiting regimes. Crucial to our analysis is a connection with the M/GI/\u0000<inline-formula><tex-math>$s$</tex-math></inline-formula>\u0000/\u0000<inline-formula><tex-math>$s$</tex-math></inline-formula>\u0000 queue and Erlang’s loss formula, which allows our analysis to rely on fundamental results from queueing theory. Numerical simulations show that the proposed policy performs better than several preemptive/nonpreemptive size-aware/oblivious policies in various practical scenarios. This is also confirmed by simulations running on real traces from High Performance Computing (HPC) workloads. The delays induced by Balanced-Splitting are also competitive with those induced by state-of-the-art policies such as First-Fit-SRPT and ServerFilling-SRPT, though our approach has the advantage of not requiring preemption, nor the knowledge of job sizes.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 1","pages":"43-54"},"PeriodicalIF":5.6,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Edge Data Deduplication Under Uncertainties: A Robust Optimization Approach
IF 5.6, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2024-11-07. DOI: 10.1109/TPDS.2024.3493959
Ruikun Luo;Qiang He;Mengxi Xu;Feifei Chen;Song Wu;Jing Yang;Yuan Gao;Hai Jin
The emergence of mobile edge computing (MEC) in distributed systems has drawn increased attention to edge data management. The disparity between limited edge resources and continuously expanding requests for data storage makes the reduction of data storage costs a critical objective. Despite extensive studies of edge data deduplication as a data reduction technique, existing deduplication methods encounter numerous challenges in MEC environments. These challenges stem from the disparities between edge servers and those in cloud data centers, as well as from uncertainties such as user mobility, and they leave deduplication decision-making insufficiently robust. This paper therefore presents a robust-optimization-based approach to the edge data deduplication problem. Accounting for uncertainties in the number of data requirements and in edge server failures, we propose two solving algorithms: uEDDE-C, a two-stage algorithm based on column-and-constraint generation, and uEDDE-A, an approximation algorithm that addresses the high computation overhead of uEDDE-C. Our method enables efficient data deduplication in volatile edge network environments and maintains robustness across uncertain scenarios. We validate the performance and robustness of uEDDE-C and uEDDE-A through theoretical analysis and experimental evaluations. Extensive experimental results demonstrate that our approach significantly reduces data storage cost and data retrieval latency while ensuring reliability in real-world MEC environments.
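Column-and-constraint generation is easiest to see on a toy instance: a master problem picks the plan that is best against the scenarios generated so far, an adversarial step appends the worst-case scenario for that plan, and the loop stops once the adversary has nothing new to add. Everything below (plans, scenarios, costs) is invented; uEDDE-C's actual master and subproblems are optimization models rather than table lookups.

```python
# Toy column-and-constraint generation (CCG) loop for two-stage robustness.
plans = ["dedup_aggressive", "dedup_conservative"]
scenarios = ["low_demand", "high_demand", "server_failure"]
cost = {                                   # cost[plan][scenario], assumed numbers
    "dedup_aggressive":   {"low_demand": 1, "high_demand": 6, "server_failure": 9},
    "dedup_conservative": {"low_demand": 3, "high_demand": 4, "server_failure": 5},
}

active = {"low_demand"}                    # start from a single scenario
while True:
    # master: plan minimizing the worst cost over scenarios generated so far
    plan = min(plans, key=lambda p: max(cost[p][s] for s in active))
    # adversary: worst scenario for that plan over the full uncertainty set
    worst = max(scenarios, key=lambda s: cost[plan][s])
    if worst in active:                    # nothing new to add => converged
        break
    active.add(worst)

print("robust plan:", plan, "| scenarios generated:", sorted(active))
```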
{"title":"Edge Data Deduplication Under Uncertainties: A Robust Optimization Approach","authors":"Ruikun Luo;Qiang He;Mengxi Xu;Feifei Chen;Song Wu;Jing Yang;Yuan Gao;Hai Jin","doi":"10.1109/TPDS.2024.3493959","DOIUrl":"https://doi.org/10.1109/TPDS.2024.3493959","url":null,"abstract":"The emergence of \u0000<italic>mobile edge computing</i>\u0000 (MEC) in distributed systems has sparked increased attention toward edge data management. A conflict arises from the disparity between limited edge resources and the continuously expanding data requests for data storage, making the reduction of data storage costs a critical objective. Despite the extensive studies of edge data deduplication as a data reduction technique, existing deduplication methods encounter numerous challenges in MEC environments. These challenges stem from disparities between edge servers and cloud data center edge servers, as well as uncertainties such as user mobility, leading to insufficient robustness in deduplication decision-making. Consequently, this paper presents a robust optimization-based approach for the edge data deduplication problem. By accounting for uncertainties including the number of data requirements and edge server failures, we propose two distinct solving algorithms: uEDDE-C, a two-stage algorithm based on column-and-constraint generation, and uEDDE-A, an approximation algorithm to address the high computation overhead of uEDDE-C. Our method facilitates efficient data deduplication in volatile edge network environments and maintains robustness across various uncertain scenarios. We validate the performance and robustness of uEDDE-C and uEDDE-A through theoretical analysis and experimental evaluations. The extensive experimental results demonstrate that our approach significantly reduces data storage cost and data retrieval latency while ensuring reliability in real-world MEC environments.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 1","pages":"84-95"},"PeriodicalIF":5.6,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10747105","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142736348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0