
Latest Publications: IEEE Transactions on Parallel and Distributed Systems

Balanced Splitting: A Framework for Achieving Zero-Wait in the Multiserver-Job Model
IF 5.6 · CAS Tier 2 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-11-07 · DOI: 10.1109/TPDS.2024.3493631
Jonatha Anselmi;Josu Doncel
We present a new framework for designing nonpreemptive and job-size oblivious scheduling policies in the multiserver-job queueing model. The main requirement is to identify a static and balanced sub-partition of the server set and ensure that the servers in each set of that sub-partition can only handle jobs of a given class, in first-come first-served order. A job class is determined by the number of servers to which it has exclusive access during its entire execution and by the probability distribution of its service time. This approach aims to reduce delays by preventing small jobs from being blocked by larger ones that arrived first, and it is particularly beneficial when job-size variability is small within classes and large across classes. In this setting, we propose a new scheduling policy, Balanced-Splitting. In our main results, we provide a sufficient condition for the stability of Balanced-Splitting and show that the resulting queueing probability, i.e., the probability that an arriving job needs to wait for processing upon arrival, vanishes in both the subcritical (the load is kept fixed at a constant less than one) and critical (the load approaches one from below) many-server limiting regimes. Crucial to our analysis is a connection with the M/GI/$s$/$s$ queue and Erlang’s loss formula, which allows our analysis to rely on fundamental results from queueing theory. Numerical simulations show that the proposed policy performs better than several preemptive/nonpreemptive size-aware/oblivious policies in various practical scenarios. This is also confirmed by simulations running on real traces from High Performance Computing (HPC) workloads. The delays induced by Balanced-Splitting are also competitive with those induced by state-of-the-art policies such as First-Fit-SRPT and ServerFilling-SRPT, though our approach has the advantage of requiring neither preemption nor knowledge of job sizes.
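The analysis leans on Erlang's loss formula for the M/GI/$s$/$s$ queue. A minimal sketch of that classical formula (the standard recurrence from queueing theory, not code from the paper) shows why the blocking, and hence queueing, probability vanishes as the number of servers grows with the load held subcritical:

```python
def erlang_b(servers: int, offered_load: float) -> float:
    """Blocking probability B(s, a) of an M/GI/s/s loss system, computed
    with the standard recurrence B(0, a) = 1,
    B(k, a) = a * B(k-1, a) / (k + a * B(k-1, a))."""
    b = 1.0
    for k in range(1, servers + 1):
        b = offered_load * b / (k + offered_load * b)
    return b

# Scaling servers with the load held subcritical (load = a / s = 0.5 here)
# drives blocking toward zero, mirroring the vanishing queueing probability
# in the many-server limiting regime.
small_system = erlang_b(10, 5.0)
large_system = erlang_b(1000, 500.0)
assert large_system < small_system
```

The recurrence is numerically stable, unlike evaluating the factorial form of the formula directly.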
IEEE Transactions on Parallel and Distributed Systems, vol. 36, no. 1, pp. 43–54.
Citations: 0
Ripple: Enabling Decentralized Data Deduplication at the Edge
IF 5.6 · CAS Tier 2 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-11-07 · DOI: 10.1109/TPDS.2024.3493953
Ruikun Luo;Qiang He;Feifei Chen;Song Wu;Hai Jin;Yun Yang
With its advantages in ensuring low data retrieval latency and reducing backhaul network traffic, edge computing is becoming a backbone solution for many latency-sensitive applications. An increasingly large amount of data is being generated at the edge, stretching the limited capacity of edge storage systems. Improving resource utilization for edge storage systems has become a significant challenge in recent years. Existing solutions attempt to achieve this goal through data placement optimization, data partitioning, data sharing, etc. These approaches overlook the data redundancy in edge storage systems, which produces substantial storage resource wastage. This motivates the need for an approach to data deduplication at the edge. However, existing data deduplication methods rely on centralized control, which is not always feasible in practical edge computing environments. This article presents Ripple, the first approach that enables edge servers to deduplicate their data in a decentralized manner. At its core, it builds a data index for each edge server, enabling them to deduplicate data without central control. With Ripple, edge servers can 1) identify data duplicates; 2) remove redundant data without violating data retrieval latency constraints; and 3) ensure data availability after deduplication. The results of trace-driven experiments conducted in a testbed system demonstrate the usefulness of Ripple in practice. Compared with the state-of-the-art approach, Ripple improves the deduplication ratio by up to 16.79% and reduces data retrieval latency by an average of 60.42%.
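The core building block, a per-server data index consulted before storing a chunk, can be sketched as follows (hypothetical class and method names; Ripple's actual protocol additionally coordinates indices across servers without central control):

```python
import hashlib

class EdgeServer:
    """Per-server fingerprint index sketch (hypothetical API, not Ripple's):
    chunks are addressed by their SHA-256 digest, so a duplicate chunk is
    detected locally and stored only once."""

    def __init__(self, name: str):
        self.name = name
        self.index = {}   # fingerprint -> chunk bytes

    def put(self, chunk: bytes) -> str:
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in self.index:      # duplicate: keep only the first copy
            self.index[fp] = chunk
        return fp

    def get(self, fp: str) -> bytes:
        return self.index[fp]

server = EdgeServer("edge-1")
f1 = server.put(b"sensor frame 001")
f2 = server.put(b"sensor frame 001")   # same content, same fingerprint
assert f1 == f2 and len(server.index) == 1
```

Content addressing makes duplicate detection a constant-time dictionary lookup, which is what keeps deduplication compatible with tight retrieval-latency constraints.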
IEEE Transactions on Parallel and Distributed Systems, vol. 36, no. 1, pp. 55–66. Open Access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10747114
Citations: 0
EdgeHydra: Fault-Tolerant Edge Data Distribution Based on Erasure Coding
IF 5.6 · CAS Tier 2 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-11-07 · DOI: 10.1109/TPDS.2024.3493034
Qiang He;Guobiao Zhang;Jiawei Wang;Ruikun Luo;Xiaohai Dai;Yuchong Hu;Feifei Chen;Hai Jin;Yun Yang
In the edge computing environment, app vendors can distribute popular data from the cloud to edge servers to provide low-latency data retrieval. A key problem is how to distribute these data from the cloud to edge servers cost-effectively. Under current schemes, a file is divided into multiple data blocks for parallel transmission from the cloud to target edge servers. Edge servers can then combine received data blocks to reconstruct the file. While this method expedites the data distribution process, it presents potential drawbacks. It is sensitive to transmission delays and transmission failures caused by runtime exceptions such as network fluctuations and server failures. This paper presents EdgeHydra, the first edge data distribution scheme that tackles this challenge through fault tolerance based on erasure coding. Under EdgeHydra, a file is encoded into data blocks and parity blocks for parallel transmission from the cloud to target edge servers. An edge server can reconstruct the file upon the receipt of a sufficient number of these blocks without having to wait for all the blocks in transmission. It also innovatively employs a leaderless block supplement mechanism to ensure the receipt of sufficient blocks for individual target edge servers. These improve the robustness of the data distribution process significantly. Extensive experiments show that EdgeHydra can tolerate delays and failures in individual transmission links effectively, outperforming the state-of-the-art scheme by up to 50.54% in distribution time.
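The fault-tolerance idea, that any sufficiently large subset of blocks reconstructs the file, can be illustrated with the simplest erasure code: one XOR parity block over k data blocks, so any k of the k+1 blocks suffice. This is a minimal stand-in for the general (k, m) codes a scheme like EdgeHydra would use, not the paper's actual encoding:

```python
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data_blocks):
    """Append one parity block = XOR of all data blocks.
    Any k of the resulting k+1 blocks can rebuild the file."""
    return data_blocks + [reduce(xor, data_blocks)]

# One in-flight block (index 1) is lost or late; the receiver need not wait:
blocks = [b"aaaa", b"bbbb", b"cccc"]
coded = encode(blocks)                       # 3 data blocks + 1 parity
survivors = [blk for i, blk in enumerate(coded) if i != 1]
recovered = reduce(xor, survivors)           # XOR cancels the known blocks
assert recovered == blocks[1]
```

With m parity blocks instead of one (e.g. Reed-Solomon), the same principle tolerates m slow or failed transmission links rather than a single one.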
IEEE Transactions on Parallel and Distributed Systems, vol. 36, no. 1, pp. 29–42. Open Access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10746622
Citations: 0
Real Relative Encoding Genetic Algorithm for Workflow Scheduling in Heterogeneous Distributed Computing Systems
IF 5.6 · CAS Tier 2 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-11-06 · DOI: 10.1109/TPDS.2024.3492210
Junqiang Jiang;Zhifang Sun;Ruiqi Lu;Li Pan;Zebo Peng
This paper introduces a novel Real Relative encoding Genetic Algorithm (R$^{2}$GA) to tackle the workflow scheduling problem in heterogeneous distributed computing systems (HDCS). R$^{2}$GA employs a unique encoding mechanism, using real numbers to represent the relative positions of tasks in the schedulable task set. Decoding is performed by interpreting these real numbers in relation to the directed acyclic graph (DAG) of the workflow. This approach ensures that any sequence of randomly generated real numbers, produced by cross-over and mutation operations, can always be decoded into a valid solution, as the precedence constraints between tasks are explicitly defined by the DAG. The proposed encoding and decoding mechanism simplifies genetic operations and facilitates efficient exploration of the solution space. This inherent flexibility also allows R$^{2}$GA to be easily adapted to various optimization scenarios in workflow scheduling within HDCS. Additionally, R$^{2}$GA overcomes several issues associated with traditional genetic algorithms (GAs) and existing real-number encoding GAs, such as the generation of chromosomes that violate task precedence constraints and the strict limitations on gene value ranges. Experimental results show that R$^{2}$GA consistently delivers superior performance in terms of solution quality and efficiency compared to existing techniques.
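The decoding mechanism can be sketched as follows (an illustrative implementation of the relative-position idea, not the authors' code): each real gene selects a task from the currently schedulable set, so a chromosome produced by any crossover or mutation always decodes to a precedence-valid order.

```python
def decode(chromosome, preds):
    """Decode a vector of reals in [0, 1) into a precedence-valid task order.
    Gene i picks a task from the current schedulable ("ready") set by its
    relative position, so arbitrary real vectors always yield feasible
    schedules under the DAG's precedence constraints."""
    n = len(chromosome)
    done, order = set(), []
    for gene in chromosome:
        # Tasks whose predecessors have all completed are schedulable.
        ready = sorted(t for t in range(n)
                       if t not in done and preds[t] <= done)
        pick = ready[min(int(gene * len(ready)), len(ready) - 1)]
        order.append(pick)
        done.add(pick)
    return order

# Toy DAG: tasks 0 and 1 must both finish before task 2 can start.
preds = {0: set(), 1: set(), 2: {0, 1}}
assert decode([0.9, 0.0, 0.0], preds) == [1, 0, 2]
```

Because feasibility is enforced at decode time, crossover and mutation can operate on plain real vectors with no repair step, which is the flexibility the abstract highlights.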
IEEE Transactions on Parallel and Distributed Systems, vol. 36, no. 1, pp. 1–14.
Citations: 0
DyLaClass: Dynamic Labeling Based Classification for Optimal Sparse Matrix Format Selection in Accelerating SpMV
IF 5.6 · CAS Tier 2 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-10-29 · DOI: 10.1109/TPDS.2024.3488053
Zheng Shi;Yi Zou;Xianfeng Song;Shupeng Li;Fangming Liu;Quan Xue
Sparse matrix-vector multiplication (SpMV) is crucial in many scientific and engineering applications, and the effectiveness of sparse matrix storage formats varies across architectures: no single format excels on all hardware. Previous research has focused on trying different algorithms to build predictors for the best format, yet it has overlooked how to handle the best format changing within the same hardware environment, and how to reduce prediction overhead rather than merely the overhead of building predictors. This paper proposes DyLaClass, a novel classification algorithm for optimizing sparse matrix storage formats, based on dynamic labeling and flexible feature selection. In particular, we introduce mixed labels and features with strong correlations, allowing us to achieve ultra-high prediction accuracy with minimal feature inputs, significantly reducing feature extraction overhead. For the first time, we propose the concept of the most suitable storage format rather than the best storage format, which can stably predict changes in the best format for the same matrix across multiple SpMV executions. We further demonstrate the proposed method on the University of Florida’s public sparse matrix collection dataset. Experimental results show that, compared to existing work, our method achieves up to 91% classification accuracy. Using two different hardware platforms for verification, the proposed method outperforms existing methods by 1.26 to 1.43 times. Most importantly, the stability of the proposed prediction model is 25.5% higher than that of previous methods, greatly increasing the feasibility of the model in practical field applications.
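The feature-based selection idea can be sketched with a toy example (hypothetical features and decision rule, standing in for DyLaClass's trained classifier): cheap statistics over row lengths are often enough to separate formats that prefer uniform rows from those that tolerate skew.

```python
def row_features(rows):
    """One-pass features over row lengths: mean and variance of the
    number of nonzeros per row (a hypothetical subset of the strongly
    correlated features the paper selects)."""
    lengths = [len(r) for r in rows]
    n = len(lengths)
    mean = sum(lengths) / n
    var = sum((l - mean) ** 2 for l in lengths) / n
    return mean, var

def pick_format(rows):
    # Toy stand-in for the trained classifier: near-uniform row lengths
    # favour ELL's padded layout, skewed lengths favour CSR.
    mean, var = row_features(rows)
    return "ELL" if var <= mean else "CSR"

assert pick_format([[1, 2], [3, 4], [5, 6]]) == "ELL"       # uniform rows
assert pick_format([[1], [2], list(range(10))]) == "CSR"     # one long row
```

Because the features come from a single pass over row lengths, prediction overhead stays far below the cost of a mispredicted SpMV format, which is the trade-off the paper optimizes.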
IEEE Transactions on Parallel and Distributed Systems, vol. 35, no. 12, pp. 2624–2639.
Citations: 0
PeakFS: An Ultra-High Performance Parallel File System via Computing-Network-Storage Co-Optimization for HPC Applications
IF 5.6 · CAS Tier 2 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-10-24 · DOI: 10.1109/TPDS.2024.3485754
Yixiao Chen;Haomai Yang;Kai Lu;Wenlve Huang;Jibin Wang;Jiguang Wan;Jian Zhou;Fei Wu;Changsheng Xie
Emerging high-performance computing (HPC) applications with diverse workload characteristics impose greater demands on parallel file systems (PFSs). PFSs also require more efficient software designs to fully utilize the performance of modern hardware, such as multi-core CPUs, Remote Direct Memory Access (RDMA), and NVMe SSDs. However, existing PFSs show severe limitations under these requirements due to limited multi-core scalability, unawareness of HPC workloads, and disjointed network-storage optimizations. In this article, we present PeakFS, an ultra-high performance parallel file system via computing-network-storage co-optimization for HPC applications. PeakFS designs a shared-nothing scheduling system based on link-reduced task dispatching with lock-free queues to reduce concurrency overhead. In addition, PeakFS improves I/O performance with flexible distribution strategies, memory-efficient indexing, and metadata caching tailored to HPC I/O characteristics. Finally, PeakFS shortens the critical path of request processing through network-storage co-optimizations. Experimental results show that the metadata and data performance of PeakFS reaches more than 90% of the hardware limits. For metadata throughput, PeakFS achieves a 3.5–19× improvement over GekkoFS and outperforms BeeGFS by three orders of magnitude.
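The shared-nothing dispatching idea can be sketched as follows (hypothetical API; Python's `SimpleQueue` stands in for the lock-free queues PeakFS uses): hashing each path to a fixed worker queue means every path's requests are handled by exactly one worker, so no locks are shared across cores.

```python
import zlib
from queue import SimpleQueue  # stand-in for a lock-free MPSC queue

class Dispatcher:
    """Shared-nothing dispatch sketch (hypothetical names, not PeakFS's
    real interface): each path hashes deterministically to one worker
    queue, serializing a path's requests without cross-worker locking."""

    def __init__(self, n_workers: int):
        self.queues = [SimpleQueue() for _ in range(n_workers)]

    def dispatch(self, path: str, request) -> int:
        # CRC32 gives a stable shard choice for the same path.
        shard = zlib.crc32(path.encode()) % len(self.queues)
        self.queues[shard].put(request)
        return shard

d = Dispatcher(4)
s1 = d.dispatch("/scratch/job1/out.dat", "write")
s2 = d.dispatch("/scratch/job1/out.dat", "read")
assert s1 == s2   # same path always lands on the same worker
```

Pinning each shard's worker to a core then keeps request state core-local, which is where the concurrency-overhead reduction comes from.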
IEEE Transactions on Parallel and Distributed Systems, vol. 35, no. 12, pp. 2578–2595.
Citations: 0
Algorithms for Data Sharing-Aware Task Allocation in Edge Computing Systems
IF 5.6 · CAS Tier 2 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-10-24 · DOI: 10.1109/TPDS.2024.3486184
Sanaz Rabinia;Niloofar Didar;Marco Brocanelli;Daniel Grosu
Edge computing has been developed as a low-latency, data-driven computation paradigm close to the end user that maximizes profit and/or minimizes energy consumption. Edge computing allows each user’s task to analyze locally-acquired sensor data at the edge to reduce resource congestion and improve the efficiency of data processing. To reduce application latency and the data transferred to edge servers, it is essential to consider data sharing for user tasks that operate on the same data items. In this article, we formulate the data sharing-aware allocation problem, which has as objectives the maximization of profit and the minimization of network traffic, by considering the data-sharing characteristics of tasks on servers. Because the problem is NP-hard, we design the DSTA algorithm to find a feasible solution in polynomial time. We investigate the approximation guarantees of DSTA by determining the approximation ratios with respect to the total profit and the amount of total data traffic in the edge network. We also design a variant of DSTA, called DSTAR, that uses a smart rearrangement of tasks to allocate some of the unallocated tasks for increased total profit. We perform extensive experiments to investigate the performance of DSTA and DSTAR, and compare them with a representative greedy baseline that only maximizes profit. Our experimental analysis shows that, compared to the baseline, DSTA reduces the total data traffic in the edge network by up to 20% across 45 case study instances at a small profit loss. In addition, DSTAR increases the total profit by up to 27% and the number of allocated tasks by 25% compared to DSTA, all while limiting the increase of total data traffic in the network.
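The co-location intuition behind data sharing-aware allocation can be illustrated with a small greedy sketch (my own simplification, not the paper's DSTA algorithm; the task and server fields are hypothetical): a data item already cached on a server costs no additional traffic, so placing tasks that share items on the same server shrinks total transfer.

```python
def allocate(tasks, capacities):
    """Greedy sketch in the spirit of data sharing-aware allocation:
    place tasks profit-first, preferring the server that already holds
    the most of a task's data items, so shared items cross the network
    only once. Returns (total profit, total data traffic in items)."""
    servers = [{"free": c, "data": set()} for c in capacities]
    total_profit = total_traffic = 0
    for task in sorted(tasks, key=lambda t: -t["profit"]):
        fitting = [s for s in servers if s["free"] >= task["cpu"]]
        if not fitting:
            continue  # task stays unallocated
        best = max(fitting, key=lambda s: len(s["data"] & task["data"]))
        new_items = task["data"] - best["data"]   # only these are transferred
        best["data"] |= new_items
        best["free"] -= task["cpu"]
        total_profit += task["profit"]
        total_traffic += len(new_items)
    return total_profit, total_traffic

# Two tasks share item "d1": co-location transfers it once, not twice.
tasks = [{"cpu": 1, "profit": 5, "data": {"d1", "d2"}},
         {"cpu": 1, "profit": 4, "data": {"d1"}}]
assert allocate(tasks, [2]) == (9, 2)
```

A profit-only baseline would ignore the `data` overlap when choosing servers, which is exactly the traffic gap the paper's experiments measure.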
"Algorithms for Data Sharing-Aware Task Allocation in Edge Computing Systems" — Sanaz Rabinia; Niloofar Didar; Marco Brocanelli; Daniel Grosu. IEEE Transactions on Parallel and Distributed Systems, vol. 36, no. 1, pp. 15–28. Pub Date: 2024-10-24. DOI: 10.1109/TPDS.2024.3486184
Citations: 0
Design and Performance Evaluation of Linearly Extensible Cube-Triangle Network for Multicore Systems
IF 5.6, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2024-10-24. DOI: 10.1109/TPDS.2024.3486219
Savita Gautam;Abdus Samad;Mohammad S. Umar
High-performance interconnection networks are currently used to design massively parallel computers. Selecting the set of nodes on which parallel tasks execute plays a vital role in the performance of such systems. When deployed to run large parallel applications, these networks suffer from communication latencies that ultimately affect system throughput. Mesh and torus are primary examples of topologies used in such systems; however, they are being replaced with more efficient, albeit more complicated, hybrid topologies such as the ZMesh and x-Folded TM networks. This paper presents a new topology, named Linearly Extensible Cube-Triangle (LECΔ), which focuses on low latency, a smaller average distance, and improved throughput. It is symmetrical in nature and exhibits the desirable properties of similar networks at lower complexity and cost. For an N × N network, the LECΔ topology has lower network latency than the Mesh, ZMesh, Torus, and x-Folded networks. The proposed LECΔ network yields a reduced average distance, diameter, and cost, and it has a high bisection width and good scalability. The simulation results show that the performance of the LECΔ network is comparable to that of the Mesh, ZMesh, Torus, and x-Folded networks. The results verify the efficiency of the LECΔ network as evaluated against and compared with similar networks.
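The metrics the paper compares — diameter and average inter-node distance — can be computed for small instances with plain BFS. The sketch below builds an n×n mesh and torus (two of the baseline topologies) rather than LECΔ, whose construction is not given in the abstract; all function names are illustrative:

```python
from collections import deque
from itertools import product

def all_pairs_bfs(adj):
    """Unweighted shortest-path lengths from every node, via plain BFS."""
    dist = {}
    for src in adj:
        d = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    q.append(v)
        dist[src] = d
    return dist

def metrics(adj):
    """Diameter and average inter-node distance of a connected graph."""
    dist = all_pairs_bfs(adj)
    pair_dists = [d for src, dm in dist.items() for v, d in dm.items() if v != src]
    return max(pair_dists), sum(pair_dists) / len(pair_dists)

def grid(n, wrap):
    """n x n mesh (wrap=False) or torus (wrap=True); n >= 3 to avoid double edges."""
    adj = {(i, j): [] for i, j in product(range(n), repeat=2)}
    for i, j in adj:
        for di, dj in ((1, 0), (0, 1)):  # add each edge once, toward +x / +y
            ni, nj = i + di, j + dj
            if wrap:
                ni, nj = ni % n, nj % n
            elif ni >= n or nj >= n:
                continue
            adj[(i, j)].append((ni, nj))
            adj[(ni, nj)].append((i, j))
    return adj
```

For a 4 × 4 network, this reproduces the textbook figures: mesh diameter 6 versus torus diameter 4, with the torus also winning on average distance — the same kind of comparison the paper carries out against LECΔ.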
IEEE Transactions on Parallel and Distributed Systems, vol. 35, no. 12, pp. 2596–2607.
Citations: 0
Breaking the Memory Wall for Heterogeneous Federated Learning via Model Splitting
IF 5.6, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2024-10-14. DOI: 10.1109/TPDS.2024.3480115
Chunlin Tian;Li Li;Kahou Tam;Yebo Wu;Cheng-Zhong Xu
Federated Learning (FL) enables multiple devices to collaboratively train a shared model while preserving data privacy. Ever-increasing model complexity coupled with limited memory resources on the participating devices severely bottlenecks the deployment of FL in real-world scenarios. Thus, a framework that can effectively break the memory wall while jointly taking into account the hardware and statistical heterogeneity in FL is urgently required. In this article, we propose SmartSplit, a framework that effectively reduces the memory footprint on the device side while guaranteeing the training progress and model accuracy for heterogeneous FL through model splitting. Towards this end, SmartSplit employs a hierarchical structure to adaptively guide the overall training process. In each training round, the central manager, hosted on the server, dynamically selects the participating devices and sets the cutting layer by jointly considering the memory budget, training capacity, and data distribution of each device. The MEC manager, deployed within the edge server, proceeds to split the local model and perform training of the server-side portion. Meanwhile, it fine-tunes the splitting points based on the time-evolving statistical importance. The on-device manager, embedded inside each mobile device, continuously monitors the local training status while employing cost-aware checkpointing to match the runtime dynamic memory budget. Extensive experiments on representative datasets are conducted on commercial off-the-shelf mobile device testbeds. The experimental results show that SmartSplit excels in FL training on highly memory-constrained mobile SoCs, offering up to a 94% peak latency reduction and 100-fold memory savings. It enhances accuracy performance by 1.49%–57.18% and adaptively adjusts to dynamic memory budgets through cost-aware recomputation.
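The cut-layer selection can be illustrated with a minimal memory-budget heuristic: keep the deepest prefix of layers that still fits on the device and offload the rest to the server. This is a hypothetical sketch, not SmartSplit's actual policy, which additionally weighs training capacity, data distribution, and time-evolving statistical importance; `choose_cut_layer` and its inputs are assumptions made for the example:

```python
def choose_cut_layer(layer_mem, device_budget):
    """Pick the deepest cut so the device-side prefix fits the memory budget.

    layer_mem:     per-layer memory cost of the device-side computation,
                   in order from the input layer onward
    device_budget: memory available on the device
    Returns k: layers [0, k) run on the device, layers [k, L) on the server.
    """
    used, cut = 0, 0
    for k, m in enumerate(layer_mem, start=1):
        if used + m > device_budget:
            break  # this layer no longer fits on the device
        used += m
        cut = k
    return cut
```

A memory-rich device keeps the whole model (cut = number of layers), while a severely constrained one offloads everything (cut = 0) — the per-device adaptivity that the hierarchical managers coordinate each round.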
IEEE Transactions on Parallel and Distributed Systems, vol. 35, no. 12, pp. 2513–2526.
Citations: 0
Mitosis: A Scalable Sharding System Featuring Multiple Dynamic Relay Chains
IF 5.6, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2024-10-14. DOI: 10.1109/TPDS.2024.3480223
Keyuan Wang;Linpeng Jia;Zhaoxiong Song;Yi Sun
Sharding is a prevalent approach for addressing performance issues in blockchain. To reduce governance complexities and ensure system security, a common practice involves a relay chain to coordinate cross-shard transactions. However, with a growing number of shards and cross-shard transactions, the single relay chain is usually the first to suffer from a performance bottleneck and shows poor scalability, making the relay chain's scalability vital for sharding systems. To solve this, we propose Mitosis, the first multi-relay architecture to improve the relay chain's scalability by sharding the relay chain itself. Our proposed relay sharding algorithm dynamically adjusts the number of relays or optimizes the topology between relays and shards to adaptively scale up the relay chain's performance. Furthermore, to guarantee the security of the multi-relay architecture, a new validator reconfiguration scheme is designed, accompanied by a comprehensive security analysis of Mitosis. Through simulation experiments on two mainstream relay chain paradigms, we demonstrate that Mitosis achieves high scalability and outperforms state-of-the-art baselines in terms of relay workload, relay chain throughput, and transaction latency.
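Dynamic relay scaling of the kind described can be sketched as capacity-driven resizing plus a balanced shard-to-relay mapping. Both helpers below are hypothetical illustrations under assumed inputs (a cross-shard transaction rate and a per-relay capacity), not Mitosis's actual relay sharding algorithm:

```python
import math

def scale_relays(cross_shard_tps, relay_capacity, min_relays=1):
    """Smallest number of relay chains that keeps each below its capacity
    (hypothetical sketch of dynamic relay scaling)."""
    return max(min_relays, math.ceil(cross_shard_tps / relay_capacity))

def assign_shards(num_shards, num_relays):
    """Round-robin mapping of shards to relay chains to balance load
    (a stand-in for Mitosis's topology optimization between relays and shards)."""
    return {shard: shard % num_relays for shard in range(num_shards)}
```

As the cross-shard rate grows past what one relay can coordinate, the relay count grows with it, which is the scalability property the single-relay designs lack.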
IEEE Transactions on Parallel and Distributed Systems, vol. 35, no. 12, pp. 2497–2512.
Citations: 0