Massively parallel CMA-ES with increasing population
David Redon (CRIStAL, BONUS), Pierre Fortin (CRIStAL, BONUS), Bilel Derbel (CRIStAL, BONUS), Miwako Tsuji (RIKEN CCS), Mitsuhisa Sato (RIKEN CCS)
arXiv:2409.11765, 18 September 2024
The Increasing Population Covariance Matrix Adaptation Evolution Strategy (IPOP-CMA-ES) algorithm is a reference stochastic optimizer for blackbox optimization, where no prior knowledge about the underlying problem structure is available. This paper aims to accelerate IPOP-CMA-ES through high-performance computing and parallelism when solving large optimization problems. We first show how BLAS and LAPACK routines can be introduced in the linear algebra operations, and we then propose two strategies for deploying IPOP-CMA-ES efficiently on large-scale parallel architectures with thousands of CPU cores. The first parallel strategy processes the multiple searches in the same order as the sequential IPOP-CMA-ES, while the second processes these searches concurrently. Both strategies are implemented in MPI+OpenMP and compared on 6144 cores of the Fugaku supercomputer. We obtain substantial speedups (up to several thousand), some of them super-linear, and we provide an in-depth analysis of our results to explain precisely the superior performance of the second strategy.
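The per-generation linear algebra in CMA-ES, the eigendecomposition of the covariance matrix C and the sampling x_k = m + sigma * B D z_k, is exactly the kind of work that can be handed to BLAS/LAPACK. Below is a minimal NumPy sketch of that sampling step (NumPy dispatches `eigh` and the matrix products to LAPACK/BLAS); the function and parameter names are illustrative and this is not the paper's implementation.

```python
import numpy as np

def sample_population(mean, sigma, C, lam, rng=None):
    """Sample lam candidate solutions from N(mean, sigma^2 * C).

    The eigendecomposition (LAPACK, via numpy.linalg.eigh) and the matrix
    products (BLAS) dominate the per-generation linear algebra cost, which
    is where optimized BLAS/LAPACK routines pay off as dimension grows.
    """
    rng = rng or np.random.default_rng()
    eigvals, B = np.linalg.eigh(C)            # C = B diag(eigvals) B^T
    D = np.sqrt(np.maximum(eigvals, 0.0))     # guard against tiny negative eigenvalues
    Z = rng.standard_normal((lam, len(mean)))
    Y = (Z * D) @ B.T                         # rows y_k ~ N(0, C)
    return mean + sigma * Y                   # rows x_k ~ N(mean, sigma^2 C)

# Toy usage: a population of 16 samples in a 10-dimensional search space
X = sample_population(mean=np.zeros(10), sigma=0.3, C=np.eye(10), lam=16)
print(X.shape)  # (16, 10)
```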
{"title":"Massively parallel CMA-ES with increasing population","authors":"David RedonCRIStAL, BONUS, Pierre FortinCRIStAL, BONUS, Bilel DerbelCRIStAL, BONUS, Miwako TsujiRIKEN CCS, Mitsuhisa SatoRIKEN CCS","doi":"arxiv-2409.11765","DOIUrl":"https://doi.org/arxiv-2409.11765","url":null,"abstract":"The Increasing Population Covariance Matrix Adaptation Evolution Strategy\u0000(IPOP-CMA-ES) algorithm is a reference stochastic optimizer dedicated to\u0000blackbox optimization, where no prior knowledge about the underlying problem\u0000structure is available. This paper aims at accelerating IPOP-CMA-ES thanks to\u0000high performance computing and parallelism when solving large optimization\u0000problems. We first show how BLAS and LAPACK routines can be introduced in\u0000linear algebra operations, and we then propose two strategies for deploying\u0000IPOP-CMA-ES efficiently on large-scale parallel architectures with thousands of\u0000CPU cores. The first parallel strategy processes the multiple searches in the\u0000same ordering as the sequential IPOP-CMA-ES, while the second one processes\u0000concurrently these multiple searches. These strategies are implemented in\u0000MPI+OpenMP and compared on 6144 cores of the supercomputer Fugaku. We manage to\u0000obtain substantial speedups (up to several thousand) and even super-linear\u0000ones, and we provide an in-depth analysis of our results to understand\u0000precisely the superior performance of our second strategy.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"189 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Communication Lower Bounds and Optimal Algorithms for Symmetric Matrix Computations
Hussam Al Daas (STFC, Scientific Computing Department, Rutherford Appleton Laboratory, Didcot, UK), Grey Ballard (Wake Forest University, Computer Science Department, Winston-Salem, NC, USA), Laura Grigori (EPFL, Institute of Mathematics, Lausanne, Switzerland and PSI, Center for Scientific Computing, Theory and Data, Villigen, Switzerland), Suraj Kumar (Institut national de recherche en sciences et technologies du numérique, Lyon, France), Kathryn Rouse (Inmar Intelligence, Winston-Salem, NC, USA), Mathieu Verite (EPFL, Institute of Mathematics, Lausanne, Switzerland)
arXiv:2409.11304, 17 September 2024
In this article, we focus on the communication costs of three symmetric matrix computations: i) multiplying a matrix by its transpose, known as a symmetric rank-k update (SYRK); ii) adding the product of a matrix with the transpose of another matrix to the transpose of that product, known as a symmetric rank-2k update (SYR2K); and iii) matrix multiplication with a symmetric input matrix (SYMM). All three computations appear in the Level 3 Basic Linear Algebra Subprograms (BLAS) and are widely used in applications involving symmetric matrices. We establish communication lower bounds for these kernels in both sequential and distributed-memory parallel computational models, and we show that our bounds are tight by presenting communication-optimal algorithms for each setting. Our lower bound proofs rely on applying a geometric inequality for symmetric computations and analytically solving constrained nonlinear optimization problems. In the optimal algorithms, the symmetric matrix and its corresponding computations are accessed and performed according to a triangular block partitioning scheme.
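For reference, the three kernels compute the following. This is a hedged NumPy sketch using plain matrix products only to pin down the definitions; an actual BLAS implementation (e.g., dsyrk, dsyr2k, dsymm) exploits symmetry and only computes and communicates one triangle of the output.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 4
A = rng.standard_normal((n, k))
B = rng.standard_normal((n, k))
S = rng.standard_normal((n, n)); S = S + S.T   # symmetric input for SYMM
X = rng.standard_normal((n, n))

syrk  = A @ A.T              # symmetric rank-k update:  C = A * A^T
syr2k = A @ B.T + B @ A.T    # symmetric rank-2k update: C = A*B^T + B*A^T
symm  = S @ X                # multiplication by a symmetric matrix

# Both update kernels produce symmetric results, so a BLAS routine only
# needs to compute (and communicate) one triangle of C.
assert np.allclose(syrk, syrk.T) and np.allclose(syr2k, syr2k.T)
```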
{"title":"Communication Lower Bounds and Optimal Algorithms for Symmetric Matrix Computations","authors":"Hussam Al DaasSTFC, Scientific Computing Department, Rutherford Appleton Laboratory, Didcot, UK, Grey BallardWake Forest University, Computer Science Department, Winston-Salem, NC, USA, Laura GrigoriEPFL, Institute of Mathematics, Lausanne, Switzerland and PSI, Center for Scientific Computing, Theory and Data, Villigen, Switzerland, Suraj KumarInstitut national de recherche en sciences et technologies du numérique, Lyon, France, Kathryn RouseInmar Intelligence, Winston-Salem, NC, USA, Mathieu VeriteEPFL, Institute of Mathematics, Lausanne, Switzerland","doi":"arxiv-2409.11304","DOIUrl":"https://doi.org/arxiv-2409.11304","url":null,"abstract":"In this article, we focus on the communication costs of three symmetric\u0000matrix computations: i) multiplying a matrix with its transpose, known as a\u0000symmetric rank-k update (SYRK) ii) adding the result of the multiplication of a\u0000matrix with the transpose of another matrix and the transpose of that result,\u0000known as a symmetric rank-2k update (SYR2K) iii) performing matrix\u0000multiplication with a symmetric input matrix (SYMM). All three computations\u0000appear in the Level 3 Basic Linear Algebra Subroutines (BLAS) and have wide use\u0000in applications involving symmetric matrices. We establish communication lower\u0000bounds for these kernels using sequential and distributed-memory parallel\u0000computational models, and we show that our bounds are tight by presenting\u0000communication-optimal algorithms for each setting. Our lower bound proofs rely\u0000on applying a geometric inequality for symmetric computations and analytically\u0000solving constrained nonlinear optimization problems. The symmetric matrix and\u0000its corresponding computations are accessed and performed according to a\u0000triangular block partitioning scheme in the optimal algorithms.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"7 5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ladon: High-Performance Multi-BFT Consensus via Dynamic Global Ordering (Extended Version)
Hanzheng Lyu, Shaokang Xie, Jianyu Niu, Chen Feng, Yinqian Zhang, Ivan Beschastnikh
arXiv:2409.10954, 17 September 2024
Multi-BFT consensus runs multiple leader-based consensus instances in parallel, circumventing the leader bottleneck of a single instance. However, it contains an Achilles' heel: the need to globally order output blocks across instances. Deriving this global ordering is challenging because it must cope with the different rates at which instances produce blocks. Prior Multi-BFT designs assign each block a global index before creation, leading to poor performance. We propose Ladon, a high-performance Multi-BFT protocol that allows varying instance block rates. Our key idea is to order blocks across instances dynamically, which eliminates blocking on slow instances. We achieve dynamic global ordering by assigning monotonic ranks to blocks. We pipeline rank coordination with the consensus process to reduce protocol overhead and combine aggregate signatures with rank information to reduce message complexity. Ladon's dynamic ordering lets blocks be globally ordered according to their generation order, which respects inter-block causality. We implemented and evaluated Ladon by integrating it with both the PBFT and HotStuff protocols. Our evaluation shows that Ladon-PBFT (resp., Ladon-HotStuff) improves the peak throughput of the prior art by $\approx$8x (resp., 2x) and reduces latency by $\approx$62% (resp., 23%) when deployed with one straggling replica (out of 128 replicas) in a WAN setting.
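To make the ordering idea concrete, here is a toy Python sketch of ordering blocks from several instances by a monotonic rank instead of a pre-assigned global index. The rank-assignment rule and the field names are illustrative assumptions, not Ladon's actual rank-coordination protocol.

```python
from dataclasses import dataclass, field
import heapq

@dataclass(order=True)
class Block:
    rank: int                     # monotonic rank agreed during consensus (assumed)
    instance: int                 # id of the consensus instance that produced it
    payload: str = field(compare=False)

def global_order(per_instance_logs):
    """Merge per-instance block logs into one global ledger order.

    Blocks are ordered by (rank, instance), so a slow instance only delays
    its own blocks instead of stalling the whole global sequence, which is
    the intuition behind dynamic ordering."""
    return list(heapq.merge(*per_instance_logs))

fast = [Block(1, 0, "f0"), Block(2, 0, "f1"), Block(3, 0, "f2")]
slow = [Block(2, 1, "s0"), Block(5, 1, "s1")]
print([b.payload for b in global_order([fast, slow])])
# ['f0', 'f1', 's0', 'f2', 's1']
```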
{"title":"Ladon: High-Performance Multi-BFT Consensus via Dynamic Global Ordering (Extended Version)","authors":"Hanzheng Lyu, Shaokang Xie, Jianyu Niu, Chen Feng, Yinqian Zhang, Ivan Beschastnikh","doi":"arxiv-2409.10954","DOIUrl":"https://doi.org/arxiv-2409.10954","url":null,"abstract":"Multi-BFT consensus runs multiple leader-based consensus instances in\u0000parallel, circumventing the leader bottleneck of a single instance. However, it\u0000contains an Achilles' heel: the need to globally order output blocks across\u0000instances. Deriving this global ordering is challenging because it must cope\u0000with different rates at which blocks are produced by instances. Prior Multi-BFT\u0000designs assign each block a global index before creation, leading to poor\u0000performance. We propose Ladon, a high-performance Multi-BFT protocol that allows varying\u0000instance block rates. Our key idea is to order blocks across instances\u0000dynamically, which eliminates blocking on slow instances. We achieve dynamic\u0000global ordering by assigning monotonic ranks to blocks. We pipeline rank\u0000coordination with the consensus process to reduce protocol overhead and combine\u0000aggregate signatures with rank information to reduce message complexity.\u0000Ladon's dynamic ordering enables blocks to be globally ordered according to\u0000their generation, which respects inter-block causality. We implemented and\u0000evaluated Ladon by integrating it with both PBFT and HotStuff protocols. Our\u0000evaluation shows that Ladon-PBFT (resp., Ladon-HotStuff) improves the peak\u0000throughput of the prior art by $approx$8x (resp., 2x) and reduces latency by\u0000$approx$62% (resp., 23%), when deployed with one straggling replica (out of\u0000128 replicas) in a WAN setting.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"34 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CountChain: A Decentralized Oracle Network for Counting Systems
Behkish Nassirzadeh, Stefanos Leonardos, Albert Heinle, Anwar Hasan, Vijay Ganesh
arXiv:2409.11592, 17 September 2024
Blockchain integration in industries like online advertising is hindered by blockchain's limited connectivity to off-chain data. These industries rely heavily on precise counting systems for collecting and analyzing off-chain data, which requires mechanisms, often called oracles, to feed off-chain data into smart contracts. However, current oracle solutions are ill-suited for counting systems because the oracles do not know when to expect the data, posing a significant challenge. To address this, we present CountChain, a decentralized oracle network for counting systems. In CountChain, data is received by all oracle nodes, and any node can submit a proposition request. Each proposition contains enough data to evaluate the occurrence of an event. Only randomly selected nodes participate in a game to evaluate the truthfulness of each proposition by providing proof and some stake. Finally, the propositions whose outcome is True increment the counter in a smart contract. Thus, instead of a contract calling oracles for data, in CountChain the oracles call a smart contract when the data is available. Furthermore, we present a formal analysis and an experimental evaluation of the system's parameters on over half a million data points to obtain optimal system parameters. Under such conditions, our game-theoretical analysis demonstrates that a Nash equilibrium exists wherein all rational parties participate honestly.
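A highly simplified sketch of the counting flow follows: a randomly selected subset of oracle nodes votes on a proposition, and only propositions judged True increment an on-chain counter. All class and method names here are illustrative stand-ins, not CountChain's actual interfaces, and the stake/proof mechanics are omitted.

```python
import random

class CounterContract:
    """Stand-in for the on-chain smart contract holding the counter."""
    def __init__(self):
        self.count = 0
    def submit(self, votes_true, votes_false):
        if votes_true > votes_false:          # proposition judged True
            self.count += 1

def evaluate_proposition(nodes, proposition, committee_size=5, seed=None):
    """Randomly select nodes to vote on one proposition, then report the tally."""
    rng = random.Random(seed)
    committee = rng.sample(nodes, committee_size)
    votes = [node(proposition) for node in committee]   # each node returns True/False
    return sum(votes), committee_size - sum(votes)

contract = CounterContract()
nodes = [lambda p: p["event_seen"] for _ in range(20)]  # toy honest nodes
contract.submit(*evaluate_proposition(nodes, {"event_seen": True}, seed=1))
print(contract.count)  # 1
```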
{"title":"CountChain: A Decentralized Oracle Network for Counting Systems","authors":"Behkish Nassirzadeh, Stefanos Leonardos, Albert Heinle, Anwar Hasan, Vijay Ganesh","doi":"arxiv-2409.11592","DOIUrl":"https://doi.org/arxiv-2409.11592","url":null,"abstract":"Blockchain integration in industries like online advertising is hindered by\u0000its connectivity limitations to off-chain data. These industries heavily rely\u0000on precise counting systems for collecting and analyzing off-chain data. This\u0000requires mechanisms, often called oracles, to feed off-chain data into smart\u0000contracts. However, current oracle solutions are ill-suited for counting\u0000systems since the oracles do not know when to expect the data, posing a\u0000significant challenge. To address this, we present CountChain, a decentralized oracle network for\u0000counting systems. In CountChain, data is received by all oracle nodes, and any\u0000node can submit a proposition request. Each proposition contains enough data to\u0000evaluate the occurrence of an event. Only randomly selected nodes participate\u0000in a game to evaluate the truthfulness of each proposition by providing proof\u0000and some stake. Finally, the propositions with the outcome of True increment\u0000the counter in a smart contract. Thus, instead of a contract calling oracles\u0000for data, in CountChain, the oracles call a smart contract when the data is\u0000available. Furthermore, we present a formal analysis and experimental\u0000evaluation of the system's parameters on over half a million data points to\u0000obtain optimal system parameters. In such conditions, our game-theoretical\u0000analysis demonstrates that a Nash equilibrium exists wherein all rational\u0000parties participate with honesty.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"93 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Energy Efficiency Support for Software Defined Networks: a Serverless Computing Approach
Fatemeh Banaie, Karim Djemame, Abdulaziz Alhindi, Vasilios Kelefouras
arXiv:2409.11208, 17 September 2024
Automatic network management strategies have become paramount for meeting the needs of innovative real-time and data-intensive applications, such as in the Internet of Things. However, meeting the ever-growing and fluctuating demands for data and services in such applications requires, more than ever, an efficient and scalable approach to network resource management. Such an approach should enable the automated provisioning of services while incentivising energy-efficient resource usage across the edge-to-cloud continuum. This paper is the first to realise the concept of modular Software-Defined Networks based on serverless functions in an energy-aware environment. By adopting Function as a Service, the approach enables on-demand deployment of network functions, reducing cost through fine-grained resource provisioning. An analytical model is presented to approximate the service delivery time and power consumption, alongside an open-source prototype implementation supported by an extensive experimental evaluation. The experiments demonstrate not only the practical applicability of the proposed approach but also a significant improvement in energy efficiency.
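For intuition only, here is a toy sketch of the kind of delivery-time and power model such an approach might use: delivery time decomposed into network, execution, and an occasional cold-start term, and a linear utilization-based power model. The decomposition and every parameter name below are assumptions made for illustration; they are not the paper's analytical model.

```python
def service_delivery_time(t_net, t_exec, t_cold_start, p_cold):
    """Expected delivery time of a network function deployed as a serverless
    function: network transfer + execution + a cold start paid only with
    probability p_cold (all terms are illustrative assumptions)."""
    return t_net + t_exec + p_cold * t_cold_start

def power_consumption(p_idle, p_peak, utilization):
    """Simple linear power model: idle power plus a utilization-proportional
    dynamic part (again, purely illustrative)."""
    return p_idle + (p_peak - p_idle) * utilization

print(service_delivery_time(t_net=5e-3, t_exec=2e-3, t_cold_start=0.4, p_cold=0.1))
print(power_consumption(p_idle=60.0, p_peak=150.0, utilization=0.35))
```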
{"title":"Energy Efficiency Support for Software Defined Networks: a Serverless Computing Approach","authors":"Fatemeh Banaie, Karim Djemame, Abdulaziz Alhindi, Vasilios Kelefouras","doi":"arxiv-2409.11208","DOIUrl":"https://doi.org/arxiv-2409.11208","url":null,"abstract":"Automatic network management strategies have become paramount for meeting the\u0000needs of innovative real-time and data-intensive applications, such as in the\u0000Internet of Things. However, meeting the ever-growing and fluctuating demands\u0000for data and services in such applications requires more than ever an efficient\u0000and scalable network resource management approach. Such approach should enable\u0000the automated provisioning of services while incentivising energy-efficient\u0000resource usage that expands throughout the edge-to-cloud continuum. This paper\u0000is the first to realise the concept of modular Software-Defined Networks based\u0000on serverless functions in an energy-aware environment. By adopting Function as\u0000a Service, the approach enables on-demand deployment of network functions,\u0000resulting in cost reduction through fine resource provisioning granularity. An\u0000analytical model is presented to approximate the service delivery time and\u0000power consumption, as well as an open-source prototype implementation supported\u0000by an extensive experimental evaluation. The experiments demonstrate not only\u0000the practical applicability of the proposed approach but significant\u0000improvement in terms of energy efficiency.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"47 25 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Delay Analysis of EIP-4844
Pourya Soltani, Farid Ashtiani
arXiv:2409.11043, 17 September 2024
Proto-Danksharding, proposed in Ethereum Improvement Proposal 4844 (EIP-4844), aims to incrementally improve the scalability of the Ethereum blockchain by introducing a new type of transaction known as blob-carrying transactions. These transactions incorporate binary large objects (blobs) of data that are stored off-chain but referenced and verified on-chain to ensure data availability. By decoupling data availability from transaction execution, Proto-Danksharding alleviates network congestion and reduces gas fees, laying the groundwork for future, more advanced sharding solutions. This letter provides an analytical model to derive the delay experienced by these new transactions. We model the system as an $\mathrm{M/D}^B/1$ queue, whose steady-state distribution we obtain through an embedded Markov chain and the supplementary variable method. We show that transactions carrying more blobs but arriving less frequently impose higher delays on the system than transactions with fewer blobs arriving more frequently.
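A small simulation can help build intuition for this queueing model: Poisson blob arrivals served in batches of up to B per deterministic service interval D. The slotted scheme below (service slots aligned to multiples of D) is a deliberate simplification and an illustrative toy, not the paper's analytical derivation.

```python
import random

def simulate_batch_queue(lam, D, B, horizon=50_000.0, seed=0):
    """Toy slotted simulation of a batch-service queue with Poisson arrivals
    (rate lam), deterministic service time D, and batches of up to B customers.
    Returns the mean sojourn time (waiting + service)."""
    rng = random.Random(seed)
    queue, delays = [], []
    next_arrival = rng.expovariate(lam)
    t = 0.0
    while t < horizon:
        while next_arrival < t:                    # arrivals before this slot start
            queue.append(next_arrival)
            next_arrival += rng.expovariate(lam)
        batch, queue = queue[:B], queue[B:]        # take up to B customers
        delays.extend(t + D - a for a in batch)    # they depart at the end of the slot
        t += D
    return sum(delays) / len(delays)

print(simulate_batch_queue(lam=0.8, D=1.0, B=2))   # small batch capacity: longer delays
print(simulate_batch_queue(lam=0.8, D=1.0, B=6))   # larger batch capacity: shorter delays
```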
{"title":"Delay Analysis of EIP-4844","authors":"Pourya Soltani, Farid Ashtiani","doi":"arxiv-2409.11043","DOIUrl":"https://doi.org/arxiv-2409.11043","url":null,"abstract":"Proto-Danksharding, proposed in Ethereum Improvement Proposal 4844\u0000(EIP-4844), aims to incrementally improve the scalability of the Ethereum\u0000blockchain by introducing a new type of transaction known as blob-carrying\u0000transactions. These transactions incorporate binary large objects (blobs) of\u0000data that are stored off-chain but referenced and verified on-chain to ensure\u0000data availability. By decoupling data availability from transaction execution,\u0000Proto-Danksharding alleviates network congestion and reduces gas fees, laying\u0000the groundwork for future, more advanced sharding solutions. This letter\u0000provides an analytical model to derive the delay for these new transactions. We\u0000model the system as an $mathrm{M/D}^B/1$ queue which we then find its steady\u0000state distribution through embedding a Markov chain and use of supplementary\u0000variable method. We show that transactions with more blobs but less frequent\u0000impose higher delays on the system compared to lower blobs but more frequent.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"118 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Green Multi-Attribute Client Selection for Over-The-Air Federated Learning: A Grey-Wolf-Optimizer Approach
Maryam Ben Driss, Essaid Sabir, Halima Elbiaze, Abdoulaye Baniré Diallo, Mohamed Sadik
arXiv:2409.11442, 16 September 2024
Federated Learning (FL) has gained attention across various industries for its capability to train machine learning models without centralizing sensitive data. While this approach offers significant benefits such as privacy preservation and decreased communication overhead, it presents several challenges, including deployment complexity and interoperability issues, particularly in heterogeneous scenarios or resource-constrained environments. Over-the-air (OTA) FL was introduced to tackle these challenges by disseminating model updates without necessitating direct device-to-device connections or centralized servers. However, OTA-FL brought forth limitations associated with heightened energy consumption and network latency. In this paper, we propose a multi-attribute client selection framework employing the grey wolf optimizer (GWO) to strategically control the number of participants in each round and optimize the OTA-FL process while considering accuracy, energy, delay, reliability, and fairness constraints of participating devices. We evaluate the performance of our multi-attribute client selection approach in terms of model loss minimization, convergence time reduction, and energy efficiency. In our experimental evaluation, we assessed and compared the performance of our approach against the existing state-of-the-art methods. Our results demonstrate that the proposed GWO-based client selection outperforms these baselines across various metrics. Specifically, our approach achieves a notable reduction in model loss, accelerates convergence time, and enhances energy efficiency while maintaining high fairness and reliability indicators.
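For readers unfamiliar with the optimizer, a minimal grey wolf optimizer (GWO) sketch follows. The sphere fitness function and the bounds are placeholders; in the paper's setting the fitness would encode the accuracy/energy/delay/reliability/fairness trade-off of a candidate client-selection decision, which is not reproduced here.

```python
import numpy as np

def gwo(fitness, dim, bounds, n_wolves=20, iters=200, seed=0):
    """Minimal grey wolf optimizer (minimization): wolves move toward the
    three best solutions found so far (alpha, beta, delta)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(n_wolves, dim))
    for it in range(iters):
        scores = np.array([fitness(x) for x in X])
        alpha, beta, delta = X[np.argsort(scores)[:3]]
        a = 2.0 * (1.0 - it / iters)                    # exploration factor: 2 -> 0
        pull = np.zeros_like(X)
        for leader in (alpha, beta, delta):
            r1, r2 = rng.random(X.shape), rng.random(X.shape)
            A, C = 2 * a * r1 - a, 2 * r2
            pull += leader - A * np.abs(C * leader - X)
        X = np.clip(pull / 3.0, lo, hi)                 # X(t+1) = (X1 + X2 + X3) / 3
    scores = np.array([fitness(x) for x in X])
    return X[np.argmin(scores)], float(scores.min())

# Toy usage on a sphere function (a stand-in for a multi-attribute selection fitness)
best, val = gwo(lambda x: float(np.sum(x * x)), dim=5, bounds=(-5.0, 5.0))
print(best, val)
```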
{"title":"A Green Multi-Attribute Client Selection for Over-The-Air Federated Learning: A Grey-Wolf-Optimizer Approach","authors":"Maryam Ben Driss, Essaid Sabir, Halima Elbiaze, Abdoulaye Baniré Diallo, Mohamed Sadik","doi":"arxiv-2409.11442","DOIUrl":"https://doi.org/arxiv-2409.11442","url":null,"abstract":"Federated Learning (FL) has gained attention across various industries for\u0000its capability to train machine learning models without centralizing sensitive\u0000data. While this approach offers significant benefits such as privacy\u0000preservation and decreased communication overhead, it presents several\u0000challenges, including deployment complexity and interoperability issues,\u0000particularly in heterogeneous scenarios or resource-constrained environments.\u0000Over-the-air (OTA) FL was introduced to tackle these challenges by\u0000disseminating model updates without necessitating direct device-to-device\u0000connections or centralized servers. However, OTA-FL brought forth limitations\u0000associated with heightened energy consumption and network latency. In this\u0000paper, we propose a multi-attribute client selection framework employing the\u0000grey wolf optimizer (GWO) to strategically control the number of participants\u0000in each round and optimize the OTA-FL process while considering accuracy,\u0000energy, delay, reliability, and fairness constraints of participating devices.\u0000We evaluate the performance of our multi-attribute client selection approach in\u0000terms of model loss minimization, convergence time reduction, and energy\u0000efficiency. In our experimental evaluation, we assessed and compared the\u0000performance of our approach against the existing state-of-the-art methods. Our\u0000results demonstrate that the proposed GWO-based client selection outperforms\u0000these baselines across various metrics. Specifically, our approach achieves a\u0000notable reduction in model loss, accelerates convergence time, and enhances\u0000energy efficiency while maintaining high fairness and reliability indicators.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"591 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture
Xinyao Yi
arXiv:2409.10661, 16 September 2024
Parallel computing is a standard approach to achieving high-performance computing (HPC). Three commonly used methods to implement parallel computing are: 1) applying multithreading technology on single-core or multi-core CPUs; 2) incorporating powerful parallel computing devices such as GPUs, FPGAs, and other accelerators; and 3) utilizing special parallel architectures like Single Instruction/Multiple Data (SIMD). Many researchers have made efforts with different parallel technologies, including developing applications, conducting performance analyses, identifying performance bottlenecks, and proposing feasible solutions. However, balancing and optimizing parallel programs remains challenging due to the complexity of parallel algorithms and hardware architectures. Issues such as data transfer between hosts and devices in heterogeneous systems continue to be bottlenecks that limit performance. This work summarizes a large body of information on various parallel programming techniques, aiming to present the current state and future development trends of parallel programming, performance issues, and solutions. It seeks to give readers an overall picture and provide the background knowledge needed to support subsequent research.
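As a small, self-contained illustration of the performance gap these techniques target, the snippet below compares a pure-Python dot product with NumPy's vectorized version, which runs in compiled, SIMD-capable kernels. It only illustrates the kind of speedup being surveyed; the measured ratio depends on the machine and is not a result from the paper.

```python
import time
import numpy as np

def dot_scalar(a, b):
    """Baseline: one multiply-add per Python-level iteration."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

n = 2_000_000
a = np.random.rand(n)
b = np.random.rand(n)

t0 = time.perf_counter(); s1 = dot_scalar(a, b); t1 = time.perf_counter()
t2 = time.perf_counter(); s2 = float(a @ b);     t3 = time.perf_counter()

print(f"python loop: {t1 - t0:.3f}s   numpy/SIMD kernel: {t3 - t2:.3f}s")
print(f"results agree: {abs(s1 - s2) < 1e-6 * abs(s2)}")
```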
{"title":"A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture","authors":"Xinyao Yi","doi":"arxiv-2409.10661","DOIUrl":"https://doi.org/arxiv-2409.10661","url":null,"abstract":"Parallel computing is a standard approach to achieving high-performance\u0000computing (HPC). Three commonly used methods to implement parallel computing\u0000include: 1) applying multithreading technology on single-core or multi-core\u0000CPUs; 2) incorporating powerful parallel computing devices such as GPUs, FPGAs,\u0000and other accelerators; and 3) utilizing special parallel architectures like\u0000Single Instruction/Multiple Data (SIMD). Many researchers have made efforts using different parallel technologies,\u0000including developing applications, conducting performance analyses, identifying\u0000performance bottlenecks, and proposing feasible solutions. However, balancing\u0000and optimizing parallel programs remain challenging due to the complexity of\u0000parallel algorithms and hardware architectures. Issues such as data transfer\u0000between hosts and devices in heterogeneous systems continue to be bottlenecks\u0000that limit performance. This work summarizes a vast amount of information on various parallel\u0000programming techniques, aiming to present the current state and future\u0000development trends of parallel programming, performance issues, and solutions.\u0000It seeks to give readers an overall picture and provide background knowledge to\u0000support subsequent research.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deterministic Bounds in Committee Selection: Enhancing Decentralization and Scalability in Distributed Ledgers
Grigorii Melnikov, Sebastian Müller, Nikita Polyanskii, Yury Yanovich
arXiv:2409.10727, 16 September 2024
Consensus plays a crucial role in distributed ledger systems, impacting both scalability and decentralization. Many blockchain systems use a weighted lottery based on a scarce resource such as stake, storage, memory, or computing power to select a committee whose members drive the consensus and are responsible for adding new information to the ledger. Therefore, ensuring a robust and fair committee selection process is essential for maintaining security, efficiency, and decentralization. There are two main approaches to randomized committee selection. In one approach, each validator candidate locally checks whether they are elected to the committee and reveals their proof during the consensus phase. In contrast, in the second approach, a sortition algorithm decides a fixed-sized committee that is globally verified. This paper focuses on the latter approach, with cryptographic sortition as a method for fair committee selection that guarantees a constant committee size. Our goal is to develop deterministic guarantees that strengthen decentralization. We introduce novel methods that provide deterministic bounds on the influence of adversaries within the committee, as evidenced by numerical experiments. This approach overcomes the limitations of existing protocols that only offer probabilistic guarantees, often producing committees too large to be practical for many quorum-based applications like atomic broadcast and randomness beacon protocols.
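Below is a toy sketch of the second approach: global, stake-weighted sortition into a fixed-size committee. The weighting rule and the shared seed are plain illustrations; the sketch contains none of the cryptographic machinery or the deterministic adversary bounds the paper develops.

```python
import random

def sortition(stakes, committee_size, seed):
    """Select a fixed-size committee with probability proportional to stake,
    without replacement.  `seed` plays the role of a shared, verifiable
    randomness source so every node derives the same committee."""
    rng = random.Random(seed)
    remaining = dict(stakes)
    committee = []
    for _ in range(committee_size):
        total = sum(remaining.values())
        r = rng.uniform(0, total)
        acc = 0.0
        for node, stake in remaining.items():
            acc += stake
            if r <= acc:
                committee.append(node)
                del remaining[node]
                break
    return committee

stakes = {f"v{i}": s for i, s in enumerate([50, 30, 10, 5, 3, 2])}
print(sortition(stakes, committee_size=3, seed="epoch-42"))
```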
{"title":"Deterministic Bounds in Committee Selection: Enhancing Decentralization and Scalability in Distributed Ledgers","authors":"Grigorii Melnikov, Sebastian Müller, Nikita Polyanskii, Yury Yanovich","doi":"arxiv-2409.10727","DOIUrl":"https://doi.org/arxiv-2409.10727","url":null,"abstract":"Consensus plays a crucial role in distributed ledger systems, impacting both\u0000scalability and decentralization. Many blockchain systems use a weighted\u0000lottery based on a scarce resource such as a stake, storage, memory, or\u0000computing power to select a committee whose members drive the consensus and are\u0000responsible for adding new information to the ledger. Therefore, ensuring a\u0000robust and fair committee selection process is essential for maintaining\u0000security, efficiency, and decentralization. There are two main approaches to randomized committee selection. In one\u0000approach, each validator candidate locally checks whether they are elected to\u0000the committee and reveals their proof during the consensus phase. In contrast,\u0000in the second approach, a sortition algorithm decides a fixed-sized committee\u0000that is globally verified. This paper focuses on the latter approach, with\u0000cryptographic sortition as a method for fair committee selection that\u0000guarantees a constant committee size. Our goal is to develop deterministic\u0000guarantees that strengthen decentralization. We introduce novel methods that\u0000provide deterministic bounds on the influence of adversaries within the\u0000committee, as evidenced by numerical experiments. This approach overcomes the\u0000limitations of existing protocols that only offer probabilistic guarantees,\u0000often providing large committees that are impractical for many quorum-based\u0000applications like atomic broadcast and randomness beacon protocols.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"16 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TPFL: Tsetlin-Personalized Federated Learning with Confidence-Based Clustering
Rasoul Jafari Gohari, Laya Aliahmadipour, Ezat Valipour
arXiv:2409.10392, 16 September 2024
The world of Machine Learning (ML) has witnessed rapid changes in terms of new models and ways to process users' data. The majority of work that has been done is focused on Deep Learning (DL) based approaches. However, with the emergence of new algorithms such as the Tsetlin Machine (TM) algorithm, there is growing interest in exploring alternative approaches that may offer unique advantages in certain domains or applications. One of these domains is Federated Learning (FL), in which users' privacy is of utmost importance. Due to its novelty, FL has seen a surge in the incorporation of personalization techniques to enhance model accuracy while maintaining user privacy under personalized conditions. In this work, we propose a novel approach dubbed TPFL: Tsetlin-Personalized Federated Learning, in which models are grouped into clusters based on their confidence towards a specific class. In this way, clustering can benefit from two key advantages. Firstly, clients share only what they are confident about, eliminating wrongful weight aggregation among clients whose data for a specific class may not have been sufficient during training. This phenomenon is prevalent when the data are non-Independent and Identically Distributed (non-IID). Secondly, by sharing only weights towards a specific class, communication cost is substantially reduced, making TPFL efficient in terms of both accuracy and communication cost. TPFL achieved the highest accuracy on three different datasets, namely MNIST, FashionMNIST, and FEMNIST.
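A compact sketch of the clustering idea as described above: each client reports the class it is most confident on, clients are grouped by that class, and updates are averaged only within each cluster. The data structures and the confidence measure are illustrative assumptions, not the exact TPFL procedure, which operates on Tsetlin machine models rather than the dense weight vectors used here.

```python
from collections import defaultdict
import numpy as np

def cluster_and_aggregate(client_updates):
    """client_updates: list of (per_class_confidence, weights) pairs.

    Clients are clustered by their most confident class and a FedAvg-style
    mean is computed per cluster, so a client never contributes to classes
    it has seen too little data for (the non-IID failure mode)."""
    clusters = defaultdict(list)
    for confidence, weights in client_updates:
        clusters[int(np.argmax(confidence))].append(weights)
    return {cls: np.mean(w, axis=0) for cls, w in clusters.items()}

# Toy usage: 6 clients, 10 classes, 8-dimensional "weights" per client
rng = np.random.default_rng(0)
updates = [(rng.dirichlet(np.ones(10)), rng.standard_normal(8)) for _ in range(6)]
per_cluster_models = cluster_and_aggregate(updates)
print({cls: w.shape for cls, w in per_cluster_models.items()})
```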
{"title":"TPFL: Tsetlin-Personalized Federated Learning with Confidence-Based Clustering","authors":"Rasoul Jafari Gohari, Laya Aliahmadipour, Ezat Valipour","doi":"arxiv-2409.10392","DOIUrl":"https://doi.org/arxiv-2409.10392","url":null,"abstract":"The world of Machine Learning (ML) has witnessed rapid changes in terms of\u0000new models and ways to process users data. The majority of work that has been\u0000done is focused on Deep Learning (DL) based approaches. However, with the\u0000emergence of new algorithms such as the Tsetlin Machine (TM) algorithm, there\u0000is growing interest in exploring alternative approaches that may offer unique\u0000advantages in certain domains or applications. One of these domains is\u0000Federated Learning (FL), in which users privacy is of utmost importance. Due to\u0000its novelty, FL has seen a surge in the incorporation of personalization\u0000techniques to enhance model accuracy while maintaining user privacy under\u0000personalized conditions. In this work, we propose a novel approach dubbed TPFL:\u0000Tsetlin-Personalized Federated Learning, in which models are grouped into\u0000clusters based on their confidence towards a specific class. In this way,\u0000clustering can benefit from two key advantages. Firstly, clients share only\u0000what they are confident about, resulting in the elimination of wrongful weight\u0000aggregation among clients whose data for a specific class may have not been\u0000enough during the training. This phenomenon is prevalent when the data are\u0000non-Independent and Identically Distributed (non-IID). Secondly, by sharing\u0000only weights towards a specific class, communication cost is substantially\u0000reduced, making TPLF efficient in terms of both accuracy and communication\u0000cost. The results of TPFL demonstrated the highest accuracy on three different\u0000datasets; namely MNIST, FashionMNIST and FEMNIST.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"52 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}