APPFIS: An Advanced Parallel Programming Framework for Iterative Stencil Based Scientific Applications in HPC Environments
Pub Date: 2022-07-01. DOI: 10.1109/ISPDC55340.2022.00019
Md Bulbul Sharif, S. Ghafoor
Developing performant parallel applications for distributed environments is challenging and requires expertise in both the HPC system and the application domain. We have developed a C++-based framework called APPFIS that hides the system complexities by providing an easy-to-use interface for developing performance-portable structured grid-based stencil applications. APPFIS’s user interface is hardware agnostic and provides partitioning, code optimization, and automatic communication for stencil applications in distributed HPC environments. In addition, it offers straightforward APIs for utilizing multiple GPU accelerators, shared memory, and node-level parallelization, with automatic optimization for computation and communication overlapping. We have tested the functionality and performance of APPFIS using several applications on three platforms (Stampede2 at the Texas Advanced Computing Center, Bridges-2 at the Pittsburgh Supercomputing Center, and the Summit supercomputer at Oak Ridge National Laboratory). Experimental results show performance comparable to hand-tuned code, with excellent strong and weak scalability up to 4096 CPUs and 384 GPUs.
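To make the contribution concrete, here is a minimal sketch of the halo-exchange boilerplate that a framework like APPFIS automates for its users. It is plain MPI and C++ for a 1-D heat equation with a 3-point stencil; the decomposition, names, and parameters are illustrative, not APPFIS's actual API.

```cpp
// Hedged sketch: the kind of MPI halo-exchange code a stencil framework
// hides behind its API. 1-D diffusion of an initial heat spike.
#include <mpi.h>
#include <utility>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int N = 1024;                 // global grid points (assume size divides N)
    const int local = N / size;
    std::vector<double> u(local + 2, 0.0), un(local + 2, 0.0);
    if (rank == 0) u[1] = 100.0;        // initial heat spike

    int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    for (int step = 0; step < 1000; ++step) {
        // Halo exchange: send edge cells, receive ghost cells.
        MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                     &u[local + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[local], 1, MPI_DOUBLE, right, 1,
                     &u[0], 1, MPI_DOUBLE, left, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        // 3-point stencil update on the interior cells.
        for (int i = 1; i <= local; ++i)
            un[i] = u[i] + 0.25 * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
        std::swap(u, un);
    }
    MPI_Finalize();
    return 0;
}
```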
{"title":"APPFIS: An Advanced Parallel Programming Framework for Iterative Stencil Based Scientific Applications in HPC Environments","authors":"Md Bulbul Sharif, S. Ghafoor","doi":"10.1109/ISPDC55340.2022.00019","DOIUrl":"https://doi.org/10.1109/ISPDC55340.2022.00019","url":null,"abstract":"Developing performant parallel applications for the distributed environment is challenging and requires expertise in both the HPC system and the application domain. We have developed a C++-based framework called APPFIS that hides the system complexities by providing an easy-to-use interface for developing performance portable structured grid-based stencil applications. APPFIS’s user interface is hardware agnostic and provides partitioning, code optimization, and automatic communication for stencil applications in distributed HPC environment. In addition, it offers straightforward APIs for utilizing multiple GPU accelerators, shared memory, and node-level parallelizations with automatic optimization for computation and communication overlapping. We have tested the functionality and performance of APPFIS using several applications on three platforms (Stampede2 at Texas Advanced Computing Center, Bridges-2 at Pittsburgh Supercomputing Center, and Summit Supercomputer at Oak Ridge National Laboratory). Experimental results show comparable performance to hand-tuned code with an excellent strong and weak scalability up to 4096 CPUs and 384 GPUs.","PeriodicalId":389334,"journal":{"name":"2022 21st International Symposium on Parallel and Distributed Computing (ISPDC)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134044313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analysis and Mitigation of Soft-Errors on High Performance Embedded GPUs
Pub Date: 2022-07-01. DOI: 10.1109/ISPDC55340.2022.00022
L. Sterpone, S. Azimi, C. D. Sio, Filippo Parisi
Multiprocessor systems-on-chip such as embedded GPUs are becoming very popular in safety-critical applications, such as autonomous and semi-autonomous vehicles. However, these devices can suffer from the effects of soft errors, such as those produced by radiation, which can generate unpredictable misbehaviors. Fault tolerance oriented to multi-threaded software introduces severe performance degradation due to the redundancy, voting, and correction thread operations. In this paper, we propose a new fault injection environment for NVIDIA GPGPU devices and a fault tolerance approach based on error detection and correction threads executed during data transfer operations on embedded GPUs. The fault injection environment automatically injects faults into instructions at the SASS level by instrumenting the CUDA binary executable file. The mitigation approach is based on concurrent error detection threads running simultaneously with device-to-host memory stream transfer operations. Using several benchmark applications, we evaluate the impact of soft errors, classifying outcomes as Silent Data Corruption, Detection, Unrecoverable Error, and Hang. Finally, the proposed mitigation approach has been validated by soft-error fault injection campaigns on an NVIDIA Pascal architecture GPU controlled by a quad-core ARM Cortex-A57 processor (Jetson TX2), demonstrating an advantage of more than 37% with respect to a state-of-the-art solution.
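As a rough illustration of the detection idea (not the paper's implementation), the sketch below duplicates a result buffer on the GPU, overlaps the two device-to-host copies on separate CUDA streams, and compares the copies on the host; a mismatch flags silent data corruption. The kernel launches are elided and all names are hypothetical.

```cpp
// Hypothetical duplicate-and-compare sketch; in the paper the detection
// threads run concurrently with the transfers rather than after them.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const size_t n = 1 << 20;
    float *d_a, *d_b, *h_a, *h_b;
    cudaMalloc(&d_a, n * sizeof(float));
    cudaMalloc(&d_b, n * sizeof(float));
    cudaMallocHost(&h_a, n * sizeof(float));   // pinned, so copies can overlap
    cudaMallocHost(&h_b, n * sizeof(float));
    // ... launch the same kernel twice, writing its result to d_a and d_b ...

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);
    cudaMemcpyAsync(h_a, d_a, n * sizeof(float), cudaMemcpyDeviceToHost, s1);
    cudaMemcpyAsync(h_b, d_b, n * sizeof(float), cudaMemcpyDeviceToHost, s2);
    cudaStreamSynchronize(s1);
    cudaStreamSynchronize(s2);

    // Compare the redundant copies: any mismatch is detected silent data
    // corruption in one of the executions or transfers.
    size_t mismatches = 0;
    for (size_t i = 0; i < n; ++i)
        if (h_a[i] != h_b[i]) ++mismatches;
    std::printf("mismatching words: %zu\n", mismatches);

    cudaFree(d_a); cudaFree(d_b);
    cudaFreeHost(h_a); cudaFreeHost(h_b);
    return 0;
}
```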
{"title":"Analysis and Mitigation of Soft-Errors on High Performance Embedded GPUs","authors":"L. Sterpone, S. Azimi, C. D. Sio, Filippo Parisi","doi":"10.1109/ISPDC55340.2022.00022","DOIUrl":"https://doi.org/10.1109/ISPDC55340.2022.00022","url":null,"abstract":"Multiprocessor system-on-chip such as embedded GPUs are becoming very popular in safety-critical applications, such as autonomous and semi-autonomous vehicles. However, these devices can suffer from the effects of soft-errors, such as those produced by radiation effects. These effects are able to generate unpredictable misbehaviors. Fault tolerance oriented to multi-threaded software introduces severe performance degradations due to the redundancy, voting and correction threads operations. In this paper, we propose a new fault injection environment for NVIDIA GPGPU devices and a fault tolerance approach based on error detection and correction threads executed during data transfer operations on embedded GPUs. The fault injection environment is capable of automatically injecting faults into the instructions at SASS level by instrumenting the CUDA binary executable file. The mitigation approach is based on concurrent error detection threads running simultaneously with the memory stream device to host data transfer operations. With several benchmark applications, we evaluate the impact of soft- errors classifying Silent Data Corruption, Detection, Unrecoverable Error and Hang. Finally, the proposed mitigation approach has been validated by soft-error fault injection campaigns on an NVIDIA Pascal Architecture GPU controlled by Quad-Core A57 ARM processor (JETSON TX2) demonstrating an advantage of more than 37% with respect to state of the art solution.","PeriodicalId":389334,"journal":{"name":"2022 21st International Symposium on Parallel and Distributed Computing (ISPDC)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122610678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TrustS: Probability-based trust management system in smart cities
Bogdan-Costel Mocanu, Gabriel-Cosmin Apostol, Dragos-Mihai Radulescu, Cristina Serbanescu
Pub Date: 2022-07-01. DOI: 10.1109/ISPDC55340.2022.00018
We live in the era of smart cities, but are the cities of today as smart as they claim? So far, this desideratum has been on the agenda of decision-makers worldwide, yet smart cities have not reached their full potential. Even though important milestones have been achieved in this process, we anticipate that the next generation of smart cities will improve the quality of life for citizens through connected and ubiquitous roads, vehicles, and urban environments. Given such a massive number of devices and so much data, it is crucial to ensure that the interconnected systems are trusted and reliable. Classical approaches to security are not suitable in these scenarios because of the heterogeneity of the interconnected nodes. Thus, in this paper, we present a trust management system for smart cities based on a novel Markov trust computation approach that does not depend on the type of overlay network. We define a Markovian system with four states for which we compute the stationary probabilities. We validate our model through simulation, considering different characteristics of the proposed systems. The results show that our trust model has a deterministic behavior and is suitable for smart city scenarios.
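For readers unfamiliar with stationary probabilities, here is a minimal worked example: power iteration on a four-state Markov chain, the computation the trust model requires. The transition matrix is invented for illustration; the paper's actual trust states and probabilities are not reproduced here.

```cpp
// Minimal sketch: stationary distribution of a 4-state Markov chain
// by repeated application of pi <- pi * P. Matrix values are illustrative.
#include <array>
#include <cstdio>

int main() {
    // P[i][j] = probability of moving from state i to state j; rows sum to 1.
    const std::array<std::array<double, 4>, 4> P = {{
        {0.70, 0.20, 0.05, 0.05},
        {0.10, 0.60, 0.20, 0.10},
        {0.05, 0.15, 0.70, 0.10},
        {0.05, 0.05, 0.10, 0.80},
    }};
    std::array<double, 4> pi = {0.25, 0.25, 0.25, 0.25};  // uniform start
    for (int it = 0; it < 10000; ++it) {
        std::array<double, 4> next = {0, 0, 0, 0};
        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j)
                next[j] += pi[i] * P[i][j];
        pi = next;                        // converges to the stationary vector
    }
    for (int j = 0; j < 4; ++j) std::printf("pi[%d] = %.4f\n", j, pi[j]);
    return 0;
}
```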
{"title":"TrustS: Probability-based trust management system in smart cities","authors":"Bogdan-Costel Mocanu, Gabriel-Cosmin Apostol, Dragos-Mihai Radulescu, Cristina Serbanescu","doi":"10.1109/ISPDC55340.2022.00018","DOIUrl":"https://doi.org/10.1109/ISPDC55340.2022.00018","url":null,"abstract":"We live in the era of smart cities, but are the cities of today as smart as they claimƒ So far, this desideratum has been on the agenda of almost all decision-making factors worldwide and since then, smart cities have not hit their full potential. Even though there have been achieved important milestones in this process, we assume that the next generation of smart cities will improve the quality of life for citizens through connected and ubiquitous roads, vehicles, and urban environments. When we take into consideration such a massive amount of devices and data, it is crucial to ensure that the interconnected systems and trusted and reliable. Classical approaches for security are not suitable in these scenarios because of the heterogeneity of the interconnected nodes. Thus, in this paper, we present a trust management system for smart cities based on a novel Markov trust computation approach that is not dependent on the type of overlay network. We define a markovian system with four states for which we compute the stationary probabilities. We validate or model through simulation considering different characteristics of proposed systems. The results show that our trust model has a deterministic behavior and is suitable for smart city scenarios.","PeriodicalId":389334,"journal":{"name":"2022 21st International Symposium on Parallel and Distributed Computing (ISPDC)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129314591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Coarse-Grained Floorplanning for streaming CNN applications on Multi-Die FPGAs
Pub Date: 2022-07-01. DOI: 10.1109/ISPDC55340.2022.00014
Danielle Tchuinkou Kwadjo, Erman Nghonda Tchinda, C. Bobda
With the vast adoption of FPGAs in the cloud, it becomes necessary to investigate architectures and mechanisms for the efficient deployment of CNNs into multi-FPGA cloud infrastructure. However, the growing size and complexity of neural networks, coupled with communication and off-chip memory bottlenecks, make it increasingly difficult for multi-FPGA designs to achieve high resource utilization. In this work, we introduce a scalable framework that supports the efficient integration of CNN applications into a cloud infrastructure that exposes multi-die FPGAs to cloud developers. Our framework is equipped with two mechanisms to facilitate the deployment of CNN inference on FPGAs. First, we propose a model to find the parameters that maximize parallelism within the resource budget while maintaining a balanced rate between the layers. Then, we propose an efficient coarse-grained graph partitioning algorithm for high-quality and scalable routability-driven placement of CNN components on the FPGAs. Prototyping results achieve an overall 37% higher frequency, with lower resource usage compared to a baseline implementation on the same number of FPGAs.
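The paper's partitioning algorithm is not reproduced here, but the sketch below conveys the flavor of coarse-grained chain partitioning: greedily packing a sequence of CNN layers onto k dies under a per-die target, so that inter-die crossings occur only at die boundaries. Layer costs and the target heuristic are invented for illustration; real flows refine such seeds with routability-driven moves.

```cpp
// Hedged sketch (not the paper's algorithm): contiguous greedy assignment
// of a layer chain to k dies, closing a die once its target load is reached.
#include <cstdio>
#include <vector>

int main() {
    const std::vector<double> layerCost = {1.0, 2.5, 2.0, 3.0, 1.5, 2.0, 1.0};
    const int k = 3;
    double total = 0;
    for (double c : layerCost) total += c;
    const double target = total / k;      // ideal per-die load

    std::vector<int> dieOf(layerCost.size());
    int die = 0;
    double used = 0;
    for (size_t i = 0; i < layerCost.size(); ++i) {
        dieOf[i] = die;
        used += layerCost[i];
        if (used >= target && die < k - 1) { ++die; used = 0; }
    }
    for (size_t i = 0; i < dieOf.size(); ++i)
        std::printf("layer %zu -> die %d\n", i, dieOf[i]);
    return 0;
}
```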
{"title":"Coarse-Grained Floorplanning for streaming CNN applications on Multi-Die FPGAs","authors":"Danielle Tchuinkou Kwadjo, Erman Nghonda Tchinda, C. Bobda","doi":"10.1109/ISPDC55340.2022.00014","DOIUrl":"https://doi.org/10.1109/ISPDC55340.2022.00014","url":null,"abstract":"With the vast adoption of FPGAs in the cloud, it becomes necessary to investigate architectures and mechanisms for the efficient deployment of CNN into multi-FPGAs cloud Infrastructure. However, neural networks’ growing size and complexity, coupled with communication and off-chip memory bottlenecks, make it increasingly difficult for multi-FPGA designs to achieve high resource utilization. In this work, we introduce a scalable framework that supports the efficient integration of CNN applications into a cloud infrastructure that exposes multi-Die FPGAs to cloud developers. Our framework is equipped is with two mechanisms to facilitate the deployment of CNN inference on FPGA. First, we propose a model to find the parameters that maximize the parallelism within the resource budget while maintaining a balanced rate between the layers. Then, we propose an efficient Coarse-Grained graph partitioning algorithm for high-quality and scalable routability-drive placement of CNN’s components on the FPGAs. Prototyping results achieve an overall 37% higher frequency, with lower resource usage compared to a baseline implementation on the same number of FPGAs.","PeriodicalId":389334,"journal":{"name":"2022 21st International Symposium on Parallel and Distributed Computing (ISPDC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130010400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ZPaxos: An Asynchronous BFT Paxos with a Leaderless Synchronous Group
Pub Date: 2022-07-01. DOI: 10.1109/ISPDC55340.2022.00025
D. D. Amarasekara, D. Ranasinghe
Increased resource overhead and low throughput are major issues with current BFT consensus systems. EPaxos [48], a leaderless crash fault-tolerant protocol, can commit quickly within a single phase using a fast quorum, provided that the commands are independent. If the commands have dependencies, EPaxos needs to run an additional phase on a majority quorum. As a result, it loses throughput due to the number of command instances it must keep in memory and the high volume of messages it must exchange with other replicas to reach consensus. Alternatively, XPaxos [43], a leader-based system, solves practical BFT with minimal resources while ensuring high throughput. This paper describes an improved consensus mechanism, ZPaxos, an enhanced EPaxos embedding two key features of XPaxos: a synchronous group with fault detection, and a recovery protocol under weak asynchrony assumptions. The new protocol achieves consensus in a single round with a leaderless synchronous group, which allows the removal and addition of one node at a time whenever a crashed or Byzantine replica is detected, while the system continues servicing requests without significant interruption throughout view synchrony. ZPaxos, which handles both Byzantine and crash failures, exhibits superior transaction throughput and latency to EPaxos, which handles crash failures only; both are leaderless core-group architectures.
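As a worked example of the fast/slow path distinction, the snippet below tabulates EPaxos's two quorum sizes for N = 2F + 1 replicas: the fast-path quorum of F + ⌊(F+1)/2⌋ used for single-round commits of independent commands, and the simple majority of F + 1 used by the extra phase when dependencies force the slow path (for very small N the two coincide).

```cpp
// Worked example: EPaxos quorum sizes for N = 2F + 1 replicas.
#include <cstdio>

int main() {
    for (int F = 1; F <= 5; ++F) {
        int N = 2 * F + 1;
        int fast = F + (F + 1) / 2;   // fast-path quorum (one-round commit)
        int slow = F + 1;             // majority quorum (dependency phase)
        std::printf("N=%2d  F=%d  fast quorum=%d  slow quorum=%d\n",
                    N, F, fast, slow);
    }
    return 0;
}
```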
{"title":"ZPaxos: An Asynchronous BFT Paxos with a Leaderless Synchronous Group","authors":"D. D. Amarasekara, D. Ranasinghe","doi":"10.1109/ISPDC55340.2022.00025","DOIUrl":"https://doi.org/10.1109/ISPDC55340.2022.00025","url":null,"abstract":"Increased resource overhead and low throughput are major issues associated with current BFT consensus systems. EPaxos [48], a leaderless crash fault-tolerant protocol can perform commits quickly within a single phase using a fast quorum provided that the commands are independent. If the commands have dependencies, EPaxos needs to run an additional phase on a majority quorum. As a result, it would lose its throughput due to the number of command instances it needs to keep in memory and the high volume of messages it needs to exchange with other replicas to reach consensus. Alternatively, XPaxos [43] which is a leader-based system solves practical BFT with minimal resources while assuring high throughput. This paper describes an improved consensus mechanism, ZPaxos, which is an enhanced EPaxos embedding two key features of XPaxos i.e., a synchronous group with fault detection, and a recovery protocol under weak asynchrony assumptions. The new protocol achieves consensus in a single round with a leaderless synchronous group which allows the removal and addition of a node at a time whenever it sees a crashed or a byzantine replica while the system continues servicing requests without any significant interruptions throughout view synchrony. Transaction throughput and latency of ZPaxos which can handle both byzantine and crash failures exhibit superior performance to that of EPaxos which handles crash failures only, which are two leaderless core group architectures.","PeriodicalId":389334,"journal":{"name":"2022 21st International Symposium on Parallel and Distributed Computing (ISPDC)","volume":"31 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132758178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FlexiShard: a Flexible Sharding Scheme for Blockchain based on a Hybrid Fault Model
Pub Date: 2022-07-01. DOI: 10.1109/ISPDC55340.2022.00011
Tirathraj Ramburn, D. Goswami
One of the major bottlenecks of traditional Blockchain is its low throughput, resulting in poor scalability. One way to increase throughput is to shard the network nodes to form smaller groups (shards). There are a number of sharding schemes in the literature with a common goal: nodes are split into groups to concurrently process different sets of transactions. Parallelism is used to enhance scalability, however with a trade-off in fault tolerance: the smaller the shard size, the better the performance but the higher the fault probability. Contemporary sharding schemes use variants of the Byzantine Fault Tolerance (BFT) protocol as their intra-shard consensus algorithms. BFT gives good performance when shard sizes are kept relatively small and the maximum number of allowable faults is below some threshold. However, all these systems make rigid assumptions about their shard sizes and maximum allowable faults, which may not always be practical. In recent years, more practical hybrid fault models have appeared in the literature that are better suited to Blockchain (e.g., a hybrid of Byzantine and alive-but-corrupt (abc) faults, where the latter only compromise safety), along with corresponding consensus protocols that offer flexibility in the choice of fault types and quorum sizes, e.g., Flexible Byzantine Fault Tolerance (Flexible BFT). In this paper, we present a new sharding scheme, FlexiShard, that uses Flexible BFT as its intra-shard consensus algorithm. FlexiShard leverages the notion of flexible Byzantine quorums and the hybrid fault model introduced in Flexible BFT, which comprises Byzantine and abc faults. Flexible BFT allows flexibility in the choice of fault types and in choosing shard sizes based on a range of allowable fault thresholds. Additionally, it allows shards that can tolerate more total faults than traditional BFT shards of similar size, and hence can deliver similar performance with more fault tolerance. To the best of our knowledge, FlexiShard is the first application of Flexible BFT and the hybrid fault model to Blockchain and its sharding. A theoretical analysis of FlexiShard is presented which demonstrates its flexibility and advantages over traditional sharding schemes.
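A standard way to quantify the size-versus-safety trade-off mentioned above is the hypergeometric failure probability of a randomly sampled shard; the sketch below computes it for a classical BFT threshold of ⌊(s−1)/3⌋, with all population numbers invented for illustration (the paper's hybrid-model analysis differs in the thresholds it permits).

```cpp
// Hedged sketch: P(a shard of size s, sampled without replacement from
// n nodes of which f are faulty, contains more than t faulty members),
// where t is the per-shard BFT tolerance. Hypergeometric tail sum.
#include <algorithm>
#include <cmath>
#include <cstdio>

double logChoose(int n, int k) {
    return std::lgamma(n + 1.0) - std::lgamma(k + 1.0) - std::lgamma(n - k + 1.0);
}

int main() {
    const int n = 2000, f = 500, s = 100;   // illustrative numbers
    const int t = (s - 1) / 3;              // max tolerable faults per shard
    double pFail = 0.0;
    for (int x = t + 1; x <= std::min(f, s); ++x)
        pFail += std::exp(logChoose(f, x) + logChoose(n - f, s - x)
                          - logChoose(n, s));
    std::printf("P(shard of %d exceeds %d faulty) = %.3e\n", s, t, pFail);
    return 0;
}
```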
{"title":"FlexiShard: a Flexible Sharding Scheme for Blockchain based on a Hybrid Fault Model","authors":"Tirathraj Ramburn, D. Goswami","doi":"10.1109/ISPDC55340.2022.00011","DOIUrl":"https://doi.org/10.1109/ISPDC55340.2022.00011","url":null,"abstract":"One of the major bottlenecks of traditional Blockchain is its low throughput resulting in poor scalability. One way to increase throughput is to shard the network nodes to form smaller groups (shards). There are a number of sharding schemes in the literature with a common goal: nodes are split into groups to concurrently process different sets of transactions. Parallelism is used to enhance scalability, however with a trade-off in fault-tolerance; i.e., the smaller the shard size is, the better is the performance but higher is the fault probability. Contemporary sharding schemes use variants of Byzantine Fault Tolerance (BFT) protocol as their intra-shard consensus algorithms. BFT gives good performance when shard sizes are kept relatively small and maximum allowable faults is below some threshold. However, all these systems make rigid assumptions about their shard sizes and maximum allowable faults which may not be practical at times. In recent years, there have been more practical hybrid fault models in the literature which are better applicable to Blockchain (e.g., hybrid of Byzantine and alive-but-corrupt (abc) faults where the latter only compromises on safety) and corresponding consensus protocols that offer flexibility in choice of fault types and quorum sizes, e.g., Flexible Byzantine Fault Tolerance (Flexible BFT). In this paper, we present a new sharding scheme, FlexiShard, that uses Flexible BFT as its intra-shard consensus algorithm. FlexiShard leverages the notion of flexible Byzantine quorums and the hybrid fault model introduced in Flexible BFT that comprises of Byzantine and abc faults. Use of Flexible BFT allows flexibility in the choice of fault types and choosing shard sizes based on a range of allowable fault thresholds. Additionally, it allows to form shards that can tolerate more total faults than traditional BFT shards of similar size, and hence can deliver similar performance but with more fault-tolerance. To the best of our knowledge, FlexiShard is the first application of Flexible BFT and the hybrid fault model to Blockchain and its sharding. A theoretical analysis of FlexiShard is presented which demonstrates its flexibility and advantages over traditional sharding schemes.","PeriodicalId":389334,"journal":{"name":"2022 21st International Symposium on Parallel and Distributed Computing (ISPDC)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132019811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimizing the Resource and Job Management System of an Academic HPC & Research Computing Facility
Pub Date: 2022-07-01. DOI: 10.1109/ISPDC55340.2022.00027
S. Varrette, Emmanuel Kieffer, F. Pinel
High Performance Computing (HPC) is nowadays a strategic asset required to sustain the surging demand for massive processing and data-analytic capabilities. In practice, the effective management of such large-scale, distributed computing infrastructures is left to a Resource and Job Management System (RJMS). This essential middleware component is responsible for managing the computing resources and handling user requests to allocate resources, while providing an optimized framework for starting, executing, and monitoring jobs on the allocated resources. The University of Luxembourg has operated a large academic HPC facility for 15 years, which since 2017 has relied on the Slurm RJMS introduced on top of the flagship cluster Iris. The acquisition of a new liquid-cooled supercomputer named Aion, released in 2021, was the occasion to thoroughly review and optimize the seminal Slurm configuration, the defined resource limits, and the underlying fairsharing algorithm. This paper presents the outcomes of this study and details the implemented RJMS policy, along with the impact of these decisions on the supercomputers' workloads. In particular, the performance evaluation highlights that, compared to the seminal configuration, the described environment brought concrete and measurable improvements with regard to platform utilization (+12.64%), job efficiency (as measured by the average Wall-time Request Accuracy, improved by 110.81%), and management and funding (increased by 10%). The systems demonstrated sustainable and scalable HPC performance, and this effort led to a negligible penalty on the average slowdown metric (response time normalized by runtime), which increased by 0.59% for job workloads covering a complete year of operation. Overall, this new setup has been in production for 18 months on both supercomputers, and the updated model proves to bring a fairer and more satisfying experience to the end users. The proposed configurations and policies may help other HPC centres when designing or improving the RJMS sustaining their job scheduling strategy at the advent of computing capacity expansions.
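The two job-level metrics quoted above can be computed as sketched below from per-job records. The formulas (used over requested walltime for accuracy, response time over runtime for slowdown) follow the common definitions, and the job records are invented, so the paper's exact accounting may differ.

```cpp
// Minimal sketch of the two scheduling metrics, on illustrative job records.
#include <cstdio>
#include <vector>

struct Job { double requested, used, wait; };  // all in seconds

int main() {
    const std::vector<Job> jobs = {
        {3600, 3000, 120}, {7200, 1800, 600}, {1800, 1700, 60},
    };
    double acc = 0, slow = 0;
    for (const Job& j : jobs) {
        acc  += j.used / j.requested;         // wall-time request accuracy
        slow += (j.wait + j.used) / j.used;   // response time / runtime
    }
    std::printf("avg wall-time request accuracy: %.2f\n", acc / jobs.size());
    std::printf("avg slowdown: %.2f\n", slow / jobs.size());
    return 0;
}
```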
{"title":"Optimizing the Resource and Job Management System of an Academic HPC & Research Computing Facility","authors":"S. Varrette, Emmanuel Kieffer, F. Pinel","doi":"10.1109/ISPDC55340.2022.00027","DOIUrl":"https://doi.org/10.1109/ISPDC55340.2022.00027","url":null,"abstract":"High Performance Computing (HPC) is nowadays a strategic asset required to sustain the surging demands for massive processing and data-analytic capabilities. In practice, the effective management of such large scale and distributed computing infrastructures is left to a Resource and Job Management System (RJMS). This essential middleware component is responsible for managing the computing resources, handling user requests to allocate resources while providing an optimized framework for starting, executing and monitoring jobs on the allocated resources. The University of Luxembourg has been operating for 15 years a large academic HPC facility which relies since 2017 on the Slurm RJMS introduced on top of the flagship cluster Iris. The acquisition of a new liquid-cooled supercomputer named Aion which was released in 2021 was the occasion to deeply review and optimize the seminal Slurm configuration, the resource limits defined and the sustaining fairsharing algorithm.This paper presents the outcomes of this study and details the implemented RJMS policy. The impact of the decisions made over the supercomputers workloads is also described. In particular, the performance evaluation conducted highlights that when compared to the seminal configuration, the described and implemented environment brought concrete and measurable improvements with regards the platform utilization (+12.64%), the jobs efficiency (as measured by the average Wall-time Request Accuracy, improved by 110.81%) or the management and funding (increased by 10%). The systems demonstrated sustainable and scalable HPC performances, and this effort has led to a negligible penalty on the average slowdown metric (response time normalized by runtime), which was increased by 0.59% for job workloads covering a complete year of exercise. Overall, this new setup has been in production for 18 months on both supercomputers and the updated model proves to bring a fairer and more satisfying experience to the end users. The proposed configurations and policies may help other HPC centres when designing or improving the RJMS sustaining the job scheduling strategy at the advent of computing capacity expansions.","PeriodicalId":389334,"journal":{"name":"2022 21st International Symposium on Parallel and Distributed Computing (ISPDC)","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127172828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Investigating TCP/MPTCP Support for Drop Computing in User Space Network Stacks
Pub Date: 2022-07-01. DOI: 10.1109/ISPDC55340.2022.00024
C. Stoica, Radu-Ioan Ciobanu, C. Dobre
The tremendous growth of smart devices in the past few years has brought new computational, storage, and networking resources to the edge of the Internet. Using the recently developed Drop Computing paradigm, these devices can be interconnected in an ad-hoc network, using said resources beyond local barriers in order to improve latency and remove the pressure that intensive use of a centralized cloud architecture places on the core Internet network. In order to optimally use the networking capabilities of the devices registered in a Drop Computing network, Multipath TCP (MPTCP) is the key technology: it allows the concurrent usage of all of a device's networking interfaces, leading to smoother reaction to failures, improved latency, and better throughput. Because of the high variety of devices in Drop Computing, some low-end devices may have stripped-down network stacks with many missing features, but the Linux Kernel Library (LKL) and User Mode Linux (UML) implementations are able to offer MPTCP and other features directly in user space. In this paper, we assess the feasibility and analyze the behaviour of TCP and MPTCP in Drop Computing-specific scenarios, using either the Linux native network stack or the LKL/UML user space network stack. We demonstrate that MPTCP can be used successfully over any of them and that the most suitable one should be selected based on the hardware profile of the device and the target software application. Furthermore, we show that LKL and UML can be successfully utilised on low-end devices to allow them to use all their network interfaces and to provide a better failure handover solution.
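On Linux (kernel 5.6 and later), the native-stack MPTCP path discussed above boils down to requesting IPPROTO_MPTCP at socket creation; a minimal sketch with a fallback to plain TCP follows, with error handling abbreviated.

```cpp
// Minimal sketch: create an MPTCP socket, falling back to TCP when the
// kernel or device stack does not support it.
#include <cerrno>
#include <cstdio>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262   // value from <netinet/in.h> on recent kernels
#endif

int main() {
    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);
    if (fd < 0 && (errno == EPROTONOSUPPORT || errno == EINVAL)) {
        std::printf("MPTCP unavailable, falling back to TCP\n");
        fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    }
    if (fd < 0) { std::perror("socket"); return 1; }
    std::printf("socket created (fd=%d)\n", fd);
    close(fd);
    return 0;
}
```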
{"title":"Investigating TCP/MPTCP Support for Drop Computing in User Space Network Stacks","authors":"C. Stoica, Radu-Ioan Ciobanu, C. Dobre","doi":"10.1109/ISPDC55340.2022.00024","DOIUrl":"https://doi.org/10.1109/ISPDC55340.2022.00024","url":null,"abstract":"The tremendous growth of smart devices in the past few years has brought new computational, storage and networking resources at the edge of the Internet. Using the recently-developed Drop Computing paradigm, these devices can be interconnected in an ad-hoc network, using said resources beyond local barriers in order to improve the latency and to remove the pressure from the core Internet network caused by the intensive usage of a centralized cloud architecture. In order to use the networking capabilities of the devices registered in a Drop Computing network in an optimal manner, multi-path TCP (MPTCP) is the key technology which allows the concurrent usage of all the devices’ networking interfaces, leading to a smoother reaction to failures, improved latency, and better throughput. Because of the high variety of devices in Drop Computing, some low-end devices can have stripped network stacks with many missing features, but the Linux Kernel Library (LKL) or User Mode Linux (UML) implementations are able to offer MPTCP and other features directly in user space.In this paper, we assess the feasibility and analyze the behaviour of TCP and MPTCP in Drop Computing-specific scenarios, using the Linux native network stack or the LKL/UML user space network stack. We demonstrate that MPTCP can be used successfully over any of them and that the most suitable one should be selected based on the hardware profile of the device used and the target software application. Furthermore, we prove that LKL and UML can be successfully utilised on low-end devices in order to allow them to use all their network interfaces and have a better failure handover solution.","PeriodicalId":389334,"journal":{"name":"2022 21st International Symposium on Parallel and Distributed Computing (ISPDC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125790839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Online Event Selection for Mu3e using GPUs
Pub Date: 2022-06-23. DOI: 10.1109/ISPDC55340.2022.00012
Valentin Henkys, B. Schmidt, N. Berger
In the search for physics beyond the Standard Model, the Mu3e experiment tries to observe the lepton-flavor-violating decay μ+ → e+e−e+. By observing the decay products of 1 × 10^8 μ/s, it aims either to observe the process or to set a new upper limit on its estimated branching ratio. The high muon rates result in high data rates of 80 Gbps, dominated by data produced through background processes. We present the Online Event Selection, a three-step algorithm running on the graphics processing units (GPUs) of the 12 Mu3e filter farm computers. Using simple and fast geometric selection criteria, the algorithm first reduces the number of possible event candidates to below 5% of the initial set. These candidates are then used to reconstruct full particle tracks, correctly reconstructing over 97% of signal tracks. Finally, a possible decay vertex is reconstructed using simple geometric considerations instead of a full reconstruction, correctly identifying over 94% of signal events. We also present a full implementation of the algorithm, fulfilling all performance requirements at the targeted muon rate and successfully reducing the data rate by a factor of 200.
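To give a feel for the first step, the sketch below applies a cheap geometric cut, the bending angle between consecutive hit segments in the transverse plane, to discard candidate hit triplets before any track fit; the cut variable and threshold are invented for illustration and are not the experiment's actual criteria.

```cpp
// Hedged sketch of a geometric pre-selection cut on hit triplets.
#include <cmath>
#include <cstdio>
#include <vector>

struct Hit { double x, y; };
struct Triplet { Hit a, b, c; };

// Bending angle between segments (a->b) and (b->c) in the transverse plane.
double bendAngle(const Triplet& t) {
    double v1x = t.b.x - t.a.x, v1y = t.b.y - t.a.y;
    double v2x = t.c.x - t.b.x, v2y = t.c.y - t.b.y;
    double cosang = (v1x * v2x + v1y * v2y) /
                    (std::hypot(v1x, v1y) * std::hypot(v2x, v2y));
    return std::acos(cosang);
}

int main() {
    const std::vector<Triplet> candidates = {
        {{0, 0}, {1, 0.10}, {2, 0.25}},   // gently curving: plausible track
        {{0, 0}, {1, 0.90}, {2, 0.00}},   // sharp kink: rejected cheaply
    };
    const double maxBend = 0.3;           // radians; illustrative threshold
    for (size_t i = 0; i < candidates.size(); ++i)
        std::printf("triplet %zu: %s\n", i,
                    bendAngle(candidates[i]) < maxBend ? "keep" : "reject");
    return 0;
}
```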
{"title":"Online Event Selection for Mu3e using GPUs","authors":"Valentin Henkys, B. Schmidt, N. Berger","doi":"10.1109/ISPDC55340.2022.00012","DOIUrl":"https://doi.org/10.1109/ISPDC55340.2022.00012","url":null,"abstract":"In the search for physics beyond the Standard Model the Mu3e experiment tries to observe the lepton flavor violating decay μ+ → e+e–e+. By observing the decay products of 1 • 108μ/s it aims to either observe the process, or set a new upper limit on its estimated branching ratio. The high muon rates result in high data rates of 80 Gbps, dominated by data produced through background processes. We present the Online Event Selection, a three step algorithm running on the graphics processing units (GPU) of the 12 Mu3e filter farm computers.By using simple and fast geometric selection criteria, the algorithm first reduces the amount of possible event candidates to below 5% of the initial set. These candidates are then used to reconstruct full particle tracks, correctly reconstructing over 97% of signal tracks. Finally a possible decay vertex is reconstructed using simple geometric considerations instead of a full reconstruction, correctly identifying over 94% of signal events.We also present a full implementation of the algorithm, fulfilling all performance requirements at the targeted muon rate and successfully reducing the data rate by a factor of 200.","PeriodicalId":389334,"journal":{"name":"2022 21st International Symposium on Parallel and Distributed Computing (ISPDC)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132487672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}