APPFIS: An Advanced Parallel Programming Framework for Iterative Stencil Based Scientific Applications in HPC Environments
Pub Date: 2022-07-01. DOI: 10.1109/ISPDC55340.2022.00019
Md Bulbul Sharif, S. Ghafoor
Developing performant parallel applications for distributed environments is challenging and requires expertise in both the HPC system and the application domain. We have developed a C++-based framework called APPFIS that hides the system complexities by providing an easy-to-use interface for developing performance-portable structured grid-based stencil applications. APPFIS’s user interface is hardware agnostic and provides partitioning, code optimization, and automatic communication for stencil applications in distributed HPC environments. In addition, it offers straightforward APIs for utilizing multiple GPU accelerators, shared memory, and node-level parallelization, with automatic optimization for computation and communication overlapping. We have tested the functionality and performance of APPFIS using several applications on three platforms (Stampede2 at the Texas Advanced Computing Center, Bridges-2 at the Pittsburgh Supercomputing Center, and the Summit supercomputer at Oak Ridge National Laboratory). Experimental results show performance comparable to hand-tuned code, with excellent strong and weak scalability up to 4096 CPUs and 384 GPUs.
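To make the contribution concrete, here is a minimal sketch of the halo-exchange boilerplate that a framework like APPFIS automates for its users. It is plain MPI and C++ for a 1-D heat equation with a 3-point stencil; the decomposition, names, and parameters are illustrative, not APPFIS's actual API.

```cpp
// Hedged sketch: the kind of MPI halo-exchange code a stencil framework
// hides behind its API. 1-D diffusion of an initial heat spike.
#include <mpi.h>
#include <utility>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int N = 1024;                 // global grid points (assume size divides N)
    const int local = N / size;
    std::vector<double> u(local + 2, 0.0), un(local + 2, 0.0);
    if (rank == 0) u[1] = 100.0;        // initial heat spike

    int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    for (int step = 0; step < 1000; ++step) {
        // Halo exchange: send edge cells, receive ghost cells.
        MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                     &u[local + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[local], 1, MPI_DOUBLE, right, 1,
                     &u[0], 1, MPI_DOUBLE, left, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        // 3-point stencil update on the interior cells.
        for (int i = 1; i <= local; ++i)
            un[i] = u[i] + 0.25 * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
        std::swap(u, un);
    }
    MPI_Finalize();
    return 0;
}
```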
{"title":"APPFIS: An Advanced Parallel Programming Framework for Iterative Stencil Based Scientific Applications in HPC Environments","authors":"Md Bulbul Sharif, S. Ghafoor","doi":"10.1109/ISPDC55340.2022.00019","DOIUrl":"https://doi.org/10.1109/ISPDC55340.2022.00019","url":null,"abstract":"Developing performant parallel applications for the distributed environment is challenging and requires expertise in both the HPC system and the application domain. We have developed a C++-based framework called APPFIS that hides the system complexities by providing an easy-to-use interface for developing performance portable structured grid-based stencil applications. APPFIS’s user interface is hardware agnostic and provides partitioning, code optimization, and automatic communication for stencil applications in distributed HPC environment. In addition, it offers straightforward APIs for utilizing multiple GPU accelerators, shared memory, and node-level parallelizations with automatic optimization for computation and communication overlapping. We have tested the functionality and performance of APPFIS using several applications on three platforms (Stampede2 at Texas Advanced Computing Center, Bridges-2 at Pittsburgh Supercomputing Center, and Summit Supercomputer at Oak Ridge National Laboratory). Experimental results show comparable performance to hand-tuned code with an excellent strong and weak scalability up to 4096 CPUs and 384 GPUs.","PeriodicalId":389334,"journal":{"name":"2022 21st International Symposium on Parallel and Distributed Computing (ISPDC)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134044313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analysis and Mitigation of Soft-Errors on High Performance Embedded GPUs
Pub Date: 2022-07-01. DOI: 10.1109/ISPDC55340.2022.00022
L. Sterpone, S. Azimi, C. D. Sio, Filippo Parisi
Multiprocessor systems-on-chip such as embedded GPUs are becoming very popular in safety-critical applications, such as autonomous and semi-autonomous vehicles. However, these devices can suffer from the effects of soft errors, such as those produced by radiation, which can generate unpredictable misbehaviors. Fault tolerance oriented to multi-threaded software introduces severe performance degradation due to the redundancy, voting, and correction thread operations. In this paper, we propose a new fault injection environment for NVIDIA GPGPU devices and a fault tolerance approach based on error detection and correction threads executed during data transfer operations on embedded GPUs. The fault injection environment automatically injects faults into instructions at the SASS level by instrumenting the CUDA binary executable file. The mitigation approach is based on concurrent error detection threads running simultaneously with device-to-host memory stream transfer operations. Using several benchmark applications, we evaluate the impact of soft errors, classifying outcomes as Silent Data Corruption, Detection, Unrecoverable Error, and Hang. Finally, the proposed mitigation approach has been validated by soft-error fault injection campaigns on an NVIDIA Pascal architecture GPU controlled by a quad-core ARM Cortex-A57 processor (Jetson TX2), demonstrating an advantage of more than 37% with respect to a state-of-the-art solution.
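As a rough illustration of the detection idea (not the paper's implementation), the sketch below duplicates a result buffer on the GPU, overlaps the two device-to-host copies on separate CUDA streams, and compares the copies on the host; a mismatch flags silent data corruption. The kernel launches are elided and all names are hypothetical.

```cpp
// Hypothetical duplicate-and-compare sketch; in the paper the detection
// threads run concurrently with the transfers rather than after them.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const size_t n = 1 << 20;
    float *d_a, *d_b, *h_a, *h_b;
    cudaMalloc(&d_a, n * sizeof(float));
    cudaMalloc(&d_b, n * sizeof(float));
    cudaMallocHost(&h_a, n * sizeof(float));   // pinned, so copies can overlap
    cudaMallocHost(&h_b, n * sizeof(float));
    // ... launch the same kernel twice, writing its result to d_a and d_b ...

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);
    cudaMemcpyAsync(h_a, d_a, n * sizeof(float), cudaMemcpyDeviceToHost, s1);
    cudaMemcpyAsync(h_b, d_b, n * sizeof(float), cudaMemcpyDeviceToHost, s2);
    cudaStreamSynchronize(s1);
    cudaStreamSynchronize(s2);

    // Compare the redundant copies: any mismatch is detected silent data
    // corruption in one of the executions or transfers.
    size_t mismatches = 0;
    for (size_t i = 0; i < n; ++i)
        if (h_a[i] != h_b[i]) ++mismatches;
    std::printf("mismatching words: %zu\n", mismatches);

    cudaFree(d_a); cudaFree(d_b);
    cudaFreeHost(h_a); cudaFreeHost(h_b);
    return 0;
}
```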
{"title":"Analysis and Mitigation of Soft-Errors on High Performance Embedded GPUs","authors":"L. Sterpone, S. Azimi, C. D. Sio, Filippo Parisi","doi":"10.1109/ISPDC55340.2022.00022","DOIUrl":"https://doi.org/10.1109/ISPDC55340.2022.00022","url":null,"abstract":"Multiprocessor system-on-chip such as embedded GPUs are becoming very popular in safety-critical applications, such as autonomous and semi-autonomous vehicles. However, these devices can suffer from the effects of soft-errors, such as those produced by radiation effects. These effects are able to generate unpredictable misbehaviors. Fault tolerance oriented to multi-threaded software introduces severe performance degradations due to the redundancy, voting and correction threads operations. In this paper, we propose a new fault injection environment for NVIDIA GPGPU devices and a fault tolerance approach based on error detection and correction threads executed during data transfer operations on embedded GPUs. The fault injection environment is capable of automatically injecting faults into the instructions at SASS level by instrumenting the CUDA binary executable file. The mitigation approach is based on concurrent error detection threads running simultaneously with the memory stream device to host data transfer operations. With several benchmark applications, we evaluate the impact of soft- errors classifying Silent Data Corruption, Detection, Unrecoverable Error and Hang. Finally, the proposed mitigation approach has been validated by soft-error fault injection campaigns on an NVIDIA Pascal Architecture GPU controlled by Quad-Core A57 ARM processor (JETSON TX2) demonstrating an advantage of more than 37% with respect to state of the art solution.","PeriodicalId":389334,"journal":{"name":"2022 21st International Symposium on Parallel and Distributed Computing (ISPDC)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122610678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TrustS: Probability-based trust management system in smart cities
Bogdan-Costel Mocanu, Gabriel-Cosmin Apostol, Dragos-Mihai Radulescu, Cristina Serbanescu
Pub Date: 2022-07-01. DOI: 10.1109/ISPDC55340.2022.00018
We live in the era of smart cities, but are the cities of today as smart as they claim? So far, this desideratum has been on the agenda of decision-makers worldwide, yet smart cities have not reached their full potential. Even though important milestones have been achieved in this process, we anticipate that the next generation of smart cities will improve the quality of life for citizens through connected and ubiquitous roads, vehicles, and urban environments. Given such a massive number of devices and so much data, it is crucial to ensure that the interconnected systems are trusted and reliable. Classical approaches to security are not suitable in these scenarios because of the heterogeneity of the interconnected nodes. Thus, in this paper, we present a trust management system for smart cities based on a novel Markov trust computation approach that does not depend on the type of overlay network. We define a Markovian system with four states for which we compute the stationary probabilities. We validate our model through simulation, considering different characteristics of the proposed systems. The results show that our trust model has a deterministic behavior and is suitable for smart city scenarios.
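For readers unfamiliar with stationary probabilities, here is a minimal worked example: power iteration on a four-state Markov chain, the computation the trust model requires. The transition matrix is invented for illustration; the paper's actual trust states and probabilities are not reproduced here.

```cpp
// Minimal sketch: stationary distribution of a 4-state Markov chain
// by repeated application of pi <- pi * P. Matrix values are illustrative.
#include <array>
#include <cstdio>

int main() {
    // P[i][j] = probability of moving from state i to state j; rows sum to 1.
    const std::array<std::array<double, 4>, 4> P = {{
        {0.70, 0.20, 0.05, 0.05},
        {0.10, 0.60, 0.20, 0.10},
        {0.05, 0.15, 0.70, 0.10},
        {0.05, 0.05, 0.10, 0.80},
    }};
    std::array<double, 4> pi = {0.25, 0.25, 0.25, 0.25};  // uniform start
    for (int it = 0; it < 10000; ++it) {
        std::array<double, 4> next = {0, 0, 0, 0};
        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j)
                next[j] += pi[i] * P[i][j];
        pi = next;                        // converges to the stationary vector
    }
    for (int j = 0; j < 4; ++j) std::printf("pi[%d] = %.4f\n", j, pi[j]);
    return 0;
}
```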
{"title":"TrustS: Probability-based trust management system in smart cities","authors":"Bogdan-Costel Mocanu, Gabriel-Cosmin Apostol, Dragos-Mihai Radulescu, Cristina Serbanescu","doi":"10.1109/ISPDC55340.2022.00018","DOIUrl":"https://doi.org/10.1109/ISPDC55340.2022.00018","url":null,"abstract":"We live in the era of smart cities, but are the cities of today as smart as they claimƒ So far, this desideratum has been on the agenda of almost all decision-making factors worldwide and since then, smart cities have not hit their full potential. Even though there have been achieved important milestones in this process, we assume that the next generation of smart cities will improve the quality of life for citizens through connected and ubiquitous roads, vehicles, and urban environments. When we take into consideration such a massive amount of devices and data, it is crucial to ensure that the interconnected systems and trusted and reliable. Classical approaches for security are not suitable in these scenarios because of the heterogeneity of the interconnected nodes. Thus, in this paper, we present a trust management system for smart cities based on a novel Markov trust computation approach that is not dependent on the type of overlay network. We define a markovian system with four states for which we compute the stationary probabilities. We validate or model through simulation considering different characteristics of proposed systems. The results show that our trust model has a deterministic behavior and is suitable for smart city scenarios.","PeriodicalId":389334,"journal":{"name":"2022 21st International Symposium on Parallel and Distributed Computing (ISPDC)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129314591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Coarse-Grained Floorplanning for streaming CNN applications on Multi-Die FPGAs
Pub Date: 2022-07-01. DOI: 10.1109/ISPDC55340.2022.00014
Danielle Tchuinkou Kwadjo, Erman Nghonda Tchinda, C. Bobda
With the vast adoption of FPGAs in the cloud, it becomes necessary to investigate architectures and mechanisms for the efficient deployment of CNNs into multi-FPGA cloud infrastructure. However, the growing size and complexity of neural networks, coupled with communication and off-chip memory bottlenecks, make it increasingly difficult for multi-FPGA designs to achieve high resource utilization. In this work, we introduce a scalable framework that supports the efficient integration of CNN applications into a cloud infrastructure that exposes multi-die FPGAs to cloud developers. Our framework is equipped with two mechanisms to facilitate the deployment of CNN inference on FPGAs. First, we propose a model to find the parameters that maximize parallelism within the resource budget while maintaining a balanced rate between the layers. Then, we propose an efficient coarse-grained graph partitioning algorithm for high-quality and scalable routability-driven placement of CNN components on the FPGAs. Prototyping results achieve an overall 37% higher frequency, with lower resource usage compared to a baseline implementation on the same number of FPGAs.
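The paper's partitioning algorithm is not reproduced here, but the sketch below conveys the flavor of coarse-grained chain partitioning: greedily packing a sequence of CNN layers onto k dies under a per-die target, so that inter-die crossings occur only at die boundaries. Layer costs and the target heuristic are invented for illustration; real flows refine such seeds with routability-driven moves.

```cpp
// Hedged sketch (not the paper's algorithm): contiguous greedy assignment
// of a layer chain to k dies, closing a die once its target load is reached.
#include <cstdio>
#include <vector>

int main() {
    const std::vector<double> layerCost = {1.0, 2.5, 2.0, 3.0, 1.5, 2.0, 1.0};
    const int k = 3;
    double total = 0;
    for (double c : layerCost) total += c;
    const double target = total / k;      // ideal per-die load

    std::vector<int> dieOf(layerCost.size());
    int die = 0;
    double used = 0;
    for (size_t i = 0; i < layerCost.size(); ++i) {
        dieOf[i] = die;
        used += layerCost[i];
        if (used >= target && die < k - 1) { ++die; used = 0; }
    }
    for (size_t i = 0; i < dieOf.size(); ++i)
        std::printf("layer %zu -> die %d\n", i, dieOf[i]);
    return 0;
}
```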
{"title":"Coarse-Grained Floorplanning for streaming CNN applications on Multi-Die FPGAs","authors":"Danielle Tchuinkou Kwadjo, Erman Nghonda Tchinda, C. Bobda","doi":"10.1109/ISPDC55340.2022.00014","DOIUrl":"https://doi.org/10.1109/ISPDC55340.2022.00014","url":null,"abstract":"With the vast adoption of FPGAs in the cloud, it becomes necessary to investigate architectures and mechanisms for the efficient deployment of CNN into multi-FPGAs cloud Infrastructure. However, neural networks’ growing size and complexity, coupled with communication and off-chip memory bottlenecks, make it increasingly difficult for multi-FPGA designs to achieve high resource utilization. In this work, we introduce a scalable framework that supports the efficient integration of CNN applications into a cloud infrastructure that exposes multi-Die FPGAs to cloud developers. Our framework is equipped is with two mechanisms to facilitate the deployment of CNN inference on FPGA. First, we propose a model to find the parameters that maximize the parallelism within the resource budget while maintaining a balanced rate between the layers. Then, we propose an efficient Coarse-Grained graph partitioning algorithm for high-quality and scalable routability-drive placement of CNN’s components on the FPGAs. Prototyping results achieve an overall 37% higher frequency, with lower resource usage compared to a baseline implementation on the same number of FPGAs.","PeriodicalId":389334,"journal":{"name":"2022 21st International Symposium on Parallel and Distributed Computing (ISPDC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130010400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ZPaxos: An Asynchronous BFT Paxos with a Leaderless Synchronous Group
Pub Date: 2022-07-01. DOI: 10.1109/ISPDC55340.2022.00025
D. D. Amarasekara, D. Ranasinghe
Increased resource overhead and low throughput are major issues with current BFT consensus systems. EPaxos [48], a leaderless crash fault-tolerant protocol, can commit quickly within a single phase using a fast quorum, provided that the commands are independent. If the commands have dependencies, EPaxos needs to run an additional phase on a majority quorum. As a result, it loses throughput due to the number of command instances it must keep in memory and the high volume of messages it must exchange with other replicas to reach consensus. Alternatively, XPaxos [43], a leader-based system, solves practical BFT with minimal resources while ensuring high throughput. This paper describes an improved consensus mechanism, ZPaxos, an enhanced EPaxos embedding two key features of XPaxos: a synchronous group with fault detection, and a recovery protocol under weak asynchrony assumptions. The new protocol achieves consensus in a single round with a leaderless synchronous group, which allows the removal and addition of one node at a time whenever a crashed or Byzantine replica is detected, while the system continues servicing requests without significant interruption throughout view synchrony. ZPaxos, which handles both Byzantine and crash failures, exhibits superior transaction throughput and latency to EPaxos, which handles crash failures only; both are leaderless core-group architectures.
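As a worked example of the fast/slow path distinction, the snippet below tabulates EPaxos's two quorum sizes for N = 2F + 1 replicas: the fast-path quorum of F + ⌊(F+1)/2⌋ used for single-round commits of independent commands, and the simple majority of F + 1 used by the extra phase when dependencies force the slow path (for very small N the two coincide).

```cpp
// Worked example: EPaxos quorum sizes for N = 2F + 1 replicas.
#include <cstdio>

int main() {
    for (int F = 1; F <= 5; ++F) {
        int N = 2 * F + 1;
        int fast = F + (F + 1) / 2;   // fast-path quorum (one-round commit)
        int slow = F + 1;             // majority quorum (dependency phase)
        std::printf("N=%2d  F=%d  fast quorum=%d  slow quorum=%d\n",
                    N, F, fast, slow);
    }
    return 0;
}
```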
{"title":"ZPaxos: An Asynchronous BFT Paxos with a Leaderless Synchronous Group","authors":"D. D. Amarasekara, D. Ranasinghe","doi":"10.1109/ISPDC55340.2022.00025","DOIUrl":"https://doi.org/10.1109/ISPDC55340.2022.00025","url":null,"abstract":"Increased resource overhead and low throughput are major issues associated with current BFT consensus systems. EPaxos [48], a leaderless crash fault-tolerant protocol can perform commits quickly within a single phase using a fast quorum provided that the commands are independent. If the commands have dependencies, EPaxos needs to run an additional phase on a majority quorum. As a result, it would lose its throughput due to the number of command instances it needs to keep in memory and the high volume of messages it needs to exchange with other replicas to reach consensus. Alternatively, XPaxos [43] which is a leader-based system solves practical BFT with minimal resources while assuring high throughput. This paper describes an improved consensus mechanism, ZPaxos, which is an enhanced EPaxos embedding two key features of XPaxos i.e., a synchronous group with fault detection, and a recovery protocol under weak asynchrony assumptions. The new protocol achieves consensus in a single round with a leaderless synchronous group which allows the removal and addition of a node at a time whenever it sees a crashed or a byzantine replica while the system continues servicing requests without any significant interruptions throughout view synchrony. Transaction throughput and latency of ZPaxos which can handle both byzantine and crash failures exhibit superior performance to that of EPaxos which handles crash failures only, which are two leaderless core group architectures.","PeriodicalId":389334,"journal":{"name":"2022 21st International Symposium on Parallel and Distributed Computing (ISPDC)","volume":"31 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132758178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FlexiShard: a Flexible Sharding Scheme for Blockchain based on a Hybrid Fault Model
Pub Date: 2022-07-01. DOI: 10.1109/ISPDC55340.2022.00011
Tirathraj Ramburn, D. Goswami
One of the major bottlenecks of traditional Blockchain is its low throughput, resulting in poor scalability. One way to increase throughput is to shard the network nodes to form smaller groups (shards). There are a number of sharding schemes in the literature with a common goal: nodes are split into groups to concurrently process different sets of transactions. Parallelism is used to enhance scalability, however with a trade-off in fault tolerance: the smaller the shard size, the better the performance but the higher the fault probability. Contemporary sharding schemes use variants of the Byzantine Fault Tolerance (BFT) protocol as their intra-shard consensus algorithms. BFT gives good performance when shard sizes are kept relatively small and the maximum number of allowable faults is below some threshold. However, all these systems make rigid assumptions about their shard sizes and maximum allowable faults, which may not always be practical. In recent years, more practical hybrid fault models have appeared in the literature that are better suited to Blockchain (e.g., a hybrid of Byzantine and alive-but-corrupt (abc) faults, where the latter only compromise safety), along with corresponding consensus protocols that offer flexibility in the choice of fault types and quorum sizes, e.g., Flexible Byzantine Fault Tolerance (Flexible BFT). In this paper, we present a new sharding scheme, FlexiShard, that uses Flexible BFT as its intra-shard consensus algorithm. FlexiShard leverages the notion of flexible Byzantine quorums and the hybrid fault model introduced in Flexible BFT, which comprises Byzantine and abc faults. Flexible BFT allows flexibility in the choice of fault types and in choosing shard sizes based on a range of allowable fault thresholds. Additionally, it allows shards that can tolerate more total faults than traditional BFT shards of similar size, and hence can deliver similar performance with more fault tolerance. To the best of our knowledge, FlexiShard is the first application of Flexible BFT and the hybrid fault model to Blockchain and its sharding. A theoretical analysis of FlexiShard is presented which demonstrates its flexibility and advantages over traditional sharding schemes.
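A standard way to quantify the size-versus-safety trade-off mentioned above is the hypergeometric failure probability of a randomly sampled shard; the sketch below computes it for a classical BFT threshold of ⌊(s−1)/3⌋, with all population numbers invented for illustration (the paper's hybrid-model analysis differs in the thresholds it permits).

```cpp
// Hedged sketch: P(a shard of size s, sampled without replacement from
// n nodes of which f are faulty, contains more than t faulty members),
// where t is the per-shard BFT tolerance. Hypergeometric tail sum.
#include <algorithm>
#include <cmath>
#include <cstdio>

double logChoose(int n, int k) {
    return std::lgamma(n + 1.0) - std::lgamma(k + 1.0) - std::lgamma(n - k + 1.0);
}

int main() {
    const int n = 2000, f = 500, s = 100;   // illustrative numbers
    const int t = (s - 1) / 3;              // max tolerable faults per shard
    double pFail = 0.0;
    for (int x = t + 1; x <= std::min(f, s); ++x)
        pFail += std::exp(logChoose(f, x) + logChoose(n - f, s - x)
                          - logChoose(n, s));
    std::printf("P(shard of %d exceeds %d faulty) = %.3e\n", s, t, pFail);
    return 0;
}
```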
{"title":"FlexiShard: a Flexible Sharding Scheme for Blockchain based on a Hybrid Fault Model","authors":"Tirathraj Ramburn, D. Goswami","doi":"10.1109/ISPDC55340.2022.00011","DOIUrl":"https://doi.org/10.1109/ISPDC55340.2022.00011","url":null,"abstract":"One of the major bottlenecks of traditional Blockchain is its low throughput resulting in poor scalability. One way to increase throughput is to shard the network nodes to form smaller groups (shards). There are a number of sharding schemes in the literature with a common goal: nodes are split into groups to concurrently process different sets of transactions. Parallelism is used to enhance scalability, however with a trade-off in fault-tolerance; i.e., the smaller the shard size is, the better is the performance but higher is the fault probability. Contemporary sharding schemes use variants of Byzantine Fault Tolerance (BFT) protocol as their intra-shard consensus algorithms. BFT gives good performance when shard sizes are kept relatively small and maximum allowable faults is below some threshold. However, all these systems make rigid assumptions about their shard sizes and maximum allowable faults which may not be practical at times. In recent years, there have been more practical hybrid fault models in the literature which are better applicable to Blockchain (e.g., hybrid of Byzantine and alive-but-corrupt (abc) faults where the latter only compromises on safety) and corresponding consensus protocols that offer flexibility in choice of fault types and quorum sizes, e.g., Flexible Byzantine Fault Tolerance (Flexible BFT). In this paper, we present a new sharding scheme, FlexiShard, that uses Flexible BFT as its intra-shard consensus algorithm. FlexiShard leverages the notion of flexible Byzantine quorums and the hybrid fault model introduced in Flexible BFT that comprises of Byzantine and abc faults. Use of Flexible BFT allows flexibility in the choice of fault types and choosing shard sizes based on a range of allowable fault thresholds. Additionally, it allows to form shards that can tolerate more total faults than traditional BFT shards of similar size, and hence can deliver similar performance but with more fault-tolerance. To the best of our knowledge, FlexiShard is the first application of Flexible BFT and the hybrid fault model to Blockchain and its sharding. A theoretical analysis of FlexiShard is presented which demonstrates its flexibility and advantages over traditional sharding schemes.","PeriodicalId":389334,"journal":{"name":"2022 21st International Symposium on Parallel and Distributed Computing (ISPDC)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132019811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimizing the Resource and Job Management System of an Academic HPC & Research Computing Facility
Pub Date: 2022-07-01. DOI: 10.1109/ISPDC55340.2022.00027
S. Varrette, Emmanuel Kieffer, F. Pinel
High Performance Computing (HPC) is nowadays a strategic asset required to sustain the surging demand for massive processing and data-analytic capabilities. In practice, the effective management of such large-scale, distributed computing infrastructures is left to a Resource and Job Management System (RJMS). This essential middleware component is responsible for managing the computing resources and handling user requests to allocate resources, while providing an optimized framework for starting, executing, and monitoring jobs on the allocated resources. The University of Luxembourg has operated a large academic HPC facility for 15 years, which since 2017 has relied on the Slurm RJMS introduced on top of the flagship cluster Iris. The acquisition of a new liquid-cooled supercomputer named Aion, released in 2021, was the occasion to thoroughly review and optimize the seminal Slurm configuration, the defined resource limits, and the underlying fairsharing algorithm. This paper presents the outcomes of this study and details the implemented RJMS policy, along with the impact of these decisions on the supercomputers' workloads. In particular, the performance evaluation highlights that, compared to the seminal configuration, the described environment brought concrete and measurable improvements with regard to platform utilization (+12.64%), job efficiency (as measured by the average Wall-time Request Accuracy, improved by 110.81%), and management and funding (increased by 10%). The systems demonstrated sustainable and scalable HPC performance, and this effort led to a negligible penalty on the average slowdown metric (response time normalized by runtime), which increased by 0.59% for job workloads covering a complete year of operation. Overall, this new setup has been in production for 18 months on both supercomputers, and the updated model proves to bring a fairer and more satisfying experience to the end users. The proposed configurations and policies may help other HPC centres when designing or improving the RJMS sustaining their job scheduling strategy at the advent of computing capacity expansions.
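The two job-level metrics quoted above can be computed as sketched below from per-job records. The formulas (used over requested walltime for accuracy, response time over runtime for slowdown) follow the common definitions, and the job records are invented, so the paper's exact accounting may differ.

```cpp
// Minimal sketch of the two scheduling metrics, on illustrative job records.
#include <cstdio>
#include <vector>

struct Job { double requested, used, wait; };  // all in seconds

int main() {
    const std::vector<Job> jobs = {
        {3600, 3000, 120}, {7200, 1800, 600}, {1800, 1700, 60},
    };
    double acc = 0, slow = 0;
    for (const Job& j : jobs) {
        acc  += j.used / j.requested;         // wall-time request accuracy
        slow += (j.wait + j.used) / j.used;   // response time / runtime
    }
    std::printf("avg wall-time request accuracy: %.2f\n", acc / jobs.size());
    std::printf("avg slowdown: %.2f\n", slow / jobs.size());
    return 0;
}
```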
{"title":"Optimizing the Resource and Job Management System of an Academic HPC & Research Computing Facility","authors":"S. Varrette, Emmanuel Kieffer, F. Pinel","doi":"10.1109/ISPDC55340.2022.00027","DOIUrl":"https://doi.org/10.1109/ISPDC55340.2022.00027","url":null,"abstract":"High Performance Computing (HPC) is nowadays a strategic asset required to sustain the surging demands for massive processing and data-analytic capabilities. In practice, the effective management of such large scale and distributed computing infrastructures is left to a Resource and Job Management System (RJMS). This essential middleware component is responsible for managing the computing resources, handling user requests to allocate resources while providing an optimized framework for starting, executing and monitoring jobs on the allocated resources. The University of Luxembourg has been operating for 15 years a large academic HPC facility which relies since 2017 on the Slurm RJMS introduced on top of the flagship cluster Iris. The acquisition of a new liquid-cooled supercomputer named Aion which was released in 2021 was the occasion to deeply review and optimize the seminal Slurm configuration, the resource limits defined and the sustaining fairsharing algorithm.This paper presents the outcomes of this study and details the implemented RJMS policy. The impact of the decisions made over the supercomputers workloads is also described. In particular, the performance evaluation conducted highlights that when compared to the seminal configuration, the described and implemented environment brought concrete and measurable improvements with regards the platform utilization (+12.64%), the jobs efficiency (as measured by the average Wall-time Request Accuracy, improved by 110.81%) or the management and funding (increased by 10%). The systems demonstrated sustainable and scalable HPC performances, and this effort has led to a negligible penalty on the average slowdown metric (response time normalized by runtime), which was increased by 0.59% for job workloads covering a complete year of exercise. Overall, this new setup has been in production for 18 months on both supercomputers and the updated model proves to bring a fairer and more satisfying experience to the end users. The proposed configurations and policies may help other HPC centres when designing or improving the RJMS sustaining the job scheduling strategy at the advent of computing capacity expansions.","PeriodicalId":389334,"journal":{"name":"2022 21st International Symposium on Parallel and Distributed Computing (ISPDC)","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127172828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Investigating TCP/MPTCP Support for Drop Computing in User Space Network Stacks
Pub Date: 2022-07-01. DOI: 10.1109/ISPDC55340.2022.00024
C. Stoica, Radu-Ioan Ciobanu, C. Dobre
The tremendous growth of smart devices in the past few years has brought new computational, storage, and networking resources to the edge of the Internet. Using the recently developed Drop Computing paradigm, these devices can be interconnected in an ad-hoc network, using said resources beyond local barriers in order to improve latency and remove the pressure that intensive use of a centralized cloud architecture places on the core Internet network. In order to optimally use the networking capabilities of the devices registered in a Drop Computing network, Multipath TCP (MPTCP) is the key technology: it allows the concurrent usage of all of a device's networking interfaces, leading to smoother reaction to failures, improved latency, and better throughput. Because of the high variety of devices in Drop Computing, some low-end devices may have stripped-down network stacks with many missing features, but the Linux Kernel Library (LKL) and User Mode Linux (UML) implementations are able to offer MPTCP and other features directly in user space. In this paper, we assess the feasibility and analyze the behaviour of TCP and MPTCP in Drop Computing-specific scenarios, using either the Linux native network stack or the LKL/UML user space network stack. We demonstrate that MPTCP can be used successfully over any of them and that the most suitable one should be selected based on the hardware profile of the device and the target software application. Furthermore, we show that LKL and UML can be successfully utilised on low-end devices to allow them to use all their network interfaces and to provide a better failure handover solution.
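On Linux (kernel 5.6 and later), the native-stack MPTCP path discussed above boils down to requesting IPPROTO_MPTCP at socket creation; a minimal sketch with a fallback to plain TCP follows, with error handling abbreviated.

```cpp
// Minimal sketch: create an MPTCP socket, falling back to TCP when the
// kernel or device stack does not support it.
#include <cerrno>
#include <cstdio>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262   // value from <netinet/in.h> on recent kernels
#endif

int main() {
    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);
    if (fd < 0 && (errno == EPROTONOSUPPORT || errno == EINVAL)) {
        std::printf("MPTCP unavailable, falling back to TCP\n");
        fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    }
    if (fd < 0) { std::perror("socket"); return 1; }
    std::printf("socket created (fd=%d)\n", fd);
    close(fd);
    return 0;
}
```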
{"title":"Investigating TCP/MPTCP Support for Drop Computing in User Space Network Stacks","authors":"C. Stoica, Radu-Ioan Ciobanu, C. Dobre","doi":"10.1109/ISPDC55340.2022.00024","DOIUrl":"https://doi.org/10.1109/ISPDC55340.2022.00024","url":null,"abstract":"The tremendous growth of smart devices in the past few years has brought new computational, storage and networking resources at the edge of the Internet. Using the recently-developed Drop Computing paradigm, these devices can be interconnected in an ad-hoc network, using said resources beyond local barriers in order to improve the latency and to remove the pressure from the core Internet network caused by the intensive usage of a centralized cloud architecture. In order to use the networking capabilities of the devices registered in a Drop Computing network in an optimal manner, multi-path TCP (MPTCP) is the key technology which allows the concurrent usage of all the devices’ networking interfaces, leading to a smoother reaction to failures, improved latency, and better throughput. Because of the high variety of devices in Drop Computing, some low-end devices can have stripped network stacks with many missing features, but the Linux Kernel Library (LKL) or User Mode Linux (UML) implementations are able to offer MPTCP and other features directly in user space.In this paper, we assess the feasibility and analyze the behaviour of TCP and MPTCP in Drop Computing-specific scenarios, using the Linux native network stack or the LKL/UML user space network stack. We demonstrate that MPTCP can be used successfully over any of them and that the most suitable one should be selected based on the hardware profile of the device used and the target software application. Furthermore, we prove that LKL and UML can be successfully utilised on low-end devices in order to allow them to use all their network interfaces and have a better failure handover solution.","PeriodicalId":389334,"journal":{"name":"2022 21st International Symposium on Parallel and Distributed Computing (ISPDC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125790839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Online Event Selection for Mu3e using GPUs
Pub Date: 2022-06-23. DOI: 10.1109/ISPDC55340.2022.00012
Valentin Henkys, B. Schmidt, N. Berger
In the search for physics beyond the Standard Model, the Mu3e experiment tries to observe the lepton-flavor-violating decay μ+ → e+e−e+. By observing the decay products of 1 × 10^8 μ/s, it aims either to observe the process or to set a new upper limit on its estimated branching ratio. The high muon rates result in high data rates of 80 Gbps, dominated by data produced through background processes. We present the Online Event Selection, a three-step algorithm running on the graphics processing units (GPUs) of the 12 Mu3e filter farm computers. Using simple and fast geometric selection criteria, the algorithm first reduces the number of possible event candidates to below 5% of the initial set. These candidates are then used to reconstruct full particle tracks, correctly reconstructing over 97% of signal tracks. Finally, a possible decay vertex is reconstructed using simple geometric considerations instead of a full reconstruction, correctly identifying over 94% of signal events. We also present a full implementation of the algorithm, fulfilling all performance requirements at the targeted muon rate and successfully reducing the data rate by a factor of 200.
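To give a feel for the first step, the sketch below applies a cheap geometric cut, the bending angle between consecutive hit segments in the transverse plane, to discard candidate hit triplets before any track fit; the cut variable and threshold are invented for illustration and are not the experiment's actual criteria.

```cpp
// Hedged sketch of a geometric pre-selection cut on hit triplets.
#include <cmath>
#include <cstdio>
#include <vector>

struct Hit { double x, y; };
struct Triplet { Hit a, b, c; };

// Bending angle between segments (a->b) and (b->c) in the transverse plane.
double bendAngle(const Triplet& t) {
    double v1x = t.b.x - t.a.x, v1y = t.b.y - t.a.y;
    double v2x = t.c.x - t.b.x, v2y = t.c.y - t.b.y;
    double cosang = (v1x * v2x + v1y * v2y) /
                    (std::hypot(v1x, v1y) * std::hypot(v2x, v2y));
    return std::acos(cosang);
}

int main() {
    const std::vector<Triplet> candidates = {
        {{0, 0}, {1, 0.10}, {2, 0.25}},   // gently curving: plausible track
        {{0, 0}, {1, 0.90}, {2, 0.00}},   // sharp kink: rejected cheaply
    };
    const double maxBend = 0.3;           // radians; illustrative threshold
    for (size_t i = 0; i < candidates.size(); ++i)
        std::printf("triplet %zu: %s\n", i,
                    bendAngle(candidates[i]) < maxBend ? "keep" : "reject");
    return 0;
}
```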
{"title":"Online Event Selection for Mu3e using GPUs","authors":"Valentin Henkys, B. Schmidt, N. Berger","doi":"10.1109/ISPDC55340.2022.00012","DOIUrl":"https://doi.org/10.1109/ISPDC55340.2022.00012","url":null,"abstract":"In the search for physics beyond the Standard Model the Mu3e experiment tries to observe the lepton flavor violating decay μ+ → e+e–e+. By observing the decay products of 1 • 108μ/s it aims to either observe the process, or set a new upper limit on its estimated branching ratio. The high muon rates result in high data rates of 80 Gbps, dominated by data produced through background processes. We present the Online Event Selection, a three step algorithm running on the graphics processing units (GPU) of the 12 Mu3e filter farm computers.By using simple and fast geometric selection criteria, the algorithm first reduces the amount of possible event candidates to below 5% of the initial set. These candidates are then used to reconstruct full particle tracks, correctly reconstructing over 97% of signal tracks. Finally a possible decay vertex is reconstructed using simple geometric considerations instead of a full reconstruction, correctly identifying over 94% of signal events.We also present a full implementation of the algorithm, fulfilling all performance requirements at the targeted muon rate and successfully reducing the data rate by a factor of 200.","PeriodicalId":389334,"journal":{"name":"2022 21st International Symposium on Parallel and Distributed Computing (ISPDC)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132487672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}