Pub Date: 2018-03-21 | DOI: 10.1109/PDP2018.2018.00102
Jie Hou, M. Radetzki
Technology scaling makes it possible to implement systems with hundreds of processing cores, and thousands in the future. Communication in such systems is enabled by Networks-on-Chip (NoCs). A downside of technology scaling is the increased susceptibility to failures in NoC resources. Ensuring reliable operation despite such failures degrades NoC performance and may even invalidate the performance benefits expected from scaling. Thus, it is not enough to analyze performance and reliability in isolation, as is usually done. Instead, we suggest treating both aspects together using the concept of performability and its analysis with Markov reward models. Our methodology is exemplified for mesh NoCs and transient faults but can be transferred to other topologies and fault models. We investigate how performability develops with scaling towards larger NoCs and explore the limits of scaling by determining the break-even failure rates under which scaling can achieve a net performance increase.
Title: Performability Analysis of Mesh-Based NoCs Using Markov Reward Model
Venue: 2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)
Pub Date: 2018-03-21 | DOI: 10.1109/PDP2018.2018.00024
R. Krawczyk, P. Linczuk, A. Wojeński, K. Poźniak, G. Kasprowicz, Wojciech Zabolotny, M. Gąska, D. Mazon, A. Jardin, T. Czarski, P. Kolasiński, M. Chernyshova, E. Kowalska-Strzeciwilk, K. Malinowski
The article presents results of a novel approach that combines high-performance parallel computing solutions with front-end electronics to develop a scalable, specialized soft X-ray measurement tool for large-scale plasma physics experiments on thermal fusion devices. To meet the need for easily modifiable, advanced diagnostics of hot tokamak plasma content, a heterogeneous system consisting of FPGAs and a PC server was introduced. The objective is to provide data quality monitoring and evaluation mechanisms, along with an algorithm benchmarking tool, for fast, low-latency measurements of soft X-rays emitted by hot tokamak plasma. The article describes a method for developing the computation pipeline on the server side. The novel parallel algorithms and results are discussed. This new approach aims to adapt HPC techniques to new areas of science where comprehensive low-latency measurements and instrumentation are increasingly desired. The presented solution is deployed on the operational tokamak WEST (Tungsten Environment in Steady-State Tokamak) in collaboration with the French Alternative Energies and Atomic Energy Commission (CEA), Cadarache, France.
Title: Novel Application of Parallel Computing Techniques in Soft X-Rays Plasma Measurement Systems for the WEST Experimental Thermal Fusion Reactor
Pub Date: 2018-03-21 | DOI: 10.1109/PDP2018.2018.00043
Gauthier Sornet, S. Jubertie, F. Dupros, F. D. Martin, P. Thierry, Sébastien Limet
The Finite-Element Method (FEM) is routinely used to solve Partial Differential Equations (PDEs) in various scientific domains. For seismic wave modeling, the Spectral Element Method (SEM), a specific formulation of the classical FEM approach, has gained significant attention over the last two decades. This is explained both by the very good numerical accuracy of the method and by the parallel performance of classical MPI-based implementations, which scale up to several tens of thousands of computing cores. Nevertheless, the trend of current processors towards increasing levels of low-level parallelism requires significant effort at the shared-memory level. One major bottleneck comes from the standard FEM assembly phase, which leads to a significant amount of irregular memory accesses and prevents, for instance, efficient automatic optimization by the compiler. In this paper, we extract a kernel from a spectral-element application dedicated to earthquake simulations in complex geological media (the EFISPEC code developed at BRGM, the French Geological Survey). We study its intra-node behavior and propose different levels of optimization (data layout, manual vectorization, multi-threading) to fully benefit from SIMD units and NUMA architectures. Experiments performed on the Intel Broadwell architecture show that the proposed optimizations dramatically improve the intra-node performance of the mini-application. Moreover, our results show a good match with roofline theoretical performance models. We believe that these optimizations are not specific to this mini-application and may be implemented in other SEM- and FEM-based solvers as well.
Title: Data-Layout Reorganization for an Efficient Intra-Node Assembly of a Spectral Finite-Element Method
Pub Date: 2018-03-21 | DOI: 10.1109/PDP2018.2018.00089
G. Bella, Francesco Marino, Gianpiero Costantino, F. Martinelli
Mobile users have become accustomed to receiving useful information while they are literally on the move. An implication of this habit is that certain live information, such as that for navigation, dating, and handling emergencies, should be tailored to the user's current location. While this is technically feasible with current technology, it raises concerns about the user's location privacy. To address the delicate tradeoff between the user's location privacy and the appropriateness of the information for that location, this paper discusses three information delivery protocols. One is the widely adopted Android protocol; the other two are the authors' novel ones, termed the AL protocol and the LBPP protocol respectively. The former conceals the user's location within a geographical area, while the latter employs secure two-party computation. The privacy of all three protocols is analysed, motivating the choice to implement the LBPP protocol. It is made available as the "Getmewhere" service for the reader to download.
Title: Getmewhere: A Location-Based Privacy-Preserving Information Service
Pub Date: 2018-03-21 | DOI: 10.1109/PDP2018.2018.00090
T. Jun, Daeyoun Kang, Dohyeun Kim, Daeyoung Kim
A new form of cloud computing, serverless computing, is drawing attention as a new way to design micro-service architectures. In a serverless computing environment, services are developed as functional units. At present, the function development environments of all serverless computing frameworks are CPU-based. In this paper, we propose a GPU-supported serverless computing framework that can deploy services faster than existing CPU-based serverless computing frameworks. Our core approach is to integrate an open-source serverless computing framework with NVIDIA-Docker and deploy services in GPU-supporting containers. We have developed an API that connects the open-source framework to NVIDIA-Docker, together with commands that enable GPU programming. In our experiments, we measured the performance of the framework in various environments. As a result, developers who want to develop services through the framework can deploy high-performance micro-services, and developers who want to run deep learning programs without a GPU environment can run code on remote GPUs with little performance degradation.
Title: GPU Enabled Serverless Computing Framework
Pub Date: 2018-03-21 | DOI: 10.1109/PDP2018.2018.00077
C. Putman, Abhishta Abhishta, L. Nieuwenhuis
Botnets continue to be an active threat against firms and individuals worldwide. Previous research on botnets has unveiled information on how these systems and their stakeholders operate, but insight into the economic structure that supports these stakeholders is lacking. The objective of this research is to analyse the business model and determine the revenue stream of a botnet owner. We also study the botnet life-cycle and determine the costs associated with it on the basis of four case studies. We conclude that building a full-scale cyber army from scratch is very expensive, whereas acquiring a previously developed botnet costs comparatively little. We find that the initial setup and monthly costs were minimal compared to total revenue.
Title: Business Model of a Botnet
Pub Date: 2018-03-21 | DOI: 10.1109/PDP2018.2018.00049
Tim Süß, Tunahan Kaya, Dustin Feld
For many years now, processor vendors have increased the performance of their devices by adding more cores and wider vectorization units to their CPUs instead of scaling up the processors' clock frequency. Moreover, GPUs have become popular for solving problems with even more parallel compute power. To exploit the full potential of modern compute devices, specific codes are necessary, which are often written in a hardware-specific manner. Usually, codes for CPUs are not usable on GPUs and vice versa. The OpenACC programming API tries to close this gap by enabling one code base to be suitable and optimized for many devices. Nevertheless, OpenACC is rarely used by `standard programmers', and while different code transformers (like PluTo) allow for (semi-)automatic code parallelization for multi-core CPUs, they generally do not support OpenACC yet. We present first promising results of our PluTo extension that generates parallelized codes using OpenACC. Using our transformer, we create programs that exploit the parallelism of different platforms without any manual modifications, achieving speedups of up to 100x in comparison to the original unoptimized programs and accelerations of 2.05x in comparison to equally generated OpenMP codes.
Title: Extending PluTo for Multiple Devices by Integrating OpenACC
Pub Date: 2018-03-21 | DOI: 10.1109/PDP2018.2018.00099
A. Rango, Pietro Napoli, D. D'Ambrosio, W. Spataro, A. D. Renzo, F. Maio
Here we present different preliminary parallel grid-based implementations of a simple particle system, with the purpose of evaluating their performance on multi- and many-core computational devices. The system is modeled by means of the Discrete Element Method and the Extended Cellular Automata formalism, while OpenMP and OpenCL are used for parallelization. In particular, both the 3.1 and 4.5 OpenMP specifications have been considered, the latter of which can also run on many-core computational devices like GPUs. The results of a first test simulation, performed on a cubic domain with about 316,000 particles, show a clear advantage for OpenCL on the considered Nvidia Tesla K40 GPU, while the OpenMP 3.1 implementation performed better than the corresponding OpenMP 4.5 one on the considered Intel Xeon E5-2650 16-thread CPU.
Title: Structured Grid-Based Parallel Simulation of a Simple DEM Model on Heterogeneous Systems
Pub Date: 2018-03-21 | DOI: 10.1109/PDP2018.2018.00041
Shin Morishima, Hiroki Matsutani
Blockchain is a distributed ledger system based on a P2P network and originally used for cryptocurrency systems. The P2P network of a Blockchain is maintained by full nodes, which are in charge of verifying all the transactions in the network. However, most Blockchain user nodes do not act as full nodes, because the workload of full nodes is quite high for personal mobile devices. Blockchain search queries from many users, such as confirming balances, transaction contents, and transaction histories, therefore go to the full nodes. As a result, the search throughput of full nodes can become a new bottleneck of a Blockchain system, because the number of full nodes is smaller than the number of users. In this paper, we propose an acceleration method for Blockchain search using GPUs. More specifically, we introduce an array-based Patricia tree structure suitable for GPU processing, making effective use of the Blockchain property that there are no update or delete queries. In the evaluations, the proposed method is compared with an existing GPU-based key-value search and a conventional CPU-based search in terms of Blockchain key search throughput. As a result, the throughput of our proposal is 3.4 times higher than that of the existing GPU-based search and 14.1 times higher than that of the CPU search when the number of keys is 80 × 2^20 and the key length is 256 bits.
Title: Accelerating Blockchain Search of Full Nodes Using GPUs
Pub Date: 2018-03-21 | DOI: 10.1109/PDP2018.2018.00092
G. Utrera, Marisa Gil, X. Martorell
Algorithmic codes for scientific computing may exhibit diverse levels of tolerance to memory errors, depending on the program's behavior when accessing data. Several factors that can be controlled in an HPC program may influence its degree of tolerance to memory errors. A characterization of the degree of vulnerability an application exhibits can help to improve its security as well as save time and resources. In this work, we study some of the main factors that may have an impact on the propagation of errors originating from memory accesses.
Title: Analysis of the Impact Factors on Data Error Propagation in HPC Applications