Pub Date : 2021-07-28DOI: 10.1109/ISPDC52870.2021.9521637
Nikos Deligiannis
{"title":"Keynote Lecture : Gradient compression for efficient distributed deep learning","authors":"Nikos Deligiannis","doi":"10.1109/ISPDC52870.2021.9521637","DOIUrl":"https://doi.org/10.1109/ISPDC52870.2021.9521637","url":null,"abstract":"","PeriodicalId":20515,"journal":{"name":"Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing","volume":"3 1","pages":"xiii"},"PeriodicalIF":0.0,"publicationDate":"2021-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76155764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2021-07-28DOI: 10.1109/ISPDC52870.2021.9521642
Philipp Haller
{"title":"Keynote Lecture : Towards Robust, Large-scale Concurrent and Distributed Programming","authors":"Philipp Haller","doi":"10.1109/ISPDC52870.2021.9521642","DOIUrl":"https://doi.org/10.1109/ISPDC52870.2021.9521642","url":null,"abstract":"","PeriodicalId":20515,"journal":{"name":"Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing","volume":"51 1","pages":"xiv"},"PeriodicalIF":0.0,"publicationDate":"2021-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87234877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2021-07-28DOI: 10.1109/ISPDC52870.2021.9521636
R. Grosu
{"title":"Keynote Lecture : Neural circuit policies","authors":"R. Grosu","doi":"10.1109/ISPDC52870.2021.9521636","DOIUrl":"https://doi.org/10.1109/ISPDC52870.2021.9521636","url":null,"abstract":"","PeriodicalId":20515,"journal":{"name":"Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing","volume":"43 1","pages":"xii"},"PeriodicalIF":0.0,"publicationDate":"2021-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85103975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2021-07-28DOI: 10.1109/ISPDC52870.2021.9521632
D. Rus
{"title":"Keynote Lecture : Learning Representations: Opportunities for Parallel and Distributed Computing","authors":"D. Rus","doi":"10.1109/ISPDC52870.2021.9521632","DOIUrl":"https://doi.org/10.1109/ISPDC52870.2021.9521632","url":null,"abstract":"","PeriodicalId":20515,"journal":{"name":"Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing","volume":"19 1","pages":"xi"},"PeriodicalIF":0.0,"publicationDate":"2021-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73456300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-07-01DOI: 10.1109/ISPDC51135.2020.00009
M. Sato
We have been carrying out the FLAGSHIP 2020 to develop the Japanese next-generation flagship supercomputer, Post-K, named as “Fugaku” recently. In the project, we have designed a new Arm-SVE enabled processor, called A64FX, as well as the system including interconnect with the industry partner, Fujitsu. The processor is designed for energy-efficiency and sustained application performance. In the design of the system, the “co-design” with the system and applications is a key to make it efficient and high-performance. We analyzed a set of the target applications provided from applications teams for the design of the processor architecture and the decision of many architectural parameters. The “Fugaku” is being installed and scheduled to be put into operation for public service around 2021. In this talk, several features and some preliminary performance results of the “Fugaku” system and A64FX manycore processor will be presented as well as the overview of the system.
{"title":"The Supercomputer \"Fugaku\" and Arm-SVE enabled A64FX processor for energy-efficiency and sustained application performance","authors":"M. Sato","doi":"10.1109/ISPDC51135.2020.00009","DOIUrl":"https://doi.org/10.1109/ISPDC51135.2020.00009","url":null,"abstract":"We have been carrying out the FLAGSHIP 2020 to develop the Japanese next-generation flagship supercomputer, Post-K, named as “Fugaku” recently. In the project, we have designed a new Arm-SVE enabled processor, called A64FX, as well as the system including interconnect with the industry partner, Fujitsu. The processor is designed for energy-efficiency and sustained application performance. In the design of the system, the “co-design” with the system and applications is a key to make it efficient and high-performance. We analyzed a set of the target applications provided from applications teams for the design of the processor architecture and the decision of many architectural parameters. The “Fugaku” is being installed and scheduled to be put into operation for public service around 2021. In this talk, several features and some preliminary performance results of the “Fugaku” system and A64FX manycore processor will be presented as well as the overview of the system.","PeriodicalId":20515,"journal":{"name":"Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing","volume":"1 1","pages":"1-5"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89320832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-03-13DOI: 10.5772/INTECHOPEN.81885
G. Vasiliadis
Despite the recent advances in software security hardening techniques, vulner-abilities can always be exploited if the attackers are really determined. Regardless the protection enabled, successful exploitation can always be achieved, even though admittedly, today, it is much harder than it was in the past. Since securing software is still under ongoing research, the community investigates detection methods in order to protect software. Three of the most promising such methods are monitoring the (i) network, (ii) the filesystem, and (iii) the host memory, for possible exploitation. Whenever a malicious operation is detected then the monitor should be able to terminate it and/or alert the administrator. In this chapter, we explore how to utilize the highly parallel capabilities of modern commodity graphics processing units (GPUs) in order to improve the performance of different security tools operating at the network, storage, and memory level, and how they can offload the CPU whenever possible. Our results show that modern GPUs can be very efficient and highly effective at accelerating the pattern matching operations of network intrusion detection systems and antivirus tools, as well as for monitoring the integrity of the base computing systems.
{"title":"Security Applications of GPUs","authors":"G. Vasiliadis","doi":"10.5772/INTECHOPEN.81885","DOIUrl":"https://doi.org/10.5772/INTECHOPEN.81885","url":null,"abstract":"Despite the recent advances in software security hardening techniques, vulner-abilities can always be exploited if the attackers are really determined. Regardless the protection enabled, successful exploitation can always be achieved, even though admittedly, today, it is much harder than it was in the past. Since securing software is still under ongoing research, the community investigates detection methods in order to protect software. Three of the most promising such methods are monitoring the (i) network, (ii) the filesystem, and (iii) the host memory, for possible exploitation. Whenever a malicious operation is detected then the monitor should be able to terminate it and/or alert the administrator. In this chapter, we explore how to utilize the highly parallel capabilities of modern commodity graphics processing units (GPUs) in order to improve the performance of different security tools operating at the network, storage, and memory level, and how they can offload the CPU whenever possible. Our results show that modern GPUs can be very efficient and highly effective at accelerating the pattern matching operations of network intrusion detection systems and antivirus tools, as well as for monitoring the integrity of the base computing systems.","PeriodicalId":20515,"journal":{"name":"Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing","volume":"43 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73609684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-02-04DOI: 10.5772/intechopen.84193
Satyadhyan Chickerur
High performance computing research had an interesting journey from the year 1972 to this day. In the initial years HPC was considered synonyms with supercomputing and was accessible to the scientists and researchers who worked in the domain of aeronautics, automobiles, petrochemicals, pharmaceuticals, particle physics, weather forecasting, etc. to name a few. Next came a phase where the term supercomputing gradually was replaced by high performance computing and the computing power gradually shifted to PCs in the form of multicore processors for various reasons. This was the time when lot of researchers saw benefit in parallelizing their applications achieving speedups, scale ups and robustness. This was possible because of concepts like Message passing interface, OpenMP, etc. which got evolved. Lot of research was carried out related to HPC systems architecture, computational models, parallel algorithms, and performance optimization, as a result of which renewed interest was created in parallel computing for HPC. This interest was also sustained because of:
{"title":"Introductory Chapter: High Performance Parallel Computing","authors":"Satyadhyan Chickerur","doi":"10.5772/intechopen.84193","DOIUrl":"https://doi.org/10.5772/intechopen.84193","url":null,"abstract":"High performance computing research had an interesting journey from the year 1972 to this day. In the initial years HPC was considered synonyms with supercomputing and was accessible to the scientists and researchers who worked in the domain of aeronautics, automobiles, petrochemicals, pharmaceuticals, particle physics, weather forecasting, etc. to name a few. Next came a phase where the term supercomputing gradually was replaced by high performance computing and the computing power gradually shifted to PCs in the form of multicore processors for various reasons. This was the time when lot of researchers saw benefit in parallelizing their applications achieving speedups, scale ups and robustness. This was possible because of concepts like Message passing interface, OpenMP, etc. which got evolved. Lot of research was carried out related to HPC systems architecture, computational models, parallel algorithms, and performance optimization, as a result of which renewed interest was created in parallel computing for HPC. This interest was also sustained because of:","PeriodicalId":20515,"journal":{"name":"Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing","volume":"72 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88012721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2018-12-31DOI: 10.5772/INTECHOPEN.81755
G. Bilotta, V. Zago, A. Hérault
Particle systems, commonly associated with computer graphics, animation, and video games, are an essential component in the implementation of numerical methods ranging from the meshfree methods for computational fluid dynamics and related applications (e.g., smoothed particle hydrodynamics, SPH) to minimization methods for arbitrary problems (e.g., particle swarm optimization, PSO). These methods are frequently embarrassingly parallel in nature, making them a natural fit for implementation on massively parallel computational hardware such as modern graphics processing units (GPUs). However, naive implementations fail to fully exploit the capabilities of this hardware. We present practical solutions to the challenges faced in the efficient parallel implementation of these particle systems, with a focus on performance, robustness, and flexibility. The techniques are illustrated through GPUSPH, the first implementation of SPH to run completely on GPU, and currently supporting multi-GPU clusters, uniform precision independent of domain size, and multiple SPH formulations.
{"title":"Design and Implementation of Particle Systems for Meshfree Methods with High Performance","authors":"G. Bilotta, V. Zago, A. Hérault","doi":"10.5772/INTECHOPEN.81755","DOIUrl":"https://doi.org/10.5772/INTECHOPEN.81755","url":null,"abstract":"Particle systems, commonly associated with computer graphics, animation, and video games, are an essential component in the implementation of numerical methods ranging from the meshfree methods for computational fluid dynamics and related applications (e.g., smoothed particle hydrodynamics, SPH) to minimization methods for arbitrary problems (e.g., particle swarm optimization, PSO). These methods are frequently embarrassingly parallel in nature, making them a natural fit for implementation on massively parallel computational hardware such as modern graphics processing units (GPUs). However, naive implementations fail to fully exploit the capabilities of this hardware. We present practical solutions to the challenges faced in the efficient parallel implementation of these particle systems, with a focus on performance, robustness, and flexibility. The techniques are illustrated through GPUSPH, the first implementation of SPH to run completely on GPU, and currently supporting multi-GPU clusters, uniform precision independent of domain size, and multiple SPH formulations.","PeriodicalId":20515,"journal":{"name":"Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing","volume":"46 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85387904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2018-12-10DOI: 10.5772/INTECHOPEN.81124
Song Huang, Song Fu, S. Pakin, M. Lang
The traditional parallel programming models require programmers to explicitly specify parallelism and data movement of the underlying parallel mechanisms. Different from the traditional computation-centric programming, Legion provides a data-centric programming model for extracting parallelism and data movement. In this chapter, we aim to characterize the power and energy consumption of running HPC applications on Legion. We run benchmark applications on compute nodes equipped with both CPU and GPU, and measure the execution time, power consumption and CPU/GPU utilization. Additionally, we test the message passing interface (MPI) version of these applications and compare the performance and power consumption of high-performance computing (HPC) applications using the computation-centric and data-centric programming models. Experimental results indicate Legion applications outperforms MPI applications on both performance and energy efficiency, i.e., Legion applications can be 9.17 times as fast as MPI applications and use only 9.2% energy. Legion effectively explores the heterogeneous architecture and runs applications tasks on GPU. As far as we know, this is the first study to understand the power and energy consumption of Legion programming and runtime infrastructure. Our findings will enable HPC system designers and operators to develop and tune the performance of data-centric HPC applications with constraints on power and energy consumption.
{"title":"Characterizing Power and Energy Efficiency of Legion Data-Centric Runtime and Applications on Heterogeneous High-Performance Computing Systems","authors":"Song Huang, Song Fu, S. Pakin, M. Lang","doi":"10.5772/INTECHOPEN.81124","DOIUrl":"https://doi.org/10.5772/INTECHOPEN.81124","url":null,"abstract":"The traditional parallel programming models require programmers to explicitly specify parallelism and data movement of the underlying parallel mechanisms. Different from the traditional computation-centric programming, Legion provides a data-centric programming model for extracting parallelism and data movement. In this chapter, we aim to characterize the power and energy consumption of running HPC applications on Legion. We run benchmark applications on compute nodes equipped with both CPU and GPU, and measure the execution time, power consumption and CPU/GPU utilization. Additionally, we test the message passing interface (MPI) version of these applications and compare the performance and power consumption of high-performance computing (HPC) applications using the computation-centric and data-centric programming models. Experimental results indicate Legion applications outperforms MPI applications on both performance and energy efficiency, i.e., Legion applications can be 9.17 times as fast as MPI applications and use only 9.2% energy. Legion effectively explores the heterogeneous architecture and runs applications tasks on GPU. As far as we know, this is the first study to understand the power and energy consumption of Legion programming and runtime infrastructure. Our findings will enable HPC system designers and operators to develop and tune the performance of data-centric HPC applications with constraints on power and energy consumption.","PeriodicalId":20515,"journal":{"name":"Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73114387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2018-11-24DOI: 10.5772/INTECHOPEN.81191
K. Koyamada, Naohisa Sakamoto
In this chapter, we propose a fused rendering technique that can integrally handle multiple irregular volumes. Although there is a strong requirement for understanding large-scale datasets generated from coupled simulation techniques such as computational structure mechanics (CSM) and computational fluid dynamics (CFD), there is no fused rendering technique to the best of our knowledge. For this purpose, we can employ the particle-based volume rendering (PBVR) technique for each irregular volume dataset. Since the current PBVR technique regards an irregular cell as a planar footprint during depth evaluation, the straightforward employment causes some artifacts especially at the cell boundaries. To solve the problem, we calculate the depth value based on the assumption that the opacity describes the cumulative distribution function (CDF) of a probability variable, w, which shows a length from the entry point in the fragment interval in the cell. In our experiments, we applied our method to numerical simulation results in which two different irregular grid cells are defined in the same space and confirmed its effectiveness with respect to the image quality.
{"title":"Particle-Based Fused Rendering","authors":"K. Koyamada, Naohisa Sakamoto","doi":"10.5772/INTECHOPEN.81191","DOIUrl":"https://doi.org/10.5772/INTECHOPEN.81191","url":null,"abstract":"In this chapter, we propose a fused rendering technique that can integrally handle multiple irregular volumes. Although there is a strong requirement for understanding large-scale datasets generated from coupled simulation techniques such as computational structure mechanics (CSM) and computational fluid dynamics (CFD), there is no fused rendering technique to the best of our knowledge. For this purpose, we can employ the particle-based volume rendering (PBVR) technique for each irregular volume dataset. Since the current PBVR technique regards an irregular cell as a planar footprint during depth evaluation, the straightforward employment causes some artifacts especially at the cell boundaries. To solve the problem, we calculate the depth value based on the assumption that the opacity describes the cumulative distribution function (CDF) of a probability variable, w, which shows a length from the entry point in the fragment interval in the cell. In our experiments, we applied our method to numerical simulation results in which two different irregular grid cells are defined in the same space and confirmed its effectiveness with respect to the image quality.","PeriodicalId":20515,"journal":{"name":"Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing","volume":"44 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82772687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}