Pub Date: 2018-03-21 DOI: 10.1109/PDP2018.2018.00067
G. Barlas, L. E. Hiny
In this paper we analytically solve the partitioning problem for performing matrix multiplication on a cluster of heterogeneous multicore machines equipped with an accelerator, typically a GPU. We derive closed-form solutions that not only solve the problem exactly but also allow for predictive analysis that can guide system design. Our work allows an optimum partitioning to be calculated in linear time with respect to the number of cores in the system. The static partitioning afforded by our Divisible Load Theory (DLT) based analysis minimizes communication overhead and improves efficiency. Our work leverages existing optimized Dense Linear Algebra (DLA) libraries, such as cuBLAS and BLAS, which translates to easy deployment that can readily exploit state-of-the-art tools. A comparison study concludes the paper, highlighting the beneficial effect of our partitioning approach.
{"title":"Closed-Form Solutions for Dense Matrix-Matrix Multiplication on Heterogeneous Platforms Using Divisible Load Analysis","authors":"G. Barlas, L. E. Hiny","doi":"10.1109/PDP2018.2018.00067","DOIUrl":"https://doi.org/10.1109/PDP2018.2018.00067","PeriodicalId":333367,"journal":{"name":"2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)"},"publicationDate":"2018-03-21","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"114419165"}
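As a rough illustration of the load-balancing idea behind the divisible-load partitioning described in this entry's abstract, the sketch below splits the rows of a matrix among workers in proportion to their speeds, so that all workers would finish at about the same time. It is a simplified stand-in and not the paper's closed-form solution: it ignores communication cost, and the names `partition_rows` and `speeds`, as well as the largest-remainder rounding, are assumptions made for this example.

```python
def partition_rows(total_rows, speeds):
    """Split total_rows among workers in proportion to their (assumed) speeds.

    speeds[i] is a hypothetical throughput figure for worker i, e.g. rows
    processed per second. Communication cost is deliberately ignored.
    """
    if total_rows < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need a non-negative row count and positive speeds")

    total_speed = sum(speeds)
    # Ideal (fractional) share of each worker.
    ideal = [total_rows * s / total_speed for s in speeds]
    # Start from the integer part of each share.
    rows = [int(x) for x in ideal]

    # Hand the leftover rows to the workers with the largest fractional parts.
    leftover = total_rows - sum(rows)
    by_fraction = sorted(range(len(speeds)),
                         key=lambda i: ideal[i] - rows[i],
                         reverse=True)
    for i in by_fraction[:leftover]:
        rows[i] += 1
    return rows


if __name__ == "__main__":
    # Two equal CPU cores and one device assumed to be twice as fast.
    print(partition_rows(1000, [1, 1, 2]))
```

A worker that is twice as fast receives twice as many rows; a model that also accounts for data transfer would shift these shares.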
Pub Date: 2018-03-21 DOI: 10.1109/PDP2018.2018.00013
G. Breaban, S. Stuijk, K. Goossens
Modern embedded systems encompass a rapidly growing range of applications, spanning automotive, multimedia, and industrial automation. To tackle the increasing design complexity, the model-based design paradigm promotes the use of Models of Computation (MoCs) to capture the essential application properties. Existing MoCs are split between the event/time-triggered paradigm and the data-driven paradigm. However, time and data are two interrelated dimensions that are both essential for defining correct application behavior. In this paper we advocate a unified MoC that integrates the notions of time and data while accounting for imperfect clocks. We present the formal properties of our model and show how the Synchronous Data Flow (SDF) MoC can be used to analyze the time performance guarantees.
{"title":"A Unified Programming Model for Time- and Data-Driven Embedded Applications","authors":"G. Breaban, S. Stuijk, K. Goossens","doi":"10.1109/PDP2018.2018.00013","DOIUrl":"https://doi.org/10.1109/PDP2018.2018.00013","PeriodicalId":333367,"journal":{"name":"2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)"},"publicationDate":"2018-03-21","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"132242105"}
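The abstract above mentions Synchronous Data Flow (SDF) analysis. A common first step in analyzing an SDF graph is solving its balance equations to obtain the repetition vector, i.e. how often each actor fires in one periodic iteration. The sketch below shows that generic step only; it is not the authors' unified model, and the graph encoding and the name `repetition_vector` are assumptions made for this example.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors, edges):
    """Return the smallest positive integer firing counts balancing every edge.

    edges is a list of (src, dst, produced, consumed) token rates; balance
    means count[src] * produced == count[dst] * consumed on each edge.
    """
    neighbours = {a: [] for a in actors}
    for src, dst, produced, consumed in edges:
        neighbours[src].append((dst, Fraction(produced, consumed)))
        neighbours[dst].append((src, Fraction(consumed, produced)))

    # Propagate relative firing rates from the first actor.
    rate = {actors[0]: Fraction(1)}
    stack = [actors[0]]
    while stack:
        a = stack.pop()
        for b, ratio in neighbours[a]:
            expected = rate[a] * ratio
            if b not in rate:
                rate[b] = expected
                stack.append(b)
            elif rate[b] != expected:
                raise ValueError("inconsistent rates: no periodic schedule")
    if len(rate) != len(actors):
        raise ValueError("graph must be connected")

    # Scale the fractional rates to the smallest integer solution.
    scale = reduce(lambda x, y: x * y // gcd(x, y),
                   (r.denominator for r in rate.values()), 1)
    counts = {a: int(r * scale) for a, r in rate.items()}
    common = reduce(gcd, counts.values())
    return {a: c // common for a, c in counts.items()}


if __name__ == "__main__":
    # A produces 2 tokens per firing, B consumes 3; B produces 1, C consumes 2.
    print(repetition_vector(["A", "B", "C"],
                            [("A", "B", 2, 3), ("B", "C", 1, 2)]))
```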
Pub Date: 2018-03-21 DOI: 10.1109/PDP2018.2018.00016
Da-Ren Chen, Ping-Feng Wang
Taking the context of human body dynamics into account, we propose an energy-aware and quality-of-service (QoS) aware method for wireless body area networks (WBANs). The method exploits the characteristics of the physical (PHY) layer, based on narrowband signaling with M-PSK modulation and a low-power micro sensor transceiver. It improves the energy efficiency of both receiver and transmitter front-end components while satisfying the QoS metrics as link quality varies with human posture changes or movements. It can meet the various QoS requirements of each sensor, improve bandwidth utilization, and reduce energy consumption. On average, the method consumes 6.2% less energy than the TPC and LA-based methods at a power of -25 dBm.
{"title":"Context-Aware Optimization for Energy-Efficient and QoS Wireless Body Area Networks with Human Dynamics","authors":"Da-Ren Chen, Ping-Feng Wang","doi":"10.1109/PDP2018.2018.00016","DOIUrl":"https://doi.org/10.1109/PDP2018.2018.00016","PeriodicalId":333367,"journal":{"name":"2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)"},"publicationDate":"2018-03-21","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"121830676"}
Pub Date: 2018-03-21 DOI: 10.1109/PDP2018.2018.00053
M. Danelutto, M. Torquati
The ability to teach parallel programming principles and techniques is becoming fundamental to prepare a new generation of programmers able to master the pervasive parallelism made available by hardware vendors. Classical parallel programming courses leverage either low-level programming frameworks (e.g. those based on Pthreads) or higher level frameworks such as OpenMP or MPI. We discuss our teaching experience within the Master in "Computer Science and networking" where parallel programming is taught leveraging structured parallel programming principles and frameworks. The paper summarizes the results achieved in eight years of experience and shows how the adoption of a structured parallel programming approach improves the efficiency of the teaching process.
{"title":"Increasing Efficiency in Parallel Programming Teaching","authors":"M. Danelutto, M. Torquati","doi":"10.1109/PDP2018.2018.00053","DOIUrl":"https://doi.org/10.1109/PDP2018.2018.00053","PeriodicalId":333367,"journal":{"name":"2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)"},"publicationDate":"2018-03-21","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"126071934"}
Pub Date: 2018-03-21 DOI: 10.1109/PDP2018.2018.00082
Shu Yin, Bing Jiao, Xiaomin Zhu, X. Ruan, Si Chen, Zhuo Tang
As the Energy Wall and the Reliability Wall become unavoidable, it is a demanding and challenging task to reduce energy consumption in large-scale storage systems in modern data centers while retaining acceptable system reliability. We propose a reliable, energy-efficient storage system called DuoFS, which aims at balancing the energy efficiency, the reliability, and the performance of parallel storage systems by seamlessly integrating one HDD-based file system and one SSD-based file system. At the heart of DuoFS is a transformative middleware layer that dispatches files to one of the two independent parallel file systems based on the files' I/O access popularity. By replicating popular files to the SSD-based file system and pushing the HDD-based file system into low-power mode under light workload conditions, DuoFS can significantly reduce energy consumption, avoid major factors that harm the storage system's reliability, and exploit the good I/O performance of SSDs. Experimental results show that DuoFS saves up to 40% of energy and achieves up to 50% better I/O performance while sacrificing less than 15% of the system's reliability.
{"title":"DuoFS: A Hybrid Storage System Balancing Energy-Efficiency, Reliability, and Performance","authors":"Shu Yin, Bing Jiao, Xiaomin Zhu, X. Ruan, Si Chen, Zhuo Tang","doi":"10.1109/PDP2018.2018.00082","DOIUrl":"https://doi.org/10.1109/PDP2018.2018.00082","PeriodicalId":333367,"journal":{"name":"2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)"},"publicationDate":"2018-03-21","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"127427487"}
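The popularity-based dispatching described in the DuoFS abstract can be pictured with a small sketch. The abstract does not spell out the actual policy, so everything here is hypothetical: the function names, the fixed `hot_threshold`, and the rule that the HDD tier may enter low-power mode when all recent requests hit files replicated on the SSD tier.

```python
from collections import Counter


def plan_placement(access_log, hot_threshold):
    """Map each file name to the tier assumed to serve it.

    access_log is a sequence of file names, one entry per I/O request.
    Files with at least hot_threshold requests count as popular and are
    assumed to be replicated to the SSD-based file system.
    """
    counts = Counter(access_log)
    return {
        name: "ssd" if hits >= hot_threshold else "hdd"
        for name, hits in counts.items()
    }


def hdd_can_sleep(recent_requests, placement):
    """True if every recent request can be served from the SSD tier."""
    return all(placement.get(name) == "ssd" for name in recent_requests)


if __name__ == "__main__":
    log = ["a", "b", "a", "a", "c", "b"]
    placement = plan_placement(log, hot_threshold=2)
    print(placement)
    print(hdd_can_sleep(["a", "b"], placement))
```

A real system would also need to age the counters and keep SSD replicas consistent with the HDD copies, which this sketch leaves out.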
Pub Date: 2018-03-21 DOI: 10.1109/PDP2018.2018.00015
A. Carneiro, J. L. Bez, F. Boito, Bruno Alves Fagundes, Carla Osthoff, P. Navaux
The historical gap between processing and data access speeds causes many applications to spend a large portion of their execution time on I/O operations. From the point of view of a large-scale, expensive supercomputer, it is important to ensure that applications achieve the best I/O performance in order to promote efficient usage of the machine. In this paper, we evaluate the I/O infrastructure of the Santos Dumont supercomputer, the largest in Latin America. More specifically, we investigate the performance of collective I/O operations. By analyzing a scientific application that uses the machine, we identify large performance differences between the available MPI implementations. We then further study the observed phenomenon using the BT-IO and IOR benchmarks, in addition to a custom microbenchmark. We conclude that the customized MPI implementation by Bull (used by more than 20% of the jobs) presents the worst performance for small collective write operations. Our results are being used to help Santos Dumont users achieve the best performance for their applications. Additionally, by investigating the observed phenomenon, we provide information to help improve future MPI-IO collective write implementations.
{"title":"Collective I/O Performance on the Santos Dumont Supercomputer","authors":"A. Carneiro, J. L. Bez, F. Boito, Bruno Alves Fagundes, Carla Osthoff, P. Navaux","doi":"10.1109/PDP2018.2018.00015","DOIUrl":"https://doi.org/10.1109/PDP2018.2018.00015","PeriodicalId":333367,"journal":{"name":"2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)"},"publicationDate":"2018-03-21","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"127678040"}
Pub Date: 2018-03-21 DOI: 10.1109/PDP2018.2018.00104
Luis Germán García Morales, J. E. A. Cobo, N. Bagherzadeh
Network-on-Chip (NoC), along with its extension Wireless NoC (WNoC), was proposed to increase system performance in future generations of Multi-Processor System-on-Chip (MPSoC) with hundreds or thousands of cores. For such platforms, designers seek efficient task mapping mechanisms that arrange executable tasks so as to take advantage of the available communication links. These proposals are then evaluated with simulation tools and compared against other designs in terms of execution time, latency, energy, communication cost, and other metrics. However, current WNoC simulators only aim to evaluate the performance of the communication network; they lack the ability to estimate the metrics needed to evaluate mapping strategies. In this paper, we present an evaluation strategy for assessing the performance of task mapping approaches for WNoC and integrate it into a well-known state-of-the-art simulation tool. Several experiments are conducted to demonstrate the benefits of using our proposed strategy.
{"title":"Simulation-Based Evaluation Strategy for Task Mapping Approaches in WNoC Platforms","authors":"Luis Germán García Morales, J. E. A. Cobo, N. Bagherzadeh","doi":"10.1109/PDP2018.2018.00104","DOIUrl":"https://doi.org/10.1109/PDP2018.2018.00104","PeriodicalId":333367,"journal":{"name":"2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)"},"publicationDate":"2018-03-21","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"120883542"}
Pub Date: 2018-03-21 DOI: 10.1109/PDP2018.2018.00046
Zheming Jin, Iris Johnson, H. Finkel
Developing applications with OpenCL for FPGA targets is an emerging approach in heterogeneous computing. This paper uses the data unpacking algorithm in Base64 encoding as a case study to present programming and optimization techniques, along with experimental results for OpenCL-based implementations on an FPGA. We explain the algorithm and evaluate the performance of the kernel implementations with Intel's FPGA OpenCL SDK. The experimental results show that kernel vectorization and duplication are two optimization techniques that can improve kernel performance. The performance of kernel duplication is also closely related to the local work size. Our experiment shows that 16-lane vectorization increases the bandwidth by a factor of 2 to 10 for large input data sizes. Moreover, the performance of kernel duplication using 16 compute units is 40% to 1.5% lower than that of kernel vectorization, depending on the input size. Tuning the local work size can improve kernel performance by a factor of 3 to 23. For this kernel, using local memory is not an effective way to improve performance because input data is not reused. A combination of vectorization and duplication achieves the highest performance of 12.3 GiB/s. Compared to an Intel Xeon E5 CPU and an Nvidia Tesla K80 GPU, the kernel on the Arria 10 FPGA is 6.7X faster than the CPU and 3X slower than the GPU. The performance per watt on the FPGA is 20.5X higher than the CPU and 1.19X lower than the GPU.
{"title":"Evaluating and Optimizing OpenCL Base64 Data Unpacking Kernel with FPGA","authors":"Zheming Jin, Iris Johnson, H. Finkel","doi":"10.1109/PDP2018.2018.00046","DOIUrl":"https://doi.org/10.1109/PDP2018.2018.00046","PeriodicalId":333367,"journal":{"name":"2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)"},"publicationDate":"2018-03-21","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"127109710"}
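For readers unfamiliar with the operation studied in the Base64 entry above, the following plain-Python sketch shows what Base64 unpacking does: every four 6-bit symbols are packed back into three 8-bit bytes. It is a sequential reference for the standard alphabet only, not the authors' OpenCL kernel, and the name `unpack_base64` is an assumption made for this example.

```python
import base64

ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")
DECODE = {ch: i for i, ch in enumerate(ALPHABET)}


def unpack_base64(text):
    """Decode standard, padded Base64 text into bytes, four symbols at a time."""
    if len(text) % 4 != 0:
        raise ValueError("padded Base64 length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(text), 4):
        quad = text[i:i + 4]
        pad = quad.count("=")
        # Map each symbol to its 6-bit value; padding contributes zero bits.
        vals = [DECODE[c] for c in quad.rstrip("=")] + [0] * pad
        # Concatenate 4 x 6 bits into one 24-bit word.
        word = (vals[0] << 18) | (vals[1] << 12) | (vals[2] << 6) | vals[3]
        chunk = bytes([(word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF])
        # Drop the bytes that only exist because of padding.
        out += chunk[:3 - pad]
    return bytes(out)


if __name__ == "__main__":
    sample = base64.b64encode(b"heterogeneous").decode("ascii")
    print(unpack_base64(sample))
```

Each group of four symbols is independent of the others, which is what makes the operation a natural fit for the vectorized and duplicated kernels the abstract evaluates.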
Pub Date: 2018-03-21 DOI: 10.1109/PDP2018.2018.00050
Bilal Fakih, D. E. Baz
This paper presents Peer-To-Peer HPC, a decentralized environment that facilitates the use of heterogeneous multi-cluster platforms for loosely synchronous applications. The goal is to exploit all the computing resources (all the available cores of the computing nodes) as well as all networks, e.g., Ethernet, Infiniband, and Myrinet. Peer-To-Peer HPC relies on a reconfigurable multi-network protocol (RMNP) for controlling multiple network adapters and on OpenMP for exploiting all the available cores of the computing nodes. We report on the efficiency obtained on the Grid5000 testbed by combining synchronous and asynchronous iterative schemes of computation with Peer-To-Peer HPC. The experimental results show that our environment scales well.
{"title":"Heterogeneous Computing and Multi-Clustering Support Via Peer-To-Peer HPC","authors":"Bilal Fakih, D. E. Baz","doi":"10.1109/PDP2018.2018.00050","DOIUrl":"https://doi.org/10.1109/PDP2018.2018.00050","PeriodicalId":333367,"journal":{"name":"2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)"},"publicationDate":"2018-03-21","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"128395026"}
Pub Date: 2018-03-21 DOI: 10.1109/PDP2018.2018.00070
M. Pennisi, G. Russo, F. Pappalardo
The development of novel prophylactic and therapeutic vaccine candidates in the field of cancer immunology has led to very promising results against tumors, providing full protection with fewer of the side effects typical of current conventional treatments. However, such treatments require a constant, life-long administration procedure to maintain protection. As both the period of protection and the relative number of administrations grow, the problem of finding the best administration protocol, in time and dosage, becomes more and more complex. Such a problem usually cannot be solved with in vivo experiments, as the costs in terms of time, money, and people would be prohibitive. We propose a hybrid approach that integrates machine learning and parallel genetic algorithms to enhance the in silico search for optimal administration protocols for a cancer vaccine. A neural network is used to improve both the crossover and mutation operators. Preliminary results suggest that this approach could lead to better administration protocols with a similar computational effort.
{"title":"Combining Parallel Genetic Algorithms and Machine Learning to Improve the Research of Optimal Vaccination Protocols","authors":"M. Pennisi, G. Russo, F. Pappalardo","doi":"10.1109/PDP2018.2018.00070","DOIUrl":"https://doi.org/10.1109/PDP2018.2018.00070","PeriodicalId":333367,"journal":{"name":"2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)"},"publicationDate":"2018-03-21","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"132818672"}
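To make the genetic-algorithm part of the vaccination-protocol abstract above more concrete, here is a toy sketch of searching over administration schedules with one-point crossover and bit-flip mutation. It is not the authors' method: there is no neural network guiding the operators and no parallelism, and `toy_fitness` is an invented stand-in for the immune-system simulation such a search would actually need. All names and parameter values are assumptions.

```python
import random


def toy_fitness(schedule, span=3):
    """Hypothetical score: protected time slots minus a cost per dose.

    schedule[t] == 1 means a dose at slot t; a dose is assumed to protect
    its own slot and the following span - 1 slots.
    """
    protected = sum(
        1 for t in range(len(schedule))
        if any(schedule[max(0, t - span + 1):t + 1])
    )
    return protected - 0.5 * sum(schedule)


def evolve(fitness, n_slots, pop_size=20, generations=40,
           mutation_rate=0.05, seed=0):
    """Plain generational GA with elitism over binary schedules."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_slots)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        next_pop = pop[:2]  # elitism: keep the two best unchanged
        while len(next_pop) < pop_size:
            # Parents come from the better half of the population.
            a, b = rng.sample(pop[:pop_size // 2], 2)
            cut = rng.randrange(1, n_slots)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)


if __name__ == "__main__":
    best = evolve(toy_fitness, n_slots=12)
    print(best, toy_fitness(best))
```

In the approach the abstract describes, a learned model would influence how `child` is built; here crossover and mutation are purely random.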