Implementing Central Force Optimization on the Intel Xeon Phi
Pub Date: 2020-05-01 | DOI: 10.1109/IPDPSW50202.2020.00091
Thomas Charest, R. Green
Central Force Optimization (CFO) is a fully deterministic, population-based metaheuristic inspired by an analogy to classical kinematics. CFO yields more accurate and consistent results than other population-based metaheuristics such as Particle Swarm Optimization and Genetic Algorithms, but does so at the cost of higher computational complexity and therefore longer run times. This study presents a parallel implementation of CFO written in C++ using OpenMP, targeting both a multi-core CPU and the Intel Xeon Phi co-processor. Results show that parallelizing CFO provides promising speedups of 5-35x on the multi-core CPU and 1-12x on the Intel Xeon Phi.
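The abstract does not include code, but the parallelization it describes maps naturally onto CFO's most expensive step, the pairwise acceleration update over all probes. The following C++/OpenMP fragment is only a minimal sketch of that idea; the probe layout, the fitness handling, and the constants G, alpha, and beta are illustrative assumptions rather than details taken from the paper.

// Minimal sketch (not the authors' code) of how CFO's pairwise
// acceleration update could be parallelized with OpenMP.
// The probe layout and the constants G/alpha/beta are illustrative
// assumptions, not taken from the paper.
#include <cmath>
#include <vector>
#include <omp.h>

struct Probes {
    int count, dims;
    std::vector<double> pos;   // count x dims, row-major
    std::vector<double> acc;   // count x dims
    std::vector<double> fit;   // fitness of each probe
};

void updateAccelerations(Probes& p, double G, double alpha, double beta) {
    // Each probe's acceleration depends only on the current positions and
    // fitnesses, so the outer loop parallelizes cleanly across threads.
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < p.count; ++i) {
        for (int d = 0; d < p.dims; ++d) p.acc[i * p.dims + d] = 0.0;
        for (int j = 0; j < p.count; ++j) {
            double df = p.fit[j] - p.fit[i];
            if (df <= 0.0) continue;            // only fitter ("heavier") probes attract
            double dist = 0.0;
            for (int d = 0; d < p.dims; ++d) {
                double diff = p.pos[j * p.dims + d] - p.pos[i * p.dims + d];
                dist += diff * diff;
            }
            dist = std::sqrt(dist) + 1e-12;
            double pull = G * std::pow(df, alpha) / std::pow(dist, beta);
            for (int d = 0; d < p.dims; ++d)
                p.acc[i * p.dims + d] +=
                    pull * (p.pos[j * p.dims + d] - p.pos[i * p.dims + d]) / dist;
        }
    }
}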
{"title":"Implementing Central Force optimization on the Intel Xeon Phi","authors":"Thomas Charest, R. Green","doi":"10.1109/IPDPSW50202.2020.00091","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00091","url":null,"abstract":"Central Force optimization (CFO) is a fully deterministic population based metaheuristic algorithm based on the analogy of classical kinematics. CFO yields more accurate and consistent results compared to other population based metaheuristics like Particle Swarm optimization and Genetic Algorithms, but does so at the cost of higher computational complexity, leading to increased computational time. This study presents a parallel implementation of CFO written in C++ using OpenMP as implemented for both a multi-core CPU and the Intel Xeon Phi Co-processor. Results show that parallelizing CFO provides promising speedup values from 5-35 on the multi-core CPU and 1-12 on the Intel Xeon Phi.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"172 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116142574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Weightless Neural Networks Applied to Nonintrusive Load Monitoring
Pub Date: 2020-05-01 | DOI: 10.1109/IPDPSW50202.2020.00143
Guilherme C. De Lello, Juliano Caldeira, M. Aredes, F. França, P. Lima
It is well known that energy efficiency plays a key role in ensuring sustainable development. Concerns regarding energy include greenhouse gas emissions, which contribute to global warming, and the possibility of supply interruptions and delivery constraints in some countries and regions of the world. Several studies have suggested that feedback on the consumption of specific electrical appliances could be one of the cheapest and most eco-friendly ways to encourage utility customers to conserve energy. Moreover, there is evidence that the best savings rates are achieved when appliance load information is delivered directly to customers’ smartphones or to dedicated displays inside their homes. Nonintrusive Load Monitoring (NILM) is a technique that estimates the energy consumption of individual appliance loads without requiring a sensor on each appliance. To provide feedback directly to the end user, NILM applications could be embedded in IoT smart devices. However, the amount of computational resources required by the NILM algorithms proposed in previous research often discourages embedded applications. The Weightless Neural Network model WiSARD, on the other hand, solves pattern recognition tasks using a memory-based architecture and two of the simplest computational operations: addition and comparison. These properties suggest that this machine learning model is well suited to solving the NILM problem efficiently. This paper describes and evaluates a new approach to NILM in which electric loads are disaggregated using the WiSARD weightless neural network. Experimental results on the Brazilian Appliance Dataset (BRAD) indicate that it is feasible to embed WiSARD-based NILM algorithms in low-cost IoT smart energy meters.
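As a concrete illustration of the memory-based architecture the abstract refers to, the C++ sketch below shows a single WiSARD discriminator: training writes addressed RAM locations, and classification only counts matches. The tuple size, the random bit mapping, and the absence of bleaching are assumptions made for illustration, not details from the paper; a NILM disaggregator would hold one such discriminator per appliance and pick the one with the highest response.

// Minimal sketch of a WiSARD discriminator, assuming a fixed random
// mapping of input bits into n-bit tuples (inputBits divisible by
// tupleSize); tuple size and mapping are illustrative assumptions.
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <random>
#include <unordered_set>
#include <vector>

class Discriminator {
public:
    Discriminator(std::size_t inputBits, std::size_t tupleSize, unsigned seed = 42)
        : tupleSize_(tupleSize), mapping_(inputBits) {
        std::iota(mapping_.begin(), mapping_.end(), 0);   // random but fixed bit mapping
        std::shuffle(mapping_.begin(), mapping_.end(), std::mt19937(seed));
        rams_.resize(inputBits / tupleSize);
    }

    // Training writes the addressed location of each RAM node.
    void train(const std::vector<uint8_t>& bits) {
        for (std::size_t r = 0; r < rams_.size(); ++r)
            rams_[r].insert(address(bits, r));
    }

    // The response counts RAM nodes that have seen this address before:
    // additions and comparisons only.
    std::size_t respond(const std::vector<uint8_t>& bits) const {
        std::size_t score = 0;
        for (std::size_t r = 0; r < rams_.size(); ++r)
            score += rams_[r].count(address(bits, r));
        return score;
    }

private:
    uint64_t address(const std::vector<uint8_t>& bits, std::size_t ram) const {
        uint64_t addr = 0;
        for (std::size_t k = 0; k < tupleSize_; ++k)
            addr = (addr << 1) | bits[mapping_[ram * tupleSize_ + k]];
        return addr;
    }

    std::size_t tupleSize_;
    std::vector<std::size_t> mapping_;
    std::vector<std::unordered_set<uint64_t>> rams_;   // sparse RAM nodes
};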
{"title":"Weightless Neural Networks Applied to Nonintrusive Load Monitoring","authors":"Guilherme C. De Lello, Juliano Caldeira, M. Aredes, F. França, P. Lima","doi":"10.1109/IPDPSW50202.2020.00143","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00143","url":null,"abstract":"It is well known that energy efficiency plays a key role in ensuring sustainable development. Concerns regarding energy include greenhouse gas emissions, which contribute to global warming, and the possibility of supply interruptions and delivery constraints in some countries and regions of the world. Several studies have suggested that feedback on specific electrical appliances’ consumption could be one of the cheapest and most eco-friendly ways to encourage utility customers in energy conservation. Moreover, there is evidence that the best rates of savings are achieved when the appliance load information is delivered directly to the customers’ smartphones or dedicated displays inside their homes. Nonintrusive Load Monitoring (NILM) is a technique that estimates the energy consumption of individual appliance loads without requiring the installation of sensors in each appliance. In order to provide feedback directly to the end-user, NILM applications could be embedded in IoT smart devices. However, the amount of computational resources required by NILM algorithms proposed in previous research often discourages embedded applications. On the other hand, the Weightless Neural Network model WiSARD is capable of solving pattern recognition tasks by using a memory-based architecture and some of the most simple computational operations: addition and comparison. Those properties suggest that this particular machine learning model is suited for efficiently solving NILM problem. This paper describes and evaluates a new approach to NILM in which the electric loads are disaggregated by using the Weightless Neural Network model WiSARD. Experimental results using the Brazilian Appliance Dataset (BRAD) indicate that it is feasible to embed WiSARD-based NILM algorithms in low-cost IoT smart energy meters.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125392962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Load Balancing Run-Times and Space Usage for Computing the Power Set
Pub Date: 2020-05-01 | DOI: 10.1109/IPDPSW50202.2020.00090
R. Goodwin
This paper discusses load balancing the sets assigned to multiple processors when computing the power set. The algorithm's complexity measures (e.g., run-time and space usage) are essentially functions of this load balance. The paper presents two approaches to computing the power set in a parallel environment. The approaches use different round-robin load-balancing algorithms, and because of their different load-balance assignments, they also use different production algorithms for generating the power set. The paper covers both the run-time and the space-usage analyses of the parallel algorithms and presents mathematical formulas for the space-usage analyses. A third, non-parallel algorithm is presented for benchmarking purposes: it gives the total amount of space needed to store the power set, and it also gives some idea of the maximum run-time an algorithm should take to find any of the subsets of the power set on any given processor in a parallel environment.
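The abstract does not reproduce the algorithms themselves, so the following C++ sketch only illustrates one plausible round-robin assignment: subset k (encoded as a bitmask) is produced by processor k mod P. The paper's actual production and load-balancing schemes may differ. With this particular assignment the number of subsets per processor differs by at most one, although the space each processor uses still depends on the sizes of the subsets it receives.

// Illustrative sketch only: round-robin the 2^n subsets of a set across
// numProcs processors by subset index; each processor decodes its own
// bitmasks. Not the paper's exact algorithms.
#include <cstdint>
#include <iostream>
#include <vector>

std::vector<std::vector<int>> subsetsForProcessor(const std::vector<int>& set,
                                                  int proc, int numProcs) {
    const uint64_t total = 1ULL << set.size();   // 2^n subsets, n assumed < 64
    std::vector<std::vector<int>> mine;
    for (uint64_t mask = proc; mask < total; mask += numProcs) {   // round-robin by index
        std::vector<int> subset;
        for (std::size_t bit = 0; bit < set.size(); ++bit)
            if (mask & (1ULL << bit)) subset.push_back(set[bit]);
        mine.push_back(std::move(subset));
    }
    return mine;
}

int main() {
    std::vector<int> s = {1, 2, 3, 4};
    // Processor 1 of 3 computes subsets 1, 4, 7, 10, 13 of the 16 total.
    for (const auto& subset : subsetsForProcessor(s, 1, 3)) {
        for (int x : subset) std::cout << x << ' ';
        std::cout << '\n';
    }
}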
{"title":"Load Balancing Run-Times and Space Usage for Computing the Power Set","authors":"R. Goodwin","doi":"10.1109/IPDPSW50202.2020.00090","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00090","url":null,"abstract":"This paper discusses load balancing the number of sets on multiple processors to compute the power set. The algorithm complexity measures (e.g. run-time and space usage) are merely functions of the load balance.This paper presents two approaches to computing the power set in a parallel environment. Both approaches use different round robin load balancing algorithms. Because of the load balance assignments, both approaches have different production algorithms for computing the power set. This paper covers both the run-time analyses and the space usage analyses of the parallel algorithms. We present mathematical formulas for the space usage analyses.We present a third, non-parallel algorithm for benchmark purposes. The purpose of the algorithm is to give the total amount of space needed to save the power set. The non-parallel algorithm also gives some idea of the maximum run-time an algorithm should take to find any of the sub-sets to the power set on any given processor in a parallel environment.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126093402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic Provisioning of Storage Resources: A Case Study with Burst Buffers
Pub Date: 2020-05-01 | DOI: 10.1109/IPDPSW50202.2020.00173
François Tessier, Maxime Martinasso, M. Chesi, Mark Klein, M. Gila
The needs of complex applications and workflows are often expressed exclusively in terms of computational resources on HPC systems. In many cases, other resources such as storage or network are not allocatable and are shared across the entire HPC system. Looking at storage resources in particular, any workflow or application should be able to select both its preferred data manager and its required storage capability or capacity. Achieving this goal requires new mechanisms. In this work, we present such a tool, which dynamically provisions a data management system on top of storage devices. We propose a proof-of-concept that deploys, on demand, a parallel file system across intermediate storage nodes of a Cray XC50 system. We show how this mechanism can easily be extended to support additional data managers and any type of intermediate storage. Finally, we evaluate the performance of the provisioned storage system with a set of benchmarks.
{"title":"Dynamic Provisioning of Storage Resources: A Case Study with Burst Buffers","authors":"François Tessier, Maxime Martinasso, M. Chesi, Mark Klein, M. Gila","doi":"10.1109/IPDPSW50202.2020.00173","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00173","url":null,"abstract":"Complex applications and workflows needs are often exclusively expressed in terms of computational resources on HPC systems. In many cases, other resources like storage or network are not allocatable and are shared across the entire HPC system. By looking at the storage resources in particular, any workflow or application should be able to select both its preferred data manager and its required storage capability or capacity. To achieve such a goal, new mechanisms should be introduced. In this work, we present such a tool that dynamically provisions a data management system on top of storage devices. We propose a proof-of-concept that is able to deploy, on-demand, a parallel file-system across intermediate storage nodes on a Cray XC50 system. We show how this mechanism can be easily extended to support more data managers and any type of intermediate storage. Finally, we evaluate the performance of the provisioned storage system with a set of benchmarks.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126281085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HCW 2020 Keynote Speaker: Edge Intelligence Empowering IoT Data Analytics
Pub Date: 2020-05-01 | DOI: 10.1109/IPDPSW50202.2020.00011
Albert Y. Zomaya
Along with the many developments in computing and communication technologies and the surge of mobile devices, a brand-new paradigm, Edge Computing (EC), has gained considerable momentum in recent times. In parallel, Artificial Intelligence (AI) applications have been thriving, fuelled by breakthroughs in deep learning and the emergence of new hardware architectures. Billions of bytes of data, generated at the network edge by a multitude of heterogeneous devices, place great demands on data processing and structural optimization. This has led to a genuine demand for integrating EC and AI, giving rise to what is known today as Edge Intelligence (EI).
{"title":"HCW 2020 Keynote Speaker Edge Intelligence Empowering IoT Data Analytics","authors":"Albert Y. Zomaya","doi":"10.1109/IPDPSW50202.2020.00011","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00011","url":null,"abstract":"Along with the many developments in computing and communication technologies and the surge of mobile devices, a brand new paradigm, Edge Computing (EC), has gained a lot of momentum in recent times. In parallel, Artificial Intelligence (AI) applications have been thriving fuelled by breakthroughs in deep learning and the emergence of new hardware architectures. Billions of bytes of data, generated at the network edge which is composed of a multitude of heterogeneous devices, put great demands on data processing and structural optimization which led to a genuine demand for the integration of EC and AI leading to what is known today as Edge Intelligence (EI).","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128277011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Random Forests in Chapel
Pub Date: 2020-05-01 | DOI: 10.1109/IPDPSW50202.2020.00118
Ben Albrecht
This talk will present ongoing work on a Chapel implementation of Random Forest, a popular ensemble learning method used both for predictive modeling and for feature selection. Language features in Chapel make it easy to express shared-memory and distributed-memory implementations of this algorithm. Furthermore, Chapel’s built-in Python interoperability made it straightforward to implement a Python front end, making the implementation accessible from a language popular among data scientists.
{"title":"Random Forests in Chapel","authors":"Ben Albrecht","doi":"10.1109/IPDPSW50202.2020.00118","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00118","url":null,"abstract":"This talk will present the ongoing work of developing a Chapel implementation of Random Forest, a popular ensembling learning method utilized both for predictive modeling and feature selection. Language features in Chapel make it possible to easily express shared-memory and distributed-memory implementations of this algorithm. Furthermore, Chapel’s built-in python interoperability functionality made it easier to implement a python front-end, making it accessible to a language popular among data scientists.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"41 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120838922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
New Approaches for Performance Optimization and Analysis of Large-Scale Dynamic Social Network Analysis using Anytime Anywhere Algorithms
Pub Date: 2020-05-01 | DOI: 10.1109/ipdpsw50202.2020.00186
Eunice E. Santos, Vairavan Murugappan, John Korah
During the last decade, the availability of large amounts of social network information from various social and socio-technical networks has increased dramatically. These data sources are inherently dynamic, with constantly evolving relationships and connections between entities. Research in this area must address the challenge of analyzing these dynamic datasets under potentially strict time constraints. In addition, because of their sheer size, these networks tend to be stored and analyzed on distributed platforms. In our previous work, we designed anytime-anywhere methodologies for building scalable parallel/distributed algorithms that incorporate different forms of network changes. In this work, we investigate various schemes for balancing the incorporation of dynamic network changes so as to substantially reduce idleness and load imbalance among processors. We show theoretically that, in most cases, our buffer-based methodology performs better than the more common approach of handling changes as they arrive.
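The abstract does not detail the buffer-based methodology, so the fragment below is only an illustrative C++ sketch of the general contrast it draws: rather than interrupting processors to apply each network change as it arrives, incoming changes are buffered and drained as a batch between anytime iterations. The EdgeChange record and the drain point are hypothetical.

// Illustrative sketch only (not the paper's scheme): buffer dynamic
// network changes and fold them in as a batch at the next convenient
// point in the anytime computation.
#include <mutex>
#include <utility>
#include <vector>

struct EdgeChange { int u, v; bool insert; };   // hypothetical change record

class ChangeBuffer {
public:
    // Called by the thread receiving the dynamic network stream.
    void push(const EdgeChange& c) {
        std::lock_guard<std::mutex> lock(m_);
        pending_.push_back(c);
    }

    // Called by a worker between anytime iterations: take the whole batch
    // at once so processors are not idled by one-at-a-time updates.
    std::vector<EdgeChange> drain() {
        std::lock_guard<std::mutex> lock(m_);
        return std::exchange(pending_, {});
    }

private:
    std::mutex m_;
    std::vector<EdgeChange> pending_;
};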
{"title":"New Approaches for Performance Optimization and Analysis of Large-Scale Dynamic Social Network Analysis using Anytime Anywhere Algorithms","authors":"Eunice E. Santos, Vairavan Murugappan, John Korah","doi":"10.1109/ipdpsw50202.2020.00186","DOIUrl":"https://doi.org/10.1109/ipdpsw50202.2020.00186","url":null,"abstract":"During the last decade, the availability of large amounts of social network information from various social and socio-technical networks has increased dramatically. These data sources are inherently dynamic with constantly evolving relationships and connections between entities. Research in this area must address the challenge of analyzing these dynamic datasets under potentially strict time constraints. In addition, due to the sheer size of these networks, they tend to be stored and analyzed on distributed platforms. In our previous work, we designed methodologies which are anytime and anywhere to design scalable parallel/distributed algorithms that incorporate different forms of network changes. In this work, we will investigate various schemas to balance the incorporation of dynamic network changes that will substantially reduce idleness and load imbalances among processors. We will show theoretically that in most cases our buffer-based methodology performs better than the more common way of handling changes as they come in.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"85 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121115171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parallelizing Maximal Clique Enumeration on Modern Manycore Processors
Pub Date: 2020-05-01 | DOI: 10.1109/IPDPSW50202.2020.00047
J. Blanuša, R. Stoica, P. Ienne, K. Atasu
Many fundamental graph mining problems, such as maximal clique enumeration and subgraph isomorphism, can be solved using combinatorial algorithms that are naturally expressed in a recursive form. However, recursive graph mining algorithms suffer from high algorithmic complexity and long execution times. Moreover, because their recursive nature causes unpredictable execution and memory access patterns, parallelizing them on modern computer architectures poses challenges. In this work, we describe an efficient manycore CPU implementation of maximal clique enumeration (MCE), a basic building block of several social and biological network mining algorithms. First, we improve the single-thread performance of MCE by accelerating its computation-intensive kernels through cache-conscious data structures and vector instructions. Then, we develop a multi-core solution and eliminate its scalability bottlenecks by minimizing scheduling and memory-management overheads. On highly parallel modern CPUs, we demonstrate up to a 19-fold performance improvement over a state-of-the-art multi-core implementation of MCE.
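The abstract does not name the underlying enumeration algorithm, so the sketch below uses the classic Bron-Kerbosch recursion (without pivoting) merely to illustrate the recursive structure and the irregular, task-parallel workload that such optimizations target; it is not the authors' implementation.

// Sketch of a Bron-Kerbosch-style maximal clique enumeration with OpenMP
// tasks at the top level. Branch costs vary wildly, which is exactly the
// scheduling irregularity described in the abstract.
#include <algorithm>
#include <iterator>
#include <set>
#include <vector>

using Graph = std::vector<std::set<int>>;   // adjacency sets

// R is the growing clique, P the candidates that extend it, X the
// vertices already covered by earlier branches.
void expand(const Graph& g, std::vector<int> R, std::set<int> P, std::set<int> X) {
    if (P.empty() && X.empty()) {
        // R is a maximal clique; a real implementation would emit or count it here.
        return;
    }
    while (!P.empty()) {
        int v = *P.begin();
        std::vector<int> R2 = R;
        R2.push_back(v);
        std::set<int> P2, X2;
        std::set_intersection(P.begin(), P.end(), g[v].begin(), g[v].end(),
                              std::inserter(P2, P2.begin()));
        std::set_intersection(X.begin(), X.end(), g[v].begin(), g[v].end(),
                              std::inserter(X2, X2.begin()));
        expand(g, R2, P2, X2);
        P.erase(v);
        X.insert(v);
    }
}

void enumerateMaximalCliques(const Graph& g) {
    std::set<int> P, X;
    for (int v = 0; v < static_cast<int>(g.size()); ++v) P.insert(v);
    #pragma omp parallel
    #pragma omp single
    while (!P.empty()) {
        int v = *P.begin();
        std::set<int> Pv, Xv;
        std::set_intersection(P.begin(), P.end(), g[v].begin(), g[v].end(),
                              std::inserter(Pv, Pv.begin()));
        std::set_intersection(X.begin(), X.end(), g[v].begin(), g[v].end(),
                              std::inserter(Xv, Xv.begin()));
        #pragma omp task firstprivate(v, Pv, Xv)
        expand(g, {v}, Pv, Xv);
        P.erase(v);
        X.insert(v);
    }
}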
{"title":"Parallelizing Maximal Clique Enumeration on Modern Manycore Processors","authors":"J. Blanuša, R. Stoica, P. Ienne, K. Atasu","doi":"10.1109/IPDPSW50202.2020.00047","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00047","url":null,"abstract":"Many fundamental graph mining problems, such as maximal clique enumeration and subgraph isomorphism, can be solved using combinatorial algorithms that are naturally expressed in a recursive form. However, recursive graph mining algorithms suffer from a high algorithmic complexity and long execution times. Moreover, because the recursive nature of these algorithms causes unpredictable execution and memory access patterns, parallelizing them on modern computer architectures poses challenges. In this work, we describe an efficient manycore CPU implementation of maximal clique enumeration (MCE), a basic building block of several social and biological network mining algorithms. First, we improve the single-thread performance of MCE by accelerating its computation-intensive kernels through cache-conscious data structures and vector instructions. Then, we develop a multi-core solution and eliminate its scalability bottlenecks by minimizing the scheduling and the memory-management overheads. On highly-parallel modern CPUs, we demonstrate an up to 19-fold performance improvement compared to a state-of-the-art multi-core implementation of MCE.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132441365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Message from the 2020 General Co-Chairs
Pub Date: 2020-05-01 | DOI: 10.1109/ipdpsw50202.2020.00005
{"title":"Message from the 2020 General Co-Chairs","authors":"","doi":"10.1109/ipdpsw50202.2020.00005","DOIUrl":"https://doi.org/10.1109/ipdpsw50202.2020.00005","url":null,"abstract":"","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127062092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards Stability in the Chapel Language
Pub Date: 2020-05-01 | DOI: 10.1109/ipdpsw50202.2020.00116
Michael P. Ferguson
Language stability is an important upcoming feature of the Chapel programming language. Chapel users have requested both big changes to the language and that the language become stable. This talk will discuss recent efforts to complete those big changes so that the language can stabilize.
{"title":"Towards Stability in the Chapel Language","authors":"Michael P. Ferguson","doi":"10.1109/ipdpsw50202.2020.00116","DOIUrl":"https://doi.org/10.1109/ipdpsw50202.2020.00116","url":null,"abstract":"Language stability is an important upcoming feature of the Chapel programming language. Chapel users have both requested big changes to the language and also requested that the language become stable. This talk will discuss recent efforts to complete the big changes to the Chapel language so that the language can stabilize.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127067082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}