Anastasiia Butko, George Michelogiannakis, D. Donofrio, J. Shalf
Extensive research in materials science, together with outstanding engineering efforts, has significantly improved quantum technology, enabling continued scaling of quantum circuit size. In around 10 years, quantum annealing circuits have reached 10^3 qubits, and universal quantum circuits, trailing by several years, now show similar trends. From current trends, we can expect quantum computers to reach thousands of qubits in the next 5-10 years.
{"title":"Extending classical processors to support future large scale quantum accelerators","authors":"Anastasiia Butko, George Michelogiannakis, D. Donofrio, J. Shalf","doi":"10.1145/3310273.3324898","DOIUrl":"https://doi.org/10.1145/3310273.3324898","journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers"},"publicationDate":"2019-04-30"}
Anastasiia Butko, George Michelogiannakis, D. Donofrio, J. Shalf
Optimal mapping of a parallel code's communication graph is increasingly important as both system size and heterogeneity increase. However, the topology-aware task assignment problem is an NP-complete graph isomorphism problem. Existing task scheduling approaches are either heuristic or based on physical optimization algorithms, providing different speed and solution quality tradeoffs. Ising machines such as quantum and digital annealers have recently become available, offering an alternative hardware solution for certain types of optimization problems. We propose an algorithm that expresses the problem for such machines and a domain-specific partition strategy that enables solving larger-scale problems. TIGER, a topology-aware task assignment mapper tool, implements the proposed algorithm and automatically integrates a task communication graph and an architecture graph into the quantum software environment. We use D-Wave's quantum annealer to demonstrate the solving algorithm and evaluate the proposed tool flow in terms of performance, partition efficiency, and solution quality. Results show a significant speed-up of the tool flow and reliable solution quality when using TIGER together with the proposed partition.
{"title":"TIGER","authors":"Anastasiia Butko, George Michelogiannakis, D. Donofrio, J. Shalf","doi":"10.1145/3310273.3321556","DOIUrl":"https://doi.org/10.1145/3310273.3321556","journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers"},"publicationDate":"2019-04-30"}
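The TIGER abstract describes expressing task assignment in a form that Ising machines can solve. A common way to do this for assignment problems is a QUBO (quadratic unconstrained binary optimization) with one-hot penalty terms. The sketch below is only an illustration under that assumption and is not taken from the paper: the function names, the penalty value, and the toy instance are hypothetical, and an exhaustive search stands in for the annealer.

```python
from itertools import product


def build_qubo(traffic, dist, penalty):
    """Build QUBO weights for placing n tasks on n nodes (one task per node).

    Binary variable x[t * n + p] = 1 means task t runs on node p.
    """
    n = len(dist)
    qubo = {}

    def add(i, j, w):
        key = (min(i, j), max(i, j))
        qubo[key] = qubo.get(key, 0.0) + w

    # Objective: traffic between two tasks times the distance between their nodes.
    for (a, b), volume in traffic.items():
        for p in range(n):
            for q in range(n):
                if p != q:
                    add(a * n + p, b * n + q, volume * dist[p][q])

    # Constraints as penalties: penalty * (sum(x) - 1)^2 per task and per node,
    # expanded for binary x with the constant term dropped.
    groups = [[t * n + p for p in range(n)] for t in range(n)]   # each task on one node
    groups += [[t * n + p for t in range(n)] for p in range(n)]  # each node hosts one task
    for group in groups:
        for k, i in enumerate(group):
            add(i, i, -penalty)
            for j in group[k + 1:]:
                add(i, j, 2 * penalty)
    return qubo


def energy(qubo, x):
    """QUBO energy of a 0/1 assignment x."""
    return sum(w * x[i] * x[j] for (i, j), w in qubo.items())


def solve_brute_force(qubo, n_vars):
    """Exhaustive search; stands in for the annealer on toy sizes only."""
    return min(product((0, 1), repeat=n_vars), key=lambda x: energy(qubo, x))


# Toy instance (hypothetical): 3 tasks, 3 nodes on a line with hop distances.
traffic = {(0, 1): 10, (1, 2): 1}
dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
n = len(dist)
best = solve_brute_force(build_qubo(traffic, dist, penalty=50), n * n)
mapping = {t: p for t in range(n) for p in range(n) if best[t * n + p]}
cost = sum(v * dist[mapping[a]][mapping[b]] for (a, b), v in traffic.items())
print(mapping, cost)
```

The penalty must outweigh any communication cost that could be saved by violating a constraint. The number of binary variables grows as tasks times nodes, which is why a partition strategy matters for larger instances.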
Parallel programming frameworks such as MPI, OpenSHMEM, Charm++ and Legion have been widely used in many scientific domains (bioinformatics, computational physics, and chemistry, among others) to implement distributed applications. While they have the same purpose, these frameworks differ in programmability, performance, and scalability across applications and cluster types. Hence, it is important for programmers to select the programming framework that is best suited to the characteristics of their application (i.e., its computation and communication patterns) and the hardware setup of the target high-performance computing cluster. In this work, we consider several popular parallel programming frameworks for distributed applications. We first analyze their memory model, execution model, synchronization model, and GPU support. We then compare their programmability, performance, scalability, and load-balancing capability on a homogeneous computing cluster equipped with GPUs.
{"title":"A comparative study of parallel programming frameworks for distributed GPU applications","authors":"Ruidong Gu, M. Becchi","doi":"10.1145/3310273.3323071","DOIUrl":"https://doi.org/10.1145/3310273.3323071","journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers"},"publicationDate":"2019-04-30"}
A. Burrello, Alex Marchioni, D. Brunelli, L. Benini
Principal component analysis (PCA) is a powerful data reduction method for Structural Health Monitoring. However, its computational cost and data memory footprint pose a significant challenge when PCA has to run on limited-capability embedded platforms in low-cost IoT gateways. This paper presents a memory-efficient parallel implementation of the streaming History PCA algorithm. On our dataset, it achieves a 10x compression factor and a 59x memory reduction with less than 0.15 dB degradation in the reconstructed signal-to-noise ratio (RSNR) compared to standard PCA. Moreover, the algorithm benefits from parallelization on multiple cores, achieving a maximum speedup of 4.8x on the Samsung ARTIK 710.
{"title":"Embedding principal component analysis for data reduction in structural health monitoring on low-cost IoT gateways","authors":"A. Burrello, Alex Marchioni, D. Brunelli, L. Benini","doi":"10.1145/3310273.3322822","DOIUrl":"https://doi.org/10.1145/3310273.3322822","journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers"},"publicationDate":"2019-04-30"}
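The abstract above reports its results as a compression factor and a change in RSNR without defining either. The sketch below shows one common reading, assuming RSNR is 20 log10(||x|| / ||x - x_hat||) and the compression factor is the ratio of input samples to retained PCA components. Both definitions and the function names are assumptions for illustration, not taken from the paper.

```python
import math


def rsnr_db(original, reconstructed):
    """Reconstructed signal-to-noise ratio in dB (assumed definition)."""
    signal = math.sqrt(sum(x * x for x in original))
    error = math.sqrt(sum((x - y) ** 2 for x, y in zip(original, reconstructed)))
    return 20 * math.log10(signal / error)


def compression_factor(n_samples, n_components):
    """Input samples per retained coefficient (assumed definition)."""
    return n_samples / n_components


# A signal of norm 5 reconstructed with an error of norm 0.5.
print(rsnr_db([3.0, 4.0], [3.0, 4.5]))
print(compression_factor(500, 50))
```

Under this reading, a degradation of less than 0.15 dB means the streaming reconstruction error norm stays within about 2% of the standard PCA error norm.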
Xingbin Wang, Rui Hou, Yifan Zhu, Jun Zhang, Dan Meng
Deep neural network (DNN) models are widely used for inference in many application scenarios. DNN accelerators are designed for higher performance and lower energy consumption, not with security in mind, and are therefore exposed to attack. Design flaws in existing DNN accelerators can be exploited to recover the structure of a DNN model from the plain instructions, and the runtime environment can then be controlled to obtain the weights of the model. Furthermore, the structure of a DNN model running on the accelerator can be acquired from side-channel information and the interrupt status register. To protect general DNN accelerators against model inversion attacks, this paper proposes a secure and general architecture called NPUFort, which guarantees the confidentiality of the DNN model's parameters and mitigates side-channel information leakage. The experimental results demonstrate the feasibility and effectiveness of the secure architecture, with negligible performance overhead.
{"title":"NPUFort: a secure architecture of DNN accelerator against model inversion attack","authors":"Xingbin Wang, Rui Hou, Yifan Zhu, Jun Zhang, Dan Meng","doi":"10.1145/3310273.3323070","DOIUrl":"https://doi.org/10.1145/3310273.3323070","journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers"},"publicationDate":"2019-04-30"}
In the era of Cyber Physical Systems, designers need to offer support for run-time adaptivity considering different constraints, including the internal status of the system. This work proposes a run-time monitoring approach for hardware accelerators, based on the Performance Application Programming Interface.
{"title":"Run-time performance monitoring of hardware accelerators: POSTER","authors":"D. Madroñal, Tiziana Fanni","doi":"10.1145/3310273.3323423","DOIUrl":"https://doi.org/10.1145/3310273.3323423","journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers"},"publicationDate":"2019-04-30"}
Geneviève Ndour, T. Jost, A. Molnos, Y. Durand, A. Tisserand
Among various power reduction methods, variable bit-width arithmetic units have been proposed in the approximate computing literature. In this paper, we add a variable bit-width memory unit to a RISC-V processor. Integrating both computation and memory units with variable bit-width leads to a power reduction of 7% to 29% for a Sobel filter application and 13% to 24% for an application that computes the position of a robotic arm (forwardk2j). We also propose a global energy model for a RISC-V processor with variable bit-width units (for computation and memory). This model allows us to evaluate the impact of various parameters in both the software application (e.g., the number of instructions that can be executed with a reduced bit-width) and the hardware architecture (e.g., the impact of potential reduction for each unit).
{"title":"Evaluation of variable bit-width units in a RISC-V processor for approximate computing","authors":"Geneviève Ndour, T. Jost, A. Molnos, Y. Durand, A. Tisserand","doi":"10.1145/3310273.3323159","DOIUrl":"https://doi.org/10.1145/3310273.3323159","journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers"},"publicationDate":"2019-04-30"}
The EuroHPC Joint Undertaking is a new European Union strategic entity focused on pooling the Union's and national HPC resources to acquire, build, and deploy the most powerful supercomputers in the world within Europe. This talk explores the European Processor Initiative (EPI), one of the cornerstones of this European strategic plan. EPI is a collaboration between more than twenty partners, representing industrial companies, academia, and research centres, with the goal of building a production processor with drastically better performance and power, in support of the EU's focus on delivering its own exascale systems built on EU IP and achieving processor independence. Launched in December, the initiative in its first three years brings together experts in processor and platform design, embedded software, middleware, applications, and usage from 10 EU countries to co-design Europe's first HPC Systems on Chip and accelerator technologies. The EU-CPU family is targeted to debut in 2020 on a pre-exascale prototype system and to be production-ready in the 2021 timeframe.
{"title":"European processor initiative: the industrial cornerstone of EuroHPC for exascale era","authors":"M. Kovač","doi":"10.1145/3310273.3323432","DOIUrl":"https://doi.org/10.1145/3310273.3323432","journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers"},"publicationDate":"2019-04-30"}
On June 28, 2018, the board of directors of the German Informatics Society (GI) adopted new ethical guidelines. Throughout the development process, the main authors, mostly members of GI's "Informatics and Ethics" special interest group working in close cooperation with the president of GI, incorporated feedback and suggestions on the draft from numerous GI members.
{"title":"The German Informatics Society's new ethical guidelines: POSTER","authors":"C. Trinitis, C. Class, Stefan Ullrich","doi":"10.1145/3310273.3323428","DOIUrl":"https://doi.org/10.1145/3310273.3323428","journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers"},"publicationDate":"2019-04-30"}
F. Palumbo, Tiziana Fanni, Carlo Sau, Luca Pulina, L. Raffo, M. Masin, Evgeny Shindin, P. S. Rojas, K. Desnos, M. Pelcat, Alfonso Rodríguez, E. Juárez, F. Regazzoni, G. Meloni, Katiuscia Zedda, H. Myrhaug, Leszek Kaliciak, Joost Andriaanse, Julio A. de Oliveira Filho, Pablo Muñoz, A. Toffetti
Cyber-Physical Systems (CPS) are embedded, collaborating computational devices capable of sensing and controlling physical elements and, often, responding to humans. Designing and managing systems able to respond to different, concurrent requirements during operation is not straightforward, and introduces the need for proper support at design-time and run-time. The Cross-layer modEl-based fRamework for multi-oBjective dEsign of Reconfigurable systems in unceRtain hybRid envirOnments (CERBERO) EU project has developed a design environment for adaptive CPS. The CERBERO approach leverages model-based methodologies, including different technologies and tools developed to cover design and operation from user interactions down to low-level computing layer implementation.
{"title":"CERBERO: Cross-layer modEl-based fRamework for multi-oBjective dEsign of reconfigurable systems in unceRtain hybRid envirOnments: Invited paper: CERBERO teams from UniSS, UniCA, IBM Research, TASE, INSA-Rennes, UPM, USI, Abinsula, AmbieSense, TNO, S&T, CRF","authors":"F. Palumbo, Tiziana Fanni, Carlo Sau, Luca Pulina, L. Raffo, M. Masin, Evgeny Shindin, P. S. Rojas, K. Desnos, M. Pelcat, Alfonso Rodríguez, E. Juárez, F. Regazzoni, G. Meloni, Katiuscia Zedda, H. Myrhaug, Leszek Kaliciak, Joost Andriaanse, Julio A. de Oliveira Filho, Pablo Muñoz, A. Toffetti","doi":"10.1145/3310273.3323436","DOIUrl":"https://doi.org/10.1145/3310273.3323436","journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers"},"publicationDate":"2019-04-30"}