Energy Aware Scheduler of Single/Multi-Node Jobs Considering CPU Node Heterogeneity
K. Fukazawa, Jiacheng Zhou, H. Nakashima
Pub Date: 2022-10-24. DOI: 10.1109/IGSC55832.2022.9969365
Modern CPUs suffer from power-efficiency heterogeneity, which can result in additional energy cost or performance loss. At the same time, future supercomputers are expected to be power constrained. This paper presents energy-aware scheduling algorithms for two situations that account for this node heterogeneity. In the single-node situation, where the workload consists of various single-node jobs, a Combinatorial Optimization Algorithm saves energy by computing a locally optimal, power-efficient node-allocation plan with the KM (Kuhn-Munkres) algorithm. In the multi-node situation, a power cap causes load imbalance in multi-node jobs due to node heterogeneity; a Sliding Window Algorithm reduces this imbalance. Both algorithms are evaluated in simulation and in a real supercomputer environment. In the single-node situation, the Combinatorial Optimization Algorithm achieved up to 2.92% energy savings. In the multi-node situation, with a workload designed from real historical traces, the Sliding Window Algorithm achieved up to 5.36% savings.
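The node-allocation step above is an instance of the classic assignment problem that the KM (Kuhn-Munkres) algorithm solves. As a minimal illustration of what "a locally optimal power-efficiency node allocation plan" means, the sketch below solves a tiny instance by brute force over a hypothetical job-by-node energy matrix (the matrix values and function name are illustrative assumptions, not the paper's data):

```python
from itertools import permutations

def best_allocation(energy):
    """Exhaustively find the job->node assignment minimizing total energy.

    energy[j][n] is the (hypothetical) energy cost of running job j on node n.
    The Kuhn-Munkres algorithm solves the same assignment problem in O(n^3)
    instead of O(n!); brute force is used here only to keep the sketch short.
    """
    n_jobs = len(energy)
    best_cost, best_plan = float("inf"), None
    for perm in permutations(range(n_jobs)):      # perm[j] = node assigned to job j
        cost = sum(energy[j][perm[j]] for j in range(n_jobs))
        if cost < best_cost:
            best_cost, best_plan = cost, perm
    return best_cost, best_plan

# Hypothetical 3x3 energy matrix: rows = jobs, columns = nodes.
E = [[9.0, 7.5, 8.2],
     [6.1, 6.8, 5.9],
     [7.7, 8.4, 7.0]]
cost, plan = best_allocation(E)
```

For production-scale node counts, a polynomial-time KM implementation (e.g. SciPy's `linear_sum_assignment`) would replace the permutation loop.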
Toward a Behavioral-Level End-to-End Framework for Silicon Photonics Accelerators
Emily Lattanzio, Ranyang Zhou, A. Roohi, Abdallah Khreishah, Durga Misra, Shaahin Angizi
Pub Date: 2022-10-24. DOI: 10.1109/IGSC55832.2022.9969371
Convolutional Neural Networks (CNNs) are widely used due to their effectiveness in AI applications such as object recognition and speech processing, where the multiply-and-accumulate (MAC) operation contributes roughly 95% of the computation time. From a hardware-implementation perspective, the performance of current CMOS-based MAC accelerators is limited mainly by their von Neumann architecture and the corresponding memory bandwidth. Silicon photonics has therefore been explored as a promising alternative to electronic memristive crossbars for improving the speed and power efficiency of accelerator designs. In this work, we briefly survey recent silicon photonics accelerators and take initial steps toward an open-source, adaptive crossbar-architecture simulator. Keeping the original functionality of the MNSIM tool [1], we add a photonic mode that adapts the pre-existing algorithm to a photonic Phase Change Memory (pPCM) based crossbar structure. Given the CNN's topology, the accelerator configuration, and experimentally benchmarked data, the simulator reports the optimal crossbar size, the number of crossbars needed, and estimates of total area, power, and latency.
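One of the simulator outputs mentioned above, "the number of crossbars needed", reduces to a tiling calculation over each layer's weight matrix. The sketch below shows the back-of-the-envelope version; the tiling rule is an illustrative assumption, not MNSIM's exact cost model:

```python
import math

def crossbars_needed(rows, cols, xbar_size):
    """Tile a rows x cols weight matrix onto square crossbars of side xbar_size.

    Each tile of at most xbar_size x xbar_size weights maps to one crossbar, so
    the count is the product of the ceil-divided dimensions. Real simulators
    also account for bit-slicing and input/output precision.
    """
    return math.ceil(rows / xbar_size) * math.ceil(cols / xbar_size)

# A hypothetical 512x300 fully connected layer on 128x128 photonic crossbars.
n = crossbars_needed(512, 300, 128)
```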
Towards Energy Efficient Memristor-based TCAM for Match-Action Processing
Saad Saleh, A. Goossens, T. Banerjee, B. Koldehofe
Pub Date: 2022-10-24. DOI: 10.1109/IGSC55832.2022.9969354
Match-action processors play a crucial role in connecting end-users across the Internet by computing network paths and enforcing administrator policies. The computation uses a specialized memory, Ternary Content Addressable Memory (TCAM), to store processing rules and match the header information of network packets within a single clock cycle. Current TCAMs consume large amounts of energy because they are built on traditional transistor-based CMOS technology. In this article, we motivate the use of a novel component, the memristor, for building a TCAM architecture. Memristors offer energy efficiency, non-volatility, and better resource density than transistors. We propose a novel memristor-based TCAM architecture built on the voltage-divider principle for energy-efficient match-action processing, and we evaluate it using experimental data from a novel Nb-doped SrTiO3 memristor. Energy analysis of the proposed architecture for this memristor shows promising power consumption of 16 μW for a match operation and 1 μW for a mismatch operation.
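The voltage-divider principle the architecture builds on can be illustrated numerically: a memristor programmed to a low-resistance state (match) or high-resistance state (mismatch) sits above a fixed resistor, and the divider output separates the two cases. The resistance and voltage values below are illustrative assumptions, not the paper's measured device parameters:

```python
def divider_out(v_in, r_top, r_bottom):
    """Output of a resistive voltage divider: V_out = V_in * Rb / (Rt + Rb)."""
    return v_in * r_bottom / (r_top + r_bottom)

# Hypothetical values: 1 V search voltage, 10 kOhm fixed pull-down resistor,
# and memristor low/high resistance states of 1 kOhm and 1 MOhm.
V_IN, R_FIXED = 1.0, 10e3
R_LRS, R_HRS = 1e3, 1e6

v_match = divider_out(V_IN, R_LRS, R_FIXED)     # LRS on top: output pulled high
v_mismatch = divider_out(V_IN, R_HRS, R_FIXED)  # HRS on top: output pulled low
```

The wide gap between the two outputs is what lets a sense amplifier distinguish match from mismatch in a single cycle.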
Evaluation of Heuristics to Manage a Data Center Under Power Constraints
Igor Fontana De Nardin, P. Stolf, S. Caux
Pub Date: 2022-10-24. DOI: 10.1109/IGSC55832.2022.9969362
In recent years, academia and industry have increased their efforts to find solutions that reduce greenhouse gas (GHG) emissions because of their impact on climate change. Two approaches to reducing these emissions are decreasing energy consumption and increasing the use of clean energy. Data centers are among the largest energy consumers in Information and Communications Technology (ICT). One way to supply data centers with clean energy is to use power from renewable sources such as solar and wind. However, renewable energy introduces several uncertainties due to its intermittency, and dealing with them demands different approaches at different levels of management. This work is part of the Datazero2 project, which introduces a clean-by-design data center architecture powered only by renewable energy. With no connection to the grid, the data center manager must respect power-envelope constraints. This article investigates online scheduling and power-capping heuristics to identify the algorithms best able to handle fluctuating power profiles without hindering job execution, and details experiments comparing their results. The results show that our heuristic provides a well-balanced solution in terms of power and Quality of Service (QoS).
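To make the power-envelope constraint concrete, here is a deliberately simple first-fit admission sketch: jobs start only while their combined draw fits under the currently available renewable power. This is a generic illustration of scheduling under a power envelope, not one of the paper's evaluated heuristics; the job names and wattages are made up:

```python
def admit_jobs(jobs, envelope_watts):
    """Greedy online admission under a power envelope.

    jobs: list of (name, power_watts) pairs in queue order.
    envelope_watts: renewable power available at this decision point.
    Returns the jobs started and the total power they draw.
    """
    running, used = [], 0.0
    for name, power in jobs:
        if used + power <= envelope_watts:  # job fits under the envelope
            running.append(name)
            used += power
        # otherwise the job stays queued until the envelope rises
    return running, used

queue = [("sim", 120.0), ("render", 200.0), ("etl", 80.0), ("train", 150.0)]
running, used = admit_jobs(queue, envelope_watts=350.0)
```

When the envelope fluctuates, such a policy must re-run at each forecast update, which is where the more sophisticated heuristics compared in the paper come in.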
Less is More: Learning Simplicity in Datacenter Scheduling
Wenkai Guan, Cristinel Ababei
Pub Date: 2022-10-24. DOI: 10.1109/IGSC55832.2022.9969372
In this paper, we present a new scheduling algorithm, Qin2, for heterogeneous datacenters. Its goal is to improve performance, measured as job completion time, by exploiting increased server heterogeneity with deep neural network (DNN) models. The proposed scheduling framework uses an efficient automatic feature-selection technique that significantly reduces the training-data size required to reach satisfactory prediction accuracy. This efficiency is especially helpful when the DNN model is retrained to adapt to new types of application workloads arriving at the datacenter. The novelty of the approach lies in this feature-selection technique and in the integration of simple, training-efficient DNN models into a scheduler deployed on a real cluster of heterogeneous nodes. Experiments demonstrate that the Qin2 scheduler outperforms state-of-the-art schedulers in job completion time.
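The general idea behind automatic feature selection for a completion-time predictor can be sketched with a simple correlation ranking: keep the features most predictive of the target. This is a minimal stand-in under that assumption, not the paper's actual technique, and the toy data is invented:

```python
def rank_features(samples, target):
    """Rank feature columns by |Pearson correlation| with the target."""
    def pearson(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        vx = sum((x - mx) ** 2 for x in xs) ** 0.5
        vy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (vx * vy) if vx and vy else 0.0

    n_features = len(samples[0])
    scores = [(abs(pearson([row[i] for row in samples], target)), i)
              for i in range(n_features)]
    # Most predictive features first; training data can then be restricted
    # to the top-ranked columns, shrinking the DNN's input.
    return [i for _, i in sorted(scores, reverse=True)]

# Toy data: feature 0 tracks the completion time exactly, feature 1 is noise.
X = [[1.0, 3.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
y = [10.0, 20.0, 30.0, 40.0]
order = rank_features(X, y)
```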
Optimizing Energy Efficiency of Node.js Applications with CPU DVFS Awareness
Maria Patrou, K. Kent, Joran Siu, Michael H. Dawson
Pub Date: 2022-10-24. DOI: 10.1109/IGSC55832.2022.9969367
Node.js applications can incorporate CPU Dynamic Voltage and Frequency Scaling (DVFS) to adjust their energy consumption and runtime performance. We build a CPU frequency-scaling policy that promotes "green", high-performing requests and enables customization of their execution profile. Our technique requires a profiling step that classifies web requests by how CPU frequency affects their energy consumption and runtime performance, and by their code syntax/paradigm. The model also covers concurrent request execution when selecting an appropriate CPU frequency, and supports priority-based requests so that users can customize and formulate a policy based on their goals. Finally, we perform an energy-runtime analysis showing that our policy, with the proposed configurations, is more energy efficient than the Linux scaling governors.
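The core decision such a policy makes, picking one CPU frequency when requests of several profiled classes run concurrently, can be sketched as a lookup plus a conflict rule. The class names, frequencies, and "take the max so no request is starved" rule below are illustrative assumptions, not the paper's measured profile or policy:

```python
# Hypothetical profiling results: for each request class, the CPU frequency
# (in kHz, as Linux's cpufreq sysfs interface reports it) that minimized
# energy during a profiling run.
PROFILE = {
    "cpu_bound": 2_400_000,  # compute-heavy requests finish fastest at high clocks
    "io_bound": 1_200_000,   # I/O-heavy requests waste energy at high clocks
}

def pick_frequency(active_classes, default=1_800_000):
    """Select one frequency for a set of concurrently running request classes.

    Takes the highest profiled frequency among the active classes so that no
    request class is slowed below its profiled optimum; falls back to a
    default for unprofiled workloads.
    """
    freqs = [PROFILE[c] for c in active_classes if c in PROFILE]
    return max(freqs) if freqs else default

f = pick_frequency({"io_bound", "cpu_bound"})
```

On Linux, the chosen value would then be written to `scaling_setspeed` (userspace governor) or used to bound `scaling_max_freq`.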
Guiding Hardware-Driven Turbo with Application Performance Awareness
D. Wilson, Asma H. Al-rawi, Lowren H. Lawson, Siddhartha Jana, Federico Ardanaz, J. Eastep, A. Coskun
Pub Date: 2022-10-24. DOI: 10.1109/IGSC55832.2022.9969356
Parallel programming across many CPU cores poses software-design challenges, such as mitigating the performance or efficiency loss of applications whose cores reach synchronization points at different times. Existing solutions often address this through clever optimizations in application design, or by reacting to the imbalance at run time by throttling the frequency of the early-finishing cores. In this work, we propose a method that rebalances bulk-synchronous MPI applications by selectively speeding up the late-finishing cores throughout the run. The algorithm uses the new Intel® Speed Select Turbo Frequency feature, which enables software to guide the hardware toward increasing the turbo frequency limits of some cores in exchange for decreased turbo frequency limits on other cores. We demonstrate up to 40% energy reduction and 17% execution-time reduction in a highly imbalanced, compute-bound benchmark application, and up to 21% energy reduction with 5% execution-time reduction in an imbalanced real-world application.
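The rebalancing decision described above amounts to splitting cores into a set to boost (late finishers) and a set to donate turbo headroom (early finishers) based on when each core reaches the barrier. The sketch below shows that classification step only; the slack threshold is an illustrative assumption and the Speed Select programming interface itself is not reproduced:

```python
def classify_cores(barrier_times, slack=0.05):
    """Split cores into turbo-up (late) and turbo-down (early) sets.

    barrier_times[i] is how long core i took to reach the synchronization
    point in the last iteration. Cores within `slack` (fractional) of the
    slowest core are boosted; all other cores donate turbo headroom.
    """
    worst = max(barrier_times)
    boost = [i for i, t in enumerate(barrier_times) if t >= worst * (1 - slack)]
    donate = [i for i in range(len(barrier_times)) if i not in boost]
    return boost, donate

# Hypothetical per-core times (seconds) from one bulk-synchronous iteration.
boost, donate = classify_cores([1.00, 0.80, 0.99, 0.70])
```

In a real deployment, the boost/donate sets would be fed to the hardware's per-core turbo-limit controls each iteration, so the classification tracks phase changes in the application.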
Towards an Energy-Efficient Hash-based Message Authentication Code (HMAC)
Cesar Castellon, Swapnoneel Roy, O. P. Kreidl, Ayan Dutta, Ladislau Bölöni
Pub Date: 2022-10-24. DOI: 10.1109/IGSC55832.2022.9969377
A hash-based message authentication code (HMAC) combines a secret cryptographic key with an underlying cryptographic hash function. HMAC verifies both the integrity and the authenticity of messages simultaneously and, in turn, plays a significant role in secure communication protocols, e.g., Transport Layer Security (TLS). The high energy consumption of HMAC is well known, as is the trade-off between security, energy consumption, and performance. Previous research on reducing HMAC's energy consumption has approached the problem primarily at the system-software level (e.g., scheduling algorithms). This paper instead applies an energy-reducing algorithmic engineering technique to HMAC's underlying hash function, as a means to preserve the promised security benefits. Using pyRAPL, a Python library for measuring computational energy, we experiment with both the standard and the energy-reduced implementations of HMAC for different input sizes (in bytes). Our results show up to a 17% reduction in HMAC's energy consumption while preserving its function. Because HMAC is prevalent in existing network protocols, such energy savings extrapolate to lighter-weight network operations with respect to total energy consumption.
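For reference, the standard HMAC construction the paper starts from (RFC 2104) is short enough to write out in full; the authors' energy-reduced variant modifies the underlying hash function and is not reproduced here. The sketch checks itself against Python's standard-library implementation:

```python
import hashlib
import hmac

def hmac_sha256(key: bytes, msg: bytes) -> bytes:
    """HMAC as defined in RFC 2104: H((K ^ opad) || H((K ^ ipad) || msg))."""
    block = 64                                  # SHA-256 block size in bytes
    if len(key) > block:                        # long keys are hashed first
        key = hashlib.sha256(key).digest()
    key = key.ljust(block, b"\x00")             # then zero-padded to one block
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(ipad + msg).digest()
    return hashlib.sha256(opad + inner).digest()

tag = hmac_sha256(b"secret", b"hello")
# Sanity check against the standard-library implementation:
assert tag == hmac.new(b"secret", b"hello", hashlib.sha256).digest()
```

Note that the two hash invocations over full blocks are exactly where an algorithmic change to the hash function, as the paper pursues, translates into energy savings per authenticated message.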
ViT-LR: Pushing the Envelope for Transformer-Based on-Device Embedded Continual Learning
Alberto Dequino, Francesco Conti, L. Benini
Pub Date: 2022-10-24. DOI: 10.1109/IGSC55832.2022.9969361
State-of-the-art Edge Artificial Intelligence (AI) currently mostly follows a train-then-deploy paradigm: edge devices are responsible only for inference, while training is delegated to data centers, leading to high energy and CO2 impact. On-device continual learning could make Edge AI more sustainable by specializing AI models directly in the field. We deploy a continual image-recognition model on a Jetson Xavier NX embedded system and experimentally investigate how attention influences performance and its viability as a continual-learning backbone, analyzing the redundancy of its components in order to prune them and further improve our solution's efficiency. Starting from a pre-trained tiny Vision Transformer, we achieve up to 83.81% accuracy on the Core50 new-instances-and-classes scenario, surpassing AR1*free with Latent Replay, and reach performance comparable or superior to the state of the art without relying on a growing set of replay examples.
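The latent-replay idea referenced above depends on a bounded buffer of stored activations; a standard way to keep such a buffer representative of everything seen so far, without letting it grow, is reservoir sampling. This is a generic sketch of that buffer-management idea, not the AR1*free / Latent Replay pipeline itself:

```python
import random

def reservoir_update(buffer, capacity, item, seen, rng):
    """Reservoir-sampling update for a bounded replay buffer.

    After processing `seen` prior items, this keeps `buffer` a uniform random
    sample of all items observed so far, using O(capacity) memory. `item`
    would be a latent activation in a latent-replay setting.
    """
    if len(buffer) < capacity:
        buffer.append(item)                 # buffer not yet full: always keep
    else:
        j = rng.randrange(seen + 1)         # keep with probability capacity/(seen+1)
        if j < capacity:
            buffer[j] = item
    return buffer

# Stream 100 hypothetical latent samples through an 8-slot buffer.
rng = random.Random(0)                      # seeded for reproducibility
buf = []
for i in range(100):
    reservoir_update(buf, 8, i, i, rng)
```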
Energy-Efficient Deployment of Machine Learning Workloads on Neuromorphic Hardware
Peyton S. Chandarana, Mohammadreza Mohammadi, J. Seekings, Ramtin Zand
Pub Date: 2022-10-10. DOI: 10.1109/IGSC55832.2022.9969357
As the technology industry moves toward implementing tasks such as natural language processing, path planning, and image classification on smaller edge computing devices, more efficient implementations of algorithms and hardware accelerators have become a significant area of research. In recent years, several edge deep-learning hardware accelerators have been released that specifically focus on reducing the power and area consumed by deep neural networks (DNNs). Spiking neural networks (SNNs), which operate on discrete time-series data, have been shown to achieve substantial power reductions over even these edge DNN accelerators when deployed on specialized neuromorphic event-based/asynchronous hardware. While neuromorphic hardware has demonstrated great potential for accelerating deep-learning tasks at the edge, the space of algorithms and hardware is limited and still in early development, so many hybrid approaches have been proposed that convert pre-trained DNNs into SNNs. In this work, we provide a general guide to converting pre-trained DNNs into SNNs, along with techniques to improve the deployment of the converted SNNs on neuromorphic hardware with respect to latency, power, and energy. Our experimental results show that, using our SNN improvement techniques, Intel's neuromorphic processor Loihi consumes up to 27× less power and 5× less energy than the Intel Neural Compute Stick 2 on the tested image classification tasks.
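A core step in DNN-to-SNN conversion is rate coding: turning a continuous activation into a spike train whose firing rate approximates it. The sketch below shows a deterministic soft-reset variant of that idea; it illustrates only the coding step, while actual conversion toolchains also rescale weights and thresholds per layer:

```python
def rate_code(activation, timesteps):
    """Convert a normalized activation in [0, 1] into a binary spike train.

    A leak-free integrate-and-fire accumulator: each step integrates the
    input, fires when the membrane crosses threshold 1.0, and soft-resets
    (subtracting the threshold) so residual charge carries over. The mean
    firing rate over enough timesteps approximates the activation value.
    """
    acc, spikes = 0.0, []
    for _ in range(timesteps):
        acc += activation          # integrate the input each timestep
        if acc >= 1.0:             # fire when the membrane crosses threshold
            spikes.append(1)
            acc -= 1.0             # soft reset preserves residual charge
        else:
            spikes.append(0)
    return spikes

train = rate_code(0.25, timesteps=12)   # activation 0.25 over 12 timesteps
rate = sum(train) / len(train)          # recovered firing rate
```

The latency/energy trade-off the paper targets appears directly here: fewer timesteps mean faster, cheaper inference but a coarser approximation of each activation.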