Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design

STINT: selective transmission for low-energy physiological monitoring (doi: 10.1145/3370748.3406563)
Tao-Yi Lee, Khuong Vo, Wongi Baek, M. Khine, N. Dutt
Noninvasive and continuous physiological sensing enabled by novel wearable sensors is generating unprecedented diagnostic insights in many medical practices. However, the limited battery capacity of these wearable sensors poses a critical challenge in extending device lifetime in order to prevent omission of informative events. In this work, we exploit the inherent sparsity of physiological signals to intelligently enable selective transmission of these signals and thereby improve the energy efficiency of wearable sensors. We propose STINT, a selective transmission framework that generates a sparse representation of the raw signal based on domain-specific knowledge and that can be integrated into a wide range of resource-constrained embedded sensing IoT platforms. STINT employs a neural network (NN) for selective transmission: the NN identifies and transmits only the informative parts of the raw signal, thereby achieving low-power operation. We validate STINT and establish its efficacy in the domain of IoT for energy-efficient physiological monitoring by testing our framework on EcoBP, a novel miniaturized wireless continuous blood pressure sensor. Early experimental results on the EcoBP device demonstrate that the STINT-enabled EcoBP sensor reduces sensor energy consumption by 14% compared to the native platform, with room for additional energy savings via complementary Bluetooth and wireless optimizations.
{"title":"STINT: selective transmission for low-energy physiological monitoring","authors":"Tao-Yi Lee, Khuong Vo, Wongi Baek, M. Khine, N. Dutt","doi":"10.1145/3370748.3406563","DOIUrl":"https://doi.org/10.1145/3370748.3406563","url":null,"abstract":"Noninvasive, and continuous physiological sensing enabled by novel wearable sensors is generating unprecedented diagnostic insights in many medical practices. However, the limited battery capacity of these wearable sensors poses a critical challenge in extending device lifetime in order to prevent omission of informative events. In this work, we exploit the inherent sparsity of physiological signals to intelligently enable selective transmission of these signals and thereby improve the energy efficiency of wearable sensors. We propose STINT, a selective transmission framework that generates a sparse representation of the raw signal based on domain-specific knowledge, and which can be integrated into a wide range of resource-constrained embedded sensing IoT platforms. STINT employs a neural network (NN) for selective transmission: the NN identifies, and transmits only the informative parts of the raw signal, thereby achieving low power operation. We validate STINT and establish its efficacy in the domain of IoT for energy-efficient physiological monitoring, by testing our framework on EcoBP - a novel miniaturized, and wireless continuous blood pressure sensor. Early experimental results on the EcoBP device demonstrate that the STINT-enabled EcoBP sensor outperforms the native platform by 14% of sensor energy consumption, with room for additional energy savings via complementary bluetooth and wireless optimizations.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126591344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Time-step interleaved weight reuse for LSTM neural network computing (doi: 10.1145/3370748.3406561)
Naebeom Park, Yulhwa Kim, Daehyun Ahn, Taesu Kim, Jae-Joon Kim
In Long Short-Term Memory (LSTM) neural network models, a weight matrix tends to be repeatedly loaded from DRAM if the size of on-chip storage of the processor is not large enough to store the entire matrix. To alleviate heavy overhead of DRAM access for weight loading in LSTM computations, we propose a weight reuse scheme which utilizes the weight sharing characteristics in two adjacent time-step computations. Experimental results show that the proposed weight reuse scheme reduces the energy consumption by 28.4-57.3% and increases the overall throughput by 110.8% compared to the conventional schemes.
{"title":"Time-step interleaved weight reuse for LSTM neural network computing","authors":"Naebeom Park, Yulhwa Kim, Daehyun Ahn, Taesu Kim, Jae-Joon Kim","doi":"10.1145/3370748.3406561","DOIUrl":"https://doi.org/10.1145/3370748.3406561","url":null,"abstract":"In Long Short-Term Memory (LSTM) neural network models, a weight matrix tends to be repeatedly loaded from DRAM if the size of on-chip storage of the processor is not large enough to store the entire matrix. To alleviate heavy overhead of DRAM access for weight loading in LSTM computations, we propose a weight reuse scheme which utilizes the weight sharing characteristics in two adjacent time-step computations. Experimental results show that the proposed weight reuse scheme reduces the energy consumption by 28.4-57.3% and increases the overall throughput by 110.8% compared to the conventional schemes.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129570434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BrainWave: an energy-efficient EEG monitoring system - evaluation and trade-offs (doi: 10.1145/3370748.3406571)
B. Bruin, K. Singh, J. Huisken, H. Corporaal
This paper presents the design and evaluation of an energy-efficient seizure detection system for emerging EEG-based monitoring applications, such as non-convulsive epileptic seizure detection and Freezing-of-Gait (FoG) detection. As part of the BrainWave system, a BrainWave processor for flexible and energy-efficient signal processing is designed. The key system design parameters, including algorithmic optimizations, feature offloading and near-threshold computing, are evaluated in this work. The BrainWave processor is evaluated while executing a complex EEG-based epileptic seizure detection algorithm. In a 28-nm FDSOI technology, 325 μJ per classification at 0.9 V and 290 μJ at 0.5 V are achieved using an optimized software-only implementation. By leveraging a Coarse-Grained Reconfigurable Array (CGRA), 160 μJ and 135 μJ are obtained, respectively, while maintaining a high level of flexibility. Near-threshold computing combined with CGRA acceleration leads to an energy reduction of up to 59%, or 55% including idle-time overhead.
{"title":"BrainWave: an energy-efficient EEG monitoring system - evaluation and trade-offs","authors":"B. Bruin, K. Singh, J. Huisken, H. Corporaal","doi":"10.1145/3370748.3406571","DOIUrl":"https://doi.org/10.1145/3370748.3406571","url":null,"abstract":"This paper presents the design and evaluation of an energy-efficient seizure detection system for emerging EEG-based monitoring applications, such as non-convulsive epileptic seizure detection and Freezing-of-Gait (FoG) detection. As part of the BrainWave system, a BrainWave processor for flexible and energy-efficient signal processing is designed. The key system design parameters, including algorithmic optimizations, feature offloading and near-threshold computing are evaluated in this work. The BrainWave processor is evaluated while executing a complex EEG-based epileptic seizure detection algorithm. In a 28-nm FDSOI technology, 325 μJ per classification at 0.9 V and 290 μJ at 0.5 V are achieved using an optimized software-only implementation. By leveraging a Coarse-Grained Reconfigurable Array (CGRA), 160 μJ and 135 μJ are obtained, respectively, while maintaining a high level of flexibility. Near-threshold computing combined with CGRA acceleration leads to an energy reduction of up to 59%, or 55% including idle-time overhead.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114983875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep-PowerX: a deep learning-based framework for low-power approximate logic synthesis (doi: 10.1145/3370748.3406555)
G. Pasandi, Mackenzie Peterson, Moisés Herrera, Shahin Nazarian, M. Pedram
This paper integrates three powerful techniques, namely Deep Learning, Approximate Computing, and Low Power Design, into a strategy to optimize logic at the synthesis level. We utilize advances in deep learning to guide an approximate logic synthesis engine to minimize the dynamic power consumption of a given digital CMOS circuit, subject to a predetermined error rate at the primary outputs. Our framework, Deep-PowerX, focuses on replacing or removing gates on a technology-mapped network and uses a Deep Neural Network (DNN) to predict error rates at the primary outputs of the circuit when a specific part of the netlist is approximated. The primary goal of Deep-PowerX is to reduce dynamic power, whereas area reduction serves as a secondary objective. Using the said DNN, Deep-PowerX reduces the exponential time complexity of standard approximate logic synthesis to linear time. Experiments are done on numerous open-source benchmark circuits. Results show reductions in power and area by up to 1.47× and 1.43× compared to exact solutions, and by up to 22% and 27% compared to state-of-the-art approximate logic synthesis tools, while running orders of magnitude faster.
{"title":"Deep-PowerX: a deep learning-based framework for low-power approximate logic synthesis","authors":"G. Pasandi, Mackenzie Peterson, Moisés Herrera, Shahin Nazarian, M. Pedram","doi":"10.1145/3370748.3406555","DOIUrl":"https://doi.org/10.1145/3370748.3406555","url":null,"abstract":"This paper aims at integrating three powerful techniques namely Deep Learning, Approximate Computing, and Low Power Design into a strategy to optimize logic at the synthesis level. We utilize advances in deep learning to guide an approximate logic synthesis engine to minimize the dynamic power consumption of a given digital CMOS circuit, subject to a predetermined error rate at the primary outputs. Our framework, Deep-PowerX1, focuses on replacing or removing gates on a technology-mapped network and uses a Deep Neural Network (DNN) to predict error rates at primary outputs of the circuit when a specific part of the netlist is approximated. The primary goal of Deep-PowerX is to reduce the dynamic power whereas area reduction serves as a secondary objective. Using the said DNN, Deep-PowerX is able to reduce the exponential time complexity of standard approximate logic synthesis to linear time. Experiments are done on numerous open source benchmark circuits. Results show significant reduction in power and area by up to 1.47× and 1.43× compared to exact solutions and by up to 22% and 27% compared to state-of-the-art approximate logic synthesis tools while having orders of magnitudes lower run-time.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123977079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Low-power object counting with hierarchical neural networks (doi: 10.1145/3370748.3406569)
Abhinav Goel, Caleb Tung, Sarah Aghajanzadeh, Isha Ghodgaonkar, Shreya Ghosh, G. Thiruvathukal, Yung-Hsiang Lu
Deep Neural Networks (DNNs) achieve state-of-the-art accuracy in many computer vision tasks, such as object counting. Object counting takes two inputs, an image and an object query, and reports the number of occurrences of the queried object. To achieve high accuracy, DNNs require billions of operations, making them difficult to deploy on resource-constrained, low-power devices. Prior work shows that a significant number of DNN operations are redundant and can be eliminated without affecting accuracy. To reduce these redundancies, we propose a hierarchical DNN architecture for object counting. This architecture uses a Region Proposal Network (RPN) to propose regions-of-interest (RoIs) that may contain the queried objects. A hierarchical classifier then efficiently finds the RoIs that actually contain the queried objects. The hierarchy contains groups of visually similar object categories, and small DNNs at each node of the hierarchy classify between these groups. The RoIs are processed incrementally by the hierarchical classifier: if the object in an RoI is in the same group as the queried object, the next DNN in the hierarchy processes the RoI further; otherwise, the RoI is discarded. By using a few small DNNs to process each image, this method reduces the memory requirement, inference time, energy consumption, and number of operations with negligible accuracy loss compared with existing techniques.
{"title":"Low-power object counting with hierarchical neural networks","authors":"Abhinav Goel, Caleb Tung, Sarah Aghajanzadeh, Isha Ghodgaonkar, Shreya Ghosh, G. Thiruvathukal, Yung-Hsiang Lu","doi":"10.1145/3370748.3406569","DOIUrl":"https://doi.org/10.1145/3370748.3406569","url":null,"abstract":"Deep Neural Networks (DNNs) achieve state-of-the-art accuracy in many computer vision tasks, such as object counting. Object counting takes two inputs: an image and an object query and reports the number of occurrences of the queried object. To achieve high accuracy, DNNs require billions of operations, making them difficult to deploy on resource-constrained, low-power devices. Prior work shows that a significant number of DNN operations are redundant and can be eliminated without affecting the accuracy. To reduce these redundancies, we propose a hierarchical DNN architecture for object counting. This architecture uses a Region Proposal Network (RPN) to propose regions-of-interest (RoIs) that may contain the queried objects. A hierarchical classifier then efficiently finds the RoIs that actually contain the queried objects. The hierarchy contains groups of visually similar object categories. Small DNNs at each node of the hierarchy classify between these groups. The RoIs are incrementally processed by the hierarchical classifier. If the object in an RoI is in the same group as the queried object, then the next DNN in the hierarchy processes the RoI further; otherwise, the RoI is discarded. By using a few small DNNs to process each image, this method reduces the memory requirement, inference time, energy consumption, and number of operations with negligible accuracy loss when compared with the existing techniques.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126957015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HIPE-MAGIC: a technology-aware synthesis and mapping flow for highly parallel execution of memristor-aided LoGIC (doi: 10.1145/3370748.3406557)
A. Fayyazi, Amirhossein Esmaili, M. Pedram
Recent efforts to find novel computing paradigms that meet today's design requirements have given rise to a new trend of processing-in-memory relying on non-volatile memories. In this paper, we present HIPE-MAGIC, a technology-aware synthesis and mapping flow for highly parallel execution of memristor-based logic. Our framework is built upon two fundamental contributions: balancing techniques during logic synthesis, mainly targeting the benefits of the parallelism offered by memristive crossbar arrays (MCAs), and an efficient technology mapping framework to maximize the performance and area-efficiency of the memristor-based logic. Our experimental evaluations across several benchmark suites demonstrate the superior performance of HIPE-MAGIC in terms of throughput and energy efficiency compared to recently developed synthesis and mapping flows targeting MCAs, as well as conventional CPU computing.
{"title":"HIPE-MAGIC: a technology-aware synthesis and mapping flow for highly parallel execution of memristor-aided LoGIC","authors":"A. Fayyazi, Amirhossein Esmaili, M. Pedram","doi":"10.1145/3370748.3406557","DOIUrl":"https://doi.org/10.1145/3370748.3406557","url":null,"abstract":"Recent efforts for finding novel computing paradigms that meet today's design requirements have given rise to a new trend of processing-in-memory relying on non-volatile memories. In this paper, we present HIPE-MAGIC, a technology-aware synthesis and mapping flow for highly parallel execution of the memristor-based logic. Our framework is built upon two fundamental contributions: balancing techniques during the logic synthesis, mainly targeting benefits of the parallelism offered by memristive crossbar arrays (MCAs), and an efficient technology mapping framework to maximize the performance and area-efficiency of the memristor-based logic. Our experimental evaluations across several benchmark suites demonstrate the superior performance of HIPE-MAGIC in terms of throughput and energy efficiency compared to recently developed synthesis and mapping flows targeting MCAs, as well as the conventional CPU computing.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114732218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Resiliency analysis and improvement of variational quantum factoring in superconducting qubit (doi: 10.1145/3370748.3406586)
Ling Qiu, M. Alam, Abdullah Ash-Saki, Swaroop Ghosh
Variational algorithms based on the Quantum Approximate Optimization Algorithm (QAOA) can solve the prime factorization problem on near-term noisy quantum computers. Conventional Variational Quantum Factoring (VQF) requires a large number of 2-qubit gates (especially when factoring a large number), resulting in deep circuits. The output quality of a deep quantum circuit is degraded by errors, limiting the computational power of quantum computing. In this paper, we explore various transformations to optimize the QAOA circuit for integer factorization. We propose two criteria to select the optimal quantum circuit that can improve the noise resiliency of VQF.
{"title":"Resiliency analysis and improvement of variational quantum factoring in superconducting qubit","authors":"Ling Qiu, M. Alam, Abdullah Ash-Saki, Swaroop Ghosh","doi":"10.1145/3370748.3406586","DOIUrl":"https://doi.org/10.1145/3370748.3406586","url":null,"abstract":"Variational algorithm using Quantum Approximate Optimization Algorithm (QAOA) can solve the prime factorization problem in near-term noisy quantum computers. Conventional Variational Quantum Factoring (VQF) requires a large number of 2-qubit gates (especially for factoring a large number) resulting in deep circuits. The output quality of the deep quantum circuit is degraded due to errors limiting the computational power of quantum computing. In this paper, we explore various transformations to optimize the QAOA circuit for integer factorization. We propose two criteria to select the optimal quantum circuit that can improve the noise resiliency of VQF.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"141 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132745439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","authors":"","doi":"10.5555/3489049","DOIUrl":"https://doi.org/10.5555/3489049","url":null,"abstract":"","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126585076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}