GRLC
Yoonho Park, Yesung Kang, Sunghoon Kim, Eunji Kwon, Seokhyeong Kang
DOI: 10.1145/3370748.3406576
Convolutional neural networks (CNNs) require a huge amount of off-chip DRAM access, which accounts for most of their energy consumption. Compressing the feature maps can reduce this DRAM-access energy. However, previous compression methods show a poor compression ratio when the feature maps are either extremely sparse or extremely dense. To improve the compression ratio efficiently, we exploit the spatial correlation and the distribution of non-zero activations in output feature maps. In this work, we propose grid-based run-length compression (GRLC) and implement hardware for it. Compared with a previous compression method [1], GRLC reduces DRAM access by 11% and energy consumption by 5% on average on VGG-16, ExtractionNet and ResNet-18.
{"title":"GRLC","authors":"Yoonho Park, Yesung Kang, Sunghoon Kim, Eunji Kwon, Seokhyeong Kang","doi":"10.1145/3370748.3406576","DOIUrl":"https://doi.org/10.1145/3370748.3406576","url":null,"abstract":"Convolutional neural networks (CNNs) require a huge amount of off-chip DRAM access, which accounts for most of its energy consumption. Compression of feature maps can reduce the energy consumption of DRAM access. However, previous compression methods show poor compression ratio if the feature maps are either extremely sparse or dense. To improve the compression ratio efficiently, we have exploited the spatial correlation and the distribution of non-zero activations in output feature maps. In this work, we propose a grid-based run-length compression (GRLC) and have implemented a hardware for the GRLC. Compared with a previous compression method [1], GRLC reduces 11% of the DRAM access and 5% of the energy consumption on average in VGG-16, ExtractionNet and ResNet-18.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124993975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
How to cultivate a green decision tree without loss of accuracy?
Tseng-Yi Chen, Yuan-Hao Chang, Ming-Chang Yang, Huang-wei Chen
DOI: 10.1145/3370748.3406566

The decision tree is the core algorithm of random forest learning, which has been widely applied to classification and regression problems in machine learning. To avoid underfitting, a decision tree algorithm keeps growing its model until the tree is fully grown. A fully-grown tree, however, overfits, which reduces the decision tree's accuracy. To escape this dilemma, post-pruning strategies have been proposed that reduce the model complexity of the fully-grown tree. Nevertheless, this grow-then-prune process is very energy-inefficient on a non-volatile-memory-based (NVM-based) system, because NVMs generally have high write costs (i.e., energy consumption and I/O latency): every node that is written during construction and later pruned away is unnecessary data that incurs write energy and I/O latency on NVM-based architectures, especially on low-power embedded systems. To establish a green decision tree (i.e., a tree model with minimized construction energy consumption), this study rethinks pruning and proposes the duo-phase pruning framework, which significantly decreases the energy consumption of an NVM-based computing system without loss of accuracy.
{"title":"How to cultivate a green decision tree without loss of accuracy?","authors":"Tseng-Yi Chen, Yuan-Hao Chang, Ming-Chang Yang, Huang-wei Chen","doi":"10.1145/3370748.3406566","DOIUrl":"https://doi.org/10.1145/3370748.3406566","url":null,"abstract":"Decision tree is the core algorithm of the random forest learning that has been widely applied to classification and regression problems in the machine learning field. For avoiding underfitting, a decision tree algorithm will stop growing its tree model when the model is a fully-grown tree. However, a fully-grown tree will result in an overfitting problem reducing the accuracy of a decision tree. In such a dilemma, some post-pruning strategies have been proposed to reduce the model complexity of the fully-grown decision tree. Nevertheless, such a process is very energy-inefficiency over an non-volatile-memory-based (NVM-based) system because NVM generally have high writing costs (i.e., energy consumption and I/O latency). Such unnecessary data will induce high writing energy consumption and long I/O latency on NVM-based architectures, especially for low-power-oriented embedded systems. In order to establish a green decision tree (i.e., a tree model with minimized construction energy consumption), this study rethinks a pruning algorithm, namely duo-phase pruning framework, which can significantly decrease the energy consumption on the NVM-based computing system without loss of accuracy.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129139137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Embedding error correction into crossbars for reliable matrix vector multiplication using emerging devices
Qiuwen Lou, Tianqi Gao, P. Faley, M. Niemier, X. Hu, S. Joshi
DOI: 10.1145/3370748.3406583
Emerging memory devices are an attractive choice for implementing very energy-efficient in-situ matrix-vector multiplication (MVM) for intelligent edge platforms. Despite their great potential, device-level non-idealities have a large impact on the application-level accuracy of deep neural network (DNN) inference. We introduce a low-density parity-check (LDPC) code-based approach to correct errors induced by non-idealities during in-situ MVM. We first encode the weights using error-correcting codes (ECC), perform MVM on the encoded weights, and then decode the result. We show that partially encoding the weights can maintain DNN inference accuracy while minimizing the overhead of LDPC decoding. Within two iterations, our ECC method recovers 60% of the accuracy of MVM computations when 5% of the underlying computations are error-prone. Compared to an alternative ECC method that uses arithmetic codes, LDPC improves AlexNet classification accuracy by 0.8% at iso-energy. Similarly, at iso-energy, we demonstrate a 54% improvement in CIFAR-10 classification accuracy with VGG-11 compared to a strategy that uses 2× redundancy in weights. Further design-space exploration demonstrates that the resilience endowed by ECC can be leveraged to improve energy efficiency by reducing the operating voltage: a 3.3× energy-efficiency improvement in DNN inference on the CIFAR-10 dataset with VGG-11 is achieved at iso-accuracy.
{"title":"Embedding error correction into crossbars for reliable matrix vector multiplication using emerging devices","authors":"Qiuwen Lou, Tianqi Gao, P. Faley, M. Niemier, X. Hu, S. Joshi","doi":"10.1145/3370748.3406583","DOIUrl":"https://doi.org/10.1145/3370748.3406583","url":null,"abstract":"Emerging memory devices are an attractive choice for implementing very energy-efficient in-situ matrix-vector multiplication (MVM) for use in intelligent edge platforms. Despite their great potential, device-level non-idealities have a large impact on the application-level accuracy of deep neural network (DNN) inference. We introduce a low-density parity-check code (LDPC) based approach to correct non-ideality induced errors encountered during in-situ MVM. We first encode the weights using error correcting codes (ECC), perform MVM on the encoded weights, and then decode the result after in-situ MVM. We show that partial encoding of weights can maintain DNN inference accuracy while minimizing the overhead of LDPC decoding. Within two iterations, our ECC method recovers 60% of the accuracy in MVM computations when 5% of underlying computations are error-prone. Compared to an alternative ECC method which uses arithmetic codes, using LDPC improves AlexNet classification accuracy by 0.8% at iso-energy. Similarly, at iso-energy, we demonstrate an improvement in CIFAR-10 classification accuracy of 54% with VGG-11 when compared to a strategy that uses 2× redundancy in weights. Further design space explorations demonstrate that we can leverage the resilience endowed by ECC to improve energy efficiency (by reducing operating voltage). A 3.3× energy efficiency improvement in DNN inference on CIFAR-10 dataset with VGG-11 is achieved at iso-accuracy.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122475463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BLINK
Zhe Chen, Garrett J. Blair, H. T. Blair, J. Cong
DOI: 10.1145/1597817.1597847

Information searches on the Internet are one of the central components of scholarly work. Among digital information sources, scholarly, discipline-grounded Internet services developed by scientific infrastructure institutions such as the Leibniz institute ZPID stand in contrast to popular, commercial Internet services without such a disciplinary grounding (such as Google, Google Scholar, Web of Science, Yahoo, etc.). Adequate skills in handling these information sources cannot be taken for granted among learners. "Novices", who have little topic-specific (prior) knowledge and little relevant experience with information searches, tend, for example, to formulate dysfunctional (e.g., too imprecise or too narrowly framed) search strategies. They are also often overwhelmed by the sheer number of supposed "hits", whose relevance, quality and trustworthiness they do not know how to judge.
{"title":"BLINK","authors":"Zhe Chen, Garrett J. Blair, H. T. Blair, J. Cong","doi":"10.1145/1597817.1597847","DOIUrl":"https://doi.org/10.1145/1597817.1597847","url":null,"abstract":"Informationsrecherchen im Internet stellen eine der zentralen Komponenten wissenschaftlichen Arbeitens dar. Als digitale Informationsquellen stehen dabei fachwissenschaftlich fundierte Internetangebote, die von wissenschaftlichen Infrastruktureinrichtungen wie dem Leibniz-Zentrum ZPID erarbeitet werden, den populären, nicht fachwissenschaftlich fundierten, kommerziellen Internetangeboten (wie etwa Google, Google Scholar, Web of Science, Yahoo etc.) gegenüber. Adäquate Fertigkeiten im Umgang mit diesen Informationsquellen können auf Seiten von Lernenden nicht vorausgesetzt werden. \"Novizen\", die über wenig themenspezifisches (Vor-) Wissen und wenige einschlägige Erfahrungen mit Informationsrecherchen verfügen, neigen z.B. dazu, dysfunktionale (z.B. zu unpräzise oder zu eng gefasste) Suchstrategien zu formulieren. Oftmals werden sie auch von der Vielzahl der vermeintlichen \"Treffer\" überfordert, deren Relevanz, Qualität und Seriosität sie nicht zu beurteilen wissen.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"219 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115513715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Slumber
Devashree Tripathy, Hadi Zamani, Debiprasanna Sahoo, L. Bhuyan, M. Satpathy
DOI: 10.1145/3370748.3406577
Leakage power dissipation has become one of the major concerns with technology scaling. The GPGPU register file has grown in size over the last decade to support the parallel execution of thousands of threads. Because each thread has its own dedicated set of physical registers, these registers sit idle whenever their threads wait on long-latency operations. Existing research shows that the leakage energy of the register file can be reduced by undervolting idle registers to a data-retentive, low-leakage "drowsy" voltage, which ensures that data are not lost while the registers are not in use. In this paper, we develop a realistic model for the wake-up time of registers from various undervolting and power-gating modes. We then propose a hybrid energy-saving technique that combines power gating and undervolting to save the optimum energy, depending on the idle period of the registers, with a negligible performance penalty. Our simulations show that the hybrid technique saves 94% of register-file leakage energy on average compared with conventional clock gating, and 9% more leakage energy than the state-of-the-art technique.
{"title":"Slumber","authors":"Devashree Tripathy, Hadi Zamani, Debiprasanna Sahoo, L. Bhuyan, M. Satpathy","doi":"10.1145/3370748.3406577","DOIUrl":"https://doi.org/10.1145/3370748.3406577","url":null,"abstract":"The leakage power dissipation has become one of the major concerns with technology scaling. The GPGPU register file has grown in size over last decade in order to support the parallel execution of thousands of threads. Given that each thread has its own dedicated set of physical registers, these registers remain idle when corresponding threads go for long latency operation. Existing research shows that the leakage energy consumption of the register file can be reduced by under volting the idle registers to a data-retentive low-leakage voltage (Drowsy Voltage) to ensure that the data is not lost while not in use. In this paper, we develop a realistic model for determining the wake-up time of registers from various under-volting and power gating modes. Next, we propose a hybrid energy saving technique where a combination of power-gating and under-volting can be used to save optimum energy depending on the idle period of the registers with a negligible performance penalty. Our simulation shows that the hybrid energy-saving technique results in 94% leakage energy savings in register files on an average when compared with the conventional clock gating technique and 9% higher leakage energy saving compared to the state-of-art technique.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"299 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115999680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FTRANS
Bingbing Li, Santosh Pandey, Haowen Fang, Yanjun Lyv, Ji Li, Jieyang Chen, Mimi Xie, Lipeng Wan, Hang Liu, Caiwen Ding
DOI: 10.1145/3370748.3406567
In natural language processing (NLP), the "Transformer" architecture was proposed as the first transduction model that relies entirely on self-attention, without sequence-aligned recurrent neural networks (RNNs) or convolution, and it achieved significant improvements on sequence-to-sequence tasks. However, the intensive computation and storage that these pre-trained language representations require have impeded their adoption on computation- and memory-constrained devices. The field-programmable gate array (FPGA) is widely used to accelerate deep-learning algorithms for its high parallelism and low latency, but trained Transformer models are still too large to fit into an FPGA fabric. In this paper, we propose an efficient acceleration framework, Ftrans, for transformer-based large-scale language representations. Our framework includes an enhanced block-circulant matrix (BCM)-based weight representation, which enables model compression of large-scale language representations at the algorithm level with little accuracy degradation, and an acceleration design at the architecture level. Experimental results show that our proposed framework reduces the model size of NLP models by up to 16×. Our FPGA design achieves 27.07× and 81× improvements in performance and energy efficiency over a CPU, and up to an 8.80× improvement in energy efficiency over a GPU.
{"title":"FTRANS","authors":"Bingbing Li, Santosh Pandey, Haowen Fang, Yanjun Lyv, Ji Li, Jieyang Chen, Mimi Xie, Lipeng Wan, Hang Liu, Caiwen Ding","doi":"10.1145/3370748.3406567","DOIUrl":"https://doi.org/10.1145/3370748.3406567","url":null,"abstract":"In natural language processing (NLP), the \"Transformer\" architecture was proposed as the first transduction model replying entirely on self-attention mechanisms without using sequence-aligned recurrent neural networks (RNNs) or convolution, and it achieved significant improvements for sequence to sequence tasks. The introduced intensive computation and storage of these pre-trained language representations has impeded their popularity into computation and memory constrained devices. The field-programmable gate array (FPGA) is widely used to accelerate deep learning algorithms for its high parallelism and low latency. However, the trained models are still too large to accommodate to an FPGA fabric. In this paper, we propose an efficient acceleration framework, Ftrans, for transformer-based large scale language representations. Our framework includes enhanced block-circulant matrix (BCM)-based weight representation to enable model compression on large-scale language representations at the algorithm level with few accuracy degradation, and an acceleration design at the architecture level. Experimental results show that our proposed framework significantly reduce the model size of NLP models by up to 16 times. Our FPGA design achieves 27.07× and 81 × improvement in performance and energy efficiency compared to CPU, and up to 8.80× improvement in energy efficiency compared to GPU.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"289 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114768965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SAOU
Hadi Zamani, Devashree Tripathy, L. Bhuyan, Zizhong Chen
DOI: 10.1145/3370748.3406553
The current trend of ever-increasing performance in scientific applications comes with tremendous growth in energy consumption. In this paper, we present a framework for GPU applications that reduces energy consumption through Safe Overclocking and Undervolting (SAOU) without sacrificing performance. The idea is to increase the frequency beyond the safe maximum frequency f_safeMax and to undervolt below V_safeMin to obtain maximum energy savings. Since such overclocking and undervolting may give rise to faults, we employ an enhanced checkpoint-recovery technique to cover the possible errors. Empirically, we explore the different errors and derive a fault model that sets the undervolting and overclocking levels for maximum energy saving. As an example scientific application, we target the cuBLAS matrix-multiplication (cuBLAS-MM) kernel for error correction using the checkpoint-and-recovery (CR) technique. For cuBLAS, SAOU achieves up to 22% energy reduction through undervolting and overclocking without sacrificing performance.
{"title":"SAOU","authors":"Hadi Zamani, Devashree Tripathy, L. Bhuyan, Zizhong Chen","doi":"10.1145/3370748.3406553","DOIUrl":"https://doi.org/10.1145/3370748.3406553","url":null,"abstract":"The current trend of ever-increasing performance in scientific applications comes with tremendous growth in energy consumption. In this paper, we present a framework for GPU applications, which reduces energy consumption in GPUs through Safe Overclocking and Undervolting (SAOU) without sacrificing performance. The idea is to increase the frequency beyond the safe frequency fsa f eMax and undervolt below Vsa f eMin to get maximum energy saving. Since such overclocking and undervolting may give rise to faults, we employ an enhanced checkpoint-recovery technique to cover the possible errors. Empirically, we explore different errors and derive a fault model that can set the undervolting and overclocking level for maximum energy saving. We target cuBLAS Matrix Multiplication (cuBLAS-MM) kernel for error correction using the checkpoint and recovery (CR) technique as an example of scientific applications. In case of cuBLAS, SAOU achieves up to 22% energy reduction through undervolting and overclocking without sacrificing the performance.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121955254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reconfigurable tiles of computing-in-memory SRAM architecture for scalable vectorization
R. Gauchi, V. Egloff, Maha Kooli, J. Noël, B. Giraud, P. Vivet, S. Mitra, H. Charles
DOI: 10.1145/3370748.3406550
For big data applications, bringing computation into the memory is expected to drastically reduce data transfers, following recent Computing-In-Memory (CIM) concepts. To address kernels with larger memory data sets, we propose a reconfigurable tile-based architecture composed of Computational-SRAM (C-SRAM) tiles, each enabling arithmetic and logic operations within the memory. The proposed horizontal scalability and vertical data communication are combined to select the optimal vector width for maximum performance. These schemes allow vector-based kernels written for existing SIMD engines to be used on the targeted CIM architecture. For architecture exploration, we propose an instruction-accurate simulation platform using SystemC/TLM to quantify the performance and energy of various kernels. For detailed performance evaluation, the platform is calibrated with data extracted from the placed-and-routed C-SRAM circuit, designed in 22nm FDSOI technology. Compared to a 512-bit SIMD architecture, the proposed CIM architecture achieves an EDP reduction of up to 60× for memory-bound kernels and 34× for compute-bound kernels.
{"title":"Reconfigurable tiles of computing-in-memory SRAM architecture for scalable vectorization","authors":"R. Gauchi, V. Egloff, Maha Kooli, J. Noël, B. Giraud, P. Vivet, S. Mitra, H. Charles","doi":"10.1145/3370748.3406550","DOIUrl":"https://doi.org/10.1145/3370748.3406550","url":null,"abstract":"For big data applications, bringing computation to the memory is expected to reduce drastically data transfers, which can be done using recent concepts of Computing-In-Memory (CIM). To address kernels with larger memory data sets, we propose a reconfigurable tile-based architecture composed of Computational-SRAM (C-SRAM) tiles, each enabling arithmetic and logic operations within the memory. The proposed horizontal scalability and vertical data communication are combined to select the optimal vector width for maximum performance. These schemes allow to use vector-based kernels available on existing SIMD engines onto the targeted CIM architecture. For architecture exploration, we propose an instruction-accurate simulation platform using SystemC/TLM to quantify performance and energy of various kernels. For detailed performance evaluation, the platform is calibrated with data extracted from the Place&Route C-SRAM circuit, designed in 22nm FDSOI technology. Compared to 512-bit SIMD architecture, the proposed CIM architecture achieves an EDP reduction up to 60× and 34× for memory bound kernels and for compute bound kernels, respectively.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122454647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards wearable piezoelectric energy harvesting: modeling and experimental validation
Y. Tuncel, Shiva Bandyopadhyay, Shambhavi V. Kulshrestha, A. Mendez, Ümit Y. Ogras
DOI: 10.1145/3370748.3406578
Motion energy harvesting is an ideal alternative to batteries in wearable applications, since it can produce energy on demand. So far, widespread use of this technology has been hindered by bulky, inflexible and impractical designs. New flexible piezoelectric materials enable comfortable use of this technology, but its energy-harvesting potential has not been thoroughly investigated to date. This paper presents a novel mathematical model for estimating the energy that can be harvested from joint movements on the human body. The proposed model is validated using two different piezoelectric materials attached to a 3D model of the human knee. To the best of our knowledge, this is the first study that combines analytical modeling and experimental validation for joint movements. Thorough experimental evaluations show that 1) users can generate 13 μW of power on average while walking, and 2) the generated power can be predicted with 4.8% modeling error.
{"title":"Towards wearable piezoelectric energy harvesting: modeling and experimental validation","authors":"Y. Tuncel, Shiva Bandyopadhyay, Shambhavi V. Kulshrestha, A. Mendez, Ümit Y. Ogras","doi":"10.1145/3370748.3406578","DOIUrl":"https://doi.org/10.1145/3370748.3406578","url":null,"abstract":"Motion energy harvesting is an ideal alternative to battery in wearable applications since it can produce energy on demand. So far, widespread use of this technology has been hindered by bulky, inflexible and impractical designs. New flexible piezoelectric materials enable comfortable use of this technology. However, the energy harvesting potential of this approach has not been thoroughly investigated to date. This paper presents a novel mathematical model for estimating the energy that can be harvested from joint movements on the human body. The proposed model is validated using two different piezoelectric materials attached on a 3D model of the human knee. To the best of our knowledge, this is the first study that combines analytical modeling and experimental validation for joint movements. Thorough experimental evaluations show that 1) users can generate on average 13 μW power while walking, 2) we can predict the generated power with 4.8% modeling error.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132843555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
WELCOMF
Arijit Nath, H. Kapoor
DOI: 10.1145/3370748.3406559

Emerging non-volatile memories such as phase-change memory (PCM) and resistive RAM are projected as potential replacements for traditional DRAM-based main memories. However, limited write endurance and high write energy limit their chances of adoption as a mainstream main-memory standard. In this paper, we propose a word-level compression scheme called COMF that reduces bit flips in PCM by removing the most repeated words from cache lines before they are written to memory. We then propose an intra-line wear-leveling technique called WELCOMF that extends COMF to improve lifetime. Experimental results show that the proposed technique improves lifetime by 75% and reduces bit flips and energy by 45% and 46%, respectively, over the baseline.
{"title":"WELCOMF","authors":"Arijit Nath, H. Kapoor","doi":"10.1145/3370748.3406559","DOIUrl":"https://doi.org/10.1145/3370748.3406559","url":null,"abstract":"Emerging Non-Volatile memories such as Phase Change Memory (PCM) and Resistive RAM are projected as potential replacements of the traditional DRAM-based main memories. However, limited write endurance and high write energy limit their chances of adoption as a mainstream main memory standard. In this paper, we propose a word-level compression scheme called COMF to reduce bitflips in PCMs by removing the most repeated words from the cache lines before writing into memory. Later, we also propose an intra-line wear leveing technique called WELCOMF that extends COMF to improve lifetime. Experimental results show that the proposed technique improves lifetime by 75% and, reduce bit flips and energy by 45% and 46% respectively over baseline.","PeriodicalId":116486,"journal":{"name":"Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design","volume":"15 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122041811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}