
Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design: latest publications

GRLC
Yoonho Park, Yesung Kang, Sunghoon Kim, Eunji Kwon, Seokhyeong Kang
Convolutional neural networks (CNNs) require a huge amount of off-chip DRAM access, which accounts for most of their energy consumption. Compressing the feature maps can reduce the energy consumption of DRAM access. However, previous compression methods show a poor compression ratio when the feature maps are either extremely sparse or extremely dense. To improve the compression ratio efficiently, we exploit the spatial correlation and the distribution of non-zero activations in the output feature maps. In this work, we propose grid-based run-length compression (GRLC) and implement hardware for it. Compared with a previous compression method [1], GRLC reduces DRAM access by 11% and energy consumption by 5% on average for VGG-16, ExtractionNet and ResNet-18.
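The abstract does not give GRLC's exact encoding format; as a rough illustration of the idea of running run-length compression per grid (so sparse and dense regions compress on their own terms), here is a minimal sketch. The grid size and the (value, run) pair layout are assumptions, not the paper's format:

```python
def rle_encode(block):
    # Zero-oriented run-length encoding: a run of zeros collapses to a
    # single (0, run_length) pair; non-zero activations are kept verbatim
    # as (value, 1) pairs.
    out, zeros = [], 0
    for v in block:
        if v == 0:
            zeros += 1
        else:
            if zeros:
                out.append((0, zeros))
                zeros = 0
            out.append((v, 1))
    if zeros:
        out.append((0, zeros))
    return out

def grlc_encode(feature_map, grid=4):
    # Grid-based variant: split the flattened feature map into fixed-size
    # grids and encode each grid independently.
    return [rle_encode(feature_map[i:i + grid])
            for i in range(0, len(feature_map), grid)]
```

A fully sparse grid shrinks to one pair, while a dense grid costs one pair per activation, which is why a scheme like this benefits from exploiting the spatial clustering of zeros.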
DOI: 10.1145/3370748.3406576 (published 2020-08-10)
Citations: 3
How to cultivate a green decision tree without loss of accuracy?
Tseng-Yi Chen, Yuan-Hao Chang, Ming-Chang Yang, Huang-wei Chen
The decision tree is the core algorithm of random forest learning, which has been widely applied to classification and regression problems in the machine learning field. To avoid underfitting, a decision tree algorithm stops growing its tree model only when the model is a fully-grown tree. However, a fully-grown tree leads to overfitting, which reduces the accuracy of the decision tree. Faced with this dilemma, several post-pruning strategies have been proposed to reduce the model complexity of the fully-grown decision tree. Nevertheless, such a process is very energy-inefficient on a non-volatile-memory-based (NVM-based) system because NVMs generally have high write costs (i.e., energy consumption and I/O latency). The unnecessary data written before pruning induce high write energy consumption and long I/O latency on NVM-based architectures, especially for low-power embedded systems. To establish a green decision tree (i.e., a tree model with minimized construction energy consumption), this study rethinks the pruning algorithm with a duo-phase pruning framework, which significantly decreases the energy consumption of an NVM-based computing system without loss of accuracy.
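The duo-phase framework itself is not spelled out in the abstract; as a minimal sketch of the general idea that pruning can cut tree size (and hence NVM writes) without changing predictions, here is a lossless bottom-up collapse of redundant subtrees. The dict-based tree format and function name are illustrative, not the paper's:

```python
def prune(tree):
    # Bottom-up pruning sketch: collapse a subtree into a single leaf when
    # both children reduce to the same class label, so the subtree can
    # never change a prediction. Leaves are labels; internal nodes are
    # {"left": ..., "right": ...} dicts (an assumed representation).
    if not isinstance(tree, dict):
        return tree                      # already a leaf
    left = prune(tree["left"])
    right = prune(tree["right"])
    if not isinstance(left, dict) and left == right:
        return left                      # identical leaves: node is redundant
    return {"left": left, "right": right}
```

Every node removed this way is a node that never has to be written to NVM, which is the kind of write saving the paper targets, though its actual framework prunes during construction rather than after it.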
DOI: 10.1145/3370748.3406566 (published 2020-08-10)
Citations: 3
Embedding error correction into crossbars for reliable matrix vector multiplication using emerging devices
Qiuwen Lou, Tianqi Gao, P. Faley, M. Niemier, X. Hu, S. Joshi
Emerging memory devices are an attractive choice for implementing very energy-efficient in-situ matrix-vector multiplication (MVM) for use in intelligent edge platforms. Despite their great potential, device-level non-idealities have a large impact on the application-level accuracy of deep neural network (DNN) inference. We introduce a low-density parity-check code (LDPC) based approach to correct non-ideality induced errors encountered during in-situ MVM. We first encode the weights using error correcting codes (ECC), perform MVM on the encoded weights, and then decode the result after in-situ MVM. We show that partial encoding of weights can maintain DNN inference accuracy while minimizing the overhead of LDPC decoding. Within two iterations, our ECC method recovers 60% of the accuracy in MVM computations when 5% of underlying computations are error-prone. Compared to an alternative ECC method which uses arithmetic codes, using LDPC improves AlexNet classification accuracy by 0.8% at iso-energy. Similarly, at iso-energy, we demonstrate an improvement in CIFAR-10 classification accuracy of 54% with VGG-11 when compared to a strategy that uses 2× redundancy in weights. Further design space explorations demonstrate that we can leverage the resilience endowed by ECC to improve energy efficiency (by reducing operating voltage). A 3.3× energy efficiency improvement in DNN inference on CIFAR-10 dataset with VGG-11 is achieved at iso-accuracy.
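The paper uses LDPC codes, whose decoders are too involved to sketch here; as a simpler stand-in that shows the same encode → compute-on-codeword → decode-and-correct flow, here is a textbook Hamming(7,4) code, which corrects any single-bit error in a 7-bit codeword:

```python
def hamming74_encode(d):
    # d: four data bits -> 7-bit codeword laid out as
    # [p1, p2, d1, p3, d2, d3, d4] (standard Hamming(7,4) positions).
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    # Recompute each parity over the positions it covers; the syndrome is
    # the 1-based index of a single flipped bit (0 means no error seen).
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    c = list(c)
    if syndrome:
        c[syndrome - 1] ^= 1             # correct the flipped bit
    return [c[2], c[4], c[5], c[6]]      # strip the parity bits
```

LDPC plays the same role at much larger block sizes and with iterative decoding, which is what lets the paper trade decoder iterations against the fraction of error-prone in-situ computations.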
DOI: 10.1145/3370748.3406583 (published 2020-08-10)
Citations: 8
BLINK
Zhe Chen, Garrett J. Blair, H. T. Blair, J. Cong
Information searches on the Internet are one of the central components of scholarly work. As digital information sources, scientifically grounded Internet services, developed by research infrastructure institutions such as the Leibniz-Zentrum ZPID, stand alongside popular, commercially operated Internet services without a scholarly foundation (such as Google, Google Scholar, Web of Science, Yahoo, etc.). Adequate skills in handling these information sources cannot be taken for granted among learners. "Novices", who have little topic-specific (prior) knowledge and little relevant experience with information searches, tend, for example, to formulate dysfunctional (e.g., too imprecise or too narrowly framed) search strategies. They are also often overwhelmed by the sheer number of supposed "hits", whose relevance, quality and trustworthiness they are unable to judge.
DOI: 10.1145/1597817.1597847 (published 2020-08-10)
Citations: 18
Slumber
Devashree Tripathy, Hadi Zamani, Debiprasanna Sahoo, L. Bhuyan, M. Satpathy
Leakage power dissipation has become one of the major concerns with technology scaling. The GPGPU register file has grown in size over the last decade in order to support the parallel execution of thousands of threads. Given that each thread has its own dedicated set of physical registers, these registers remain idle when the corresponding threads stall on long-latency operations. Existing research shows that the leakage energy consumption of the register file can be reduced by undervolting idle registers to a data-retentive low-leakage voltage (Drowsy Voltage), ensuring that the data are not lost while not in use. In this paper, we develop a realistic model for determining the wake-up time of registers from various undervolting and power-gating modes. Next, we propose a hybrid energy-saving technique in which a combination of power gating and undervolting saves the optimum energy, depending on the idle period of the registers, with a negligible performance penalty. Our simulations show that the hybrid energy-saving technique yields 94% leakage energy savings in the register file on average compared with conventional clock gating, and 9% higher leakage energy savings than the state-of-the-art technique.
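The core trade-off behind such a hybrid scheme is a break-even point: power gating wipes out leakage but pays a fixed wake-up/state-loss overhead, while drowsy mode leaks a little but wakes up almost for free. A minimal sketch of that mode-selection decision, with all constants purely illustrative (not the paper's calibrated model):

```python
def choose_register_mode(idle_cycles, active_leak=1.0, drowsy_leak=0.1,
                         gated_leak=0.0, gate_overhead=5.0):
    # Energy (arbitrary units) of holding an idle register in each mode
    # for `idle_cycles`. Power gating adds a fixed overhead for losing
    # and restoring state; drowsy mode retains data at reduced leakage.
    active = active_leak * idle_cycles
    drowsy = drowsy_leak * idle_cycles
    gated = gated_leak * idle_cycles + gate_overhead
    best = min((active, "active"), (drowsy, "drowsy"), (gated, "power-gated"))
    return best[1]
```

Short idle windows favor drowsy mode and long ones favor power gating, which is exactly why the paper's wake-up-time model matters: it determines where the crossover sits.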
DOI: 10.1145/3370748.3406577 (published 2020-08-10)
Citations: 1
FTRANS
Bingbing Li, Santosh Pandey, Haowen Fang, Yanjun Lyv, Ji Li, Jieyang Chen, Mimi Xie, Lipeng Wan, Hang Liu, Caiwen Ding
In natural language processing (NLP), the "Transformer" architecture was proposed as the first transduction model relying entirely on self-attention mechanisms, without sequence-aligned recurrent neural networks (RNNs) or convolution, and it achieved significant improvements on sequence-to-sequence tasks. However, the intensive computation and storage these pre-trained language representations require has impeded their adoption on computation- and memory-constrained devices. The field-programmable gate array (FPGA) is widely used to accelerate deep learning algorithms for its high parallelism and low latency. However, the trained models are still too large to fit into an FPGA fabric. In this paper, we propose an efficient acceleration framework, Ftrans, for transformer-based large-scale language representations. Our framework includes an enhanced block-circulant matrix (BCM)-based weight representation, which enables model compression of large-scale language representations at the algorithm level with little accuracy degradation, and an acceleration design at the architecture level. Experimental results show that our proposed framework reduces the model size of NLP models by up to 16×. Our FPGA design achieves 27.07× and 81× improvements in performance and energy efficiency compared to a CPU, and up to an 8.80× improvement in energy efficiency compared to a GPU.
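The compression leverage of a BCM representation comes from storing only one defining vector per circulant block: an n×n block costs n values instead of n², and the matrix-vector product becomes a circular convolution. A minimal pure-Python sketch of a single circulant block (Ftrans itself uses FFT-based convolution in hardware; this direct form just shows the math):

```python
def circulant_matvec(w, x):
    # Multiply by the n-by-n circulant matrix whose first column is w,
    # i.e. M[i][j] = w[(i - j) % n], while storing only the n entries of
    # w instead of all n*n matrix entries.
    n = len(w)
    return [sum(w[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]
```

With block size n, the weight storage shrinks by a factor of n, which is how block sizes around 16 line up with the "up to 16×" model-size reduction quoted above.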
DOI: 10.1145/3370748.3406567 (published 2020-08-10)
Citations: 1
SAOU
Hadi Zamani, Devashree Tripathy, L. Bhuyan, Zizhong Chen
The current trend of ever-increasing performance in scientific applications comes with tremendous growth in energy consumption. In this paper, we present a framework for GPU applications, which reduces energy consumption in GPUs through Safe Overclocking and Undervolting (SAOU) without sacrificing performance. The idea is to increase the frequency beyond the safe frequency f_safeMax and undervolt below V_safeMin to obtain maximum energy savings. Since such overclocking and undervolting may give rise to faults, we employ an enhanced checkpoint-recovery technique to cover the possible errors. Empirically, we explore different errors and derive a fault model that can set the undervolting and overclocking levels for maximum energy saving. We target the cuBLAS Matrix Multiplication (cuBLAS-MM) kernel for error correction using the checkpoint and recovery (CR) technique as an example of a scientific application. In the case of cuBLAS, SAOU achieves up to 22% energy reduction through undervolting and overclocking without sacrificing performance.
DOI: 10.1145/3370748.3406553 (published 2020-08-10)
Citations: 12
Reconfigurable tiles of computing-in-memory SRAM architecture for scalable vectorization
R. Gauchi, V. Egloff, Maha Kooli, J. Noël, B. Giraud, P. Vivet, S. Mitra, H. Charles
For big data applications, bringing computation to the memory is expected to drastically reduce data transfers, which can be done using recent Computing-In-Memory (CIM) concepts. To address kernels with larger memory data sets, we propose a reconfigurable tile-based architecture composed of Computational-SRAM (C-SRAM) tiles, each enabling arithmetic and logic operations within the memory. The proposed horizontal scalability and vertical data communication are combined to select the optimal vector width for maximum performance. These schemes allow vector-based kernels written for existing SIMD engines to be used on the targeted CIM architecture. For architecture exploration, we propose an instruction-accurate simulation platform using SystemC/TLM to quantify the performance and energy of various kernels. For detailed performance evaluation, the platform is calibrated with data extracted from the Place&Route C-SRAM circuit, designed in 22nm FDSOI technology. Compared to a 512-bit SIMD architecture, the proposed CIM architecture achieves an EDP reduction of up to 60× for memory-bound kernels and 34× for compute-bound kernels.
DOI: 10.1145/3370748.3406550 (published 2020-08-10)
Citations: 7
Towards wearable piezoelectric energy harvesting: modeling and experimental validation
Y. Tuncel, Shiva Bandyopadhyay, Shambhavi V. Kulshrestha, A. Mendez, Ümit Y. Ogras
Motion energy harvesting is an ideal alternative to batteries in wearable applications, since it can produce energy on demand. So far, the widespread use of this technology has been hindered by bulky, inflexible and impractical designs. New flexible piezoelectric materials enable comfortable use of this technology, but the energy harvesting potential of this approach has not been thoroughly investigated to date. This paper presents a novel mathematical model for estimating the energy that can be harvested from joint movements on the human body. The proposed model is validated using two different piezoelectric materials attached to a 3D model of the human knee. To the best of our knowledge, this is the first study that combines analytical modeling and experimental validation for joint movements. Thorough experimental evaluations show that 1) users can generate 13 μW of power on average while walking, and 2) the generated power can be predicted with 4.8% modeling error.
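The paper's joint-movement model is not reproduced in the abstract; the quantity it ultimately predicts, average harvested power, is just the time integral of instantaneous electrical power divided by duration. A generic sketch over sampled voltage/current traces (sampling scheme and names assumed, not the paper's):

```python
def average_power(voltage, current, dt):
    # Mean harvested power from uniformly sampled voltage and current
    # traces: trapezoidal integration of instantaneous power v(t)*i(t),
    # divided by the total duration.
    inst = [v * i for v, i in zip(voltage, current)]
    energy = sum((inst[k] + inst[k + 1]) * dt / 2 for k in range(len(inst) - 1))
    return energy / (dt * (len(inst) - 1))
```

Averages like the 13 μW walking figure quoted above are of this form, with the model supplying the voltage/current waveforms from joint kinematics.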
DOI: 10.1145/3370748.3406578 (published 2020-08-10)
Citations: 15
WELCOMF
Arijit Nath, H. Kapoor
Emerging non-volatile memories such as Phase Change Memory (PCM) and Resistive RAM are projected as potential replacements for traditional DRAM-based main memories. However, limited write endurance and high write energy limit their chances of adoption as a mainstream main memory standard. In this paper, we propose a word-level compression scheme called COMF that reduces bit flips in PCMs by removing the most repeated words from a cache line before writing it to memory. We then propose an intra-line wear-leveling technique called WELCOMF that extends COMF to improve lifetime. Experimental results show that the proposed technique improves lifetime by 75%, and reduces bit flips and energy by 45% and 46%, respectively, over the baseline.
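The abstract describes COMF only at a high level; as a minimal sketch of word-level compression by deduplicating the most repeated word in a cache line, the common word could be stored once alongside a position bitmask, with only the remaining words written. The layout below is an assumption for illustration, not the paper's encoding:

```python
from collections import Counter

def comf_compress(words):
    # Find the most frequent word in the line, store it once, mark its
    # positions in a bitmask, and keep only the remaining words.
    common, _ = Counter(words).most_common(1)[0]
    mask = [int(w == common) for w in words]
    rest = [w for w in words if w != common]
    return common, mask, rest

def comf_decompress(common, mask, rest):
    # Rebuild the line: bitmask positions take the common word, the rest
    # are filled back in order.
    it = iter(rest)
    return [common if m else next(it) for m in mask]
```

Every deduplicated word is a word-sized write that never reaches the PCM cells, which is the bit-flip (and hence energy and endurance) saving the scheme is after.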
doi: 10.1145/3370748.3406559
Citations: 8
Journal
Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design