Title: Efficient Fixed/Floating-Point Merged Mixed-Precision Multiply-Accumulate Unit for Deep Learning Processors
Pub Date: 2018-05-27
DOI: 10.1109/ISCAS.2018.8351354
H. Zhang, Hyuk-Jae Lee, S. Ko
Published in: 2018 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-5
Deep learning has attracted increasing attention in recent years, and many hardware architectures have been proposed for the efficient implementation of deep neural networks. The arithmetic unit, as the core processing part of the hardware architecture, can determine the functionality of the whole architecture. In this paper, an efficient fixed/floating-point merged multiply-accumulate unit for deep learning processors is proposed. The proposed architecture supports 16-bit half-precision floating-point multiplication with 32-bit single-precision accumulation for the training operations of deep learning algorithms. In addition, within the same hardware, the proposed architecture supports two parallel 8-bit fixed-point multiplications and accumulates the products into a 32-bit fixed-point number. This enables higher throughput for the inference operations of deep learning algorithms. Compared to a half-precision multiply-accumulate unit (accumulating to single precision), the proposed architecture has only 4.6% area overhead. With the proposed multiply-accumulate unit, a deep learning processor can support both training and high-throughput inference.
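The two operating modes described in this abstract can be modelled in software. The sketch below is a behavioural illustration only, not the authors' datapath: the function names are hypothetical, the rounding behaviour (round to nearest via Python's `struct` module) and the wrap-around of the 32-bit fixed-point accumulator are assumptions, and nothing here reflects how the hardware shares its multiplier between modes.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fp16_mac(acc: float, a: float, b: float) -> float:
    """Training mode (assumed behaviour): fp16 x fp16 product, fp32 accumulation."""
    # Two 11-bit significands give a product that is exact in double precision.
    product = to_fp16(a) * to_fp16(b)
    return to_fp32(to_fp32(acc) + product)


def int8_dual_mac(acc: int, a0: int, b0: int, a1: int, b1: int) -> int:
    """Inference mode (assumed behaviour): two int8 products added to an int32."""
    for v in (a0, b0, a1, b1):
        if not -128 <= v <= 127:
            raise ValueError("operands must fit in signed 8 bits")
    # Assumption: the accumulator wraps in 32-bit two's complement.
    total = (acc + a0 * b0 + a1 * b1) & 0xFFFFFFFF
    return total - (1 << 32) if total & 0x80000000 else total


if __name__ == "__main__":
    print(fp16_mac(0.0, 1.5, 2.0))
    print(int8_dual_mac(10, 2, 3, 4, 5))
```

Each call to `int8_dual_mac` performs two multiplications, which is the source of the higher inference throughput the abstract mentions; a saturating accumulator would be an equally plausible modelling choice.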
Title: A Systematic Method for Approximate Circuit Design Using Feature Selection
Pub Date: 2018-05-27
DOI: 10.1109/ISCAS.2018.8351167
Ling Qiu, Yingjie Lao
Published in: 2018 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-5
As technology nodes reach the deep-nanometer regime, the improvements in area, power, and timing delivered by scaling have started to diminish. Alternative approaches to exploring the design space for energy-efficient digital systems have therefore attracted great interest in recent years. Approximate computing in hardware design has emerged as a promising paradigm that trades accuracy for reductions in power consumption and hardware cost. This paper presents a systematic and scalable method for approximate circuit design that employs data-driven feature selection techniques rather than statistical or theoretical analysis, which makes it well suited to larger-scale applications. A case study on an approximate multiplier is presented to demonstrate the proposed design flow. Our experimental results show that the proposed approach can achieve better area and power savings, with error performance comparable to existing manually designed approximate multipliers, while greatly reducing design workload and complexity.
Title: The EcoChip: A Wireless Multi-Sensor Platform for Comprehensive Environmental Monitoring
Pub Date: 2018-05-27
DOI: 10.1109/ISCAS.2018.8351654
M. Sylvain, Francis Lehoux, S. Morency, Felix Faucher, E. Bharucha, D. Tremblay, Denis Sarrazin, Sylvain Moineau, Michel Allard, Jacques Corbeil, Younès Messaddeq, Benoit Gosselin
Published in: 2018 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-5
This paper presents a new autonomous wireless sensor platform intended for monitoring microorganisms and molecules found in harsh environments, such as northern climates. The EcoChip includes a layered multiwell plate that allows single-strain microorganisms, isolated from environmental samples from northern habitats, to grow within a well of the plate. It can be deployed in the field for continuous monitoring of microbiological growth within 96 individual wells through a multichannel electrochemical impedance monitoring circuit. Additional sensors monitor luminosity, humidity, temperature, pH, and CO2 release. The embedded electronic board is equipped with a flash memory to accumulate and store sensor data for long periods of time, a low-power microcontroller, and a power management unit to control and supply all electronic building blocks. When a receiver is located within the transmission range of the EcoChip, a low-power wireless transceiver transmits the sensor data stored in on-board memory. We report the measured performance of the system and present experimental results obtained in the field during a pilot study performed with the EcoChip deployed in the village of Kuujjuarapik, at a latitude of 55 degrees, in Northern Canada.
Title: Spatio-temporal compressed sensing for real-time wireless EEG monitoring
Pub Date: 2018-05-27
DOI: 10.1109/ISCAS.2018.8351863
Bathiya Senevirathna, P. Abshire
Published in: 2018 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-5
Wearable electronics capable of recording and transmitting biosignals can provide convenient and pervasive health monitoring. The wireless transmission bandwidth limits the number of recording sites that can be monitored at one time. Compressed sensing (CS) is a promising approach that uses computationally efficient encoding to reduce the number of samples transmitted wirelessly, allowing more recording channels to be monitored over a single transmission link. The rakeness CS approach shows improved performance at higher compression rates, but in prior work it has only been evaluated for single-channel data. We analyze the fidelity tradeoffs for compressed sensing implemented on a mobile electroencephalography (EEG) system. We propose several methods for spatiotemporal encoding in rakeness CS and evaluate their performance using a spontaneous EEG dataset recorded during moderate movement. Reconstruction performance depends strongly on the compression ratio and only weakly on the method of spatiotemporal encoding. This suggests weak spatial correlation between the different channels of EEG data, which were recorded in an experiment involving self-initiated movement.
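The encoding step that compressed sensing relies on, y = Φx with fewer measurements than samples, can be sketched in a few lines. This is a generic illustration with an i.i.d. ±1 sensing matrix, which is an assumption made here for simplicity: it does not reproduce the rakeness design (which adapts the statistics of the sensing rows to the signal) or any of the paper's spatiotemporal encoding methods. The function names, window sizes, and the "stack all channels into one vector" scheme are hypothetical.

```python
import random


def sensing_matrix(m, n, seed=0):
    """m x n matrix with i.i.d. +1/-1 entries (assumed; not a rakeness design)."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]


def encode(phi, x):
    """Compress a window x of n samples into m measurements y = phi @ x."""
    if any(len(row) != len(x) for row in phi):
        raise ValueError("window length must match the matrix width")
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


def encode_joint(phi, channels):
    """One possible joint scheme: stack all channel windows, then encode once."""
    stacked = [v for ch in channels for v in ch]
    return encode(phi, stacked)


def compression_ratio(n, m):
    """Samples in per measurement out."""
    return n / m


if __name__ == "__main__":
    window = list(range(16))                  # stand-in for 16 EEG samples
    y = encode(sensing_matrix(4, 16), window)
    print(len(y), compression_ratio(16, 4))
```

Only additions and sign flips are needed on the sensor side with such a matrix, which is why this kind of encoder is considered computationally cheap; the expensive reconstruction runs at the receiver and is not shown.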
Title: What Architecture Should I Choose for my Continuous-Time Delta-Sigma Modulator?
Pub Date: 2018-05-27
DOI: 10.1109/ISCAS.2018.8351141
S. Pavan, Siddharth Baskaran
Published in: 2018 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-5
A novice continuous-time delta-sigma designer is faced with an admittedly complex maze of possible design choices. The right architecture often determines how efficiently the modulator can be implemented. This paper critically examines various popular delta-sigma architectures. It concludes that a single-bit modulator with FIR feedback is a prime candidate that enables a power-efficient implementation for a variety of specifications. To support this thesis, measurement results of an audio delta-sigma modulator designed in a 65 nm CMOS process are given. The modulator, which incorporates FIR feedback and chopping to reduce 1/f noise, achieves 98.6 dB peak SNDR in a 24 kHz bandwidth and consumes only 260 μW from a 1.2 V supply.
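To put the reported numbers in context, the commonly used Schreier figure of merit, FoM = SNDR + 10·log10(BW / P), can be evaluated from the three values quoted in the abstract. The abstract itself does not state a figure of merit, so the result below is a derived value under that standard definition, and the function names are illustrative.

```python
import math


def schreier_fom_db(sndr_db: float, bandwidth_hz: float, power_w: float) -> float:
    """Schreier figure of merit in dB: SNDR + 10*log10(BW / P)."""
    return sndr_db + 10.0 * math.log10(bandwidth_hz / power_w)


def enob(sndr_db: float) -> float:
    """Effective number of bits from SNDR, using (SNDR - 1.76) / 6.02."""
    return (sndr_db - 1.76) / 6.02


if __name__ == "__main__":
    # Values quoted in the abstract: 98.6 dB SNDR, 24 kHz bandwidth, 260 uW.
    fom = schreier_fom_db(98.6, 24e3, 260e-6)
    print(f"Schreier FoM: {fom:.2f} dB")    # about 178.25 dB
    print(f"ENOB: {enob(98.6):.2f} bits")   # about 16.09 bits
```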
Title: Design of a Low Latency 40 Gb/s Flow-Based Traffic Manager Using High-Level Synthesis
Pub Date: 2018-05-27
DOI: 10.1109/ISCAS.2018.8351332
Imad Benacer, F. Boyer, Y. Savaria
Published in: 2018 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-5
This paper presents a traffic manager architecture designed to meet today's networking requirements, especially reduced latency, and to support the upcoming 5G technology in the software-defined networking context. The proposed traffic manager performs policing, scheduling, shaping, and queuing of incoming traffic (packets). The incoming traffic is assumed to be a set of flows in a network processing unit. Traffic management imposes constraints on the packets to be sent out so as to meet the allowed bandwidth quota for each flow and to enforce the desired quality of service (QoS) targets. The FPGA-prototyped architecture is described in C++ and synthesized with the Vivado High-Level Synthesis (HLS) tool. The proposed traffic manager design supports 40 Gb/s per egress port for 64-byte packets, running at 80 MHz when implemented on a Xilinx ZC706 board. A throughput improvement of 4.0× over previously reported works is claimed.
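The quoted throughput figures imply a tight cycle budget, which a short calculation makes visible. The sketch below counts only the 64 packet bytes and ignores any framing overhead such as preamble or inter-packet gap; that simplification, and the function names, are assumptions rather than details from the paper.

```python
def packets_per_second(line_rate_bps: float, packet_bytes: int) -> float:
    """Packet rate needed to sustain a line rate with fixed-size packets."""
    return line_rate_bps / (packet_bytes * 8)


def cycles_per_packet(clock_hz: float, line_rate_bps: float, packet_bytes: int) -> float:
    """Clock cycles available to process each packet at the given line rate."""
    return clock_hz / packets_per_second(line_rate_bps, packet_bytes)


if __name__ == "__main__":
    # Values quoted in the abstract: 40 Gb/s, 64-byte packets, 80 MHz clock.
    pps = packets_per_second(40e9, 64)
    budget = cycles_per_packet(80e6, 40e9, 64)
    print(f"{pps / 1e6:.3f} Mpackets/s, {budget:.3f} cycles per packet")
```

Under these assumptions the design has to accept roughly one minimum-size packet per clock cycle, which suggests a fully pipelined datapath with an initiation interval of one.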
Title: A Novel Convolution Computing Paradigm Based on NOR Flash Array with High Computing Speed and Energy Efficient
Pub Date: 2018-05-27
DOI: 10.1109/ISCAS.2018.8351030
Runze Han, P. Huang, Y. Xiang, C. Liu, Zhen Dong, Z. Su, Y. B. Liu, L. Liu, X. Liu, Jinfeng Kang
Published in: 2018 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-4
A novel convolution computing paradigm based on a NOR flash array is proposed. Significant improvements in both computing speed and energy consumption are achieved compared to CMOS-based logic computing paradigms. For a feature extraction task on a 256×256 image, a computing speed of 3.9×10^4 frames per second (fps) and an energy consumption of 0.057 nJ/pixel are achieved using the proposed computing paradigm.
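The per-pixel and per-frame figures in this abstract can be combined into an energy per frame and an implied average power. The sketch below reads the quoted frame rate as 3.9×10^4 fps and assumes that the 0.057 nJ/pixel figure applies to each of the 256×256 input pixels during continuous operation; both readings are assumptions, and the derived numbers are not stated in the abstract.

```python
def frame_energy_joules(width: int, height: int, energy_per_pixel_j: float) -> float:
    """Energy to process one frame, assuming a per-input-pixel energy figure."""
    return width * height * energy_per_pixel_j


def average_power_watts(frame_energy_j: float, frames_per_second: float) -> float:
    """Average power when frames are processed back to back."""
    return frame_energy_j * frames_per_second


if __name__ == "__main__":
    e_frame = frame_energy_joules(256, 256, 0.057e-9)
    p_avg = average_power_watts(e_frame, 3.9e4)
    print(f"{e_frame * 1e6:.2f} uJ per frame, {p_avg * 1e3:.1f} mW average")
```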
Title: An Indoor Solar Energy Harvester with Ultra-Low-Power Reconfigurable Power-On-Reset-Styled Voltage Detector
Pub Date: 2018-05-27
DOI: 10.1109/ISCAS.2018.8351096
Xiaodong Meng, Xing Li, Y. Yao, C. Tsui, W. Ki
Published in: 2018 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-5
An ultra-low-power reconfigurable voltage detector for an indoor solar energy harvester is presented. The voltage detector monitors the solar cell voltage and sends out a flag signal if that voltage surpasses the triggering threshold of the detector. Instead of using a traditional dynamic comparator, this design is based on a power-on-reset (POR) circuit. A POR circuit has ultra-low quiescent loss and a fixed triggering threshold, which is determined by its topology and process. Our improvement is to use a feedback loop that makes the triggering threshold reconfigurable. The average quiescent loss of this POR voltage detector circuit is 2.774 nW. Process and temperature variations can also be compensated by the feedback loop. The energy harvesting system is designed in a 0.18 μm CMOS process. Equipped with the proposed voltage detector, the whole system achieves a peak efficiency of 93.21% at 200 μW input power.
Title: Drift-Invariant Detection for Multilevel Phase-Change Memory
Pub Date: 2018-05-27
DOI: 10.1109/ISCAS.2018.8351740
M. Stanisavljevic, T. Mittelholzer, N. Papandreou, Thomas Parnell, H. Pozidis
Published in: 2018 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-5
Next-generation memory (NGM) technologies present a major opportunity but also a significant challenge, due to their intricate reliability issues. In particular, multilevel-cell (MLC) storage is highly desirable for increasing storage capacity and lowering total cost per bit. In phase-change memory (PCM), MLC storage is hampered by sensitivity to temperature variations and resistance drift. A novel drift-invariant detection (DID) scheme that estimates variable read thresholds based on order statistics and clustering of the soft read-back signals from a small block of 32 cells has been developed and implemented in hardware to improve reliability and prolong data retention. A low-complexity implementation of the DID on an FPGA platform comprises 20,000 LUTs and 6,000 flip-flops and has a latency of 90 ns. We present results from an extensive performance verification that ascertains highly reliable data retrieval up to 13 orders of magnitude in time after programming. Such elevated reliability is necessary for the most anticipated application of NGM, namely persistent far-memory, where the NGM is used as a large memory pool, possibly together with DRAM.
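The idea of deriving read thresholds from the sorted read-back values of a small block, rather than using fixed thresholds, can be illustrated with a toy model. The sketch below is not the authors' DID algorithm: it substitutes a simple "cut at the widest gaps" rule for their clustering, assumes four levels that are all present in the block, and uses hypothetical level positions and drift exponents with the commonly cited power-law drift model R(t) = R0·(t/t0)^ν, which is linear in the log domain.

```python
def adaptive_thresholds(readback, levels=4):
    """Midpoints of the (levels - 1) widest gaps between sorted read-back values."""
    s = sorted(readback)
    by_gap = sorted(range(len(s) - 1), key=lambda i: s[i + 1] - s[i], reverse=True)
    cuts = sorted(by_gap[: levels - 1])
    return [(s[i] + s[i + 1]) / 2 for i in cuts]


def detect(readback, thresholds):
    """Detected level = number of thresholds lying below the read-back value."""
    return [sum(v > t for t in thresholds) for v in readback]


def drifted_readback(stored, decades, base, nu):
    """log10(R) after `decades` of elapsed time, plus a small fixed cell offset."""
    return [base[lvl] + nu[lvl] * decades + ((i * 7) % 5 - 2) * 0.02
            for i, lvl in enumerate(stored)]


# Hypothetical parameters: log10 resistance per level and per-level drift exponent.
BASE = [3.0, 4.0, 5.0, 6.0]
NU = [0.00, 0.02, 0.06, 0.10]
FIXED = [3.5, 4.5, 5.5]              # thresholds tuned for zero drift

if __name__ == "__main__":
    stored = [i % 4 for i in range(32)]       # one block of 32 cells
    late = drifted_readback(stored, 13, BASE, NU)
    print(detect(late, FIXED) == stored)                       # fixed thresholds
    print(detect(late, adaptive_thresholds(late)) == stored)   # block-adaptive
```

In this toy setting the fixed thresholds misread the drifted block while the block-derived thresholds follow the levels as they move apart; a practical scheme also has to handle blocks in which some levels are missing or overlap, which this sketch does not attempt.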
Title: A Heterogeneous Cluster with Reconfigurable Accelerator for Energy Efficient Near-Sensor Data Analytics
Pub Date: 2018-05-27
DOI: 10.1109/ISCAS.2018.8351749
Satyajit Das, Kevin J. M. Martin, P. Coussy, D. Rossi
Published in: 2018 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-5
IoT end-nodes require high performance and extreme energy efficiency to cope with complex near-sensor data analytics algorithms. Processing on multiple programmable processors operating near threshold is emerging as a promising way to exploit the energy gains of low-voltage operation while recovering the associated frequency degradation through parallelism. In this work, we present a heterogeneous cluster architecture that extends a traditional parallel processor cluster with a reconfigurable Integrated Programmable Array (IPA) accelerator. While the programmable processors preserve legacy programmability for managing peripherals, radio software stacks, and the global program flow, offloading data-intensive and control-intensive kernels to the IPA leads to much higher system-level performance and energy efficiency. Experimental results show that the proposed heterogeneous cluster outperforms an 8-core homogeneous architecture by up to 4.8× in performance and 4.5× in energy efficiency when executing a mix of control-intensive and data-intensive kernels typical of near-sensor data analytics applications.