Donkyu Baek, Yukai Chen, Alberto Bocca, A. Macii, E. Macii, M. Poncino
Drones are becoming increasingly popular in the commercial market for various package delivery services. In this scenario, the most commonly adopted drones are quad-rotors (i.e., quadcopters). The energy consumed by a drone may become an issue, since it may affect (i) the delivery deadline (quality of service), (ii) the number of packages that can be delivered (throughput), and (iii) the battery lifetime (number of recharging cycles). It is thus fundamental to find the proper compromise between the energy used to complete the delivery and the speed at which the quadcopter flies to reach the destination. To achieve this, we must consider that the energy required by the drone to complete a given delivery task does not exactly correspond to the energy drawn from the battery, since the battery is a non-ideal power supply that delivers power with different efficiencies depending on its state of charge. In this paper, we demonstrate that the proposed battery-aware delivery scheduling algorithm carries more packages than the traditional delivery model with the same battery capacity. Moreover, the battery-aware delivery model is 17% more accurate than the traditional delivery model for the same delivery scheme, which helps prevent unexpected drone landings.
Title: Battery-Aware Energy Model of Drone Delivery Tasks | Venue: Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED '18), 2018-07-23 | DOI: 10.1145/3218603.3218614 (https://doi.org/10.1145/3218603.3218614)
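The abstract's central point is that the energy a drone needs differs from the energy drawn from its battery. The minimal Python sketch below illustrates that point under stated assumptions. It is not the authors' battery model.

- The discharge-efficiency curve `eta(soc)` is purely hypothetical.
- The constant 200 W load and the ~55.5 Wh pack are illustrative numbers.
- `remaining_soc` is an invented helper. It integrates a trip either with an ideal battery or with one whose efficiency drops as it empties.

```python
def eta(soc: float) -> float:
    """Hypothetical discharge efficiency as a function of state of charge (SoC).

    Placeholder curve, not the paper's model: 95% when full, dropping to 70%
    as the battery approaches empty.
    """
    return 0.95 - 0.25 * (1.0 - soc) ** 2


def remaining_soc(power_w: float, duration_s: float, capacity_j: float,
                  soc0: float = 1.0, battery_aware: bool = True,
                  dt: float = 1.0) -> float:
    """Integrate a constant load over time and return the final SoC.

    With battery_aware=False the battery is treated as ideal (energy drawn
    equals energy delivered); otherwise each step draws load / eta(SoC).
    """
    soc, t = soc0, 0.0
    while t < duration_s:
        step = min(dt, duration_s - t)
        load_j = power_w * step                      # energy the motors need
        drawn_j = load_j / eta(soc) if battery_aware else load_j
        soc -= drawn_j / capacity_j                  # energy taken from the cell
        if soc <= 0.0:
            return 0.0                               # battery exhausted mid-flight
        t += step
    return soc


# Hypothetical numbers: 200 W average flight power, 10-minute trip,
# ~55.5 Wh (199,800 J) pack.
ideal = remaining_soc(200.0, 600.0, 199_800.0, battery_aware=False)
aware = remaining_soc(200.0, 600.0, 199_800.0, battery_aware=True)
print(f"ideal model: {ideal:.3f}, battery-aware model: {aware:.3f}")
```

Under these assumptions, the ideal model always reports more remaining charge than the battery-aware one. A schedule built on the ideal estimate can therefore overcommit the battery, which is the failure mode the abstract associates with unexpected landings.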
Seungkyu Choi, Jaehyeong Sim, Myeonggu Kang, L. Kim
On-device training of convolutional neural networks (CNNs) has become essential because it allows applications to adapt to each user's individual environment. However, the weight update operation in the training process is the primary source of high energy consumption due to its substantial memory accesses. We propose a dedicated weight update architecture with two key features: (1) a specialized local buffer for reducing DRAM accesses, and (2) a novel dataflow, with a matching processing element (PE) array structure, for weight gradient computation that optimizes the energy consumed by internal memories. Our scheme achieves a 14.3%-30.2% reduction in total energy by drastically reducing memory accesses.
Title: TrainWare | Venue: Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED '18), 2018-07-23 | DOI: 10.1145/3218603.3218625 (https://doi.org/10.1145/3218603.3218625)
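The weight gradient that this architecture targets can be written for a single-channel, stride-1, unpadded k x k convolution as dW[i][j] = Σ dY[p][q] · X[p+i][q+j]. The sum runs over all output positions.

The Python sketch below is a generic illustration, not TrainWare's dataflow or PE array. It computes that sum and counts DRAM writes under two hypothetical policies:

- spilling partial sums after every output position;
- keeping the whole gradient tile in a local buffer until it is complete.

```python
def weight_gradient(x, dy, k, local_buffer=True):
    """Weight gradient of a single-channel, stride-1, unpadded k x k convolution.

    dW[i][j] = sum over output positions (p, q) of dy[p][q] * x[p + i][q + j].
    Also returns a rough count of off-chip (DRAM) writes under two policies:
    spilling partial sums after every output position, or keeping the whole
    k x k gradient tile in a local buffer and writing it back once.
    """
    out_h, out_w = len(dy), len(dy[0])
    dw = [[0.0] * k for _ in range(k)]
    dram_writes = 0
    for p in range(out_h):
        for q in range(out_w):
            g = dy[p][q]
            for i in range(k):
                for j in range(k):
                    dw[i][j] += g * x[p + i][q + j]
            if not local_buffer:
                dram_writes += k * k        # partial sums spilled to DRAM
    if local_buffer:
        dram_writes = k * k                 # single write-back of the finished tile
    return dw, dram_writes


x = [[float(4 * r + c) for c in range(4)] for r in range(4)]   # 4x4 input
dy = [[1.0, 1.0], [1.0, 1.0]]                                  # 2x2 output gradient
dw, writes_buffered = weight_gradient(x, dy, 3)
_, writes_spilled = weight_gradient(x, dy, 3, local_buffer=False)
print(writes_buffered, writes_spilled)  # prints: 9 36
```

The arithmetic is identical in both cases; only the number of off-chip writes changes. That is the kind of saving a dedicated local buffer aims at. The write counts are a simplification that ignores reads and tiling.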
Marzieh Vaeztourshizi, M. Kamal, A. Afzali-Kusha, M. Pedram
In this paper, we present a highly accurate and energy-efficient non-iterative divider that uses multiplication as its main building block. In this structure, the division is performed by first scaling both the dividend and the divisor, and then multiplying the rounded value of the scaled dividend by the reciprocal of the rounded value of the scaled divisor. Specifically, the interval representing the fractional value of the scaled divisor is partitioned into non-overlapping sub-intervals, and the reciprocal of the scaled divisor is approximated by a linear function in each sub-interval. The efficacy of the proposed divider is assessed by comparing its design parameters and accuracy with state-of-the-art non-iterative approximate dividers, as well as exact dividers, in 45-nm digital CMOS technology. Circuit simulation results show that the mean absolute relative error of the proposed structure for 32-bit division is less than 0.2%, while its energy consumption is significantly lower than that of the exact divider. Finally, the effectiveness of the proposed divider in an image processing application is reported and discussed.
Title: An Energy-Efficient, Yet Highly-Accurate, Approximate Non-Iterative Divider | Venue: Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED '18), 2018-07-23 | DOI: 10.1145/3218603.3218650 (https://doi.org/10.1145/3218603.3218650)
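The piecewise-linear reciprocal idea can be sketched in Python as follows. The divisor is normalized to a mantissa m in [1, 2), and [1, 2) is split into equal sub-intervals. Within each sub-interval, 1/m is replaced by a line, and the quotient becomes one multiplication followed by a power-of-two shift.

This sketch rests on assumptions that are not taken from the paper:

- `math.frexp`/`math.ldexp` stand in for the hardware's normalization shifts.
- The paper's rounding of the scaled operands is omitted.
- The coefficients are chord (secant) fits chosen for simplicity.

```python
import math


def make_reciprocal_table(segments: int):
    """Secant-line coefficients approximating 1/m on equal sub-intervals of [1, 2)."""
    width = 1.0 / segments
    table = []
    for s in range(segments):
        lo, hi = 1.0 + s * width, 1.0 + (s + 1) * width
        slope = (1.0 / hi - 1.0 / lo) / (hi - lo)   # chord through (lo, 1/lo) and (hi, 1/hi)
        intercept = 1.0 / lo - slope * lo
        table.append((slope, intercept))
    return table


def approx_divide(a: float, b: float, table) -> float:
    """Approximate a / b (b > 0) with a single multiplication by an approximate 1/b."""
    if b <= 0:
        raise ValueError("divisor must be positive")
    m, e = math.frexp(b)          # b = m * 2**e with m in [0.5, 1)
    m, e = 2.0 * m, e - 1         # rescale so that m lies in [1, 2)
    seg = min(int((m - 1.0) * len(table)), len(table) - 1)
    slope, intercept = table[seg]
    recip_m = slope * m + intercept          # linear approximation of 1/m
    return math.ldexp(a * recip_m, -e)       # a / b = a * (1/m) * 2**-e


table = make_reciprocal_table(16)
print(approx_divide(355.0, 113.0, table), 355.0 / 113.0)
```

Because 1/m is convex, a chord always lies slightly above it, so this sketch biases quotients upward. With 16 segments, its relative error stays on the order of 0.1%. A real design would more likely choose coefficients that balance the error around zero.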
Pramesh Pandey, Asmita Pal, Koushik Chakraborty, Sanghamitra Roy
SRAM-based PUFs (SPUFs) have emerged as promising security primitives for low-power devices. However, operating 8T-SPUFs in the Near-Threshold Computing (NTC) regime is plagued by exacerbated process variation (PV) sensitivity, which thwarts their reliable operation. In this paper, we demonstrate the severe degradation in the reliability and uniformity characteristics of 8T-SPUFs at NTC. By exploiting the opportunities offered by the schematic asymmetry of 8T-SPUF cells, we propose biasing- and sizing-based design strategies. Compared with a baseline NTC 8T-SPUF with no enhancement, our techniques improve the percentage of unreliable cells by more than 55% and the proximity to ideal uniformity by 82%.
Title: Reliability and Uniformity Enhancement in 8T-SRAM based PUFs operating at NTC | Venue: Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED '18), 2018-07-23 | DOI: 10.1145/3218603.3218642 (https://doi.org/10.1145/3218603.3218642)
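The two figures of merit above can be made concrete with the definitions commonly used for SRAM PUFs:

- **Unreliable cell:** a cell whose power-up value is not the same across repeated readouts.
- **Uniformity:** the fraction of 1s in a response, which is ideally 0.5.

The paper's exact definitions and measurement conditions (for example, voltage or temperature corners) may differ. The helper names below are hypothetical.

```python
def unreliable_fraction(readouts):
    """Fraction of cells whose power-up value is not identical across all readouts.

    readouts: list of repeated power-up responses from one PUF instance,
    each a list of 0/1 bits of the same length.
    """
    n_cells = len(readouts[0])
    unstable = sum(1 for c in range(n_cells)
                   if len({r[c] for r in readouts}) > 1)
    return unstable / n_cells


def uniformity(response):
    """Fraction of 1s in one response; an ideal PUF gives 0.5."""
    return sum(response) / len(response)


def distance_from_ideal_uniformity(response):
    """How far a response's uniformity lies from the ideal 0.5."""
    return abs(uniformity(response) - 0.5)


reads = [[0, 1, 1, 0], [0, 1, 0, 0], [0, 1, 1, 0]]   # three power-ups of a 4-cell array
print(unreliable_fraction(reads), uniformity(reads[0]))
```

A relative improvement such as the abstract's would then compare a metric between the baseline and the enhanced design, e.g. (baseline - enhanced) / baseline. How the paper normalizes its percentages is not stated in the abstract.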
Fruit flies (Drosophila melanogaster) are small insects with correspondingly small power budgets. Despite this, they perform sophisticated neural computations in real time. Careful study of these insects is revealing how some of these circuits work, and insights from these systems might help in designing other low-power circuits.
Title: Insights from Biology: Low Power Circuits in the Fruit Fly | Authors: Louis K. Scheffer | Venue: Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED '18), 2018-07-23 | DOI: 10.1145/3218603.3241337 (https://doi.org/10.1145/3218603.3241337)
Mapping convolutional neural networks (CNNs) onto neuromorphic hardware has been inefficient in synapse memory usage because neither kernel reuse nor input reuse is well exploited. We propose a method that enables kernel reuse by utilizing axonal delay, a biological parameter of spiking neurons. Using IBM TrueNorth as a test platform, we demonstrate that the number of cores, neurons, synapses, and synaptic operations per time step can be reduced by up to 20.9x, 27.9x, 88.4x, and 1586x, respectively, compared with the conventional scheme, which opens the possibility of implementing large-scale CNNs on neuromorphic hardware.
Title: Compact Convolution Mapping on Neuromorphic Hardware using Axonal Delay | Authors: Jinseok Kim, Yulhwa Kim, Sungho Kim, Jae-Joon Kim | Venue: Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED '18), 2018-07-23 | DOI: 10.1145/3218603.3218639 (https://doi.org/10.1145/3218603.3218639)
Because spiking neural network (SNN) designs require large numbers of neurons and synapses, emerging devices have been employed to implement them. In this paper, we present a stochastic multi-bit synapse based on spin orbit torque (SOT) memory, in which only one SOT device is switched for each potentiation or depression step by using a modified Gray code. The modified Gray code approach needs only N devices to represent 2^N levels of synapse weights. An early read termination scheme is also adopted to reduce the power consumption of the training process by turning off less-associated neurons and their ADCs. On the MNIST dataset, with comparable classification accuracy, the proposed SNN architecture using a 3-bit synapse achieves a 68.7% reduction in ADC overhead compared with the conventional 8-level synapse.
Title: Spin Orbit Torque Device based Stochastic Multi-bit Synapses for On-chip STDP Learning | Authors: Gyuseong Kang, Yunho Jang, Jongsun Park | Venue: Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED '18), 2018-07-23 | DOI: 10.1145/3218603.3218654 (https://doi.org/10.1145/3218603.3218654)
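The one-device-per-step property follows from the defining trait of Gray codes: adjacent codes differ in exactly one bit. The Python sketch below uses the standard reflected binary Gray code, not the paper's modified variant, which the abstract does not specify. It also assumes a hypothetical mapping in which bit b of the code is the state of device b. Under these assumptions, N devices enumerate 2^N weight levels, and each potentiation or depression step flips a single device.

```python
def gray_code(level: int) -> int:
    """Standard reflected binary Gray code of a weight level."""
    return level ^ (level >> 1)


def device_states(level: int, n_devices: int):
    """Hypothetical mapping: bit b of the Gray code is the state of device b."""
    code = gray_code(level)
    return [(code >> b) & 1 for b in range(n_devices)]


def step(level: int, n_devices: int, potentiate: bool = True):
    """Move the weight one level up (potentiation) or down (depression).

    Returns the new level and the index of the single device that must be
    switched, or None if the weight is already saturated.
    """
    top = (1 << n_devices) - 1
    new = min(level + 1, top) if potentiate else max(level - 1, 0)
    diff = gray_code(level) ^ gray_code(new)   # one set bit for adjacent levels
    return new, (diff.bit_length() - 1 if diff else None)


for lvl in range(8):                          # 3 devices, 8 levels
    print(lvl, device_states(lvl, 3), step(lvl, 3))
```

How the device states are read back into an analog weight, and how stochastic SOT switching is modeled, are outside this sketch.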
Jeongho Hwang, Hong-Seok Choi, H. Do, Gyu-Seob Jeong, Daehyun Koh, Seong Ho Park, D. Jeong
The price and power consumption of standard HDMI cables rise exponentially as the data rate increases or the cable gets longer. HDMI active optical cables (AOCs) can potentially solve these price and power issues because optical fibers have low loss. However, AOCs require additional optical components such as vertical-cavity surface-emitting lasers (VCSELs) and photodiodes (PDs), so the drivers and transimpedance amplifiers must be designed carefully for normal operation. In this paper, two types of 4-channel VCSEL drivers for HDMI AOCs are presented. The first type passes data and bias separately and uses off-chip capacitors for AC coupling. The second type passes data, including its DC value, without off-chip capacitors. Both drivers are based on push-pull current-mode logic (CML) to achieve better power efficiency. Fabricated in a 0.18-μm CMOS process, the two drivers consume 36.5 mW/channel at 6 Gb/s and 24.7 mW/channel at 12 Gb/s, respectively.
Title: 4-Channel Push-Pull VCSEL Drivers for HDMI Active Optical Cable in 0.18-μm CMOS | Venue: Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED '18), 2018-07-23 | DOI: 10.1145/3218603.3218629 (https://doi.org/10.1145/3218603.3218629)
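Power figures quoted at different data rates are easier to compare as energy per bit. The small Python calculation below normalizes them. It assumes, as "respectively" suggests, that the first figure belongs to the AC-coupled driver and the second to the DC-passing driver. It also assumes that all four channels run at the quoted rate.

```python
def energy_per_bit_pj(power_mw: float, rate_gbps: float) -> float:
    """Energy efficiency in pJ/bit.

    mW / (Gb/s) = 1e-3 J/s / 1e9 bit/s = 1e-12 J/bit, so the ratio is already in pJ/bit.
    """
    return power_mw / rate_gbps


# Per-channel figures quoted in the abstract, for a 4-channel driver.
for name, p_mw, rate in [("first driver (AC-coupled)", 36.5, 6.0),
                         ("second driver (DC-passing)", 24.7, 12.0)]:
    print(f"{name}: {energy_per_bit_pj(p_mw, rate):.2f} pJ/bit, "
          f"{4 * p_mw:.1f} mW and {4 * rate:.0f} Gb/s over 4 channels")
```

On these assumptions, the figures work out to roughly 6.1 pJ/bit and 2.1 pJ/bit. The per-bit cost of the second driver is therefore about a third of the first's, even though its data rate is twice as high.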
Value similarity of operands across warps has been exploited to improve the energy efficiency of GPUs. Prior work, however, incurs significant overhead to check value similarity for every instruction, and it does not improve performance because it does not reduce the number of executed instructions. This work proposes Lock 'n Load (LnL), which triggers approximate execution of code regions by checking only the similarity of values returned by load instructions, and fuses multiple approximated warps into a single warp.
Title: Load-Triggered Warp Approximation on GPU | Authors: Zhenhong Liu, Daniel Wong, N. Kim | Venue: Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED '18), 2018-07-23 | DOI: 10.1145/3218603.3218626 (https://doi.org/10.1145/3218603.3218626)
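The load-triggered idea can be modeled in software. Similarity is checked once, on the values a load returns, rather than on every instruction. Warps whose loaded values are close to an already-executed "leader" warp skip the code region and reuse the leader's results.

The Python sketch below is a behavioral sketch under these assumptions, not LnL's hardware mechanism:

- The similarity test (maximum absolute lane-wise difference) is hypothetical.
- The `threshold` parameter and the `lnl_execute` helper are hypothetical.
- Warp fusion is modeled simply as result reuse.

```python
from typing import Callable, List, Optional, Sequence


def lnl_execute(loads: List[Sequence[float]],
                region: Callable[[Sequence[float]], List[float]],
                threshold: float):
    """Run `region` once per group of similar warps.

    loads[w] holds the lane values returned by the triggering load for warp w.
    Returns per-warp outputs and the number of times the region was executed.
    """
    leaders = []                               # (load values, outputs) of executed warps
    results: List[Optional[List[float]]] = [None] * len(loads)
    executions = 0
    for w, vals in enumerate(loads):
        for lvals, lout in leaders:
            diff = max((abs(a - b) for a, b in zip(vals, lvals)), default=0.0)
            if diff <= threshold:
                results[w] = lout              # approximated: fused with the leader
                break
        else:                                  # no similar leader found
            out = region(vals)                 # exact execution
            executions += 1
            leaders.append((vals, out))
            results[w] = out
    return results, executions


loads = [[1.0, 2.0], [1.01, 2.0], [5.0, 5.0]]
out, n = lnl_execute(loads, lambda v: [2 * x for x in v], threshold=0.05)
print(out, n)
```

Setting `threshold=0.0` recovers exact execution of every distinct warp. Larger thresholds trade output accuracy for fewer executed instructions, which is the performance lever the abstract says per-instruction similarity checking lacks.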
Sheng Zhang, Adrian Tang, Zhewei Jiang, S. Sethumadhavan, Mingoo Seok
Most modern computing devices expose fine-grained control of operating frequency and voltage for power management. As demonstrated by recent attacks, these interfaces open up a new class of software fault-injection attacks that compromise security on commodity devices. CLKSCREW, a recently published attack that stretches device frequency beyond operational limits to induce faults, is one such attack. Statically and permanently limiting the frequency and voltage modulation space, i.e., guard-banding, could mitigate such attacks, but it incurs large performance degradation and long testing time. Instead, in this paper, we propose a run-time technique that dynamically blacklists unsafe operating performance points using a neural-network model. The model is first trained offline at design time and then adjusted at run time by inspecting a selected set of features such as power management control registers, timing-error signals, and core temperature. We designed the algorithm and hardware, called the BlackList (BL) core, which can detect and mitigate such power-management-based security attacks with high accuracy. The BL core incurs reasonably small overheads in power, delay, and area.
Title: Blacklist Core: Machine-Learning Based Dynamic Operating-Performance-Point Blacklisting for Mitigating Power-Management Security Attacks | Venue: Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED '18), 2018-07-23 | DOI: 10.1145/3218603.3218624 (https://doi.org/10.1145/3218603.3218624)
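The control flow described above can be outlined in Python: score each requested operating performance point (OPP) from run-time features, then refuse and remember OPPs judged unsafe. Separately, online updates adjust the model.

This is a toy sketch with several substitutions:

- A logistic-regression model is swapped in for the paper's neural network.
- The four features (frequency, voltage, temperature, timing-error count) only loosely mirror the ones the abstract lists.
- The coefficients are hypothetical.
- The `BlacklistGuard` class is invented for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class BlacklistGuard:
    """Toy run-time OPP blacklister (hypothetical; logistic model, not the paper's net).

    Features, in order: requested frequency (GHz), requested voltage (V),
    core temperature (degrees C), recent timing-error count.
    """
    weights: tuple            # offline-trained coefficients (hypothetical values)
    bias: float
    threshold: float = 0.5
    blacklist: set = field(default_factory=set)

    def risk(self, freq_ghz, volt_v, temp_c, timing_errors):
        """Estimated probability that the requested OPP is unsafe."""
        feats = (freq_ghz, volt_v, temp_c, timing_errors)
        z = self.bias + sum(w * x for w, x in zip(self.weights, feats))
        z = max(min(z, 60.0), -60.0)          # clamp to avoid math.exp overflow
        return 1.0 / (1.0 + math.exp(-z))

    def allow(self, freq_ghz, volt_v, temp_c, timing_errors):
        """Gate a DVFS request; OPPs judged unsafe are blacklisted permanently."""
        opp = (freq_ghz, volt_v)
        if opp in self.blacklist:
            return False
        if self.risk(freq_ghz, volt_v, temp_c, timing_errors) >= self.threshold:
            self.blacklist.add(opp)
            return False
        return True

    def update(self, feats, unsafe, lr=0.1):
        """One online gradient step on the logistic loss (run-time adjustment)."""
        g = self.risk(*feats) - (1.0 if unsafe else 0.0)
        self.weights = tuple(w - lr * g * x for w, x in zip(self.weights, feats))
        self.bias -= lr * g


guard = BlacklistGuard(weights=(4.0, -6.0, 0.02, 1.0), bias=-6.0)
print(guard.allow(1.5, 1.0, 50.0, 0))   # nominal OPP
print(guard.allow(2.5, 0.8, 70.0, 3))   # overclocked, undervolted, erroring OPP
print(guard.blacklist)
```

Blacklisting is sticky by design in this sketch: once an OPP is refused, it stays refused even if later feature readings look benign. A real implementation would also normalize features and bound how far the online updates can move the offline-trained model.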