
Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07): Latest Articles

Battery-Aware Energy Model of Drone Delivery Tasks
Donkyu Baek, Yukai Chen, Alberto Bocca, A. Macii, E. Macii, M. Poncino
Drones are becoming increasingly popular in the commercial market for various package delivery services. In this scenario, the most widely adopted drones are quad-rotors (i.e., quadcopters). The energy consumed by a drone may become an issue, since it may affect (i) the delivery deadline (quality of service), (ii) the number of packages that can be delivered (throughput), and (iii) the battery lifetime (number of recharging cycles). It is thus fundamental to find the proper compromise between the energy used to complete the delivery and the speed at which the quadcopter flies to reach the destination. To achieve this, we have to consider that the energy required by the drone to complete a given delivery task does not exactly correspond to the energy drawn from the battery, since the latter is a non-ideal power supply that delivers power with different efficiencies depending on its state of charge. In this paper, we demonstrate that the proposed battery-aware delivery scheduling algorithm carries more packages than the traditional delivery model with the same battery capacity. Moreover, the battery-aware delivery model is 17% more accurate than the traditional delivery model for the same delivery scheme, which prevents unexpected drone landings.
DOI: https://doi.org/10.1145/3218603.3218614
Published: 2018-07-23
Citations: 22
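The gap between the energy a delivery needs and the energy actually drawn from the battery can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the linear `efficiency` curve, the function names, and the numeric values. It only shows why a model that ignores state-of-charge-dependent efficiency can overestimate how many deliveries fit in one charge; it does not reproduce the paper's scheduling algorithm.

```python
def efficiency(soc: float) -> float:
    """Hypothetical battery efficiency as a function of state of charge (0..1).

    Assumed linear curve for illustration only: 0.80 when empty, 0.95 when full.
    """
    return 0.80 + 0.15 * soc


def deliveries_completed(capacity_wh: float, task_wh: float, battery_aware: bool) -> int:
    """Count identical delivery tasks that fit in one charge.

    With battery_aware=False the battery is treated as ideal: the energy drawn
    equals the energy the task requires. With battery_aware=True the energy
    drawn is the required energy divided by the current efficiency.
    """
    remaining = capacity_wh
    done = 0
    while True:
        soc = remaining / capacity_wh
        drawn = task_wh / efficiency(soc) if battery_aware else task_wh
        if drawn > remaining:
            return done
        remaining -= drawn
        done += 1


if __name__ == "__main__":
    ideal = deliveries_completed(100.0, 10.0, battery_aware=False)
    aware = deliveries_completed(100.0, 10.0, battery_aware=True)
    print(f"ideal model: {ideal} deliveries, battery-aware model: {aware} deliveries")
```

With these illustrative numbers the ideal model predicts 10 deliveries and the battery-aware model predicts 8, so a plan based on the ideal model would run out of charge mid-route.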
TrainWare
Seungkyu Choi, Jaehyeong Sim, Myeonggu Kang, L. Kim
Training convolutional neural networks on device has become essential, as it allows applications to take the user's individual environment into account. Meanwhile, the weight update operation in the training process is the primary cause of high energy consumption because of its substantial memory accesses. We propose a dedicated weight update architecture with two key features: (1) a specialized local buffer for reducing DRAM accesses, and (2) a novel dataflow, with a matching processing element array structure, for weight gradient computation that optimizes the energy consumed by internal memories. Our scheme achieves a 14.3%-30.2% total energy reduction by drastically reducing memory accesses.
DOI: https://doi.org/10.1145/3218603.3218625
Published: 2018-07-23
Citations: 17
An Energy-Efficient, Yet Highly-Accurate, Approximate Non-Iterative Divider
Marzieh Vaeztourshizi, M. Kamal, A. Afzali-Kusha, M. Pedram
In this paper, we present a highly accurate and energy-efficient non-iterative divider, which uses multiplication as its main building block. In this structure, the division operation is performed by first reforming both the dividend and divisor inputs, and then multiplying the rounded value of the scaled dividend by the reciprocal of the rounded value of the scaled divisor. More precisely, the interval representing the fractional value of the scaled divisor is partitioned into non-overlapping sub-intervals, and the reciprocal of the scaled divisor is then approximated with a linear function in each of these sub-intervals. The efficacy of the proposed divider structure is assessed by comparing its design parameters and accuracy with state-of-the-art non-iterative approximate dividers as well as exact dividers in 45nm digital CMOS technology. Circuit simulation results show that the mean absolute relative error of the proposed structure for 32-bit division is less than 0.2%, while the proposed structure has significantly lower energy consumption than the exact divider. Finally, the effectiveness of the proposed divider in one image processing application is reported and discussed.
DOI: https://doi.org/10.1145/3218603.3218650
Published: 2018-07-23
Citations: 11
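The scale, approximate-reciprocal, and multiply flow described in the divider abstract can be sketched in software as follows. This is a rough illustration under stated assumptions, not the paper's circuit: the number of sub-intervals (`segments=8`), the choice of a straight line through each sub-interval's endpoints, the use of floating point instead of fixed-point hardware, and the function name `approx_divide` are all hypothetical, and the rounding step mentioned in the abstract is omitted.

```python
def approx_divide(a: int, b: int, segments: int = 8) -> float:
    """Approximate a / b without iteration (illustrative sketch only).

    Both operands are scaled into [1, 2) using the position of their leading
    one bit. The reciprocal of the scaled divisor is approximated by a linear
    function chosen per sub-interval of its fractional part, and the result is
    obtained with a multiplication followed by a power-of-two rescale.
    """
    if a < 0 or b <= 0:
        raise ValueError("expects a >= 0 and b > 0")
    if a == 0:
        return 0.0

    # Scale operands: x and y both lie in [1, 2).
    ka = a.bit_length() - 1
    kb = b.bit_length() - 1
    x = a / (1 << ka)
    y = b / (1 << kb)

    # Select the sub-interval containing the fractional part of y.
    frac = y - 1.0
    i = min(int(frac * segments), segments - 1)
    lo = 1.0 + i / segments
    hi = 1.0 + (i + 1) / segments

    # Assumed linear model: the line through (lo, 1/lo) and (hi, 1/hi).
    slope = (1.0 / hi - 1.0 / lo) / (hi - lo)
    recip = 1.0 / lo + slope * (y - lo)

    # Multiply, then undo the scaling with a power of two.
    return x * recip * 2.0 ** (ka - kb)


if __name__ == "__main__":
    for a, b in [(1000, 7), (123456789, 1013), (100, 4)]:
        exact = a / b
        approx = approx_divide(a, b)
        print(a, b, approx, abs(approx - exact) / exact)
```

In a hardware version the per-segment slope and intercept would typically be precomputed constants selected by the top bits of the divisor's fraction; computing them on the fly here just keeps the sketch short. More segments shrink the error at the cost of a larger coefficient table.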
Reliability and Uniformity Enhancement in 8T-SRAM based PUFs operating at NTC
Pramesh Pandey, Asmita Pal, Koushik Chakraborty, Sanghamitra Roy
SRAM-based PUFs (SPUFs) have emerged as promising security primitives for low-power devices. However, operating 8T-SPUFs in the Near-Threshold Computing (NTC) realm is plagued by exacerbated process variation (PV) sensitivity, which thwarts their reliable operation. In this paper, we demonstrate the massive degradation in the reliability and uniformity characteristics of 8T-SPUFs. By exploiting the opportunities offered by the schematic asymmetry of 8T-SPUF cells, we propose biasing and sizing based design strategies. Our techniques achieve an improvement of more than 55% in the percentage of unreliable cells and improve the proximity to ideal uniformity by 82% over a baseline NTC 8T-SPUF with no enhancement.
DOI: https://doi.org/10.1145/3218603.3218642
Published: 2018-07-23
Citations: 0
Insights from Biology: Low Power Circuits in the Fruit Fly
Louis K. Scheffer
Fruit flies (Drosophila melanogaster) are small insects, with correspondingly small power budgets. Despite this, they perform sophisticated neural computations in real time. Careful study of these insects is revealing how some of these circuits work. Insights from these systems might be helpful in designing other low power circuits.
DOI: https://doi.org/10.1145/3218603.3241337
Published: 2018-07-23
Citations: 0
Compact Convolution Mapping on Neuromorphic Hardware using Axonal Delay
Jinseok Kim, Yulhwa Kim, Sungho Kim, Jae-Joon Kim
Mapping a Convolutional Neural Network (CNN) to neuromorphic hardware has been inefficient in synapse memory usage because neither kernel reuse nor input reuse is exploited well. We propose a method to enable kernel reuse by utilizing axonal delay, which is a biological parameter for a spiking neuron. Using IBM TrueNorth as a test platform, we demonstrate that the number of cores, neurons, synapses, and synaptic operations per time step can be reduced by up to 20.9x, 27.9x, 88.4x, and 1586x, respectively, compared to the conventional scheme, which raises the possibility of implementing large-scale CNNs on neuromorphic hardware.
DOI: https://doi.org/10.1145/3218603.3218639
Published: 2018-07-23
Citations: 1
Spin Orbit Torque Device based Stochastic Multi-bit Synapses for On-chip STDP Learning
Gyuseong Kang, Yunho Jang, Jongsun Park
As a large number of neurons and synapses are needed in spiking neural network (SNN) design, emerging devices have been employed to implement synapses and neurons. In this paper, we present a stochastic multi-bit spin orbit torque (SOT) memory based synapse, where only one SOT device is switched for potentiation and depression using a modified Gray code. The modified Gray code based approach needs only N devices to represent 2^N levels of synapse weights. An early read termination scheme is also adopted to reduce the power consumption of the training process by turning off less associated neurons and their ADCs. For the MNIST dataset, with comparable classification accuracy, the proposed SNN architecture using a 3-bit synapse achieves a 68.7% reduction of ADC overhead compared to the conventional 8-level synapse.
DOI: https://doi.org/10.1145/3218603.3218654
Published: 2018-07-23
Citations: 4
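The property the synapse abstract relies on, that N binary devices can encode 2^N weight levels while each one-level step flips exactly one device, can be illustrated with the standard reflected binary Gray code. This is an assumption-laden sketch: the paper uses a modified Gray code whose details are not given in the abstract, so the encoding below is the textbook one, and the function names and the saturating `step` behaviour are hypothetical.

```python
def gray_encode(level: int) -> int:
    """Map a weight level to its reflected binary Gray code."""
    return level ^ (level >> 1)


def gray_decode(code: int) -> int:
    """Recover the weight level from a reflected binary Gray code."""
    level = 0
    while code:
        level ^= code
        code >>= 1
    return level


def step(code: int, n_bits: int, up: bool) -> int:
    """Move one level up (potentiation) or down (depression), saturating at the ends.

    Each bit of `code` stands for the state of one device, so a single step
    changes the state of at most one device.
    """
    level = gray_decode(code)
    max_level = (1 << n_bits) - 1
    new_level = min(level + 1, max_level) if up else max(level - 1, 0)
    return gray_encode(new_level)


if __name__ == "__main__":
    n_bits = 3  # 3 devices encode 2**3 = 8 weight levels
    for level in range(1 << n_bits):
        print(level, format(gray_encode(level), f"0{n_bits}b"))
```

With plain binary encoding, moving from level 3 (011) to level 4 (100) would flip three devices; with a Gray code the same step flips one, which is the kind of saving the single-device switching claim points to.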
4-Channel Push-Pull VCSEL Drivers for HDMI Active Optical Cable in 0.18-μm CMOS
Jeongho Hwang, Hong-Seok Choi, H. Do, Gyu-Seob Jeong, Daehyun Koh, Seong Ho Park, D. Jeong
The price and power consumption of standard HDMI cables rise exponentially when the data rate increases or the cable runs longer. An HDMI active optical cable (AOC) can potentially solve the price and power issues, since fibers are tolerant to loss. However, additional optical components such as a vertical-cavity surface-emitting laser (VCSEL) and a photodiode (PD) are required. Therefore, drivers and transimpedance amplifiers should be designed carefully for normal operation. In this paper, two types of 4-channel VCSEL drivers for HDMI AOC are presented. The first type of driver passes data and bias separately. It uses off-chip capacitors for AC coupling. The second type of driver, on the other hand, passes data including the DC value without using off-chip capacitors. The structures of both drivers are based on push-pull current-mode logic (CML) to achieve better power efficiency. Drivers fabricated in a 0.18-μm CMOS process consume 36.5 mW/channel at 6 Gb/s and 24.7 mW/channel at 12 Gb/s, respectively.
DOI: https://doi.org/10.1145/3218603.3218629
Published: 2018-07-23
Citations: 0
Load-Triggered Warp Approximation on GPU
Zhenhong Liu, Daniel Wong, N. Kim
Value similarity of operands across warps has been exploited to improve the energy efficiency of GPUs. Prior work, however, incurs significant overhead to check value similarity for every instruction and does not improve performance, as it does not reduce the number of executed instructions. This work proposes Lock 'n Load (LnL), which triggers approximate execution of code regions by checking only the similarity of values returned from load instructions and fuses multiple approximated warps into a single warp.
DOI: https://doi.org/10.1145/3218603.3218626
Published: 2018-07-23
Citations: 10
Blacklist Core: Machine-Learning Based Dynamic Operating-Performance-Point Blacklisting for Mitigating Power-Management Security Attacks
Sheng Zhang, Adrian Tang, Zhewei Jiang, S. Sethumadhavan, Mingoo Seok
Most modern computing devices provide fine-grained control of operating frequency and voltage for power management. These interfaces, as demonstrated by recent attacks, open up a new class of software fault injection attacks that compromise security on commodity devices. CLKSCREW, a recently published attack that stretches the frequency of devices beyond their operational limits to induce faults, is one such attack. Statically and permanently limiting the frequency and voltage modulation space, i.e., guard-banding, could mitigate such attacks, but it incurs large performance degradation and long testing time. Instead, in this paper, we propose a run-time technique that dynamically blacklists unsafe operating performance points using a neural-net model. The model is first trained offline at design time and then adjusted at run-time by inspecting a selected set of features such as power management control registers, timing-error signals, and core temperature. We designed the algorithm and hardware, called a BlackList (BL) core, which is capable of detecting and mitigating such power-management-based security attacks with high accuracy. The BL core incurs a reasonably small overhead in power, delay, and area.
DOI: https://doi.org/10.1145/3218603.3218624
Published: 2018-07-23
Citations: 6