2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS): Latest Papers

Asynchronous sub-threshold ultra-low power processor
R. Diamant, R. Ginosar, C. Sotiriou
Ultra-low-power VLSI circuits may enable applications such as medical implants, sensor networks and “things” for the IoT. Aggressive supply voltage scaling is known to significantly reduce power consumption and improve efficiency, but it incurs both performance degradation and high delay variation. We show that the most energy-efficient operating point of a pipelined MIPS CPU lies in the deep sub-threshold region. We investigate the optimal selection of technology node, process variant and transistor type, and compare synchronous and asynchronous designs. We identify the design point with the best performance/power ratio: a 28nm high-k metal-gate high-performance process with high-VT transistors and a bundled-data asynchronous design style that efficiently accommodates delay variations. We illustrate a potential 7.4× improvement in power efficiency for the CPU, coupled with a reduction in power consumption by a factor of more than one thousand, relative to a synchronous CPU operating at nominal voltage. The asynchronous sub-threshold MIPS CPU designed in this work is compared with other commercial and research CPUs and is shown to achieve superior power efficiency.
DOI: https://doi.org/10.1109/PATMOS.2015.7347592 (published 2015-12-07)
Citations: 6
Response time schedulability analysis for hard real-time systems accounting DVFS latency on heterogeneous cluster-based platform
E. Valentin, Mário Salvatierra, Rosiane de Freitas, R. Barreto
The power wall, imposed by the power consumption of components, is a barrier to further improvement in processor design. Heterogeneous multicore platforms are appealing for applications such as hard real-time systems because of the potential for reduced energy consumption. However, hard real-time systems operate in life-critical environments, and reducing their energy consumption is an onerous and complex process. This paper addresses the problem of providing response-time schedulability conditions for hard real-time systems on cluster-based platforms. We extend the existing theory with a novel schedulability test that accounts for the latency inherent in the use of DVFS. We also compare our approach with state-of-the-art methods by means of empirical experiments. Our proposed response-time schedulability test avoids up to 99% of the false positive and false negative errors observed with well-known schedulability analyses from the literature.
DOI: https://doi.org/10.1109/PATMOS.2015.7347580 (published 2015-12-07)
Citations: 6
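As a rough illustration of what a response-time schedulability test looks like, the sketch below implements the classic fixed-priority, single-processor response-time recurrence and adds a constant per-job `switch_latency` term as a stand-in for DVFS transition latency. It is not the test proposed in the paper above, which targets cluster-based heterogeneous platforms. The task tuple layout, the function name and the way the latency is charged are all assumptions made for illustration.

```python
import math


def response_time(tasks, index, switch_latency=0.0):
    """Worst-case response time of tasks[index] under fixed-priority scheduling.

    tasks: list of (wcet, period, deadline) tuples, highest priority first.
    switch_latency: hypothetical constant added to every job's execution time
                    to model one DVFS transition per job (an assumption).
    Returns the response time, or None if the deadline would be missed.
    """
    wcet, _, deadline = tasks[index]
    cost = wcet + switch_latency
    r = cost
    while True:
        # Interference from every higher-priority task released in [0, r).
        interference = sum(
            math.ceil(r / t_j) * (c_j + switch_latency)
            for c_j, t_j, _ in tasks[:index]
        )
        r_next = cost + interference
        if r_next > deadline:
            return None
        if r_next == r:  # fixed point reached
            return r
        r = r_next


# Hypothetical task set: (wcet, period, deadline), highest priority first.
tasks = [(1, 4, 4), (2, 6, 6), (3, 12, 12)]
without_latency = response_time(tasks, 2)
with_latency = response_time(tasks, 2, switch_latency=0.5)
```

In this toy task set the lowest-priority task meets its deadline when the latency is ignored and misses it once the latency is charged, which is the kind of optimistic result a latency-unaware analysis can produce.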
Better-than-voltage scaling energy reduction in approximate SRAMs via bit dropping and bit reuse
F. Frustaci, D. Blaauw, D. Sylvester, M. Alioto
This paper explores the effectiveness of different knobs for dynamically trading energy consumption against output quality in approximate SRAMs for error-tolerant applications (such as video). By exploiting the different impact that errors have on quality at most significant bit (MSB) and least significant bit (LSB) positions, energy savings higher than those provided by simple voltage scaling are enabled. First, two techniques, dual-VDD and LSB dropping, are compared, showing that the latter is preferable thanks to its intrinsic simplicity and more pronounced energy savings. Second, a selective Error Correction Code (ECC) technique that reuses the LSBs as check bits to protect the MSBs is investigated. Measurements on a 28nm CMOS 32kb SRAM show that bit dropping and bit reuse achieve energy reductions of up to 33% and 28%, respectively, compared to simple voltage scaling at iso-quality. When combined, the two techniques achieve a larger energy saving (40%) and a supply voltage reduction of about 100mV at iso-quality. Finally, guidelines for selecting the energy-optimal combination of the two techniques for a given quality target are provided.
DOI: https://doi.org/10.1109/PATMOS.2015.7347598 (published 2015-12-07)
Citations: 9
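The quality side of LSB dropping mentioned in the abstract above can be sketched in a few lines. The example assumes 8-bit samples and assumes that dropped bits read back as zero. It measures the resulting error as PSNR. The function names and the PSNR metric are illustrative choices, not taken from the paper, and nothing here models the SRAM energy itself.

```python
import math


def drop_lsbs(pixels, dropped_bits):
    """Zero the `dropped_bits` least significant bits of each 8-bit value."""
    if not 0 <= dropped_bits <= 8:
        raise ValueError("dropped_bits must be between 0 and 8")
    mask = (0xFF << dropped_bits) & 0xFF
    return [p & mask for p in pixels]


def psnr(reference, approximate, peak=255):
    """Peak signal-to-noise ratio in dB; infinity if the data are identical."""
    mse = sum((r - a) ** 2 for r, a in zip(reference, approximate)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


# Hypothetical 8-bit samples.
original = [255, 128, 7, 0]
approximated = drop_lsbs(original, dropped_bits=3)
quality_db = psnr(original, approximated)
```

Sweeping `dropped_bits` against a quality target gives the data-side half of the energy/quality trade-off. The energy half has to come from measurements such as those reported in the paper.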
Energy management via PI control for data parallel applications with throughput constraints
A. Molnos, W. Lombardi, D. Puschini, Julien Mottin, S. Lesecq, A. Tonda
This paper presents a new proportional-integral (PI) controller that sets the operating point of computing tiles in a system on chip (SoC). We address data-parallel applications with throughput constraints. The controller settings are investigated for application configurations with different QoS levels and different buffer sizes. The control method is evaluated on a test chip with four tiles executing a realistic HMAX object recognition application. Experimental results suggest that the proposed controller outperforms the state of the art: it requires, on average, 25% fewer frequency switches and achieves slightly higher energy savings. Reducing the number of frequency switches is important because it decreases the associated overhead. In addition, the PI controller meets the throughput constraint in cases where other approaches fail.
DOI: https://doi.org/10.1109/PATMOS.2015.7347588 (published 2015-12-07)
Citations: 1
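For readers unfamiliar with PI control, a generic discrete-time PI loop that maps a throughput error to a frequency request might look like the sketch below. It is a textbook controller with output clamping and a simple anti-windup rule, not the controller from the paper above. The gains, the frequency levels and the class and function names are hypothetical.

```python
class PIController:
    """Discrete-time PI controller with output clamping and simple anti-windup."""

    def __init__(self, kp, ki, out_min, out_max):
        self.kp, self.ki = kp, ki
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0

    def update(self, setpoint, measured):
        error = setpoint - measured
        candidate = self.integral + error
        output = self.kp * error + self.ki * candidate
        clamped = min(self.out_max, max(self.out_min, output))
        # Anti-windup: accumulate the error only while the output is unsaturated.
        if clamped == output:
            self.integral = candidate
        return clamped


def nearest_level(value, levels):
    """Snap a continuous control output to the closest available frequency."""
    return min(levels, key=lambda f: abs(f - value))


# Hypothetical operating points (MHz) and throughput values (frames/s).
levels_mhz = [100, 200, 400, 800]
ctrl = PIController(kp=20.0, ki=5.0, out_min=100, out_max=800)
freq = nearest_level(ctrl.update(setpoint=30.0, measured=22.0), levels_mhz)
```

Snapping to discrete levels is where frequency switches are counted. A controller whose output changes smoothly tends to cross level boundaries less often, which relates to the switch-count metric the abstract reports.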
VLSI architecture design and implementation of a LDPC encoder for the IEEE 802.22 WRAN standard
Nelson Alves Ferreira Neto, J. Oliveira, Wagner Oliveira, Joao Carlos Bittencourt
This paper presents two architectures for a Low Density Parity Check (LDPC) encoder, the first based on a fully serial approach and the second on a mixed approach, as well as their respective ASIC implementations. The proposed designs can operate in 84 combinations of code rate and word size, according to the IEEE 802.22 Wireless Regional Area Network (WRAN) standard, and target low power and small area. Although the proposed architectures are primarily designed for this standard, they can be easily adapted to other wireless broadband standards.
DOI: https://doi.org/10.1109/PATMOS.2015.7347589 (published 2015-12-07)
Citations: 4
ABeeMap: A mapping algorithm based on multi-objective Artificial Bee Colony
V. L. Souza, A. Silva-Filho, V. C. Wanderely
This paper presents ABeeMap, a new approach to FPGA technology mapping. The mapper is based on a hybrid approach that combines a Pareto-dominance-based asynchronous multi-objective Artificial Bee Colony with problem-specific heuristics in order to find better trade-offs among area, performance and power consumption. On a set of 20 designs, we find that, in comparison to state-of-the-art technology mapping, our approach reduces both LUT counts and edge counts. Placing and routing the resulting netlists leads to a reduction in the configurable logic block count, an increase in estimated operating frequency and a reduction in energy consumption.
DOI: https://doi.org/10.1109/PATMOS.2015.7347582 (published 2015-12-07)
Citations: 0
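The Pareto-dominance relation that multi-objective optimizers of this kind rely on is easy to state in code. The sketch below shows only that relation and a brute-force Pareto-front filter. It does not reproduce the Artificial Bee Colony search or the mapping heuristics of ABeeMap. The objective tuples `(area, delay, power)` are hypothetical, and all objectives are assumed to be minimized.

```python
def dominates(a, b):
    """True if objective vector `a` Pareto-dominates `b` (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate mappings as (area, delay, power) tuples.
candidates = [(10, 5, 3), (12, 4, 3), (11, 6, 4), (10, 5, 2)]
front = pareto_front(candidates)
```

An objective that should be maximized, such as operating frequency, can be handled by negating it or by using delay instead, as done here.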
Energy efficiency of Zipf traffic distributions within Facebook's data center fabric architecture
L. Durbeck, J. Tront, N. Macias
Open architectures like the one recently unveiled by Facebook allow a detailed assessment of the energy efficiency of commercial data centers. This paper explores how well Zipf-like distributions, which are typical of network traffic, fit updates of user pages and the entity graph in the new Facebook data center network architecture. We find that network resource consumption could be reduced by as much as 40-50% through several changes, either to the software or to the data center design. Of these, employing a connected hub-and-spoke subgraph representation for each popular node, with each pod operating locally on its node of the subgraph, appears to hold the greatest energy savings potential. This work is part of a larger effort to characterize more completely the efficiency of data center computer and network architectures beyond the usual reporting of facility power utilization efficiency (PUE), which is blind to energy proportionality and other aspects of efficiency within the computer and network architecture, or IT portion, of the data center.
DOI: https://doi.org/10.1109/PATMOS.2015.7347601 (published 2015-12-07)
Citations: 11
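To see why a Zipf-like popularity distribution concentrates traffic on a few nodes, the following sketch computes the share of requests that go to the `k` most popular of `n` items. The exponent `s = 1.0` and the item counts are assumptions chosen for illustration. The paper's own traffic model and parameters are not given in the abstract.

```python
def zipf_pmf(n, s=1.0):
    """Probabilities of ranks 1..n under a Zipf law with exponent s."""
    weights = [1.0 / (rank ** s) for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def top_k_share(n, k, s=1.0):
    """Fraction of all traffic that goes to the k most popular of n items."""
    return sum(zipf_pmf(n, s)[:k])


# With 1000 items and s = 1, the ten most popular items draw about 39% of traffic.
share = top_k_share(n=1000, k=10)
```

This kind of skew is why treating popular nodes specially, as the hub-and-spoke representation in the abstract does, can affect a large share of the traffic.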
Combining Pel Decimation with Partial Distortion Elimination to increase SAD energy efficiency
Ismael Seidel, André Beims Bräscher, José Luís Almada Güntzel
The most energy-hungry step of Video Coding (VC) is the Block Matching Algorithm (BMA), even when a simple similarity metric such as the Sum of Absolute Differences (SAD) is employed. Moreover, with the increasing resolutions supported by state-of-the-art VC standards (H.264/AVC, HEVC and VP9), the SAD must be as energy-efficient as possible to increase battery lifetime in portable mobile devices. Two well-known techniques for decreasing the number of operations in SAD calculation are Pel Decimation and Partial Distortion Elimination (PDE). The energy savings provided by the former are dictated by the chosen decimation ratio and come at a cost in coding efficiency. For the latter, energy savings have no cost in coding efficiency but are dictated by the video content and search parameters. In this work we present two configurable SAD4×4 architectures: one designed to operate dynamically using one of four Pel Decimation ratios (1:1, 4:3, 2:1 or 4:1) and another able to use PDE in addition to Pel Decimation. We simulated Pel Decimation and PDE behavior during motion estimation using 22 video samples from the Common Test Conditions (CTC) encoded using 4 different quantization parameters (QPs). This simulation was thus performed over 5.82×10^12 PDE SADs. The impact of Pel Decimation is shown in terms of Bjøntegaard Delta (BD)-Rate, ranging from 3.16% (1:1 ratio) up to 21.94% (4:1). In addition, we found that by using PDE alone (i.e., without Pel Decimation) one can reduce the number of cycles required to calculate one SAD from 10 to 6.38 (on average). To show the improvements in terms of energy, we synthesized both presented architectures using a 45nm standard cell library. Finally, the use of PDE can improve energy efficiency more than Pel Decimation alone, without degrading coding efficiency.
DOI: https://doi.org/10.1109/PATMOS.2015.7347604 (published 2015-12-07)
Citations: 10
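The two techniques named in the abstract above can be illustrated in software. The sketch below accumulates a SAD row by row and stops as soon as the partial sum can no longer beat the best candidate found so far (Partial Distortion Elimination). An optional `row_step` skips rows as a crude stand-in for Pel Decimation. The row-skipping pattern, the function name and the return convention are assumptions. The paper's hardware architectures and its exact decimation patterns are not reproduced here.

```python
def sad_pde(current, candidate, best_so_far=float("inf"), row_step=1):
    """SAD of two equally sized blocks with row-wise early termination.

    current, candidate: lists of rows of pixel values (e.g. 4x4 blocks).
    best_so_far: smallest SAD found so far in the search; accumulation stops
                 once the partial sum reaches it (Partial Distortion Elimination).
    row_step: 1 uses every row; 2 keeps every other row, a simple 2:1
              decimation pattern chosen for illustration.
    Returns (sad, rows_processed); sad is None if the candidate was rejected early.
    """
    partial = 0
    rows_processed = 0
    for cur_row, cand_row in list(zip(current, candidate))[::row_step]:
        partial += sum(abs(c - r) for c, r in zip(cur_row, cand_row))
        rows_processed += 1
        if partial >= best_so_far:
            return None, rows_processed
    return partial, rows_processed


# Hypothetical 4x4 blocks.
current = [[10, 10, 10, 10] for _ in range(4)]
close_match = [[12, 10, 10, 10], [10, 13, 10, 10], [10, 11, 10, 10], [10, 10, 10, 10]]
poor_match = [[50, 50, 50, 50] for _ in range(4)]

best, _ = sad_pde(current, close_match)                  # full SAD
rejected = sad_pde(current, poor_match, best_so_far=best)  # stops after one row
decimated = sad_pde(current, close_match, row_step=2)      # fewer rows, approximate SAD
```

The early exit never changes which candidate wins, which matches the abstract's point that PDE costs nothing in coding efficiency. The decimated SAD is only an approximation, which is where the BD-Rate penalty comes from.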
Wideband dynamic voltage sensing mechanism for EH systems
K. Gao, Y. Xu, D. Shang, Fei Xia, A. Yakovlev
In Energy Harvesting (EH) scenarios, the 'survival zone' refers to the state in which the power supply has insufficient energy to provide a nominal and stable Vdd. In this situation the system Vdd tends to be low and to vary over a wide band. There are benefits if the system can already function to some degree under survival zone conditions. Such functionality may include providing control to improve the efficiency of power processing units and starting the computation load for light but crucial survival-related tasks. Knowledge of the Vdd is often indispensable for running these types of survival zone functions. A novel low-power voltage sensing scheme for EH-based electronic systems is proposed that functions in the survival zone to provide this vital Vdd information. The method combines voltage-controlled delays with simple circuits that implement time comparison. This paper describes the design, implementation and analysis of this sensing subsystem, which itself draws power from the variable and low Vdd that it is sensing.
DOI: https://doi.org/10.1109/PATMOS.2015.7347605 (published 2015-12-07)
Citations: 3
Energy-efficient Level Shifter topology
Roger Caputo-Llanos, Diego V. S. Sousa, M. Terres, G. Bontorin, R. Reis, M. Johann
Level Shifters (LS) are essential components of integrated circuits with multiple power supplies. They work as voltage scaling interfaces between different power domains. In this paper, we present an energy-efficient level shifter topology with low area. It requires only one power rail and can operate near the threshold voltage. We validated the proposed topology with simulations in an IBM 130nm CMOS technology. We compared our topology with traditional level shifters, such as the Differential Cascode Voltage Switch (DCVS) and Puri's topology. The proposed topology requires up to 93.79% less energy under certain conditions. It showed 88.03% smaller delay and 39.6% lower Power-Delay Product (PDP) than the DCVS topology. Compared with Puri's level shifter, we obtained a 32.08% reduction in power consumption, 13.26% smaller delay and 15.37% lower PDP. In addition, our level shifter was the only one capable of working at 35% of the nominal supply.
DOI: https://doi.org/10.1109/PATMOS.2015.7347600 (published 2015-12-07)
Citations: 0
Journal
2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)