2009 IEEE International Conference on Computer Design: Latest Papers


WHOLE: A low energy I-Cache with separate way history

2009 IEEE International Conference on Computer Design

Pub Date : 2009-10-04 DOI: 10.1109/ICCD.2009.5413162

Zichao Xie, Dong Tong, Xu Cheng

Set-associative instruction caches achieve low miss rates at the expense of significant energy dissipation. Previous energy-efficient approaches usually suffer from performance degradation and redundant extension bits. In this paper, we propose a Way History Oriented Low Energy Instruction Cache (WHOLE-Cache) design for single-issue, in-order processors. The WHOLE-Cache design substantially reduces the dynamic energy dissipation of a set-associative instruction cache without incurring additional cycle penalties. Tag comparison results are stored in either the Branch Target Buffer (BTB) or the Instruction Cache (I-Cache) to avoid tag checks and unnecessary way activation on subsequent accesses to visited cache lines. The extended BTB uses way history bits for branch instructions, while the I-Cache extension bits are used when fetching consecutive instructions that reside in different cache lines. A valid flag is associated with each stored tag comparison result to indicate whether the instruction to be fetched still resides in the recorded location. A simple invalidation scheme is implemented in the cache miss replacement operation: whenever a cache line is replaced, the pointers to it, which reside in the BTB or in other I-Cache lines, are invalidated accordingly. We model the WHOLE-Cache design in Verilog. Deriving basic parameters from TSMC 65nm technology, we use the Wattch simulator to evaluate the performance and energy reduction of the WHOLE-Cache in the instruction fetch stage, with SPEC2000 and Mediabench as benchmarks. Compared with a conventional 4-way set-associative I-Cache, the WHOLE-Cache reduces energy consumption by 65% without any performance penalty.
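The way-history idea in this abstract can be illustrated with a small behavioural model. The following is a minimal sketch, not the authors' Verilog design: it covers only the sequential-fetch link stored in the I-Cache line (the BTB path is omitted), and the class name `WayHistoryICache`, the cache geometry, and the round-robin replacement policy are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

WAYS = 4          # associativity (the abstract compares against a 4-way cache)
SETS = 4          # hypothetical, kept tiny for illustration
LINE_BYTES = 16   # hypothetical line size


@dataclass
class Line:
    tag: Optional[int] = None
    # Way of the sequentially next line; None means the valid flag is cleared.
    next_way: Optional[int] = None


class WayHistoryICache:
    def __init__(self):
        self.lines = [[Line() for _ in range(WAYS)] for _ in range(SETS)]
        self.victim = [0] * SETS      # round-robin replacement (assumption)
        self.way_activations = 0      # proxy for dynamic energy
        self.tag_checks = 0

    def _index(self, addr):
        line_no = addr // LINE_BYTES
        return line_no % SETS, line_no // SETS

    def fetch(self, addr, way_hint=None):
        """Return the way that holds addr, filling the line on a miss."""
        s, tag = self._index(addr)
        if way_hint is not None:
            # A recorded, still-valid result: read one way, skip the tag check.
            self.way_activations += 1
            return way_hint
        # No usable history: activate every way and compare tags.
        self.way_activations += WAYS
        self.tag_checks += 1
        for w, line in enumerate(self.lines[s]):
            if line.tag == tag:
                return w
        return self._fill(s, tag)

    def _fill(self, s, tag):
        w = self.victim[s]
        self.victim[s] = (w + 1) % WAYS
        self._invalidate_pointers_to(s, w)
        self.lines[s][w] = Line(tag=tag)
        return w

    def _invalidate_pointers_to(self, s, w):
        # Sequential links to set s can only live in lines of the previous set.
        for line in self.lines[(s - 1) % SETS]:
            if line.next_way == w:
                line.next_way = None

    def run_sequential(self, start, n_lines):
        """Fetch n_lines consecutive cache lines, recording way history."""
        prev = None
        for i in range(n_lines):
            addr = start + i * LINE_BYTES
            s, _ = self._index(addr)
            hint = prev.next_way if prev is not None else None
            w = self.fetch(addr, hint)
            if prev is not None:
                prev.next_way = w
            prev = self.lines[s][w]


if __name__ == "__main__":
    cache = WayHistoryICache()
    cache.run_sequential(0, 4)   # cold pass: every access checks all ways
    cache.run_sequential(0, 4)   # warm pass: only the first access does
    print(cache.way_activations, cache.tag_checks)
```

In this model a second pass over the same lines activates a single way for every access after the first, and replacing a line clears any recorded link that points at it, which mirrors the valid-flag and invalidation behaviour the abstract describes.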



Citations: 1

A fast routability- and performance-driven droplet routing algorithm for digital microfluidic biochips

2009 IEEE International Conference on Computer Design

Pub Date : 2009-10-04 DOI: 10.1109/ICCD.2009.5413119

Tsung-Wei Huang, Tsung-Yi Ho

As microfluidic technology advances, the design complexity of digital microfluidic biochips (DMFBs) is expected to explode in the near future. One of the most critical challenges for DMFB design is the droplet routing problem, which schedules the movement of each droplet in a time-multiplexed manner. In this paper, we propose a fast routability- and performance-driven droplet router for DMFBs. The main contributions of our work are: (1) a global moving vector analysis for constructing preferred routing tracks to minimize the number of used unit cells; (2) an entropy-based equation to determine the routing order of droplets for better routability; (3) a routing compaction technique using dynamic programming to minimize the latest arrival time of droplets. Experimental results show that our algorithm achieves 100% routing completion for all test cases on three Benchmark Suites, which the previous algorithms do not. In addition to routability, compared with the state-of-the-art high-performance router on Benchmark Suite I [3], our algorithm improves runtime by 40%, reduces the latest arrival time by 21%, and reduces the number of used unit cells by 10%. Furthermore, experimental results on Benchmark Suites II and III are also very promising. Based on the evaluation of the three Benchmark Suites, our algorithm handles complex droplet routing problems more efficiently and robustly than the existing algorithms.



Citations: 89

Statistical timing analysis based on simulation of lithographic process

2009 IEEE International Conference on Computer Design

Pub Date : 2009-10-04 DOI: 10.1109/ICCD.2009.5413181

Aswin Sreedhar, S. Kundu

The length of a poly-gate printed on silicon depends on exposure dose, depth of focus, photo-resist thickness and planarity of the surface. In sub-wavelength lithography, poly-gate length also varies with layout topology. Poly-gate length determines the effective channel length of a transistor, which in turn determines its performance. Since the sources of error are hard to control, statistical analysis can be used to measure their impact on circuit timing characteristics. Typical lithography-aware methodologies consider only systematic variation such as across-chip linewidth variation (ACLV). In this paper we propose a statistical technique for timing yield prediction, based on variational lithography modeling of the physical circuit layout. By statistically varying lithographic process parameters we estimate the difference in timing yield estimation of a design. Our simulation results show that if manufacturing process parameters follow a Gaussian distribution, the resulting transistors follow a skewed normal distribution, in which a greater number of them have shorter channel lengths. This led us to investigate whether Statistical Static Timing Analysis (SSTA) is overly pessimistic. The baseline delay model assumed for SSTA in our approach is a Gaussian delay model fitted to skew-normal data obtained from statistical litho simulation. Our experiments showed that even after re-centering the Gaussian delay model to fit the channel length data with minimum error, it is still overly pessimistic and significantly underestimates circuit performance.



Citations: 2

Rapid early-stage microarchitecture design using predictive models

2009 IEEE International Conference on Computer Design

Pub Date : 2009-10-04 DOI: 10.1109/ICCD.2009.5413141

Christophe Dubach, Timothy M. Jones, M. O’Boyle

The early-stage design of a new microprocessor involves the evaluation of a wide range of benchmarks across a large number of architectural configurations. Several methods are used to cut down on the required simulation time. Typically, however, existing approaches fail to capture true program behaviour accurately and require a non-negligible number of training simulations to be run. We address these problems by developing a machine learning model that predicts the mean of any given metric, e.g. cycles or energy, across a range of programs, for any microarchitectural configuration. It works by combining only the most representative programs from the benchmark suite based on their behaviour in the design space under consideration. We use our model to predict the mean performance, energy, energy-delay (ED) and energy-delay-squared (EDD) of the SPEC CPU 2000 and MiBench benchmark suites within our design space. We achieve the same level of accuracy as two state-of-the-art prediction techniques but require five times fewer training simulations. Furthermore, our technique is scalable and we show that, asymptotically, it requires an order of magnitude fewer simulations than these existing approaches.



Citations: 3

Impact analysis of performance faults in modern microprocessors

2009 IEEE International Conference on Computer Design

Pub Date : 2009-10-04 DOI: 10.1109/ICCD.2009.5413171

Naghmeh Karimi, M. Maniatakos, C. Tirumurti, A. Jas, Y. Makris

Towards improving performance, modern microprocessors incorporate a variety of architectural features, such as branch prediction and speculative execution, which are not critical to the correctness of their operation. While faults in the corresponding hardware may not necessarily affect functional correctness, they may, nevertheless, adversely impact performance. In this paper, we investigate quantitatively the performance impact of such faults using a superscalar, dynamically-scheduled, out-of-order, Alpha-like microprocessor, on which we execute SPEC2000 integer benchmarks. We provide extensive fault simulation-based experimental results and we discuss how this information may guide the inclusion of additional hardware for performance loss recovery and yield enhancement.


Citations: 8

Mid-range wireless energy transfer using inductive resonance for wireless sensors

2009 IEEE International Conference on Computer Design

Pub Date : 2009-10-04 DOI: 10.1109/ICCD.2009.5413106

S. J. Mazlouman, A. Mahanfar, B. Kaminska

Methods are suggested and tested to measure and optimize the wireless energy transfer efficiency for mid-range (10–100cm) inductive coils with relatively low profile using magnetic resonance. These coils can be used to provide energy for wireless sensors and battery-operated devices. It is shown that for every system, a resonance frequency can be identified where the wireless energy transfer efficiency is optimal. Several prototypes are developed and tested as a proof of validity of the proposed technique. It is also shown that by tuning to the optimum resonant frequency and designing proper matching circuitry, an efficiency of about 25% for moderate profiles can be achieved.
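The resonance condition mentioned in this abstract can be made concrete with the standard LC circuit relations. The sketch below is not taken from the paper: it uses the textbook resonant frequency f0 = 1 / (2π√(LC)) and the usual two-coil circuit-model bound on link efficiency under an optimally matched load. The function names and the component values in the example are hypothetical.

```python
import math


def resonant_frequency_hz(inductance_h, capacitance_f):
    """Resonant frequency of an ideal LC tank: f0 = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def max_link_efficiency(k, q_tx, q_rx):
    """Upper bound on efficiency of a two-coil resonant link.

    k is the coupling coefficient and q_tx, q_rx are the coil quality
    factors. Standard circuit-model result, assuming an optimal load.
    """
    x = k * k * q_tx * q_rx
    return x / (1.0 + math.sqrt(1.0 + x)) ** 2


if __name__ == "__main__":
    # Hypothetical coil: 10 uH tuned with 100 pF.
    print(resonant_frequency_hz(10e-6, 100e-12))
    # Hypothetical weakly coupled link.
    print(max_link_efficiency(k=0.05, q_tx=50, q_rx=50))
```

Because the bound depends only on the product k²·Q_tx·Q_rx, a weakly coupled mid-range link needs high-Q coils and careful matching to reach efficiencies of the order reported in the abstract.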


Citations: 45

Efficient architectures for elliptic curve cryptography processors for RFID

2009 IEEE International Conference on Computer Design

Pub Date : 2009-10-04 DOI: 10.1109/ICCD.2009.5413128

Lawrence Leinweber, C. Papachristou, F. Wolff

RFID tags will supplant barcodes for product identification in the supply chain. The capability of a tag to be read without a line of sight is its principal benefit, but compromises the privacy of the tag owner. Public key cryptography can restore this privacy. Because of the extreme economic constraints of the application, die area and power consumption for cryptographic functions must be minimized. Elliptic curve processors efficiently provide the cryptographic capability needed for RFID. This paper proposes efficient architectures for elliptic curve processors in GF(2^m). One design requires six m-bit registers and six Galois field multiply operations per key bit. The other design requires five m-bit registers and seven Galois field multiply operations per key bit. These processors require a small number of circuit elements and clock cycles while providing protection from simple side-channel attacks. Synthesis results are presented for power, area, and delay in 250, 130 and 90 nm technologies. Compared with prior designs from the literature, the proposed processors require less area and energy. For the B-163 curve, with a bit-serial multiplier, the first proposed design synthesized in an IBM low-power 130 nm technology requires an area of 9613 gate equivalents, 163,355 cycles and 4.14 µJ for an elliptic curve point multiplication. The other proposed design requires 8756 gate equivalents, 190,570 cycles and 4.19 µJ.
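To make the cost of "Galois field multiply operations per key bit" more tangible, here is a minimal software sketch of a bit-serial multiplier in GF(2^163), using the NIST B-163 reduction polynomial x^163 + x^7 + x^6 + x^3 + 1. It is an illustration of the general technique, not the paper's hardware; the function names are hypothetical, and the cycle estimate assumes one clock cycle per multiplier bit and counts multiplications only.

```python
M = 163
# NIST B-163 field polynomial: x^163 + x^7 + x^6 + x^3 + 1
POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def gf2m_mul(a, b):
    """MSB-first bit-serial multiply in GF(2^163).

    Field elements are Python ints whose bits are polynomial coefficients.
    Each loop iteration models one step of a bit-serial multiplier:
    shift the accumulator, reduce if needed, conditionally add a.
    """
    acc = 0
    for i in reversed(range(M)):
        acc <<= 1
        if acc >> M:              # degree reached M: subtract (XOR) the modulus
            acc ^= POLY
        if (b >> i) & 1:
            acc ^= a
    return acc


def estimated_mult_cycles(mults_per_key_bit, key_bits=M, cycles_per_mult=M):
    """Rough cycle count spent in multiplications for one point multiplication."""
    return key_bits * mults_per_key_bit * cycles_per_mult


if __name__ == "__main__":
    print(hex(gf2m_mul(2, 1 << 162)))   # x * x^162 reduced modulo the polynomial
    print(estimated_mult_cycles(6), estimated_mult_cycles(7))
```

Under these assumptions the multiplications alone account for 159,414 cycles in the six-multiply design and 185,983 in the seven-multiply design, slightly below the totals of 163,355 and 190,570 cycles quoted in the abstract, which would leave the remainder for the other field operations and control.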



Citations: 7

Online multiple error detection in crossbar nano-architectures

2009 IEEE International Conference on Computer Design

Pub Date : 2009-10-04 DOI: 10.1109/ICCD.2009.5413135

Navid Farazmand, M. Tahoori

Crossbar nano-architectures based on self-assembled nano-structures are promising alternatives to current CMOS technology, which is facing serious challenges for further down-scaling. One of the major challenges in this nanotechnology is an elevated failure rate due to atomic device sizes and the inherent lack of control in self-assembly fabrication. High permanent and transient failure rates therefore lead to multiple faults during the operational lifetime of crossbar nano-architectures. In this paper, we present a concurrent multiple error detection scheme for multistage crossbar nano-architectures based on dual-rail implementations of logic functions. We prove the detectability of all single faults as well as most classes of multiple faults in this scheme. Based on statistical multiple fault injection, we compare the proposed technique with other online error detection and masking techniques such as Triple Modular Redundancy (TMR), duplication, and parity checking, in terms of fault coverage as well as area and delay overhead.
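The principle behind dual-rail error detection can be sketched in a few lines. This is a generic illustration of dual-rail checking, not the paper's crossbar mapping: every signal is carried as a (true, complement) pair, each gate computes both rails, and any pair whose two rails are equal is flagged as an error. The helper names and the example function are hypothetical.

```python
def encode(bit):
    """Encode a single bit as a dual-rail (true, complement) pair."""
    return (bit, 1 - bit)


def dr_and(a, b):
    # True rail computes AND; complement rail computes OR of the complements.
    return (a[0] & b[0], a[1] | b[1])


def dr_or(a, b):
    # True rail computes OR; complement rail computes AND of the complements.
    return (a[0] | b[0], a[1] & b[1])


def dr_not(a):
    # Inversion is a rail swap and needs no logic.
    return (a[1], a[0])


def is_valid(x):
    """A dual-rail word is valid only when its rails are complementary."""
    return x[0] != x[1]


def f_dual(a, b, c):
    """Example function f = (a AND b) OR (NOT c) in dual-rail form."""
    return dr_or(dr_and(a, b), dr_not(c))


if __name__ == "__main__":
    # Fault-free evaluation.
    print(f_dual(encode(1), encode(1), encode(1)))
    # Stuck-at-0 on the true rail of input a: the pair becomes (0, 0).
    faulty = f_dual((0, 0), encode(1), encode(1))
    print(faulty, is_valid(faulty))
```

A fault is caught only when it propagates to a checked pair as a non-complementary code word; for some input values the fault is masked and the output stays correct, which is one reason fault coverage is evaluated statistically.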


Citations: 3

FinFET-based dynamic power management of on-chip interconnection networks through adaptive back-gate biasing

2009 IEEE International Conference on Computer Design

Pub Date : 2009-10-04 DOI: 10.1109/ICCD.2009.5413133

Chun-Yi Lee, N. Jha

On-chip interconnection networks are fast becoming significant power consumers in high-performance chip multiprocessors (CMPs). Increased power consumption leads to more heat, degrades system reliability, and may increase the cost of cooling IC packages. This situation becomes even worse as bulk CMOS scales further into the nanometer regime because of excessive leakage power due to short-channel effects. In this paper, we explore the use of FinFETs, which are promising substitutes for bulk CMOS at the 32nm node and beyond, to design on-chip network routers. We present a detailed design of a variable pipeline stage router (VPSR) targeted at FinFET technology. We employ a dynamic power management scheme, which we call adaptive back-gate biasing (ABGB), for FinFET implementations. We evaluate VPSR and ABGB on a simulation platform specifically designed for power and performance simulations of FinFET-based interconnection networks. The results show that VPSR is able to successfully adapt its power consumption to incoming traffic, with a resultant 20% reduction in power at almost no impact on latency.


Citations: 23

Reusing cached schedules in an out-of-order processor with in-order issue logic

2009 IEEE International Conference on Computer Design

Pub Date : 2009-10-04 DOI: 10.1109/ICCD.2009.5413146

Oscar Palomar, Toni Juan, J. Navarro

Unlike caches and branch predictors, the complex and powerful out-of-order issue logic ignores the repetitive nature of code. We show that in 90% of the cycles, the group of instructions selected by the issue logic belongs to just 13% of all the distinct groups issued: the issue logic of an out-of-order processor is constantly rediscovering what it has already found. To benefit from the repetitive nature of instruction issue, we move the scheduling logic after the commit stage, out of the critical path of execution. The schedules created there are cached and reused to feed a simple in-order issue logic, which could result in a higher-frequency design. We present the complete design of our ReLaSch processor, which achieves the same average IPC as a conventional out-of-order processor and a 1.56x speed-up over the IPC of an in-order processor. We actually surpass the out-of-order IPC in 23 out of 40 SPEC benchmarks, mainly because the broader view of the code after the commit stage allows creating better schedules.


Citations: 3
