
2008 IEEE International Conference on Computer Design: Latest Papers

Power-aware soft error hardening via selective voltage scaling
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751877
Kai-Chiang Wu, Diana Marculescu
Nanoscale integrated circuits are becoming increasingly sensitive to radiation-induced transient faults (soft errors) due to current technology scaling trends, such as shrinking feature sizes and reducing supply voltages. Soft errors, which have been a significant concern in memories, are now a main factor in reliability degradation of logic circuits. This paper presents a power-aware methodology using dual supply voltages for soft error hardening. Given a constraint on power overhead, our proposed framework can minimize the soft error rate (SER) of a circuit via selective voltage scaling. On average, circuit SER can be reduced by 33.45% for various sizes of transient glitches with only 11.74% energy increase. The overhead in normalized power-delay-area product per 1% SER reduction is 0.64%, 1.33X less than that of existing state-of-the-art approaches.
Citations: 29
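As a rough illustration of choosing which gates to move to the higher supply voltage under a power budget, the sketch below ranks gates by SER reduction per unit of extra power and picks greedily. This is not the authors' framework: the `Gate` record, the `select_gates` helper, and all numbers are hypothetical, and the greedy pass is only a simple stand-in for the optimization the abstract describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    name: str
    ser_reduction: float  # SER removed if the gate uses the high supply (arbitrary units)
    power_cost: float     # extra power paid for doing so (arbitrary units)


def select_gates(gates, power_budget):
    """Greedy pick: highest SER reduction per unit of extra power first."""
    chosen, spent = [], 0.0
    ranked = sorted(gates, key=lambda g: g.ser_reduction / g.power_cost, reverse=True)
    for gate in ranked:
        if spent + gate.power_cost <= power_budget:
            chosen.append(gate.name)
            spent += gate.power_cost
    return chosen, spent


# Hypothetical per-gate numbers, for illustration only.
GATES = [
    Gate("g1", 8.0, 2.0),
    Gate("g2", 3.0, 3.0),
    Gate("g3", 5.0, 1.0),
    Gate("g4", 1.0, 2.0),
]

selected, used_power = select_gates(GATES, power_budget=4.0)
```

A greedy ranking is easy to follow but can miss the best combination; an exact formulation would treat the selection as a constrained optimization problem.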
Ring data location prediction scheme for Non-Uniform Cache Architectures
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751936
Sayaka Akioka, Feihui Li, K. Malkowski, P. Raghavan, M. Kandemir, M. J. Irwin
Increases in cache capacity are accompanied by growing wire delays due to technology scaling. Non-uniform cache architecture (NUCA) is one of the proposed solutions for reducing the average access latency in such cache designs. While most prior NUCA work focuses on data placement, data replacement, and migration-related issues, this paper studies the problem of data search (access) in NUCA. In our architecture, we arrange sets of banks with equal access latency into rings. Our last-access-based (LAB) prediction scheme predicts the ring that is expected to contain the required data and checks the banks in that ring first for the data block sought. We compare our scheme to two alternative approaches: searching all rings in parallel, and searching rings sequentially. We show that our LAB ring prediction scheme reduces L2 energy significantly over the sequential and parallel schemes, while maintaining similar performance. Our LAB scheme reduces energy consumption by 15.9% relative to the sequential lookup scheme, and 53.8% relative to the parallel lookup scheme.
Citations: 7
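The idea of probing the predicted ring first can be sketched as follows. The abstract does not say how the predictor is indexed or what happens after a misprediction, so the per-block table, the default prediction of ring 0, and the fall-back order are all assumptions made for illustration; `LabRingPredictor` is a hypothetical name.

```python
class LabRingPredictor:
    """Toy last-access-based predictor: remember the ring where each block was last found."""

    def __init__(self, num_rings):
        self.num_rings = num_rings
        self.last_ring = {}  # block address -> ring of the previous hit (assumed indexing)

    def lookup(self, block, actual_ring):
        """Return the rings probed, in order, until the block is found."""
        predicted = self.last_ring.get(block, 0)  # assumption: default to the nearest ring
        order = [predicted] + [r for r in range(self.num_rings) if r != predicted]
        probed = order[: order.index(actual_ring) + 1]
        self.last_ring[block] = actual_ring  # train on the observed location
        return probed


predictor = LabRingPredictor(num_rings=3)
first = predictor.lookup(0x40, actual_ring=2)   # cold miss in the table: probes several rings
second = predictor.lookup(0x40, actual_ring=2)  # correct prediction: probes a single ring
```

Fewer probed rings means fewer bank accesses, which is where an energy saving relative to a parallel search of all rings would come from in this simplified model.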
Making register file resistant to power analysis attacks
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751919
Shuo Wang, Fan Zhang, Jianwei Dai, Lei Wang, Z. Shi
Power analysis attacks are a type of side-channel attack that exploits the power consumption of computing devices to retrieve secret information. They are very effective in breaking many cryptographic algorithms, especially those running on low-end processors in embedded systems, sensor nodes, and smart cards. Although many countermeasures to power analysis attacks have been proposed, most of them are software based and designed for a specific algorithm. Many of them have also been found vulnerable to more advanced attacks. Looking for a low-cost, algorithm-independent solution that can be implemented in many processors and makes all cryptographic algorithms secure against power analysis attacks, we start with the register file, where the operands and results of most instructions are stored. In this paper, we propose RFRF, a register file that stores data with a redundant flipped copy. With the redundant copy and a new precharge phase in write operations, RFRF provides data-independent power consumption on read and write for cryptographic algorithms. Although RFRF has a large energy overhead, it is only enabled in the security mode. We validate our method with simulations. The results show that the power consumption of RFRF is independent of the values read out from or written to registers. Thus RFRF can help mitigate power analysis attacks.
Citations: 3
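Why a flipped copy helps can be checked with a small bit-level model. The sketch below assumes an 8-bit register and, purely for illustration, models the precharge phase as driving both copies to all ones before a write; the abstract does not give the circuit details, so `WIDTH`, the precharge value, and the helper names are hypothetical.

```python
WIDTH = 8                     # assumed register width
MASK = (1 << WIDTH) - 1


def encode(value):
    """Return the stored pair: the value and its bitwise complement."""
    value &= MASK
    return value, (~value) & MASK


def hamming_weight(word):
    return bin(word).count("1")


def stored_weight(value):
    """Total number of 1 bits held across both copies."""
    true_copy, flipped = encode(value)
    return hamming_weight(true_copy) + hamming_weight(flipped)


def write_toggles(value):
    """Bit flips when both copies go from the assumed all-ones precharge state to the new data."""
    true_copy, flipped = encode(value)
    return hamming_weight(MASK ^ true_copy) + hamming_weight(MASK ^ flipped)


weights = {stored_weight(v) for v in range(1 << WIDTH)}
toggles = {write_toggles(v) for v in range(1 << WIDTH)}
```

In this model both sets contain the single value `WIDTH`, so neither the stored Hamming weight nor the number of bit transitions on a write depends on the data, which is the property a power analysis attack would otherwise exploit.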
Digital filter synthesis considering multiple adder graphs for a coefficient
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751879
Jeong Han, I. Park
In this paper, a new FIR digital filter synthesis algorithm is proposed to consider multiple adder graphs for a coefficient. The proposed algorithm selects an adder graph that can be maximally sharable with the remaining coefficients, while previous dependence-graph algorithms consider only one adder graph when implementing a coefficient. In addition, we propose an addition reordering technique to reduce the computational overhead of finding multiple adder graphs. By using the proposed technique, multiple adder graphs are efficiently generated from a seed adder graph obtained by using previous dependence-graph algorithms. Experimental results show that the proposed algorithm reduces the hardware cost of FIR filters by 23% and 3.4% on average compared to the Hartley and RAGn-hybrid algorithms.
Citations: 2
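To make "multiple adder graphs for a coefficient" concrete, the sketch below builds the constant multiplier 45x in two ways using only shifts and additions or subtractions, then picks the graph whose intermediate values overlap most with values needed by other coefficients. The step encoding, the `pick_graph` rule, and the example coefficient are hypothetical; the paper's actual selection and reordering procedures are not described in the abstract.

```python
def evaluate(steps, x=1):
    """Evaluate an adder graph. Node 0 is the input x.

    Each step is (a, shift_a, sign, b, shift_b) and creates a new node
    (nodes[a] << shift_a) + sign * (nodes[b] << shift_b).
    """
    nodes = [x]
    for a, shift_a, sign, b, shift_b in steps:
        nodes.append((nodes[a] << shift_a) + sign * (nodes[b] << shift_b))
    return nodes


def fundamentals(steps):
    """Constant multiples produced by the graph (one adder each)."""
    return set(evaluate(steps)[1:])


# Two adder graphs for the same coefficient 45, two adders each.
GRAPHS_45 = {
    "via_5": [(0, 2, +1, 0, 0), (1, 3, +1, 1, 0)],  # 5x = 4x + x,  45x = 40x + 5x
    "via_3": [(0, 1, +1, 0, 0), (1, 4, -1, 1, 0)],  # 3x = 2x + x,  45x = 48x - 3x
}


def pick_graph(candidates, shared):
    """Choose the graph that reuses the most values other coefficients already need."""
    return max(candidates, key=lambda name: len(fundamentals(candidates[name]) & shared))


choice = pick_graph(GRAPHS_45, shared={3})  # another coefficient already needs 3x
```

If a remaining coefficient already requires 3x, the `via_3` graph costs only one additional adder for 45x, whereas a method that considers only `via_5` would pay for two.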
Contention-aware application mapping for Network-on-Chip communication architectures
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751856
Chen-Ling Chou, R. Marculescu
In this paper, we analyze the impact of network contention on the application mapping for tile-based network-on-chip (NoC) architectures. Our main theoretical contribution consists of an integer linear programming (ILP) formulation of the contention-aware application mapping problem which aims at minimizing the inter-tile network contention. To solve the scalability problem caused by the ILP formulation, we propose a linear programming (LP) approach followed by a mapping heuristic. Taken together, they provide near-optimal solutions while reducing the runtime significantly. Experimental results show that, compared to other existing mapping approaches based on communication energy minimization, our contention-aware mapping technique achieves a significant decrease in packet latency (and implicitly, a throughput increase) with a negligible communication energy overhead.
Citations: 155
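One simple way to see how a mapping affects contention is to count pairs of flows whose routes share a link. The sketch below assumes a 2D mesh with deterministic XY routing and treats any shared directed link as contention; the routing policy, the `contention` metric, and the task names are illustrative assumptions, not the paper's ILP formulation.

```python
from itertools import combinations


def xy_route(src, dst):
    """Directed links used by XY routing: travel along x first, then along y."""
    links = []
    x, y = src
    while x != dst[0]:
        next_x = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (next_x, y)))
        x = next_x
    while y != dst[1]:
        next_y = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, next_y)))
        y = next_y
    return links


def contention(mapping, flows):
    """Number of flow pairs that share at least one directed link."""
    routes = [set(xy_route(mapping[s], mapping[d])) for s, d in flows]
    return sum(1 for a, b in combinations(routes, 2) if a & b)


FLOWS = [("A", "B"), ("C", "B")]                        # hypothetical task graph edges
MAPPING_1 = {"A": (0, 0), "C": (1, 0), "B": (1, 1)}     # both flows end on the same link
MAPPING_2 = {"A": (0, 1), "C": (1, 0), "B": (1, 1)}     # flows arrive over different links

score_1 = contention(MAPPING_1, FLOWS)
score_2 = contention(MAPPING_2, FLOWS)
```

Both mappings place the same tasks on the same 2x2 mesh, yet only the first makes the two flows compete for a link, which is the kind of difference a contention-aware objective is meant to capture.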
Combined interpolation architecture for soft-decision decoding of Reed-Solomon codes
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751911
Jiangli Zhu, Xinmiao Zhang, Zhongfeng Wang
Reed-Solomon (RS) codes are one of the most extensively used error control codes in digital communication and storage systems. Recently, significant advancements have been made in algebraic soft-decision decoding (ASD) of RS codes. These algorithms can achieve substantial coding gain with polynomial complexity. One major step of ASD is interpolation. Various techniques have been proposed to reduce the complexity of this step. Further speedup of this step is limited by the inherent serial nature of the interpolation algorithm. In this paper, taking the bit-level generalized minimum distance (BGMD) ASD as an example, we propose a novel technique to combine the computations from multiple interpolation iterations. Compared to the single interpolation iteration architecture for a (255, 239) RS code, the combined architecture can achieve 2.7 times the throughput with only 2% area overhead in high signal-to-noise ratio scenarios.
Citations: 9
Issue system protection mechanisms
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751922
P. Chaparro, J. Abella, J. Carretero, X. Vera
Multi-core microprocessors require reducing the FIT (failures-in-time) rate per core drastically to enable a larger number of cores within a FIT budget. Since large arrays like caches and register files are typically protected with either ECC or parity, the issue system becomes one of the largest contributors to the core's FIT rate. Soft errors are an important concern in contemporary microprocessors. Particle hits on the components of a processor are expected to create an increasing number of transient errors in each new microprocessor generation. In addition, the number of hard errors in the field is expected to grow as burn-in becomes less effective. Moreover, the continuous device shrinking increases the likelihood of in-the-field failures due to rather small defects exacerbated by degradation. This paper proposes on-line mechanisms to detect errors and recover to a consistent state, and to classify and confine in-the-field errors in the issue system of both in-order and out-of-order cores. Such mechanisms provide high coverage at a small cost.
Citations: 4
Chip level thermal profile estimation using on-chip temperature sensors
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751897
Yufu Zhang, Ankur Srivastava, M. Zahran
This paper addresses the problem of chip level thermal profile estimation using runtime temperature sensor readings. We address the challenges of a) the availability of only a few thermal sensors with constrained locations (sensors cannot be placed just anywhere) and b) random on-chip power density characteristics due to unpredictable workloads and fabrication variability. First, we model the random power density as a probability density function. Given this random characteristic and runtime thermal sensor readings, we exploit the correlation between power dissipation of different chip modules to estimate the expected value of temperature at each chip location. Our methods are optimal if the underlying power density is Gaussian. We also present a heuristic to generate the chip level thermal profile estimates when the underlying randomness is non-Gaussian. Experimental results indicate that our method generates highly accurate thermal profile estimates of the entire chip at runtime using only a few thermal sensors.
Citations: 25
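Estimating the expected temperature at an unsensed location from a sensor reading can be illustrated with the standard conditional mean of jointly Gaussian variables, E[T_u | T_s = t] = mu_u + (cov_us / var_s)(t - mu_s). The sketch below applies it with a single sensor. The paper works from a power density model, whereas here the temperature means and covariances are simply given as hypothetical numbers, and the function names are made up for the example.

```python
def conditional_mean(mu_u, mu_s, cov_us, var_s, reading):
    """E[T_u | T_s = reading] for jointly Gaussian temperatures T_u and T_s."""
    return mu_u + (cov_us / var_s) * (reading - mu_s)


def estimate_profile(prior_means, covariances, sensor_mean, sensor_var, reading):
    """Estimate every unsensed location from one sensor reading."""
    return {
        loc: conditional_mean(prior_means[loc], sensor_mean, covariances[loc], sensor_var, reading)
        for loc in prior_means
    }


# Hypothetical statistics (degrees C), for illustration only.
PRIOR_MEANS = {"alu": 60.0, "cache": 50.0}   # prior mean temperature per location
COVARIANCES = {"alu": 8.0, "cache": 2.0}     # covariance with the sensor location

profile = estimate_profile(
    PRIOR_MEANS, COVARIANCES, sensor_mean=70.0, sensor_var=16.0, reading=74.0
)
```

A location that is strongly correlated with the sensor moves further from its prior mean when the sensor reads hotter than expected, while a weakly correlated one barely changes. With several sensors the same formula uses a covariance matrix in place of the scalar variance.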
Characterization of granularity and redundancy for SRAMs for optimal yield-per-area
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751865
J. Cha, S. Gupta
Memories make up a significant proportion of most digital systems, and memory-intensive chips continue to lead the migration to new nano-fabrication processes. As these processes have increasingly higher defect rates, especially when they are first adopted, such early migration necessitates the use of increasing levels of redundancy to obtain high yield (per area). We show that as we move into nanometer processes with high defect rates, the level of redundancy needed to optimize yield-per-area is sufficiently high so as to significantly influence design tradeoffs. We then report a first step towards considering the overheads of redundancy during design optimization by characterizing the tradeoffs between the granularity of a design and the level of redundancy that optimizes the yield-per-area of static RAMs (SRAMs). Starting with physical layouts of cells and the desired memory size, we derive probabilities of failure at a range of abstractions: transistor level, cell level, and system level. We then estimate optimal memory granularity, i.e., the size of memory blocks, as well as the optimal number of spare rows and columns that maximize yield-per-area. In particular, we demonstrate the non-monotonic nature of these tradeoffs and present efficient designs for large SRAMs. Our ongoing research is characterizing several other specific tradeoffs, for SRAMs as well as logic blocks.
Citations: 10
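The non-monotonic tradeoff between spare rows and yield-per-area can be reproduced with a deliberately simple model. The sketch below assumes rows fail independently with a fixed probability, that a block works when the number of faulty rows does not exceed the number of spares, and that area is proportional to the total row count. These assumptions, the function names, and the numbers are hypothetical; the paper derives its failure probabilities from physical layouts and also considers spare columns.

```python
from math import comb


def block_yield(rows, spares, row_fail):
    """Probability that at most `spares` of the rows + spares physical rows are faulty."""
    total = rows + spares
    return sum(
        comb(total, k) * row_fail**k * (1 - row_fail) ** (total - k)
        for k in range(spares + 1)
    )


def yield_per_area(rows, spares, row_fail):
    """Yield divided by area, with area taken as the total number of rows (assumption)."""
    return block_yield(rows, spares, row_fail) / (rows + spares)


def best_spares(rows, row_fail, max_spares):
    """Spare-row count in 0..max_spares that maximizes yield-per-area."""
    return max(range(max_spares + 1), key=lambda s: yield_per_area(rows, s, row_fail))


# Hypothetical block: 64 rows, 2% chance that any given row is faulty.
optimum = best_spares(rows=64, row_fail=0.02, max_spares=16)
```

Adding the first few spares raises yield much faster than it adds area, but once yield is close to 1 every further spare only costs area, so yield-per-area peaks at an intermediate spare count.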
Architecture implementation of an improved decimal CORDIC method
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751846
J. L. Sánchez, H. Mora, J. M. Pascual, A. Jimeno-Morenilla
Since radix-10 arithmetic has been gaining renewed importance over the last few years, high-performance decimal systems and techniques are in high demand. In this paper, a modification of the CORDIC method for decimal arithmetic is proposed so as to improve calculations. The algorithm works with BCD operands and no conversion to binary is needed. A significant reduction in the number of iterations in comparison to the original decimal CORDIC method is achieved. The experiments showing the advantages of the new method are described. Also, the delay results obtained from an FPGA implementation of the method are shown.
Citations: 4