
Latest publications from the 2008 IEEE International Conference on Computer Design

Power-aware soft error hardening via selective voltage scaling
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751877
Kai-Chiang Wu, Diana Marculescu
Nanoscale integrated circuits are becoming increasingly sensitive to radiation-induced transient faults (soft errors) due to current technology scaling trends, such as shrinking feature sizes and reducing supply voltages. Soft errors, which have been a significant concern in memories, are now a main factor in reliability degradation of logic circuits. This paper presents a power-aware methodology using dual supply voltages for soft error hardening. Given a constraint on power overhead, our proposed framework can minimize the soft error rate (SER) of a circuit via selective voltage scaling. On average, circuit SER can be reduced by 33.45% for various sizes of transient glitches with only 11.74% energy increase. The overhead in normalized power-delay-area product per 1% SER reduction is 0.64%, 1.33X less than that of existing state-of-the-art approaches.
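The abstract only outlines the framework; as a rough illustration of selective voltage scaling under a power constraint, the sketch below greedily assigns the higher supply voltage to the gates with the best SER-reduction-per-power ratio until the budget is spent. The gate list, SER gains, and power deltas are hypothetical, and this is not the authors' actual optimization.

```python
# Minimal greedy sketch of selective voltage scaling for soft-error hardening.
# Not the paper's framework: gate names, SER gains, and power deltas are
# hypothetical illustration values.

def harden_greedily(gates, power_budget):
    """Pick gates to move to the higher supply voltage, maximizing the total
    SER reduction without exceeding the allowed power overhead."""
    # Sort by SER reduction gained per unit of extra power (best ratio first).
    ranked = sorted(gates, key=lambda g: g["ser_gain"] / g["power_delta"], reverse=True)
    chosen, spent = [], 0.0
    for g in ranked:
        if spent + g["power_delta"] <= power_budget:
            chosen.append(g["name"])
            spent += g["power_delta"]
    return chosen, spent

if __name__ == "__main__":
    gates = [
        {"name": "g1", "ser_gain": 0.12, "power_delta": 0.8},
        {"name": "g2", "ser_gain": 0.30, "power_delta": 2.5},
        {"name": "g3", "ser_gain": 0.05, "power_delta": 0.2},
    ]
    print(harden_greedily(gates, power_budget=1.5))
```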
Citations: 29
Ring data location prediction scheme for Non-Uniform Cache Architectures
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751936
Sayaka Akioka, Feihui Li, K. Malkowski, P. Raghavan, M. Kandemir, M. J. Irwin
Increases in cache capacity are accompanied by growing wire delays due to technology scaling. Non-uniform cache architecture (NUCA) is one of the proposed solutions for reducing the average access latency in such cache designs. While most prior NUCA work focuses on data placement, data replacement, and migration-related issues, this paper studies the problem of data search (access) in NUCA. In our architecture, we arrange sets of banks with equal access latency into rings. Our last-access-based (LAB) prediction scheme predicts the ring that is expected to contain the required data and checks the banks in that ring first for the data block sought. We compare our scheme to two alternative approaches: searching all rings in parallel, and searching rings sequentially. We show that our LAB ring prediction scheme reduces L2 energy significantly over the sequential and parallel schemes, while maintaining similar performance. Our LAB scheme reduces energy consumption by 15.9% relative to the sequential lookup scheme, and 53.8% relative to the parallel lookup scheme.
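As a toy illustration of the last-access-based lookup described above, the sketch below keeps a per-block record of the ring that served the previous access and probes that ring first, falling back to a sequential search over the other rings. The ring layout, probe accounting, and addresses are assumptions for illustration, not the paper's simulated configuration.

```python
# Rough sketch of a last-access-based (LAB) ring predictor for a NUCA cache.
# Ring layout, lookup costs, and addresses are illustrative assumptions.

class LABPredictor:
    def __init__(self, num_rings):
        self.num_rings = num_rings
        self.last_ring = {}          # block address -> ring hit last time

    def lookup(self, addr, home_ring):
        """Return (ring_found, rings_probed). Probe the predicted ring first,
        then fall back to a sequential search over the remaining rings."""
        order = [self.last_ring.get(addr, 0)]
        order += [r for r in range(self.num_rings) if r != order[0]]
        probed = 0
        for ring in order:
            probed += 1                      # one ring's banks probed
            if ring == home_ring:            # data found in this ring
                self.last_ring[addr] = ring  # update the predictor entry
                return ring, probed
        return None, probed

if __name__ == "__main__":
    pred = LABPredictor(num_rings=4)
    print(pred.lookup(0x40, home_ring=2))    # mispredicted: several probes
    print(pred.lookup(0x40, home_ring=2))    # predicted correctly: one probe
```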
Citations: 7
Making register file resistant to power analysis attacks
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751919
Shuo Wang, Fan Zhang, Jianwei Dai, Lei Wang, Z. Shi
Power analysis attacks are a type of side-channel attack that exploits the power consumption of computing devices to retrieve secret information. They are very effective in breaking many cryptographic algorithms, especially those running on low-end processors in embedded systems, sensor nodes, and smart cards. Although many countermeasures to power analysis attacks have been proposed, most of them are software-based and designed for a specific algorithm. Many of them have also been found vulnerable to more advanced attacks. Looking for a low-cost, algorithm-independent solution that can be implemented in many processors and makes all cryptographic algorithms secure against power analysis attacks, we start with the register file, where the operands and results of most instructions are stored. In this paper, we propose RFRF, a register file that stores data with a redundant flipped copy. With the redundant copy and a new precharge phase in write operations, RFRF provides data-independent power consumption on reads and writes for cryptographic algorithms. Although RFRF has a large energy overhead, it is only enabled in the security mode. We validate our method with simulations. The results show that the power consumption of RFRF is independent of the values read out from or written to registers. Thus RFRF can help mitigate power analysis attacks.
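A minimal way to see why a redundant flipped copy equalizes a Hamming-weight power model is sketched below: storing each word together with its bitwise complement keeps the combined number of 1 bits constant regardless of the data. This is a behavioural toy model of the idea only, not the circuit-level precharge scheme.

```python
# Toy model of the RFRF idea: store each word together with its bitwise
# complement so the combined Hamming weight is constant, making a simple
# weight-based power estimate independent of the data value.

WIDTH = 32
MASK = (1 << WIDTH) - 1

def rfrf_write(value):
    """Return the pair actually stored: the value and its flipped copy."""
    return value & MASK, (~value) & MASK

def hamming_weight_power(stored_pair):
    """Proxy power figure: number of 1 bits driven across both copies."""
    normal, flipped = stored_pair
    return bin(normal).count("1") + bin(flipped).count("1")

if __name__ == "__main__":
    for v in (0x00000000, 0xFFFFFFFF, 0xDEADBEEF):
        print(hex(v), hamming_weight_power(rfrf_write(v)))   # always 32
```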
Citations: 3
Low-cost open-page prefetch scheduling in chip multiprocessors
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751890
Marius Grannæs, Magnus Jahre, L. Natvig
The pressure on off-chip memory increases significantly as more cores compete for the same resources. A CMP deals with the memory wall by exploiting thread level parallelism (TLP), shifting the focus from reducing overall memory latency to memory throughput. This extends to the memory controller where the 3D structure of modern DRAM is exploited to increase throughput. Traditionally, prefetching reduces latency by fetching data before it is needed. In this paper we explore how prefetching can be used to increase memory throughput. We present our own low-cost open-page prefetch scheduler that exploits the 3D structure of DRAM when issuing prefetches. We show that because of the complex structure of modern DRAM, prefetches can be made cheaper than ordinary reads, thus making prefetching beneficial even when prefetcher accuracy is low. As a result, prefetching with good coverage is more important than high accuracy. By exploiting this observation our low-cost open page scheme increases performance and QoS. Furthermore, we explore how prefetches should be scheduled in a state of the art memory controller by examining sequential, scheduled region, CZone/delta correlation and reference prediction table prefetchers.
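The sketch below illustrates one plausible open-page-aware scheduling rule in the spirit of the paper: demand reads are served first, and among prefetches those that hit an already-open DRAM row are preferred because they avoid a precharge/activate. The request format and bank/row bookkeeping are assumptions, not the authors' controller design.

```python
# Simplified sketch of an open-page-aware prefetch scheduler.
# Bank/row mapping and request format are illustrative assumptions.

def pick_next(requests, open_rows):
    """requests: list of dicts with 'type' ('demand'/'prefetch'), 'bank', 'row'.
    open_rows: dict mapping bank -> currently open row."""
    def priority(req):
        row_hit = open_rows.get(req["bank"]) == req["row"]
        # Lower tuple sorts first: demands before prefetches,
        # then row hits before row misses.
        return (req["type"] != "demand", not row_hit)
    return min(requests, key=priority) if requests else None

if __name__ == "__main__":
    open_rows = {0: 7, 1: 3}
    queue = [
        {"type": "prefetch", "bank": 0, "row": 9},   # row miss -> needs activate
        {"type": "prefetch", "bank": 1, "row": 3},   # row hit  -> cheap prefetch
    ]
    print(pick_next(queue, open_rows))
```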
Citations: 6
Digital filter synthesis considering multiple adder graphs for a coefficient
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751879
Jeong Han, I. Park
In this paper, a new FIR digital filter synthesis algorithm is proposed that considers multiple adder graphs for a coefficient. The proposed algorithm selects an adder graph that can be maximally shared with the remaining coefficients, whereas previous dependence-graph algorithms consider only one adder graph when implementing a coefficient. In addition, we propose an addition reordering technique to reduce the computational overhead of finding multiple adder graphs. By using the proposed technique, multiple adder graphs are efficiently generated from a seed adder graph obtained with previous dependence-graph algorithms. Experimental results show that the proposed algorithm reduces the hardware cost of FIR filters by 23% and 3.4% on average compared to the Hartley and RAGn-hybrid algorithms.
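To make the adder-graph notion concrete, the sketch below shows two different shift-and-add decompositions of the same coefficient (45); they produce identical results but use different intermediate terms, which is exactly what determines how much hardware can be shared across coefficients. The decompositions are hand-picked examples, not output of the proposed algorithm.

```python
# Two adder graphs for the same constant multiplication y = 45*x, built only
# from shifts and adds as in multiplierless FIR filter synthesis.

def mul_by_45_v1(x):
    # 45 = 32 + 8 + 4 + 1 -> three adders
    return (x << 5) + (x << 3) + (x << 2) + x

def mul_by_45_v2(x):
    # 45 = 5 * 9 with 5 = 4 + 1 and 9 = 8 + 1 -> two adders, and the
    # intermediate term 5*x could be shared with another coefficient.
    x5 = (x << 2) + x
    return (x5 << 3) + x5

if __name__ == "__main__":
    for x in (1, 7, 123):
        assert mul_by_45_v1(x) == mul_by_45_v2(x) == 45 * x
    print("both adder graphs implement y = 45*x")
```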
Citations: 2
Contention-aware application mapping for Network-on-Chip communication architectures
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751856
Chen-Ling Chou, R. Marculescu
In this paper, we analyze the impact of network contention on application mapping for tile-based network-on-chip (NoC) architectures. Our main theoretical contribution consists of an integer linear programming (ILP) formulation of the contention-aware application mapping problem, which aims at minimizing the inter-tile network contention. To solve the scalability problem caused by the ILP formulation, we propose a linear programming (LP) approach followed by a mapping heuristic. Taken together, they provide near-optimal solutions while reducing the runtime significantly. Experimental results show that, compared to other existing mapping approaches based on communication energy minimization, our contention-aware mapping technique achieves a significant decrease in packet latency (and, implicitly, a throughput increase) with a negligible communication energy overhead.
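The ILP/LP formulation is not reproduced in the abstract; as a stand-in, the sketch below shows a simple greedy mapping heuristic in the same spirit: tasks are placed in decreasing-traffic order onto the free mesh tile that minimizes traffic-weighted hop distance to already-placed communication partners. The traffic table and mesh size are hypothetical.

```python
# Greedy sketch of contention/communication-aware mapping onto a 2-D mesh NoC.
# This is a simple heuristic for illustration, not the paper's ILP/LP method.

from itertools import product

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def greedy_map(traffic, mesh_w, mesh_h):
    """traffic: dict {(src_task, dst_task): volume}. Returns {task: tile}."""
    tasks = {t for edge in traffic for t in edge}
    load = {t: sum(v for e, v in traffic.items() if t in e) for t in tasks}
    free = set(product(range(mesh_w), range(mesh_h)))
    placement = {}
    for t in sorted(tasks, key=lambda x: -load[x]):       # heaviest tasks first
        def cost(tile):
            total = 0
            for (a, b), v in traffic.items():
                if t not in (a, b):
                    continue
                other = b if a == t else a
                if other in placement:                     # already-placed partner
                    total += v * manhattan(tile, placement[other])
            return total
        placement[t] = min(free, key=cost)
        free.remove(placement[t])
    return placement

if __name__ == "__main__":
    traffic = {("A", "B"): 10, ("B", "C"): 4, ("A", "C"): 1}
    print(greedy_map(traffic, mesh_w=2, mesh_h=2))
```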
Citations: 155
Characterization of granularity and redundancy for SRAMs for optimal yield-per-area
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751865
J. Cha, S. Gupta
Memories constitute a significant proportion of most digital systems, and memory-intensive chips continue to lead the migration to new nano-fabrication processes. As these processes have increasingly higher defect rates, especially when they are first adopted, such early migration necessitates the use of increasing levels of redundancy to obtain high yield (per area). We show that as we move into nanometer processes with high defect rates, the level of redundancy needed to optimize yield-per-area is sufficiently high to significantly influence design tradeoffs. We then report a first step towards considering the overheads of redundancy during design optimization by characterizing the tradeoffs between the granularity of a design and the level of redundancy that optimizes the yield-per-area of static RAMs (SRAMs). Starting with physical layouts of cells and the desired memory size, we derive probabilities of failure at a range of abstraction levels: transistor, cell, and system. We then estimate the optimal memory granularity, i.e., the size of memory blocks, as well as the optimal number of spare rows and columns that maximize yield-per-area. In particular, we demonstrate the non-monotonic nature of these tradeoffs and present efficient designs for large SRAMs. Our ongoing research is characterizing several other specific tradeoffs, for SRAMs as well as logic blocks.
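A back-of-the-envelope version of the yield-per-area trade-off can be computed with a simple binomial model, as sketched below: a block yields if no more than its number of spare rows fail, and dividing yield by total row count approximates yield per area. The row counts and defect probabilities are illustrative, not derived from the paper's layout-level analysis.

```python
# Binomial toy model of yield-per-area versus spare-row count.
# Numbers are illustrative placeholders.

from math import comb

def block_yield(n, r, p):
    """Probability that at most r of the n+r rows are defective."""
    total = n + r
    return sum(comb(total, k) * p**k * (1 - p)**(total - k) for k in range(r + 1))

def best_redundancy(n, p, max_spares=16):
    """Return the spare-row count that maximizes yield divided by row count."""
    scored = [(block_yield(n, r, p) / (n + r), r) for r in range(max_spares + 1)]
    ypa, r = max(scored)
    return r, ypa

if __name__ == "__main__":
    # Higher defect rates push the optimum towards more spare rows.
    for p in (1e-4, 1e-3, 5e-3):
        print(p, best_redundancy(n=512, p=p))
```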
Citations: 10
Chip level thermal profile estimation using on-chip temperature sensors
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751897
Yufu Zhang, Ankur Srivastava, M. Zahran
This paper addresses the problem of chip-level thermal profile estimation using runtime temperature sensor readings. We address the challenges of a) the availability of only a few thermal sensors with constrained locations (sensors cannot be placed just anywhere), and b) random on-chip power density characteristics due to unpredictable workloads and fabrication variability. First, we model the random power density as a probability density function. Given this random characteristic and runtime thermal sensor readings, we exploit the correlation between the power dissipation of different chip modules to estimate the expected value of the temperature at each chip location. Our methods are optimal if the underlying power density is Gaussian. We also present a heuristic to generate chip-level thermal profile estimates when the underlying randomness is non-Gaussian. Experimental results indicate that our method generates highly accurate thermal profile estimates of the entire chip at runtime using only a few thermal sensors.
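When the temperatures are modeled as jointly Gaussian, the expected temperature at unmonitored locations given the sensor readings follows from standard Gaussian conditioning, which is presumably the core of the Gaussian-optimal estimator; the sketch below shows that conditioning step with made-up mean and covariance values rather than a real thermal model.

```python
# Gaussian conditioning step for estimating unmonitored temperatures from
# sensor readings. Means and covariances below are made-up placeholders.

import numpy as np

def estimate_unobserved(mu_u, mu_s, cov_us, cov_ss, sensor_readings):
    """E[T_u | T_s = t_s] = mu_u + cov_us @ inv(cov_ss) @ (t_s - mu_s)."""
    return mu_u + cov_us @ np.linalg.solve(cov_ss, sensor_readings - mu_s)

if __name__ == "__main__":
    mu_s = np.array([60.0, 65.0])            # mean temps at sensor sites (C)
    mu_u = np.array([70.0])                  # mean temp at an unmonitored tile
    cov_ss = np.array([[4.0, 1.0],
                       [1.0, 4.0]])          # sensor-sensor covariance
    cov_us = np.array([[2.0, 3.0]])          # unmonitored-sensor covariance
    readings = np.array([63.0, 64.0])        # runtime sensor samples
    print(estimate_unobserved(mu_u, mu_s, cov_us, cov_ss, readings))
```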
Citations: 25
Test cost minimization through adaptive test development
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751867
Mingjing Chen, A. Orailoglu
The ever-increasing complexity of mixed-signal circuits imposes increasingly complicated and comprehensive parametric test requirements, resulting in a greatly lengthened manufacturing test phase. Attaining parametric test cost reduction with no test quality degradation constitutes a critical challenge during test development. The capability of parametric test data to capture systematic process variations enables a highly accurate prediction of the efficiency of each test for a particular lot of chips, even on the basis of a small quantity of characterized data. The predicted test efficiency further enables the adjustment of the test set and test order, leading to early detection of faults. We explore such an adaptive strategy by introducing a technique that prunes the test set based on a test correlation analysis. A test selection algorithm is proposed to identify the minimum set of tests that delivers a satisfactory defect coverage. A probabilistic measure that reflects defect detection efficiency is used to order the test set so as to enhance the probability of early detection of faulty chips. The test sequence is further optimized during the testing process by dynamically adjusting the initial test order to adapt to local defect pattern fluctuations in the lot of chips under test. Experimental results show that the proposed technique delivers significant test time reductions while attaining higher test quality compared to previous adaptive test methodologies.
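The sketch below illustrates the two ingredients described above with standard greedy stand-ins: a set-cover-style selection of a minimal test set that still covers all modeled defects, followed by ordering the chosen tests by estimated detection probability so faulty chips are rejected early. Test names, coverage sets, and probabilities are hypothetical, and this is not the paper's algorithm.

```python
# Greedy stand-ins for adaptive test selection and ordering.
# Test names, coverage sets, and detection probabilities are hypothetical.

def select_tests(coverage, all_defects):
    """coverage: dict test -> set of defects it detects.
    Greedily pick tests until every defect is covered (or uncoverable)."""
    uncovered, chosen = set(all_defects), []
    while uncovered:
        best = max(coverage, key=lambda t: len(coverage[t] & uncovered))
        if not coverage[best] & uncovered:
            break                      # remaining defects cannot be covered
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen

def order_tests(chosen, detect_prob):
    """Apply tests with the highest expected detection efficiency first."""
    return sorted(chosen, key=lambda t: detect_prob.get(t, 0.0), reverse=True)

if __name__ == "__main__":
    coverage = {"t1": {"d1", "d2"}, "t2": {"d2", "d3"}, "t3": {"d3"}}
    chosen = select_tests(coverage, all_defects={"d1", "d2", "d3"})
    print(order_tests(chosen, detect_prob={"t1": 0.4, "t2": 0.7}))
```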
Citations: 45
Issue system protection mechanisms
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751922
P. Chaparro, J. Abella, J. Carretero, X. Vera
Multi-core microprocessors require drastically reducing the FIT (failures-in-time) rate per core to enable a larger number of cores within a FIT budget. Since large arrays like caches and register files are typically protected with either ECC or parity, the issue system becomes one of the largest contributors to the core's FIT rate. Soft errors are an important concern in contemporary microprocessors. Particle hits on the components of a processor are expected to create an increasing number of transient errors in each new microprocessor generation. In addition, the number of hard errors in the field is expected to grow as burn-in becomes less effective. Moreover, continuous device shrinking increases the likelihood of in-the-field failures due to rather small defects exacerbated by degradation. This paper proposes on-line mechanisms to detect errors and recover to a consistent state, and to classify and confine in-the-field errors in the issue system of both in-order and out-of-order cores. Such mechanisms provide high coverage at a small cost.
Citations: 4