2011 IEEE 29th International Conference on Computer Design (ICCD): Latest Publications

Memristor-based IMPLY logic design procedure

2011 IEEE 29th International Conference on Computer Design (ICCD)

Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081389

Shahar Kvatinsky, A. Kolodny, U. Weiser, E. Friedman

Memristors can be used as logic gates. No design methodology exists, however, for memristor-based combinatorial logic. In this paper, the design and behavior of a memristive-based logic gate - an IMPLY gate - are presented and design issues such as the tradeoff between speed (fast write times) and correct logic behavior are described, as part of an overall design methodology. A memristor model is described for determining the write time and state drift. It is shown that the widely used memristor model - a linear ion drift memristor - is impractical for characterizing an IMPLY logic gate, and a different memristor model is necessary such as a memristor with a current threshold.

Citations: 147
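
To make the IMPLY gate in the abstract above concrete, the following is a minimal sketch of its Boolean behavior only. It does not model memristor write time, state drift, or current thresholds. The NAND construction is standard Boolean algebra rather than something stated in the abstract, and the function names are hypothetical.

```python
from itertools import product


def imply(p: int, q: int) -> int:
    """Material implication on bits: p IMPLY q == (NOT p) OR q."""
    return (1 - p) | q


def nand_from_imply(p: int, q: int) -> int:
    """NAND built from two IMPLY steps and one work bit reset to 0."""
    s = 0            # work bit, cleared first (a FALSE/reset operation)
    s = imply(q, s)  # s = NOT q
    s = imply(p, s)  # s = (NOT p) OR (NOT q) = NAND(p, q)
    return s


# Truth table as (p, q, p IMPLY q) rows.
truth_table = [(p, q, imply(p, q)) for p, q in product((0, 1), repeat=2)]
```

Because NAND is functionally complete, IMPLY together with a reset-to-0 operation is sufficient, in principle, to express any combinational function.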

RoShaQ: High-performance on-chip router with shared queues

2011 IEEE 29th International Conference on Computer Design (ICCD)

Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081402

A. Tran, B. Baas

An on-chip router typically has buffers dedicated to its input or output ports for temporarily storing packets when contention occurs on output physical channels. Buffers, unfortunately, consume significant portions of router area and power. While running a traffic trace, however, not all input ports of the routers have incoming packets that need to be transferred at the same time. As a result, a large number of buffer queues in the network are empty while other queues are mostly busy. This observation motivates us to design RoShaQ, a router architecture that maximizes buffer utilization by allowing multiple buffer queues to be shared among input ports. Sharing queues makes buffer use more efficient and hence achieves higher throughput when the network load becomes heavy. On the other hand, at light traffic load, our router achieves low latency by allowing packets to effectively bypass these shared queues. Experimental results show that RoShaQ has 21% lower latency and 14% higher saturation throughput than a typical virtual-channel (VC) router, at the cost of 4% higher power and 16% larger area. Due to its higher performance, RoShaQ consumes 7% less energy per transferred packet than a VC router given the same buffer space capacity.

Citations: 49

A simple pipelined squaring circuit for DSP

2011 IEEE 29th International Conference on Computer Design (ICCD)

Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081392

V. Risojevic, A. Avramović, Z. Babic, P. Bulić

There are many digital signal processing applications where a shorter algorithm delay and an efficient implementation are more important than accuracy. Since squaring is one of the fundamental operations widely used in digital signal processing algorithms, approximate squaring is proposed. We present a simple method of approximate squaring that allows a desired accuracy to be achieved. The proposed method uses the same simple combinational logic for the first approximation and for the correction terms. Analysis across operand bit lengths and levels of approximation shows that the maximum and average relative errors decrease significantly as more correction terms are added. The proposed squaring method can be implemented with a high degree of parallelism. A pipelined implementation is also proposed in this paper. The proposed squarer achieves significant savings in area and power compared to a multiplier-based squarer. As an example, the impact on image retrieval of computing Euclidean distance with approximate squaring is analyzed.

Citations: 9
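
The abstract above does not spell out the decomposition it uses, so the following is only an illustrative sketch of one well-known way to get a "first approximation plus correction terms" squarer from shifts and adds. It assumes a leading-one decomposition n = 2^k + r, which gives n^2 = 2^(2k) + 2^(k+1) r + r^2; the residual r^2 is either dropped or approximated again by the same step. The function name and the `corrections` parameter are hypothetical, and this is not claimed to be the authors' circuit.

```python
def approx_square(n: int, corrections: int = 0) -> int:
    """Approximate n*n with shifts and adds only.

    Each pass writes the current value as 2**k + r (k = index of the
    leading one bit) and accumulates 2**(2k) + 2**(k+1) * r. The dropped
    residual r*r becomes the input of the next correction pass.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    total = 0
    for _ in range(corrections + 1):
        if n == 0:
            break                      # nothing left to correct: result is exact
        k = n.bit_length() - 1         # position of the leading one bit
        r = n - (1 << k)               # remainder below the leading one
        total += (1 << (2 * k)) + (r << (k + 1))
        n = r                          # the next pass approximates r*r
    return total
```

Under this scheme every pass reuses the same logic (leading-one detection, a shift, and an add), and the result never exceeds the exact square, so each added correction term can only reduce the error.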

Hardware Trojans: The defense and attack of integrated circuits

2011 IEEE 29th International Conference on Computer Design (ICCD)

Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081412

Trey Reece, W. H. Robinson

Despite the increasing threat of hardware Trojans in modern circuits, there is currently a lack of Trojans available with which to test anti-Trojan techniques. This paper presents both a defensive technique designed to aid in the detection of Trojans in integrated circuits and a design ideology to confound similar defensive techniques. The defense structure was awarded 2nd place in the 2009 Embedded Systems Challenge (ESC) competition, and Trojans following the attack ideology were submitted to the same competition the following year.

Citations: 9

The DIMM tree architecture: A high bandwidth and scalable memory system

2011 IEEE 29th International Conference on Computer Design (ICCD)

Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081428

Kanit Therdsteerasukdi, Gyungsu Byun, J. Ir, Glenn D. Reinman, J. Cong, Mau-Chung Frank Chang

The demand for capacity and off-chip bandwidth to DRAM will continue to grow as we integrate more cores onto a die. However, as the data rate of DRAM has increased, the number of DIMMs supported on a multi-drop bus has decreased. Therefore, traditional memory systems are not sufficient to meet both these demands. We propose the DIMM tree architecture for better scalability by connecting the DIMMs as a tree. The DIMM tree architecture is able to grow the number of DIMMs exponentially with each level of latency in the tree. We also propose application of Multiband Radio Frequency Interconnect (MRF-I) to the DIMM tree architecture for even greater scalability and higher throughput. The DIMM tree architecture without MRF-I was able to scale up to 64 DIMMs with only an 8% degradation in throughput over an ideal system. The DIMM tree architecture with MRF-I was able to increase throughput by 68% (up to 200%) on a 64-DIMM system over a 4-DIMM system.

Citations: 22
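
The claim in the abstract above that the number of DIMMs grows exponentially with tree depth can be illustrated with simple arithmetic. This sketch assumes a uniform tree in which every DIMM forwards to the same number of child DIMMs; the fan-out values and the function name are hypothetical, and the abstract does not state the topology actually evaluated.

```python
def dimms_in_tree(fan_out: int, levels: int) -> int:
    """Count DIMMs in a uniform tree with one root DIMM at level 1.

    Level l (1-based) holds fan_out**(l - 1) DIMMs, so the total is the
    geometric sum 1 + fan_out + ... + fan_out**(levels - 1).
    """
    if fan_out < 1 or levels < 0:
        raise ValueError("fan_out must be >= 1 and levels >= 0")
    return sum(fan_out ** level for level in range(levels))
```

With a fan-out of 1 the structure degenerates into a chain, where each extra level of latency adds a single DIMM; with a fan-out of 2 or more, each extra level multiplies the newly reachable DIMMs by the fan-out.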

A reconfigurable fault-tolerant routing algorithm to optimize the network-on-chip performance and latency in presence of intermittent and permanent faults

2011 IEEE 29th International Conference on Computer Design (ICCD)

Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081436

Reyhaneh Jabbarvand Behrouz, M. Modarressi, H. Sarbazi-Azad

As the semiconductor industry advances to deep sub-micron and nanometer technology nodes, on-chip components become more prone to defects during manufacturing and faults during system operation. Consequently, fault-tolerant techniques are essential to improve the yield of modern complex chips. We propose a fault-tolerant routing algorithm that keeps the negative effect of faulty components on NoC power and performance as low as possible. Targeting intermittent faults, we achieve fault tolerance by employing a simple and fast mechanism composed of two processes: NoC monitoring and route adaptation. Experimental results show the effectiveness of the proposed technique: it offers lower average message latency and power consumption, and higher reliability, than some related work.

Citations: 3

Red team: Design of intelligent hardware trojans with known defense schemes

2011 IEEE 29th International Conference on Computer Design (ICCD)

Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081416

Xuehui Zhang, Nicholas Tuzzio, M. Tehranipoor

In the past few years, several Trojan detection approaches have been developed to prevent the damage caused by Trojans, making Trojan insertion more and more difficult. As part of the Embedded Systems Challenge (ESC), we were given two different designs with two different Trojan detection methods, and we tried to design Trojans that could avoid detection. We developed Trojans that remain undetectable by the delay fingerprinting and ring-oscillator monitoring Trojan detection methods embedded into these benchmarks. Experimental results on a Xilinx FPGA demonstrate that most of our hardware Trojans were undetected by the inserted detection mechanisms.

Citations: 12

Exploring the vulnerability of CMPs to soft errors with 3D stacked non-volatile memory

2011 IEEE 29th International Conference on Computer Design (ICCD)

Pub Date : 2011-10-09 DOI: 10.1145/2491679

Guangyu Sun, E. Kursun, J. Rivers, Yuan Xie

Spin-transfer Torque Random Access Memory (STT-RAM) is emerging as an on-chip memory option in microprocessor architectures. Because they store data magnetically, STT-RAM cells are immune to the radiation-induced soft errors that affect charge-based data storage, which are a major challenge for SRAM-based caches in current microprocessors. In this study we explore the soft error resilience benefits and design trade-offs of 3D-stacked STT-RAM for multi-core architectures. We use 3D stacking as an enabler for modular integration of STT-RAM caches with minimum disruption to the baseline processor design flow, while providing further interconnectivity and capacity advantages. We take an in-depth look at alternative replacement schemes in terms of performance, power, temperature, and reliability trade-offs to capture the multi-variable optimization challenges that microprocessor architectures face. We analyze and compare the reliability characteristics of STT-RAM, SRAM, and DRAM alternatives for various levels of the cache hierarchy.

Citations: 24

Techniques for LI-BDN synthesis for hybrid microarchitectural simulation

2011 IEEE 29th International Conference on Computer Design (ICCD)

Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081405

Tyler S. Harris, Zhuo Ruan, D. Penry

Computer designers rely upon near-cycle-accurate microarchitectural simulation to explore the design space of new systems. Unfortunately, such simulators are becoming increasingly slow as systems become more complex. Hybrid simulators which offload some of the simulation work onto FPGAs can increase the speed; however, such simulators must be automatically synthesized or the time to design them becomes prohibitive. Furthermore, FPGA implementations of simulators may require multiple FPGA clock cycles to implement behavior that takes place within one simulated clock cycle, making correct arbitrary composition of simulator components impossible and limiting the amount of hardware concurrency which can be achieved. Latency-Insensitive Bounded Dataflow Networks (LI-BDNs) have been suggested as a means to permit composition of simulator components in FPGAs. However, previous work has required that LI-BDNs be created manually. This paper introduces techniques for automated synthesis of LI-BDNs from the processes of a System-C microarchitectural model. We demonstrate that LI-BDNs can be successfully synthesized. We also introduce a technique for reducing the overhead of LI-BDNs when the latency-insensitive property is unnecessary, resulting in up to a 60% reduction in FPGA resource requirements.

Citations: 7

Positive Davio-based synthesis algorithm for reversible logic

2011 IEEE 29th International Conference on Computer Design (ICCD)

Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081399

Yu Pang, Shaoquan Wang, Zhilong He, Jinzhao Lin, S. Sultana, K. Radecka

Reversible logic is a key technique for quantum computing and can lead to low-power designs. However, current synthesis algorithms for reversible circuits are inefficient and do not produce optimized circuits, so they are applied only to small logic functions. In this paper, we propose a new method based on positive Davio expansion to synthesize reversible circuits; it generates a positive Davio decision diagram for a logic function and maps the diagram nodes to reversible circuits. Compared with BDD (binary decision diagram) based and RM (Reed-Muller) based synthesis methods, the algorithm offers smaller area and faster synthesis, so it can be applied to large functions.

Citations: 21
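
For readers unfamiliar with the positive Davio expansion named in the abstract above, it rewrites a Boolean function as f = f0 XOR (x AND f2), where f0 is f with x = 0 and f2 = f0 XOR f1, with f1 being f with x = 1. The sketch below checks this identity on a small example and applies the expansion to every variable to obtain an XOR-of-products form. The function names and the majority example are hypothetical; the paper's decision-diagram construction and its mapping to reversible gates are not reproduced here.

```python
from itertools import product


def positive_davio(f, var: int):
    """Split f on argument index var; return (f0, f2) with
    f == f0 XOR (x_var AND f2)."""
    def f0(*bits):
        b = list(bits)
        b[var] = 0                      # cofactor with x_var = 0
        return f(*b)

    def f2(*bits):
        b = list(bits)
        b[var] = 1                      # cofactor with x_var = 1
        return f0(*bits) ^ f(*b)        # Boolean difference f0 XOR f1

    return f0, f2


def pprm_terms(truth: list) -> list:
    """Apply the expansion to every variable of a truth table.

    truth[idx] is the function value where bit i of idx is variable i.
    Returns the indices whose product term appears in the XOR sum.
    """
    coeffs = list(truth)
    n = len(coeffs).bit_length() - 1    # number of variables
    for i in range(n):
        for idx in range(len(coeffs)):
            if (idx >> i) & 1:
                coeffs[idx] ^= coeffs[idx ^ (1 << i)]
    return [idx for idx, c in enumerate(coeffs) if c]


def majority(a: int, b: int, c: int) -> int:
    return (a & b) | (a & c) | (b & c)


f0, f2 = positive_davio(majority, 0)
# Truth table with a as bit 0, b as bit 1, c as bit 2 of the index.
maj_truth = [majority(i & 1, (i >> 1) & 1, (i >> 2) & 1) for i in range(8)]
maj_terms = pprm_terms(maj_truth)
```

For the majority function this yields the terms with indices 3, 5, and 6, that is, ab XOR ac XOR bc. Only uncomplemented variables appear in the result, which is what makes the XOR-based form a natural fit for reversible gate libraries.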
