
2009 IEEE Computer Society Annual Symposium on VLSI: Latest Publications

A New Placement Algorithm for Reduction of Soft Errors in Macrocell Based Design of Nanometer Circuits
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.37
K. Bhattacharya, N. Ranganathan
The rates of transient faults such as soft errors have been significantly affected by aggressive scaling trends in the nanometer regime. Several circuit optimization techniques have previously been proposed to prevent soft errors in logic circuits, including the insertion of concurrent error detection circuits on selected nodes, selective gate sizing, dual-VDD assignment, and selective node hardening at the transistor level. However, we show in this paper that longer wirelengths for nets act as larger RC ladders and can effectively filter out the transient glitches caused by radiation strikes. Based on this observation, we propose a simulated-annealing-based placement algorithm that significantly reduces the soft error rate (SER) of logic circuits. We accurately capture soft error masking effects using a new metric called logical observability. The cost function for simulated annealing is modeled as the sum, over all nets, of each net's logical observability weighted by its netlength, while the total area and the total wirelength are simultaneously constrained. The algorithm tries to assign longer wirelengths to nets with low masking probability, for greater glitch reduction, while keeping the delay and area penalties of the overall circuit low. Each placement configuration is represented as a sequence pair, and moves in the space of sequence pairs are accepted probabilistically depending on the cost gradient and the iteration count: higher-cost moves are more likely to be accepted in early iterations, for better state-space exploration, while in later iterations the algorithm greedily minimizes the cost. To the best of our knowledge, this is the first attempt to reduce the soft error rate during the placement stage. The proposed algorithm has been implemented and validated on the ISCAS85 benchmarks. Experiments using the FreePDK 45nm Process Design Kit and the OSU cell library indicate that our radiation-immune placement algorithm can significantly reduce the SER of logic circuits with very low delay and area overheads.
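The probabilistic acceptance of moves described above is the standard Metropolis rule of simulated annealing. This minimal sketch shows only that rule, not the authors' placer. The geometric cooling schedule and its parameters `t0` and `alpha` are hypothetical, and the sequence-pair moves and observability-weighted cost are abstracted into a single cost change `delta_cost`.

```python
import math
import random


def temperature(iteration: int, t0: float = 100.0, alpha: float = 0.95) -> float:
    """Hypothetical geometric cooling schedule: T_k = t0 * alpha**k."""
    return t0 * alpha ** iteration


def accept_probability(delta_cost: float, iteration: int) -> float:
    """Metropolis rule: a move that lowers the cost is always accepted; a move
    that raises it by delta_cost is accepted with probability exp(-delta / T)."""
    if delta_cost <= 0:
        return 1.0
    return math.exp(-delta_cost / temperature(iteration))


def accept_move(delta_cost: float, iteration: int, rng: random.Random) -> bool:
    """Draw one accept/reject decision for a proposed placement move."""
    return rng.random() < accept_probability(delta_cost, iteration)


# The same uphill move (cost +10) early versus late in the anneal:
print(accept_probability(10.0, 0))    # about 0.905: exploration
print(accept_probability(10.0, 100))  # about 4.6e-08: effectively greedy
```

Because the temperature falls with the iteration count, the same uphill move goes from likely to almost never accepted. This matches the exploration-then-greedy behavior the abstract describes.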
Citations: 6
Thermal-Assisted Spin Transfer Torque Memory (STT-RAM) Cell Design Exploration
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.17
Hai Helen Li, Haiwen Xi, Yiran Chen, J. Stricklin, Xiaobin Wang, Tong Zhang
Thermal-assisted spin-transfer torque random access memory (STT-RAM) has been considered a promising candidate for next-generation nonvolatile memory technology. We conducted finite element simulations of the thermal dynamics in the programming process of thermal-assisted STT-RAM. Special attention has been paid to the scalability and design space of the thermal-assisted programming scheme by varying the memory element dimensions and the resistance-area product. We also provide a systematic analysis and comparison of thermal-assisted STT-RAM and standard STT-RAM. The writability and scalability of thermal-assisted STT-RAM are also discussed.
Citations: 5
Power-Efficient Body-Coupled Self-Cascode LC Oscillator for Low-Power Injection-Locked Transmitter Applications
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.14
M. Haider, S. Islam
Improving the efficiency and power consumption of an RF transmitter in a wireless sensor network is a major design challenge. Unlike conventional transmitter architectures, an injection-locked transmitter (ILTX) provides high efficiency with reduced power consumption. The core block of an ILTX is an injection-locked oscillator. This paper therefore reports a low-voltage, low-power injection-locked LC oscillator employing a self-cascode structure and body-terminal coupling. The self-cascode structure of the oscillator provides low-voltage, low-power operation, while body-terminal coupling enables low-power operation without degrading voltage headroom. The proposed oscillator has been fabricated in a 0.18-μm RF CMOS process. Measurement results indicate that the oscillator can operate with a supply voltage as low as 1 V and exhibits a 36.75 MHz locking range for a 1.29 GHz injection signal at 0 dBm. The core oscillator consumes only 2 mW, which makes the proposed design highly suitable for low-power transmitter applications.
Citations: 2
High Performance Non-blocking Switch Design in 3D Die-Stacking Technology
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.53
D. L. Lewis, S. Yalamanchili, H. Lee
Die stacking is a promising new technology that enables the integration of devices in the third dimension. It allows multiple active layers to be stacked directly on top of one another, with short, dense die-to-die vias providing communication. Previous work has shown significant benefits at all design targets, from stacking memory on logic to partitioning individual architectural units across multiple layers. Many high-speed processor units (ALUs, register files, caches, and instruction schedulers) have been designed in 3D, achieving significant, simultaneous power savings and performance boosts. Other work has looked at implementing networks-on-chip in a die stack but restricted the focus to planar designs of the various units (processors, routers, etc.). This work follows up on these two research areas to explore the 3D design of router components, specifically the crossbar. We examine implementations of a crossbar and two multistage interconnection networks to determine the potential benefits of 3D implementation. Compared with equivalent planar designs, we achieve a maximum delay reduction of 26% and maximum power savings of 24%.
Citations: 18
A High-Speed GCD Chip: A Case Study in Asynchronous Design
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.47
Gennette Gill, John Hansen, Ankur Agiwal, L. Vicci, Montek Singh
This paper presents the design of a greatest common divisor (GCD) chip as a case study in asynchronous, or clockless, design. The design uses fine-grain asynchronous pipelining to achieve fairly high performance. At the same time, using robust asynchronous handshaking in lieu of clocking allows the design to adapt its operation gracefully to voltage and temperature variations without the need for clock recalibration. The design was fabricated in a 0.13 μm CMOS process using standard cells, with full testability support. The resulting chips were evaluated for performance and robustness using a large set of test vectors that provides good fault coverage. Under nominal operating conditions (1.5 V and 27 °C), the fabricated parts were able to deliver up to 8 giga GCD algorithmic iterations per second (equivalent to a 1 GHz clock speed). Moreover, they were functionally correct across a wide range of voltages (0.5 V to 4 V) and temperatures (-45 °C to 150 °C). This case study bolsters our confidence in the potential of asynchronous design techniques to help produce reliable ASICs that are fast, testable, and operate under a wide range of conditions.
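The abstract reports throughput in "GCD algorithmic iterations" but does not say which GCD algorithm the chip implements. As an assumption for illustration only, this sketch uses the subtraction-based Euclidean algorithm, where one iteration is one compare-and-subtract step, and counts the iterations for a given input pair.

```python
def gcd_subtractive(a: int, b: int) -> tuple[int, int]:
    """Subtraction-based Euclidean GCD for positive integers.

    Returns (gcd, iterations), where one iteration is a single
    compare-and-subtract step (an assumed unit of work, not the chip's)."""
    if a <= 0 or b <= 0:
        raise ValueError("inputs must be positive integers")
    iterations = 0
    while a != b:
        if a > b:
            a -= b
        else:
            b -= a
        iterations += 1
    return a, iterations


print(gcd_subtractive(48, 18))  # (6, 4): 48,18 -> 30,18 -> 12,18 -> 12,6 -> 6,6
```

The number of iterations depends on the operands, so a per-iteration rate and a per-GCD rate are not the same measure.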
Citations: 6
Energy-Efficient Encoding for High-Performance Buses with Staggered Repeaters
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.58
S. Jayaprakash, N. Mahapatra
High-performance buses often use staggered repeaters, which exploit the data-dependent nature of crosstalk to mitigate the latency impact of worst-case capacitive crosstalk between adjacent wires. An undesirable side effect of staggered repeaters is that they may increase the overall energy of a bus carrying the highly correlated traffic typical of real-world benchmarks. In this paper, we introduce an energy model for a staggered-repeater bus (SRB) configuration and propose a low-power dynamic encoding scheme that, on the SPEC CPU2k benchmarks, reduces average SRB energy by more than 28% for data traffic and 26% for instruction traffic.
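The abstract does not describe the SRB encoding itself. For background only, this sketch shows a different, well-known low-power scheme: classic bus-invert coding. If sending a word unchanged would toggle more than half of the data lines, the encoder sends the complement and raises an extra invert line instead. The model counts only self-transitions and ignores coupling between adjacent wires, which an SRB energy model would need to capture.

```python
def hamming(a: int, b: int) -> int:
    """Number of bus lines that differ between two bus states."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width):
    """Classic bus-invert coding (not the paper's SRB scheme).

    Returns a list of (bus_value, invert_bit) pairs."""
    mask = (1 << width) - 1
    prev_bus = 0
    encoded = []
    for word in words:
        word &= mask
        if hamming(prev_bus, word) > width // 2:
            bus, inv = ~word & mask, 1  # send the complement, flag it
        else:
            bus, inv = word, 0
        encoded.append((bus, inv))
        prev_bus = bus
    return encoded


def bus_invert_decode(encoded, width):
    """Undo the inversion on the receiver side."""
    mask = (1 << width) - 1
    return [bus ^ mask if inv else bus for bus, inv in encoded]


def encoded_states(encoded, width):
    """Fold the invert bit in as an extra most-significant bus line."""
    return [(inv << width) | bus for bus, inv in encoded]


def count_transitions(states):
    """Total self-transitions over a sequence of bus states, starting from all zeros."""
    total, prev = 0, 0
    for state in states:
        total += hamming(prev, state)
        prev = state
    return total


words = [0x00, 0xFF, 0x00, 0xFF]
enc = bus_invert_encode(words, 8)
print(count_transitions(words), count_transitions(encoded_states(enc, 8)))  # 24 3
```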
Citations: 0
Testing Circuit-Partitioned 3D IC Designs
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.48
D. L. Lewis, H. Lee
3D integration is an emerging technology that allows multiple silicon dies to be stacked vertically. The stacked dies are tightly integrated with through-silicon vias (TSVs) and promise significant power and area reductions by replacing long global wires with short vertical connections. In this technology, neighboring logic blocks necessarily reside on different layers of the stack. However, such functional partitions disable intra-chip communication before bonding (pre-bond) and thus disrupt traditional test techniques. Previous work has described a general test architecture that enables pre-bond testability of an architecturally partitioned 3D processor and provided mechanisms for basic layer functionality. This work proposes new test methods for designs partitioned at the circuit level, in which the gates and transistors of individual circuits may be split across multiple die layers. We investigated a bit-partitioned adder unit and a port-split register file; the latter represents the most difficult circuit-partitioned design to test pre-bond, yet it is widely used in many circuits. Two layouts of each circuit, planar and 3D, are produced. Our experiments verify the performance and power results and examine the test coverage achieved.
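Circuit-level partitioning complicates pre-bond test, and a bit-partitioned adder shows why. This hypothetical sketch, which is not the authors' test architecture, splits an 8-bit ripple-carry adder across two dies: bits 0-3 on die 0 and bits 4-7 on die 1, with the inter-die carry on an assumed TSV. Before bonding, die 1's carry-in has no driver, so the sketch substitutes a test-controllable value (the hypothetical `test_carry` parameter).

```python
def ripple_add(a_bits, b_bits, carry_in):
    """Ripple-carry addition over LSB-first bit lists; returns (sum_bits, carry_out)."""
    sum_bits, c = [], carry_in
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ c)
        c = (a & b) | (c & (a ^ b))
    return sum_bits, c


def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def die0_lower(a, b):
    """Die 0 computes bits 0-3; its carry-out drives the (hypothetical) TSV."""
    s, tsv_carry = ripple_add(to_bits(a & 0xF, 4), to_bits(b & 0xF, 4), 0)
    return from_bits(s), tsv_carry


def die1_upper(a, b, tsv_carry=None, test_carry=0):
    """Die 1 computes bits 4-7. Pre-bond (tsv_carry is None), the carry-in
    comes from a test-controllable source instead of the unbonded TSV."""
    cin = test_carry if tsv_carry is None else tsv_carry
    s, cout = ripple_add(to_bits((a >> 4) & 0xF, 4), to_bits((b >> 4) & 0xF, 4), cin)
    return from_bits(s), cout


def bonded_add(a, b):
    """After bonding, the TSV carries die 0's carry-out into die 1."""
    lo, c = die0_lower(a, b)
    hi, cout = die1_upper(a, b, tsv_carry=c)
    return (hi << 4) | lo, cout


print(bonded_add(200, 100))                      # (44, 1): 300 = 0x12C
print(die1_upper(0x10, 0x00, test_carry=1))      # (2, 0): pre-bond test of die 1 alone
```

Without a controllable carry-in, die 1 could only be tested pre-bond under a single fixed carry value. That is one reason partitioning inside a circuit needs extra test support.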
Citations: 40
A Low Cost Low Power Quaternary LUT Cell for Fault Tolerant Applications in Future Technologies
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.34
E. Rhod, L. Carro
Field-programmable gate arrays (FPGAs) offer the flexibility to program hardware systems, together with the possibility of exploiting any level of parallelism available in the application. Unfortunately, this flexibility comes at the cost of the large circuit area needed to implement all the routing switches and wires. In addition, device scaling in new and future technologies brings a severe increase in the soft error rate of both combinational and sequential logic. To reduce the impact of wires and switches and to cope with single-event transients (SETs) in FPGAs, this work proposes a low-power voltage-mode quaternary LUT (QLUT) design that uses quaternary logic to reduce the area spent on switches and routing wires. At the same time, the proposed QLUT provides robustness against SETs. Results show that the proposed fault-tolerant QLUT detects all faults that can cause an error, with significantly less area and power than the corresponding binary LUT protected with the duplication with comparison (DWC) technique. To evaluate how the proposed QLUT copes with the process variability of sub-90 nm technologies, extensive Monte Carlo simulations were performed, and their results are discussed.
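The wire saving from quaternary logic comes from packing two binary signals into one four-level signal. A 2-input quaternary LUT therefore covers four binary inputs with 16 entries over only two input wires. This is a behavioral sketch under assumptions: voltage levels are abstracted to the integers 0-3, and the `QLUT` class and example table are hypothetical. DWC, the technique protecting the binary baseline in the comparison above, is shown as a generic wrapper. The QLUT's own SET-detection circuitry is not described in the abstract and is not modeled.

```python
def bits_to_quaternary(b1: int, b0: int) -> int:
    """Pack two binary signals into one quaternary digit (logic levels 0..3)."""
    return (b1 << 1) | b0


class QLUT:
    """Hypothetical 2-input quaternary LUT: 4**2 = 16 entries, each holding 0..3.

    Two quaternary input wires carry the same information as four binary wires."""

    def __init__(self, table):
        if len(table) != 16 or any(v not in range(4) for v in table):
            raise ValueError("expected 16 entries with values 0..3")
        self.table = list(table)

    def read(self, q1: int, q0: int) -> int:
        return self.table[4 * q1 + q0]


def dwc_read(primary, duplicate, *inputs):
    """Duplication with comparison: read two copies and flag any mismatch.
    Returns (primary_value, error_detected)."""
    a, b = primary.read(*inputs), duplicate.read(*inputs)
    return a, a != b


# Example function: add two 2-bit numbers modulo 4 (4 binary inputs, 2 quaternary wires).
good = QLUT([(q1 + q0) % 4 for q1 in range(4) for q0 in range(4)])
q1, q0 = bits_to_quaternary(1, 1), bits_to_quaternary(0, 1)  # 3 and 1
print(good.read(q1, q0))  # 0, since (3 + 1) % 4 == 0

# Corrupt one entry in a copy to mimic an upset; DWC flags the mismatch.
faulty_table = good.table.copy()
faulty_table[13] = 2
print(dwc_read(good, QLUT(faulty_table), 3, 1))  # (0, True)
```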
Citations: 4
Hardware Design of the H.264/AVC Variable Block Size Motion Estimation for Real-Time 1080HD Video Encoding
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.11
R. Porto, L. Agostini, S. Bampi
Among video compression standards, the latest is H.264/AVC, which achieves the highest compression rates compared with previous standards. On the other hand, it has high computational complexity, which makes it difficult to develop software applications that run on current processors when high-definition video is considered. Hardware implementations therefore become essential. Addressing such hardware architectures, this work presents an architectural design for the variable block size motion estimation (VBSME) defined in the H.264/AVC standard. The architecture is based on the full-search motion estimation algorithm and sum of absolute differences (SAD) calculation, and it produces the 41 motion vectors per macroblock specified in the standard. It was implemented using a standard-cell methodology in 0.18 μm CMOS technology and reached a throughput of 34 1080HD frames per second.
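The figure of 41 motion vectors follows from the seven H.264/AVC partition shapes of a 16x16 macroblock: 1 + 2 + 2 + 4 + 8 + 8 + 16. This software sketch of full-search VBSME uses one common approach, which may differ from the paper's hardware. It computes the sixteen 4x4 SADs once per candidate displacement and sums them to get every larger block's SAD. The search-area layout and the function names are assumptions.

```python
# H.264/AVC partition shapes (width, height) inside a 16x16 macroblock
PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]


def motion_vectors_per_macroblock() -> int:
    """One motion vector per block, for every partition shape."""
    return sum((16 // w) * (16 // h) for w, h in PARTITIONS)


def sad_4x4_grid(cur, ref, ox, oy):
    """SADs of the sixteen 4x4 sub-blocks of the 16x16 block `cur`, compared
    with the 16x16 window of `ref` whose top-left corner is (ox, oy)."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[oy + y][ox + x])
    return grid


def block_sad(grid, w, h, bx, by):
    """SAD of a w x h block whose top-left 4x4 cell is (bx, by), built from 4x4 SADs."""
    return sum(grid[r][c]
               for r in range(by, by + h // 4)
               for c in range(bx, bx + w // 4))


def full_search(cur, ref, r):
    """Exhaustive search over displacements in [-r, r]^2.

    `ref` is an assumed (16 + 2r) x (16 + 2r) search area with the co-located
    block at offset (r, r). Returns a dict mapping each of the 41 blocks
    (w, h, bx, by) to its best (sad, (dx, dy))."""
    best = {}
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            grid = sad_4x4_grid(cur, ref, r + dx, r + dy)
            for w, h in PARTITIONS:
                for by in range(0, 4, h // 4):
                    for bx in range(0, 4, w // 4):
                        sad = block_sad(grid, w, h, bx, by)
                        key = (w, h, bx, by)
                        if key not in best or sad < best[key][0]:
                            best[key] = (sad, (dx, dy))
    return best


print(motion_vectors_per_macroblock())  # 41
```

The throughput figure implies a workload of this size. A 1920x1080 frame has 120 x 68 = 8160 macroblocks once its height is padded to 1088. At 34 frames per second, that is 277,440 macroblocks, or 11,375,040 motion vectors, per second.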
Citations: 14
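The H.264/AVC entry says the architecture uses full-search motion estimation with SAD and produces 41 motion vectors per macroblock. The count comes from giving each partition of every H.264 block size its own vector: 1 (16x16) + 2 (16x8) + 2 (8x16) + 4 (8x8) + 8 (8x4) + 8 (4x8) + 16 (4x4) = 41. Below is a minimal software sketch of that idea, not the authors' hardware design. It assumes a common VBSME trick: compute the sixteen 4x4 SADs once per candidate displacement and sum them into the larger partitions. The function names (`full_search_vbsme`, `partition_sad`, and so on) are hypothetical.

```python
from itertools import product

# H.264/AVC luma partition sizes (width, height) inside one 16x16 macroblock:
# 16x16, 16x8, 8x16, 8x8 and the sub-macroblock sizes 8x4, 4x8, 4x4.
PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]


def count_motion_vectors():
    """Total number of partitions (one motion vector each) over all sizes."""
    return sum((16 // w) * (16 // h) for w, h in PARTITIONS)


def sad_4x4_grid(cur, ref, mb_x, mb_y, dx, dy):
    """SADs of the sixteen 4x4 sub-blocks of the macroblock at (mb_x, mb_y)
    against the reference block displaced by (dx, dy)."""
    grid = [[0] * 4 for _ in range(4)]
    for by, bx in product(range(4), range(4)):
        total = 0
        for y, x in product(range(4), range(4)):
            cy, cx = mb_y + 4 * by + y, mb_x + 4 * bx + x
            total += abs(cur[cy][cx] - ref[cy + dy][cx + dx])
        grid[by][bx] = total
    return grid


def partition_sad(grid, x0, y0, w, h):
    """SAD of a w x h partition at offset (x0, y0), summed from 4x4 SADs."""
    return sum(grid[y0 // 4 + j][x0 // 4 + i]
               for j in range(h // 4) for i in range(w // 4))


def full_search_vbsme(cur, ref, mb_x, mb_y, search_range):
    """Exhaustive search over a +/- search_range window.
    Returns {(w, h, x0, y0): (best_sad, dx, dy)} for every partition."""
    best = {}
    rows, cols = len(ref), len(ref[0])
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidate blocks that would fall outside the reference frame.
            if not (0 <= mb_y + dy and mb_y + dy + 16 <= rows
                    and 0 <= mb_x + dx and mb_x + dx + 16 <= cols):
                continue
            # Compute the 16 small SADs once, then reuse them for every size.
            grid = sad_4x4_grid(cur, ref, mb_x, mb_y, dx, dy)
            for w, h in PARTITIONS:
                for y0, x0 in product(range(0, 16, h), range(0, 16, w)):
                    cand = (partition_sad(grid, x0, y0, w, h), dx, dy)
                    key = (w, h, x0, y0)
                    # Keep the smallest SAD (ties: smallest dx, then dy).
                    if key not in best or cand < best[key]:
                        best[key] = cand
    return best


# Synthetic test frames: the current frame is the reference shifted so that
# the true motion of the macroblock at (8, 8) is (dx, dy) = (1, 2).
ref = [[(7 * x + 13 * y) % 256 for x in range(32)] for y in range(32)]
cur = [[ref[y + 2][x + 1] if y + 2 < 32 and x + 1 < 32 else 0
        for x in range(32)] for y in range(32)]
best = full_search_vbsme(cur, ref, 8, 8, 3)
print(count_motion_vectors(), len(best))  # 41 41
print(best[(16, 16, 0, 0)])               # (0, 1, 2)
```

In hardware, the same SAD reuse lets one array of 4x4 SAD units feed adder trees for all seven block sizes in parallel. That is one reason the approach is popular for real-time encoders, though the abstract does not say which structure the authors chose.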
A Low-power Low-cost Optical Router for Optical Networks-on-Chip in Multiprocessor Systems-on-Chip
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.19
Huaxi Gu, Mo Kwai Hung Morton, Jiang Xu, Wei Zhang
Networks-on-chip (NoCs) can improve the communication bandwidth and power efficiency of multiprocessor systems-on-chip (MPSoC). However, traditional metallic interconnects consume a significant amount of power to deliver the even higher communication bandwidth required in the near future. Optical NoCs are based on optical interconnects and optical routers, and have significant bandwidth and power advantages. This paper proposes Cygnus, a high-performance, low-power, low-cost optical router for optical NoCs. Cygnus is non-blocking and based on silicon microresonators. We compared Cygnus with other microresonator-based routers and analyzed in detail their power consumption, optical power insertion loss, and number of microresonators used. The results show that Cygnus has the lowest power consumption and losses, and requires the fewest microresonators. For example, compared with the optimized traditional optical crossbar router, Cygnus has 50% lower power consumption, 51% lower optical power insertion loss, and 20% fewer microresonators. Compared with a high-performance 45nm electronic router, Cygnus consumes 96% less power. Moreover, the passive routing feature of Cygnus guarantees that, when the dimension-order routing algorithm is used, the maximum power consumption to route a packet through a network is a small constant, regardless of the network size; under current technologies, this maximum is 4.80fJ/bit. We simulated and analyzed an 8x8 2D mesh NoC built from Cygnus and report the end-to-end delay and network throughput under different offered loads and packet sizes.
Citations: 154
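The Cygnus entry attributes its size-independent worst-case routing power to passive routing combined with dimension-order routing. The sketch below illustrates one property of dimension-order routing that fits that claim: under XY order on a 2D mesh, a packet changes direction at most once, however large the mesh is, while its hop count still grows with the mesh. XY order is assumed here because the abstract does not say which dimension is routed first. Treating a turn, rather than a hop, as the power-relevant event is also an assumption of this illustration, not something the abstract states. All function names are hypothetical.

```python
def xy_route(src, dst):
    """Dimension-order (XY) routing on a 2D mesh: move along X until the
    column matches, then along Y. Returns the routers visited, src to dst."""
    (x, y), (tx, ty) = src, dst
    path = [(x, y)]
    while x != tx:
        x += 1 if tx > x else -1
        path.append((x, y))
    while y != ty:
        y += 1 if ty > y else -1
        path.append((x, y))
    return path


def count_turns(path):
    """Number of times the travel direction changes along a path."""
    steps = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(path, path[1:])]
    return sum(1 for a, b in zip(steps, steps[1:]) if a != b)


def worst_case(n):
    """Max hops and max turns over all source/destination pairs of an n x n mesh.
    Assumption (illustrative only): per-packet router cost tracks turns, not hops."""
    nodes = [(x, y) for x in range(n) for y in range(n)]
    paths = [xy_route(s, d) for s in nodes for d in nodes]
    return max(len(p) - 1 for p in paths), max(count_turns(p) for p in paths)


for n in (4, 8, 16):
    print(n, worst_case(n))
# 4 (6, 1)
# 8 (14, 1)
# 16 (30, 1)
```

For the 8x8 mesh simulated in the paper, the longest XY path is 14 hops, yet no path turns more than once. Any per-packet cost charged only at turns would therefore stay bounded as the mesh grows.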