2010 15th Asia and South Pacific Design Automation Conference (ASP-DAC): Latest Papers

Geyser-1: A MIPS R3000 CPU core with fine-grained run-time power gating

2010 15th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2010-01-18 DOI: 10.5555/1899721.1899808

D. Ikebuchi, N. Seki, Y. Kojima, M. Kamata, Lei Zhao, H. Amano, T. Shirai, S. Koyama, T. Hashida, Y. Umahashi, H. Masuda, K. Usami, S. Takeda, Hiroshi Nakamura, M. Namiki, Masaaki Kondo

Geyser-1 is a MIPS CPU that provides fine-grained run-time power gating (PG) controlled by instructions. Unlike traditional PG, it uses special standard cells in which the virtual ground (VGND) is separated from the real ground, and a certain number of sleep transistors are inserted for quick power shut-down and wake-up. In Geyser-1, fine-grained run-time PG is applied to the computational modules in the execution stage. Power shut-down and wake-up are controlled at the architectural and software levels. This implementation is the first available CPU with this type of run-time PG technique. Geyser-1 provides PG that is fine-grained in both time and space, and it works well on a real chip.

Citations: 7

iRetILP: An efficient incremental algorithm for min-period retiming under general delay model

2010 15th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2010-01-18 DOI: 10.1109/ASPDAC.2010.5419917

D. Das, Jia Wang, H. Zhou

Retiming is one of the most powerful sequential transformations: it relocates flip-flops in a circuit without changing its functionality. The min-period retiming problem seeks a solution with the minimal clock period. Since most min-period retiming algorithms assume a simple constant delay model that does not take into account many prominent electrical effects in ultra-deep-submicron VLSI designs, a general delay model was proposed to improve the accuracy of the retiming optimization. Due to the complexity of the general delay model, the formulation of min-period retiming under such a model is based on integer linear programming (ILP). However, because the previous ILP formulation was derived on a dense path graph, it incurred huge storage and running time overhead for the ILP solvers, and its application was limited to small circuits. In this paper, we present the iRetILP algorithm to solve the min-period retiming problem efficiently under the general delay model by formulating and solving the ILP problems incrementally. Experimental results show that iRetILP is on average 100× faster than the previous algorithm for small circuits and is highly scalable to large circuits in terms of memory consumption and running time.
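As a rough illustration of what "relocating flip-flops without changing functionality" means, the sketch below applies the classical retiming rule w_r(u, v) = w(u, v) + r(v) - r(u) to a tiny circuit graph. This is the textbook formulation, not the iRetILP algorithm or the general delay model from the paper; the three-node graph, the node names, and the retiming labels are hypothetical.

```python
# Classical retiming on a toy graph (illustrative only, not iRetILP).
# Each edge (u, v) carries w flip-flops; r maps each node to an integer lag.

def retime(edges, r):
    """Return retimed flip-flop counts: w_r(u, v) = w(u, v) + r[v] - r[u]."""
    return {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}

def is_legal(retimed):
    """A retiming is legal only if no edge ends up with a negative count."""
    return all(w >= 0 for w in retimed.values())

# Hypothetical 3-node loop a -> b -> c -> a with two flip-flops in total.
edges = {("a", "b"): 1, ("b", "c"): 0, ("c", "a"): 1}
# r["b"] = -1 moves one flip-flop from the input of b to its output.
r = {"a": 0, "b": -1, "c": 0}

retimed = retime(edges, r)
print(retimed)
print("legal:", is_legal(retimed))
```

The number of flip-flops around any cycle is unchanged by the rule, which is why the circuit's behavior is preserved; a min-period algorithm then searches for the legal labeling r that gives the shortest clock period.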

Citations: 5

Analyzing impact of multiple ABB and AVS domains on throughput of power and thermal-constrained multi-core processors

2010 15th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2010-01-18 DOI: 10.1109/ASPDAC.2010.5419889

Jungseob Lee, Shiyu Zhou, N. Kim

Recently, the semiconductor industry has integrated more cores in a single die, which substantially improves the throughput of processors running highly parallel applications. However, many existing applications do not have high enough parallelism to exploit multiple cores in a die, slowing the transition to many-core processors with smaller and more numerous cores that benefit future applications with high parallelism. In this paper, we analyze the impact of multiple adaptive voltage scaling (AVS) and adaptive body biasing (ABB) domains on the throughput of power- and thermal-constrained multi-core processors when they are combined with per-core power-gating (PCPG). Both AVS and ABB can be effectively used to either increase the frequency (and thus throughput) or decrease the power consumption of the processors. Meanwhile, PCPG can provide extra power and thermal headroom when an application's parallelism is limited. First, we analyze the throughput impact of applying AVS, ABB, and PCPG to power- and thermal-constrained multi-core processors. Second, we investigate the impact of multiple AVS and ABB domains on the throughput, and recommend the most cost-effective number of domains for AVS and ABB in 16- and 8-core processors. Our analysis, using the 32nm predictive technology model and considering within-die variations, suggests that the most cost-effective number of domains for AVS and/or ABB is one for each when they are combined with PCPG in both 16- and 8-core processors. Since within-die core-to-core variations provide many choices in terms of core frequency and power consumption for limited-parallelism applications, one AVS or ABB domain can improve throughput by 1.77× to 2.49×; more than one AVS and/or ABB domain improves the throughput only marginally.

Citations: 3

Current source modeling in the presence of body bias

2010 15th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2010-01-18 DOI: 10.1109/ASPDAC.2010.5419896

Saket Gupta, S. Sapatnekar

With the increasing use of adaptive body biases in high-performance designs, it has become necessary to build timing models that can include these effects. State-of-the-art timing tools use current source models (CSMs), which have proven to be fast and accurate. However, a straightforward extension of CSMs to incorporate multiple body biases results in unreasonably large characterization tables for each cell. We propose a new approach to compactly capture body bias effects within a mainstream CSM framework. Our approach features a table reduction method for compact storage, and a fast and novel waveform sensitivity method for timing evaluation. On a 45nm technology, we demonstrate high accuracy, with worst-case errors of under 5% in both slew and delay as compared to HSPICE. We show a speedup of over five orders of magnitude over HSPICE and almost 70x over conventional CSMs.

Citations: 4

Novel dual-Vth independent-gate FinFET circuits

2010 15th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2010-01-18 DOI: 10.1109/ASPDAC.2010.5419680

M. Rostami, K. Mohanram

This paper describes gate work function and oxide thickness tuning to realize novel circuits using dual-Vth independent-gate FinFETs. Dual-Vth FinFETs with independent gates enable series and parallel merge transformations in logic gates, realizing compact low power alternatives. Furthermore, they also enable the design of a new class of compact logic gates with higher expressive power and flexibility than conventional forms, e.g., implementing 12 unique Boolean functions using only four transistors. The gates are designed and calibrated using the University of Florida double-gate model into a technology library. Synthesis results for 14 benchmark circuits from the ISCAS and OpenSPARC suites indicate that on average, the enhanced library reduces delay, power, and area by 9%, 21%, and 27%, respectively, over a conventional library designed using FinFETs in 32nm technology.

Citations: 34

Improved clock-gating control scheme for transparent pipeline

2010 15th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2010-01-18 DOI: 10.1109/ASPDAC.2010.5419847

J. Choi, Byung Guk Kim, A. Dasgupta, K. Roy

This paper presents a stage-level clock-gating scheme for reducing clock power. The proposed technique efficiently implements the concept of a transparent pipeline, which reduces clocking power by dynamically making pipeline registers transparent. We developed a new control scheme for the transparent pipeline that can be applied to any number of pipeline stages. A low-overhead flip-flop with a transparent mode is also proposed to reduce implementation overhead. The proposed clock-gating control logic is extended to pipeline collapsing, which allows an energy/performance trade-off through dynamic frequency scaling. Simulation results on IBM 90nm technology show that the proposed approach has less overhead (∼25%) than the previous transparent pipeline scheme and reduces clocking power by up to 40% in a 64-bit 7-stage pipeline compared with the traditional stage-level clock-gating technique.

Citations: 7

A low latency wormhole router for asynchronous on-chip networks

2010 15th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2010-01-18 DOI: 10.5555/1899721.1899827

Wei Song, D. Edwards

Asynchronous on-chip networks are power efficient and tolerant to process variation, but they are slower than synchronous on-chip networks. A low latency asynchronous wormhole router is proposed using sliced sub-channels and the lookahead pipeline. Channel slicing removes the C-element tree in the completion detection circuit and converts a channel into multiple independent sub-channels, reducing the cycle period. The lookahead pipeline uses the early evaluation protocol to reduce the cycle period. Using the lookahead pipeline on the pipeline stages with the maximal cycle period improves the overall throughput. The router is a pure standard cell design implemented in a 0.13 µm technology. The cycle period of the router at the typical corner is 1.7 ns, providing 2.35 GByte/s throughput per port.
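The two figures at the end of the abstract can be related with a short calculation. The sketch below assumes that each port transfers 4 bytes (32 bits) per cycle; the abstract does not state the data width, so this value is an assumption chosen because it reproduces the quoted throughput.

```python
# Relating cycle period to per-port throughput.
# ASSUMPTION: 4 bytes (32 bits) move through a port per cycle; the abstract
# does not give the data width.

cycle_period_ns = 1.7        # from the abstract (typical corner)
bytes_per_cycle = 4          # hypothetical data width, see note above

# Bytes per nanosecond is numerically equal to GByte/s (1e9 bytes per second).
throughput_gbyte_per_s = bytes_per_cycle / cycle_period_ns

print(f"{throughput_gbyte_per_s:.2f} GByte/s per port")
```

Under that assumption the result rounds to the 2.35 GByte/s reported per port.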

Citations: 15

A 60GHz direct-conversion transmitter in 65nm CMOS technology

2010 15th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2010-01-18 DOI: 10.1109/ASPDAC.2010.5419862

Naoki Takayama, Kota Matsushita, Shogo Ito, Ning Li, K. Okada, A. Matsuzawa

This paper presents a 60 GHz direct-conversion transmitter in 65 nm CMOS technology. The power amplifier consists of 4-stage transistors. The circuit model of the de-coupling capacitor is built as a transmission line to account for its physical length. In the measurement results, the conversion gain is above 9.6 dB in the 58–65 GHz band, and the 1 dB compression point is 1.6 dBm with 60 GHz LO frequency and 1 dB LO power.

Citations: 2

An electrically adjustable 3-terminal regulator with post-fabrication level-trimming function

2010 15th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2010-01-18 DOI: 10.1109/ASPDAC.2010.5419859

Hiroyuki Morimoto, H. Koike, Kazuyuki Nakamura

This paper describes a new technique for 3-terminal regulators to adjust the output voltage level without additional terminals or extra off-chip components. By applying a serial control pattern using the intermediate voltage level between the supply voltage and the regulator output, the adjustment data in the internal nonvolatile memory are safely updated without noise disturbance. In an on-board test with a chip fabricated using a 0.35-µm standard CMOS process, we confirm successful output voltage adjustment with sub-10mV precision.

Citations: 1

Checker-pattern and shared two pixels LOFIC CMOS image sensors

2010 15th Asia and South Pacific Design Automation Conference (ASP-DAC)

Pub Date : 2010-01-18 DOI: 10.1109/ASPDAC.2010.5419872

Y. Tashiro, Shun Kawada, Shin Sakai, S. Sugawa

Two wide dynamic range CMOS image sensors with lateral overflow integration capacitors (LOFIC) have been developed. A checker-pattern image sensor achieves high area efficiency by placing the color filters and on-chip microlenses along a direction at an angle of 45°. A shared two-pixel image sensor achieves a small pixel pitch by introducing a lateral overflow gate in each pixel. The fabricated image sensors exhibit high full well capacity, low noise, wide dynamic range, and high resolution.

Citations: 0
