
2009 IEEE Computer Society Annual Symposium on VLSI: Latest Publications

A Low-power Low-cost Optical Router for Optical Networks-on-Chip in Multiprocessor Systems-on-Chip
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.19
Huaxi Gu, Mo Kwai Hung Morton, Jiang Xu, Wei Zhang
Networks-on-chip (NoCs) can improve the communication bandwidth and power efficiency of multiprocessor systems-on-chip (MPSoCs). However, traditional metallic interconnects consume a significant amount of power to deliver the even higher communication bandwidth required in the near future. Optical NoCs are based on optical interconnects and optical routers, and have significant bandwidth and power advantages. This paper proposes Cygnus, a high-performance, low-power, low-cost optical router for optical NoCs. Cygnus is non-blocking and based on silicon microresonators. We compare Cygnus with other microresonator-based routers and analyze their power consumption, optical power insertion loss, and the number of microresonators used in detail. The results show that Cygnus has the lowest power consumption and losses and requires the fewest microresonators. For example, compared with the optimized traditional optical crossbar router, Cygnus has 50% lower power consumption, 51% lower optical power insertion loss, and 20% fewer microresonators. Compared with a high-performance 45nm electronic router, Cygnus consumes 96% less power. Moreover, the passive routing feature of Cygnus guarantees that, under a dimension-order routing algorithm, the maximum power consumed to route a packet through a network is a small constant, regardless of network size; for example, it is 4.80fJ/bit under current technologies. We also simulated and analyzed an 8x8 2D mesh NoC built from Cygnus and report its end-to-end delay and network throughput under different offered loads and packet sizes.
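The constant worst-case power claim is tied to dimension-order routing: under XY routing a packet changes direction at most once, however large the mesh. The sketch below illustrates only that counting argument, assuming (our assumption, not a detail from the abstract) that straight-through hops are passive and only turns require a router to actively switch. The helpers `xy_route`, `count_turns` and `max_turns` are hypothetical and do not model Cygnus's port structure or injection/ejection.

```python
from itertools import product
from typing import List, Tuple

Coord = Tuple[int, int]


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Dimension-order (XY) routing: move along X until aligned, then along Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def count_turns(path: List[Coord]) -> int:
    """Count direction changes (hypothetical proxy for routers that actively switch)."""
    turns = 0
    prev = None
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        step = (x1 - x0, y1 - y0)
        if prev is not None and step != prev:
            turns += 1
        prev = step
    return turns


def max_turns(n: int) -> int:
    """Worst-case number of turns over all source/destination pairs in an n x n mesh."""
    nodes = list(product(range(n), repeat=2))
    return max(count_turns(xy_route(s, d)) for s in nodes for d in nodes)


if __name__ == "__main__":
    for n in (4, 8, 12):
        print(n, max_turns(n))
```

Straight-line routes report 0 turns and L-shaped routes report 1, so the worst case stays at one turn for every mesh size tried.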
Citations: 154
Algorithms for Estimating Number of Glitches and Dynamic Power in CMOS Circuits with Delay Variations
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.57
Jins D. Alexander, V. Agrawal
Dynamic power dissipation of a CMOS VLSI circuit depends on the signal activity at gate outputs. This activity includes steady-state logic transitions as well as glitches. Glitches are a function of gate delays, which, in modern VLSI circuits, have wide process-related variations. Both average and peak power dissipation are useful, and both are traditionally estimated by Monte Carlo simulation. This is expensive, and its accuracy, especially for peak power, depends on the number of circuit delay samples simulated. We present an alternative. We use zero-delay simulation of a vector pair to determine the steady-state logic activity. We derive linear-time algorithms that, using delay bounds for gates, determine the maximum, minimum and average number of transitions that each gate output can produce. From this information, we estimate the average and peak energy consumed by each vector pair in a given vector set. For a set of random vectors applied to the c7552 circuit, our analysis determined the per-vector energy consumption as 82.2 picojoules average and 196.3 picojoules peak. In comparison, Monte Carlo simulation of 1,000 circuit samples gave 82.8 picojoules average and 146.1 picojoules peak. The discrepancy in peak consumption would shrink if more samples were simulated in the Monte Carlo method. Even with only 1,000 samples, the Monte Carlo analysis took three orders of magnitude more CPU time than the method we offer in this paper.
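The linear-time flavor of the approach can be illustrated with a single topological pass over a netlist. The sketch below is a deliberately simplified, delay-independent variant, not the authors' delay-bound algorithm. It assumes a transport-delay model in which each input transition can cause at most one output transition, uses the zero-delay steady-state change as the lower bound, and applies a parity argument (a binary signal makes an odd number of transitions exactly when its steady-state value changes). The function name and netlist encoding are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple


def transition_bounds(
    fanins: Dict[str, List[str]], changed: Dict[str, bool]
) -> Dict[str, Tuple[int, int]]:
    """
    fanins:  gate output -> nodes driving it (primary inputs may be omitted).
    changed: node -> True if zero-delay simulation of the vector pair shows a
             steady-state change at that node.
    Returns node -> (min, max) number of transitions, computed in one
    topological pass (linear in the number of nodes plus edges).
    """
    bounds: Dict[str, Tuple[int, int]] = {}
    for node in TopologicalSorter(fanins).static_order():
        lo = 1 if changed[node] else 0
        inputs = fanins.get(node, [])
        if not inputs:
            # Primary input: switches exactly once if its value changes.
            bounds[node] = (lo, lo)
            continue
        # Each input transition can trigger at most one output transition.
        hi = sum(bounds[i][1] for i in inputs)
        # Transition count must have the same parity as the steady-state change.
        if hi % 2 != lo:
            hi -= 1
        bounds[node] = (lo, max(lo, hi))
    return bounds
```

Weighting each node's bounds by its switched energy per transition (for example ½CV²) would turn these counts into rough average and peak energy figures in the spirit of the abstract; the paper's own energy model and its use of gate delay bounds are not reproduced here.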
Citations: 11
Secure Leakage-Proof Public Verification of IP Marks in VLSI Physical Design
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.35
Debasri Saha, S. Sur-Kolay
Reuse of the intellectual property (IP) of VLSI physical designs facilitates integrating more components on a single chip within a shrinking time-to-market. For intellectual property protection (IPP), various kinds of IP marks are embedded into a design to establish proof of legal ownership. However, public verification of IP marks is not leakage-proof. Current techniques embed, in addition to private marks, a sufficiently large set of public marks containing a header and a message body solely to enable public verification, at the cost of a significant increase in design overhead. These techniques are not effective, however: attackers can obtain clues for tampering with the public marks, rendering public verification invalid, and may even override the marks with their own signature, causing the IP owner to be wrongly identified in public. Here we propose a zero-knowledge protocol that ensures robust, completely leakage-proof, and convincing public verification with the help of the private marks. We have tested our protocol on FPGA benchmarks, and the results on overhead and robustness are encouraging.
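For readers unfamiliar with zero-knowledge verification, the sketch below shows one round of textbook Schnorr identification, in which a prover convinces a verifier that it knows the secret `x` behind a public value `y = g^x mod p` without revealing `x`. This is a generic, swapped-in example, not the protocol proposed in the paper; treating `x` as something derived from the owner's private marks is purely an assumption, and the tiny group parameters are insecure and chosen only for illustration.

```python
import secrets

# Toy group: q = 11 divides p - 1 = 22, and g = 4 has order 11 modulo 23.
# Real deployments need primes of hundreds of bits; these values are illustrative only.
P, Q, G = 23, 11, 4


def keygen() -> tuple[int, int]:
    """Return (secret x, public y = g^x mod p)."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)


def prove_commit() -> tuple[int, int]:
    """Prover picks a random nonce r and sends the commitment t = g^r mod p."""
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)


def prove_respond(x: int, r: int, c: int) -> int:
    """Prover answers the verifier's challenge c with s = r + c*x mod q."""
    return (r + c * x) % Q


def verify(y: int, t: int, c: int, s: int) -> bool:
    """Verifier accepts iff g^s == t * y^c (mod p)."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P


def run_round(x: int, y: int) -> bool:
    """One commit/challenge/response round with an honest prover."""
    r, t = prove_commit()
    c = secrets.randbelow(Q)  # verifier's random challenge
    s = prove_respond(x, r, c)
    return verify(y, t, c, s)
```

An honest prover always passes, while a prover without `x` passes a single round with probability about 1/q, which is why real systems use a large q or repeat the round. The transcript `(t, c, s)` reveals nothing about `x` to an honest verifier beyond the fact that the prover knows it.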
Citations: 2
Context-aware Post Routing Redundant Via Insertion
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.39
P. Chu, Rung-Bin Lin, Da-Wei Hsu, Yu-Hsing Chen, Wei-Chiu Tseng
Effective algorithms have been developed for post-routing redundant via insertion (RVI). However, implementations of these algorithms often ignore some practical issues. In this article, we implement a post-routing RVI algorithm that takes interconnect contexts into account during RVI. Experimental results show that our context-aware RVI on average raises the via1 (vias between metal layers 1 and 2) insertion rate from 37.4% to 72.1% and the total insertion rate from 72.5% to 85.8%. On average, it increases the RVI rate of critical paths by 3.6%. In addition, with redundant pin-area minimization, our approach reduces the metal 1 and metal 2 area used for RVI at pins by 3%.
Citations: 1
Modern Floorplanning with Boundary Clustering Constraint
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.24
Li Li, Yuchun Ma, N. Xu, Yu Wang, Xianlong Hong
With the development of SoC designs, modern floorplanning typically needs to provide extra options to meet the different emerging requirements of hierarchical designs, such as boundary constraints for I/O connections and clustering constraints for performance and reliability. This paper addresses modern floorplanning with a boundary clustering constraint. It has been shown empirically that such modern constraints severely restrict the solution space; that is, a large number of randomly generated floorplans may be infeasible. To search for feasible solutions effectively, we investigate the feasibility conditions of the B*-tree representation under the boundary clustering constraint. These properties, coupled with an efficient simulated annealing algorithm, provide a way to produce feasible floorplans by dynamic repairing, which transforms an infeasible solution into a feasible one whenever the constraint is violated. Our algorithm is verified on the MCNC and GSRC benchmarks, and the empirical results show that it obtains promising solutions in acceptable time.
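In a B*-tree floorplan as commonly defined, the root sits at the bottom-left corner, a left child is packed immediately to the right of its parent, and a right child is packed directly above its parent at the same x-coordinate. Two well-known properties follow: the modules on the bottom boundary are exactly the root and its chain of left children, and the modules on the left boundary are the root and its chain of right children. The sketch below uses these properties to check a simplified bottom-boundary clustering condition, assuming a cluster is satisfied when its modules appear consecutively on that chain (and therefore abut). The names are hypothetical, and this is not the paper's full set of feasibility conditions or its repair procedure.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Node:
    name: str
    left: Optional["Node"] = None   # packed adjacent, to the right of this module
    right: Optional["Node"] = None  # packed above this module, same x-coordinate


def bottom_boundary(root: Optional[Node]) -> List[str]:
    """Modules touching the bottom edge: the root and its chain of left children."""
    names = []
    while root is not None:
        names.append(root.name)
        root = root.left
    return names


def left_boundary(root: Optional[Node]) -> List[str]:
    """Modules touching the left edge: the root and its chain of right children."""
    names = []
    while root is not None:
        names.append(root.name)
        root = root.right
    return names


def cluster_on_bottom(root: Optional[Node], cluster: Set[str]) -> bool:
    """True if every module in `cluster` lies on the bottom boundary, consecutively."""
    if not cluster:
        return True
    chain = bottom_boundary(root)
    hits = [i for i, name in enumerate(chain) if name in cluster]
    # All cluster members must be present, and their chain indices must be contiguous.
    return len(hits) == len(cluster) and hits == list(range(hits[0], hits[0] + len(hits)))
```

A dynamic-repair step could, for example, move nodes onto or along the leftmost branch until such a check passes, but how the authors perform the repair is not described in the abstract.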
Citations: 1
Reduction of Current Mismatch in PLL Charge Pump
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.45
H. Fazeel, L. Raghavan, Chandrasekaran Srinivasaraman, Manish Jain
Low static phase offset is desired in phase-locked loops (PLLs) employed in high-speed I/O interfaces and frequency synthesizers. In this work, the non-idealities in the phase frequency detector and charge pump that contribute to static phase offset are studied, and their relative contributions are analyzed in detail. A new charge pump architecture with reduced mismatch between the Up and Dn current sources is presented. It uses a single two-stage amplifier for both current steering and mismatch reduction. The efficacy of this architecture is demonstrated with simulation results for a PLL running at an input reference frequency of 500MHz in 65nm CMOS technology.
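A first-order view of why Up/Dn current mismatch produces static phase offset: in lock, the net charge delivered per cycle must be zero. If both sources are on for the PFD reset time t_r and one is stronger, the weaker source must stay on longer by Δt = t_r·|I_up − I_dn| / min(I_up, I_dn). The sketch below evaluates this charge-balance approximation only; it ignores the other non-idealities the paper analyzes (such as switch charge injection or PFD path mismatch), and the function name and all values in the example other than the 500MHz reference are hypothetical.

```python
def static_phase_offset(i_up: float, i_dn: float, t_reset: float,
                        f_ref: float) -> tuple[float, float]:
    """
    Charge-balance estimate of the static phase offset caused by Up/Dn current mismatch.

    i_up, i_dn : charge-pump source/sink currents in amperes
    t_reset    : PFD reset (overlap) pulse width in seconds
    f_ref      : reference frequency in hertz
    Returns (offset in seconds, offset in degrees of the reference period).
    """
    if i_up <= 0 or i_dn <= 0:
        raise ValueError("currents must be positive")
    # The weaker source stays on longer so that I_strong * t_reset == I_weak * (t_reset + dt).
    dt = t_reset * abs(i_up - i_dn) / min(i_up, i_dn)
    degrees = 360.0 * dt * f_ref
    return dt, degrees


if __name__ == "__main__":
    # Hypothetical values: 5% mismatch, 100 ps reset pulse, 500 MHz reference.
    dt, deg = static_phase_offset(105e-6, 100e-6, 100e-12, 500e6)
    print(f"{dt * 1e12:.2f} ps, {deg:.3f} deg")
```

With these assumed values the offset is about 5 ps, or roughly 0.9° of the 2 ns reference period, and it scales linearly with both the mismatch and the reset pulse width, which is why reducing mismatch directly reduces static phase offset.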
Citations: 17
A Novel Low Area Overhead Body Bias FPGA Architecture for Low Power Applications
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.51
Sungmin Bae, K. Ramakrishnan, N. Vijaykrishnan
As technology scales, leakage power accounts for a dominant share of a chip's total power dissipation, reaching 50% or more at elevated temperatures in 45nm technology. Leakage power dissipation is especially problematic for FPGAs because of their reconfigurable nature and large number of inactive resources. Body biasing is an efficient leakage-reduction technique that has been widely adopted in 45nm low-power architectures. FPGAs with coarse-grained body bias control incur only about 10% area overhead, whereas refining the granularity to the finest level increases the area overhead dramatically, to over 100%. However, a coarse-grained body bias FPGA may not achieve satisfactory leakage power reduction, since all paths passing through a resource must have enough slack. To overcome this assignment limitation, we propose a novel FPGA architecture that combines body biasing with clock skew scheduling at a coarse-grained architecture level. Clock skew scheduling incurs only 3.35% additional area overhead to distribute slack to resources, instead of refining the minimum body-bias granularity. Further, we propose a body bias assignment algorithm that leverages the proposed architecture. Experimental results demonstrate that the proposed architecture achieves an average leakage reduction of about 76%, compared with 61% for the coarse-grained architecture.
Citations: 6
An Analytical Model to Study Optimal Area Breakdown between Cores and Caches in a Chip Multiprocessor
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.27
Taecheol Oh, Hyunjin Lee, Kiyeon Lee, Sangyeun Cho
A key design issue for chip multiprocessors (CMPs) is how to use the finite chip area to get the best system throughput. Today, the dominant area-consuming components in a CMP are processor cores and caches, and there is an important trade-off between the number of cores and the amount of cache on a single CMP chip. With too few cores, system throughput is limited by the number of threads; with too little cache capacity, the system may perform poorly due to frequent cache misses. This paper presents a simple and effective analytical model for studying the trade-off between core count and cache capacity in a CMP under a finite die area constraint. Our model differentiates shared, private, and hybrid cache organizations. It complements more detailed but time-consuming simulation approaches by enabling designers to quickly study how key chip area allocation parameters affect system performance.
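To make the trade-off concrete, here is a toy model of our own, not the paper's model. It assumes that die area not spent on cores becomes private cache split evenly among the cores, that each core's misses per instruction follow a power law in its cache size (the common square-root rule of thumb, α = 0.5), and that aggregate throughput is core count divided by CPI. It covers only the private organization, and every parameter name and value is hypothetical.

```python
def throughput(n_cores: int, die_area: float, core_area: float, mm2_per_mb: float,
               cpi_base: float = 1.0, mpi_at_1mb: float = 0.01,
               miss_penalty: float = 200.0, alpha: float = 0.5) -> float:
    """Aggregate instructions per cycle for n_cores, or 0.0 if no cache area remains."""
    cache_area = die_area - n_cores * core_area
    if n_cores <= 0 or cache_area <= 0:
        return 0.0
    cache_per_core_mb = cache_area / mm2_per_mb / n_cores
    # Power-law miss curve: misses per instruction scale as (cache size)^-alpha.
    mpi = mpi_at_1mb * cache_per_core_mb ** (-alpha)
    cpi = cpi_base + mpi * miss_penalty
    return n_cores / cpi


def best_core_count(die_area: float, core_area: float, mm2_per_mb: float) -> int:
    """Exhaustively search every core count that fits on the die."""
    max_cores = int(die_area // core_area)
    return max(range(1, max_cores + 1),
               key=lambda n: throughput(n, die_area, core_area, mm2_per_mb))
```

With, say, a 100 mm² die, 5 mm² cores and 2 mm²/MB of cache, the optimum falls strictly between the extremes: too few cores leave throughput thread-limited, and too many starve each core's cache, mirroring the abstract's argument.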
Citations: 34
All Digital Duty Cycle Correction Circuit in 90nm Based on Mutex
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.41
S. Ramasahayam, M. Srinivas
A duty cycle correction circuit (DCC) with fine resolution for high-frequency clocks is designed and tested at 1.2V in a 90nm CMOS process. SPICE simulations show that this duty cycle corrector can adjust the output duty cycle to 50±0.5% for a 500MHz input clock with input duty cycles ranging from 20% to 80%. The DCC introduces no delay in the forward path, which makes it suitable for multi-phase clock applications. The proposed implementation uses a high-frequency delay line and a MUTEX (mutual exclusion element)-based circuit to achieve high resolution.
Citations: 9
Lossless Compression Using Efficient Encoding of Bitmasks
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.18
C. Murthy, P. Mishra
Lossless compression is widely used in embedded systems to reduce memory requirements and improve communication bandwidth. Dictionary-based compression techniques are popular because of their good compression efficiency and fast decompression. Bitmask-based compression improves the effectiveness of dictionary-based approaches by recording minor differences using bitmasks. This paper proposes an efficient encoding of the bitmasks used in bitmask-based compression. We prove that an n-bit bitmask (recording n differences) can be encoded using only n-1 bits. This encoding improves compression efficiency while reducing decompression hardware overhead. We have applied our approach in a wide variety of domains, including code compression, FPGA bitstream compression and control word compression. Experimental results on a wide variety of benchmarks demonstrate that our approach improves compression efficiency by 3 to 10% without adding any decompression overhead.
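One way an n-bit mask can be stored in n−1 bits: if the mask window is always anchored at the first mismatching bit (here, the least significant one), the anchor bit of the mask is always 1 and need not be stored. The sketch below assumes that convention and a single mask per word; it illustrates the counting argument and is not necessarily the exact encoding used in the paper. Function names are hypothetical.

```python
from typing import Optional, Tuple


def encode_bitmask(word: int, entry: int, n: int) -> Optional[Tuple[int, int]]:
    """
    Describe `word` as dictionary `entry` XOR an n-bit mask.
    Returns (position, stored) where `stored` holds only n-1 mask bits,
    or None if the words are identical or differ outside one n-bit window.
    """
    diff = word ^ entry
    if diff == 0:
        return None  # exact dictionary match: no bitmask needed
    position = (diff & -diff).bit_length() - 1  # index of lowest mismatching bit
    mask = diff >> position                      # bit 0 of mask is always 1
    if mask >= 1 << n:
        return None  # mismatches span more than n bits
    return position, mask >> 1                   # drop the implicit 1: n-1 bits remain


def decode_bitmask(entry: int, position: int, stored: int) -> int:
    """Restore the implicit 1, then re-apply the mask to the dictionary entry."""
    mask = (stored << 1) | 1
    return entry ^ (mask << position)
```

For example, `encode_bitmask(0b10100110, 0b10100000, 4)` anchors the window at bit 1 and stores only the 3 remaining mask bits; decoding re-inserts the implied 1 to recover the original word.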
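The abstract does not spell out the encoding, so the following is only a minimal Python sketch of dictionary-based compression with bitmask matching, written under stated assumptions: 32-bit words, a dictionary holding the most frequent words, a hypothetical 4-bit bitmask that may start at any bit position, and a hypothetical 2-bit prefix selecting the encoding. A word is stored as a dictionary index (exact hit), as an index plus position plus bitmask (it differs from an entry only within one 4-bit window), or uncompressed. If the position field is assumed to point at the lowest differing bit, the bitmask's lowest bit is always 1 and can be left implicit, which is one way an n-bit bitmask can fit in n-1 bits; the paper's actual encoding may differ. All function names (`build_dictionary`, `compress_word`, `decompress_word`, `encoded_bits`) are illustrative, not taken from the paper.

```python
from collections import Counter

WORD_BITS = 32   # assumed word width (e.g. 32-bit instructions)
MASK_BITS = 4    # hypothetical bitmask width n
PREFIX_BITS = 2  # hypothetical prefix selecting the D / B / U encodings


def build_dictionary(words, size):
    """Choose the `size` most frequent words as dictionary entries."""
    return [w for w, _ in Counter(words).most_common(size)]


def compress_word(word, dictionary):
    """Encode one word as ("D", idx), ("B", idx, pos, stored_mask) or ("U", word)."""
    for idx, entry in enumerate(dictionary):
        if word == entry:
            return ("D", idx)  # exact dictionary hit
    for idx, entry in enumerate(dictionary):
        diff = word ^ entry                    # nonzero: exact hits were handled above
        pos = (diff & -diff).bit_length() - 1  # index of the lowest differing bit
        mask = diff >> pos                     # differences relative to pos
        if mask < (1 << MASK_BITS):
            # Because pos is the lowest differing bit, mask & 1 == 1 always,
            # so only the upper MASK_BITS - 1 bits need to be stored.
            return ("B", idx, pos, mask >> 1)
    return ("U", word)  # no entry within one bitmask window: store raw


def decompress_word(code, dictionary):
    """Invert compress_word."""
    kind = code[0]
    if kind == "D":
        return dictionary[code[1]]
    if kind == "B":
        _, idx, pos, stored = code
        mask = (stored << 1) | 1  # restore the implicit lowest bit
        return dictionary[idx] ^ (mask << pos)
    return code[1]


def encoded_bits(code, dict_size):
    """Size in bits of one encoded word under the assumptions above."""
    index_bits = max(1, (dict_size - 1).bit_length())
    pos_bits = (WORD_BITS - 1).bit_length()  # 5 bits for 32-bit words
    if code[0] == "D":
        return PREFIX_BITS + index_bits
    if code[0] == "B":
        return PREFIX_BITS + index_bits + pos_bits + (MASK_BITS - 1)
    return PREFIX_BITS + WORD_BITS


if __name__ == "__main__":
    words = [0x12345678, 0x12345678, 0x1234567C, 0xDEADBEEF]
    dictionary = build_dictionary(words, size=1)
    codes = [compress_word(w, dictionary) for w in words]
    assert [decompress_word(c, dictionary) for c in codes] == words
    for c in codes:
        print(c, encoded_bits(c, len(dictionary)), "bits")
```

In this sketch, letting the position float to the first difference costs nothing extra, because an unaligned position field needs log2(32) = 5 bits wherever it points; that is what makes the dropped bit a net saving here. If bitmasks were restricted to aligned positions, the lowest mask bit would not be guaranteed to be 1, so this particular argument would not carry over unchanged.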
Citations: 10
Journal: 2009 IEEE Computer Society Annual Symposium on VLSI