2009 IEEE Computer Society Annual Symposium on VLSI: Latest Publications

Algorithms for Estimating Number of Glitches and Dynamic Power in CMOS Circuits with Delay Variations
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.57
Jins D. Alexander, V. Agrawal
Dynamic power dissipation of a CMOS VLSI circuit depends on the signal activity at gate outputs. The activity includes the steady-state logic transitions as well as glitches. The latter are a function of gate delays, which, for modern VLSI circuits, have wide process-related variations. Both average and peak power dissipation are useful and are traditionally estimated by Monte Carlo simulation. This is expensive, and the accuracy, especially for peak power, depends upon the number of circuit delay samples that are simulated. We present an alternative. We use zero-delay simulation of a vector pair to determine the steady-state logic activity. We derive linear-time algorithms that, using delay bounds for gates, determine the maximum, minimum and average number of transitions that each gate output can produce. From this information, we estimate the average and peak energy consumed by each vector pair in a given vector set. For a set of random vectors applied to the c7552 circuit, our analysis determined the per-vector energy consumption as 82.2 picojoules average and 196.3 picojoules peak. In comparison, Monte Carlo simulation of 1,000 circuit samples gave 82.8 picojoules average and 146.1 picojoules peak. The discrepancy in peak consumption would shrink if more samples were simulated in the Monte Carlo method. Even with 1,000 samples, the CPU time of the Monte Carlo analysis was three orders of magnitude greater than that of the alternative method we offer in this paper.
Citations: 11
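The abstract above turns per-gate transition counts into per-vector energy. A minimal sketch of that final step, assuming the standard CMOS switching-energy relation of 0.5 * C * Vdd^2 per transition at each gate output. The function name, capacitances, supply voltage, and transition counts are hypothetical illustrations, not values or code from the paper, and the sketch does not reproduce the paper's linear-time bounding algorithms.

```python
def vector_energy(transitions, capacitances, vdd):
    """Energy in joules for one vector pair.

    transitions[i]  : number of output transitions at gate i
    capacitances[i] : load capacitance at gate i, in farads (assumed known)
    vdd             : supply voltage in volts
    """
    if len(transitions) != len(capacitances):
        raise ValueError("one capacitance is needed per gate output")
    # Each transition at a node dissipates 0.5 * C * Vdd^2.
    return sum(0.5 * c * vdd * vdd * n
               for n, c in zip(transitions, capacitances))


# Hypothetical three-gate circuit (all numbers are made up).
caps = [10e-15, 20e-15, 15e-15]   # farads
min_transitions = [0, 1, 1]       # lower bound per gate output
max_transitions = [2, 3, 1]       # upper bound per gate output

low = vector_energy(min_transitions, caps, vdd=1.0)
high = vector_energy(max_transitions, caps, vdd=1.0)
print(f"energy bounds: {low:.3e} J to {high:.3e} J")
```

Feeding the minimum and maximum transition counts through the same function gives lower and upper energy bounds for a vector pair; taking the largest upper bound over a vector set would give a peak estimate in the spirit of the abstract.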
Secure Leakage-Proof Public Verification of IP Marks in VLSI Physical Design
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.35
Debasri Saha, S. Sur-Kolay
Reuse of Intellectual Property (IP) of VLSI physical design facilitates integration of more components on a single chip within a shrinking time-to-market. For intellectual property protection (IPP), various kinds of IP marks are embedded into the design for establishing the veracity of a legal owner. However, public verification of IP marks is not leakage-proof. Current techniques include, in addition to private marks, a sufficiently large set of public marks containing a header and a message body to facilitate only public verification, at the cost of a significant increase in design overhead. But these techniques are not effective, as attackers manage to obtain potential clues to tamper with public marks, rendering public verification invalid, and may also override the marks to include their own signature, resulting in wrong public identification of the IP owner. Here we propose a zero-knowledge protocol to ensure robust, convincing and absolutely leakage-proof public verification with the help of private marks. We have tested our protocol on FPGA benchmarks. The results on overhead and robustness are encouraging.
Citations: 2
A Process Variation Tolerant Self-Compensating Sense Amplifier Design
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.50
A. Choudhary, S. Kundu
Lithography-related CD variations, fluctuations in dopant density, oxide thickness and parametric variations of devices are identified as major challenges in the ITRS. Due to the growth in size of embedded SRAMs as well as the usage of sense-amplifier-based signaling techniques, process variation in sense amplifiers leads to significant loss of yield. In this paper, we present a process-variation-tolerant self-compensating sense amplifier design using active compensation circuitry. Results from statistical simulation in a 32 nm process show that the proposed active compensation is highly effective in restoring yield to a level comparable to that of sense amplifiers without significant process variations.
Citations: 14
Context-aware Post Routing Redundant Via Insertion
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.39
P. Chu, Rung-Bin Lin, Da-Wei Hsu, Yu-Hsing Chen, Wei-Chiu Tseng
Effective algorithms have been invented for post-routing redundant via insertion (RVI). However, implementations of these algorithms often ignore some practical issues. In this article, we implement a post-routing RVI algorithm that takes into account interconnect contexts during RVI. Experimental results show that our context-aware RVI on average raises the via1 (vias between metal layers 1 and 2) insertion rate from 37.4% to 72.1% and the total insertion rate from 72.5% to 85.8%. On average, it increases the RVI rate of critical paths by 3.6%. In addition, with redundant pin-area minimization, our approach reduces the metal 1 and metal 2 area used for RVI at pins by 3%.
Citations: 1
Modern Floorplanning with Boundary Clustering Constraint
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.24
Li Li, Yuchun Ma, N. Xu, Yu Wang, Xianlong Hong
With the development of SoC designs, modern floorplanning typically needs to provide extra options to meet the different emerging requirements of hierarchical designs, such as boundary constraints for I/O connection and clustering constraints for performance and reliability. This paper addresses modern floorplanning with the boundary clustering constraint. It has been empirically shown that such modern constraints severely restrict the solution space; that is, a large number of randomly generated floorplans might be infeasible. In order to search the feasible solutions effectively, the feasibility conditions of the B*-tree representation under the boundary clustering constraint are investigated. These properties, coupled with an efficient simulated annealing algorithm, provide a way to produce feasible floorplans by dynamic repair, which transforms an infeasible solution into a feasible one whenever the constraint is violated. Our algorithm is verified using the MCNC and GSRC benchmarks, and the empirical results show that it can obtain promising solutions in acceptable time.
Citations: 1
Reduction of Current Mismatch in PLL Charge Pump
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.45
H. Fazeel, L. Raghavan, Chandrasekaran Srinivasaraman, Manish Jain
Low static phase offset is desired in Phase Locked Loops (PLLs) employed in high-speed I/O interfaces and frequency synthesizers. In this work, non-idealities in the phase frequency detector and charge pump that contribute to static phase offset are studied and their relative contributions analyzed in detail. A new charge pump architecture with reduced mismatch between the Up and Dn current sources is presented. It uses a single two-stage amplifier for both current steering and reduction of mismatch. The efficacy of this architecture is demonstrated with simulation results on a PLL running at an input reference frequency of 500 MHz in 65 nm CMOS technology.
Citations: 17
A Novel Low Area Overhead Body Bias FPGA Architecture for Low Power Applications
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.51
Sungmin Bae, K. Ramakrishnan, N. Vijaykrishnan
As technology scales, leakage power accounts for a dominant part of the total power dissipation of the chip, reaching 50% or even higher at elevated temperatures in 45 nm technology. Leakage power dissipation is especially problematic for FPGAs due to their reconfigurable nature and large number of inactive resources. Body biasing is an efficient technique to reduce leakage current and has been widely adopted in 45 nm low-power architectures. FPGAs with coarse-grained body bias control incur only about 10% area overhead, while increasing the granularity to the finest level dramatically increases the area overhead to over 100%. However, a coarse-grained body bias control FPGA may not achieve satisfactory leakage power reduction, since all the paths passing through a resource must have enough slack. To overcome this assignment limitation, we propose a novel FPGA architecture that uses the body biasing technique and clock skew scheduling at a coarse-grained architecture level. The clock skew scheduling technique incurs only 3.35% additional area overhead in order to distribute slack to the resources, instead of increasing the minimum body-bias granularity. Further, we propose a body bias assignment algorithm to leverage the proposed architecture. Experimental results demonstrate that the proposed architecture achieves an average leakage reduction of about 76%, compared with 61% for the coarse-grained architecture.
Citations: 6
An Analytical Model to Study Optimal Area Breakdown between Cores and Caches in a Chip Multiprocessor
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.27
Taecheol Oh, Hyunjin Lee, Kiyeon Lee, Sangyeun Cho
A key design issue for chip multiprocessors (CMPs) is how to exploit the finite chip area to get the best system throughput. Today, the most dominant area-consuming components in a CMP are processor cores and caches. There is an important trade-off between the number of cores and the amount of cache in a single CMP chip. If we have too few cores, the system throughput will be limited by the number of threads. If we have too little cache capacity, the system may perform poorly due to frequent cache misses. This paper presents a simple and effective analytical model to study the trade-off between the core count and the cache capacity in a CMP under a finite die area constraint. Our model differentiates shared, private, and hybrid cache organizations. Our work will complement more detailed yet time-consuming simulation approaches by enabling one to quickly study how key chip area allocation parameters affect the system performance.
Citations: 34
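The core-versus-cache trade-off described in the abstract above can be illustrated with a toy model. This is a sketch under loudly stated assumptions, not the paper's model: it assumes private caches with the cache area split evenly among cores, a power-law miss rate (the commonly quoted square-root rule of thumb, exponent 0.5), and throughput equal to core count divided by cycles per instruction. Every parameter name and default value below is hypothetical.

```python
def throughput(n_cores, die_area, core_area, area_per_mb,
               cpi_base=1.0, miss_penalty=200.0,
               m0=0.01, c0_mb=1.0, alpha=0.5):
    """Relative throughput of n_cores sharing a fixed die area.

    Area left after placing the cores is spent on cache and split evenly
    among the cores (private-cache assumption).  Misses per instruction
    follow m0 * (C / c0_mb) ** -alpha, a hypothetical power law.
    """
    cache_area = die_area - n_cores * core_area
    if n_cores <= 0 or cache_area <= 0:
        return 0.0  # infeasible: no cores, or no room left for cache
    cache_mb_per_core = cache_area / area_per_mb / n_cores
    misses_per_instr = m0 * (cache_mb_per_core / c0_mb) ** (-alpha)
    cpi = cpi_base + misses_per_instr * miss_penalty
    return n_cores / cpi


def best_core_count(die_area, core_area, area_per_mb, **model):
    """Sweep every feasible core count and return the best one."""
    max_cores = int(die_area // core_area)
    return max(range(1, max_cores + 1),
               key=lambda n: throughput(n, die_area, core_area,
                                        area_per_mb, **model))


# Hypothetical die: 400 area units, 10 per core, 5 per MB of cache.
best = best_core_count(die_area=400.0, core_area=10.0, area_per_mb=5.0)
print("best core count under these assumptions:", best)
```

Sweeping the core count exposes both failure modes named in the abstract: with few cores the thread count limits throughput, and with many cores the shrinking cache per core drives the miss term up.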
On-line MPSoC Scheduling Considering Power Gating Induced Power/Ground Noise
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.54
Yan Xu, Weichen Liu, Yu Wang, Jiang Xu, Xiaoming Chen, Huazhong Yang
Power gating induced power/ground (P/G) noise is a major reliability problem faced by low-power MPSoCs using power gating techniques. Powering a processing unit in an MPSoC on and off induces large P/G noise and can cause timing divergence and even functional errors in surrounding processing units. P/G noise differs from thermal or energy effects, which are accumulative. The noise level should be predicted and victim circuits should be protected before the noise is induced. Hence, the power-gating-aware scheduling problem with consideration of P/G noise should be solved using an on-line method that considers the run-time variation of tasks' execution time. In this paper, we formulate an on-line task scheduling problem with consideration of P/G noise, based on our detailed P/G noise analysis platform for MPSoCs. An efficient on-line Greedy Heuristic (GH) algorithm that adapts well to real-time variation is proposed to reduce the noise protection penalty and improve MPSoC performance. Our experiments show that the algorithm achieves an average 26% performance improvement together with an average 73% saving in noise protection penalty compared with the conservative stop-go method. We also compare our technique with a two-step solution that computes a static schedule at compile time and adjusts the schedule according to run-time variations. For benchmarks with a larger number of tasks, the GH method achieves an impressive performance improvement compared with the two-step solution.
Citations: 12
Lossless Compression Using Efficient Encoding of Bitmasks
Pub Date : 2009-05-13 DOI: 10.1109/ISVLSI.2009.18
C. Murthy, P. Mishra
Lossless compression is widely used to improve both memory requirements and communication bandwidth in embedded systems. Dictionary-based compression techniques are very popular because of their good compression efficiency and fast decompression mechanism. Bitmask-based compression improves the effectiveness of dictionary-based approaches by recording minor differences using bitmasks. This paper proposes an efficient encoding of the bitmasks used in bitmask-based compression. We prove that an n-bit bitmask (which records n differences) can be encoded using only n-1 bits. This encoding improves compression efficiency while reducing decompression hardware overhead. We have applied our approach in a wide variety of domains, including code compression, FPGA bitstream compression, and control word compression. Our experimental results using a wide variety of benchmarks demonstrate that our approach improves the compression efficiency by 3 to 10% without adding any decompression overhead.
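The general idea of bitmask-based dictionary compression mentioned in the abstract above can be sketched as follows. This is a minimal illustration, assuming 32-bit words, a single 4-bit bitmask per word, and bitmask positions aligned to 4-bit boundaries; the tuple-based code format and all names are hypothetical. It does not implement the paper's n-1 bit bitmask encoding, only the baseline notion of storing a dictionary index plus a small mask for near matches.

```python
WORD_BITS = 32
MASK_BITS = 4  # hypothetical bitmask width


def find_mask(diff):
    """Return (slot, mask) if diff fits in one aligned MASK_BITS window."""
    for pos in range(0, WORD_BITS, MASK_BITS):
        mask = (diff >> pos) & ((1 << MASK_BITS) - 1)
        if diff == mask << pos:
            return pos // MASK_BITS, mask
    return None


def compress(words, dictionary):
    """Encode each word as an exact match, a masked match, or a raw word."""
    codes = []
    for word in words:
        if word in dictionary:
            codes.append(("dict", dictionary.index(word)))
            continue
        for index, entry in enumerate(dictionary):
            hit = find_mask(word ^ entry)  # XOR exposes the differing bits
            if hit is not None:
                codes.append(("mask", index, hit[0], hit[1]))
                break
        else:
            codes.append(("raw", word))  # no entry is close enough
    return codes


def decompress(codes, dictionary):
    """Invert compress(): re-apply each bitmask with XOR."""
    words = []
    for code in codes:
        if code[0] == "dict":
            words.append(dictionary[code[1]])
        elif code[0] == "mask":
            _, index, slot, mask = code
            words.append(dictionary[index] ^ (mask << (slot * MASK_BITS)))
        else:
            words.append(code[1])
    return words


dictionary = [0x12345678, 0xFFFF0000]
words = [0x12345678, 0x1234567A, 0xDEADBEEF, 0xFFFF0300]
codes = compress(words, dictionary)
assert decompress(codes, dictionary) == words  # lossless round trip
```

In this sketch a masked match costs a dictionary index, a slot number, and the mask bits, which is why shaving even one bit off the stored mask, as the abstract claims is possible, would matter for the overall compression ratio.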
Citations: 10