2008 IEEE International Conference on Computer Design: Latest Papers

Temperature-aware clock tree synthesis considering spatiotemporal hot spot correlations
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751848
Chunchen Liu, Junjie Su, Yiyu Shi
Temperature variation in microprocessors is a workload-dependent problem. In such a design, the clock skew should be minimized with respect to temperature variation. Existing work has studied clock tree embedding perturbation under time-variant temperature variation, but no existing method reduces skew variation. This paper develops an efficient yet effective method (TMST) that performs hotspot-avoiding embedding and thermal-aware routing simultaneously: hotspot-avoiding embedding keeps the tree topology out of areas with a high probability of high temperature, and thermal-aware routing reduces skew by routing tree paths through areas with smoother temperature. With a thermally tolerant tree structure, our method reduces not only delay skew but also skew variation (skew violation range). Our TMST solution reduces skew variation by 2X compared with the greedy-DME (GDME) method of Edahiro and the existing thermal-aware clock synthesis methods TACO and PECO. Across scales from 100 temperature maps down to 1, TMST also guarantees the smallest wire length overflow. TMST reduces the worst-case skew by up to 4X relative to PECO and 5X relative to TACO.
Citations: 14
Two dimensional highly associative level-two cache design
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751934
Chuanjun Zhang, Bing Xue
High associativity is important for level-two cache designs [9]. Implementing CAM-based highly associative caches (CAM-HAC), however, is costly in hardware and scales poorly. We propose to implement the CAM-HAC in macro-blocks to improve scalability. Each macro-block contains 128 rows and 8 columns of cache blocks. We name it the Two dimensional Cache, or T-Cache. Each macro-block has an associativity equivalent to 128×8 = 1024-way. Twelve bits of the T-Cache's tag are implemented using CAM, while the remaining tag bits use SRAM. Furthermore, random replacement is used in rows to balance the usage of cache sets, while LRU is used in columns to select the victim from a row. The hardware complexity of replacement is greatly reduced compared to a traditional CAM-HAC that uses LRU alone. Experimental results show that the T-Cache achieves a 16% miss rate reduction over a traditional 8-way unified L2 cache. This translates into an average IPC improvement of 5%, and as high as 18%. The T-Cache exhibits a 4% saving in total memory access-related energy due to the reduction in applications' execution time.
Citations: 3
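The two-level replacement policy described in the T-Cache abstract above (random choice of a row, then LRU among the columns of that row) can be illustrated with a small software model. This is only a minimal sketch under stated assumptions: the class name `MacroBlock`, the logical-timestamp LRU, and the full-tag comparison are hypothetical simplifications, and the split of the tag into 12 CAM bits plus SRAM bits is not modeled.

```python
import random

ROWS, COLS = 128, 8  # macro-block geometry stated in the abstract (128 x 8 = 1024 blocks)


class MacroBlock:
    """Toy model of one macro-block: random row choice, LRU within the row."""

    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        # tags[r][c] holds the tag stored at row r, column c (None means empty).
        self.tags = [[None] * COLS for _ in range(ROWS)]
        # last_used[r][c] is a logical timestamp used for LRU inside a row.
        self.last_used = [[0] * COLS for _ in range(ROWS)]
        self.clock = 0

    def lookup(self, tag):
        """Search every block (stands in for the associative match)."""
        for r in range(ROWS):
            for c in range(COLS):
                if self.tags[r][c] == tag:
                    self.clock += 1
                    self.last_used[r][c] = self.clock
                    return (r, c)
        return None

    def choose_victim(self):
        """Pick a row at random, then the least recently used column in it."""
        r = self.rng.randrange(ROWS)
        c = min(range(COLS), key=lambda i: self.last_used[r][i])
        return r, c

    def access(self, tag):
        """Return True on a hit; on a miss, fill a victim block and return False."""
        if self.lookup(tag) is not None:
            return True
        r, c = self.choose_victim()
        self.clock += 1
        self.tags[r][c] = tag
        self.last_used[r][c] = self.clock
        return False
```

In this model the LRU state only has to order 8 entries per row rather than all 1024 blocks, which is one way to read the abstract's claim that replacement hardware is simpler than in a CAM-HAC that uses LRU alone.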
Systematic design of high-radix Montgomery multipliers for RSA processors
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751894
A. Miyamoto, N. Homma, T. Aoki, Akashi Satoh
The present paper proposes a systematic design approach to provide the optimal high-radix Montgomery multipliers for an RSA processor satisfying user requirements. We introduce three multiplier-based architectures using different intermediate-data forms, (i) single form, (ii) semi carry-save form, and (iii) carry-save form, and combine them with a wide variety of arithmetic components. Their radices are also parameterized from 2^8 to 2^64. A total of 202 designs for 1,024-bit RSA processors were obtained for each radix and were synthesized using a 90-nm CMOS standard cell library. The resulting designs range from the smallest, 0.9 Kgates at 137.8 ms/RSA, to the fastest, 1.8 ms/RSA at 74.7 Kgates. The optimal design for given user requirements can easily be selected from all the combinations. In addition to choosing the datapath architecture, the arithmetic component, and the radix parameters, the proposed systematic approach can also adopt other process technologies.
Citations: 10
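For readers unfamiliar with the operation these multipliers implement, the following is a minimal software sketch of word-serial (high-radix) Montgomery multiplication with radix 2^k. It is the textbook algorithm, not the hardware architectures of the paper; the function names are hypothetical, and the modular inverse via `pow(n, -1, r)` assumes Python 3.8 or later.

```python
def montgomery_radix(n, k):
    """Return R = 2^(k * words), where words is the number of k-bit digits of n."""
    words = -(-n.bit_length() // k)  # ceiling division
    return 1 << (k * words)


def montgomery_multiply(a, b, n, k):
    """Return a * b * R^{-1} mod n for odd n, processing a in radix-2^k digits."""
    if n % 2 == 0 or not (0 <= a < n and 0 <= b < n):
        raise ValueError("n must be odd and 0 <= a, b < n")
    r = 1 << k
    mask = r - 1
    words = -(-n.bit_length() // k)
    n_prime = (-pow(n, -1, r)) % r       # n' = -n^{-1} mod 2^k
    s = 0
    for i in range(words):
        a_i = (a >> (k * i)) & mask      # i-th radix-2^k digit of a
        s += a_i * b
        q = ((s & mask) * n_prime) & mask  # makes s + q*n divisible by 2^k
        s = (s + q * n) >> k             # exact division by the radix
    if s >= n:                           # s < 2n holds, so one subtraction suffices
        s -= n
    return s
```

A larger k means fewer loop iterations (for a 1,024-bit modulus, 128 iterations at radix 2^8 versus 16 at radix 2^64) at the cost of wider digit multiplications, which is the kind of area versus speed trade-off the abstract reports.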
Seamless sequence of software defined radio designs through hardware reconfigurability of FPGAs
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751871
A. H. Gholamipour, E. Bozorgzadeh, L. Bao
Software Defined Radio (SDR) base stations can compensate for failures in disaster scenarios by assimilating different communication technologies. FPGAs play an important role in the platform of an SDR base station because of the flexibility and DSP processing power that they deliver. The flexibility of FPGAs comes at the high cost of reconfiguration time overhead, which can be a serious deterrent because of the QoS requirements of real-time traffic. In this paper we propose a solution that reduces reconfiguration time overhead at the system level, given the configuration of each wireless system. We then go a step further and integrate our solution into a floorplanner to generate placements for wireless systems that systematically hide or reduce reconfiguration time overhead. Our experiments show the effectiveness of our approach.
Citations: 5
Design and evaluation of an optical CPU-DRAM interconnect
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751906
Amit Hadke, Tony Benavides, R. Amirtharajah, M. Farrens, V. Akella
We present OCDIMM (Optically Connected DIMM), a CPU-DRAM interface that uses multiwavelength optical interconnects. We show that OCDIMM is more scalable and offers higher bandwidth and lower latency than FBDIMM (Fully-Buffered DIMM), a state-of-the-art electrical alternative. Though OCDIMM is more power efficient than FBDIMM, we show that ultimately the total power consumption in the memory subsystem is a key impediment to scalability and thus to achieving truly balanced computing systems in the terascale era.
Citations: 9
Prototyping a hybrid main memory using a virtual machine monitor
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751873
Dong Ye, Aravind Pavuluri, Carl A. Waldspurger, Brian Tsang, Bohuslav Rychlik, Steven Woo
We use a novel virtualization-based approach for computer architecture performance analysis. We present a case study analyzing a hypothetical hybrid main memory, which consists of a first-level DRAM augmented by a 10-100x slower second-level memory. This architecture is motivated by the recent emergence of lower-cost, higher-density, and lower-power alternative memory technologies. To model such a system, we customize a virtual machine monitor (VMM) with delay-simulation and instrumentation code. Benchmarks representing server, technical computing, and desktop productivity workloads are evaluated in virtual machines (VMs). Relative to baseline all-DRAM systems, these workloads experience widely varying performance degradation when run on hybrid main memory systems which have significant amounts of second-level memory.
Citations: 19
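To get a feel for why a 10-100x slower second level can matter so much, a back-of-the-envelope weighted average is enough. The sketch below is not the paper's VMM-based delay simulation; the function name and the assumption that every access costs either the DRAM latency or a fixed multiple of it are hypothetical simplifications.

```python
def relative_access_latency(f_slow, slow_factor):
    """Average access latency relative to an all-DRAM system.

    f_slow      -- fraction of accesses served by the second-level memory (0..1)
    slow_factor -- how many times slower the second level is than DRAM
    """
    if not 0.0 <= f_slow <= 1.0:
        raise ValueError("f_slow must be between 0 and 1")
    if slow_factor < 1.0:
        raise ValueError("slow_factor is expected to be at least 1")
    # DRAM accesses cost 1 unit each; second-level accesses cost slow_factor units.
    return (1.0 - f_slow) + f_slow * slow_factor
```

Under this simple model, sending 10% of accesses to a 10x slower memory gives an average latency of 0.9 + 0.1 × 10 = 1.9 times the all-DRAM baseline, so the outcome depends strongly on how often a workload touches the second level.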
Early stage FPGA interconnect leakage power estimation
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751898
Shilpa Bhoj, D. Bhatia
Increasing transistor densities, the rising popularity of mobile applications, and migration towards eco-friendly computing systems have made power dissipation a key FPGA design issue. To meet stringent budgets, system architects need accurate estimates of power distribution at various design stages. In this work, we make several key contributions to FPGA leakage power estimation. First, we develop an accurate and efficient model to estimate total interconnect leakage power at various design stages prior to routing. Our methods derive leakage power estimates based on predicted values of routing congestion and interconnect resource utilization. We then extend the model to accommodate complex segmented routing architectures and low-leakage architectures. Finally, we formulate relations to generate post-placement leakage power estimates for individual routing channels. Our models for overall leakage power estimation achieve average accuracy rates of 93% and 89% for uniform and segmented routing architectures, respectively. Experimental results also establish the accuracy of the channel-level estimation models at 85% and 80% for uniform and segmented routing structures, respectively. Our models and techniques would help designers make informed decisions by providing information on the power consumption of the interconnect fabric well before routing. Additionally, the equations can be used for architectural exploration and embedded in power- and thermal-aware CAD tools.
Citations: 3
Router and cell library co-development for improving redundant via insertion at pins
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751929
Wei-Chiu Tseng, Yu-Hsing Chen, Rung-Bin Lin
In this paper we propose a synergetic approach that integrates router design and cell library engineering for improving post-routing via1 (via between M1 and M2) doubling rate at pins. We develop a double-via (DV) aware multilevel router to exploit the via1 doubling possibilities provided to the cells in a conventional as well as a DV-driven cell library. Compared to a non-DV-aware router using a conventional cell library, our approach using a DV-driven library can on average raise via1 doubling rate by 34%, raise total via doubling rate by 11%, reduce the total number of vias by 3%, and reduce the total number of via1s by 8%. All this can be achieved without incurring any performance and area penalties.
Citations: 5
Comparative analysis of NBTI effects on low power and high performance flip-flops
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751862
K. Ramakrishnan, Xiaoxia Wu, N. Vijaykrishnan, Yuan Xie
Mitigating the circuit aging effect in digital circuits has become a very important concern for current and future technology nodes. Negative Bias Temperature Instability (NBTI) is one of the most important circuit aging mechanisms, which can incur timing errors. Flip-flops play a vital role as storage elements in pipelined architectures and are prone to effects of aging. NBTI increases the transistor threshold voltage, affecting the performance of the chip. In this paper, we study the effects of NBTI on the timing characteristics of different types of low power and high performance flip-flops. Factors such as input data probability and temperature which affect the degradation rate are also analyzed.
Citations: 21
Optimization of Propagate Partial SAD and SAD tree motion estimation hardwired engine for H.264
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751881
Zhenyu Liu, S. Goto, T. Ikenaga
The variable block size motion estimation algorithm is an efficient approach to reducing temporal redundancy, and it has been adopted by the latest video coding standard, H.264/AVC. The increase in computational complexity that comes with the variable block size technique makes a hardwired accelerator essential, especially for real-time applications. In this paper, the authors apply architecture-level and circuit-level approaches to improve the performance of the Propagate Partial SAD and SAD Tree hardwired engines, which outperform their counterparts when the impact of supporting the variable block size technique is considered. Experiments demonstrate that, compared with the original architectures, the proposed approaches save 14.7% and 18.0% of the hardware cost for the Propagate Partial SAD architecture and the SAD Tree architecture, respectively. With TSMC 0.18 μm 1P6M CMOS technology, the proposed Propagate Partial SAD architecture attains a 231.6 MHz operating frequency at a cost of 84.1 k gates. Correspondingly, the execution speed of the optimized SAD Tree architecture is improved to 204.8 MHz with an 88.5 k gate hardware overhead.
Citations: 13
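The cost that variable block sizes add to motion estimation is easier to see in code. The sketch below computes the sum of absolute differences (SAD) for the sixteen 4x4 blocks of a 16x16 macroblock and then reuses those values to form the SADs of every larger H.264 partition. It is a plain software illustration of SAD reuse, not the Propagate Partial SAD or SAD Tree hardware from the paper; all function names are hypothetical.

```python
def sad_4x4_grid(cur, ref):
    """Return a 4x4 grid whose entry [by][bx] is the SAD of one 4x4 pixel block.

    cur and ref are 16x16 lists of integer pixel values.
    """
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid


def merge(grid, h, w):
    """Sum 4x4-block SADs into SADs of partitions spanning h x w such blocks."""
    return [[sum(grid[by + dy][bx + dx] for dy in range(h) for dx in range(w))
             for bx in range(0, 4, w)]
            for by in range(0, 4, h)]


def variable_block_sads(cur, ref):
    """Return the SADs of all H.264 partitions of one macroblock.

    Keys are width x height in pixels; values are grids of SADs.
    """
    grid = sad_4x4_grid(cur, ref)
    # (rows, cols) of 4x4 blocks covered by each partition shape.
    shapes = {"4x4": (1, 1), "8x4": (1, 2), "4x8": (2, 1), "8x8": (2, 2),
              "16x8": (2, 4), "8x16": (4, 2), "16x16": (4, 4)}
    return {name: merge(grid, h, w) for name, (h, w) in shapes.items()}
```

Each candidate motion vector therefore yields 16 + 8 + 8 + 4 + 2 + 2 + 1 = 41 SAD values per macroblock, which is why the pixel differences are computed once and the larger partitions are built by addition.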