2009 22nd International Conference on VLSI Design: Latest Papers

New Techniques for Accelerating Small Delay ATPG and Generating Compact Test Sets
Pub Date : 2009-01-05 DOI: 10.1109/VLSI.Design.2009.64
Boxue Yin, D. Xiang, Zhen Chen
Small-delay defect testing faces two challenges. First, selecting the longest testable path for every target fault during ATPG consumes considerable CPU time. Second, the test data volume is very large. In this paper, we propose two strategies to address these problems. To accelerate ATPG, we propose a scheme that selects paths in advance; unlike previous work, it aims to find fewer paths that cover more faults ahead of test generation. To reduce the test data volume, we propose a novel scan-based test scheme. We partition the scan flip-flops into several scan chains. The first scan flip-flop of every scan chain works in enhanced scan mode, and the other scan flip-flops work in broadside mode. This can significantly increase the number of don't-care bits in every test pattern and provide more room for test compaction, so the test pattern count can be reduced significantly. Experimental results show the efficiency of these techniques.
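The scan partitioning described in this abstract can be pictured with a minimal sketch. It assumes a simple contiguous split of the flip-flop list; the names `ScanCell` and `partition_scan_cells` are hypothetical, and the abstract does not say how the paper actually chooses the chains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanCell:
    name: str
    chain: int
    mode: str  # "enhanced" for the head of a chain, "broadside" otherwise


def partition_scan_cells(flip_flops, num_chains):
    """Split flip-flops into contiguous chains (an assumed policy).

    The first cell of each chain is tagged as enhanced scan and the
    remaining cells as broadside, mirroring the abstract's description.
    """
    if num_chains < 1 or num_chains > len(flip_flops):
        raise ValueError("num_chains must be between 1 and len(flip_flops)")
    base, extra = divmod(len(flip_flops), num_chains)
    cells = []
    start = 0
    for chain in range(num_chains):
        # Spread the remainder over the first `extra` chains.
        length = base + (1 if chain < extra else 0)
        for pos, name in enumerate(flip_flops[start:start + length]):
            mode = "enhanced" if pos == 0 else "broadside"
            cells.append(ScanCell(name, chain, mode))
        start += length
    return cells
```

With seven flip-flops and three chains, this tags exactly three cells as enhanced scan, one per chain, and leaves the other four in broadside mode.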
Citations: 7
Infrastructures for Education, Research and Industry in Microelectronics A Look Worldwide and a Look at India
Pub Date : 2009-01-05 DOI: 10.1109/VLSI.Design.2009.17
B. Courtois, K. Torki, S. Dumont, S. Eyraud, J.-F. Paillotin, G. D. Pendina
Infrastructures that provide access to custom integrated hardware manufacturing facilities are important because they allow students and researchers to access professional facilities at a reasonable cost, and they allow companies to access small-volume production, which is otherwise difficult to obtain directly from manufacturers. This paper reviews the most recent developments at CMP, such as the introduction of a CMOS 45nm process, the cooperation between the major infrastructure services available worldwide, and recent developments with respect to India. The conclusion addresses technical developments as well as considerations such as globalization and excellence.
Citations: 0
Dedicated Rewriting: Automatic Verification of Low Power Transformations in RTL
Pub Date : 2009-01-05 DOI: 10.1109/VLSI.Design.2009.85
V. Viswanath, Shobha Vasudevan, J. Abraham
We present dedicated rewriting, a novel technique to automatically prove the correctness of low power transformations in hardware systems described at the Register Transfer Level (RTL). We guarantee the correctness of any low power transformation by providing a functional equivalence proof of the hardware design before and after the transformation. We characterize low power transformations as rules, within our system. Dedicated rewriting is a highly automated deductive verification technique specially honed for proving correctness of low power transformations. We provide a notion of equivalence and establish the equivalence proof within our dedicated rewriting system. We demonstrate our technique on a non-trivial case study. We show equivalence of a Verilog RTL implementation of a Viterbi decoder, a component of the DRM SoC, before and after the application of multiple low power transformations.
Citations: 7
Why is Design Automation and Reuse of Analog Designs Increasingly Trailing the Digital World?
Pub Date : 2009-01-05 DOI: 10.1109/VLSI.Design.2009.107
G. Agarwal, Prakash Bare
Summary: Demand for high performance in today's systems requires some key IP components to be designed in analog, whereas quick turnaround time requires significant use of digital IPs. While there has been significant progress in design automation and design reuse of digital circuits in the last couple of decades, not much has changed for analog design. Design capture in low
Citations: 0
Efficient Implementation of Floating-Point Reciprocator on FPGA
Pub Date : 2009-01-05 DOI: 10.1109/VLSI.Design.2009.12
M. Jaiswal, N. Chandrachoodan
In this paper we present an efficient FPGA implementation of a reciprocator for both IEEE single-precision and double-precision floating-point numbers. The method is based on look-up tables and partial block multipliers. Compared with previously reported work, the modules occupy less area with higher performance and lower latency. The designs trade off either 1 unit in the last place (ulp) or 2 ulp of accuracy (for double or single precision, respectively), without rounding, to obtain a better implementation. Rounding can also be added to the design to restore some accuracy at a slight cost in area.
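As a rough software illustration of table-driven reciprocal computation, the sketch below seeds the result from a small look-up table and refines it with Newton-Raphson steps. This is a substituted, well-known technique, not the paper's partial block multiplier datapath; the table width `TABLE_BITS`, the iteration count, and the function name `reciprocal` are assumptions chosen for the example.

```python
import math

TABLE_BITS = 8  # assumed index width, for illustration only

# Seed table: reciprocal of the midpoint of each mantissa interval in [1, 2).
RECIP_TABLE = [
    1.0 / (1.0 + (i + 0.5) / (1 << TABLE_BITS))
    for i in range(1 << TABLE_BITS)
]


def reciprocal(x, iterations=2):
    """Approximate 1/x from a table seed plus Newton-Raphson refinement.

    No special handling is attempted for results that overflow or underflow.
    """
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        raise ValueError("x must be finite and non-zero")
    # abs(x) = frac * 2**exp with frac in [0.5, 1); rescale to m in [1, 2).
    frac, exp = math.frexp(abs(x))
    m = frac * 2.0
    e = exp - 1
    # The leading TABLE_BITS fraction bits of m select the seed.
    index = int((m - 1.0) * (1 << TABLE_BITS))
    y = RECIP_TABLE[index]
    # Each step y <- y * (2 - m*y) roughly squares the relative error.
    for _ in range(iterations):
        y = y * (2.0 - m * y)
    # 1/x = (1/m) * 2**(-e), with the sign of x restored.
    return math.copysign(math.ldexp(y, -e), x)
```

The split into a table seed and a multiplicative refinement shows the general area-versus-accuracy trade-off: a larger table needs fewer refinement steps, and fewer steps leave a larger final error.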
Citations: 6
Exploring the Limits of Port Reduction in Centralized Register Files
Pub Date : 2009-01-05 DOI: 10.1109/VLSI.Design.2009.29
Sandeep Sirsi, Aneesh Aggarwal
Register file access falls on the critical path of a microprocessor because large, heavily ported register files are used to exploit more parallelism. In this paper, we focus on reducing register file complexity by reducing the number of register file read ports. The goal of this paper is to explore the limits of read port reduction in a centralized integer register file, i.e., how few read ports can be provided to a centralized integer register file while still maintaining performance. A naïve port reduction may result in significant performance degradation and does not give a true measure of the limits, while clever techniques may be able to further reduce the number of ports. Hence, in this paper, we drastically reduce the number of ports and then investigate techniques to improve the performance of the reduced-ported register file. Our experiments show that the techniques allow further port reduction by improving the performance of reduced-ported RFs. For instance, with our experimental parameters, the naïve port reduction method requires at least five read ports to keep the performance impact below 5%, whereas our techniques require only three ports.
Citations: 4
ReConfigurable Technologies
Pub Date : 2009-01-05 DOI: 10.1109/VLSI.Design.2009.123
Mona Mathur
It is envisioned that designers will in future have the complete flexibility of software at hardware speeds, which would substantially reduce cost and product turnaround time. The optimal performance needs of applications can be met if fine-grained field reconfiguration is made possible in hardware. Several problems and challenges need to be addressed: the specification of reconfigurable architectures and processors, software environments that support reconfiguration, the increasing heterogeneity and complexity of systems and SoCs, and power management. One of the goals of this talk is to stimulate a discussion on reconfigurable design by introducing some key issues.
Citations: 7
An Approach to Measure the Performance Impact of Dynamic Voltage Fluctuations Using Static Timing Analysis
Pub Date : 2009-01-05 DOI: 10.1109/VLSI.Design.2009.45
Ramamurthy Vishweshwara, R. Venkatraman, H. Udayakumar, N. Arvind
Design closure for predictable silicon performance is emerging as the most challenging digital VLSI design problem in advanced deep-submicron technology nodes. One of the significant problems is effective power-grid distribution, and the comprehension of the impact of voltage drops in the power grid on design timing and performance. This paper proposes a way to comprehend the complex interactions between timing and dynamic power drops without being significantly pessimistic, while also not losing accuracy. We highlight the heuristics that we have used to reduce the complexity of the timing analysis and the overall computation time. The overall method uses conventional analysis approaches for dynamic voltage drop and timing. It proposes options for comprehending the effects of dynamic voltage drops during traditional design-closure methods and also highlights means of validating any assumptions made. We present results comparing performance degradation under voltage-drop assumptions with traditional margin-based approaches; they show a significant reduction in pessimism.
Citations: 8
Accelerating System-Level Design Tasks Using Commodity Graphics Hardware: A Case Study
Pub Date : 2009-01-05 DOI: 10.1109/VLSI.Design.2009.35
Unmesh D. Bordoloi, S. Chakraborty
Many system-level design tasks (e.g. timing analysis, hardware/software partitioning and design space exploration) involve computational kernels that are intractable (usually NP-hard). As a result, they involve high running times even for mid-sized problems. In this paper we explore the possibility of using commodity graphics processing units (GPUs) to accelerate such tasks that commonly arise in the electronic design automation (EDA) domain. We demonstrate this idea via a detailed case study on a general hardware/software design space exploration problem and propose a GPU-based engine for it. Not only does this problem commonly arise in the embedded systems domain, its computational kernel turns out to be a general combinatorial optimization problem (viz. the knapsack problem) which lies at the heart of several EDA applications. Our experimental results show that our GPU-based implementation offers very attractive speedups for this computational kernel (up to 100×), and speedups of up to 17× for the full problem. In contrast to ASIC/FPGA-based accelerators – since even low-end desktop and notebook computers are today equipped with GPUs – our solution involves no extra hardware cost. Although recent research has shown the benefits of using GPUs for a variety of non-graphics applications (e.g. in databases and bioinformatics), hardly any work has been done on harnessing the parallelism of GPUs to accelerate problems from the EDA domain. We hope that our results and the generality of the problem we address will motivate researchers from this community to explore the possibility of using GPUs for a wider variety of problems from the EDA domain.
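The knapsack kernel mentioned in this abstract can be sketched with the textbook dynamic program below. It is a plain CPU version of the 0/1 variant, chosen here as an assumption since the abstract does not name the variant, and it is not the authors' GPU engine; the function name `knapsack_max_value` is hypothetical. The comment marks the property that makes the recurrence amenable to data-parallel hardware.

```python
def knapsack_max_value(weights, values, capacity):
    """Return the best total value of a 0/1 knapsack.

    Assumes non-negative integer weights and a non-negative integer capacity.
    """
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    prev = [0] * (capacity + 1)
    for w, v in zip(weights, values):
        # Every entry of `curr` reads only from `prev`, so all capacity + 1
        # entries of a row are independent and could be computed in parallel.
        curr = [
            max(prev[c], prev[c - w] + v) if c >= w else prev[c]
            for c in range(capacity + 1)
        ]
        prev = curr
    return prev[capacity]
```

The rows must still be processed in item order, so the available parallelism is across capacities within a row rather than across items.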
Citations: 7
Simultaneous Routing and Feedthrough Algorithm to Decongest Top Channel
Pub Date : 2009-01-05 DOI: 10.1109/VLSI.Design.2009.83
S. Prasad, Anuj Kumar
In macrocell-based SoC design, a routing plan to decongest the top channel is an important step during floorplanning. While previous approaches attempt to reduce congestion of the chip as a whole, none specifically targets the top channel. We present an algorithmic approach that decongests the top channel using very few feedthroughs. Results show that, compared to conventional methods, it decongests the top channel using 20% fewer feedthrough buffers and achieves better utilization of top-channel routing resources.
Citations: 0