Pub Date: 2009-01-05 DOI: 10.1109/VLSI.Design.2009.64
Boxue Yin, D. Xiang, Zhen Chen
Testing for small delay defects poses two challenges. First, selecting the longest testable path for every target fault during ATPG consumes considerable CPU time. Second, the test data volume is very large. In this paper, we propose two strategies to address these problems. A new scheme that selects paths in advance is proposed to accelerate ATPG; unlike previous work, it aims to find fewer paths that cover more faults ahead of test generation. To reduce the test data volume, we propose a novel scan-based test scheme. We partition the scan flip-flops into several scan chains. The first scan flip-flop of every scan chain works in enhanced scan mode, and the other scan flip-flops work in broad-side mode. This significantly increases the number of don't-care bits in every test pattern and provides more room for test compaction, so the test pattern count can be reduced significantly. Experimental results show the efficiency of these techniques.
Title: New Techniques for Accelerating Small Delay ATPG and Generating Compact Test Sets (2009 22nd International Conference on VLSI Design)
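The link between don't-care bits and test compaction in the abstract above can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a generic greedy static compaction, and the function names (`compatible`, `merge`, `compact`) and the string encoding of patterns (`0`, `1`, `X` for don't-care) are assumptions made for illustration. Two patterns can share one test vector when no bit position holds conflicting specified values, so more `X` bits generally mean more merges.

```python
def compatible(a, b):
    """Return True if two equal-length patterns never conflict on a specified bit."""
    return all(x == y or x == "X" or y == "X" for x, y in zip(a, b))


def merge(a, b):
    """Combine two compatible patterns, keeping every specified bit."""
    return "".join(y if x == "X" else x for x, y in zip(a, b))


def compact(patterns):
    """Greedy static compaction: fold each pattern into the first compatible slot."""
    merged = []
    for p in patterns:
        for i, m in enumerate(merged):
            if compatible(m, p):
                merged[i] = merge(m, p)
                break
        else:
            # No existing pattern can absorb p, so it starts a new slot.
            merged.append(p)
    return merged


if __name__ == "__main__":
    # Four hypothetical 4-bit patterns collapse into two.
    print(compact(["1XX0", "X1X0", "0XXX", "XX11"]))
```

The greedy order matters: a different pattern ordering may yield a different, possibly smaller, compacted set.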
Pub Date: 2009-01-05 DOI: 10.1109/VLSI.Design.2009.63
K. Usami, T. Shirai, T. Hashida, H. Masuda, S. Takeda, M. Nakata, N. Seki, H. Amano, M. Namiki, Masashi Imai, Masaaki Kondo, Hiroshi Nakamura
This paper describes a design and implementation methodology for fine-grain power gating. Since sleep-in and wakeup are controlled at a fine granularity at run time, shortening the transition time between the sleep and active states is strongly required. In particular, shortening the wakeup time is essential because it affects the execution time and hence the performance. However, this requirement makes suppression of the ground bounce more difficult. We propose a novel technique that skews the wakeup timings of fine-grain local power domains to suppress the ground bounce. The delay of the buffers driving the power switches is skewed in the buffer tree by selectively downsizing them. We designed a MIPS R3000-based CPU core in a 90 nm CMOS technology and applied our technique to internal function units. Simulation results showed that our technique reduces the rush current to 47% of that observed when the power switches are turned on simultaneously. This suppressed the ground bounce to 53 mV with a 3.3 ns wakeup time. Simulation results from running benchmark programs showed that the total power dissipation of the function units was reduced by up to 15% at 25°C and by 62% at 100°C. The effectiveness of the power savings is discussed in terms of the temperature-dependent break-even points and the consecutive idle time in the program.
Title: Design and Implementation of Fine-Grain Power Gating with Ground Bounce Suppression (2009 22nd International Conference on VLSI Design)
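The break-even point mentioned in the abstract above can be made concrete with a small worked calculation. This is a generic textbook-style sketch, not the paper's model: the function names and all numeric values are hypothetical. The idea is that gating a unit costs a fixed energy overhead per sleep/wakeup cycle, so gating only pays off when the idle period is long enough for the saved leakage to exceed that overhead. Because leakage grows with temperature, the break-even time would be expected to shrink at higher temperatures.

```python
def break_even_time_ns(overhead_energy_pj, leakage_saved_uw):
    """Idle time at which the saved leakage energy equals the gating overhead.

    Units: pJ / uW = 1e-12 J / 1e-6 W = 1e-6 s = 1000 ns.
    """
    return overhead_energy_pj / leakage_saved_uw * 1000.0


def net_energy_saved_pj(idle_ns, overhead_energy_pj, leakage_saved_uw):
    """Net energy saved by gating for one idle period (negative means a loss).

    Units: uW * ns = 1e-15 J = 1e-3 pJ.
    """
    return leakage_saved_uw * idle_ns * 1e-3 - overhead_energy_pj


if __name__ == "__main__":
    # Hypothetical numbers: 2 pJ overhead, 10 uW of leakage saved while asleep.
    print(break_even_time_ns(2.0, 10.0))          # break-even idle time in ns
    print(net_energy_saved_pj(500.0, 2.0, 10.0))  # positive: gating pays off
    print(net_energy_saved_pj(100.0, 2.0, 10.0))  # negative: gating wastes energy
```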
Pub Date: 2009-01-05 DOI: 10.1109/VLSI.Design.2009.30
T. Samanta, H. Rahaman, P. Ghosal, P. Dasgupta
Interconnects are vital in deep sub-micron VLSI design, as they impose constraints such as delay, congestion, crosstalk, power dissipation and others, and consume resources. These parameters complicate the search for a feasible solution to the global routing of multiple nets. In addition, non-Manhattan routing architectures are being actively explored and adopted. In this work, we focus on the specific problem of multi-net multi-pin global Y-routing for custom-built design styles with several available routing layers. The problem is formulated as a minimum-crossing Y-Steiner minimal tree problem with multi-layer assignment. Experimental results are quite encouraging.
Title: A Method for the Multi-Net Multi-Pin Routing Problem with Layer Assignment (2009 22nd International Conference on VLSI Design)
Pub Date: 2009-01-05 DOI: 10.1109/VLSI.Design.2009.107
G. Agarwal, Prakash Bare
Summary: Demand for high performance in today's systems requires some key IP components to be designed in analog, whereas quick turnaround time requires significant use of digital IPs. While there has been significant progress in design automation and design reuse of digital circuits in the last couple of decades, not much has changed for analog design. Design capture in low
Title: Why is Design Automation and Reuse of Analog Designs Increasingly Trailing the Digital World? (2009 22nd International Conference on VLSI Design)
Pub Date: 2009-01-05 DOI: 10.1109/VLSI.Design.2009.114
Jörg Henkel, N. Vijaykrishnan, S. Parameswaran, R. Ragel
Designers of embedded systems have traditionally optimized circuits for speed, size, power and time to market. Recently, however, the dependability of the system has emerged as a major concern for the modern designer, given the decrease in feature size and the increasing demand for functionality. Another crucial concern is the security of systems used for storing personal details and for financial transactions. A significant number of the techniques used to address security and dependability problems are the same or have similar origins. This tutorial will therefore examine the overlapping concerns of security and dependability and the design methods used to overcome the problems and threats. The tutorial is divided into four parts: the first will examine dependability issues due to technology effects; the second will look at reliability-aware designs; the third will describe the security threats; and the fourth will illustrate the countermeasures to security and reliability issues.
Part I: Dependability Issues due to Technology Effects and Architectural Countermeasures. Moore’s law has been in place for more than four decades. Each new technology node provided advantages in basically all major design constraints (performance, power, area, etc.). When migrating to upcoming technology nodes, it will become obvious that this win-win situation will soon come to an end. In other words, it will become far more difficult and expensive to migrate to new technology nodes. One major issue is inherent undependability, which will become a challenging problem. The undependability addressed in this part of the tutorial is related to a) fabrication and design-time effects such as “Yield and Process Variations” and “Complexity”, and b) run-time effects such as “Aging Effects”, “Thermal Effects” and “Soft Errors”. The first part of this tutorial will give the details of these effects and an outlook on how they might influence future architectures for embedded systems. An overview of selected state-of-the-art paradigms and approaches is given, with a focus on organic computing principles as well as run-time adaptive embedded processor architectures that can deal with dependability issues.
Part II: Reliability Aware Design for Embedded Systems. Designing robust embedded systems that meet stringent quality, reliability, and availability requirements is becoming increasingly difficult in advanced technologies. The current design paradigm, which assumes that no gate or interconnect will ever operate incorrectly within the lifetime of a product, must change to cope with such failures. New architectural features are required for robust system design, with built-in mechanisms for failure tolerance, detection and recovery during normal system operation. This part of the tutorial will focus on new design techniques required for building robust systems: concurrent error detection, recovery, and self-repair. A broad spectrum of circuit-level, logic-level,
Title: Security and Dependability of Embedded Systems: A Computer Architects' Perspective (2009 22nd International Conference on VLSI Design)
Pub Date: 2009-01-05 DOI: 10.1109/VLSI.Design.2009.45
Ramamurthy Vishweshwara, R. Venkatraman, H. Udayakumar, N. Arvind
Design closure for predictable silicon performance is emerging as the most challenging digital VLSI design problem in advanced deep-submicron technology nodes. One of the significant problems is effective power-grid distribution, and the comprehension of the impact of voltage drops in the power grid on design timing and performance. This paper proposes a way to comprehend the complex interactions between timing and dynamic power drops without being significantly pessimistic and without losing accuracy. We highlight the heuristics we have used to reduce the complexity of the timing analysis and the overall computation time. The overall method uses conventional analysis approaches for dynamic voltage drop and timing. It proposes options for comprehending the effects of dynamic voltage drops during traditional design-closure methods and also highlights means of validating any assumptions made. We present comparison results between the performance degradation due to voltage drop assumptions and that of the traditional margin-based approaches, which show a significant reduction in pessimism.
Title: An Approach to Measure the Performance Impact of Dynamic Voltage Fluctuations Using Static Timing Analysis (2009 22nd International Conference on VLSI Design)
Pub Date: 2009-01-05 DOI: 10.1109/VLSI.Design.2009.35
Unmesh D. Bordoloi, S. Chakraborty
Many system-level design tasks (e.g. timing analysis, hardware/software partitioning and design space exploration) involve computational kernels that are intractable (usually NP-hard). As a result, they incur high running times even for mid-sized problems. In this paper we explore the possibility of using commodity graphics processing units (GPUs) to accelerate such tasks, which commonly arise in the electronic design automation (EDA) domain. We demonstrate this idea via a detailed case study on a general hardware/software design space exploration problem and propose a GPU-based engine for it. Not only does this problem commonly arise in the embedded systems domain, but its computational kernel also turns out to be a general combinatorial optimization problem (viz. the knapsack problem) which lies at the heart of several EDA applications. Our experimental results show that our GPU-based implementation offers very attractive speedups for this computational kernel (up to 100×), and speedups of up to 17× for the full problem. In contrast to ASIC/FPGA-based accelerators, our solution involves no extra hardware cost, since even low-end desktop and notebook computers are today equipped with GPUs. Although recent research has shown the benefits of using GPUs for a variety of non-graphics applications (e.g. in databases and bioinformatics), hardly any work has been done on harnessing the parallelism of GPUs to accelerate problems from the EDA domain. We hope that our results and the generality of the problem we address will motivate researchers from this community to explore the possibility of using GPUs for a wider variety of problems from the EDA domain.
Title: Accelerating System-Level Design Tasks Using Commodity Graphics Hardware: A Case Study (2009 22nd International Conference on VLSI Design)
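To show why the knapsack kernel named in the abstract above lends itself to data-parallel hardware, here is a minimal CPU sketch of the standard 0/1 knapsack dynamic program. It is not the authors' GPU engine, and the function name `knapsack_max_value` and the example data are assumptions for illustration. The point is structural: every entry of a new row depends only on the previous row, so all capacities could in principle be updated concurrently, for example one GPU thread per capacity value.

```python
def knapsack_max_value(weights, values, capacity):
    """Best total value of a 0/1 knapsack with non-negative integer weights."""
    # best[c] = best value achievable with capacity c using the items seen so far.
    best = [0] * (capacity + 1)
    for w, v in zip(weights, values):
        # Each new entry reads only the previous row, so the entries of this
        # comprehension are mutually independent (the data-parallel step).
        best = [
            max(best[c], best[c - w] + v) if c >= w else best[c]
            for c in range(capacity + 1)
        ]
    return best[capacity]


if __name__ == "__main__":
    # Hypothetical items: choosing weights 3 and 4 fills capacity 7 exactly.
    print(knapsack_max_value([1, 3, 4, 5], [1, 4, 5, 7], 7))
```

Building a fresh row per item, rather than updating in place, is what keeps the per-capacity updates independent of each other.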
Pub Date: 2009-01-05 DOI: 10.1109/VLSI.Design.2009.123
Mona Mathur
It has been envisioned that in the future the designer will have the complete flexibility that software offers at hardware speeds, which will substantially reduce cost and product turnaround time. The optimal performance needs of applications can be met if fine-grained field reconfiguration can be made possible in hardware. Several problems and challenges need to be addressed; these include the specification of reconfigurable architectures and processors, software environments that support reconfiguration, the increasing heterogeneity and complexity of systems and SoCs, and power management. One of the goals of this talk is to stimulate a discussion on reconfigurable design by introducing some key issues.
Title: ReConfigurable Technologies (2009 22nd International Conference on VLSI Design)
Pub Date: 2009-01-05 DOI: 10.1109/VLSI.Design.2009.83
S. Prasad, Anuj Kumar
In macrocell-based SoC design, a routing plan to decongest the top channel is an important step during floor planning. While previous approaches attempt to reduce congestion of the chip as a whole, none specifically attempts to decongest the top channel. We present an algorithmic approach that decongests the top channel using very few feedthroughs. Results show that, compared to conventional methods, we can decongest the top channel using 20% fewer feedthrough buffers and with better utilization of top channel routing resources.
Title: Simultaneous Routing and Feedthrough Algorithm to Decongest Top Channel (2009 22nd International Conference on VLSI Design)
Pub Date: 2009-01-05 DOI: 10.1109/VLSI.Design.2009.29
Sandeep Sirsi, Aneesh Aggarwal
Register file access falls on the critical path of a microprocessor because large, heavily ported register files are used to exploit more parallelism. In this paper, we focus on reducing register file complexity by reducing the number of register file read ports. Our goal is to explore the limits of read port reduction in a centralized integer register file, i.e., how few read ports can be provided to a centralized integer register file while still maintaining performance. A naïve port reduction may result in significant performance degradation and does not give a true measure of the limits, while clever techniques may be able to further reduce the number of ports. Hence, we drastically reduce the number of ports and then investigate techniques to improve the performance of the reduced-ported register file. Our experiments show that these techniques allow further port reduction by improving the performance of reduced-ported register files (RFs). For instance, with our experimental parameters, the naïve port reduction method requires at least five read ports to keep the performance impact below 5%, whereas our techniques require only three ports.
Title: Exploring the Limits of Port Reduction in Centralized Register Files (2009 22nd International Conference on VLSI Design)