To meet conflicting flexibility, performance and cost constraints of demanding signal processing applications, future designs in this domain will contain an increasing number of application specific programmahle.units combined with complex communication and memory infrastructures. Novel architecture trends like Application Specific Instruction-set Processors (ASPS) as well as customized buses and Network-on-Chip based communication promise enormous potential for optimization. However, state-of-the-art tooling and design practice is not in a shape to take advantage of this advances in computer architecture and silicon technology. Currently, EDA industry develops two diverging strategies to cope with the design complexity of such application specific, heterogeneous MP-SoC platforms. First, the IPdriven approach emphasizes the composition of MP-SoC platforms from configurahle off-the-shelf Intellectual Property blocks. On the other hand, the design-driven approach strives to take design efficiency to the required level by use of system level design methodologies and IP generation tools. In this paper, we discuss technical and economical aspects of both strategies. Based on the analysis of recent trends in computer achitecture and system level design, we envision a hand-in-hand approach of signal processing platform architectures and design metholodgy to conquer the complexitv crisis in etnemine MP-SoC developments.
{"title":"Heterogeneous MP-SoC - the solution to energy-efficient signal processing","authors":"Tim Kogel, H. Meyr","doi":"10.1145/996566.996754","DOIUrl":"https://doi.org/10.1145/996566.996754","url":null,"abstract":"To meet conflicting flexibility, performance and cost constraints of demanding signal processing applications, future designs in this domain will contain an increasing number of application specific programmahle.units combined with complex communication and memory infrastructures. Novel architecture trends like Application Specific Instruction-set Processors (ASPS) as well as customized buses and Network-on-Chip based communication promise enormous potential for optimization. However, state-of-the-art tooling and design practice is not in a shape to take advantage of this advances in computer architecture and silicon technology. Currently, EDA industry develops two diverging strategies to cope with the design complexity of such application specific, heterogeneous MP-SoC platforms. First, the IPdriven approach emphasizes the composition of MP-SoC platforms from configurahle off-the-shelf Intellectual Property blocks. On the other hand, the design-driven approach strives to take design efficiency to the required level by use of system level design methodologies and IP generation tools. In this paper, we discuss technical and economical aspects of both strategies. Based on the analysis of recent trends in computer achitecture and system level design, we envision a hand-in-hand approach of signal processing platform architectures and design metholodgy to conquer the complexitv crisis in etnemine MP-SoC developments.","PeriodicalId":115059,"journal":{"name":"Proceedings. 41st Design Automation Conference, 2004.","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124993051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In System on Chip (SoC) design, growing design complexity has forced designers to start designs at higher abstraction levels. This paper proposes an SoC design methodology that makes full use of FPGA capabilities. Design modules in different abstraction levels are all combined and run together in an FPGA prototyping system that fully emulates the target SoC. The higher abstraction level design modules run on microprocessors embedded in the FPGAs, while lower-level synthesizable RTL design modules are directly mapped onto FPGA reconfigurable cells. We made a hardware wrapper that gets the embedded microprocessors to interface with the fully synthesized modules through IBM CoreConnect buses. Using this methodology, we developed an image processor SoC with cryptographic functions, and we verified the design by running real firmware and application programs. For the designs that are too large to be fit into an FPGA, dynamic reconfiguration method is used.
{"title":"An SoC design methodology using FPGAs and embedded microprocessors","authors":"N. Ohba, K. Takano","doi":"10.1145/996566.996769","DOIUrl":"https://doi.org/10.1145/996566.996769","url":null,"abstract":"In System on Chip (SoC) design, growing design complexity has forced designers to start designs at higher abstraction levels. This paper proposes an SoC design methodology that makes full use of FPGA capabilities. Design modules in different abstraction levels are all combined and run together in an FPGA prototyping system that fully emulates the target SoC. The higher abstraction level design modules run on microprocessors embedded in the FPGAs, while lower-level synthesizable RTL design modules are directly mapped onto FPGA reconfigurable cells. We made a hardware wrapper that gets the embedded microprocessors to interface with the fully synthesized modules through IBM CoreConnect buses. Using this methodology, we developed an image processor SoC with cryptographic functions, and we verified the design by running real firmware and application programs. For the designs that are too large to be fit into an FPGA, dynamic reconfiguration method is used.","PeriodicalId":115059,"journal":{"name":"Proceedings. 41st Design Automation Conference, 2004.","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124179706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joshua J. Pieper, A. Mellan, J. M. Paul, D. E. Thomas, F. Karim
As multiprocessor systems-on-chip become a reality, performance modeling becomes a challenge. To quickly evaluate many architectures, some type of high-level simulation is required, including high-level cache simulation. We propose to perform this cache simulation by defining a metric to represent memory behavior independently of cache structure and back-annotate this into the original application. While the annotation phase is complex, requiring time comparable to normal address trace based simulation, it need only be performed once per application set and thus enables simulation to be sped up by a factor of 20 to 50 over trace based simulation. This is important for embedded systems, as software is often evaluated against many input sets and many architectures. Our results show the technique is accurate to within 20% of miss rate for uniprocessors and was able to reduce the die area of a multiprocessor chip by a projected 14% over a naive design by accurately sizing caches for each processor.
{"title":"High level cache simulation for heterogeneous multiprocessors","authors":"Joshua J. Pieper, A. Mellan, J. M. Paul, D. E. Thomas, F. Karim","doi":"10.1145/996566.996652","DOIUrl":"https://doi.org/10.1145/996566.996652","url":null,"abstract":"As multiprocessor systems-on-chip become a reality, performance modeling becomes a challenge. To quickly evaluate many architectures, some type of high-level simulation is required, including high-level cache simulation. We propose to perform this cache simulation by defining a metric to represent memory behavior independently of cache structure and back-annotate this into the original application. While the annotation phase is complex, requiring time comparable to normal address trace based simulation, it need only be performed once per application set and thus enables simulation to be sped up by a factor of 20 to 50 over trace based simulation. This is important for embedded systems, as software is often evaluated against many input sets and many architectures. Our results show the technique is accurate to within 20% of miss rate for uniprocessors and was able to reduce the die area of a multiprocessor chip by a projected 14% over a naive design by accurately sizing caches for each processor.","PeriodicalId":115059,"journal":{"name":"Proceedings. 41st Design Automation Conference, 2004.","volume":"128 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127089848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Several factors such as process variation, noises, and delay defects can degrade the reliabilities of a circuit. Traditional methods add a pessimistic timing margin to resolve delay variation problems. In this paper, instead of sacrificing the performance, we propose a re-synthesis technique which adds redundant logics to protect the performance. Because nodes in the critical paths have zero slacks and are vulnerable to delay variation, we formulate the problem of tolerating delay variation to be the problem of increasing the slacks of nodes. Our re-synthesis technique can increase the slacks of all nodes or wires to be larger than a pre-determined value. Our experimental results show that additional area penalty is around 21% for 10% of delay variation tolerance.
{"title":"Re-synthesis for delay variation tolerance","authors":"Shih-Chieh Chang, C. Hsieh, Kai-Chiang Wu","doi":"10.1145/996566.996785","DOIUrl":"https://doi.org/10.1145/996566.996785","url":null,"abstract":"Several factors such as process variation, noises, and delay defects can degrade the reliabilities of a circuit. Traditional methods add a pessimistic timing margin to resolve delay variation problems. In this paper, instead of sacrificing the performance, we propose a re-synthesis technique which adds redundant logics to protect the performance. Because nodes in the critical paths have zero slacks and are vulnerable to delay variation, we formulate the problem of tolerating delay variation to be the problem of increasing the slacks of nodes. Our re-synthesis technique can increase the slacks of all nodes or wires to be larger than a pre-determined value. Our experimental results show that additional area penalty is around 21% for 10% of delay variation tolerance.","PeriodicalId":115059,"journal":{"name":"Proceedings. 41st Design Automation Conference, 2004.","volume":"2132 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129979957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The high leakage devices in nanometer technologies as well as the low activity rates in system-on-a-chip (SOC) contribute to the growing significance of leakage power at the system level. We first present system-level leakage-power modeling and characteristics and discuss ways to reduce leakage for caches. Considering the interdependence between leakage power and temperature, we then discuss thermal runaway and dynamic power and thermal management (DPTM) to reduce power and prevent thermal violations. We show that a thermal-independent leakage model may hide actual failures of DPTM. Finally, we present voltage scaling considering DPTM for different packaging options. We show that the optimal Vdd for the hest throughput may be smaller than the largest Vdd allowed by the given packaging platform, and that advanced cooling techniques can improve throughput significantly.
{"title":"System level leakage reduction considering the interdependence of temperature and leakage","authors":"Lei He, W. Liao, M. Stan","doi":"10.1145/996566.996572","DOIUrl":"https://doi.org/10.1145/996566.996572","url":null,"abstract":"The high leakage devices in nanometer technologies as well as the low activity rates in system-on-a-chip (SOC) contribute to the growing significance of leakage power at the system level. We first present system-level leakage-power modeling and characteristics and discuss ways to reduce leakage for caches. Considering the interdependence between leakage power and temperature, we then discuss thermal runaway and dynamic power and thermal management (DPTM) to reduce power and prevent thermal violations. We show that a thermal-independent leakage model may hide actual failures of DPTM. Finally, we present voltage scaling considering DPTM for different packaging options. We show that the optimal Vdd for the hest throughput may be smaller than the largest Vdd allowed by the given packaging platform, and that advanced cooling techniques can improve throughput significantly.","PeriodicalId":115059,"journal":{"name":"Proceedings. 41st Design Automation Conference, 2004.","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126511349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parametric representations used for symbolic simulation of circuits usually use BDDs. After a few steps of symbolic simulation, state set representation is converted from one parametric representation to another smaller representation, in a process called reparameterization. For large circuits, the reparametrization step often results in a blowup of BDDs and is expensive due to a large number of quantifications of input variables involved. Efficient SAT solvers have been applied successfully for many verification problems. This paper presents a novel SAT-based reparameterization algorithm that is largely immune to the large number of input variables that need to be quantified. We show experimental results on large industrial circuits and compare our new algorithm to both SAT-based Bounded Model Checking and BDD based symbolic simulation. We were able to achieve on average 3x improvement in time and space over BMC and able to complete many examples that BDD based approach could not even finish.
{"title":"A SAT-based algorithm for reparameterization in symbolic simulation","authors":"P. Chauhan, E. Clarke, D. Kroening","doi":"10.1145/996566.996711","DOIUrl":"https://doi.org/10.1145/996566.996711","url":null,"abstract":"Parametric representations used for symbolic simulation of circuits usually use BDDs. After a few steps of symbolic simulation, state set representation is converted from one parametric representation to another smaller representation, in a process called reparameterization. For large circuits, the reparametrization step often results in a blowup of BDDs and is expensive due to a large number of quantifications of input variables involved. Efficient SAT solvers have been applied successfully for many verification problems. This paper presents a novel SAT-based reparameterization algorithm that is largely immune to the large number of input variables that need to be quantified. We show experimental results on large industrial circuits and compare our new algorithm to both SAT-based Bounded Model Checking and BDD based symbolic simulation. We were able to achieve on average 3x improvement in time and space over BMC and able to complete many examples that BDD based approach could not even finish.","PeriodicalId":115059,"journal":{"name":"Proceedings. 41st Design Automation Conference, 2004.","volume":"217 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121478187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parasitic parameter extraction is a crucial issue in Integrated Circuit design. Integral equation based solvers, which guarantee high accuracy, suffer from a time and memory bottleneck arising from the dense matrices generated. .
{"title":"A fast parasitic extractor based on low-rank multilevel matrix compression for conductor and dielectric modeling in microelectronics and MEMS","authors":"D. Gope, Swagato Chakraborty, V. Jandhyala","doi":"10.1145/996566.996780","DOIUrl":"https://doi.org/10.1145/996566.996780","url":null,"abstract":"Parasitic parameter extraction is a crucial issue in Integrated Circuit design. Integral equation based solvers, which guarantee high accuracy, suffer from a time and memory bottleneck arising from the dense matrices generated. .","PeriodicalId":115059,"journal":{"name":"Proceedings. 41st Design Automation Conference, 2004.","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121359687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
X-tolerant deterministic BIST (XDBIST) was recently presented as a method to efficiently compress and apply scan patterns generated by automatic test pattern generation (ATPG) in a logic built-in self-test architecture. In this paper we introduce a novel selector architecture that allows arbitrary compression ratios, scales to any number of scan chains and minimizes area overhead. XDBIST test-coverage, full X-tolerance and scan-based diagnosis ability are preserved and are the same as deterministic scan-ATPG.
{"title":"Scalable selector architecture for X-tolerant deterministic BIST","authors":"P. Wohl, J. Waicukauski, Sanjay B. Patel","doi":"10.1145/996566.996814","DOIUrl":"https://doi.org/10.1145/996566.996814","url":null,"abstract":"X-tolerant deterministic BIST (XDBIST) was recently presented as a method to efficiently compress and apply scan patterns generated by automatic test pattern generation (ATPG) in a logic built-in self-test architecture. In this paper we introduce a novel selector architecture that allows arbitrary compression ratios, scales to any number of scan chains and minimizes area overhead. XDBIST test-coverage, full X-tolerance and scan-based diagnosis ability are preserved and are the same as deterministic scan-ATPG.","PeriodicalId":115059,"journal":{"name":"Proceedings. 41st Design Automation Conference, 2004.","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114255785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we formulate symmetry detection for incompletely specified functions as an equation without using cofactor computation and equivalence checking. Based on this equation, a symmetry detection algorithm is proposed. This algorithm can simultaneously find non-equivalence and equivalence symmetries. Experimental results on a set of benchmarks show that our algorithm is indeed very effective in solving symmetry detection problem for incompletely specified functions.
{"title":"Symmetry detection for incompletely specified functions","authors":"Kuo-Hua Wang, Jia-Hung Chen","doi":"10.1145/996566.996690","DOIUrl":"https://doi.org/10.1145/996566.996690","url":null,"abstract":"In this paper, we formulate symmetry detection for incompletely specified functions as an equation without using cofactor computation and equivalence checking. Based on this equation, a symmetry detection algorithm is proposed. This algorithm can simultaneously find non-equivalence and equivalence symmetries. Experimental results on a set of benchmarks show that our algorithm is indeed very effective in solving symmetry detection problem for incompletely specified functions.","PeriodicalId":115059,"journal":{"name":"Proceedings. 41st Design Automation Conference, 2004.","volume":"236 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122630173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leakage current has become a stringent constraint in modern processor designs in addition to traditional constraints on frequency. Since leakage current exhibits a strong inverse correlation with circuit delay, effective parametric yield prediction must consider the dependence of leakage current on frequency. In this paper, we present a new chip-level statistical method to estimate the total leakage current in the presence of within-die and die-to-die variability. We develop a closed-form expression for total chip leakage that models the dependence of the leakage current distribution on a number of process parameters. The model is based on the concept of scaling factors to capture the effects of within-die variability. Using this model, we then present an integrated approach to accurately estimate the yield loss when both frequency and power limits are imposed on a design. Our method demonstrates the importance of considering both these limiters in calculating the yield of a lot.
{"title":"Parametric yield estimation considering leakage variability","authors":"Rajeev R. Rao, A. Devgan, D. Blaauw, D. Sylvester","doi":"10.1145/996566.996693","DOIUrl":"https://doi.org/10.1145/996566.996693","url":null,"abstract":"Leakage current has become a stringent constraint in modern processor designs in addition to traditional constraints on frequency. Since leakage current exhibits a strong inverse correlation with circuit delay, effective parametric yield prediction must consider the dependence of leakage current on frequency. In this paper, we present a new chip-level statistical method to estimate the total leakage current in the presence of within-die and die-to-die variability. We develop a closed-form expression for total chip leakage that models the dependence of the leakage current distribution on a number of process parameters. The model is based on the concept of scaling factors to capture the effects of within-die variability. Using this model, we then present an integrated approach to accurately estimate the yield loss when both frequency and power limits are imposed on a design. Our method demonstrates the importance of considering both these limiters in calculating the yield of a lot.","PeriodicalId":115059,"journal":{"name":"Proceedings. 41st Design Automation Conference, 2004.","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124013473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}