Pub Date: 1999-11-07 DOI: 10.1109/ICCAD.1999.810624
A. Demir, P. Feldmann
Introduces a methodology for the evaluation of the interference noise caused by digital switching activity in sensitive circuits of a mixed digital-analog chip. The digital switching activity is modeled stochastically as functions defined on Markov chains. The actual interference signal is obtained through the modulation of this discrete stochastic signal with real current injection patterns stored a priori in a pre-characterized library. The interference noise results from the propagation of these continuous stochastic signals through the linear network that models the chip power grid, substrate and relevant package parasitics. The interference noise power spectral density is computed by linear frequency-domain analysis. The methodology is implemented using advanced numerical techniques that are capable of tackling very large problems.
Title: Modeling and simulation of the interference due to digital switching in mixed-signal ICs. In: 1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051), pp. 70-74.
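The chain from a Markov switching model to a noise spectrum can be illustrated on a toy case. The sketch below is not the paper's method: it assumes a stationary two-state discrete-time Markov chain, two hypothetical signal levels `f0` and `f1` standing in for the library current patterns, and a hypothetical one-pole low-pass filter standing in for the power grid, substrate and package network. It uses the standard closed form for the spectrum of a two-state chain and the linear-system relation that the output spectrum is the input spectrum times the squared gain.

```python
import cmath
import math


def two_state_psd(p, q, f0, f1, omega):
    """PSD at normalized frequency omega of f(X_n) for a stationary two-state chain.

    p = P(state 0 -> 1), q = P(state 1 -> 0), with 0 < p + q < 2.
    f0, f1 are hypothetical signal levels attached to the two states.
    """
    lam = 1.0 - p - q                     # second eigenvalue of the transition matrix
    pi0, pi1 = q / (p + q), p / (p + q)   # stationary distribution
    var = pi0 * pi1 * (f1 - f0) ** 2      # variance of f(X_n)
    # The autocovariance is var * lam**|k|; its Fourier sum has this closed form.
    return var * (1.0 - lam ** 2) / (1.0 - 2.0 * lam * math.cos(omega) + lam ** 2)


def lowpass_gain_sq(a, omega):
    """|H|^2 for a hypothetical one-pole network H(z) = (1 - a) / (1 - a / z)."""
    h = (1.0 - a) / (1.0 - a * cmath.exp(-1j * omega))
    return abs(h) ** 2


def output_psd(p, q, f0, f1, a, omega):
    """Noise PSD after the linear network: squared gain times the input PSD."""
    return lowpass_gain_sq(a, omega) * two_state_psd(p, q, f0, f1, omega)
```

Evaluating `output_psd` over a grid of `omega` values in `[0, pi]` gives a spectrum curve for this toy model; a realistic flow would replace both the chain and the filter with models extracted from the design.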
Pub Date: 1999-11-07 DOI: 10.1109/ICCAD.1999.810614
I. Neumann, D. Stoffel, H. Hartje, W. Kunz
Presents a new timing-driven approach for cell replication tailored to the practical needs of standard cell layout design. Cell replication methods have been studied extensively in the context of generic partitioning problems. However, until now, it has remained unclear what practical benefit can be obtained from this concept in a realistic environment for timing-driven layout synthesis. Therefore, this paper presents a timing-driven cell replication procedure, demonstrates its incorporation into a standard cell placement and routing tool, and examines its benefit on the final circuit performance in comparison with conventional gate or transistor sizing techniques. Furthermore, we demonstrate that cell replication can deteriorate the stuck-at fault testability of circuits and show that stuck-at redundancy elimination must be integrated into the placement procedure. Experimental results demonstrate the usefulness of the proposed methodology and suggest that cell replication should be an integral part of the physical design flow complementing traditional gate sizing techniques.
Title: Cell replication and redundancy elimination during placement for cycle time optimization. In: 1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051), pp. 25-30.
Pub Date: 1999-11-07 DOI: 10.1109/ICCAD.1999.810680
S. Ravi, G. Lakshminarayana, N. Jha
Available techniques for testing core-based systems-on-a-chip (SOCs) do not provide a systematic means for synthesizing low-overhead test architectures and compact test solutions. In this paper, we provide a comprehensive framework that generates low-overhead compact test solutions for SOCs. First, we develop a common ground for addressing issues such as core test requirements, core access and test hardware additions. For this purpose, we introduce finite-state automata for modeling tests, transparency modes and test hardware behavior. In many cases, the tests repeat a basic set of test actions for different test data, which can again be modeled using finite-state automata. While earlier work can derive a single symbolic test for a module in a register-transfer level (RTL) circuit as a finite-state automaton, this work extends the methodology to the system level and additionally contributes a satisfiability-based solution to the problem of applying a sequence of tests phased in time. This problem is known to be a bottleneck in testability analysis not only at the system level, but also at the RTL. Experimental results show that the system-level average area overhead for making SOCs testable with our method is only 4.4%, while achieving an average test application time reduction of 78.5% over recent approaches. At the same time, it provides 100% test coverage of the precomputed test sets/sequences of the embedded cores.
Title: A framework for testing core-based systems-on-a-chip. In: 1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051), pp. 385-390.
Pub Date: 1999-11-07 DOI: 10.1109/ICCAD.1999.810678
E. Gad, M. Nakhla
A new algorithm based on model reduction using the Krylov subspace technique is proposed to compute the DC solution of large nonlinear circuits. The proposed method combines continuation methods with model reduction techniques. This enables the application of the continuation methods to an equivalent reduced-order set of nonlinear equations instead of the original system, which significantly reduces the computational expense because the reduced system is much smaller than the original one. The reduced-order system is obtained by projecting the set of nonlinear equations, whose solution represents the DC operating point, onto a subspace of much lower dimension. It is also shown that the reduced-order system and the original system share the first q derivatives with respect to the circuit variable used to parameterize the family of solution trajectories generated by the continuation method.
Title: Model reduction for DC solution of large nonlinear circuits. In: 1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051), pp. 376-379.
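To make the term "continuation method" concrete, here is a minimal sketch of source stepping on a single-node resistor-diode circuit. It covers only the continuation half of the approach described above and does not attempt the Krylov-subspace reduction. The circuit, the component values and all function names are illustrative assumptions, not taken from the paper.

```python
import math


def diode_residual(x, v_src, r=1e3, i_s=1e-14, v_t=0.025):
    """KCL residual at the diode node: resistor current minus diode current."""
    return (v_src - x) / r - i_s * (math.exp(x / v_t) - 1.0)


def diode_jacobian(x, r=1e3, i_s=1e-14, v_t=0.025):
    """Derivative of the residual with respect to the node voltage x."""
    return -1.0 / r - (i_s / v_t) * math.exp(x / v_t)


def solve_dc_by_source_stepping(v_final=5.0, steps=20, tol=1e-12, max_newton=50):
    """Ramp the source from 0 to v_final, reusing each solution as the next guess."""
    x = 0.0  # exact operating point when the source is switched off
    for k in range(1, steps + 1):
        v_src = v_final * k / steps      # continuation parameter scales the source
        for _ in range(max_newton):
            dx = -diode_residual(x, v_src) / diode_jacobian(x)
            x += dx
            if abs(dx) < tol:
                break
        else:
            raise RuntimeError("Newton iteration did not converge at this step")
    return x
```

Each intermediate solve starts close to its answer, which is what lets Newton's method converge where a cold start at the full source value might not. In the paper's setting the same kind of parameter sweep would run on the reduced-order equations rather than on the full circuit.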
Pub Date: 1999-11-07 DOI: 10.1109/ICCAD.1999.810712
K. Lahiri, A. Raghunathan, S. Dey
This paper addresses the problem of efficient and accurate performance analysis to drive the exploration and design of bus-based system-on-chip (SOC) communication architectures. Our technique fills a gap in existing techniques for system-level performance analysis, which are either too slow to use in an iterative communication architecture design framework (e.g., simulation of the complete system), or are not accurate enough to drive the design of the communication architecture (e.g., techniques that perform a static analysis of the system performance). The proposed system-level performance analysis technique consists of: initial co-simulation performed after HW/SW partitioning and mapping, with the communication between components modeled in an abstract manner (e.g., as events or data transfers); extraction of abstracted symbolic traces, represented as a bus and synchronization event (BSE) graph, that captures the activity of the various system components and their communication over time; and manipulation of the BSE graph using the bus parameters, to derive the behavior of the system accounting for effects of the bus architecture. We present experimental results on several example systems, including a TCP/IP network interface card sub-system. The results indicate that our performance estimation technique is over two orders of magnitude faster than performing a complete system simulation, while being very accurate (within 2.2% of performance estimates derived from accurate HW/SW co-simulation).
Title: Fast performance analysis of bus-based system-on-chip communication architectures. In: 1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051), pp. 566-572.
Pub Date: 1999-11-07 DOI: 10.1109/ICCAD.1999.810663
A. Bogliolo, Roberto Corgnati, E. Macii, M. Poncino
We propose a new RTL power macromodel that is suitable for reconfigurable, synthesizable soft macros. The model is parameterized with respect to the input data size (i.e., bit-width), and can be automatically scaled with respect to different technology libraries and/or synthesis options. Scalability is obtained through a single additional characterization run, and does not require the disclosure of any intellectual property. The model is derived from empirical analysis of the sensitivity of power to input statistics, input data size and technology. The experiments show that, with limited approximation, it is possible to decouple the effects of these three factors on power. No previous macromodel supports automatic technology scaling; the proposed model does so and yields estimation errors within 15%.
Title: Parameterized RTL power models for combinational soft macros. In: 1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051), pp. 284-287.
Pub Date: 1999-11-07 DOI: 10.1109/ICCAD.1999.810717
D. Kirovski, M. Potkonjak
Recently, a number of techniques for IP protection have been introduced that rely on selecting a global solution to an optimization problem according to a unique user-specific digital signature. Although such techniques may provide convincing proof of authorship with low hardware overhead, they fail to protect parts of a design, do not provide an easy procedure for watermark detection, and cannot detect the watermark when the design, or a part of it, is embedded in another, larger design. Since these demands are of the highest interest for the IP business, we introduce localized watermarking as an IP protection technique that enables these features while satisfying the demand for low cost and transparency. We propose a set of protocols that implement the new watermarking methodology at the operation scheduling design level. We demonstrate that erasing the signature, or finding another signature in the synthesized design, can be made arbitrarily computationally difficult. The watermarking method has been tested on a set of real-life benchmarks, where a high likelihood of authorship has been achieved with negligible overhead in solution quality.
Title: Localized watermarking: methodology and application to operation scheduling. In: 1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051), pp. 596-599.
Pub Date: 1999-11-07 DOI: 10.1109/ICCAD.1999.810611
Kei-Yong Khoo, Zhan Yu, A. Willson
Addresses the bit-level optimization of carry-save adder (CSA) arrays when the operands are of unequal wordlength (such as in some datapaths in digital signal processing circuits). We first show that by relaxing the carry-save representation to allow for more than two signals per bit position, we gain flexibility in the bit-level implementation of CSA arrays that can be exploited to achieve a more efficient design. We then propose algorithms to optimize a single adder array at the bit level. In addition, we propose a heuristic to optimize a series of adder arrays that might occur in a datapath. We have applied our algorithms to the optimization of high-speed digital FIR filters and have achieved 15% to 30% savings (weighted cost) in the overall filter implementation array in comparison to the standard carry-save implementation.
Title: Bit-level arithmetic optimization for carry-save additions. In: 1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051), pp. 14-18.
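For readers unfamiliar with the baseline being optimized, the following sketch shows the standard carry-save representation: three operands are compressed into a sum word and a carry word without propagating carries, and only the final two words go through an ordinary addition. It models the conventional two-signals-per-bit form on Python integers, not the relaxed representation or the optimization algorithms of the paper. The function names are illustrative.

```python
def carry_save_add(a, b, c):
    """3:2 compression of non-negative ints.

    Returns (sum_word, carry_word) with sum_word + carry_word == a + b + c.
    Each bit position is handled independently, so no carry ripples.
    """
    sum_word = a ^ b ^ c                             # per-bit sum without carries
    carry_word = ((a & b) | (a & c) | (b & c)) << 1  # per-bit majority, shifted up
    return sum_word, carry_word


def csa_tree_sum(operands):
    """Reduce the operands to two words with carry-save stages, then add once."""
    words = list(operands)
    while len(words) > 2:
        a, b, c = words.pop(), words.pop(), words.pop()
        words.extend(carry_save_add(a, b, c))        # three words in, two words out
    return sum(words)                                # single carry-propagate addition
```

Operands of unequal wordlength simply leave the upper bit positions of the shorter words empty. Those sparsely populated positions are where a bit-level optimization such as the one described above has room to save adder cells.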
Pub Date: 1999-11-07 DOI: 10.1109/ICCAD.1999.810689
C. Alpert, A. Devgan, Stephen T. Quay
Wire sizing and buffer insertion/sizing are critical optimizations in deep submicron design. Recent years have seen several studies of buffer insertion, wire sizing, and their simultaneous optimization. When wiring long interconnect, tapering, i.e., reducing the wire width as the distance from the driver increases, has proven effective. However, tapering is not widely utilized in industry since it is difficult to integrate into a complete routing methodology. The article examines the benefits of wire sizing with tapering when combined with buffer insertion. We perform several experiments with actual IBM technologies. Results indicate that wire tapering typically reduces delay by less than 5% compared to uniform wire sizing when buffers can be inserted. Consequently, we suggest that it may not be worthwhile to maintain a routing methodology that supports wire tapering.
Title: Is wire tapering worthwhile? In: 1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051), pp. 430-435.
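One way to experiment with the tapering question is a first-order Elmore delay model of a wire cut into equal-length segments of different widths. The sketch below assumes this model; the abstract does not state which delay model the authors used. The resistance and capacitance coefficients are hypothetical placeholders in arbitrary units, not IBM technology data.

```python
def elmore_delay(widths, seg_len, r_drv, c_load,
                 r_sq=0.1, c_area=0.05, c_fringe=0.02):
    """Elmore delay from driver to sink for a wire of equal-length segments.

    widths[i] is the width of segment i (segment 0 touches the driver).
    Segment resistance is r_sq * seg_len / w and segment capacitance is
    (c_area * w + c_fringe) * seg_len. All coefficients are hypothetical.
    """
    res = [r_sq * seg_len / w for w in widths]
    cap = [(c_area * w + c_fringe) * seg_len for w in widths]
    downstream = c_load   # capacitance beyond the segment being visited
    delay = 0.0
    for r, c in zip(reversed(res), reversed(cap)):  # walk from sink to driver
        delay += r * (c / 2.0 + downstream)         # half of own cap plus the rest
        downstream += c
    return delay + r_drv * downstream               # driver sees the whole wire
```

Calling the function with a tapered profile such as `[3.0, 2.0, 1.0]` and a uniform profile such as `[2.0, 2.0, 2.0]` under the same driver and load gives a quick comparison. The size of any difference depends entirely on the assumed coefficients and says nothing about the figures reported in the paper.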
Minimizing power consumption is of paramount importance during the design of embedded (mobile computing) systems that come as systems-on-a-chip, since interdependencies between design characteristics like power, performance and area for various system parts (cores) are becoming increasingly influential. In this scenario, interfaces play a key role, since they allow one to control/exploit these interdependencies with the aim of meeting design constraints like power. In this paper, we present a comprehensive approach to explore this impact. We consider a whole system comprising a CPU, caches, a main memory and interfaces between those cores, and we demonstrate the high impact that an adequate adaptation between core parameters and interface parameters has in terms of power consumption. We find in particular that cache parameters and the configurations of cache buses have a significant impact in this respect. In addition, we make the important observation that optimizing for performance no longer implies that power is optimized as well in deep submicron technologies. Instead, we find that, especially for newer technologies, the relative interface power contribution increases, leading to scenarios where we obtain a real power/performance tradeoff. In summary, our explorations have revealed as yet uninvestigated interdependencies that represent the first step towards future efforts to optimize/adapt interfaces and caches in core-based systems for low-power designs.
Title: Interface and cache power exploration for core-based embedded system design. Authors: T. Givargis, Jörg Henkel, F. Vahid. Pub Date: 1999-11-07 DOI: 10.5555/339492.340025. In: 1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051), pp. 270-273.