Pub Date: 1999-11-07 DOI: 10.1109/ICCAD.1999.810654
Christoph Albrecht, B. Korte, Jürgen Schietke, J. Vygen
We consider the problem of finding an optimal clock schedule, i.e. optimal arrival times for clock signals at latches of a VLSI chip. We describe a general model which includes all previously considered models. Then we show how to optimize the cycle time and optimally balance slacks on data paths and on clocktree paths. The problem of finding a clock schedule with the optimum cycle time was solved before, either by linear programming or by binary search, using a test for negative circuits in a digraph as a subroutine. We show for the first time that a direct combinatorial algorithm solves this problem optimally. Incidentally, this yields a new efficient method for timing analysis with transparent latches. Moreover, we extend this algorithm to the slack balancing problem: to make the chip less sensitive to routing detours, process variations and manufacturing skew, it is desirable to have as few critical paths as possible. We show how to find the clock schedule with the minimum number of critical paths (optimum slack distribution) in a well-defined sense. Rather than fixed clock arrival times, we show how to obtain intervals for the clock arrival times that are as large as possible. This can be considered as slack on clocktree paths. Indeed, we can find the global optimum of simultaneous optimization of slacks on all data paths and clocktree paths. All the above is done by very efficient network optimization algorithms, based on parametric shortest paths. Our computational results with recent IBM processor chips show that the number of critical paths decreases dramatically, in addition to a considerable improvement of the cycle time. The running times are reasonable even for the largest designs.
Title: Cycle time and slack optimization for VLSI-chips. Published in: 1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051), pp. 232-238. URL: https://doi.org/10.1109/ICCAD.1999.810654
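The abstract above mentions the earlier approach of binary search with a negative-circuit test as a subroutine. The following is a minimal sketch of that earlier approach only, not of the direct combinatorial algorithm the paper proposes. It assumes a simplified edge-triggered model (an assumption, not the paper's general model): each data path from latch `i` to latch `j` has a minimum delay `dmin` and a maximum delay `dmax`, the setup constraint is `a_i + dmax <= a_j + T`, and the hold constraint is `a_i + dmin >= a_j`. All function names are hypothetical.

```python
def has_negative_cycle(n, edges):
    """Bellman-Ford with an implicit source connected to every node by a 0-weight edge."""
    dist = [0.0] * n
    for _ in range(n):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v] - 1e-12:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False  # distances converged, so no negative cycle exists
    return True  # still relaxing after n passes: a negative cycle exists


def feasible(n, paths, cycle_time):
    """Check whether clock arrival times exist for the given cycle time.

    paths: list of (i, j, dmin, dmax) data paths from latch i to latch j.
    Each constraint x_u <= x_v + w becomes an edge v -> u of weight w.
    """
    edges = []
    for i, j, dmin, dmax in paths:
        edges.append((j, i, cycle_time - dmax))  # setup: a_i <= a_j + (T - dmax)
        edges.append((i, j, dmin))               # hold:  a_j <= a_i + dmin
    return not has_negative_cycle(n, edges)


def min_cycle_time(n, paths, tol=1e-6):
    """Binary search on the cycle time T (feasibility is monotone in T)."""
    lo, hi = 0.0, max(dmax for _, _, _, dmax in paths)  # hi is feasible with zero skew
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible(n, paths, mid):
            hi = mid
        else:
            lo = mid
    return hi


# Two latches: zero skew would need T = 6, but skewing the clocks allows
# T close to dmax - dmin = 5 on the path 0 -> 1.
example_paths = [(0, 1, 1.0, 6.0), (1, 0, 1.0, 2.0)]
best_t = min_cycle_time(2, example_paths)
```

Each feasibility test runs in O(nm) time for n latches and m constraints, and the binary search repeats it once per bit of precision; the paper's claim is that a direct parametric shortest-path computation avoids this repetition.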
Pub Date: 1999-11-07 DOI: 10.1109/ICCAD.1999.810677
Janet Roveda, E. Kuh, Qingjian Yu
A new Chebyshev expansion based model for distributed interconnect networks is presented in this paper. Unlike the moment methods, this new model is optimal and it does not require the knowledge of expansion points. An automatic order selection scheme is also included in the new model. By using the integrated congruence transform, we guarantee the passivity of the new model for distributed interconnect networks. Because of the orthogonality of Chebyshev polynomials, the Modified Gram-Schmidt algorithm can be simplified. In the experimental examples, the new model is found to be accurate and efficient.
Title: The Chebyshev expansion based passive model for distributed interconnect networks. Published in: 1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051), pp. 370-375. URL: https://doi.org/10.1109/ICCAD.1999.810677
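To illustrate why a Chebyshev expansion needs no expansion point and lends itself to automatic order selection, here is a generic sketch on a scalar function over [-1, 1]. It is not the paper's interconnect model or its integrated congruence transform; the function names and the tolerance-based truncation rule are assumptions made for illustration.

```python
import math


def chebyshev_coefficients(f, n):
    """Coefficients c_k of f(x) ~ c_0/2 + sum_{k>=1} c_k T_k(x), sampled at n Chebyshev nodes."""
    thetas = [math.pi * (j + 0.5) / n for j in range(n)]
    samples = [f(math.cos(t)) for t in thetas]
    return [
        2.0 / n * sum(s * math.cos(k * t) for s, t in zip(samples, thetas))
        for k in range(n)
    ]


def select_order(coeffs, tol):
    """Keep the shortest prefix such that every dropped trailing coefficient is below tol."""
    m = len(coeffs)
    while m > 1 and abs(coeffs[m - 1]) < tol:
        m -= 1
    return m


def evaluate(coeffs, x):
    """Evaluate the truncated series using T_k(x) = cos(k * acos(x))."""
    theta = math.acos(x)
    return coeffs[0] / 2.0 + sum(
        c * math.cos(k * theta) for k, c in enumerate(coeffs) if k >= 1
    )


coeffs = chebyshev_coefficients(math.exp, 16)
order = select_order(coeffs, 1e-8)          # number of coefficients kept
approx = evaluate(coeffs[:order], 0.3)      # close to math.exp(0.3)
```

Because the coefficients of a smooth function decay quickly, the truncation point can be chosen from the coefficients themselves, which is the intuition behind order selection in this sketch.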
Pub Date: 1999-11-07 DOI: 10.1109/ICCAD.1999.810710
Rachid Helaihel, K. Olukotun
Using Java in embedded systems is plagued by problems of limited runtime performance and unpredictable runtime behavior. The Java Multi-Threaded Processor (JMTP) provides solutions to these problems. The JMTP architecture is a single chip containing an off-the-shelf general purpose processor core coupled with an array of Java Thread Processors (JTPs). Performance can be improved using this architecture by exploiting coarse-grained parallelism in the application. These performance improvements are achieved with relatively small hardware costs. Runtime predictability is improved by implementing a subset of the Java Virtual Machine (JVM) specification in the JTP and trimming away complexity without excessively restricting the Java code a JTP can handle. Moreover the JMTP architecture incorporates hardware to adaptively manage shared JMTP resources in order to satisfy JTP thread timing constraints or provide an early warning for a timing violation. This is an important feature for applications with quality-of-service demands. In addition to the hardware architecture, we describe a software framework that analyzes a Java application for expressed and implicit coarse-grained concurrent threads to execute on JTPs. This framework identifies the optimal mapping of an application to a JMTP with an arbitrary number of JTPs. We have tested this framework on a variety of applications including IDEA encryption with different JTP configurations and confirmed that the algorithm was able to obtain desired results in each case.
Title: JMTP: an architecture for exploiting concurrency in embedded Java applications with real-time considerations. Published in: 1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051), pp. 551-557. URL: https://doi.org/10.1109/ICCAD.1999.810710
Pub Date: 1999-11-07 DOI: 10.1109/ICCAD.1999.810690
M. Beattie, L. Pileggi
The increasing interconnect density and operating frequencies of system-on-a-chip (SOC) designs necessitate extraction of parasitic electromagnetic couplings beyond the localized confines of functional design blocks. In addition, SOC design styles and gridless variable-width routing make it increasingly difficult to use precharacterized library shapes for parasitic extraction. A comprehensive capacitance and inductance extraction solution requires a hierarchical data representation and fast runtime algorithms. We illustrate through examples that both the multipole method and hierarchical refinement, which are the two most successful approaches for parasitic extraction to date, work efficiently only under certain limiting conditions. To improve this situation, we present an approach which combines the best of both methods into a concurrent multipole refinement representation of the electromagnetic interaction which is efficient for arbitrary interconnect configurations. We use a generalized formulation of electromagnetic interactions to exploit the similarities in capacitance and inductance extraction for greater efficiency.
Title: Electromagnetic parasitic extraction via a multipole method with hierarchical refinement. Published in: 1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051), pp. 437-444. URL: https://doi.org/10.1109/ICCAD.1999.810690
Pub Date: 1999-11-07 DOI: 10.1109/ICCAD.1999.810636
A. Samavedam, K. Mayaram, T. Fiez
A scalable macromodel for substrate noise coupling in heavily doped substrates has been developed. This model is simple since it requires only four parameters which can readily be extracted from a small number of device simulations or measurements. Once these parameters have been determined the model can be used for any spacing between the injection and sensing contacts and for different contact geometries. The scalability of the model with separation and width provides insight into substrate coupling and optimization issues prior to and during the layout phase. The model is validated for a 2 μm and a 0.5 μm CMOS process where it is shown that the simple model predicts the noise coupling accurately. Measurements from a chip fabricated in a 0.5 μm CMOS process show good agreement with the model.
Title: A scalable substrate noise coupling model for mixed-signal ICs. Published in: 1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051), pp. 128-131. URL: https://doi.org/10.1109/ICCAD.1999.810636
Pub Date: 1999-11-07 DOI: 10.1109/ICCAD.1999.810718
A. Kahng, D. Kirovski, S. Mantik, M. Potkonjak, J. Wong
We give the first study of copy detection techniques for VLSI CAD applications; these techniques are complementary to previous watermarking-based IP protection methods in finding and proving improper use of design IP. After reviewing related literature (notably in the text processing domain), we propose a generic methodology for copy detection based on determining basic elements within structural representations of solutions (IPs), calculating (context-independent) signatures for such elements, and performing fast comparisons to identify potential violators of IP rights. We give example implementations of this methodology in the domains of scheduling, graph coloring and gate-level layout; experimental results show the effectiveness of our copy detection schemes as well as the low overhead of implementation. We remark on open research areas, notably the potentially deep and complementary interaction between watermarking and copy detection.
Title: Copy detection for intellectual property protection of VLSI designs. Published in: 1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051), pp. 600-604. URL: https://doi.org/10.1109/ICCAD.1999.810718
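The three-step methodology in the abstract above (identify basic elements, compute context-independent signatures, compare quickly) can be sketched roughly as follows for a gate-level netlist. The choice of signature (gate type plus the sorted types of its fanins and fanouts), the netlist encoding, and the overlap score are all assumptions made for illustration; the paper's actual signatures are not specified in the abstract.

```python
import hashlib
from collections import Counter


def element_signature(gate_type, fanin_types, fanout_types):
    """Signature that ignores instance names, so renaming a design does not change it."""
    key = "|".join(
        [gate_type, ",".join(sorted(fanin_types)), ",".join(sorted(fanout_types))]
    )
    return hashlib.sha256(key.encode()).hexdigest()[:16]


def design_signatures(netlist):
    """netlist: dict mapping instance name -> (gate_type, list of fanin instance names)."""
    fanouts = {name: [] for name in netlist}
    for name, (_, fanins) in netlist.items():
        for src in fanins:
            if src in fanouts:
                fanouts[src].append(name)
    sigs = Counter()
    for name, (gate_type, fanins) in netlist.items():
        fanin_types = [netlist[src][0] for src in fanins if src in netlist]
        fanout_types = [netlist[dst][0] for dst in fanouts[name]]
        sigs[element_signature(gate_type, fanin_types, fanout_types)] += 1
    return sigs


def overlap(sigs_a, sigs_b):
    """Fraction of signatures shared between two designs (multiset intersection)."""
    shared = sum((sigs_a & sigs_b).values())
    total = max(sum(sigs_a.values()), sum(sigs_b.values()))
    return shared / total if total else 0.0


original = {"a": ("INPUT", []), "b": ("INPUT", []),
            "g1": ("AND", ["a", "b"]), "g2": ("NOT", ["g1"])}
renamed = {"x": ("INPUT", []), "y": ("INPUT", []),
           "u": ("AND", ["x", "y"]), "v": ("NOT", ["u"])}
score = overlap(design_signatures(original), design_signatures(renamed))
```

A high overlap only flags a candidate for closer inspection; it is not proof of copying, since small or regular designs can share many local structures by coincidence.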
Pub Date: 1999-11-07 DOI: 10.1109/ICCAD.1999.810704
P. Rezvani, A. Ajami, Massoud Pedram, H. Savoj
We present LEOPARD, a fanout optimization algorithm based on the effort delay model for near-continuous size buffer libraries. Our algorithm minimizes area under required timing and input capacitance constraints by finding the tree topology and assigning different gains to each buffer to minimize the total buffer area. Experimental results show that the new algorithm achieves significant buffer area improvement compared to previous approaches.
Title: LEOPARD: a Logical Effort-based fanout OPtimizer for ARea and Delay. Published in: 1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051), pp. 516-519. URL: https://doi.org/10.1109/ICCAD.1999.810704
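For readers unfamiliar with the effort delay model, the sketch below applies the textbook logical-effort stage delay, d = g·h + p (logical effort g, gain h = C_out/C_in, parasitic delay p), to a plain buffer chain. It is not the LEOPARD algorithm: it assumes a single chain rather than a tree, equal gain in every stage, and buffer area proportional to input capacitance. All names and default values are hypothetical.

```python
def chain_delay_and_area(c_in, c_load, n_stages, g=1.0, p=1.0):
    """Delay and area proxy of an n-stage buffer chain with equal gain per stage."""
    h = (c_load / c_in) ** (1.0 / n_stages)   # per-stage gain C_out / C_in
    delay = n_stages * (g * h + p)            # sum of stage delays g*h + p
    # Stage k presents input capacitance c_in * h**k; use the total as an area proxy.
    area = sum(c_in * h ** k for k in range(n_stages))
    return delay, area


def smallest_chain_meeting(c_in, c_load, required_delay, max_stages=10):
    """Among equal-gain chains, return (stages, delay, area) with least area that meets timing."""
    best = None
    for n in range(1, max_stages + 1):
        delay, area = chain_delay_and_area(c_in, c_load, n)
        if delay <= required_delay and (best is None or area < best[2]):
            best = (n, delay, area)
    return best


# Driving a load 64 times the input capacitance within 20 delay units:
# one stage is too slow, and two stages (gain 8 each) meet timing with the least area.
choice = smallest_chain_meeting(c_in=1.0, c_load=64.0, required_delay=20.0)
```

Equal gains minimize delay for a fixed number of stages, whereas the abstract states that LEOPARD assigns different gains to each buffer to minimize area; the sketch only shows the trade-off the delay model exposes.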
Pub Date: 1999-11-07 DOI: 10.1109/ICCAD.1999.810610
Robert M. Fuhrer, S. Nowick
The optimal state minimization problem is to select a reduced state machine having the best logic implementation over all possible state reductions and encodings. The OPTIMIST (OPTImal MInimization of STates) algorithm (R.M. Fuhrer et al., 1997) was the first general solution to this problem for synchronous finite state machines (FSMs). In this paper, we present the first solution for asynchronous FSMs. This paper makes two contributions. First, we introduce OPTIMISTA (OPTIMIST-Asynchronous), a new algorithm which guarantees optimum 2-level output logic for asynchronous FSMs. In asynchronous machines, output logic is often critical: it usually determines the machine latency. The algorithm is formulated as a binate constraint satisfaction problem, which is solved using a binate solver. The second contribution is a novel alternative result: the unreduced machine itself can be used directly to obtain minimum-cardinality output logic. Thus, this paper presents two approaches: using OPTIMISTA, which simultaneously performs state and logic minimization; or using no state reduction (if output logic cardinality is of sole interest). Extensions for literal optimization, targeted to multi-level logic, are also proposed.
Title: OPTIMISTA: state minimization of asynchronous FSMs for optimum output logic. Published in: 1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051), pp. 7-13. URL: https://doi.org/10.1109/ICCAD.1999.810610
Pub Date: 1999-11-07 DOI: 10.1109/ICCAD.1999.810703
R. Murgai
Fanout optimization is a fundamental problem in timing optimization. Most of the research has focused on the fanout optimization problem for a single net (the local fanout optimization problem, LFO). The real goal, however, is to optimize the delay through the entire circuit by fanout optimization. This is the global fanout optimization (GFO) problem. H. Touati (1990) claims that visiting nets of the network in a reverse topological order (from primary outputs to inputs), applying the optimum LFO algorithm to each net, computing the new required time at the source and propagating the delay changes to the fanins yields a provably optimum solution to the GFO problem. This result implies that GFO is solvable in polynomial time if LFO is. We show that this is not the case. We prove that GFO is NP-complete even if there are a constant number of buffering choices at each net. We analyze Touati's result and point out the flaw in his argument. We then present sufficient conditions for the optimality of the reverse topological algorithm.
Title: On the global fanout optimization problem. Published in: 1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051), pp. 511-515. URL: https://doi.org/10.1109/ICCAD.1999.810703
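The reverse topological procedure described in the abstract above can be sketched as a traversal from primary outputs to inputs that calls a local fanout optimizer on each net. In this sketch the LFO step is a stand-in (`unbuffered`, a hypothetical function that inserts no buffers and charges a fixed delay per sink), and the circuit encoding is an assumption. As the abstract notes, this procedure is not optimal for GFO in general.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def unbuffered(sink_required, load_delay=0.5):
    """Stand-in for an LFO algorithm: no buffers, each sink adds load_delay at the driver."""
    return min(sink_required) - load_delay * len(sink_required)


def reverse_topological_fanout(fanouts, gate_delay, output_required, local_opt=unbuffered):
    """Propagate required times from primary outputs back to inputs.

    fanouts: dict node -> list of sink nodes driven by that node's output net.
    gate_delay: dict sink node -> delay from its input to its output.
    output_required: dict primary output -> required time at its output.
    """
    required = dict(output_required)
    # Treating sinks as "predecessors" makes static_order() emit every sink
    # before its driver, i.e. a reverse topological order of the circuit.
    for node in TopologicalSorter(fanouts).static_order():
        sinks = fanouts.get(node, [])
        if not sinks:
            continue  # primary output: its required time is given
        sink_required = [required[s] - gate_delay[s] for s in sinks]
        required[node] = local_opt(sink_required)
    return required


fanouts = {"a": ["g1", "g2"], "g1": ["o"], "g2": ["o"], "o": []}
gate_delay = {"g1": 1.0, "g2": 2.0, "o": 1.0}
required = reverse_topological_fanout(fanouts, gate_delay, {"o": 10.0})
```

Each net is decided once, using only the required times already fixed downstream; a greedy structure of this kind cannot revisit an earlier choice, which is consistent with the abstract's statement that sufficient conditions are needed for it to be optimal.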
Pub Date: 1999-11-07 DOI: 10.1109/ICCAD.1999.810661
Eui-Young Chung, L. Benini, G. Micheli
Dynamic power management (DPM) is a technique to reduce the power consumption of electronic systems by selectively shutting down idle components. The quality of the shutdown control algorithm (the power management policy) mostly depends on knowledge of the user's behavior, which in many cases is initially unknown or non-stationary. For this reason, DPM policies should be capable of adapting to changes in user behavior. In this paper, we present a novel DPM scheme based on idle period clustering and adaptive learning trees. We also provide a design guide for applying our technique to components with multiple sleep states. Experimental results show that our technique outperforms other advanced DPM schemes as well as simple time-out policies. The proposed approach shows little deviation of efficiency for various workloads having different characteristics, while other policies show that their efficiency changes drastically depending on the trace data characteristics. Furthermore, experimental evidence indicates that our workload learning algorithm is stable and has fast convergence.
Title: Dynamic power management using adaptive learning tree. Published in: 1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051), pp. 274-279. URL: https://doi.org/10.1109/ICCAD.1999.810661
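As a point of reference for the comparison in the abstract above, the sketch below evaluates the simple time-out policy that serves as a baseline there. It does not implement idle period clustering or the adaptive learning tree. The power model (a single sleep state, constant active-idle and sleep power, a fixed wake-up energy) and all parameter values are assumptions made for illustration.

```python
def timeout_policy_energy(idle_periods, timeout, p_on=1.0, p_sleep=0.1, e_wake=2.0):
    """Energy spent over a trace of idle periods under a fixed time-out policy.

    The component stays on for `timeout` time units after becoming idle, then
    sleeps until the idle period ends and pays `e_wake` to wake up again.
    """
    energy = 0.0
    for t in idle_periods:
        if t <= timeout:
            energy += p_on * t  # never shut down during this idle period
        else:
            energy += p_on * timeout + p_sleep * (t - timeout) + e_wake
    return energy


trace = [1.0, 2.0, 10.0, 20.0]
always_on = timeout_policy_energy(trace, float("inf"))
with_timeout = timeout_policy_energy(trace, 3.0)
```

The policy wastes `p_on * timeout` on every long idle period and never adapts the threshold, which is why its efficiency depends strongly on how idle period lengths are distributed in the workload.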