Pub Date: 2011-11-07 DOI: 10.1109/ICCAD.2011.6105328
Ali Shafiee, M. Zolghadr, M. Arjomand, H. Sarbazi-Azad
Programmable hardware is gaining popularity because it can keep pace with growing performance demands under the tight power budgets, design and test costs, and serious reliability concerns of future multiprocessor embedded systems. In line with this trend, the Network-on-Chip, a potential bottleneck of future multi-cores, should also support programmability. Here, we address this issue in the design and implementation of a routing algorithm for two-dimensional meshes. To this end, we allocate paths based on the input traffic pattern while, in parallel, customizing routing restrictions for deadlock freedom. To achieve this, we propose the extended turn model (ETM), a novel parametric deadlock-free routing scheme for 2D meshes that generalizes prior turn-based routing methods (e.g., odd-even) with a high degree of freedom. This model facilitates the design of a Mixed-Integer Linear Programming (MILP) approach, which treats channel dependency turns as independent variables and decides both path allocation and routing restrictions. We solve this problem with a genetic algorithm and evaluate it using simulation experiments. Results reveal that application-aware ETM-based path allocation outperforms prior turn-based approaches under synthetic and real traffic loads.
Title: Application-aware deadlock-free oblivious routing based on extended turn-model. Published in: 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 213-218.
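The abstract above names the odd-even turn model as one of the prior turn-based methods that ETM generalizes. As a rough illustration of what a turn restriction looks like, the sketch below checks a path on a 2D mesh against the two classic odd-even rules (no East-to-North or East-to-South turn in an even column; no North-to-West or South-to-West turn in an odd column). The function names and the path encoding are assumptions made for this example; this is not the paper's ETM or its MILP formulation.

```python
# Hypothetical helper names; the rules are those of the classic odd-even
# turn model, not the extended turn model (ETM) proposed in the paper.
EAST, WEST, NORTH, SOUTH = "E", "W", "N", "S"
OPPOSITE = {EAST: WEST, WEST: EAST, NORTH: SOUTH, SOUTH: NORTH}


def odd_even_turn_allowed(column, heading, next_heading):
    """Return True if a packet at a node in `column` may change its
    direction of travel from `heading` to `next_heading`."""
    if heading == next_heading:
        return True   # going straight is not a turn
    if next_heading == OPPOSITE[heading]:
        return False  # 180-degree turns are not permitted
    even = column % 2 == 0
    # Rule 1: no East->North or East->South turn in an even column.
    if even and heading == EAST and next_heading in (NORTH, SOUTH):
        return False
    # Rule 2: no North->West or South->West turn in an odd column.
    if not even and heading in (NORTH, SOUTH) and next_heading == WEST:
        return False
    return True


def path_is_legal(nodes):
    """Check every turn along a path given as a list of (x, y) mesh nodes,
    where consecutive nodes are mesh neighbours and x is the column."""
    step = {(1, 0): EAST, (-1, 0): WEST, (0, 1): NORTH, (0, -1): SOUTH}
    headings = [step[(b[0] - a[0], b[1] - a[1])]
                for a, b in zip(nodes, nodes[1:])]
    # The turn between hop i and hop i + 1 happens at nodes[i + 1].
    return all(
        odd_even_turn_allowed(nodes[i + 1][0], h, nh)
        for i, (h, nh) in enumerate(zip(headings, headings[1:]))
    )
```

For instance, `path_is_legal([(0, 0), (1, 0), (1, 1)])` accepts an East-to-North turn taken in odd column 1, while the same turn taken in even column 2 is rejected.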
Pub Date: 2011-11-07 DOI: 10.1109/ICCAD.2011.6105406
A. Ayupov, S. Burns
This paper presents an algorithm for compressing long traces generated using RTL or other fast simulation. The compressed traces can be used by power analysis tools to estimate power on the original traces. We show that the length of the compressed trace is independent of the length of the original trace and is a function of the size of the circuit (more precisely, of its active part) for which the trace was generated. Our experiments show up to a 578× compression ratio on several long RTL traces (up to 320,000 clock transitions) used for power analysis on three industrial blocks (4K, 114K and 202K gates). This leads to significant runtime improvement, especially when the traces are reused over multiple power analysis runs. The dynamic power estimated using compressed traces is within 5% of the power analysis on original traces.
Title: A trace compression algorithm targeting power estimation of long benchmarks. Published in: 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 702-707.
Pub Date: 2011-11-07 DOI: 10.1109/ICCAD.2011.6105311
Dimitri de Jonghe, G. Gielen
Automated abstraction of large analog circuits greatly improves simulation time in custom analog design flows. Because circuits vary so widely, this task is still mainly carried out manually and ad hoc. This paper proposes an automated modeling approach for large-scale analog circuits that produces compact expressions from a SPICE netlist. The presented method builds upon the state-of-the-art Trajectory PieceWise (TPW) approach. Because of their data-driven nature, TPW implementations generate models that require on-the-fly database interpolation during simulation, which is not embedded in a standard commercial design flow. Our approach solves this by recombining TPW samples as a surface in a mixed state-space/frequency domain, revealing information about the circuit's nonlinear behavior. The resulting data, termed Transfer Function Trajectories (TFT), is fitted with a parametric vector fitting algorithm and further translated to system blocks. These are compatible with VHDL-AMS/Verilog-AMS, Matlab/Simulink or hand calculations at all design stages. The models show high accuracy and a speedup of 10×–40× against the ELDO simulator for large circuits up to 150 nodes.
Title: Efficient analytical macromodeling of large analog circuits by Transfer Function Trajectories. Published in: 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 91-94.
Pub Date: 2011-11-07 DOI: 10.1109/ICCAD.2011.6105390
Hratch Mangassarian, A. Veneris, D. Smith, Sean Safarpour
Design debugging has become a resource-intensive bottleneck in modern VLSI CAD flows, consuming as much as 60% of the total verification effort. With typical design sizes exceeding the half-million synthesized gates mark, the growing number of blocks to be examined dramatically slows down the debugging process. The aim of this work is to prune the number of debugging iterations for finding all potential bugs, without affecting the debugging resolution. This is achieved by using structural dominance relationships between circuit components. More specifically, an iterative fixpoint algorithm is presented for finding dominance relationships between multiple-output blocks of the design. These relationships are then leveraged for the early discovery of potential bugs, along with their corrections, resulting in significant debugging speed-ups. Extensive experiments on real industrial designs show that 66% of solutions are discovered early due to dominator implications. This results in consistent performance gains in all cases and a 1.7× overall speed-up for finding all potential bugs, demonstrating the robustness and practicality of the proposed approach.
Title: Debugging with dominance: On-the-fly RTL debug solution implications. Published in: 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 587-594.
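The abstract above relies on an iterative fixpoint computation of dominance relationships. As background, here is a minimal sketch of the textbook single-root version of that idea on a directed graph: a node d dominates n if every path from the root to n passes through d. The graph encoding and the function name are assumptions for illustration; the paper's algorithm targets multiple-output blocks of a design, which this sketch does not attempt.

```python
def dominators(successors, root):
    """Textbook iterative fixpoint for dominator sets.

    Dom(root) = {root}
    Dom(n)    = {n} | intersection of Dom(p) over all predecessors p of n

    `successors` maps each node to an iterable of its successor nodes
    (a hypothetical encoding chosen for this example).
    """
    nodes = set(successors) | {s for succ in successors.values() for s in succ}
    preds = {n: set() for n in nodes}
    for n, succ in successors.items():
        for s in succ:
            preds[s].add(n)

    # Start from the most pessimistic guess and shrink until nothing changes.
    dom = {n: set(nodes) for n in nodes}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for n in nodes - {root}:
            if not preds[n]:
                continue  # no predecessors: not reachable from the root
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom
```

On a diamond-shaped graph `a -> b, a -> c, b -> d, c -> d, d -> e`, node `d` is dominated only by `a` and itself, because neither `b` nor `c` lies on every path from `a` to `d`.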
The Preconditioned Conjugate Gradient (PCG) method has been demonstrated to be effective in solving large-scale linear systems for sparse and symmetric positive definite matrices. One critical problem in PCG is to design a good preconditioner, which can significantly reduce the runtime while keeping memory usage efficient. Universal preconditioners are simple and easy to construct, but their effectiveness is highly problem-dependent. On the other hand, domain-specific preconditioners that explore the underlying physical meaning of the matrices usually work better, but are difficult to design. In this paper, we study the problem in the context of power grid simulation, and develop a novel preconditioner based on the power grid structure through simple circuit simulations. Experimental results show a 43% reduction in the number of iterations and a 23% speedup over existing universal preconditioners.
Title: On the preconditioner of conjugate gradient method — A power grid simulation perspective. Authors: Chung-Han Chou, Nien-Yu Tsai, Hao Yu, Che-Rung Lee, Yiyu Shi, Shih-Chieh Chang. Pub Date: 2011-11-07 DOI: 10.1109/ICCAD.2011.6105374. Published in: 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 494-497.
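To make the role of the preconditioner in the abstract above concrete, the following is a minimal pure-Python sketch of the standard PCG iteration with a diagonal (Jacobi) preconditioner, which is one example of a simple universal preconditioner. It is not the power-grid-specific preconditioner developed in the paper; the function name, the dense list-of-rows matrix format and the tolerance are assumptions made for this example.

```python
def pcg(A, b, m_inv_diag, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A (list of rows).

    `m_inv_diag` holds the reciprocals of the diagonal preconditioner's
    entries, so applying M^{-1} is an element-wise multiply.
    Returns (x, number_of_iterations).
    """
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    z = [m * ri for m, ri in zip(m_inv_diag, r)]  # z = M^{-1} r
    p = list(z)
    rz = dot(r, z)
    for k in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, k
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [m * ri for m, ri in zip(m_inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Jacobi preconditioner: M = diag(A), so M^{-1} is 1 / A[i][i].
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iterations = pcg(A, b, [1.0 / A[i][i] for i in range(len(b))])
```

A better preconditioner leaves the loop unchanged and only alters how `z` is computed from `r`, which is why the choice of preconditioner can cut the iteration count without touching the rest of the solver.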
Pub Date: 2011-11-07 DOI: 10.1109/ICCAD.2011.6105416
Xiaochun Yu, R. D. Blanton
At substantial cost, conventional methods for evaluating test quality apply a specially-generated test set to a large population of manufactured chips. In contrast, a new time-efficient framework for evaluating test quality (FETQ) that uses tester data from normal production has been developed and validated. FETQ estimates the quality of both static and adaptive test metrics, where the latter guides test using the results of statistical data analysis. FETQ is innovative since instead of evaluating a single measure of effectiveness (e.g., number of unique defects detected), it provides a confidence interval of effectiveness based on the analysis of a collection of test sets. FETQ is demonstrated by measuring the chip-detection capability of several static and adaptive test metrics using tester data from actual ICs.
Title: Statistical defect-detection analysis of test sets using readily-available tester data. Published in: 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 768-773.
Pub Date: 2011-11-07 DOI: 10.1109/ICCAD.2011.6105412
Hua-Yu Chang, I. Jiang, Yao-Wen Chang
Due to the rapidly increasing design complexity in modern IC design, more and more timing failures are detected at late stages. Metal-only ECO is an economical technique to correct these late-found failures without deferring time-to-market. Typically, a design undergoes many ECO runs in design houses, so the usage of spare cells is of significant importance. Hence, in this paper, we aim at timing ECO using the fewest spare cells. We observe that a path with good timing tends to be geometrically smooth. Unlike the negative slack and gate delay used in most prior work, we propose a new metric of timing criticality, fixability, which considers the smoothness of critical paths. To measure the smoothness of a path, we use a Bézier curve as the golden path. Furthermore, in order to fix timing violations concurrently, we derive the dominance property to divide violated paths into independent segments. Based on Bézier curve smoothing, fixability identification, and the dominance property, we develop an efficient algorithm to fix violations. Compared with the state-of-the-art works, experimental results show that our algorithm not only effectively resolves all timing violations with few spare cells but also achieves 22.8X and 42.6X speedups.
Title: Timing ECO optimization via Bézier curve smoothing and fixability identification. Published in: 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 742-746.
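The abstract above uses a Bézier curve as the reference ("golden") path. For readers unfamiliar with the construction, this sketch evaluates a Bézier curve of any degree with De Casteljau's algorithm, which repeatedly interpolates between neighbouring control points. How the paper chooses control points or turns the curve into its fixability metric is not stated in the abstract, so the control points below are purely illustrative and the function names are assumptions.

```python
def bezier_point(control_points, t):
    """Evaluate a Bézier curve at parameter t in [0, 1] using
    De Casteljau's algorithm (repeated linear interpolation)."""
    pts = [tuple(float(c) for c in p) for p in control_points]
    while len(pts) > 1:
        pts = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(pts, pts[1:])
        ]
    return pts[0]


def sample_bezier(control_points, samples):
    """Return `samples` evenly spaced points along the curve (samples >= 2)."""
    return [bezier_point(control_points, i / (samples - 1))
            for i in range(samples)]


# Illustrative quadratic curve: it starts at (0, 0), ends at (2, 0)
# and is pulled toward the middle control point (1, 2).
curve = sample_bezier([(0, 0), (1, 2), (2, 0)], 5)
```

The curve always passes through the first and last control points, which is what makes it a natural smooth reference between the two endpoints of a path.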
We propose a cooperation between the programmer, the compiler and the runtime system to identify, exploit and efficiently exercise the parallelism available in many pointer-based applications. Our parallelization strategy, called Cooperative Parallelization, is driven by programmer directives as well as runtime information. We show that minimal information from the programmer can be combined with runtime information to extract latent parallelism in many pointer-intensive applications that involve trees and linked lists. We implemented a compilation framework which automatically parallelizes programs annotated with parallelism directives. We evaluated our approach on a collection of linked-list and tree-based applications. Our results show that we can achieve speedups of up to 15× on a sixteen-core platform. We also compared our approach to OpenMP both qualitatively and quantitatively.
Title: Cooperative parallelization. Authors: Praveen Yedlapalli, Emre Kultursay, M. Kandemir. Pub Date: 2011-11-07 DOI: 10.5555/2132325.2132358. Published in: 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 134-141.
Pub Date: 2011-11-07 DOI: 10.1109/ICCAD.2011.6105299
R. Ghaida, K. Agarwal, S. Nassif, Xin Yuan, L. Liebmann, Puneet Gupta
While the next generation of lithography systems is still under development, extending optical lithography using double patterning (DP) is the only solution to continue technology scaling. The biggest technical challenge of DP is the presence of mask-assignment conflicts in dense layers. In this paper, we propose a framework for DP conflict removal for standard cells. First, we offer an O(n) algorithm for mask assignment (up to 200× faster than the ILP-based approach) that guarantees a conflict-free solution if one exists. We then formulate the problem of conflict removal as a linear program (LP), which permits an extremely fast run-time (less than 10 seconds in real time for typical cells). The framework removes DP conflicts and legalizes the layout across all layers simultaneously while minimizing layout perturbation. For cells from a commercial 22nm library designed without any DP awareness, our method usually removes all DP conflicts without any area increase; for some complex cells, the method still removes all conflicts with a modest 6.7% average increase in area. The method is more general, however, and can also be applied to macro layouts and the interconnect layers in complete designs, as we demonstrate in the paper.
Title: A framework for double patterning-enabled design. Published in: 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 14-20.
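The abstract above mentions a linear-time mask assignment that finds a conflict-free solution whenever one exists. A common way to view DP mask assignment is as two-coloring of a conflict graph, where an edge joins two features that are too close to share a mask. The sketch below does this with breadth-first search in time linear in the number of features plus conflicts. It is a generic illustration under that assumption, with hypothetical names and input format, and is not claimed to be the paper's O(n) algorithm.

```python
from collections import deque


def assign_masks(num_features, conflicts):
    """Two-color the conflict graph by breadth-first search.

    `conflicts` is a list of (u, v) index pairs for features that must be
    placed on different masks (a hypothetical input format).
    Returns a list of mask ids (0 or 1), or None if an odd cycle of
    conflicts makes a conflict-free assignment impossible.
    """
    adj = [[] for _ in range(num_features)]
    for u, v in conflicts:
        adj[u].append(v)
        adj[v].append(u)

    mask = [None] * num_features
    for start in range(num_features):
        if mask[start] is not None:
            continue                    # already colored in an earlier component
        mask[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if mask[v] is None:
                    mask[v] = 1 - mask[u]   # neighbour gets the other mask
                    queue.append(v)
                elif mask[v] == mask[u]:
                    return None             # odd cycle: unresolvable conflict
    return mask
```

A chain of conflicts such as `[(0, 1), (1, 2), (2, 3)]` alternates masks, whereas a triangle of mutual conflicts returns `None`; the latter is the kind of case where the layout itself has to change.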
Pulsed latches have emerged as a popular technique to reduce the power consumption and delay of clock networks. However, the current physical synthesis flow for pulsed latches still performs circuit placement and clock-network synthesis separately, which limits the achievable power reduction. This paper presents the first work in the literature to perform placement and clock-network co-synthesis for pulsed-latch designs. With the interplay between placement and clock-network synthesis, the clock-network power and timing can be optimized simultaneously. Novel progressive network forces are introduced to globally guide the placer for iterative improvements, while the clock-network synthesizer makes use of updated latch locations to optimize power and timing locally. Experimental results show that our framework can substantially reduce power consumption and improve timing slacks, compared to existing synthesis flows.
Title: PRICE: Power reduction by placement and clock-network co-synthesis for pulsed-latch designs. Authors: Yi-Lin Chuang, Hong-Ting Lin, Tsung-Yi Ho, Yao-Wen Chang, Diana Marculescu. Pub Date: 2011-11-07 DOI: 10.1109/ICCAD.2011.6105310. Published in: 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 85-90.