Pub Date : 2004-02-16DOI: 10.1007/978-1-4020-6488-3_5
M. Jersak, R. Henia, R. Ernst
{"title":"Context-aware performance analysis for efficient embedded system design","authors":"M. Jersak, R. Henia, R. Ernst","doi":"10.1007/978-1-4020-6488-3_5","DOIUrl":"https://doi.org/10.1007/978-1-4020-6488-3_5","url":null,"abstract":"","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124361528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2004-02-16DOI: 10.1007/1-4020-8076-X_13
A. Bona, V. Zaccaria, R. Zafalon
{"title":"System level power modeling and simulation of high-end industrial network-on-chip","authors":"A. Bona, V. Zaccaria, R. Zafalon","doi":"10.1007/1-4020-8076-X_13","DOIUrl":"https://doi.org/10.1007/1-4020-8076-X_13","url":null,"abstract":"","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124427888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2004-02-16DOI: 10.1109/DATE.2004.1268938
Mladen Nikitovic, M. Brorsson
In this paper, we have investigated the efficiency of two power-saving strategies that reduces both static and dynamic power consumption when applied to a chip-multiprocessor (CMP). They are evaluated under two workload scenarios and compared against a conventional uni-processor architecture and a CMP without any power-aware scheduling. The results show that energy due to static and dynamic power consumption can be reduced by up to 78% and that further 8% energy can be saved at the expense of response-time of non-critical applications. Furthermore, a small study on the potential impact of system-level events showed that system calls can contribute significantly to the total energy consumed.
{"title":"A low power strategy for future mobile terminals","authors":"Mladen Nikitovic, M. Brorsson","doi":"10.1109/DATE.2004.1268938","DOIUrl":"https://doi.org/10.1109/DATE.2004.1268938","url":null,"abstract":"In this paper, we have investigated the efficiency of two power-saving strategies that reduces both static and dynamic power consumption when applied to a chip-multiprocessor (CMP). They are evaluated under two workload scenarios and compared against a conventional uni-processor architecture and a CMP without any power-aware scheduling. The results show that energy due to static and dynamic power consumption can be reduced by up to 78% and that further 8% energy can be saved at the expense of response-time of non-critical applications. Furthermore, a small study on the potential impact of system-level events showed that system calls can contribute significantly to the total energy consumed.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132294319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2004-02-16DOI: 10.1109/DATE.2004.1269055
Arijit Mondal, P. Chakrabarti, C. Mandal
Present day designers require deep reasoning methods to analyze circuit timing. This includes analysis of effects of dynamic behavior (like glitches) on critical paths, simultaneous switching and identification of specific patterns and their timings. This paper proposes a novel approach that uses a combination of symbolic event propagation and temporal reasoning to extract timing properties of gate-level circuits. The formulation captures complex situations like triggering of traditional false paths and simultaneous switching in a unified symbolic representation in addition to identifying false paths, critical paths as well as conditions for such situations. This information is then represented as an event-time graph. A simple temporal logic on events is proposed that can be used to formulate a wide class of useful queries for various input scenarios. These include maximum/minimum delays, transition times, duration of patterns, etc. An algorithm is developed that retrieves answers to such queries from the event-time graph. A complete BDD based implementation of this system has been made. Results on the ISCAS85 benchmarks indicate very interesting properties of these circuits.
{"title":"A new approach to timing analysis using event propagation and temporal logic","authors":"Arijit Mondal, P. Chakrabarti, C. Mandal","doi":"10.1109/DATE.2004.1269055","DOIUrl":"https://doi.org/10.1109/DATE.2004.1269055","url":null,"abstract":"Present day designers require deep reasoning methods to analyze circuit timing. This includes analysis of effects of dynamic behavior (like glitches) on critical paths, simultaneous switching and identification of specific patterns and their timings. This paper proposes a novel approach that uses a combination of symbolic event propagation and temporal reasoning to extract timing properties of gate-level circuits. The formulation captures complex situations like triggering of traditional false paths and simultaneous switching in a unified symbolic representation in addition to identifying false paths, critical paths as well as conditions for such situations. This information is then represented as an event-time graph. A simple temporal logic on events is proposed that can be used to formulate a wide class of useful queries for various input scenarios. These include maximum/minimum delays, transition times, duration of patterns, etc. An algorithm is developed that retrieves answers to such queries from the event-time graph. A complete BDD based implementation of this system has been made. Results on the ISCAS85 benchmarks indicate very interesting properties of these circuits.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133773454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2004-02-16DOI: 10.1109/DATE.2004.1268880
O. Sinanoglu, A. Orailoglu
Scan-based cores impose considerable test power challenges due to excessive switching activity during shift cycles. The consequent test power constraints force SOC designers to sacrifice parallelism among core tests, as exceeding power thresholds may damage the chip being tested. Reduction of test power for SOC cores can thus increase the number of cores that can be tested in parallel, improving significantly SOC test application time. In this paper, we propose a scan chain modification technique that inserts logic gates on the scan path. The consequent beneficial test data transformations are utilized to reduce the scan chain transitions during shift cycles and hence test power. We introduce a matrix band algebra that models the impact of logic gate insertion between scan cells on the test stimulus and response transformations realized. As we have successfully modeled the response transformations as well, the methodology we propose is capable of truly minimizing the overall test power. The test vectors and responses are analyzed in an intertwined manner, identifying the best possible scan chain modification, which is realized at minimal area cost. Experimental results justify the efficacy of the proposed methodology as well.
{"title":"Scan power minimization through stimulus and response transformations","authors":"O. Sinanoglu, A. Orailoglu","doi":"10.1109/DATE.2004.1268880","DOIUrl":"https://doi.org/10.1109/DATE.2004.1268880","url":null,"abstract":"Scan-based cores impose considerable test power challenges due to excessive switching activity during shift cycles. The consequent test power constraints force SOC designers to sacrifice parallelism among core tests, as exceeding power thresholds may damage the chip being tested. Reduction of test power for SOC cores can thus increase the number of cores that can be tested in parallel, improving significantly SOC test application time. In this paper, we propose a scan chain modification technique that inserts logic gates on the scan path. The consequent beneficial test data transformations are utilized to reduce the scan chain transitions during shift cycles and hence test power. We introduce a matrix band algebra that models the impact of logic gate insertion between scan cells on the test stimulus and response transformations realized. As we have successfully modeled the response transformations as well, the methodology we propose is capable of truly minimizing the overall test power. The test vectors and responses are analyzed in an intertwined manner, identifying the best possible scan chain modification, which is realized at minimal area cost. Experimental results justify the efficacy of the proposed methodology as well.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133606543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2004-02-16DOI: 10.1109/DATE.2004.1269084
George F. Viamontes, I. Markov, J. Hayes
Simulating quantum computation on a classical computer is a difficult problem. The matrices representing quantum gates, and vectors modeling qubit states grow exponentially with the number of qubits. It has been shown experimentally that the QuIDD (Quantum Information Decision Diagram) datastructure greatly facilitates simulations using memory and runtime that are polynomial in the number of qubits. In this paper, we present a complexity analysis which formally describes this class of matrices and vectors. We also present an improved implementation of QuIDDs which can simulate Grover's algorithm for quantum search with the asymptotic runtime complexity of an ideal quantum computer up to negligible overhead.
{"title":"High-performance QuIDD-based simulation of quantum circuits","authors":"George F. Viamontes, I. Markov, J. Hayes","doi":"10.1109/DATE.2004.1269084","DOIUrl":"https://doi.org/10.1109/DATE.2004.1269084","url":null,"abstract":"Simulating quantum computation on a classical computer is a difficult problem. The matrices representing quantum gates, and vectors modeling qubit states grow exponentially with the number of qubits. It has been shown experimentally that the QuIDD (Quantum Information Decision Diagram) datastructure greatly facilitates simulations using memory and runtime that are polynomial in the number of qubits. In this paper, we present a complexity analysis which formally describes this class of matrices and vectors. We also present an improved implementation of QuIDDs which can simulate Grover's algorithm for quantum search with the asymptotic runtime complexity of an ideal quantum computer up to negligible overhead.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127008422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2004-02-16DOI: 10.1109/DATE.2004.1269111
J. Minz, M. Pathak, S. Lim
In this paper, we study the net and pin distribution problem for global routing targeting three dimensional packaging layout via system-on-package (SOP). The routing environment for the new emerging mixed-signal SOP technology is more advanced than that of the conventional PCB or MCM technology - pins are located at all layers of SOP packaging substrate rather than the top-most layer only. This is the first work to formulate and solve the multi-layer net and pin distribution for layer, wirelength, and crosstalk minimization.
{"title":"Net and pin distribution for 3D package global routing","authors":"J. Minz, M. Pathak, S. Lim","doi":"10.1109/DATE.2004.1269111","DOIUrl":"https://doi.org/10.1109/DATE.2004.1269111","url":null,"abstract":"In this paper, we study the net and pin distribution problem for global routing targeting three dimensional packaging layout via system-on-package (SOP). The routing environment for the new emerging mixed-signal SOP technology is more advanced than that of the conventional PCB or MCM technology - pins are located at all layers of SOP packaging substrate rather than the top-most layer only. This is the first work to formulate and solve the multi-layer net and pin distribution for layer, wirelength, and crosstalk minimization.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130476081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2004-02-16DOI: 10.1109/DATE.2004.1269219
J. C. Diaz, Marta Saburit
This paper describes the clock management of a mixed signal, high-speed, multi-clock, fully synchronous circuit. The MA1111A13 circuit clock distribution is a complicated structure that seamlessly incorporates different well-known techniques for power reduction, asynchronous clock domains inter-operability, and compatibility with different IO timing standards and data rates. This complex clocking scheme has been successfully integrated into the standard semi-custom physical design flow. The physical implementation of the clock network with synopsys astro is also presented.
{"title":"Clock management in a Gigabit Ethernet physical layer transceiver circuit","authors":"J. C. Diaz, Marta Saburit","doi":"10.1109/DATE.2004.1269219","DOIUrl":"https://doi.org/10.1109/DATE.2004.1269219","url":null,"abstract":"This paper describes the clock management of a mixed signal, high-speed, multi-clock, fully synchronous circuit. The MA1111A13 circuit clock distribution is a complicated structure that seamlessly incorporates different well-known techniques for power reduction, asynchronous clock domains inter-operability, and compatibility with different IO timing standards and data rates. This complex clocking scheme has been successfully integrated into the standard semi-custom physical design flow. The physical implementation of the clock network with synopsys astro is also presented.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"05 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130645000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2004-02-16DOI: 10.1109/DATE.2004.1268999
A. Jalabert, S. Murali, L. Benini, G. Micheli
Future systems on chips (SoCs) will integrate a large number of processor and storage cores onto a single chip and require networks on chip (NoC) to support the heavy communication demands of the system. The individual components of the SoCs will be heterogeneous in nature with widely varying functionality and communication requirements. The communication infrastructure should optimally match communication patterns among these components accounting for the individual component needs. In this paper we present /spl times/pipesCompiler, a tool for automatically instantiating an application-specific NoC for heterogeneous multi-processor SoCs. The /spl times/pipesCompiler instantiates a network of building blocks from a library of composable soft macros (switches, network interfaces and links) described in SystemC at the cycle-accurate level. The network components are optimized for that particular network and support reliable, latency-insensitive operation. Example systems with application-specific NoCs built using the /spl times/pipesCompiler show large savings in area (factor of 6.5), power (factor of 2.4) and latency (factor of 1.42) when compared to a general-purpose mesh-based NoC architecture.
{"title":"/spl times/pipesCompiler: a tool for instantiating application specific networks on chip","authors":"A. Jalabert, S. Murali, L. Benini, G. Micheli","doi":"10.1109/DATE.2004.1268999","DOIUrl":"https://doi.org/10.1109/DATE.2004.1268999","url":null,"abstract":"Future systems on chips (SoCs) will integrate a large number of processor and storage cores onto a single chip and require networks on chip (NoC) to support the heavy communication demands of the system. The individual components of the SoCs will be heterogeneous in nature with widely varying functionality and communication requirements. The communication infrastructure should optimally match communication patterns among these components accounting for the individual component needs. In this paper we present /spl times/pipesCompiler, a tool for automatically instantiating an application-specific NoC for heterogeneous multi-processor SoCs. The /spl times/pipesCompiler instantiates a network of building blocks from a library of composable soft macros (switches, network interfaces and links) described in SystemC at the cycle-accurate level. The network components are optimized for that particular network and support reliable, latency-insensitive operation. Example systems with application-specific NoCs built using the /spl times/pipesCompiler show large savings in area (factor of 6.5), power (factor of 2.4) and latency (factor of 1.42) when compared to a general-purpose mesh-based NoC architecture.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116620063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2004-02-16DOI: 10.1109/DATE.2004.1268876
H. Posadas, F. Herrera, P. Sánchez, E. Villar, Francisco Blasco
As both the ITRS and the Medea+ DA Roadmaps have highlighted, early performance estimation is an essential step in any SoC design methodology based on International Technology Roadmap for Semiconductors (2001) and The MEDEA+ Design Automation Roadmap (2002). This paper presents a C++ library for timing estimation at system level. The library is based on a general and systematic methodology that takes as input the original SystemC source code without any modification and provides the estimation parameters by simply including the library within a usual simulation. As a consequence, the same models of computation used during system design are preserved and all simulation conditions are maintained. The method exploits the advantages of dynamic analysis, that is, easy management of unpredictable data-dependent conditions and computational efficiency compared with other alternatives (ISS or RT simulation, without the need for SW generation and compilation and HW synthesis). Results obtained on several examples show the accuracy of the method. In addition to the fundamental parameters needed for system-level design exploration, the proposed methodology allows the designer to include capture points at any place in the code. The user can process the corresponding captured events for unrestricted timing constraint verification.
{"title":"System-level performance analysis in SystemC","authors":"H. Posadas, F. Herrera, P. Sánchez, E. Villar, Francisco Blasco","doi":"10.1109/DATE.2004.1268876","DOIUrl":"https://doi.org/10.1109/DATE.2004.1268876","url":null,"abstract":"As both the ITRS and the Medea+ DA Roadmaps have highlighted, early performance estimation is an essential step in any SoC design methodology based on International Technology Roadmap for Semiconductors (2001) and The MEDEA+ Design Automation Roadmap (2002). This paper presents a C++ library for timing estimation at system level. The library is based on a general and systematic methodology that takes as input the original SystemC source code without any modification and provides the estimation parameters by simply including the library within a usual simulation. As a consequence, the same models of computation used during system design are preserved and all simulation conditions are maintained. The method exploits the advantages of dynamic analysis, that is, easy management of unpredictable data-dependent conditions and computational efficiency compared with other alternatives (ISS or RT simulation, without the need for SW generation and compilation and HW synthesis). Results obtained on several examples show the accuracy of the method. In addition to the fundamental parameters needed for system-level design exploration, the proposed methodology allows the designer to include capture points at any place in the code. The user can process the corresponding captured events for unrestricted timing constraint verification.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117005244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}