Pub Date : 2002-09-16DOI: 10.1109/ICCD.2002.1106753
Surrendra Dudani, Jayant Nagda
We present a methodology to obtain high level functional verification closure. We discuss the current advances in critical technologies that are part of the verification closure solution. Within this methodology, assertion specifications are the starting point and central to the collaboration required between various verification tasks to efficiently search for tests and allow automation to proceed. The goal of verification closure is to generate a complete set of tests that meet the design quality criteria established for the design.
{"title":"High level functional verification closure","authors":"Surrendra Dudani, Jayant Nagda","doi":"10.1109/ICCD.2002.1106753","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106753","url":null,"abstract":"We present a methodology to obtain high level functional verification closure. We discuss the current advances in critical technologies that are part of the verification closure solution. Within this methodology, assertion specifications are the starting point and central to the collaboration required between various verification tasks to efficiently search for tests and allow automation to proceed. The goal of verification closure is to generate a complete set of tests that meet the design quality criteria established for the design.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116986788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2002-09-16DOI: 10.1109/ICCD.2002.1106820
Stefan Ihmor, M. Visarius, W. Hardt
The complexity of embedded systems has increased rapidly during the last years. Several design approaches, including system-level design as well as IP-based design, have improved the design process. The rising number of instantiated components implicates a set of complex interfaces. High-speed data transmission rates, fault tolerance and predictability are key challenges for interface design. Thus high sophisticated interfaces have to be generated with respect to the different applications. In this paper we present a design methodology for application-specific real-time interfaces. The high-level design specification is done in a UML-based formalism. An interface-block (IFB) is derived from this specification. The IFB handles data sequencing and protocol generation. Both parts are controlled hierarchically. Within the IFB all application specific restrictions, channel features, and target platform characteristics are taken into account. Our approach is illustrated by a case study implementing a real-time communication between two interacting robots.
{"title":"A design methodology for application-specific real-time interfaces","authors":"Stefan Ihmor, M. Visarius, W. Hardt","doi":"10.1109/ICCD.2002.1106820","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106820","url":null,"abstract":"The complexity of embedded systems has increased rapidly during the last years. Several design approaches, including system-level design as well as IP-based design, have improved the design process. The rising number of instantiated components implicates a set of complex interfaces. High-speed data transmission rates, fault tolerance and predictability are key challenges for interface design. Thus high sophisticated interfaces have to be generated with respect to the different applications. In this paper we present a design methodology for application-specific real-time interfaces. The high-level design specification is done in a UML-based formalism. An interface-block (IFB) is derived from this specification. The IFB handles data sequencing and protocol generation. Both parts are controlled hierarchically. Within the IFB all application specific restrictions, channel features, and target platform characteristics are taken into account. Our approach is illustrated by a case study implementing a real-time communication between two interacting robots.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127105717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2002-09-16DOI: 10.1109/ICCD.2002.1106756
E. Costa, S. Bampi, J. Monteiro
We present a new architecture for signed multiplication which maintains the pure form of an array multiplier, exhibiting a much lower overhead than the Booth architecture. This architecture is extended for radix-2/sup m/ encoding, which leads to a reduction of the number of partial lines, enabling a significant improvement in performance and power consumption. The flexibility of our architecture allows for the easy construction of multipliers for different values of m, as opposed to the Booth architecture for which implementations for m > 2 are complex. The results we present show that the proposed architecture with radix-4 compares favorably in performance and power with the Modified Booth multiplier. We have experimented our architecture with different values of m and concluded that m = 4 minimizes both delay and power.
{"title":"A new architecture for signed radix-2/sup m/ pure array multipliers","authors":"E. Costa, S. Bampi, J. Monteiro","doi":"10.1109/ICCD.2002.1106756","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106756","url":null,"abstract":"We present a new architecture for signed multiplication which maintains the pure form of an array multiplier, exhibiting a much lower overhead than the Booth architecture. This architecture is extended for radix-2/sup m/ encoding, which leads to a reduction of the number of partial lines, enabling a significant improvement in performance and power consumption. The flexibility of our architecture allows for the easy construction of multipliers for different values of m, as opposed to the Booth architecture for which implementations for m > 2 are complex. The results we present show that the proposed architecture with radix-4 compares favorably in performance and power with the Modified Booth multiplier. We have experimented our architecture with different values of m and concluded that m = 4 minimizes both delay and power.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"459 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125833384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2002-09-16DOI: 10.1109/ICCD.2002.1106802
Carlos Molina, Antonio González, Jordi Tubella
This paper presents a novel microarchitecture to exploit trace-level speculation by means of two threads working cooperatively in a speculative and non-speculative way respectively. The architecture presents two main benefits: (a) no significant penalties are introduced in the presence of a misspeculation and (b) any type of trace predictor can work together with this proposal. In this way, aggressive trace predictors can be incorporated since misspeculations do not introduce significant penalties. We describe in detail TSMA (trace-level speculative multithreaded architecture) and present initial results to show the benefits of this proposal. We show how simple trace predictors achieve significant speed-up in the majority of cases. Results of a simple trace speculation mechanism show an average speed-up of 16%.
{"title":"Trace-level speculative multithreaded architecture","authors":"Carlos Molina, Antonio González, Jordi Tubella","doi":"10.1109/ICCD.2002.1106802","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106802","url":null,"abstract":"This paper presents a novel microarchitecture to exploit trace-level speculation by means of two threads working cooperatively in a speculative and non-speculative way respectively. The architecture presents two main benefits: (a) no significant penalties are introduced in the presence of a misspeculation and (b) any type of trace predictor can work together with this proposal. In this way, aggressive trace predictors can be incorporated since misspeculations do not introduce significant penalties. We describe in detail TSMA (trace-level speculative multithreaded architecture) and present initial results to show the benefits of this proposal. We show how simple trace predictors achieve significant speed-up in the majority of cases. Results of a simple trace speculation mechanism show an average speed-up of 16%.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122448741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2002-09-16DOI: 10.1109/ICCD.2002.1106780
I-De Huang, S. Gupta, M. Breuer
We have developed an accurate and efficient methodology to perform static timing analysis (STA) in combinational logic blocks in the presence of multiple crosstalk-induced noise effects. The crosstalk model used is more accurate because it considers skew, input transition times, and driver strengths. This crosstalk model is enhanced to handle timing ranges for performing STA. The methodology also uses more accurate delay models for gates. The presence of one or more coupling capacitances can create cyclic timing dependencies, even in an otherwise acyclic circuit. We have developed an approach to partition the circuit into minimal timing-iterative subcircuits (TISs) that encapsulate the cyclic timing dependencies. When used in conjunction with our levelization procedure, iterative timing analysis is confined within individual TISs. We have demonstrated that the maximum arrival time values computed by the proposed STA using integrated delay models are much closer to detailed circuit simulation results than an STA that uses the 3C/sub c/ delay model.
{"title":"Accurate and efficient static timing analysis with crosstalk","authors":"I-De Huang, S. Gupta, M. Breuer","doi":"10.1109/ICCD.2002.1106780","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106780","url":null,"abstract":"We have developed an accurate and efficient methodology to perform static timing analysis (STA) in combinational logic blocks in the presence of multiple crosstalk-induced noise effects. The crosstalk model used is more accurate because it considers skew, input transition times, and driver strengths. This crosstalk model is enhanced to handle timing ranges for performing STA. The methodology also uses more accurate delay models for gates. The presence of one or more coupling capacitances can create cyclic timing dependencies, even in an otherwise acyclic circuit. We have developed an approach to partition the circuit into minimal timing-iterative subcircuits (TISs) that encapsulate the cyclic timing dependencies. When used in conjunction with our levelization procedure, iterative timing analysis is confined within individual TISs. We have demonstrated that the maximum arrival time values computed by the proposed STA using integrated delay models are much closer to detailed circuit simulation results than an STA that uses the 3C/sub c/ delay model.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114408711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2002-09-16DOI: 10.1109/ICCD.2002.1106803
Manvi Agarwal, S. Nandy, J. V. Eijndhoven, S. Balakrishnan
VLIW processors are statically scheduled processors and their performance depends on the quality of schedules generated by the compiler's scheduler. We propose a new scheduling scheme where the application is first divided into decision trees and then further split into traces. Traces are speculatively scheduled on the processor based on their probability of execution. We have developed a tool "SpliTree" to generate traces automatically. By using dynamic branch prediction for scheduling traces our scheme achieves approximately 1.4/spl times/ performance improvement over that using decision trees for Spec92 benchmarks simulated on TriMedia/spl trade/.
{"title":"Speculative trace scheduling in VLIW processors","authors":"Manvi Agarwal, S. Nandy, J. V. Eijndhoven, S. Balakrishnan","doi":"10.1109/ICCD.2002.1106803","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106803","url":null,"abstract":"VLIW processors are statically scheduled processors and their performance depends on the quality of schedules generated by the compiler's scheduler. We propose a new scheduling scheme where the application is first divided into decision trees and then further split into traces. Traces are speculatively scheduled on the processor based on their probability of execution. We have developed a tool \"SpliTree\" to generate traces automatically. By using dynamic branch prediction for scheduling traces our scheme achieves approximately 1.4/spl times/ performance improvement over that using decision trees for Spec92 benchmarks simulated on TriMedia/spl trade/.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129780769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2002-09-16DOI: 10.1109/ICCD.2002.1106811
M. Ley, H. Grünbacher
This paper describes the architecture and implementation of the first industrial single chip communication controller for the Time Triggered Protocol (TTP/C). TTP/C is an emerging communication protocol for fault-tolerant real time systems. Typical applications are safety-critical digital control systems such as drive-by-wire and fly-by-wire. We applied a VHDL based design flow to implement an application specific RISC core with several specialized peripheral blocks, RAMs, flash memory and analog cells. For production of the 27 mm/sup 2/ chip a 0.35 /spl mu/ Flash-CMOS technology is used Fully tested samples are already available and proved the design to be "first time right".
{"title":"TTA-C2 a single chip communication controller for the time-triggered-protocol","authors":"M. Ley, H. Grünbacher","doi":"10.1109/ICCD.2002.1106811","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106811","url":null,"abstract":"This paper describes the architecture and implementation of the first industrial single chip communication controller for the Time Triggered Protocol (TTP/C). TTP/C is an emerging communication protocol for fault-tolerant real time systems. Typical applications are safety-critical digital control systems such as drive-by-wire and fly-by-wire. We applied a VHDL based design flow to implement an application specific RISC core with several specialized peripheral blocks, RAMs, flash memory and analog cells. For production of the 27 mm/sup 2/ chip a 0.35 /spl mu/ Flash-CMOS technology is used Fully tested samples are already available and proved the design to be \"first time right\".","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128513639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2002-09-16DOI: 10.1109/ICCD.2002.1106778
W. Srisa-an, D. Lo, J. M. Chang
The Active Memory System - a garbage collected memory module - was introduced as a way to provide hardware support for garbage collection in embedded systems. The major component in the design was the Active Memory Processor (AMP) that utilized a set of bit-maps and a combinational circuit to perform mark-sweep garbage collection. The design can achieve constant time for both allocation and sweeping. In this paper two enhancements are made to the design of AMP so that it can perform one-bit reference counting that postpones the need to perform garbage collection. Moreover, a caching mechanism is also introduced to reduce the hardware cost of the design. The experimental results show that the proposed modification can reduce the number of garbage collection invocations by 76%. The speed-up in marking time can be as much as 5.81. With the caching mechanism, the hardware cost can be as small as 27 K gates and 6 KB of SRAM.
{"title":"Performance enhancements to the Active Memory System","authors":"W. Srisa-an, D. Lo, J. M. Chang","doi":"10.1109/ICCD.2002.1106778","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106778","url":null,"abstract":"The Active Memory System - a garbage collected memory module - was introduced as a way to provide hardware support for garbage collection in embedded systems. The major component in the design was the Active Memory Processor (AMP) that utilized a set of bit-maps and a combinational circuit to perform mark-sweep garbage collection. The design can achieve constant time for both allocation and sweeping. In this paper two enhancements are made to the design of AMP so that it can perform one-bit reference counting that postpones the need to perform garbage collection. Moreover, a caching mechanism is also introduced to reduce the hardware cost of the design. The experimental results show that the proposed modification can reduce the number of garbage collection invocations by 76%. The speed-up in marking time can be as much as 5.81. With the caching mechanism, the hardware cost can be as small as 27 K gates and 6 KB of SRAM.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124749181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2002-09-16DOI: 10.1109/ICCD.2002.1106759
Jari Nikara, S. Vassiliadis, J. Takala, M. Sima, P. Liuha
In this paper a parallel Variable-Length Decoding (VLD) scheme is introduced. The scheme is capable of decoding all the codewords in an N-bit buffer whose accumulated codelength is at most N. The proposed method partially breaks the recursive dependency related to the MPEG-2 VLD. All possible codewords in the buffer are detected in parallel and the sum of the codelengths is provided to the external shifter aligning the variable-length coded input stream for a new decoding cycle. Two length detection mechanisms are proposed: the first approach determines the length in a parallel/serial fashion and the second using a new device denoted as MultiplexedAdd. In order to prove feasibility and determine the limiting factors of our proposal, the parallel/serial codeword detector with 32-bit input has been described in behavioral non-optimized VHDL and mapped onto Altera's ACEX EP1K100 FPGA. The implemented prototype exhibits a latency of 110 ns and uses 32% of the logic cells of the device. When applied to MPEG-2 standard benchmark scenes, on average 3.5 symbols are decoded per cycle.
{"title":"Parallel multiple-symbol variable-length decoding","authors":"Jari Nikara, S. Vassiliadis, J. Takala, M. Sima, P. Liuha","doi":"10.1109/ICCD.2002.1106759","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106759","url":null,"abstract":"In this paper a parallel Variable-Length Decoding (VLD) scheme is introduced. The scheme is capable of decoding all the codewords in an N-bit buffer whose accumulated codelength is at most N. The proposed method partially breaks the recursive dependency related to the MPEG-2 VLD. All possible codewords in the buffer are detected in parallel and the sum of the codelengths is provided to the external shifter aligning the variable-length coded input stream for a new decoding cycle. Two length detection mechanisms are proposed: the first approach determines the length in a parallel/serial fashion and the second using a new device denoted as MultiplexedAdd. In order to prove feasibility and determine the limiting factors of our proposal, the parallel/serial codeword detector with 32-bit input has been described in behavioral non-optimized VHDL and mapped onto Altera's ACEX EP1K100 FPGA. The implemented prototype exhibits a latency of 110 ns and uses 32% of the logic cells of the device. When applied to MPEG-2 standard benchmark scenes, on average 3.5 symbols are decoded per cycle.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130179470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2002-09-16DOI: 10.1109/ICCD.2002.1106815
N. Z. Basturkmen, S. Reddy, I. Pomeranz
Peak power consumption during testing is an important concern. For scan designs, a high level of switching activity is created in the circuit during scan shifts, which increases power consumption considerably. In this paper we propose a pseudo-random BIST scheme for scan designs, which reduces the peak power consumption as well as the average power consumption as measured by the switching activity in the circuit. The method reduces the switching activity in the scan chains and the activity in the circuit under test by limiting the scan shifts to a portion of the scan chain structure using scan chain disable. Experimental results on various benchmark circuits demonstrate that the technique reduces the switching activity caused by scan shifts.
{"title":"A low power pseudo-random BIST technique","authors":"N. Z. Basturkmen, S. Reddy, I. Pomeranz","doi":"10.1109/ICCD.2002.1106815","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106815","url":null,"abstract":"Peak power consumption during testing is an important concern. For scan designs, a high level of switching activity is created in the circuit during scan shifts, which increases power consumption considerably. In this paper we propose a pseudo-random BIST scheme for scan designs, which reduces the peak power consumption as well as the average power consumption as measured by the switching activity in the circuit. The method reduces the switching activity in the scan chains and the activity in the circuit under test by limiting the scan shifts to a portion of the scan chain structure using scan chain disable. Experimental results on various benchmark circuits demonstrate that the technique reduces the switching activity caused by scan shifts.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133901352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}