Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528820
P. Franaszek, C. J. Georgiou, Chung-Sheng Li
We describe a method of controlling a three-stage Clos nonblocking switch in which "speculative" self-routing over the Clos fabric is augmented with reservations over a control network that connects controllers in the input and output stages of the switch. The effect is that most connections succeed over the speculative path, while those subject to contention are processed over the control network. We present simulation results which indicate that the inclusion of a control network yields significant benefits under heavily nonuniform traffic conditions.
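The split between a speculative path and a reservation path can be illustrated with a toy model. The sketch below is not the authors' controller: the class `ClosFabric`, its link-state tables, and the random choice of middle switch are all assumptions made for illustration. A request first guesses a middle-stage switch without consulting global state; only if that guess hits a busy link does it fall back to a search that stands in for the control network.

```python
import random


class ClosFabric:
    """Toy three-stage Clos fabric: r input, m middle, r output switches (hypothetical model)."""

    def __init__(self, r, m, seed=0):
        self.r, self.m = r, m
        # in_busy[i][k]: link from input switch i to middle switch k is in use
        self.in_busy = [[False] * m for _ in range(r)]
        # out_busy[o][k]: link from middle switch k to output switch o is in use
        self.out_busy = [[False] * m for _ in range(r)]
        self.rng = random.Random(seed)

    def _free(self, i, o, k):
        return not self.in_busy[i][k] and not self.out_busy[o][k]

    def _claim(self, i, o, k):
        self.in_busy[i][k] = True
        self.out_busy[o][k] = True

    def connect(self, i, o):
        """Return (path_kind, middle_switch), or ("blocked", None) if no path exists."""
        k = self.rng.randrange(self.m)   # speculative: guess one middle switch
        if self._free(i, o, k):
            self._claim(i, o, k)
            return ("speculative", k)
        for k in range(self.m):          # fallback: reservation using full link state
            if self._free(i, o, k):
                self._claim(i, o, k)
                return ("reserved", k)
        return ("blocked", None)


fabric = ClosFabric(r=2, m=3, seed=1)
for src, dst in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(src, dst, fabric.connect(src, dst))
```

With two ports per edge switch and m = 3 middle switches, the classical strict-sense nonblocking condition m >= 2n - 1 holds, so no request in this run is blocked; which requests take the reserved path depends on the random guesses.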
Title: Adaptive routing in Clos networks. Published in: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors.
Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528787
W. Richardson, E. Brunvand
Self-timed systems structured as multiple concurrent processes and communicating through self-timed queues are a convenient way to implement decoupled computer architectures. Machines of this type can exploit instruction-level parallelism in a natural way, and can be easily modified and extended. However, providing a precise exception model for a self-timed micropipelined processor can be difficult, since the processor state does not change at uniformly discrete intervals. We present a precise exception method implemented for Fred, a self-timed, decoupled, pipelined computer architecture with out-of-order instruction completion.
Title: Precise exception handling for a self-timed processor (Proceedings of ICCD '95)
Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528831
Jin Li, Chuan-lin Wu
The multicast function is essential for an ATM switch. We propose a novel architecture for an output-buffer ATM switch and a shared-buffer ATM switch that realizes the multicast function more efficiently. In an output-buffer ATM switch, we dedicate a first-in, first-out (FIFO) shared buffer to all multicast cells to increase buffer utilization. In a shared-buffer ATM switch, we dedicate a FIFO address queue to all multicast cells to simplify the design of the control logic. A performance evaluation of the new switch is also provided. Since each multicast cell occupies only one buffer space, the proposed switch achieves better cell-loss performance under multicast traffic loads without the need for complicated control circuitry.
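The claim that each multicast cell occupies only one buffer space can be made concrete with a small model. This is a sketch under assumptions, not the proposed hardware: the class `MulticastFifo`, its `serve` policy (each port takes the oldest cell still addressed to it), and the drop counter are hypothetical. The point it shows is that occupancy is counted in cells, not in copies.

```python
from collections import deque


class MulticastFifo:
    """Shared FIFO holding one copy of each multicast cell (illustrative model only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cells = deque()   # entries: [payload, set of output ports still to be served]
        self.dropped = 0

    def enqueue(self, payload, ports):
        """Store one copy of the cell; return False (and count a loss) if the buffer is full."""
        if not ports:
            raise ValueError("a cell needs at least one destination port")
        if len(self.cells) >= self.capacity:
            self.dropped += 1
            return False
        self.cells.append([payload, set(ports)])
        return True

    def serve(self, port):
        """Deliver the oldest cell still pending for this port, or None if there is none."""
        hit = None
        for idx, (_, pending) in enumerate(self.cells):
            if port in pending:
                hit = idx
                break
        if hit is None:
            return None
        payload, pending = self.cells[hit]
        pending.discard(port)
        if not pending:            # last destination served: free the single buffer slot
            del self.cells[hit]
        return payload


q = MulticastFifo(capacity=2)
q.enqueue("A", {0, 1, 2})   # three destinations, one buffer slot
q.enqueue("B", {1})
print(len(q.cells), q.serve(1), q.serve(1), len(q.cells))
```

A design that replicated the cell into every destination's output queue would need four slots for the same traffic, which is why the shared copy helps cell-loss performance for a fixed buffer size.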
Title: A novel architecture for an ATM switch (Proceedings of ICCD '95)
Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528922
E. D. Greef, F. Catthoor, H. Man
In this paper, several DSP system design principles are presented which are valid for a large class of memory-intensive algorithms. Our main focus lies on the optimization of the memory and I/O, since these are dominant cost factors in the domain of video and imaging applications. This has resulted in several formalizable mapping principles, which prevent the memory from becoming a bottleneck. First, it is shown that for this class of applications, compile-time data caching decisions not only have a large effect on the performance, but can also have an even larger effect on the overall system cost and power consumption. This is illustrated by means of experiments in which the whole range from no cache up to large cache sizes is scanned. Next, it is shown that when constant I/O rates are enforced to reduce buffer sizes, the area gain may be far more important than the small performance decrease associated with it. A technique to achieve this in an efficient way is proposed. The main test vehicle used throughout the paper to demonstrate our approach is the class of motion-estimation-type algorithms.
Title: Memory organization for video algorithms on programmable signal processors (Proceedings of ICCD '95)
Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528790
J. P. Calvez, O. Pasquier
Performance assessment of embedded Hw/Sw systems built with various types of VLSI components, i.e. heterogeneous multi-processor architectures, is important to help the development of complex real-time applications. To design such a tool, two issues must be solved: gathering relevant information simultaneously on several components without disturbing the application behavior, and displaying the performance results in a way that is easily interpreted by designers. This paper presents a solution to both issues. We first describe the goal for designers and the kinds of applications concerned. Then we describe the principle of collecting an event trace and the technique to evaluate the selected performance indexes. The monitoring technique, based on a specific ASIC, is nonintrusive and captures real-time event occurrences from software tasks and even from hardware functions implemented in ASICs. Each event is automatically time-stamped, collected and processed in real time to evaluate the performance indexes selected by the designer. We also describe the display tool, which shows the results to the designer in several different representations. This technique and the associated real-time performance analyzer are integrated in a complete development process based on the MCSE methodology.
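Turning a time-stamped event trace into a performance index can be sketched in a few lines. The event format `(timestamp, task, kind)` and the function `busy_time` are assumptions chosen for illustration; the abstract does not specify the trace layout or which indexes the analyzer computes. The example index here is the total time each task spends between its "start" and "end" events.

```python
from collections import defaultdict


def busy_time(events):
    """Sum, per task, the time between matching "start" and "end" events.

    events: iterable of (timestamp, task, kind) tuples, kind in {"start", "end"}.
    This trace format is hypothetical, not the one used by the ASIC monitor.
    """
    started = {}                 # task -> timestamp of its open "start" event
    total = defaultdict(int)
    for stamp, task, kind in sorted(events):
        if kind == "start":
            started[task] = stamp
        elif kind == "end" and task in started:
            total[task] += stamp - started.pop(task)
    return dict(total)


trace = [
    (0, "t1", "start"), (5, "t1", "end"),
    (5, "t2", "start"), (9, "t2", "end"),
    (10, "t1", "start"), (12, "t1", "end"),
]
totals = busy_time(trace)
window = 12
for task, busy in sorted(totals.items()):
    print(task, busy, f"{busy / window:.0%} of the observation window")
```

Because the events are only read after they are recorded, this kind of post-processing does not disturb the monitored application, which is the property the abstract calls nonintrusive.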
Title: Performance assessment of embedded Hw/Sw systems (Proceedings of ICCD '95)
Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528824
I. Pomeranz, S. Reddy
A test generation procedure to detect multiple state-table faults in finite-state machines is proposed. The importance of multiple state-table faults and their advantages as test generation objectives to avoid the need for checking experiments are considered. The proposed procedure is based on a new method for implicit enumeration of large numbers of multiple faults by using incompletely specified faulty machines. Experimental results are presented to demonstrate the effectiveness of implicit fault enumeration in detecting large numbers of multiple faults.
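What it means for a test to detect a multiple state-table fault can be shown with explicit simulation. The sketch below assumes a Mealy machine stored as a dictionary from `(state, input)` to `(next_state, output)`; the helper names `run`, `inject`, and `detects` are hypothetical. It enumerates one fault explicitly and therefore does not implement the implicit enumeration over incompletely specified faulty machines that the paper proposes.

```python
def run(table, start, inputs):
    """Apply an input sequence to a Mealy machine and return its output sequence."""
    state, outputs = start, []
    for symbol in inputs:
        state, out = table[(state, symbol)]
        outputs.append(out)
    return outputs


def inject(table, faults):
    """Return a copy of the state table with the given entries overwritten."""
    faulty = dict(table)
    faulty.update(faults)
    return faulty


def detects(table, faults, start, inputs):
    """A test detects the fault if the good and faulty output sequences differ."""
    return run(table, start, inputs) != run(inject(table, faults), start, inputs)


good = {
    ("A", 0): ("A", 0), ("A", 1): ("B", 0),
    ("B", 0): ("A", 1), ("B", 1): ("B", 0),
}
# A multiple fault: one wrong next state and one wrong output.
double_fault = {("A", 1): ("A", 0), ("B", 0): ("A", 0)}

print(detects(good, double_fault, "A", [1]))      # too short to expose the fault
print(detects(good, double_fault, "A", [1, 0]))   # second input exposes it
```

Explicit simulation like this costs one run per fault combination, which is what makes implicit enumeration attractive when the number of multiple faults is large.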
Title: Test generation for multiple state-table faults in finite-state machines (Proceedings of ICCD '95)
Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528843
S. Reddy
Summary form only given. As the testing area becomes mature, the challenges it poses shift. We describe some of these challenges and how they are addressed in recent works in various areas of testing. In recent years, the formulations of testing problems have changed from "given a problem, find a solution" to "given a problem and quality measures, find a high-quality solution". Quality guarantees in the form of lower and upper bounds and optimal solutions are derived, in addition to the more conventional demonstration of performance on benchmark circuits. Quality guarantees allow one to measure the distance between a given solution and an optimal solution, and provide criteria for evaluating a new procedure that are more effective than comparison to previously proposed procedures. We review several areas where bounds and optimal solutions have been found. Most procedures are specific to a given problem, and cannot be reused to solve other problems. In contrast, general-purpose paradigms allow a large variety of problems to be solved cost-effectively by plugging the appropriate procedures into the same algorithm. Such paradigms allow faster program development and reuse of expertise acquired in solving other problems under the same paradigm. We describe several attempts at using existing paradigms and developing new ones that successfully compete with special-purpose procedures. Recent works address testing issues at increasingly higher levels of the design cycle and offer an integrated treatment of design and test. High-level failure models are considered as well as solutions that are completely independent of a failure model. We describe some of these works and the advantages of the two directions. We conclude with an (incomplete) list of challenges for future research.
Title: Testing-what's missing? An incomplete list of challenges (Proceedings of ICCD '95)
Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528905
N. Passos, E. Sha, L. Chao
This paper presents a novel optimization technique for the design of application-specific integrated circuits dedicated to performing iterative or recursive time-critical sections of multi-dimensional problems, such as image processing applications. These sections are modeled as cyclic multi-dimensional data flow graphs (MDFGs). This new technique, called multi-dimensional interleaving, consists of an expansion and compression of the iteration space while considering memory requirements. It guarantees that all functional elements of a circuit can be executed simultaneously, and that no additional memory queues proportional to the problem size are required. The algorithm runs in O(|E|) time, where E is the set of edges of the MDFG representing the circuit.
Title: Multi-dimensional interleaving for time-and-memory design optimization (Proceedings of ICCD '95)
Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528944
Stefan Radtke, J. Bargfrede, W. Anheier
The generation of test patterns for digital circuits is known to be an NP-hard problem. Because of the backtracking mechanism in sequential algorithms for test pattern generation, it is difficult to speed up the process. In this paper we present a parallel formulation of the FAN algorithm implemented on a heterogeneous cluster of workstations. Two different methods are used to take into account easy- and hard-to-detect faults. We describe the strategies behind our parallel implementations as well as implementation details. The results show linear speedups. Furthermore, we introduce a new method for test vector compaction using a genetic algorithm, which yields smaller test sets than traditional methods. The reader should be familiar with the notation of the FAN algorithm.
Title: Distributed automatic test pattern generation with a parallel FAN algorithm (Proceedings of ICCD '95)
Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528798
J. Wellman, E. Davidson
In this paper we propose a new execution-trace-driven simulation technique, called the Resource Conflict Methodology (RCM), for modeling and simulating computer systems early in the design cycle. By using a simplified hardware element model in which the user can easily add or delete hardware elements, RCM allows the user to readily change the machine design being investigated and to evaluate the resulting machine on a given workload. We describe the RCM model with reference to a family of superscalar processors and develop an RCM-based analysis program (called REAP) for this family of processors. Using REAP, we demonstrate the validity of our method by comparing its RCM performance estimates to those of a traditional early-design-stage timer model.
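The idea of estimating performance from resource conflicts along a trace can be sketched as follows. This is not REAP and makes strong simplifying assumptions: the trace format `(unit, latency)`, the function `simulate`, strictly in-order issue, and the absence of data dependencies are all choices made for illustration. Each instruction starts as soon as program order allows and an instance of its hardware element is free; adding or removing elements is a one-line change to the `units` dictionary.

```python
import heapq


def simulate(trace, units):
    """Estimate total cycles for a trace under a given set of hardware elements.

    trace: list of (unit_name, latency) in program order (hypothetical format).
    units: dict mapping unit_name to the number of instances of that element.
    """
    # One min-heap per unit kind, holding the cycle at which each instance is next free.
    free_at = {name: [0] * count for name, count in units.items()}
    issue = 0    # issue cycle of the previous instruction (in-order issue)
    finish = 0   # latest completion cycle seen so far
    for unit, latency in trace:
        heap = free_at[unit]
        start = max(issue, heap[0])               # wait for order and for a free instance
        heapq.heapreplace(heap, start + latency)  # that instance is busy until then
        issue = start
        finish = max(finish, start + latency)
    return finish


trace = [("alu", 1), ("alu", 1), ("mul", 3), ("mul", 3)]
print(simulate(trace, {"alu": 1, "mul": 1}))   # conflicts on both units
print(simulate(trace, {"alu": 2, "mul": 2}))   # conflicts removed by adding elements
```

Comparing the two runs on the same trace is the kind of design-space question the methodology targets: the workload stays fixed while the machine description changes.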
Title: The resource conflict methodology for early-stage design space exploration of superscalar RISC processors (Proceedings of ICCD '95)