In this paper we address the problem of code generation for basic blocks in heterogeneous memory-register DSP processors. We propose a new a technique, based on register-transfer paths, that can be used for efficiently dismantling basic block DAGs (Directed Acyclic Graphs) into expression trees. This approach builds on recent results which report optimal code generation algorithm for expression trees for these architectures. This technique has been implemented and experimentally validated for the TMS320C25, a popular fixed point DSP processor. The results show that good code quality can be obtained using the proposed technique. An analysis of the type of DAGs found in the DSPstone benchmark programs reveals that the majority of basic blocks in this benchmark set are expression trees and leaf DAGs. This leads to our claim that tree based algorithms, like the one described in this paper, should be the technique of choice for basic blocks code generation with heterogeneous memory register architectures.
{"title":"Using register-transfer paths in code generation for heterogeneous memory-register architectures","authors":"G. Araújo, S. Malik, M. Lee","doi":"10.1109/DAC.1996.545644","DOIUrl":"https://doi.org/10.1109/DAC.1996.545644","url":null,"abstract":"In this paper we address the problem of code generation for basic blocks in heterogeneous memory-register DSP processors. We propose a new a technique, based on register-transfer paths, that can be used for efficiently dismantling basic block DAGs (Directed Acyclic Graphs) into expression trees. This approach builds on recent results which report optimal code generation algorithm for expression trees for these architectures. This technique has been implemented and experimentally validated for the TMS320C25, a popular fixed point DSP processor. The results show that good code quality can be obtained using the proposed technique. An analysis of the type of DAGs found in the DSPstone benchmark programs reveals that the majority of basic blocks in this benchmark set are expression trees and leaf DAGs. This leads to our claim that tree based algorithms, like the one described in this paper, should be the technique of choice for basic blocks code generation with heterogeneous memory register architectures.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115328805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a technique for simulating processors and attached hardware using the principle of compiled simulation. Unlike existing, inhouse and off-the-shelf hardware/software co-simulators, which use interpretive processor simulation, the proposed technique performs instruction decoding and simulation scheduling at compile time. The technique offers up to three orders of magnitude faster simulation. The high speed allows the user to explore algorithms and hardware/software trade-offs before any hardware implementation. In this paper, the sources of the speedup and the limitations of the technique are analyzed and the realization of the simulation compiler is presented.
{"title":"Compiled HW/SW co-simulation","authors":"V. Zivojnovic, H. Meyr","doi":"10.1109/DAC.1996.545662","DOIUrl":"https://doi.org/10.1109/DAC.1996.545662","url":null,"abstract":"This paper presents a technique for simulating processors and attached hardware using the principle of compiled simulation. Unlike existing, inhouse and off-the-shelf hardware/software co-simulators, which use interpretive processor simulation, the proposed technique performs instruction decoding and simulation scheduling at compile time. The technique offers up to three orders of magnitude faster simulation. The high speed allows the user to explore algorithms and hardware/software trade-offs before any hardware implementation. In this paper, the sources of the speedup and the limitations of the technique are analyzed and the realization of the simulation compiler is presented.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"390 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124559578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Previous solutions to the difficult problem of identifying sequential redundancy are either based on incorrect theoretical results, or rely an unrealistic simplifying assumptions, or are applicable only to small circuits. In this paper we show the limitations of the existing definitions of sequential redundancy and introduce a new concept of c-cycle redundancy as a generalization of the conventional notion of sequential redundancy. We present an efficient algorithm, FIRES, to identify c-cycle redundancies without search. FIRES does not assume the existence of a global reset nor does it require any state transition information. FIRES has provably polynomial-time complexity and is practical for large circuits. Experimental results on benchmark circuits indicate that FIRES identifies a large number of redundancies. We show that, in general, the redundant faults identified by FIRES are not easy targets for state-of-the-art sequential rest generators.
{"title":"Identifying sequential redundancies without search","authors":"M. Iyer, D. E. Long, M. Abramovici","doi":"10.1145/240518.240605","DOIUrl":"https://doi.org/10.1145/240518.240605","url":null,"abstract":"Previous solutions to the difficult problem of identifying sequential redundancy are either based on incorrect theoretical results, or rely an unrealistic simplifying assumptions, or are applicable only to small circuits. In this paper we show the limitations of the existing definitions of sequential redundancy and introduce a new concept of c-cycle redundancy as a generalization of the conventional notion of sequential redundancy. We present an efficient algorithm, FIRES, to identify c-cycle redundancies without search. FIRES does not assume the existence of a global reset nor does it require any state transition information. FIRES has provably polynomial-time complexity and is practical for large circuits. Experimental results on benchmark circuits indicate that FIRES identifies a large number of redundancies. We show that, in general, the redundant faults identified by FIRES are not easy targets for state-of-the-art sequential rest generators.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129390680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The advent of parallel executing address calculation units (ACUs) in digital signal processor (DSP) and application specific instruction-set processor (ASIP) architectures has made a strong impact on an application's ability to efficiently access memories. Unfortunately, successful compiler techniques which map high-level language data constructs to the addressing units of the architecture have lagged far behind. Since access to data is often the most demanding task in DSP, this mapping can be the most crucial function of the compiler. This paper introduces a new retargetable approach and prototype tool for the analysis of array references and traversals for efficient use of ACUs. The ArrSyn utility is designed to be used either as an enhancement to an existing dedicated compiler or as an aid for architecture exploration.
{"title":"Address calculation for retargetable compilation and exploration of instruction-set architectures","authors":"C. Liem, P. Paulin, A. Jerraya","doi":"10.1145/240518.240631","DOIUrl":"https://doi.org/10.1145/240518.240631","url":null,"abstract":"The advent of parallel executing address calculation units (ACUs) in digital signal processor (DSP) and application specific instruction-set processor (ASIP) architectures has made a strong impact on an application's ability to efficiently access memories. Unfortunately, successful compiler techniques which map high-level language data constructs to the addressing units of the architecture have lagged far behind. Since access to data is often the most demanding task in DSP, this mapping can be the most crucial function of the compiler. This paper introduces a new retargetable approach and prototype tool for the analysis of array references and traversals for efficient use of ACUs. The ArrSyn utility is designed to be used either as an enhancement to an existing dedicated compiler or as an aid for architecture exploration.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129847313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, a new method for the design of complex multimedia ASIC is introduced. Using this design method, VLSI with embedded software and high-speed emulators can be developed concurrently. The method has proven to be effective through actual design of VLSI for audio compression and decompression in a Mini-Disc system.
{"title":"VLSI design and system level verification for the Mini-Disc","authors":"T. Fujimoto, T. Kambe","doi":"10.1145/240518.240612","DOIUrl":"https://doi.org/10.1145/240518.240612","url":null,"abstract":"In this paper, a new method for the design of complex multimedia ASIC is introduced. Using this design method, VLSI with embedded software and high-speed emulators can be developed concurrently. The method has proven to be effective through actual design of VLSI for audio compression and decompression in a Mini-Disc system.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"135 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127348720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
While delay modeling of gates with a single switching input has received considerable attention, the case of multiple inputs switching in close temporal proximity is just beginning to be addressed in the literature. The effect of proximity of input transitions can be significant on the delay and output transition time. The few attempts that have addressed this issue are based on a series-parallel transistor collapsing method that reduces the multi-input gate to an inverter. This limits the technique to CMOS technology. Moreover, none of them discuss the appropriate choice of voltage thresholds to measure delay for a multi-input gate. In this paper, we first present a method for the choice of voltage thresholds for a multi-input gate that ensures a positive value of delay under all input conditions. We next introduce a dual-input proximity model for the case when only two inputs of the gate are switching. We then propose a simple approximate algorithm for calculating the delay and output transition time that makes repeated use of the dual-input proximity model without collapsing the gate into an equivalent inverter. Comparison with simulation results shows that our method performs quite well in practice.
{"title":"Modeling the effects of temporal proximity of input transitions on gate propagation delay and transition time","authors":"V. Chandramouli, K. Sakallah","doi":"10.1109/DAC.1996.545649","DOIUrl":"https://doi.org/10.1109/DAC.1996.545649","url":null,"abstract":"While delay modeling of gates with a single switching input has received considerable attention, the case of multiple inputs switching in close temporal proximity is just beginning to be addressed in the literature. The effect of proximity of input transitions can be significant on the delay and output transition time. The few attempts that have addressed this issue are based on a series-parallel transistor collapsing method that reduces the multi-input gate to an inverter. This limits the technique to CMOS technology. Moreover, none of them discuss the appropriate choice of voltage thresholds to measure delay for a multi-input gate. In this paper, we first present a method for the choice of voltage thresholds for a multi-input gate that ensures a positive value of delay under all input conditions. We next introduce a dual-input proximity model for the case when only two inputs of the gate are switching. We then propose a simple approximate algorithm for calculating the delay and output transition time that makes repeated use of the dual-input proximity model without collapsing the gate into an equivalent inverter. Comparison with simulation results shows that our method performs quite well in practice.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130996906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chin-Chi Teng, Yi-Kan Cheng, E. Rosenbaum, S. Kang
In this paper, we present a hierarchical reliability-driven CAD system for the design of electromigration resistant circuits. The top of the hierarchy aims at quickly identifying those critical interconnects with potential electromigration reliability problems. Then the detailed electromigration analysis of critical interconnects is carried out by an accurate and computationally efficient simulation tool (ITEM). This top-down approach provides a feasible solution to the complicated electromigration diagnosis problem.
{"title":"Hierarchical electromigration reliability diagnosis for VLSI interconnects","authors":"Chin-Chi Teng, Yi-Kan Cheng, E. Rosenbaum, S. Kang","doi":"10.1145/240518.240661","DOIUrl":"https://doi.org/10.1145/240518.240661","url":null,"abstract":"In this paper, we present a hierarchical reliability-driven CAD system for the design of electromigration resistant circuits. The top of the hierarchy aims at quickly identifying those critical interconnects with potential electromigration reliability problems. Then the detailed electromigration analysis of critical interconnects is carried out by an accurate and computationally efficient simulation tool (ITEM). This top-down approach provides a feasible solution to the complicated electromigration diagnosis problem.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"158 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123329307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Kudva, G. Gopalakrishnan, H. Jacobson, S. Nowick
This paper addresses the problem of realizing hazard-free single-output Boolean functions through a network of customized complex CMOS gates tailored to a given asynchronous controller specification. A customized CMOS gate network can either be a single CMOS gate or a multilevel network of CMOS gates. It is shown that hazard-free requirements for such networks are less restrictive than for simple gate networks. Analysis and efficient synthesis methods to generate such networks under a multiple-input change assumption (MIC) are presented.
{"title":"Synthesis of hazard-free customized CMOS complex-gate networks under multiple-input changes","authors":"P. Kudva, G. Gopalakrishnan, H. Jacobson, S. Nowick","doi":"10.1145/240518.240534","DOIUrl":"https://doi.org/10.1145/240518.240534","url":null,"abstract":"This paper addresses the problem of realizing hazard-free single-output Boolean functions through a network of customized complex CMOS gates tailored to a given asynchronous controller specification. A customized CMOS gate network can either be a single CMOS gate or a multilevel network of CMOS gates. It is shown that hazard-free requirements for such networks are less restrictive than for simple gate networks. Analysis and efficient synthesis methods to generate such networks under a multiple-input change assumption (MIC) are presented.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121784185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper describes a novel equivalence checking method for combinational circuits, which utilizes relations among internal signals represented by binary decision diagrams. To verify circuits efficiently, a proper set of internal signals that are independent with each other should be chosen. A heuristic based on analysis of circuit structure is proposed to select such a set of internal signals. The proposed verifier requires only a minute for equivalence checking of all the ISCAS'85 benchmarks on SUN-4/10.
{"title":"An efficient equivalence checker for combinational circuits","authors":"Y. Matsunaga","doi":"10.1109/DAC.1996.545651","DOIUrl":"https://doi.org/10.1109/DAC.1996.545651","url":null,"abstract":"This paper describes a novel equivalence checking method for combinational circuits, which utilizes relations among internal signals represented by binary decision diagrams. To verify circuits efficiently, a proper set of internal signals that are independent with each other should be chosen. A heuristic based on analysis of circuit structure is proposed to select such a set of internal signals. The proposed verifier requires only a minute for equivalence checking of all the ISCAS'85 benchmarks on SUN-4/10.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"307 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132284366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The testability of basic DSP datapath structures using pseudorandom built in self test techniques is examined. The addition of variance mismatched signals is identified as a testing problem, and the associated fault detection probabilities are derived in terms of signal probability distributions. A method of calculating these distributions is described, and it is shown how these distributions can be used to predict testing problems that arise from the correlation properties of test sequences generated using linear feedback shift registers. Finally, it is shown empirically that variance matching using associativity transformations can reduce the number of untested faults by a factor of eight over variance mismatched designs.
{"title":"Pseudorandom-pattern test resistance in high-performance DSP datapaths","authors":"L. Goodby, A. Orailoglu","doi":"10.1109/DAC.1996.545683","DOIUrl":"https://doi.org/10.1109/DAC.1996.545683","url":null,"abstract":"The testability of basic DSP datapath structures using pseudorandom built in self test techniques is examined. The addition of variance mismatched signals is identified as a testing problem, and the associated fault detection probabilities are derived in terms of signal probability distributions. A method of calculating these distributions is described, and it is shown how these distributions can be used to predict testing problems that arise from the correlation properties of test sequences generated using linear feedback shift registers. Finally, it is shown empirically that variance matching using associativity transformations can reduce the number of untested faults by a factor of eight over variance mismatched designs.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134430347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}