Pub Date: 1998-01-04; DOI: 10.1109/ICVD.1998.646655
R. Parthasarathy, R. Sridhar
Wave pipelining is a digital design technique that can be applied to combinational logic circuits to increase system throughput without increasing the demand for storage space and power. The internal capacitances of the gates are used for storage. The gate library for wave pipelining should have delays that are independent of input, functionality, and load capacitance. Conventional static CMOS has input-dependent delay and is not suitable for wave pipelining. Wave pipelining requires path delay equalization along all paths from input to output. Delay balancing is achieved by means of a process called "tuning". Rough tuning balances all paths to the same number of gates, and fine tuning adjusts the sizes of the transistors in the driver gate for different loads. The design styles previously proposed for wave pipelining have unbalanced input loading, which results in a complex fine-tuning process. In this paper, double pass-transistor logic (DPL) gates are modified to form a library of basic gates having perfect input symmetry. The balanced input capacitance of the DPL gates makes the fine-tuning process less computation-intensive. A fine-tuning method is presented for wave pipeline designs with DPL. An 8-bit adder was designed, and the results are presented to show the performance efficiency of double pass-transistor logic for wave pipelining.
Title: Double pass-transistor logic for high performance wave pipeline circuits. Published in: Proceedings Eleventh International Conference on VLSI Design.
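The rough-tuning step described in the abstract above (giving every path the same number of gates) can be illustrated with a small levelization sketch. This is not the authors' procedure: the netlist format, the function name `rough_tune`, and the idea of padding with buffers are assumptions made for illustration only.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def rough_tune(fanin):
    """Balance gate counts on all paths of a combinational netlist.

    fanin maps each gate to the list of signals driving it; names that
    never appear as keys are treated as primary inputs (level 0).
    Returns (level, buffers), where buffers[(driver, gate)] is the number
    of padding buffers assumed to be inserted on that connection.
    """
    level = {}
    # static_order() yields drivers before the gates they feed.
    for node in TopologicalSorter(fanin).static_order():
        drivers = fanin.get(node, [])
        level[node] = 1 + max(level[d] for d in drivers) if drivers else 0
    buffers = {
        (d, g): level[g] - level[d] - 1
        for g, drivers in fanin.items()
        for d in drivers
    }
    return level, buffers

# Hypothetical three-gate netlist with primary inputs a, b, c.
netlist = {"g1": ["a", "b"], "g2": ["g1", "c"], "out": ["g2", "a"]}
level, buffers = rough_tune(netlist)
```

Here the connection from `a` to `out` skips two gate levels, so it would need two padding stages, while `c` to `g2` needs one. Fine tuning (transistor sizing for different loads) is not modeled.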
Pub Date: 1998-01-04; DOI: 10.1109/ICVD.1998.646625
G. Ascia, V. Catania
This paper presents the architecture of a parallel processor dedicated to real-time fuzzy applications. The main features of the architecture are a pre-computation phase of the positive degree of truth of the antecedent with fuzzy inputs and a detection phase of the active rules. The processing speed is up to 2.8 MFLIPS (256 rules, 8 antecedents, 1 consequent). The estimated silicon area is 25 mm².
Title: A framework for a parallel architecture dedicated to soft computing. Published in: Proceedings Eleventh International Conference on VLSI Design.
Pub Date: 1998-01-04; DOI: 10.1109/ICVD.1998.646658
Vijay A. Nebhrajani, Nayan Suthar
This paper provides deeper insight into the synthesis mechanism of VHDL tools. It examines three methods of writing VHDL code, each of which models finite state machines in a different way. Adopting a certain modeling style can significantly reduce VLSI area and improve performance, but at the cost of writing low-level VHDL code, which undermines the purpose of VHDL as the design entry medium. However, there is a simpler approach, demonstrated by a software tool called vtvt, which allows VHDL code to be written at a high level and optimizes for area and performance without the burden of writing and maintaining low-level code.
Title: Finite state machines: a deeper look into synthesis optimization for VHDL. Published in: Proceedings Eleventh International Conference on VLSI Design.
Pub Date: 1998-01-04; DOI: 10.1109/ICVD.1998.646621
M. F. Abdulla, C. Ravikumar, Anshul Kumar
Signature-based techniques are well known for the built-in self-test of integrated systems. We propose a novel test architecture which uses a judicious combination of mutual testing and signature testing to achieve low test area overhead, low aliasing probability and low test application time. The proposed architecture is well suited to testing highly concurrent systems in applications such as iterative logic arrays, real-time systems, systolic arrays, and low-latency pipelines, which tend to have a large number of functional modules of a similar nature. We provide graph-theoretic algorithms to optimize the test area and test application time of the resulting test architecture.
Title: Hybrid testing schemes based on mutual and signature testing. Published in: Proceedings Eleventh International Conference on VLSI Design.
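As a rough illustration of the two ingredients named in the abstract above, the sketch below compacts a module's output stream into a short signature with a serial LFSR and, separately, compares two similar modules against each other (mutual testing). The 4-bit register, the feedback polynomial x^4 + x + 1, and the function names are assumptions chosen for illustration; they are not taken from the paper, and the hybrid architecture itself is not modeled.

```python
def signature(bits, taps=0b0011, width=4):
    """Compact a bit stream into a `width`-bit signature.

    Models a serial-input LFSR that divides the stream by x^4 + x + 1
    (an assumed polynomial, not one taken from the paper).
    """
    mask = (1 << width) - 1
    state = 0
    for b in bits:
        msb = (state >> (width - 1)) & 1      # bit shifted out this step
        state = ((state << 1) & mask) | (b & 1)
        if msb:
            state ^= taps                      # apply the feedback taps
    return state

def mutual_test(outputs_a, outputs_b):
    """Mutual testing: two modules of the same kind should agree."""
    return list(outputs_a) == list(outputs_b)

good = [1, 0, 1, 1, 0, 0, 1, 0]
faulty = good.copy()
faulty[3] ^= 1                                 # inject a single-bit error

sig_good = signature(good)
sig_faulty = signature(faulty)
```

Aliasing occurs when a faulty stream maps to the same signature as the good one. A single-bit error never aliases in this register, but multi-bit errors can, which is one reason to combine signatures with direct comparison between modules.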
Pub Date: 1998-01-04; DOI: 10.1109/ICVD.1998.646636
S. Balajee, A. Majhi
One of the major requirements for testing VLSI devices is the validation of their timing specifications. Timing specifications typically include frequency, propagation delays, minimum pulse width, phase offsets, setup time and hold time measurements. Although parametric specifications may exist for a nominal speed (frequency) of operation of the digital device, it may be necessary to characterize the device under test (DUT) to determine its highest operating frequency and the environmental parameters required to run at that frequency. Characterization involves measurement of the setup time, hold time and pulse width of the signals. In this paper, we present an automated AC (timing) characterization flow for digital circuit testing. We recommend a STIL-like (Standard Tester Interface Language) syntax for the timing tests. Various timing data (setup and hold time, propagation delay, etc.) are measured in the first pass of the characterization process and are automatically back-annotated to the timing test flow to reduce the total test cycle time. The approach also helps in finding the maximum operating frequency of the DUT and in speed binning (i.e., sorting the devices based on their operating frequency).
Title: Automated AC (timing) characterization for digital circuit testing. Published in: Proceedings Eleventh International Conference on VLSI Design.
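The last two ideas in the abstract above, finding the maximum operating frequency and speed binning, can be sketched in a few lines. This is only an illustration: the pass/fail callback, the assumption that a device passing at one frequency also passes at every lower one, and the bin edges are all hypothetical and not taken from the paper or from STIL.

```python
from bisect import bisect_right

def find_fmax(passes, lo, hi):
    """Largest integer frequency in [lo, hi] at which passes(f) is True.

    Assumes pass/fail is monotonic in frequency. Returns None if the
    device fails even at `lo`.
    """
    if not passes(lo):
        return None
    while lo < hi:
        mid = (lo + hi + 1) // 2   # round up so the loop always shrinks
        if passes(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo

def speed_bin(fmax, bin_edges):
    """Highest bin edge that fmax meets (bin_edges sorted ascending).

    Returns None if the device is slower than the slowest bin.
    """
    i = bisect_right(bin_edges, fmax)
    return bin_edges[i - 1] if i else None

# Hypothetical DUT that works up to 118 MHz, with made-up bin edges.
fmax = find_fmax(lambda f_mhz: f_mhz <= 118, 50, 200)
assigned = speed_bin(fmax, [66, 100, 133])
```

The binary search needs only about log2(hi - lo) test runs per device, which is in the spirit of reducing total test cycle time, though the paper's back-annotation flow is a different mechanism.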
Pub Date: 1998-01-04; DOI: 10.1109/ICVD.1998.646608
V. Krishna, R. Chandramouli, N. Ranganathan
Accurate switching activity estimation is crucial for power budgeting. It is impractical to obtain an accurate estimate by simulating the circuit for all possible inputs. An alternative approach is to compute tight bounds for the switching activity. In this paper, we propose a non-simulative method to compute bounds for switching activity at the logic level. First, we show that the switching activity can be modeled as the Bayesian distance for an abstract two-class problem. The computation of the upper and lower bounds for the switching activity is unified into a single function, ψ(α, p, ρ), where α is a parameter, ρ is the temporal correlation factor and p is the signal probability. The constraints on α for ψ(α, p, ρ) to be tight upper and lower bounds are derived. The proposed approach computes bounds for individual gate switching. Experimental results are obtained by taking spatial and temporal correlations into account. The computations are simple and fast.
Title: Computation of lower and upper bounds for switching activity: a unified approach. Published in: Proceedings Eleventh International Conference on VLSI Design.
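The abstract above does not give the form of ψ(α, p, ρ), so the sketch below does not attempt to reproduce it. It only shows how the two quantities it depends on, the signal probability p and the temporal correlation ρ, determine the exact switching activity of a single signal under a standard assumption: a stationary binary signal with lag-one correlation coefficient ρ. The function name is hypothetical.

```python
def switching_activity(p, rho=0.0):
    """Probability that a stationary binary signal toggles between cycles.

    p   : probability that the signal is 1
    rho : correlation coefficient between consecutive values

    From E[x_t * x_{t+1}] = p^2 + rho * p * (1 - p) it follows that
    P(x_t != x_{t+1}) = 2 * p * (1 - p) * (1 - rho).
    Not every rho in [-1, 1] is attainable for every p; that feasibility
    check is omitted here.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    return 2.0 * p * (1.0 - p) * (1.0 - rho)

uncorrelated = switching_activity(0.5)        # temporally independent signal
correlated = switching_activity(0.25, 0.5)    # positively correlated signal
```

Positive temporal correlation lowers the activity below the independent value 2p(1 - p), which is why ignoring ρ tends to overestimate switching power for slowly changing signals.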
Pub Date: 1998-01-04; DOI: 10.1109/ICVD.1998.646652
S. Venkataraman, W. Fuchs, J. Patel
This paper describes a technique to accelerate diagnostic fault simulation of sequential circuits using fault sampling. Diagnostic fault simulation involves computing the indistinguishability relationship between all pairs of modeled faults. The input space is the set of all pairs of modeled faults, thus making the simulation computationally intensive. The diagnostic simulation process is accelerated by considering a sub-space of the input space that is obtained using fault sampling. Results on performance speedup and diagnostic resolution loss are provided for the ISCAS 89 benchmark circuits.
Title: Diagnostic simulation of sequential circuits using fault sampling. Published in: Proceedings Eleventh International Conference on VLSI Design.
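To make the idea in the abstract above concrete, the sketch below groups faults into indistinguishability classes by their simulated responses and then repeats the grouping on a random sample of the fault list. The fault names, response strings, sampling fraction and function names are invented for illustration; a real diagnostic simulator would derive responses from the circuit and test set.

```python
import random
from collections import defaultdict

def indistinguishability_classes(responses):
    """Group faults whose output responses are identical.

    responses maps a fault name to its (hypothetical) response string.
    Two faults in the same class cannot be told apart by the test set.
    """
    classes = defaultdict(list)
    for fault, response in responses.items():
        classes[response].append(fault)
    return sorted(sorted(members) for members in classes.values())

def sample_faults(faults, fraction, seed=0):
    """Pick a random subset of the fault list (at least one fault)."""
    rng = random.Random(seed)
    k = max(1, round(fraction * len(faults)))
    return sorted(rng.sample(sorted(faults), k))

# Toy response table: f1/f2 and f3/f5 are indistinguishable pairs.
responses = {
    "f1": "0110", "f2": "0110", "f3": "1000",
    "f4": "0001", "f5": "1000", "f6": "1111",
}
full_classes = indistinguishability_classes(responses)

sampled = sample_faults(responses, fraction=0.5)
sampled_classes = indistinguishability_classes(
    {f: responses[f] for f in sampled}
)
```

Sampling shrinks the number of fault pairs that must be compared, at the price of a less complete picture of which faults are indistinguishable. That trade-off is what the paper quantifies as speedup versus diagnostic resolution loss.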
Pub Date: 1998-01-04; DOI: 10.1109/ICVD.1998.646572
S. Ramprasad, Naresh R Shanbhag, I. Hajj
Presented in this paper is a source-coding framework for the design of coding schemes to reduce transition activity. These schemes are suited for high-capacitance busses where the extra power dissipation due to the encoder and decoder circuitry is offset by the power savings at the bus. A framework to characterize low-power encoding schemes is developed based upon the source-channel coding view. In this framework, a data source (characterized in a probabilistic manner) is first passed through a decorrelating function f₁. Next, a variant of an entropy coding function, f₂, is employed, which reduces the transition activity. The framework is then employed to derive novel encoding schemes whereby practical forms for f₁ and f₂ are proposed. Simulation results with an encoding scheme for data busses indicate an average reduction in transition activity of 36%. This translates into a reduction in total power dissipation for bus capacitances greater than 14 pF/bit in 1.2 μ CMOS technology and eight times more power savings compared to existing schemes with a typical value for bus capacitance of 50 pF/bit. Simulation results with an encoding scheme for instruction address busses indicate an average reduction in transition activity by factors of 3 and 1.5 over the Gray and T0 coding schemes, respectively.
Title: Coding for low-power address and data busses: a source-coding framework and applications. Published in: Proceedings Eleventh International Conference on VLSI Design.
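The metric that the abstract above optimizes, transition activity on a bus, is easy to compute for a given word sequence. The sketch below counts bit transitions for plain binary and for Gray-coded sequential addresses. Gray coding is one of the baselines the abstract mentions; the paper's own f₁/f₂ schemes are not reproduced here, and the function names are illustrative.

```python
def transitions(words):
    """Total number of bit flips between consecutive bus words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))

def gray(n):
    """Reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)

# Sequential instruction addresses 0..15 on a 4-bit bus.
addresses = list(range(16))
binary_t = transitions(addresses)                    # plain binary encoding
gray_t = transitions([gray(a) for a in addresses])   # Gray encoding
```

For this purely sequential stream, binary encoding causes 26 bit flips and Gray encoding 15 (exactly one per address increment). Real address streams contain jumps, so measured savings depend on the workload.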
Pub Date: 1998-01-04; DOI: 10.1109/ICVD.1998.646599
A. Balakrishnan, S. Chakradhar
We propose a new partial scan technique that incurs significantly less area overhead than the pipeline technique (in which all feedback cycles, including self-loops, are broken) and yet achieves very high test coverage in short CPU times. Our proposal selects scan flip-flops so that the circuit satisfies two key properties in the test mode. First, the circuit is partitioned into peripherally interacting finite state machines (peripheral partitions). Peripheral partitions do not have combinational paths between flip-flops belonging to different partitions. Second, the flip-flop dependency graph (S-graph) of each peripheral partition has a tree structure. Our technique does not require self-loops to be broken. We believe that peripheral partitions with tree-structured S-graphs inherently require low sequential test generation resources. We develop an efficient algorithm for peripheral partitioning and tree decomposition of the S-graph. The scan flip-flop selection algorithm iteratively partitions the S-graph into disjoint sub-graphs with the tree structure. We report results on all the large circuits in the ISCAS 89 benchmark set. These results show that our technique produces scan circuits for which very high (near 100%) fault efficiency is achievable in extremely short CPU times. The high fault efficiencies achieved by our technique are comparable to those of pipeline circuits. However, the area overhead for our technique is significantly less than in the pipeline case.
Title: Peripheral partitioning and tree decomposition for partial scan. Published in: Proceedings Eleventh International Conference on VLSI Design.
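The second property in the abstract above, a tree-structured S-graph once the scan flip-flops are removed, can be checked with a short union-find sketch. The abstract does not define "tree structure" precisely, so this sketch assumes one plausible reading: ignoring self-loops (which the technique tolerates), the remaining dependency edges contain no cycle when viewed as undirected edges. The function name and the example graph are hypothetical.

```python
def is_forest_after_scan(edges, scanned):
    """Check the assumed tree property of an S-graph.

    edges   : iterable of (u, v) flip-flop dependencies
    scanned : set of flip-flops selected for scan (removed in test mode)
    Self-loops are ignored. Returns True if the remaining edges form a
    forest when direction is dropped.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v in set(edges):
        if u == v or u in scanned or v in scanned:
            continue
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False                    # this edge would close a cycle
        parent[root_u] = root_v
    return True

# Hypothetical S-graph: a feedback cycle a -> b -> c -> a, a self-loop
# on c, and a tail c -> d.
s_graph = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "c"), ("c", "d")]
before = is_forest_after_scan(s_graph, scanned=set())
after = is_forest_after_scan(s_graph, scanned={"a"})
```

Scanning the single flip-flop `a` breaks the cycle while leaving the self-loop on `c` untouched. Choosing such a set well, and partitioning into peripheral partitions, is the part the paper's algorithm addresses and this sketch does not.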
Pub Date: 1998-01-04; DOI: 10.1109/ICVD.1998.646646
V. Srinivasan, R. Vemuri
This paper presents a fast and efficient heuristic for pipelining a loop under resource constraints. The loop is represented as a dependence graph G, whose nodes are operations bound to available resources and whose edges denote the data dependencies between the operations. The data dependencies restrict the degree of parallelism that can be achieved while scheduling the graph. We propose a fast retiming-based graph transformation technique which relaxes the data dependencies in the graph while maintaining functional equivalence. Relaxing data dependencies gives the scheduler more flexibility to schedule operations, thereby leading to higher throughput. Our objective is to obtain a retimed graph which, when scheduled, achieves an optimal or near-optimal pipelined steady-state throughput. A detailed algorithm is presented to solve the problem. We provide results that illustrate the effectiveness of our algorithm.
Title: A retiming based relaxation heuristic for resource-constrained loop pipelining. Published in: Proceedings Eleventh International Conference on VLSI Design.
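The abstract above does not spell out its heuristic, but the basic retiming transformation it builds on is standard: an edge u → v with weight w(u, v) gets the new weight w(u, v) + r(v) − r(u), where r assigns an integer lag to each node. The sketch below applies that formula to a small loop. Interpreting edge weights as dependence distances in iterations, and the particular graph and lags, are assumptions for illustration.

```python
def retime(edges, lag):
    """Apply a retiming to a weighted dependence graph.

    edges : dict mapping (u, v) to a non-negative integer weight
    lag   : dict mapping each node to its integer retiming value r
    Returns the retimed weights w + r(v) - r(u); raises ValueError if any
    weight would become negative (an illegal retiming).
    """
    retimed = {}
    for (u, v), w in edges.items():
        new_w = w + lag.get(v, 0) - lag.get(u, 0)
        if new_w < 0:
            raise ValueError(f"illegal retiming on edge {u}->{v}")
        retimed[(u, v)] = new_w
    return retimed

# Hypothetical loop body: a -> b -> c within one iteration, and c feeds a
# two iterations later.
loop = {("a", "b"): 0, ("b", "c"): 0, ("c", "a"): 2}
retimed = retime(loop, {"a": 0, "b": 1, "c": 1})
```

After retiming, only `b → c` remains a same-iteration dependency, so a scheduler has more freedom to overlap `a` with `b` and `c`. The total weight around the cycle is unchanged, which is what preserves functional equivalence.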