Pub Date : 1997-11-13DOI: 10.1109/ICCAD.1997.643573
J. Cong, H. Li, S. Lim, Toshiyuki Shibuya, D. Xu
In this paper, we present an efficient Iterative Improvement based Partitioning (IIP) algorithm called LSR/MFFS, that combines signal flow based Maximum Fanout Free Subgraph (MFFS) clustering algorithm with Loose and Stable net Removal (LSR) partitioning algorithm. The MFFS algorithm generalizes existing MFFC decomposition method from combinational circuits to general sequential circuits in order to handle cycles naturally. We also study the properties of the nets that straddle the cutline carefully, and introduce the concepts of the loose and stable nets as well as effective ways to remove them out of the cutset. The LSR/MFFS algorithm first applies LSR algorithm to clustered netlist generated by MFFS algorithm for global-level cutsize optimization and then declusters netlist for further cutsize refinement. As a result, the LSR/MFFS algorithm has achieved the best cutsize result among all the bipartitioning algorithms published in the literatures with very promising runtime performance. In particular, it outperforms the recent state-of-the-art IIP algorithms LA3-CDIP, CLIP-PROP/sub f/, Strawman, hMetis-FM, and MLc by 17.4%, 12.1%, 5.9%, 3.1%, and 1.9%, respectively. It also outperforms the state-of-the-art non-IIP algorithms Paraboli, FEB, and PANZA by 32.0%, 21.4%, and 1.4%, respectively.
{"title":"Large scale circuit partitioning with loose/stable net removal and signal flow based clustering","authors":"J. Cong, H. Li, S. Lim, Toshiyuki Shibuya, D. Xu","doi":"10.1109/ICCAD.1997.643573","DOIUrl":"https://doi.org/10.1109/ICCAD.1997.643573","url":null,"abstract":"In this paper, we present an efficient Iterative Improvement based Partitioning (IIP) algorithm called LSR/MFFS, that combines signal flow based Maximum Fanout Free Subgraph (MFFS) clustering algorithm with Loose and Stable net Removal (LSR) partitioning algorithm. The MFFS algorithm generalizes existing MFFC decomposition method from combinational circuits to general sequential circuits in order to handle cycles naturally. We also study the properties of the nets that straddle the cutline carefully, and introduce the concepts of the loose and stable nets as well as effective ways to remove them out of the cutset. The LSR/MFFS algorithm first applies LSR algorithm to clustered netlist generated by MFFS algorithm for global-level cutsize optimization and then declusters netlist for further cutsize refinement. As a result, the LSR/MFFS algorithm has achieved the best cutsize result among all the bipartitioning algorithms published in the literatures with very promising runtime performance. In particular, it outperforms the recent state-of-the-art IIP algorithms LA3-CDIP, CLIP-PROP/sub f/, Strawman, hMetis-FM, and MLc by 17.4%, 12.1%, 5.9%, 3.1%, and 1.9%, respectively. It also outperforms the state-of-the-art non-IIP algorithms Paraboli, FEB, and PANZA by 32.0%, 21.4%, and 1.4%, respectively.","PeriodicalId":187521,"journal":{"name":"1997 Proceedings of IEEE International Conference on Computer Aided Design (ICCAD)","volume":"211 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127040287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1997-11-13DOI: 10.1109/ICCAD.1997.643404
Y. Kukimoto, W. Gosti, A. Saldanha, R. Brayton
This paper is concerned with approximate delay computation algorithms for combinational circuits. As a result of intensive research in the early 90's efficient tools exist which can analyze circuits of thousands of gates in a few minutes or even in seconds for many cases. However, the computation time of these tools is not so predictable since the internal engine of the analysis is either a SAT solver or a modified ATPG algorithm, both of which are just heuristic algorithms for an NP-complete problem. Although they are highly tuned for CAD applications, there exists a class of problem instances which exhibits the worst-case exponential CPU time behavior. In the context of timing analysis, circuits with a high amount of reconvergence, e.g. C6288 of the ISCAS benchmark suite, are known to be difficult to analyze under sophisticated delay models even with state-of-the-art techniques. To make timing analysis of such corner case circuits feasible we propose an approximate computation scheme to the timing analysis problem as an extension to the exact analysis method proposed previously. Sensitization conditions are conservatively approximated in a selective fashion so that the size of SAT problems solved during analysis is controlled. Experimental results show that the approximation technique is effective in reducing the total analysis time without losing accuracy for the case where the exact approach takes much time or cannot complete.
{"title":"Approximate timing analysis of combinational circuits under the XBDO model","authors":"Y. Kukimoto, W. Gosti, A. Saldanha, R. Brayton","doi":"10.1109/ICCAD.1997.643404","DOIUrl":"https://doi.org/10.1109/ICCAD.1997.643404","url":null,"abstract":"This paper is concerned with approximate delay computation algorithms for combinational circuits. As a result of intensive research in the early 90's efficient tools exist which can analyze circuits of thousands of gates in a few minutes or even in seconds for many cases. However, the computation time of these tools is not so predictable since the internal engine of the analysis is either a SAT solver or a modified ATPG algorithm, both of which are just heuristic algorithms for an NP-complete problem. Although they are highly tuned for CAD applications, there exists a class of problem instances which exhibits the worst-case exponential CPU time behavior. In the context of timing analysis, circuits with a high amount of reconvergence, e.g. C6288 of the ISCAS benchmark suite, are known to be difficult to analyze under sophisticated delay models even with state-of-the-art techniques. To make timing analysis of such corner case circuits feasible we propose an approximate computation scheme to the timing analysis problem as an extension to the exact analysis method proposed previously. Sensitization conditions are conservatively approximated in a selective fashion so that the size of SAT problems solved during analysis is controlled. Experimental results show that the approximation technique is effective in reducing the total analysis time without losing accuracy for the case where the exact approach takes much time or cannot complete.","PeriodicalId":187521,"journal":{"name":"1997 Proceedings of IEEE International Conference on Computer Aided Design (ICCAD)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115367059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1997-11-13DOI: 10.1109/ICCAD.1997.643569
C. Papachristou, M. Baklashov
This paper presents a test synthesis technique for behavioral descriptions. The technique is guided by two testability metrics which quantify the controllability and observability of behavioral variables and structural signals. The method is based on utilizing redundant register transfers in the data path to produce a test behavior with better controllability and observability properties. This approach can avoid unnecessary insertions of test structures in the data path. A test scheme for conditional statements has been developed involving minimal changes in the controller. Our experimental results show improvements in fault coverage at modest hardware overhead.
{"title":"A test synthesis technique using redundant register transfers","authors":"C. Papachristou, M. Baklashov","doi":"10.1109/ICCAD.1997.643569","DOIUrl":"https://doi.org/10.1109/ICCAD.1997.643569","url":null,"abstract":"This paper presents a test synthesis technique for behavioral descriptions. The technique is guided by two testability metrics which quantify the controllability and observability of behavioral variables and structural signals. The method is based on utilizing redundant register transfers in the data path to produce a test behavior with better controllability and observability properties. This approach can avoid unnecessary insertions of test structures in the data path. A test scheme for conditional statements has been developed involving minimal changes in the controller. Our experimental results show improvements in fault coverage at modest hardware overhead.","PeriodicalId":187521,"journal":{"name":"1997 Proceedings of IEEE International Conference on Computer Aided Design (ICCAD)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133613793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1997-11-13DOI: 10.1109/ICCAD.1997.643608
S. Gavrilov, A. Glebov, S. Pullela, S. C. Moore, A. Dharchoudhury, R. Panda, G. Vijayan, D. Blaauw
Traditional synthesis techniques optimize CMOS circuits in two phases: i) logic minimization and ii) library mapping phase. Typically, the structures and the sizes of the gates in the library are chosen to yield good synthesis results over many blocks or even for an entire chip. Consequently this approach precludes an optimal design of individual blocks which may need custom structures. The authors present a new transistor level technique that optimizes CMOS circuits both structurally and size-wise. The technique is independent of a library and hence can explore a design space much larger than that possible due to gate level optimization. Results demonstrate a significant improvement in circuit performance of the resynthesized circuits.
{"title":"Library-less synthesis for static CMOS combinational logic circuits","authors":"S. Gavrilov, A. Glebov, S. Pullela, S. C. Moore, A. Dharchoudhury, R. Panda, G. Vijayan, D. Blaauw","doi":"10.1109/ICCAD.1997.643608","DOIUrl":"https://doi.org/10.1109/ICCAD.1997.643608","url":null,"abstract":"Traditional synthesis techniques optimize CMOS circuits in two phases: i) logic minimization and ii) library mapping phase. Typically, the structures and the sizes of the gates in the library are chosen to yield good synthesis results over many blocks or even for an entire chip. Consequently this approach precludes an optimal design of individual blocks which may need custom structures. The authors present a new transistor level technique that optimizes CMOS circuits both structurally and size-wise. The technique is independent of a library and hence can explore a design space much larger than that possible due to gate level optimization. Results demonstrate a significant improvement in circuit performance of the resynthesized circuits.","PeriodicalId":187521,"journal":{"name":"1997 Proceedings of IEEE International Conference on Computer Aided Design (ICCAD)","volume":"656 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132156509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1997-11-13DOI: 10.1109/ICCAD.1997.643621
R. Kurshan, V. Levin, M. Minea, D. Peled, Hüsnü Yenigün
We describe a method for verifying hardware whose correct behaviour depends upon its software interface. It is presumed that the hardware is presented as a synchronous RTL model whereas the software is presented as an asynchronous abstraction. Our methodology incorporates partial order reduction on the software side, and localization reduction, to deal with the computational complexity of the verification. The partial order reduction is implemented as a constraint on the transition relation of a synchronous transformation of the software model. The reduced transformed model then may be verified using a verification algorithm whose scope is purely synchronous models, without modification. Thus, independent of the interface verification problem, this gives a general method for combining partial order reduction with symbolic model checking.
{"title":"Verifying hardware in its software context","authors":"R. Kurshan, V. Levin, M. Minea, D. Peled, Hüsnü Yenigün","doi":"10.1109/ICCAD.1997.643621","DOIUrl":"https://doi.org/10.1109/ICCAD.1997.643621","url":null,"abstract":"We describe a method for verifying hardware whose correct behaviour depends upon its software interface. It is presumed that the hardware is presented as a synchronous RTL model whereas the software is presented as an asynchronous abstraction. Our methodology incorporates partial order reduction on the software side, and localization reduction, to deal with the computational complexity of the verification. The partial order reduction is implemented as a constraint on the transition relation of a synchronous transformation of the software model. The reduced transformed model then may be verified using a verification algorithm whose scope is purely synchronous models, without modification. Thus, independent of the interface verification problem, this gives a general method for combining partial order reduction with symbolic model checking.","PeriodicalId":187521,"journal":{"name":"1997 Proceedings of IEEE International Conference on Computer Aided Design (ICCAD)","volume":" 20","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134506150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1997-11-13DOI: 10.1109/ICCAD.1997.643620
D. Kagaris, S. Tragoudas
We present a polynomial time algorithm that finds the maximum weighted independent set of a transitive graph. The studied problem finds applications in a variety of VLSI contexts, including path delay fault testing, scheduling in high level synthesis and channel routing in physical design automation. The algorithm has been implemented and incorporated in a CAD tool for path delay fault testing. We experimentally verify its impact in the latter context.
{"title":"Maximum independent sets on transitive graphs and their applications in testing and CAD","authors":"D. Kagaris, S. Tragoudas","doi":"10.1109/ICCAD.1997.643620","DOIUrl":"https://doi.org/10.1109/ICCAD.1997.643620","url":null,"abstract":"We present a polynomial time algorithm that finds the maximum weighted independent set of a transitive graph. The studied problem finds applications in a variety of VLSI contexts, including path delay fault testing, scheduling in high level synthesis and channel routing in physical design automation. The algorithm has been implemented and incorporated in a CAD tool for path delay fault testing. We experimentally verify its impact in the latter context.","PeriodicalId":187521,"journal":{"name":"1997 Proceedings of IEEE International Conference on Computer Aided Design (ICCAD)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130062898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1997-11-13DOI: 10.1109/ICCAD.1997.643399
A. Devgan
Noise analysis and avoidance is an increasingly critical step in deep submicron design. Ever increasing requirements on performance have led to widespread use of dynamic logic circuit families and its other derivatives. These aggressive circuit families trade off noise margin for timing performance making them more susceptible to noise failure and increasing the need for noise analysis. Currently, noise analysis is performed either through circuit or timing simulation or through model order reduction. These techniques in use are still inefficient for analyzing massive amount of interconnect data found in present day integrated circuits. This paper presents efficient techniques for estimation of coupled noise in on-chip interconnects. This noise estimation metric is an upper bound for RC circuits, being similar in spirit to Elmore delay in timing analysis. Such an efficient noise metric is especially useful for noise criticality pruning and physical design based noise avoidance techniques.
{"title":"Efficient coupled noise estimation for on-chip interconnects","authors":"A. Devgan","doi":"10.1109/ICCAD.1997.643399","DOIUrl":"https://doi.org/10.1109/ICCAD.1997.643399","url":null,"abstract":"Noise analysis and avoidance is an increasingly critical step in deep submicron design. Ever increasing requirements on performance have led to widespread use of dynamic logic circuit families and its other derivatives. These aggressive circuit families trade off noise margin for timing performance making them more susceptible to noise failure and increasing the need for noise analysis. Currently, noise analysis is performed either through circuit or timing simulation or through model order reduction. These techniques in use are still inefficient for analyzing massive amount of interconnect data found in present day integrated circuits. This paper presents efficient techniques for estimation of coupled noise in on-chip interconnects. This noise estimation metric is an upper bound for RC circuits, being similar in spirit to Elmore delay in timing analysis. Such an efficient noise metric is especially useful for noise criticality pruning and physical design based noise avoidance techniques.","PeriodicalId":187521,"journal":{"name":"1997 Proceedings of IEEE International Conference on Computer Aided Design (ICCAD)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125205786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1997-11-13DOI: 10.1109/ICCAD.1997.643599
S. Dey, S. Bommu
Efficient exploration of the system design space necessitates fast and accurate performance estimation as opposed to the computationally prohibitive alternative of exhaustive simulation. The paper addresses the issue of worst case performance analysis of a system described as a set of concurrent communicating processes. We show that the synchronization overhead associated with inter process communication can contribute significantly to the overall system performance. Application of existing performance analysis techniques, which target single process descriptions, lead to inaccurate performance estimates as the synchronization overhead is not accounted for. We present PERC, a fast and accurate worst case performance analysis technique which analyzes inter process communication, and accounts for synchronization overhead while computing the worst case performance estimate of a given system implementation. Application of PERC to example systems described as multiple communicating processes shows the ability of the proposed method to accurately estimate the worst case performance of the system implementation.
{"title":"Performance analysis of a system of communicating processes","authors":"S. Dey, S. Bommu","doi":"10.1109/ICCAD.1997.643599","DOIUrl":"https://doi.org/10.1109/ICCAD.1997.643599","url":null,"abstract":"Efficient exploration of the system design space necessitates fast and accurate performance estimation as opposed to the computationally prohibitive alternative of exhaustive simulation. The paper addresses the issue of worst case performance analysis of a system described as a set of concurrent communicating processes. We show that the synchronization overhead associated with inter process communication can contribute significantly to the overall system performance. Application of existing performance analysis techniques, which target single process descriptions, lead to inaccurate performance estimates as the synchronization overhead is not accounted for. We present PERC, a fast and accurate worst case performance analysis technique which analyzes inter process communication, and accounts for synchronization overhead while computing the worst case performance estimate of a given system implementation. Application of PERC to example systems described as multiple communicating processes shows the ability of the proposed method to accurately estimate the worst case performance of the system implementation.","PeriodicalId":187521,"journal":{"name":"1997 Proceedings of IEEE International Conference on Computer Aided Design (ICCAD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130835842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1997-11-13DOI: 10.1109/ICCAD.1997.643370
Tuyen V. Nguyen, Jing Li
This paper presents a general rational block Lanczos algorithm for computing multipoint matrix Pade approximation of linear multiport networks, which model many important circuits in digital, analog, or mixed signal designs. This algorithm generalizes a novel block Lanczos algorithm with a reliable adaptive scheme for breakdown treatment to address two drawbacks of the single frequency Pade approximation: poor approximation of the transfer function in the frequency domain far away from the expansion point and the instability of the reduced model when the original system is stable. In addition, due to smaller Krylov subspace corresponding to each frequency point, the rational algorithm also alleviates the possible breakdowns when completing high order approximations. The cost of full backward orthogonalization with respect to all previous Lanczos vectors in a rational Lanczos algorithm, as compared to a partial backward orthogonalization in a single point Lanczos algorithm, is offset by more accurate and smaller order approximations.
{"title":"Multipoint Pade approximation using a rational block Lanczos algorithm","authors":"Tuyen V. Nguyen, Jing Li","doi":"10.1109/ICCAD.1997.643370","DOIUrl":"https://doi.org/10.1109/ICCAD.1997.643370","url":null,"abstract":"This paper presents a general rational block Lanczos algorithm for computing multipoint matrix Pade approximation of linear multiport networks, which model many important circuits in digital, analog, or mixed signal designs. This algorithm generalizes a novel block Lanczos algorithm with a reliable adaptive scheme for breakdown treatment to address two drawbacks of the single frequency Pade approximation: poor approximation of the transfer function in the frequency domain far away from the expansion point and the instability of the reduced model when the original system is stable. In addition, due to smaller Krylov subspace corresponding to each frequency point, the rational algorithm also alleviates the possible breakdowns when completing high order approximations. The cost of full backward orthogonalization with respect to all previous Lanczos vectors in a rational Lanczos algorithm, as compared to a partial backward orthogonalization in a single point Lanczos algorithm, is offset by more accurate and smaller order approximations.","PeriodicalId":187521,"journal":{"name":"1997 Proceedings of IEEE International Conference on Computer Aided Design (ICCAD)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130996772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Memory-intensive behaviors often contain large arrays that are synthesized into off-chip memories. With the increasing gap between on-chip and off-chip memory access delays, it is imperative to exploit the efficient access mode features of modern-day memories (e.g. page-mode DRAMs) in order to alleviate the memory bandwidth bottleneck. Our work addresses this issue by: (a) modeling realistic off-chip memory access modes for High-level Synthesis (HLS), (b) presenting algorithms to infer applicability of HLS with these memory access modes, and (c) transforming input behavior to provide further memory access optimizations during HLS. We demonstrate the utility of our approach using a suite of memory-intensive benchmarks with a realistic DRAM library module. Experimental results show a significant performance improvement (more than 40%) as a result of our optimization techniques.
{"title":"Exploiting off-chip memory access modes in high-level synthesis","authors":"P. Panda, N. Dutt, A. Nicolau","doi":"10.5555/266388.266503","DOIUrl":"https://doi.org/10.5555/266388.266503","url":null,"abstract":"Memory-intensive behaviors often contain large arrays that are synthesized into off-chip memories. With the increasing gap between on-chip and off-chip memory access delays, it is imperative to exploit the efficient access mode features of modern-day memories (e.g. page-mode DRAMs) in order to alleviate the memory bandwidth bottleneck. Our work addresses this issue by: (a) modeling realistic off-chip memory access modes for High-level Synthesis (HLS), (b) presenting algorithms to infer applicability of HLS with these memory access modes, and (c) transforming input behavior to provide further memory access optimizations during HLS. We demonstrate the utility of our approach using a suite of memory-intensive benchmarks with a realistic DRAM library module. Experimental results show a significant performance improvement (more than 40%) as a result of our optimization techniques.","PeriodicalId":187521,"journal":{"name":"1997 Proceedings of IEEE International Conference on Computer Aided Design (ICCAD)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133683041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}