Pub Date: 1998-12-02
DOI: 10.1109/ISSS.1998.730616
Karam S. Chatha, R. Vemuri
We present a tool for synthesizing pipelined implementations of hardware-software systems. The tool uses iterative hardware-software partitioning and pipelined scheduling to obtain optimal partitions that satisfy the timing and area constraints. The partitioner uses a branch-and-bound approach with a unique objective function that minimizes the initiation interval of the final design. It takes communication time and hardware sharing into account. This paper also presents techniques for generating a good initial solution and for bounding the search space of the partitioning algorithm. A candidate partition is evaluated by generating its pipelined schedule. The scheduler combines list-based scheduling with a retiming transformation to optimize the initiation interval, the number of pipeline stages, and the memory requirements of each design alternative. Experiments demonstrate the effectiveness of the tool.
Title: A tool for partitioning and pipelined scheduling of hardware-software systems
Venue: Proceedings. 11th International Symposium on System Synthesis (Cat. No.98EX210)
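To make the idea of branch-and-bound HW/SW partitioning concrete, here is a minimal sketch under heavily simplified, hypothetical assumptions: a single software processor, one dedicated hardware unit per hardware task (no sharing), and an initiation-interval estimate that ignores scheduling and retiming. The task data and the names `ii_estimate` and `branch_and_bound` are illustrative only, not the authors' tool or objective function.

```python
from typing import List, Tuple

# Hypothetical task data (not from the paper): per-task software time,
# hardware time and hardware area, plus edges (src, dst, comm_time) whose
# communication cost is paid only when the edge crosses the HW/SW boundary.
SW_TIME = [4, 3, 6, 2]
HW_TIME = [1, 1, 2, 1]
HW_AREA = [3, 2, 5, 1]
EDGES = [(0, 1, 1), (1, 2, 1), (2, 3, 2)]


def ii_estimate(assign: List[bool]) -> int:
    """Simplified initiation-interval estimate; assign[i] is True for hardware.

    Assumes all software tasks share one processor (their times add up along
    with boundary-crossing communication) and each hardware task has its own
    unit (so the slowest one bounds the hardware side). Works on partial
    assignments too: adding tasks never lowers it, so it is a valid lower bound.
    """
    n = len(assign)
    sw_load = sum(SW_TIME[i] for i in range(n) if not assign[i])
    comm = sum(c for u, v, c in EDGES if u < n and v < n and assign[u] != assign[v])
    hw_max = max((HW_TIME[i] for i in range(n) if assign[i]), default=0)
    return max(sw_load + comm, hw_max)


def branch_and_bound(area_limit: int) -> Tuple[float, List[bool]]:
    """Depth-first branch and bound over HW/SW assignments, one task at a time."""
    best_ii: float = float("inf")
    best_assign: List[bool] = []

    def explore(assign: List[bool]) -> None:
        nonlocal best_ii, best_assign
        area = sum(a for a, hw in zip(HW_AREA, assign) if hw)
        if area > area_limit:
            return  # prune: violates the area constraint
        bound = ii_estimate(assign)
        if bound >= best_ii:
            return  # prune: cannot beat the incumbent
        if len(assign) == len(SW_TIME):
            best_ii, best_assign = bound, assign
            return
        for to_hw in (True, False):
            explore(assign + [to_hw])

    explore([])
    return best_ii, best_assign
```

With a hardware area budget of 6, `branch_and_bound(6)` returns `(8, [False, False, True, True])`, i.e. tasks 2 and 3 move to hardware. Pruning on the bound is sound here only because the estimate never decreases as more tasks are assigned.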
Pub Date: 1998-12-02
DOI: 10.1109/ISSS.1998.730614
Ing-Jer Huang, Ping-Huei Xie
Designing a cost-effective superscalar architecture for x86-compatible microprocessors is a challenging task, both technically and commercially. One important design issue is measuring the distribution of functional unit usage and the micro-operation-level parallelism (MLP), which together determine the proper allocation of functional units in the superscalar architecture. To obtain these measurements, an x86 instruction-set CAD system, x86 Workshop, was developed; it consists of both instruction-set analysis and optimization tools. x86 Workshop has been applied to analyze several popular Windows 95 applications, such as Word, Excel, and Communicator. The MLP and the distribution of functional unit usage are measured for these applications, and the measurements are used to evaluate several existing x86 superscalar processors and to suggest future extensions.
Title: Application of instruction analysis/synthesis tools to x86's functional unit allocation
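One simple way to compute the two measurements named in this abstract from a dependence-annotated micro-op trace is sketched below. It assumes unit latency and unlimited functional units, so MLP is taken as micro-ops per dataflow level; the trace format and this definition are assumptions, and the paper's exact metric may differ.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical micro-op trace: (functional-unit class, indices of earlier
# micro-ops this one depends on). Not data from the paper.
TRACE: List[Tuple[str, List[int]]] = [
    ("load", []),       # 0
    ("alu", [0]),       # 1
    ("alu", []),        # 2
    ("store", [1, 2]),  # 3
    ("branch", [2]),    # 4
]


def unit_usage(trace: List[Tuple[str, List[int]]]) -> Dict[str, float]:
    """Fraction of micro-ops that need each functional-unit class."""
    counts = Counter(unit for unit, _ in trace)
    return {unit: n / len(trace) for unit, n in counts.items()}


def mlp(trace: List[Tuple[str, List[int]]]) -> float:
    """Micro-ops per dataflow level, assuming unit latency and unlimited units."""
    level: List[int] = []
    for _, deps in trace:
        # A micro-op sits one level below its latest-finishing producer.
        level.append(1 + max((level[d] for d in deps), default=0))
    return len(trace) / max(level)
```

For the toy trace, the critical path has 3 levels, so `mlp(TRACE)` is 5/3, and ALU operations account for 40% of the micro-ops.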
Pub Date: 1998-12-02
DOI: 10.1109/ISSS.1998.730611
T. Givargis, F. Vahid
Reducing power dissipation is becoming increasingly important in the design of embedded systems. Core-based system design opens up the opportunity to explore different bus interfaces in order to reduce power. We give a first approach for exploring a range of possible bus configurations, such as bus width and coding scheme, for a given set of communication channels. For fast exploration, our approach uses power estimation formulas. We use this approach to explore different bus interfaces for a real GPS navigation system and select the bus interface with minimum power consumption.
Title: Interface exploration for reduced power in core-based systems
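A sketch of transition-count-based bus exploration follows, assuming that switching power is roughly proportional to the number of line toggles (a common first-order model, not the paper's estimation formulas). Bus-invert coding stands in as one well-known coding scheme; the abstract does not say which schemes were evaluated, and all function names are hypothetical.

```python
from typing import List


def popcount(x: int) -> int:
    return bin(x).count("1")


def split_words(values: List[int], word_bits: int, bus_width: int) -> List[int]:
    """Serialize each word into ceil(word_bits / bus_width) transfers, low chunk first."""
    mask = (1 << bus_width) - 1
    return [(v >> shift) & mask for v in values for shift in range(0, word_bits, bus_width)]


def transitions(seq: List[int], start: int = 0) -> int:
    """Number of bus-line toggles when the values are driven in order."""
    total, prev = 0, start
    for x in seq:
        total += popcount(prev ^ x)
        prev = x
    return total


def bus_invert_transitions(seq: List[int], width: int) -> int:
    """Toggles under bus-invert coding, counting the extra invert line."""
    mask = (1 << width) - 1
    total, prev, prev_inv = 0, 0, 0
    for x in seq:
        # Toggles if x were sent as-is (the invert line would return to 0).
        d = popcount(prev ^ x) + prev_inv
        if 2 * d > width:
            sent, inv = ~x & mask, 1  # inverting flips the data lines instead
        else:
            sent, inv = x, 0
        total += popcount(prev ^ sent) + (prev_inv ^ inv)
        prev, prev_inv = sent, inv
    return total
```

For the alternating sequence `[0x00, 0xFF, 0x00, 0xFF]` on an 8-bit bus, `transitions` gives 24 toggles while `bus_invert_transitions` gives 3. Bus width matters too: `0x1234` sent over a 16-bit bus toggles 5 lines, but split into two 8-bit transfers it toggles 6.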
Pub Date: 1998-12-02
DOI: 10.1109/ISSS.1998.730612
T. Okuma, H. Tomiyama, A. Inoue, E. Fajar, H. Yasuura
In this paper, we propose instruction encoding techniques for embedded system design that encode the immediate fields of instructions to reduce the size of the instruction memory. Although the proposed techniques require an additional decoder for the encoded immediate values, experimental results demonstrate their effectiveness in reducing chip area.
Title: Instruction encoding techniques for area minimization of instruction ROM
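A minimal sketch of one way to encode immediate fields: store a short index into a table of the distinct values and let a decoder table expand it back. This dictionary-style scheme is an assumption for illustration; the paper's own encoding techniques may differ, and the bit count ignores decoder logic beyond the table itself.

```python
from typing import Dict, List, Tuple


def encode_immediates(imms: List[int], imm_bits: int) -> Tuple[List[int], List[int], int]:
    """Replace each immediate with an index into a table of distinct values.

    Returns (indices, table, total_bits), where total_bits counts the narrowed
    immediate fields stored in ROM plus the decoder table that expands them.
    """
    table = sorted(set(imms))
    index_of: Dict[int, int] = {v: i for i, v in enumerate(table)}
    # Width needed to address every table entry (at least one bit).
    index_bits = max(1, (len(table) - 1).bit_length())
    indices = [index_of[v] for v in imms]
    total_bits = len(imms) * index_bits + len(table) * imm_bits
    return indices, table, total_bits
```

For `[0, 1, 0, 4, 1, 0, 255, 4]` with 16-bit immediates, the raw fields take 8 × 16 = 128 bits; the encoded version needs 2-bit indices plus a 4-entry table, 80 bits in total. The saving only appears when few distinct values are reused often.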
Pub Date: 1998-12-02
DOI: 10.1109/ISSS.1998.730591
W. Cheng, Y. Lin
Since most DSP applications access large amounts of data stored in memory, a DSP code generator must minimize addressing overhead. In this paper, we propose a method for addressing optimization in loop execution, targeting DSP processors whose address generation units support auto-increment/decrement. Our optimization methods include multi-phase data ordering and graph-based address register allocation. The proposed approaches have been evaluated on a set of core algorithms targeting the TI TMS320C40 DSP processor. Experimental results show that our system is more effective than a commercial optimizing DSP compiler.
Title: Addressing optimization for loop execution targeting DSP with auto-increment/decrement architecture
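The cost model behind auto-increment addressing can be illustrated with a single address register: consecutive accesses to adjacent words are free, while larger jumps need an explicit update instruction. This sketch only shows why data ordering matters; it does not reproduce the paper's multi-phase ordering or its graph-based allocation across several registers, and the exhaustive `best_layout` search (a hypothetical helper) is viable only for tiny examples.

```python
from itertools import permutations
from typing import Dict, List, Tuple


def addressing_cost(order: List[str], layout: Dict[str, int], max_step: int = 1) -> int:
    """Explicit address-register instructions for one register visiting `order`.

    Steps of at most max_step words are assumed free (post-increment/decrement);
    each larger jump costs one extra instruction, plus one initial register load.
    """
    cost = 1
    for prev, cur in zip(order, order[1:]):
        if abs(layout[cur] - layout[prev]) > max_step:
            cost += 1
    return cost


def best_layout(order: List[str]) -> Tuple[int, Dict[str, int]]:
    """Try every memory ordering of the variables and keep the cheapest."""
    names = sorted(set(order))
    best_cost, best_map = len(order) + 1, {}  # worse than any real cost
    for perm in permutations(names):
        layout = {name: addr for addr, name in enumerate(perm)}
        cost = addressing_cost(order, layout)
        if cost < best_cost:
            best_cost, best_map = cost, layout
    return best_cost, best_map
```

For the access order `a, c, b, d, a`, an alphabetical layout costs 4 address-register instructions, while placing the variables in memory as `a, c, b, d` lowers this to 2.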
Pub Date: 1998-12-02
DOI: 10.1109/ISSS.1998.730597
A. Kountouris, C. Wolinski
False path analysis has applications in a variety of computer science and engineering domains, such as high-level synthesis, worst-case execution time estimation, and software testing. This paper describes a method to automate false path analysis, based on a control flow graph connected to a hierarchical BDD-based control representation. Because it can reason about predicate expressions involving arithmetic inequalities, the method overcomes certain limitations of previous approaches. Preliminary experimental results confirm its effectiveness.
Title: False path analysis based on a hierarchical control representation
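To see why reasoning about arithmetic inequalities matters, consider a path guarded first by `x > 5` and later by `x < 3`. A purely Boolean view treats the two predicates as independent variables and cannot rule the path out, whereas an interval view can. The sketch below assumes a single integer variable that is not modified along the path, which is far simpler than the hierarchical BDD-based representation described in the paper; `is_false_path` is a hypothetical name.

```python
from typing import List, Tuple

Condition = Tuple[str, int]  # (operator, constant) applied to one integer variable x


def is_false_path(conds: List[Condition]) -> bool:
    """True if no integer x satisfies every branch condition along the path."""
    lo, hi = float("-inf"), float("inf")
    for op, c in conds:
        # Narrow the feasible interval [lo, hi] with each condition.
        if op == "<":
            hi = min(hi, c - 1)
        elif op == "<=":
            hi = min(hi, c)
        elif op == ">":
            lo = max(lo, c + 1)
        elif op == ">=":
            lo = max(lo, c)
        elif op == "==":
            lo, hi = max(lo, c), min(hi, c)
        else:
            raise ValueError(f"unsupported operator: {op}")
    return lo > hi  # empty interval: the path can never execute
```

Here `is_false_path([(">", 5), ("<", 3)])` is `True`, while `[(">", 5), ("<", 10)]` describes a feasible path.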
Pub Date: 1998-12-02
DOI: 10.1109/ISSS.1998.730602
Allan Rae, S. Parameswaran
This paper presents an application-specific, heterogeneous multiprocessor synthesis system, named HeMPS, that combines a form of Evolutionary Computation known as Differential Evolution with a scheduling heuristic to search the design space efficiently. We demonstrate the effectiveness of our technique by comparing it to similar existing systems. The proposed strategy is shown to be faster than recent systems on large problems while providing equivalent or improved final solutions.
Title: Application-specific heterogeneous multiprocessor synthesis using differential-evolution
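For readers unfamiliar with Differential Evolution, here is a minimal sketch of the classic DE/rand/1/bin scheme minimizing a continuous test function. How HeMPS encodes a multiprocessor design as a vector and couples DE with its scheduling heuristic is not described in the abstract, so the toy objective, the parameter defaults, and the in-place population update are all assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple


def differential_evolution(
    f: Callable[[Sequence[float]], float],
    dim: int,
    bounds: Tuple[float, float],
    pop_size: int = 20,
    F: float = 0.8,
    CR: float = 0.9,
    generations: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """DE/rand/1/bin: mutate with a scaled difference vector, then binomial crossover."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if j == j_rand or rng.random() < CR:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    trial.append(min(max(v, lo), hi))  # clip to the search box
                else:
                    trial.append(pop[i][j])
            f_trial = f(trial)
            # Greedy one-to-one selection, applied in place (asynchronous variant).
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]
```

Calling `differential_evolution(lambda x: sum(v * v for v in x), dim=2, bounds=(-5.0, 5.0))` should return a point close to the origin.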
Pub Date: 1998-12-02
DOI: 10.1109/ISSS.1998.730592
D. Keitel-Schulz, N. Wehn
After several years as niche markets, application markets for single-chip integration of large DRAMs and logic circuits are growing very rapidly, as the transition to 0.25 µm technologies will offer customers up to 128 Mbit of embedded DRAM and 500K gates of logic. However, embedded DRAM poses many technical challenges that must still be solved. In this paper, we address some of these technical issues in more detail.
Title: Issues in embedded DRAM development and applications
Pub Date: 1998-12-02
DOI: 10.1109/ISSS.1998.730601
Y. Hwang, Yuan-Hung Wang
In this paper, we propose the target board architecture of a rapid-prototyping embedded system based on hardware/software codesign. The target board contains a TMS320C30 DSP processor and up to four Xilinx XC4025E FPGAs. Various communication channels between the C30 and the FPGAs are provided, and a master-master computing paradigm is supported. HW/SW communication protocols, ranging from handshaking and batch transfer to queue-controlled transfer, and the corresponding interfaces are described in VHDL and C, respectively, and can easily be added to the mapped design. A codesign implementation example based on a G.728 LD-CELP speech decoder shows that the proposed communication protocols and interfaces incur very small timing and circuitry overhead.
Title: Communication and interface synthesis on a rapid prototyping hardware/software codesign system
Pub Date: 1998-12-02
DOI: 10.1109/ISSS.1998.730605
F. Catthoor, D. Verkest, E. Brockmeyer
This paper describes an attempt to bring together the many different system design flows found in architecture and system design technology research into a more abstract but unifying meta flow. Many existing system and architecture design flows strongly resemble one another and overlap unnecessarily. Mainly because of the lack of a common and consistent terminology coupled to a common reference basis, it is currently nearly impossible to compare and reuse (sub)steps. In addition, research in different communities is too strongly separated. To alleviate this problem, we introduce a more abstract but unifying meta flow that attempts to bridge the gap between the existing flows. From this meta flow, a particular design flow can be instantiated for a given application (domain) by leaving out the stages/steps that are not required, by selecting a (sub)step sequence that is compatible with the partial order of the meta flow, and by selecting the appropriate technique for each remaining (sub)step (e.g., the type of scheduler). This paper focuses on the principles at the task-level and instruction-level abstractions. It also illustrates the power of the meta-flow principles on a realistic multimedia compression demonstrator from the MPEG-4 context.
Title: Proposal for unified system design meta flow in task-level and instruction-level design technology research for multi-media applications
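Instantiating a flow as described in this abstract amounts to picking a subset of steps and checking that their sequence respects the meta flow's partial order. A minimal sketch follows; the step names and constraints are hypothetical placeholders, not the meta flow defined in the paper. The transitive closure matters because omitting an intermediate step must not hide an ordering constraint between the steps that remain.

```python
from typing import List, Set, Tuple

# Hypothetical meta-flow steps and "must come before" constraints.
META_STEPS = ["task_concurrency", "data_transfer_opt", "task_scheduling",
              "instr_scheduling", "reg_alloc"]
ORDER: Set[Tuple[str, str]] = {
    ("task_concurrency", "task_scheduling"),
    ("data_transfer_opt", "task_scheduling"),
    ("task_scheduling", "instr_scheduling"),
    ("instr_scheduling", "reg_alloc"),
}


def transitive_closure(pairs: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Add (a, d) whenever (a, b) and (b, d) are present, until nothing changes."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def is_valid_instance(flow: List[str]) -> bool:
    """A flow is valid if its steps are known, distinct, and respect the partial order."""
    pos = {step: i for i, step in enumerate(flow)}
    if len(pos) != len(flow) or any(step not in META_STEPS for step in flow):
        return False
    return all(pos[a] < pos[b] for a, b in transitive_closure(ORDER)
               if a in pos and b in pos)
```

For example, `is_valid_instance(["data_transfer_opt", "task_scheduling", "reg_alloc"])` is `True`, while `is_valid_instance(["reg_alloc", "task_concurrency"])` is `False` even though no direct constraint links those two steps.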