At the Electronic System Level (ESL), a well-defined design model enables early design space exploration and automatic synthesis on custom multiprocessor platforms. However, the initial design model is usually manually recoded from unstructured and sequential source code. To efficiently create cleanly structured and parallel models, this paper proposes a designer-in-the-loop approach on the Eclipse platform in which the system model is analyzed and recoded using automated functions. In particular, advanced static analysis at compile time can guarantee that the parallelism in the model is safe and free from race conditions. Experiments using the tool with a class of graduate students show significant productivity gains and error reduction in model creation.
{"title":"Designer-in-the-loop recoding of ESL models using static parallel access conflict analysis","authors":"Xu Han, Weiwei Chen, R. Dömer","doi":"10.1145/2463596.2463599","DOIUrl":"https://doi.org/10.1145/2463596.2463599","url":null,"abstract":"At the Electronic System Level (ESL), a well-defined design model enables early design space exploration and automatic synthesis on custom multiprocessor platforms. However, the initial design model is usually manually recoded from unstructured and sequential source code. To efficiently create cleanly structured and parallel models, this paper proposes a designer-in-the-loop approach on Eclipse platform where the system model is analyzed and recoded using automated functions. Particularly, advanced static analysis at compile time can guarantee that the parallelism in the model is safe and free from race conditions. Experiments using the tool with a class of graduate students show significant productivity gains and error reduction in model creation.","PeriodicalId":344517,"journal":{"name":"M-SCOPES","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115230555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Systems integrate a rapidly growing number of cores on a single chip to gain performance from parallel processing. A key challenge is using these computational resources efficiently, which depends to a large degree on how they are allocated to the applications. In this paper, we describe our current research addressing this challenge and highlight current and upcoming hurdles that remain to be addressed.
{"title":"Runtime resource allocation for software pipelines","authors":"J. Jahn, S. Kobbe, Santiago Pagani, Jian-Jia Chen, J. Henkel","doi":"10.1145/2463596.2486156","DOIUrl":"https://doi.org/10.1145/2463596.2486156","url":null,"abstract":"Systems continue to comprise a rapidly growing number of cores on a single chip to gain performance benefits from parallel processing. A key challenge is how their computational resources can be used efficiently, which depends to a large degree on how their resources are allocated to the applications. In this paper, we describe our current research for addressing this challenge and highlight current and upcoming hurdles that need to be addressed.","PeriodicalId":344517,"journal":{"name":"M-SCOPES","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114225298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Virtualized runtime environments like the Java Virtual Machine (JVM) or Microsoft .NET's Common Language Runtime (CLR) introduce additional challenges to real-time software development. Since applications for such environments are usually deployed as platform-independent intermediate code, one issue is the timing of the transformation from intermediate code into native code. We have developed a solution that makes this code transformation suitable for real-time systems: it combines pre-compilation of intermediate code with the elimination of indirect references in native code. The gain in determinism, however, comes at the cost of increased application startup time. In this paper we present an optimization that utilizes an Ahead-of-Time compiler to reduce the startup time while preserving the timing behaviour required for real-time suitability. In an experiment we compare our approach with existing ones and demonstrate its benefits for certain application cases.
{"title":"Reducing startup time of a deterministic virtualizing runtime environment","authors":"Martin Däumler, Matthias Werner","doi":"10.1145/2463596.2463604","DOIUrl":"https://doi.org/10.1145/2463596.2463604","url":null,"abstract":"Virtualized runtime environments like Java Virtual Machine (JVM) or Microsoft .NET's Common Language Runtime (CLR) introduce additional challenges to real-time software development. Since applications for such environments are usually deployed in platform independent intermediate code, one issue is the timing of code transformation from intermediate code into native code. We have developed a solution for this problem, so that code transformation is suitable for real-time systems. It combines pre-compilation of intermediate code with the elimination of indirect references in native code. The gain of determinism comes with an increased application startup time. In this paper we present an optimization that utilizes an Ahead-of-Time compiler to reduce the startup time while keeping the real-time suitable timing behaviour. In an experiment we compare our approach with existing ones and demonstrate its benefits for certain application cases.","PeriodicalId":344517,"journal":{"name":"M-SCOPES","volume":"39 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125736998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present the first approach to optimal placement of bank selection instructions that runs in polynomial time; previous approaches were either not optimal or not provably polynomial. Our approach requires the input program to be structured, which holds automatically for many programming languages and, for others such as C, is equivalent to a bound on the number of goto labels per function. Without the restriction to structured programs, the problem is NP-hard. A prototype implementation in a mainstream compiler for embedded systems shows the practical feasibility of our approach. Our approach and implementation are easy to retarget to different optimization goals and architectures.
{"title":"Optimal placement of bank selection instructions in polynomial time","authors":"P. K. Krause","doi":"10.1145/2463596.2463598","DOIUrl":"https://doi.org/10.1145/2463596.2463598","url":null,"abstract":"We present the first approach to Optimal Placement of Bank Selection Instructions in Polynomial Time; previous approaches were not optimal or did not provably run in polynomial time. Our approach requires the input program to be structured, which is automatically true for many programming languages and for others, such as C, is equivalent to a bound on the number of goto labels per function. When not restricted to structured programs, the problem is NP-hard. A prototype implementation in a mainstream compiler for embedded systems shows the practical feasibility of our approach. Our approach and implementation are easy to retarget for different optimization goals and architectures.","PeriodicalId":344517,"journal":{"name":"M-SCOPES","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127055349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Compiler back-ends generate assembly code by solving three main tasks: instruction selection, register allocation and instruction scheduling. We introduce constraint models and solving techniques for these code generation tasks and describe how the models can be composed to generate code in unison. The use of constraint programming, a technique to model and solve combinatorial problems, makes code generation simple, flexible, robust and potentially optimal.
{"title":"Constraint-based code generation","authors":"Roberto Castañeda Lozano, Gabriel Hjort Blindell, M. Carlsson, Frej Drejhammar, Christian Schulte","doi":"10.1145/2463596.2486155","DOIUrl":"https://doi.org/10.1145/2463596.2486155","url":null,"abstract":"Compiler back-ends generate assembly code by solving three main tasks: instruction selection, register allocation and instruction scheduling. We introduce constraint models and solving techniques for these code generation tasks and describe how the models can be composed to generate code in unison. The use of constraint programming, a technique to model and solve combinatorial problems, makes code generation simple, flexible, robust and potentially optimal.","PeriodicalId":344517,"journal":{"name":"M-SCOPES","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128440957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Java Card is a Java runtime environment developed for low-end embedded systems such as smart cards. In this context of scarce resources, ROM size plays a very important role, and compression techniques help reduce program size as much as possible. Dictionary compression is the most promising technique and has been considered in this field by several authors. Java Card can adopt a dictionary compression scheme, substituting repeated sequences of bytecodes with new macros stored in a dictionary. This approach does not break the Java Card standard, but it requires an ad hoc Java virtual machine and an additional custom component in the converted applet (CAP) file. This paper presents two derived compaction techniques and discusses two scenarios: the first adopts an adaptive (dynamic) dictionary, while the second uses a static one. Although the base dictionary compression technique performs better with an adaptive dictionary, the two proposed techniques perform very close to the base one with a static dictionary. Moreover, we present a different compression mechanism based on re-engineering the CAP file through subroutines. This last technique achieves a higher compression rate while being fully compliant with existing Java Card environments.
{"title":"On the dictionary compression for Java card environment","authors":"Massimiliano Zilli, Wolfgang Raschke, Johannes Loinig, R. Weiss, C. Steger","doi":"10.1145/2463596.2463605","DOIUrl":"https://doi.org/10.1145/2463596.2463605","url":null,"abstract":"Java Card is a Java running environment developed for low-end embedded systems such as smart cards. In this context of scarce resources, ROM size plays a very important role and compression techniques help reducing program sizes as much as possible. Dictionary compression is the most promising technique and has been taken in consideration in this field by several authors.\u0000 Java Card can adopt a dictionary compression scheme, substituting repeated sequences of bytecodes with new macros stored into a dictionary. This approach does not break the Java Card standard, but requires the use of an ad hoc Java virtual machine and an additional custom component in the converted applet (CAP) file. This paper presents two derived compaction techniques and discusses two scenarios: the first adopts an adaptive (dynamic) dictionary, while the second uses a static one. Although the base dictionary compression technique performs better with an adaptive dictionary, the two proposed techniques perform very close to the base one with a static dictionary. Moreover, we present a different compression mechanism based on re-engineering the CAP file through subroutines. This last technique achieves a higher compression rate, but it is fully compliant with the existing Java Card environments.","PeriodicalId":344517,"journal":{"name":"M-SCOPES","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128944672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dataflow analysis techniques are suitable for the temporal analysis of real-time stream processing applications. However, the applicability of these models is currently limited to systems with starvation-free schedulers, such as Time-Division Multiplexing (TDM) schedulers. Removing this limitation would broaden the application domain of dataflow analysis techniques significantly. In this paper we present a temporal analysis technique for Homogeneous Synchronous Dataflow (HSDF) graphs that is also applicable to systems with non-starvation-free schedulers. Unlike existing dataflow analysis techniques, the proposed technique makes use of an enabling-jitter characterization and an iterative fixed-point computation. The presented approach is applicable to arbitrary (cyclic) graph topologies. Buffer capacity constraints are taken into account during the analysis, and sufficient buffer capacities can be determined afterwards. The approach presented in this paper is the first to consider non-starvation-free schedulers in combination with arbitrary HSDF graphs. The proposed dataflow analysis technique is implemented in a tool, which we use to evaluate it on examples that illustrate important differences with other temporal analysis methods. The case study discusses how the presented method can be used to resolve an inaccuracy in the temporal analysis results of a real-time stream processing system consisting of an FM receiver and a DAB receiver application that share a Digital Signal Processor (DSP).
{"title":"Dataflow analysis for multiprocessor systems with non-starvation-free schedulers","authors":"J. Hausmans, M. Wiggers, Stefan J. Geuns, M. Bekooij","doi":"10.1145/2463596.2463603","DOIUrl":"https://doi.org/10.1145/2463596.2463603","url":null,"abstract":"Dataflow analysis techniques are suitable for the temporal analysis of real-time stream processing applications. However, the applicability of these models is currently limited to systems with starvation-free schedulers, such as Time-Division Multiplexing (TDM) schedulers. Removal of this limitation would broaden the application domain of dataflow analysis techniques significantly.\u0000 In this paper we present a temporal analysis technique for Homogeneous Synchronous Dataflow (HSDF) graphs, that is also applicable for systems with non-starvation-free schedulers. Unlike existing dataflow analysis techniques, the proposed analysis technique makes use of an enabling-jitter characterization and iterative fixed-point computation.\u0000 The presented approach is applicable for arbitrary (cyclic) graph topologies. Buffer capacity constraints are taken into account during the analysis and sufficient buffer capacities can be determined afterwards. The approach presented in this paper is the first approach that considers non-starvation-free schedulers in combination with arbitrary HSDF graphs\u0000 The proposed dataflow analysis technique is implemented in a tool. This tool is used to evaluate the analysis technique using examples that illustrate some important differences with other temporal analysis methods. The case-study discusses how the method presented in this paper can be used to solve a problem with the inaccuracy of the temporal analysis results of a real-time stream processing system. This stream processing system consists of an FM receiver together with a DAB receiver application which both share a Digital Signal Processor (DSP).","PeriodicalId":344517,"journal":{"name":"M-SCOPES","volume":"2014 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114645939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cyclo-Static DataFlow (CSDF) is a powerful model for the specification of DSP applications. However, as in any asynchronous model, the synchronization of the different communicating tasks (processes) is done through buffers that have to be sized such that timing constraints are met. In this paper, we determine buffer sizes such that a throughput constraint is satisfied. This problem has been proved to be of exponential complexity. Exact techniques for solving it are too time- and/or space-consuming because of the self-timed schedule needed to evaluate the maximum throughput. Therefore, a periodic schedule is used: each CSDF actor is associated with a period that satisfies the throughput constraint, and sufficient buffer sizes are derived in polynomial time. However, within a period, an actor's phases can be scheduled in different ways, which impacts the evaluation of sufficient buffer sizes. This paper presents a Min-Max Linear Program that derives an optimized periodic phase schedule per CSDF actor in order to minimize buffer sizes. It is shown on different applications that this Min-Max Linear Program yields values close to the optimum while running in polynomial time.
{"title":"Cyclo-static DataFlow phases scheduling optimization for buffer sizes minimization","authors":"M. Benazouz, Alix Munier Kordon","doi":"10.1145/2463596.2463602","DOIUrl":"https://doi.org/10.1145/2463596.2463602","url":null,"abstract":"Cyclo-Static DataFlow (CSDF) is a powerful model for the specification of DSP applications. However, as in any asynchronous model, the synchronization of the different communicating tasks (processes) is made through buffers that have to be sized such that timing constraints are met. In this paper, we want to determine buffer sizes such that the throughput constraint is satisfied. This problem has been proved to be of exponential complexity. Exact techniques to solve this problem are too time and/or space consuming because of the self-timed schedule needed to evaluate the maximum throughput. Therefore, a periodic schedule is used. Each CSDF actor is associated with a period that satisfies the throughput constraint and sufficient buffer sizes are derived in polynomial time. However, within a period, an actor phases can be scheduled in different manners which impacts the evaluation of sufficient buffer sizes. This paper presents a Min-Max Linear Program that derives an optimized periodic phases scheduling per CSDF actor in order to minimize buffer sizes. It is shown through different applications that this Min-Max Linear Program allows to obtain close to optimal values while running in polynomial time.","PeriodicalId":344517,"journal":{"name":"M-SCOPES","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132542521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi- and many-core systems are becoming more and more mainstream, and therefore new communication infrastructures like Networks-on-Chip (NoC) and new programming languages like IBM's X10 with its partitioned global address space (PGAS) are being introduced. In this paper we present an X10-based simulator that is capable of simulating the network traffic that occurs inside an X10 program. This holistic approach makes it possible to simulate the functionality and the induced traffic together, in contrast to pure network simulators, which usually use only synthetic traffic or traces. We explain how the communication overhead is extracted from the X10 runtime and how the NoC behavior is simulated. In experiments we show that the proposed simulator is up to 10x faster than a comparable SystemC-based simulator while preserving high accuracy. Furthermore, we present a trade-off between quality and simulation speed by using different simulation modes for a set of real-world parallel applications.
{"title":"NoC simulation in heterogeneous architectures for PGAS programming model","authors":"Sascha Roloff, A. Weichslgartner, Jan Heisswolf, Frank Hannig, J. Teich","doi":"10.1145/2463596.2463606","DOIUrl":"https://doi.org/10.1145/2463596.2463606","url":null,"abstract":"Multi- and many-core systems become more and more mainstream and therefore new communication infrastructures like Networks-on-Chip (NoC) and new programming languages like IBM's X10 with its partitioned global address space (PGAS) are introduced. In this paper we present an X10-based simulator, which is capable to simulate the network traffic that occurs inside the X10 program. This holistic approach enables to simulate the functionality and the indicated traffic together, in contrast to pure network simulators where usually only synthetic traffic or traces are used. We explain how the communication overhead is extracted from the X10 run-time and how to simulate the NoC behavior. In experiments we show that the proposed simulator is up to 10 x faster than a comparable SystemC-based simulator and at the same time preserves high accuracy. Furthermore, we present a quality and simulation speed tradeoff by using different simulation modes for a set of real world parallel applications.","PeriodicalId":344517,"journal":{"name":"M-SCOPES","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127504191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
For more than four decades, Moore's Law has provided steady exponential growth, where each new technology node was a win-win: shrinking feature sizes not only led to more complex circuits but also to faster and less expensive embedded on-chip systems. As Moore's Law approaches physical limits, though, reliability becomes a severe problem: aging effects like electromigration and NBTI, as well as increased susceptibility to soft errors, increasingly jeopardize reliability. The talk starts with an overview of aging and soft-error effects and deduces that many reliability-threatening effects are directly or indirectly related to thermal issues. The talk then gives some background on thermal issues and presents effective solutions that scale, especially with respect to multi-core systems.
{"title":"Embedded on-chip reliability: it's a thermal challenge","authors":"J. Henkel","doi":"10.1145/2463596.2488357","DOIUrl":"https://doi.org/10.1145/2463596.2488357","url":null,"abstract":"For more than four decades Moore's Law has provided a steady exponential grow where each new technology node provided a win-win situation as shrinking features sizes not only led to more complex circuits but also led to faster and less expensive embedded on-chip systems. As Moore's Law approaches physical limits, though, reliability becomes a severe problem: aging effects like electro migration, NBTI, increased susceptibility against soft errors etc. increasingly jeopardize reliability. The talk starts with an overview of aging and soft error effects and deducts that many reliability-threatening effects are directly or indirectly related to thermal issues. The talk gives some background on thermal issues and also presents effective solutions that scale especially with respect to multi-core systems.","PeriodicalId":344517,"journal":{"name":"M-SCOPES","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128201142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}