One of the most critical challenges for today's and future data-intensive and big-data problems (ranging from economics and business activities to public administration, from national security to many scientific research areas) is data storage and analysis. The primary goal is to increase the understanding of processes by extracting the highly useful value hidden in huge volumes of data. The growth of data size has already surpassed the capabilities of today's computation architectures, which suffer from limited bandwidth (due to communication and memory-access bottlenecks), energy inefficiency and limited scalability (due to CMOS technology). This talk will first address CMOS scaling and its impact on different aspects of ICs and electronics; the major limitations the scaling is facing (such as leakage, yield and reliability) will be shown, and the need for a new technology will be motivated. Thereafter, an overview of computing systems, developed since the introduction of stored-program computers by John von Neumann in the forties, will be given. Shortcomings of today's architectures in dealing with data-intensive applications will be discussed. It will be shown that the speed at which data is growing has already surpassed the capabilities of today's computation architectures, which suffer from communication bottlenecks and energy inefficiency; hence the need for a new architecture. Finally, the talk will introduce a new architecture paradigm for big-data problems; it is based on the integration of storage and computation in the same physical location (using a crossbar topology) and the use of non-volatile resistive-switching technology, based on memristors, instead of CMOS technology. The huge potential of such an architecture in realizing order-of-magnitude improvements will be illustrated by comparing it with state-of-the-art architectures (multi-core, GPUs, FPGAs) for different data-intensive applications.
"Computation in Memory for Data-Intensive Applications: Beyond CMOS and beyond von Neumann". S. Hamdioui. Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems, June 2015. DOI: 10.1145/2764967.2771820
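The crossbar-based computation-in-memory idea can be sketched numerically: a resistive array performs a matrix-vector product in place, with cell conductances as matrix entries (Ohm's law per cell, Kirchhoff's current law per column). A minimal illustrative model, not the talk's actual platform:

```python
def crossbar_mvm(conductances, voltages):
    """Model a memristive crossbar: applying voltages to the rows
    produces per-column currents I_j = sum_i G[i][j] * V[i]
    (Ohm's law per cell, Kirchhoff's current law per column)."""
    rows = len(conductances)
    cols = len(conductances[0])
    return [sum(conductances[i][j] * voltages[i] for i in range(rows))
            for j in range(cols)]

# Example: a 2x3 conductance matrix "stored" in the array.
G = [[1.0, 0.5, 0.0],
     [2.0, 1.0, 1.0]]
V = [0.1, 0.2]
I = crossbar_mvm(G, V)  # per-column currents, approx. [0.5, 0.25, 0.2]
```

The whole product happens where the matrix is stored, which is the source of the claimed bandwidth and energy gains: no operands travel between a memory and a separate processor.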
Software pipelining is an important technique for increasing the instruction-level parallelism of loops during compilation. Currently, the LLVM compiler infrastructure does not offer this optimization, although some target-specific implementations do exist. We have implemented a high-level method for software pipelining within the LLVM framework. By implementing this within LLVM's optimization layer, we have taken the first steps towards a target-independent software-pipelining method.
"High-level software-pipelining in LLVM". Roel Jordans, H. Corporaal. Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems, June 2015. DOI: 10.1145/2764967.2771935
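The core effect of software pipelining can be illustrated with a toy modulo schedule (a generic sketch, not the LLVM implementation described above): with an initiation interval shorter than the loop body, stages of successive iterations overlap.

```python
def software_pipeline(n_iters, stages, ii):
    """Toy modulo schedule: stage s of iteration i starts at cycle
    i * ii + s.  With ii < len(stages), successive iterations overlap,
    raising instruction-level parallelism."""
    schedule = {}  # cycle -> list of (iteration, stage) in flight
    for i in range(n_iters):
        for s in range(len(stages)):
            schedule.setdefault(i * ii + s, []).append((i, stages[s]))
    return schedule

sched = software_pipeline(3, ["load", "mul", "store"], ii=1)
# At cycle 2 three iterations are in flight:
# sched[2] == [(0, 'store'), (1, 'mul'), (2, 'load')]
```

Three 3-stage iterations finish in 5 cycles instead of the 9 a sequential schedule would need; the real transformation must additionally respect data dependences and resource limits when choosing the initiation interval.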
This paper presents a new approach to efficiently searching for suitable compiler pass sequences, a challenge known as phase ordering. Our approach relies on information about the relative positions of compiler passes in pass sequences previously generated for a set of functions when compiling for a specific processor. We enhanced two iterative compiler pass exploration schemes, one relying on simple sequential compiler pass insertion and the other implementing an auto-tuned simulated annealing process, with a data structure that holds information about the relative positions of passes in compiler sequences. This reduces the set of compiler passes considered for insertion at a given position of a candidate pass sequence to only those passes with a higher probability of performing well at that relative position, speeding up exploration as a result. We tested our approach with two different compilers and two different targets: the ReflectC and LLVM compilers, targeting a MicroBlaze processor and a LEON3 processor, respectively. The experimental results show that we can reduce the number of algorithm iterations by up to more than an order of magnitude when targeting the MicroBlaze or the LEON3, while finding compiler sequences that result in binaries that, when executed on the target processor/simulator, outperform (i.e., use fewer CPU cycles than) all the standard optimization levels (i.e., we compare against the best-performing optimization level flag on each kernel, e.g. -O1, -O2 or -O3 in the case of LLVM) by a geometric mean performance improvement of 1.23x and 1.20x when targeting the MicroBlaze processor, and 1.94x and 2.65x when targeting the LEON3 processor, for each of the two exploration algorithms and two kernel sets considered.
"Use of Previously Acquired Positioning of Optimizations for Phase Ordering Exploration". Ricardo Nobre, L. G. A. Martins, João MP Cardoso. Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems, June 2015. DOI: 10.1145/2764967.2764978
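The positional filter described above can be sketched as follows; the pass names and cost model are invented stand-ins for real compilation and measurement on the target:

```python
import random

# Hypothetical pass names; the cost model is a toy stand-in for
# compiling and timing a kernel on the target processor.
PASSES = ["inline", "unroll", "gvn", "licm", "dce"]

def cost(seq):
    c = 10.0
    for pos, p in enumerate(seq):
        if p == "inline":
            c -= 2.0 / (pos + 1)   # pays off early in the sequence
        elif p == "dce":
            c -= 0.5 * pos         # pays off late
    return c

def explore(positions, iters=200, seed=0):
    """Iterative sequential-insertion exploration; `positions` maps a
    sequence position to the passes previously seen to perform well
    there (the paper's positional filter), defaulting to all passes."""
    rng = random.Random(seed)
    best, best_cost = [], cost([])
    for _ in range(iters):
        cand = list(best)
        pos = rng.randrange(len(cand) + 1)
        cand.insert(pos, rng.choice(sorted(positions.get(pos, PASSES))))
        c = cost(cand)
        if c < best_cost:
            best, best_cost = cand, c
    return best, best_cost

unfiltered = explore({})
filtered = explore({0: ["inline"]})   # filter steers position 0
```

With the filter in place, every insertion at position 0 draws only from the passes known to work well there, so fewer iterations are wasted on unpromising candidates, which is the mechanism behind the paper's iteration-count reduction.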
Nowadays Computer Vision applications are ubiquitous, and their presence on embedded devices is increasingly widespread. Heterogeneous embedded systems featuring a clustered manycore accelerator are a very promising target for executing embedded vision algorithms, but code optimization for these platforms is a challenging task. Moreover, designers need support tools that are both fast and accurate. In this work we introduce ADRENALINE, an environment for the development and optimization of OpenVX applications targeting manycore accelerators. ADRENALINE consists of a custom OpenVX run-time and a virtual platform, and overall it is intended to provide support for enhancing the performance of embedded vision applications.
"A framework for optimizing OpenVX applications performance on embedded manycore accelerators". Giuseppe Tagliavini, Germain Haugou, A. Marongiu, L. Benini. Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems, June 2015. DOI: 10.1145/2764967.2776858
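OpenVX's construct-then-execute graph model, which gives a runtime such as ADRENALINE room to optimize before any data flows, can be illustrated in miniature (a generic sketch, not the ADRENALINE or OpenVX API):

```python
class Graph:
    """Minimal construct-then-execute pipeline in the OpenVX spirit:
    nodes are registered up front, so a runtime can inspect the whole
    graph (e.g. to fuse or tile kernels) before processing starts."""
    def __init__(self):
        self.nodes = []
    def add(self, fn):
        self.nodes.append(fn)
        return self                      # allow chaining
    def process(self, image):
        for fn in self.nodes:
            image = fn(image)
        return image

# Two toy 1-D "vision" kernels: a neighbour blur and a threshold.
blur = lambda img: [(a + b) // 2 for a, b in zip(img, img[1:] + img[-1:])]
thresh = lambda img: [1 if p > 4 else 0 for p in img]

g = Graph().add(blur).add(thresh)
out = g.process([0, 2, 8, 8, 2])         # [0, 1, 1, 1, 0]
```

Because the full pipeline is known before `process` runs, a manycore runtime can decide once how to split the image across cluster cores instead of renegotiating at every kernel call.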
Real-time dataflow analysis techniques for multiprocessor systems ignore the fact that the execution of tasks belonging to different operation modes is mutually exclusive. This results in more resources being reserved than strictly needed, and a low resource utilization. In this paper we present a dataflow analysis approach which takes into account that tasks belonging to different modes often execute mutually exclusively. Therefore fewer resources need to be reserved to satisfy a throughput constraint, and a higher processor utilization can be obtained. Furthermore, we introduce a lock which is used to enforce mutually exclusive execution of tasks during a mode transition when this is beneficial. The effects of mutually exclusive execution are included in a Structured Variable-Rate Phased Dataflow (SVPDF) temporal analysis model, which is used to determine whether adding a lock results in satisfaction of the throughput constraint. This model is generated from a sequential input specification of the application such that deadlock-free execution, even after the addition of locks, is guaranteed. The applicability and benefits of the approach are demonstrated using a WLAN 802.11g application which switches between a detection and a decoding mode. It is shown that the use of two locks improves the worst-case response times of 3 tasks such that they can share the same processor, which improves the utilization of this processor and frees 2 other processors.
"Utilization Improvement by Enforcing Mutual Exclusive Task Execution in Modal Stream Processing Applications". G. Kuiper, Stefan J. Geuns, M. Bekooij. Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems, June 2015. DOI: 10.1145/2764967.2764970
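The resource-reservation effect of mode-aware analysis can be shown with a toy budget calculation; the per-task loads below are hypothetical and this is not the paper's SVPDF model:

```python
def reservation(modes, mutually_exclusive):
    """Processor budget needed for tasks grouped by mode.
    Mode-oblivious analysis sums the load of every task; mode-aware
    analysis reserves only for the heaviest mode, since mutually
    exclusive modes never run concurrently."""
    loads = [sum(m) for m in modes]
    return max(loads) if mutually_exclusive else sum(loads)

# Hypothetical per-task processor loads for the two WLAN modes:
detection = [0.3, 0.2]
decoding  = [0.4, 0.35]
naive = reservation([detection, decoding], mutually_exclusive=False)  # 1.25
aware = reservation([detection, decoding], mutually_exclusive=True)   # 0.75
```

In this toy case the mode-oblivious budget (1.25 processors) forces the task set onto two processors, while the mode-aware budget (0.75) fits on one, mirroring the kind of processor savings the paper reports.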
In hard real-time multitasking systems, applying WCET-oriented code optimizations to individual tasks may not lead to optimal results with regard to the system's schedulability. We propose an approach based on Integer Linear Programming which is able to perform schedulability-aware code optimizations for periodic task sets with fixed priorities. We evaluate our approach using a static instruction SPM optimization for the Infineon TriCore microcontroller.
"Schedulability Aware WCET-Optimization of Periodic Preemptive Hard Real-Time Multitasking Systems". Arno Luppold, H. Falk. Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems, June 2015. DOI: 10.1145/2764967.2771930
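A schedulability test of the kind such an optimization must respect is the classic fixed-priority response-time analysis; the sketch below is that standard analysis, not the paper's ILP formulation:

```python
import math

def response_time(tasks, i):
    """Classic fixed-priority response-time iteration:
    R = C_i + sum_{j<i} ceil(R / T_j) * C_j, with tasks as (WCET C,
    period T) pairs sorted by priority (index 0 highest) and implicit
    deadlines D = T."""
    C, T = tasks[i]
    r = C
    while True:
        r_new = C + sum(math.ceil(r / tasks[j][1]) * tasks[j][0]
                        for j in range(i))
        if r_new > T:            # deadline missed
            return None
        if r_new == r:
            return r
        r = r_new

def schedulable(tasks):
    return all(response_time(tasks, i) is not None
               for i in range(len(tasks)))

# Shrinking a WCET (e.g. via an SPM allocation) can flip the verdict:
ok  = schedulable([(1, 4), (2, 6), (3, 13)])   # True
bad = schedulable([(2, 4), (2, 6), (3, 13)])   # False: task 2 overruns
```

This is why optimizing each task's WCET in isolation is insufficient: whether shaving cycles off task 0 or task 2 helps depends on the interference terms, which is exactly the coupling the ILP formulation captures.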
Traditionally, variables have been considered as atoms by register allocation: each variable was to be placed in one register, spilled (placed in main memory) or rematerialized (recalculated as needed). Some flexibility arose from what would be considered a register: register aliasing allowed a register meant to hold a 16-bit variable to be treated as two registers that could each hold an 8-bit variable. We allow for far more flexibility in register allocation: we decide on the storage of variables bytewise, i.e., we decide for each individual byte of a variable whether to store it in memory or in a register, and consider any byte of any register as a possible storage location. We implemented a backend for the STM8 architecture (STMicroelectronics' current 8-bit architecture) in the C compiler sdcc, and experimentally evaluate the benefits of bytewise register allocation. The results show that bytewise register allocation can result in substantial improvements in the generated code. Optimizing for code size, we obtained 27.2%, 13.2% and 9.2% reductions in code size in the Whetstone, Dhrystone and Coremark benchmarks, respectively, when using bytewise allocation and spilling compared to conventional allocation.
"Bytewise Register Allocation". P. K. Krause. Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems, June 2015. DOI: 10.1145/2764967.2764971
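The benefit of per-byte placement decisions can be illustrated with a toy cost model; the access counts and costs are invented, and this is not sdcc's allocator:

```python
def allocation_cost(accesses, placement, reg_cost=1, mem_cost=3):
    """Toy cost model: each access to a byte costs reg_cost if that
    byte sits in a register and mem_cost if it was spilled to memory.
    `placement[b]` is 'reg' or 'mem' for byte b of the variable."""
    return sum(reg_cost if placement[b] == 'reg' else mem_cost
               for b in accesses)

# A 16-bit counter whose low byte (0) is touched far more often than
# its high byte (1) inside a loop:
accesses = [0, 0, 0, 0, 1]          # byte indices accessed
whole_in_mem = allocation_cost(accesses, {0: 'mem', 1: 'mem'})   # 15
whole_in_reg = allocation_cost(accesses, {0: 'reg', 1: 'reg'})   # 5
bytewise     = allocation_cost(accesses, {0: 'reg', 1: 'mem'})   # 7
```

When only a single register byte is free, the atomic allocator must pick `whole_in_mem` (cost 15), whereas a bytewise allocator can keep the hot low byte in that register (cost 7), which is the flexibility the abstract describes.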
MpicOS is a reactive nano-kernel designed for controlling power- and energy-bound multicore embedded systems. Contrary to the mainstream approach of providing a multithreading framework with context saving, MpicOS is built around the reactive trigger-response abstraction, with ultra-low-power waits and a minimal API based on events and continuations. This change of paradigm keeps the cost of re-engineering existing software low, yet results in major gains in the power and energy usage of the system. Additionally, the reactive approach enables the deployment of novel applications on existing hardware platforms, resulting in new market opportunities and improved user experience.
"Synchronous Reactive Nano-Kernels: Exploring the Limits of Power and Energy Efficiency in Embedded Systems". Bartosz Ziólek, Mariusz Ryndzionek, Z. Chamski, P. Romaniuk. Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems, June 2015. DOI: 10.1145/2764967.2771934
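The trigger-response style based on events and continuations can be sketched as follows; this is a generic illustration of the paradigm, not the MpicOS API:

```python
class NanoKernel:
    """Minimal trigger-response sketch: handlers are registered as
    (event -> continuation) pairs; between events the core could sit
    in an ultra-low-power wait instead of keeping per-thread stacks
    and contexts alive."""
    def __init__(self):
        self.handlers = {}
        self.log = []
    def on(self, event, continuation):
        self.handlers.setdefault(event, []).append(continuation)
    def trigger(self, event, payload=None):
        for k in self.handlers.get(event, []):
            k(self, payload)          # run each continuation to completion

k = NanoKernel()
k.on("button", lambda krn, p: krn.log.append(f"pressed:{p}"))
k.on("button", lambda krn, p: krn.trigger("led", "on"))
k.on("led", lambda krn, p: krn.log.append(f"led:{p}"))
k.trigger("button", 1)
# k.log == ['pressed:1', 'led:on']
```

Each continuation runs to completion and returns, so there is no context to save between events; that is what removes the per-thread stacks and context switches of the mainstream multithreading approach.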
Runtime adaptability is expected to adjust the application and the mapping of computations according to usage contexts, operating environments, resource availability, etc. However, extending applications with adaptive features can be a complex task, especially due to the current lack of programming models and compiler support. One runtime adaptability possibility is the use of code specialized according to data workloads and environments. Traditional approaches use multiple code versions generated offline, with a strategy responsible for selecting one of them at runtime. Moving code generation to runtime can achieve important improvements but may impose unacceptable overhead. This paper presents an aspect-oriented programming approach for runtime adaptability. We focus on a separation of concerns (strategies vs. application) promoted by a domain-specific language for programming runtime strategies. Our strategies allow runtime specialization based on contextual information. We use a template-based runtime code generation approach to achieve program specialization. We demonstrate our approach with examples from image processing, which depict the benefits of runtime specialization and illustrate how several factors need to be considered to efficiently adapt the application.
"Programming Strategies for Contextual Runtime Specialization". Tiago Carvalho, Pedro Pinto, João MP Cardoso. Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems, June 2015. DOI: 10.1145/2764967.2764973
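Template-based runtime code generation, the specialization mechanism mentioned above, can be sketched generically (the image-processing kernel and template are invented; this is not the paper's DSL):

```python
def specialize(template, **params):
    """Template-based runtime code generation: substitute the runtime
    context into a source template, compile it, and return the
    resulting specialized function."""
    src = template.format(**params)
    ns = {}
    exec(compile(src, "<specialized>", "exec"), ns)
    return ns["kernel"]

# A toy thresholding kernel whose threshold is baked in at
# specialization time instead of read per call:
TEMPLATE = """
def kernel(pixels):
    # threshold {t} is a compile-time constant in this version
    return [255 if p > {t} else 0 for p in pixels]
"""

dark_scene = specialize(TEMPLATE, t=40)
bright_scene = specialize(TEMPLATE, t=180)
dark_scene([10, 50, 200])   # [0, 255, 255]
```

A runtime strategy would pick which specialization to (re)generate based on observed context, trading the one-off cost of `specialize` against many cheaper calls to the constant-folded kernel.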
This presentation demonstrates a scalable, modular, refinable methodology for translation validation applied to a mature (20 years old), large (500k lines of C), open-source (Eclipse/Polarsys IWG project POP) code generation suite, all using off-the-shelf, open-source SAT/SMT verification tools (Yices), by adapting and optimizing the translation validation principle introduced by Pnueli et al. in 1998. This methodology results from the ANR project VERISYNC, in which we aimed at revisiting Pnueli's seminal work on translation validation using off-the-shelf, up-to-date verification technology. In the face of the enormous task at hand, the verification of a compiler infrastructure comprising around 500,000 lines of C code, we chose to narrow down and isolate the problem to the very data structures manipulated by the infrastructure at the successive steps of code generation, in order both to optimize the whole verification process and to make the implementation of a working prototype feasible. Our presentation outlines the successive steps of this endeavour, from clock synthesis and static scheduling to target code production.
"Modular translation validation of a full-sized synchronous compiler using off-the-shelf verification tools". V. Ngo, J. Talpin, T. Gautier, L. Besnard, P. Guernic. Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems, June 2015. DOI: 10.1145/2764967.2775291
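The translation-validation obligation, that the generated code computes the same function as its source, can be illustrated with a toy checker; note that the methodology above discharges such obligations with the Yices SMT solver, whereas this sketch simply enumerates a small input domain:

```python
from itertools import product

def validate(source_fn, generated_fn, bits=4):
    """Toy translation validation in the Pnueli style: check that the
    generated code agrees with the source on every input of a small
    bit-width, returning a counterexample on mismatch.  (Real tools
    prove this symbolically with an SMT solver, not by enumeration.)"""
    for x, y in product(range(2 ** bits), repeat=2):
        if source_fn(x, y) != generated_fn(x, y):
            return (x, y)          # witness of a miscompilation
    return None                    # translation validated

src = lambda x, y: (x + y) * 2
gen = lambda x, y: (x + y) << 1    # correct strength-reduced output
bad = lambda x, y: (x << 1) + y    # a hypothetical miscompilation

validate(src, gen)   # None: validated
validate(src, bad)   # (0, 1): counterexample
```

Validating each compilation run, instead of verifying the 500,000-line compiler once and for all, is what makes the approach tractable: the per-run obligation concerns only the data structures of that run.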