Pub Date: 2011-11-01 | DOI: 10.1109/DASIP.2011.6136867
Teemu Laukkarinen, J. Suhonen, T. Hämäläinen, Marko Hännikäinen
Network reliability and prototype testing become limiting factors when running successful field pilots of Wireless Sensor Network (WSN) applications. Application pilot studies need to operate end-to-end, covering the physical durability of devices, the embedded software, infrastructure interfaces, and data collection. This paper summarizes our pilot study experiences and the tools and practices they required. Six lessons are proposed: a systematic pilot template results in straightforward pilot completion; a shared WSN infrastructure reduces labor; tailored embedded software testing tools are needed; the pilot must be prepared carefully; the WSN technology must be usable by research partners; and the pilot must be maintained, with maintenance tools required in large-scale pilots. Our experiences are based on over 20 pilot studies and over 1000 deployed devices. This paper describes the 11 main pilots, which used from 10 to 377 devices each.
Title: Pilot studies of wireless sensor networks: Practical experiences
Published in: Proceedings of the 2011 Conference on Design & Architectures for Signal & Image Processing (DASIP)
Pub Date: 2011-11-01 | DOI: 10.1109/DASIP.2011.6136887
Ville Kaseva, T. Hämäläinen, Marko Hännikäinen
Wireless Sensor Networks (WSNs) are an attractive technology for ubiquitous indoor localization. The lifetime of localized nodes is maximized by using energy-efficient radios and minimizing their active time. However, most low-cost, low-power radios do not include the Received Signal Strength Indicator (RSSI) functionality commonly used for RF-based localization. In this paper, we present a range-free localization algorithm for localized nodes that minimizes radio communication and works with radios lacking RSSI. The low complexity of the algorithm enables implementation on resource-constrained hardware for in-network localization. We evaluated the algorithm using a real WSN implementation. In room-level localization, the area was resolved correctly 96% of the time, with a maximum point-based error of 8.70 m. The corresponding values for sub-room-level localization were 100% and 4.20 m. The prototype implementation consumed 1900 B of program memory. Data memory consumption varied from 18 B to 180 B, and power consumption from 345 μW to 2.48 mW, depending on the amount of localization data.
Title: Range-free algorithm for energy-efficient indoor localization in Wireless Sensor Networks
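The paper above does not spell out its algorithm, but a classic range-free scheme in the same family is centroid localization: a node estimates its position from which anchor beacons it can hear, with no RSSI at all. The sketch below is a generic illustration under that assumption, not the authors' method; all names are hypothetical.

```python
# Hedged sketch of generic range-free centroid localization: the node only
# knows WHICH anchors it overhears, never signal strength or range.

def centroid_localization(heard_anchors, anchor_positions):
    """Estimate a node position as the centroid of the anchors it overhears.

    heard_anchors: iterable of anchor IDs the node received beacons from
    anchor_positions: dict mapping anchor ID -> (x, y) in metres
    """
    points = [anchor_positions[a] for a in heard_anchors]
    if not points:
        raise ValueError("no anchors heard; position cannot be estimated")
    n = len(points)
    x = sum(p[0] for p in points) / n
    y = sum(p[1] for p in points) / n
    return (x, y)

# Example: a node hearing three ceiling-mounted anchors in one room.
anchors = {"A1": (0.0, 0.0), "A2": (4.0, 0.0), "A3": (2.0, 3.0)}
print(centroid_localization(["A1", "A2", "A3"], anchors))  # -> (2.0, 1.0)
```

Because the estimate uses only membership tests and additions, it fits the kind of resource-constrained, in-network computation the abstract targets.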
Pub Date: 2011-11-01 | DOI: 10.1109/DASIP.2011.6136855
V. Brost, Charles Meunier, D. Saptono, Fan Yang
Modern FPGA chips, with their larger memory capacity and reconfigurability, are opening new frontiers in the rapid prototyping of embedded systems. With the advent of high-density FPGAs, it is now possible to implement a high-performance Very Long Instruction Word (VLIW) processor core in an FPGA. In a VLIW architecture, processor effectiveness depends on the compiler's ability to extract sufficient Instruction-Level Parallelism (ILP) from program code. This paper describes research results on enabling the VLIW processor model for real-time processing applications by exploiting FPGA technology. Our goals are to keep the flexibility of processors in order to shorten the development cycle, and to use the powerful FPGA resources to increase real-time performance. We present a flexible VLIW VHDL processor model with a variable instruction set and a customizable architecture, which allows the intrinsic parallelism of a target application to be exploited using advanced compiler technology and implemented optimally on an FPGA. Common image processing algorithms were tested and validated on a Virtex-6 based FPGA board using the proposed development cycle. Our approach satisfies key criteria for co-design tools: flexibility, modularity, performance, and reusability.
Title: Flexible VLIW processor based on FPGA for real-time image processing
Pub Date: 2011-11-01 | DOI: 10.1109/DASIP.2011.6136869
Roberto Airoldi, F. Garzia, J. Nurmi
This paper presents a study of an efficient trade-off between memory requirements and performance in the implementation of the FFT pruning algorithm. FFT pruning is used in NC-OFDM systems to reduce FFT complexity in the presence of subcarrier sparseness. State-of-the-art implementations offer good performance at the cost of high resource utilization, i.e. data memory for storing the configuration matrix. In this work we introduce the partial pruning algorithm as an efficient way to implement FFT pruning, obtaining a balanced trade-off between performance and resource allocation. Cycle-accurate simulation results showed that even at low-to-medium input sparseness levels the proposed algorithm reduces computation time by at least 20% compared to traditional FFT algorithms and, at the same time, decreases memory utilization by up to 20% over state-of-the-art pruning algorithms.
Title: Efficient FFT pruning algorithm for non-contiguous OFDM systems
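The core idea behind FFT input pruning is easy to demonstrate: in a radix-2 decimation-in-time FFT, any recursion branch whose inputs are all zero (unused subcarriers) transforms to all zeros and can be skipped. The sketch below illustrates that principle only; it is not the partial-pruning algorithm the paper proposes.

```python
# Hedged sketch of FFT input pruning for sparse NC-OFDM-style inputs.
# Not the paper's partial-pruning algorithm -- just the underlying idea.
import cmath

def fft_pruned(x):
    """Radix-2 DIT FFT that skips recursion into all-zero halves.

    len(x) must be a power of two. Whole butterfly subtrees fed only by
    zeros are bypassed, which is where pruning saves computation.
    """
    n = len(x)
    if n == 1:
        return [x[0]]
    even = x[0::2]
    odd = x[1::2]
    # Prune: an all-zero half transforms to all zeros, no recursion needed.
    E = [0j] * (n // 2) if not any(even) else fft_pruned(even)
    O = [0j] * (n // 2) if not any(odd) else fft_pruned(odd)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * O[k]  # twiddle factor
        out[k] = E[k] + t
        out[k + n // 2] = E[k] - t
    return out

# A single active "subcarrier" at index 2 of an 8-point input:
X = fft_pruned([0, 0, 1, 0, 0, 0, 0, 0])
```

A production version would precompute which branches to skip from the known subcarrier mask instead of testing `any()` at run time; the abstract's point is exactly that storing such configuration data costs memory, motivating the partial-pruning trade-off.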
Pub Date: 2011-11-01 | DOI: 10.1109/DASIP.2011.6136876
F. Palumbo, N. Carta, L. Raffo
The Dataflow Model of Computation (D-MoC) is particularly well suited to closing the gap between hardware architects and software developers. Leveraging the combination of the D-MoC with a coarse-grained reconfigurable approach to hardware design, we propose the Multi-Dataflow Composer (MDC) tool, which improves the time-to-market of modern, complex multi-purpose systems by deriving runtime-reconfigurable HDL platforms from the D-MoC models of the targeted set of applications. The MDC tool has been shown to provide considerable on-chip area savings: 82% was reached by combining different applications in the image processing domain, using a 90 nm CMOS technology. In the future the MDC tool, with very small integration effort, will also be extremely useful for creating multi-standard codec platforms for MPEG RVC applications.
Title: The Multi-Dataflow Composer tool: A runtime reconfigurable HDL platform composer
Pub Date: 2011-11-01 | DOI: 10.1109/DASIP.2011.6136893
T. Schlechter
Power efficiency is an important issue in mobile communication systems. Especially for mobile user equipment, the energy budget, limited by a battery, has to be managed carefully. Despite this, a considerable amount of energy is wasted in today's user equipment, because the analog and digital frontends of communication systems are engineered to extract the wanted signal from the spectral environment defined in the corresponding communication standards, with their extremely tough requirements. In a real receiving process, those requirements are typically far less critical. Capturing the actual transmission conditions and adapting the receiver architecture to them makes it possible to save energy during reception. This paper introduces an efficient architecture able to fulfill this task for a typical Long Term Evolution scenario. The development of a suitable filter chain is described, and a complexity comparison with Fast Fourier Transform based methods is given.
Title: Multiplier free filter bank based concept for blocker detection in LTE systems
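"Multiplier-free" filtering usually means restricting coefficients to signed powers of two, so that every tap becomes a bit-shift and an add. The sketch below shows that general idea on integer samples; it is an illustration of the technique's name, not the paper's filter bank, and all parameters are hypothetical.

```python
# Hedged sketch: a multiplier-free FIR filter. Each tap is (sign, shift),
# representing the coefficient sign * 2**(-shift), realised as an
# arithmetic right shift instead of a multiplication.

def multiplierless_fir(x, taps):
    """Filter integer samples x with power-of-two taps.

    x:    list of integer input samples
    taps: list of (sign, shift) pairs, tap k applies to x[n - k]
    """
    y = []
    for n in range(len(x)):
        acc = 0
        for k, (sign, shift) in enumerate(taps):
            if n - k >= 0:
                acc += sign * (x[n - k] >> shift)  # shift replaces multiply
        y.append(acc)
    return y

# Two taps of value +0.5 form a crude 2-sample smoother:
print(multiplierless_fir([4, 8], [(1, 1), (1, 1)]))  # -> [2, 6]
```

In hardware this removes the multiplier array entirely, which is the source of the power savings the abstract is after; real designs extend this with canonical signed-digit coefficients that sum several shifted terms per tap.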
Pub Date: 2011-11-01 | DOI: 10.1109/DASIP.2011.6136882
W. Stechele, J. Hartmann, E. Maehle
Robotic Vision combined with real-time control imposes challenging requirements on embedded computing nodes in robots, which exhibit strong variations in computational load due to dynamically changing activity profiles. A reconfigurable Multiprocessor System-on-Chip offers a solution by efficiently managing the robot's resources, but reconfiguration management is challenging. The goal of this paper is to present first ideas on self-learning reconfiguration management for reconfigurable multicore computing nodes with dynamic reconfiguration of soft-core CPUs and hardware accelerators, to support dynamically changing activity profiles in Robotic Vision scenarios.
Title: An approach to self-learning multicore reconfiguration management applied on Robotic Vision
Pub Date: 2011-11-01 | DOI: 10.1109/DASIP.2011.6136888
Jukka Saastamoinen, J. Kreku
As most of the functionality of embedded system products is realized in software, software performance estimation is crucial for successful system design. A significant part of the functionality of these applications is based on services provided by the underlying software libraries. A commonly used performance evaluation technique today is system-level performance simulation of applications and platforms using abstracted workload and execution platform models. The accuracy of the software performance results depends on how closely the application workload model reflects the actual software as a whole. This paper presents a methodology that combines compiler-based workload model generation for user code with workload extraction from pre-compiled libraries, while exploiting an overall approach and execution platform model developed previously. The benefit of the proposed methodology over the earlier solution is demonstrated using a set of benchmarks.
Title: Application workload model generation methodologies for system-level design exploration
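The abstracted-workload approach can be made concrete with a toy model: a compiler pass emits operation counts per basic block, and a platform model assigns each operation type a cycle cost, so simulating a trace reduces to multiply-and-accumulate over counts. This is a generic illustration of the modelling style, not the paper's methodology; every block name and cost below is invented.

```python
# Hedged sketch of abstract workload + platform models for system-level
# performance estimation. All block names and cycle costs are hypothetical.

# Per-basic-block operation counts, e.g. as a compiler pass might emit them.
BLOCK_WORKLOADS = {
    "decode_header": {"alu": 120, "load": 40, "store": 10, "branch": 8},
    "idct_block":    {"alu": 900, "load": 128, "store": 64, "branch": 16},
}

# Execution platform model: cycles per abstract operation type.
PLATFORM_COSTS = {"alu": 1, "load": 3, "store": 2, "branch": 2}

def estimate_cycles(trace):
    """Estimate total cycles for an execution trace of basic-block names."""
    total = 0
    for block in trace:
        for op, count in BLOCK_WORKLOADS[block].items():
            total += count * PLATFORM_COSTS[op]
    return total

print(estimate_cycles(["decode_header", "idct_block"]))  # -> 1720
```

The paper's point about pre-compiled libraries maps onto this model naturally: library calls have no user source for the compiler pass to analyse, so their operation counts must be extracted separately and merged into the same table.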
Pub Date: 2011-11-01 | DOI: 10.1109/DASIP.2011.6136900
G. Ochoa-Ruiz, E. Bourennane, H. Rabah, Ouassila Labbani
Dynamic Partial Reconfiguration (DPR) has been introduced in recent years as a method to increase the flexibility of FPGA designs. However, using DPR to build complex systems remains a daunting task. Recently, approaches based on MDE and the UML MARTE standard have emerged that aim to simplify the design of complex SoCs. Moreover, with the recent standardization of the IP-XACT specification, there is increasing interest in using it in MDE methodologies to ease system integration and enable design-flow automation. In this paper we propose a MARTE/MDE approach that exploits the capabilities of IP-XACT to model and automatically generate DPR SoC designs. In particular, our goal is to create the structural top-level description of the system and to include DPR support in the IP cores used. The generated IP-XACT descriptions are transformed to obtain the input files required by the EDK flow and then synthesized to generate the netlists used by the DPR flow. The methodology is demonstrated by integrating two codec cores (CAVLC and VLC) into a MicroBlaze-based DPR SoC.
Title: High-level modelling and automatic generation of dynamicaly reconfigurable systems
Pub Date: 2011-11-01 | DOI: 10.1109/DASIP.2011.6136875
Ghislain Roquier, E. Bezati, Richard Thavot, M. Mattavelli
Specifying both software and hardware components from a unified high-level description of an application is a very attractive design approach. However, despite the effort spent implementing such an approach with general-purpose programming languages, it has not yet proven viable and efficient for complex designs. One reason is that the sequential programming model does not naturally provide the explicit, scalable parallelism and composability properties needed to build portable applications that can be efficiently mapped onto different kinds of heterogeneous platforms. Dataflow programming, by contrast, naturally yields explicitly parallel programs with composability properties. This paper presents a hardware/software co-design methodology that, through direct synthesis of hardware descriptions (HDL), software components (C/C++), and their mutual interfaces, generates an implementation of an application from a single dataflow program, running on heterogeneous architectures composed of reconfigurable hardware and multi-core processors. Experimental results based on the implementation of a JPEG codec on a heterogeneous platform are also provided to show the capabilities and flexibility of the approach.
Title: Hardware/software co-design of dataflow programs for reconfigurable hardware and multi-core platforms
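The dataflow model that several of the entries above build on (actors exchanging tokens over FIFOs, firing only when inputs are available) can be shown in a few lines of software. The sketch below is a minimal, generic actor network for illustration; it is not the RVC-CAL/synthesis tooling these papers use, and all actor names are invented.

```python
# Hedged sketch of a minimal dataflow actor network: actors fire only when
# every input FIFO holds a token, and push results to downstream FIFOs.
from collections import deque

class Actor:
    """Single-rate dataflow actor with FIFO input channels."""

    def __init__(self, func, n_inputs):
        self.func = func
        self.inputs = [deque() for _ in range(n_inputs)]
        self.outputs = []            # list of (downstream actor, input port)

    def connect(self, downstream, port):
        self.outputs.append((downstream, port))

    def try_fire(self):
        if not all(self.inputs):     # firing rule: one token on every input
            return False
        args = [q.popleft() for q in self.inputs]
        token = self.func(*args)
        for actor, port in self.outputs:
            actor.inputs[port].append(token)
        return True

# A toy pipeline: scale -> add -> sink, with a constant on add's second port.
results = []
scale = Actor(lambda v: 2 * v, 1)
add = Actor(lambda a, b: a + b, 2)
sink = Actor(results.append, 1)
scale.connect(add, 0)
add.connect(sink, 0)

scale.inputs[0].append(3)            # inject input tokens
add.inputs[1].append(1)
actors = [scale, add, sink]
while any(a.try_fire() for a in actors):
    pass                             # naive scheduler: fire until quiescent
print(results)                       # -> [7]  (2*3 + 1)
```

Because each actor touches only its own FIFOs, the same graph can be partitioned freely: some actors synthesized to HDL, others compiled to C/C++, with the FIFOs becoming the hardware/software interfaces, which is the composability argument the abstract makes.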