Pub Date: 2016-10-12 | DOI: 10.1109/DASIP.2016.7853791
E. Raffin, W. Hamidouche, Erwan Nogues, M. Pelcat, D. Ménard
Scalable video coding offers a large choice of configurations when decoding a compressed video. A single encoded bitstream can be decoded in multiple modes, from a full-quality mode to various degraded-quality modes. In the bitstream, data is separated into layers, each layer containing the information for one quality level and depending on information from other layers. In the context of an energy-constrained scalable video decoder executed on an embedded multicore platform, this paper investigates the energy consumption of an optimized decoder as a function of the decoded layers and the decoded video quality. The measurements show that SHVC offers a large set of trade-offs between energy and quality, which can be used to precisely adapt the decoder to its energy constraints.
"Scalable HEVC decoder for mobile devices: Trade-off between energy consumption and quality," 2016 Conference on Design and Architectures for Signal and Image Processing (DASIP), pp. 18-25.
Pub Date: 2016-10-12 | DOI: 10.1109/DASIP.2016.7853833
M. Tran, E. Casseau, M. Gautier
Field Programmable Gate Array (FPGA) technology is expected to play a key role in the development of Software Defined Radio platforms. To reduce the design time required when targeting such a technology, high-level synthesis tools, now available in current FPGA CAD tools, can be used. In this demo, we present the design of an FFT component for the Long Term Evolution (LTE) standard and its implementation on a Xilinx Virtex-6-based ML605 board. Our flexible FFT supports sizes of 128, 256, 512, 1024, 1536, and 2048 points to compute OFDM symbols. The FFT is specified at a high level (i.e., in C). Both dynamic partial reconfiguration and run-time configuration driven by the input control signals of the flexible FFT are shown. These two approaches provide an interesting trade-off between reconfiguration time and area.
"Demo abstract: FPGA-based implementation of a flexible FFT dedicated to LTE standard," 2016 Conference on Design and Architectures for Signal and Image Processing (DASIP), pp. 241-242.
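The six supported lengths correspond to the standard LTE channel bandwidths. As a loose software illustration of the run-time configuration idea (the actual design is HLS C on an FPGA; the bandwidth-to-size mapping and function names below are assumptions, not taken from the demo), a NumPy sketch might select the FFT length from the channel bandwidth:

```python
import numpy as np

# Hypothetical run-time-configurable FFT for LTE OFDM symbols: the six
# supported lengths are keyed by the LTE channel bandwidth in MHz.
LTE_FFT_SIZES = {1.4: 128, 3: 256, 5: 512, 10: 1024, 15: 1536, 20: 2048}

def ofdm_fft(samples, bandwidth_mhz):
    """Select the FFT length from the LTE channel bandwidth and transform."""
    n = LTE_FFT_SIZES[bandwidth_mhz]
    if len(samples) != n:
        raise ValueError(f"expected {n} samples for {bandwidth_mhz} MHz")
    return np.fft.fft(samples)
```

Note that 1536 is not a power of two, which is exactly why a flexible (rather than fixed radix-2) architecture is needed in hardware.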
Pub Date: 2016-10-12 | DOI: 10.1109/DASIP.2016.7853825
Tian Xia, Mohamad-Al-Fadl Rihani, Jean-Christophe Prévotet, F. Nouvel
Today, CPU-FPGA hybrid architectures have become increasingly popular in embedded systems. In this approach, the CPU and FPGA domains are tightly connected by dedicated interconnects, which makes it possible to enhance traditional CPU virtualization with dynamic partial reconfiguration (DPR) on the FPGA. We propose Ker-ONE, an approach built on a lightweight micro-kernel that supports real-time virtualization. It also provides an abstract, transparent layer through which virtual machines (VMs) access reconfigurable accelerators. In this demo, the proposed framework is implemented on an ARM-FPGA platform, and the real-time scheduling/allocation mechanism is presented in detail via a GUI demonstration. We show that our approach achieves a high level of performance with low overhead.
"Demo: Ker-ONE: Embedded virtualization approach with dynamic reconfigurable accelerators management," 2016 Conference on Design and Architectures for Signal and Image Processing (DASIP), pp. 225-226.
Pub Date: 2016-10-12 | DOI: 10.1109/DASIP.2016.7853832
Théotime Bollengier, M. Najem, Jean-Christophe Le Lann, Loïc Lagadec
Overlays are reconfigurable architectures synthesized on commercial off-the-shelf (COTS) FPGAs. Overlays bring advantages such as portability, resource abstraction, and fast configuration, and they can exhibit features independent of the host FPGA. We designed a fine-grained overlay implementing novel features that ease the management of such architectures in a cluster of heterogeneous COTS FPGAs. This demonstration shows the use of this overlay in an FPGA cluster, performing live migration of a hardware application between two nodes of the cluster. It also illustrates the fault tolerance of the cluster.
"Demo: Overlay architectures for heterogeneous FPGA cluster management," 2016 Conference on Design and Architectures for Signal and Image Processing (DASIP), pp. 239-240.
Pub Date: 2016-10-10 | DOI: 10.1109/DASIP.2016.7853796
R. Danilo, Hugues Wouafo, C. Chavet, Vincent Gripon, L. Conde-Canencia, P. Coussy
Associative Memories (AMs) are storage devices that can be addressed by part of their content, in contrast to classical index-based memories. This property makes them promising candidates for various search problems, including pattern detection in images. Cluster-based Neural Networks (CbNNs) allow efficient design of AMs by providing fast pattern retrieval, especially when implemented in hardware. In particular, they can be used to store and then quickly identify oriented edges in images. However, current CbNN models only perform well when facing erasures in the inputs. This paper introduces several improvements to the CbNN model in order to cope with intrusion and additive noise. Namely, we change the initialization of neurons to account for precise information based on Euclidean distance, and we update the activation rules accordingly, resulting in efficient handling of various types of input noise. Finally, hardware architectures associated with the proposed computation models are presented and compared with the existing CbNN implementation. Synthesis results show that several of them divide the cost of that implementation by 3 while increasing the maximal frequency by 25%.
"Associative Memory based on clustered Neural Networks: Improved model and architecture for Oriented Edge Detection," 2016 Conference on Design and Architectures for Signal and Image Processing (DASIP), pp. 51-58.
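The Euclidean-distance initialization can be pictured with a small sketch. Everything here (the Gaussian-style scoring, array shapes, and function names) is an illustrative assumption, not the paper's model: instead of a binary match/no-match, each stored value gets an initial activation that decays with its distance to the noisy input, and a winner-take-all step keeps the best neuron per cluster.

```python
import numpy as np

# Distance-aware neuron initialization: closer stored values receive a
# higher initial activation (Gaussian-like scoring is an assumption).
def initialize_scores(stored_values, noisy_input):
    # stored_values: (clusters, neurons); noisy_input: (clusters,)
    d = np.abs(stored_values - noisy_input[:, None]).astype(float)
    return np.exp(-d ** 2)

# Winner-take-all activation: one winning neuron index per cluster.
def winner_take_all(scores):
    return np.argmax(scores, axis=1)
```

With this scoring, an input corrupted by additive noise still activates the nearest stored value, which is the behavior the erasure-only model lacks.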
Pub Date: 2016-10-01 | DOI: 10.1109/DASIP.2016.7853792
Alexandre Mercat, W. Hamidouche, M. Pelcat, D. Ménard
The High Efficiency Video Coding (HEVC) standard provides up to 40% bitrate savings compared to the state-of-the-art H.264/AVC standard for the same perceptual video quality. Power consumption constraints are a serious challenge for embedded applications based on a software design. A large number of systems are likely to integrate the HEVC codec in the long run and will need to be energy-aware. In this context, we carry out a complexity study of the HEVC coding-tree encoding process. This study shows that the complexity of encoding a Coding Unit (CU) of a given size has a non-trivial probability density shape and thus can hardly be predicted accurately. However, we propose a model that linearly links the ratios between the complexities of coarse-grain and lower-grain CU encodings, with an error under 6%. This model is valid for a wide range of video contents coded in Intra configuration at different bitrates. This information is useful for controlling encoder energy during the encoding process on battery-limited devices.
"Estimating encoding complexity of a real-time embedded software HEVC codec," 2016 Conference on Design and Architectures for Signal and Image Processing (DASIP), pp. 26-33.
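The abstract does not give the model in closed form; as a hypothetical sketch of what a linear link between coarse-grain and lower-grain CU complexities could look like (the function names and the pure proportional least-squares form are assumptions, not the paper's fitted model):

```python
import numpy as np

# Hypothetical linear complexity model: fit a single ratio r so that
# fine-grain CU cost ~= r * coarse-grain CU cost, then use it to predict
# the cost of finer encodings from one coarse measurement.

def fit_ratio(coarse_cycles, fine_cycles):
    """Least-squares fit of fine = r * coarse over measured CU samples."""
    c = np.asarray(coarse_cycles, dtype=float)
    f = np.asarray(fine_cycles, dtype=float)
    return float(c @ f / (c @ c))

def predict_fine(coarse_cycles, r):
    """Predicted fine-grain encoding cost for a new coarse measurement."""
    return r * coarse_cycles
```

An energy controller could then budget the deeper quad-tree levels from a single coarse-level measurement instead of encoding them speculatively.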
Pub Date: 2016-10-01 | DOI: 10.1109/DASIP.2016.7853809
F. Lemaitre, L. Lacassagne
Many linear algebra libraries, such as Intel MKL, Magma, or Eigen, provide fast Cholesky factorization. These libraries are suited to big matrices but perform poorly on small ones. Even though state-of-the-art studies have begun to take an interest in small matrices, those usually still have a few hundred rows, whereas fields like computer vision or high-energy physics use tiny matrices. In this paper we show that it is possible to speed up the Cholesky factorization for tiny matrices by grouping them in batches and using highly specialized code. We provide high-level transformations that accelerate the factorization on current Intel SIMD architectures (SSE, AVX2, KNC, AVX-512). Combined with SIMD, these transformations achieve a speedup from 13× to 31× for the whole resolution compared to the naive code on a single-core AVX2 machine, and a speedup from 15× to 33× with multithreading compared to the multithreaded naive code.
"Batched Cholesky factorization for tiny matrices," 2016 Conference on Design and Architectures for Signal and Image Processing (DASIP), pp. 130-137.
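The batching idea can be illustrated in NumPy (the paper's code is specialized SIMD C; this fully unrolled 3×3 version, with the batch as the vector dimension, is only a sketch of the approach, not its implementation):

```python
import numpy as np

# Batched tiny Cholesky: each scalar step of the 3x3 recurrence is applied
# to the whole batch at once, so the batch axis plays the role of the
# SIMD lanes in the paper's hand-tuned code.

def batched_cholesky_3x3(A):
    """A: (batch, 3, 3) SPD matrices -> lower-triangular L with L @ L.T == A."""
    L = np.zeros_like(A)
    L[:, 0, 0] = np.sqrt(A[:, 0, 0])
    L[:, 1, 0] = A[:, 1, 0] / L[:, 0, 0]
    L[:, 2, 0] = A[:, 2, 0] / L[:, 0, 0]
    L[:, 1, 1] = np.sqrt(A[:, 1, 1] - L[:, 1, 0] ** 2)
    L[:, 2, 1] = (A[:, 2, 1] - L[:, 2, 0] * L[:, 1, 0]) / L[:, 1, 1]
    L[:, 2, 2] = np.sqrt(A[:, 2, 2] - L[:, 2, 0] ** 2 - L[:, 2, 1] ** 2)
    return L
```

Unrolling by matrix size removes all inner-loop control flow, which is what makes the per-size specialized code profitable for tiny matrices.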
Pub Date: 2016-10-01 | DOI: 10.1109/DASIP.2016.7853814
Robert Krutsch, S. Naidu
Convolutional Neural Networks today provide the best results for many image detection and recognition problems. The accuracy increases of recent years have been obtained through increases in the structural complexity and the number of parameters of deep networks. Memory bandwidth and power consumption constraints limit the deployment of such state-of-the-art architectures in low-power embedded applications. Reducing the coefficient bit depth is one of the most frequently used approaches for bringing deep neural networks onto low-power embedded hardware accelerators. In this paper we propose a reduced-precision, fixed-point implementation that can reduce bandwidth and power consumption significantly. The results show that with an 8-bit representation for more than 64% of the parameters, less than 0.5% accuracy is lost. As expected, the error resilience varies from layer to layer and from convolution kernel to convolution kernel. To cope with this variability and understand which parameters need which precision, we have developed a Monte Carlo simulation tool that explores the decision space.
"Monte Carlo method based precision analysis of deep convolution nets," 2016 Conference on Design and Architectures for Signal and Image Processing (DASIP), pp. 162-167.
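A minimal sketch of such an exploration, assuming a simple fixed-point rounding model and a random per-layer bit-depth choice (the error metric, bit choices, and names are illustrative assumptions, not the paper's tool):

```python
import numpy as np

def quantize_fixed_point(w, bits, frac_bits):
    """Round to signed fixed point with `frac_bits` fractional bits."""
    scale = 2.0 ** frac_bits
    lo, hi = -2 ** (bits - 1), 2 ** (bits - 1) - 1
    return np.clip(np.round(w * scale), lo, hi) / scale

def monte_carlo_bit_depths(weights_per_layer, trials, rng,
                           bit_choices=(8, 12, 16)):
    """Sample random per-layer bit-depth mixes and score each by the mean
    absolute quantization error it induces on the weights."""
    results = []
    for _ in range(trials):
        mix = [int(rng.choice(bit_choices)) for _ in weights_per_layer]
        err = float(np.mean(
            [np.mean(np.abs(w - quantize_fixed_point(w, b, b - 2)))
             for w, b in zip(weights_per_layer, mix)]))
        results.append((mix, err))
    return results
```

In a real tool the score would be the end-to-end accuracy drop rather than weight error, but the sampling loop over the per-layer decision space is the same idea.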
Pub Date: 2016-10-01 | DOI: 10.1109/DASIP.2016.7853811
J. A. Holanda, João MP Cardoso, E. Marques
This paper describes the mapping and acceleration of an object detection algorithm on an FPGA-based multiprocessor system. We use HOG (Histogram of Oriented Gradients), one of the most popular algorithms for detecting different classes of objects and currently used in smart embedded systems. Using HOG on such systems requires efficient implementations in order to provide high performance, possibly within low energy/power consumption budgets. Also, because variations and adaptations of this algorithm are needed to deal with different scenarios and classes of objects, programmability is required to allow greater development flexibility. In this paper we show our approach to implementing the HOG algorithm on a multi-softcore Nios II based system, bearing in mind both performance and programmability. By applying source-to-source transformations we obtain speedups of 19×, and by using pipelined processing we reduce the algorithm's execution time by 49×. We also show that improving the hardware with acceleration units can yield speedups of 72.4× compared to the embedded baseline application.
"A pipelined multi-softcore approach for the HOG algorithm," 2016 Conference on Design and Architectures for Signal and Image Processing (DASIP), pp. 146-153.
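As background, the kernel the softcores execute repeatedly is HOG's orientation binning; a minimal single-cell sketch (a generic HOG step, not the paper's implementation) looks like:

```python
import numpy as np

# One HOG cell: per-pixel gradients are binned by unsigned orientation,
# with each pixel voting its gradient magnitude into the histogram.

def cell_histogram(cell, n_bins=9):
    """cell: 2-D grayscale patch -> orientation histogram of length n_bins."""
    gy, gx = np.gradient(cell.astype(float))        # row and column gradients
    mag = np.hypot(gx, gy)                          # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0    # unsigned orientation
    bins = (ang / (180.0 / n_bins)).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())      # magnitude-weighted vote
    return hist
```

The per-cell independence of this step is what makes the algorithm a natural fit for both the pipelined multi-softcore mapping and dedicated acceleration units.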
Pub Date: 2016-10-01 | DOI: 10.1109/DASIP.2016.7853813
W. Stechele, T. Kryjak, L. Lacassagne, D. Houzet, M. Danek
The focus of this special session is on computational challenges and solutions related to automotive parallel computing. The five papers cover aspects of machine learning, FPGA-based hardware acceleration, memory optimization, and multi-core systems. Application areas include image and radar processing, as well as AUTOSAR applications.
"Special session 1 automotive parallel computing challenges - architectures, applications and tricks," 2016 Conference on Design and Architectures for Signal and Image Processing (DASIP), p. 161.