Pub Date: 2016-10-01 DOI: 10.1109/DASIP.2016.7853823
P. Langlois, Kevin J. M. Martin, E. J. Martínez
During the DASIP 2016 Demo Night, universities, public research institutes and companies will demonstrate their hardware platforms, prototypes and tools. Demos shown at the Demo Night are accompanied by a short paper describing the demo and associated work. The goal of this event is to present collaborative projects and to demonstrate working solutions. DASIP Demo Night includes a reception with a casual social environment conducive to friendly discussions and networking.
Title: Demo Night. Venue: 2016 Conference on Design and Architectures for Signal and Image Processing (DASIP). Pages: 222.
Pub Date: 2016-10-01 DOI: 10.1109/DASIP.2016.7853816
D. Houzet, V. Fresse, H. Konik
Most advanced driver assistance systems are developed for safety and better driving. Safety systems that use image processing, such as the Hough transform, require a lot of memory, whose underutilization can degrade real-time performance. Internal memories on reconfigurable devices such as FPGAs are limited in size, number and bandwidth. Memory optimization cannot be done solely at the application level. Holistic design-space exploration is necessary to leverage the inherent locality of applications and reduce memory accesses. In this paper, we target the optimization of FPGA internal memories by adding a small register-based multi-ported cache memory in front of each internal FPGA memory block to increase its bandwidth. The dimensions of this cache are explored according to the locality of the implemented function. The exploration uses a cumulative-write cache exhibiting a speedup of 1.5 to 2 compared to the best FPGA implementations. The solution is optimized with an identical number of memories and few added registers and LUTs.
Title: FPGA memory optimization for real-time imaging. Venue: 2016 Conference on Design and Architectures for Signal and Image Processing (DASIP). Pages: 176-182.
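The abstract above mentions a small cumulative-write cache placed in front of a memory block. The following Python sketch only illustrates that general idea in software: increments to the same address (for example, votes in a Hough accumulator) are merged in a few registers and written back once. The class name, the LRU eviction policy and the cache size are assumptions made for illustration and are not taken from the paper.

```python
from collections import OrderedDict


class CumulativeWriteCache:
    """Tiny cache that accumulates increments before writing them back.

    Hypothetical software model; the eviction policy (least recently
    used) and the sizes are illustrative assumptions.
    """

    def __init__(self, memory, num_entries=4):
        self.memory = memory            # backing store, a list of counters
        self.num_entries = num_entries  # number of cache registers
        self.entries = OrderedDict()    # address -> pending increment
        self.memory_accesses = 0        # write-backs issued to the memory

    def increment(self, address, amount=1):
        if address in self.entries:
            # Hit: accumulate in the register, no memory access needed.
            self.entries[address] += amount
            self.entries.move_to_end(address)
            return
        if len(self.entries) >= self.num_entries:
            # Miss with a full cache: evict the least recently used entry.
            self._write_back(*self.entries.popitem(last=False))
        self.entries[address] = amount

    def _write_back(self, address, pending):
        self.memory[address] += pending
        self.memory_accesses += 1

    def flush(self):
        # Write all pending increments back to the memory.
        while self.entries:
            self._write_back(*self.entries.popitem(last=False))


# Eight votes with strong locality, as an accumulator might see them.
votes = [5, 5, 5, 7, 5, 7, 7, 9]
accumulator = [0] * 16
cache = CumulativeWriteCache(accumulator, num_entries=2)
for address in votes:
    cache.increment(address)
cache.flush()
print(accumulator[5], accumulator[7], accumulator[9])  # 4 3 1
print(cache.memory_accesses)  # 3 write-backs instead of 8 direct updates
```

The benefit in this toy model depends entirely on locality: with no repeated addresses, every vote would still cost one write-back.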
Pub Date: 2016-10-01 DOI: 10.1109/DASIP.2016.7853826
A. C. Yuzuguler, W. Simon, A. Ibrahim, F. Angiolini, M. Arditi, J. Thiran, G. Micheli
In medical diagnosis, ultrasound (US) imaging is one of the most common, safe, and powerful techniques. Volumetric (3D) US is potentially very attractive compared to 2D US because it might enable telesonography, that is, decoupling the local image acquisition, performed by an untrained person, from the diagnosis, performed by a trained sonographer who can be remote. Unfortunately, current 3D systems are hospital-oriented, bulky and expensive, and they are not available in emergency operations or rural areas. This motivates us to develop a portable US platform with cheap, battery-operated, more efficient electronics.
Title: Demo: Efficient delay and apodization for on-FPGA 3D ultrasound. Venue: 2016 Conference on Design and Architectures for Signal and Image Processing (DASIP). Pages: 227-228.
Pub Date: 2016-10-01 DOI: 10.1109/DASIP.2016.7853802
M. A. Arslan, F. Gruian, K. Kuchcinski, Andreas Karlsson
Today's multimedia and DSP applications impose requirements on performance and power consumption that only custom processor architectures with SIMD capabilities can satisfy. However, the specific features of such architectures, including vector operations and high-bandwidth complex memory organization, make them notoriously complicated and time-consuming to program. In this paper we present an automated code generation approach that dramatically reduces the effort of programming such architectures by carrying out instruction scheduling and memory allocation based on a constraint programming formulation. Furthermore, the quality of the generated code is close to that of code hand-written by an experienced programmer with knowledge of the architecture. We demonstrate the viability of our approach on an existing custom heterogeneous DSP architecture by compiling and running a number of typical DSP kernels and comparing the results to hand-optimized code.
Title: Code generation for a SIMD architecture with custom memory organisation. Venue: 2016 Conference on Design and Architectures for Signal and Image Processing (DASIP). Pages: 90-97.
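To make the idea of scheduling as a constraint problem more concrete, here is a minimal Python sketch. It is not the authors' formulation: the five-operation kernel, the unit names (`mem`, `vec`), the latencies and the exhaustive search are all hypothetical, and memory allocation is left out. A real constraint programming solver would replace the brute-force loop, but the constraints have the same shape: data dependencies must respect latencies, and each functional unit can issue at most one operation per cycle.

```python
from itertools import product

# Hypothetical kernel: operation name -> (functional unit, latency in cycles).
OPS = {
    "load_a": ("mem", 2),
    "load_b": ("mem", 2),
    "mul": ("vec", 1),
    "add": ("vec", 1),
    "store": ("mem", 1),
}

# (producer, consumer) data dependencies between the operations.
DEPS = [("load_a", "mul"), ("load_b", "mul"), ("mul", "add"), ("add", "store")]


def is_feasible(start):
    """Check precedence and resource constraints for a start-cycle assignment."""
    # Precedence: a consumer may start only once its producer has finished.
    for producer, consumer in DEPS:
        if start[consumer] < start[producer] + OPS[producer][1]:
            return False
    # Resources: each unit issues at most one operation per cycle
    # (units are assumed to be fully pipelined).
    issued = set()
    for name, cycle in start.items():
        slot = (OPS[name][0], cycle)
        if slot in issued:
            return False
        issued.add(slot)
    return True


def makespan(start):
    """Cycle at which the last operation completes."""
    return max(start[name] + OPS[name][1] for name in OPS)


def best_schedule(horizon=8):
    """Exhaustively search start cycles in [0, horizon) for the shortest schedule."""
    names = list(OPS)
    best = None
    for cycles in product(range(horizon), repeat=len(names)):
        start = dict(zip(names, cycles))
        if is_feasible(start) and (best is None or makespan(start) < makespan(best)):
            best = start
    return best


schedule = best_schedule()
print(schedule)
print(makespan(schedule))  # 6 cycles for this toy kernel
```

The two loads compete for the single memory unit, so one of them is delayed by a cycle; that resource constraint, rather than the dependency chain alone, determines the six-cycle result.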
Pub Date: 2016-10-01 DOI: 10.1109/DASIP.2016.7853807
Erwan Moreac, A. Rossi, J. Laurent, P. Bomel
Networks-on-Chip (NoCs) are recognized as the solution to address the communication bottleneck in Multiprocessor Systems-on-Chip (MPSoCs). As the NoC represents a significant part of the system power consumption, MPSoC designers expect accurate power models in order to produce energy-efficient systems. Nowadays, NoC simulators rely on power models that integrate link models without crosstalk modeling. In this work, we present a link power model with crosstalk modeling embedded in a NoC simulator. We show that the crosstalk effect has a deep impact on NoC energy consumption: our results demonstrate that classical models generate errors of up to 45.5% in the estimated energy consumption of the whole NoC.
Title: Crosstalk-aware link power model for Networks-on-Chip. Venue: 2016 Conference on Design and Architectures for Signal and Image Processing (DASIP). Pages: 121-128.
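The difference between a link model with and without crosstalk can be illustrated with a simplified, textbook-style energy estimate. The sketch below is not the model from the paper: the function name, the capacitance values and the quadratic coupling term are assumptions chosen only to show why ignoring coupling between adjacent wires can misestimate energy.

```python
def link_transition_energy(prev_bits, next_bits, c_self=1.0, c_couple=2.0,
                           vdd=1.0, with_crosstalk=True):
    """Estimate the energy of one transition on a parallel link.

    Simplified illustrative model (all parameter values are hypothetical):
    each wire has a self capacitance c_self, and each pair of adjacent
    wires shares a coupling capacitance c_couple.
    """
    if len(prev_bits) != len(next_bits):
        raise ValueError("both words must have the same width")
    # Per-wire transition: +1 rising, -1 falling, 0 stable.
    delta = [n - p for p, n in zip(prev_bits, next_bits)]
    # Self term: counts the wires that toggle.
    self_term = sum(d * d for d in delta)
    # Coupling term: adjacent wires toggling in opposite directions cost the
    # most, wires toggling together cost nothing.
    couple_term = sum((a - b) ** 2 for a, b in zip(delta, delta[1:]))
    energy = c_self * self_term
    if with_crosstalk:
        energy += c_couple * couple_term
    return 0.5 * vdd ** 2 * energy


# Both transitions toggle all four wires, so a model without crosstalk
# assigns them the same energy.
same_direction = ([0, 0, 0, 0], [1, 1, 1, 1])
opposite_direction = ([0, 1, 0, 1], [1, 0, 1, 0])
print(link_transition_energy(*same_direction))                            # 2.0
print(link_transition_energy(*opposite_direction, with_crosstalk=False))  # 2.0
print(link_transition_energy(*opposite_direction))                        # 14.0
```

In this toy setting the two data patterns are indistinguishable without the coupling term, which is the kind of data-dependent effect a crosstalk-aware model is meant to capture.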
Pub Date: 2016-10-01 DOI: 10.1109/DASIP.2016.7853794
F. Palumbo
In Signal Processing in general, and in Image Processing in particular, it is quite common to customize the underlying architecture to improve computing efficiency. This session is dedicated to Architectures for Image Processing, and four different papers will be presented. Solutions based on application-specific processors, characterized according to the processing requirements, may improve on-board processing and facilitate data transmission from distributed computing nodes, as presented in the first paper. Memory hierarchy implementation and management is fundamental to improving computing efficiency. In this sense, the second paper investigates the use of associative memories for pattern detection and applies them in the context of Clustered Neural Networks, while the third one presents a memory-efficient architecture that implements the Multi-Scale Line Detector algorithm in hardware for real-time retinal blood vessel detection. Finally, the last paper is more system-oriented, focusing on modelling techniques to derive and verify lossless compression IP cores.
Title: Session 2: Architectures for image processing. Venue: 2016 Conference on Design and Architectures for Signal and Image Processing (DASIP). Pages: 42.
Pub Date: 2016-10-01 DOI: 10.1109/DASIP.2016.7853831
R. Salvador, H. Fabelo, R. Lazcano, S. Ortega, D. Madroñal, G. Callicó, E. Juárez, C. Sanz
In this paper, a demonstrator of three different elements of the EU FET HELICoiD project is introduced. The goal of this demonstration is to show how the combination of hyperspectral imaging and machine learning can be a potential solution for precise real-time detection of tumor tissues during surgical operations. The HELICoiD setup consists of two hyperspectral cameras, a scanning unit, an illumination system, a data processing system and an EMB01 accelerator platform, which hosts an MPPA-256 manycore chip. All the components are mounted so as to satisfy the constraints of surgical environments, as shown in the accompanying video recorded in the operating room. An in-vivo human brain hyperspectral image database, obtained at the University Hospital Doctor Negrin in Las Palmas de Gran Canaria, has been employed as input to different supervised classification algorithms (SVM, RF, NN) and to a spatial-spectral filtering stage (SVM-KNN). The resulting classification maps are shown in this demo. In addition, the implementation of the SVM-KNN classification algorithm on the MPPA EMB01 platform is demonstrated in the live demo.
Title: Demo: HELICoiD tool demonstrator for real-time brain cancer detection. Venue: 2016 Conference on Design and Architectures for Signal and Image Processing (DASIP). Pages: 237-238.
Pub Date: 2016-10-01 DOI: 10.1109/DASIP.2016.7853824
Mohamed Mourad Hafidhi, E. Boutillon, Arnaud Dion
The increase in integration density and the requirement of low power supplies to reduce energy consumption can make circuits more and more sensitive to hardware errors. The loss of robustness increases with process, voltage and temperature (PVT) variations. This demo presents a platform that is first used to implement a noiseless GPS receiver algorithm. Redundancy mechanisms can then be added to the design to make the GPS receiver more resilient against upset errors due to low supply voltage. The platform can thus be used to evaluate the performance and the complexity of the proposed mechanisms.
Title: Demo: Localisation in a faulty digital GPS receiver. Venue: 2016 Conference on Design and Architectures for Signal and Image Processing (DASIP). Pages: 223-224.
Pub Date: 2016-10-01 DOI: 10.1109/DASIP.2016.7853830
Ronan Parois, W. Hamidouche, E. Mora, M. Raulet, O. Déforges
In this paper we present a real-time streaming demonstration with a multi-layer architecture of pipelined software High Efficiency Video Coding (HEVC) encoders with inter-layer prediction, enabling Scalable HEVC (SHVC) encoding. This SHVC encoder is implemented on an innovative platform performing real-time encoding that already demonstrated promising performance with HDR, HFR and SHVC implementations in previous demonstrations [1], [2]. The transmitted content consists of a spatial SHVC bitstream composed of a High Definition (HD) Base Layer (BL) and an Ultra HD (UHD) Enhancement Layer (EL). The encoder reads a UHD video sequence through Serial Digital Interface (SDI) ports and broadcasts the SHVC bitstream through an Internet Protocol (IP) channel. The bitstream is then decoded using a GPAC player with a real-time decoder.
Title: Demo: UHD live video streaming with a real-time scalable HEVC encoder. Venue: 2016 Conference on Design and Architectures for Signal and Image Processing (DASIP). Pages: 235-236.
Pub Date: 2016-10-01 DOI: 10.1109/DASIP.2016.7853793
Georgios Georgakarakos, Simon Holmbacka, J. Lilius
In this paper we evaluate the impact of the task programming model on the scalability and energy efficiency of dynamically parallel applications such as HEVC decoding. We develop a task-based parallel HEVC decoding implementation supporting Tiles and Wavefront Parallel Processing. We measure and compare thread-based HEVC decoding against an alternative version supporting task-based parallelism. Results show that the task programming model can improve the scalability and energy efficiency of HEVC decoding for various parallel application parameters (task dependencies, task granularity) and computing platforms ranging from server to laptop and embedded environments.
Title: Analysis on scalability and energy efficiency of HEVC decoding using task-based programming model. Venue: 2016 Conference on Design and Architectures for Signal and Image Processing (DASIP). Pages: 34-41.
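The task dependencies mentioned in the abstract above can be illustrated for Wavefront Parallel Processing with a small Python sketch. It assumes the commonly described wavefront pattern, in which a coding tree unit (CTU) waits for its left neighbour and for the top-right CTU of the row above, and it assumes that every CTU takes one time step on an unlimited number of workers. The function names and the grid size are hypothetical, and the sketch is not the implementation evaluated in the paper.

```python
def wavefront_dependencies(row, col, width):
    """Return the CTUs that must finish before CTU (row, col) can start.

    Assumed pattern: the left neighbour in the same row and the top-right
    CTU of the previous row (the top CTU when in the last column).
    """
    deps = []
    if col > 0:
        deps.append((row, col - 1))
    if row > 0:
        deps.append((row - 1, min(col + 1, width - 1)))
    return deps


def earliest_steps(height, width):
    """Earliest time step of each CTU, assuming unit-time tasks."""
    step = {}
    # Raster order guarantees that all dependencies are computed first.
    for row in range(height):
        for col in range(width):
            deps = wavefront_dependencies(row, col, width)
            step[(row, col)] = 1 + max((step[d] for d in deps), default=-1)
    return step


steps = earliest_steps(height=3, width=4)
for row in range(3):
    print([steps[(row, col)] for col in range(4)])
# [0, 1, 2, 3]
# [2, 3, 4, 5]
# [4, 5, 6, 7]
print(max(steps.values()) + 1)  # 8 steps instead of 12 sequential CTUs
```

Each row starts two steps after the row above it, which limits how many CTUs are ready at once; this is one reason task granularity and dependency handling affect how well such a decoder scales.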