Latest articles from the 2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)

Proposition and evaluation of a real-time generic architecture for a laser stripe detection system on FPGA

2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)

Pub Date : 2017-09-01 DOI: 10.1109/DASIP.2017.8122110

Seher Colak, E. Dumas, V. Fresse, O. Alata

Laser triangulation applications are commonly used for industrial quality control. Such applications require real-time systems, often made of a computing unit placed close to the image sensor and connected to it through a short, fast link. Choosing a camera with an integrated Field Programmable Gate Array (FPGA) as the computing unit can provide the deep pipelining and parallel computing needed to process images in real time. Moreover, industry needs to maintain code for several years regardless of system upgrades, so the operators designed should be flexible enough to adapt to any hardware change (sensor or FPGA) or tool update with minimum effort. The purpose of this article is to present a generic architecture for laser stripe detection based on the centroid algorithm for an FPGA-based system. Resource usage is evaluated with respect to two parameters (image width and parallelism). From three syntheses, models have been extracted to forecast the evolution of these resources, and an error analysis has been conducted to validate these models.
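The centroid algorithm named in this abstract can be illustrated in software. The sketch below is not the authors' FPGA architecture; it is a minimal reference model that assumes the stripe crosses the image roughly horizontally, so that one sub-pixel row position is computed per column. The function name `stripe_centroids` and the `threshold` parameter are hypothetical.

```python
def stripe_centroids(image, threshold=0):
    """Return, for each column, the intensity-weighted mean row index.

    `image` is a list of rows of non-negative pixel intensities.
    Pixels at or below `threshold` are ignored (assumed background).
    Columns with no pixel above the threshold yield None.
    """
    rows = len(image)
    cols = len(image[0])
    centroids = []
    for c in range(cols):
        weighted_sum = 0  # sum of row_index * intensity
        total = 0         # sum of intensity
        for r in range(rows):
            value = image[r][c]
            if value > threshold:
                weighted_sum += r * value
                total += value
        centroids.append(weighted_sum / total if total else None)
    return centroids


# Tiny 4x2 example: the stripe is only visible in column 0.
example = [
    [0, 0],
    [10, 0],
    [30, 0],
    [10, 0],
]
print(stripe_centroids(example))
```

Each column needs only two running sums and one division, which is one reason this kind of operator suits a pipelined, per-column parallel hardware implementation.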

Citations: 2

Detecting data-parallel synchronous dataflow graphs

2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)

Pub Date : 2017-09-01 DOI: 10.1109/DASIP.2017.8122118

Sudeep Kanur, J. Lilius, Johan Ersfolk

Synchronous Dataflow (SDF), a popular subset of the dataflow programming paradigm, gives a well-structured formalism to capture signal and stream processing applications. With data-parallel architectures becoming ubiquitous, several frameworks leverage the SDF formalism to map applications to parallel architectures. However, these frameworks assume that the Synchronous Dataflow graphs (SDFGs) under consideration are already data-parallel. In this paper, we address the lack of mechanisms for detecting whether an SDFG can be executed in a data-parallel fashion. We develop necessary and sufficient conditions that an SDFG must satisfy for its data-parallel execution. In addition, we develop methods that detect and transform SDFGs that cannot be determined to be data-parallel through visual graph inspection alone. We report on a prototype implementation of the developed conditions as a compiler pass in the PREESM framework and test them against some useful applications expressed as SDFGs.
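As background for the SDF formalism mentioned here, the sketch below solves the standard SDF balance equations to obtain a repetition vector. It does not implement the paper's data-parallel conditions, which the abstract does not spell out. The graph encoding (edges as `(src, dst, produced, consumed)` tuples) and the function name are assumptions made for illustration.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors, edges):
    """Solve q[src] * produced == q[dst] * consumed for every edge.

    Returns the smallest positive integer solution as a dict, or None
    if the rates are inconsistent. Assumes a connected graph.
    """
    rate = {actors[0]: Fraction(1)}
    pending = [actors[0]]
    while pending:
        actor = pending.pop()
        for src, dst, produced, consumed in edges:
            if src == actor:
                other, value = dst, rate[actor] * produced / consumed
            elif dst == actor:
                other, value = src, rate[actor] * consumed / produced
            else:
                continue
            if other not in rate:
                rate[other] = value
                pending.append(other)
            elif rate[other] != value:
                return None  # balance equations have no solution
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    # Scale the rational solution to the smallest integer vector.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rate.values()), 1)
    ints = {a: int(r * scale) for a, r in rate.items()}
    common = reduce(gcd, ints.values())
    return {a: v // common for a, v in ints.items()}


# A produces 2 tokens, B consumes 3; B produces 1, C consumes 2.
chain = [("A", "B", 2, 3), ("B", "C", 1, 2)]
print(repetition_vector(["A", "B", "C"], chain))
```

An actor whose repetition count is greater than one fires several times per graph iteration, which is the usual starting point when asking whether those firings could run in parallel.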

Citations: 0

Hardware-software abandoned object detection vision system in heterogeneous Zynq device

2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)

Pub Date : 2017-09-01 DOI: 10.1109/DASIP.2017.8122122

T. Kryjak, Artur Skirzynski, M. Gorgon

In this paper, a hardware-software abandoned object detection vision system implemented in the Zynq SoC (System on Chip) device is presented. First, the solution was implemented in C++ and run as a bare-metal application on the ARM processor core of the Zynq (using floating- and fixed-point computations). For the target video stream of 1280 × 720 @ 50 fps (74.25 MHz pixel clock), it reached only 2 fps. Therefore, to speed up the application, some of the image processing and analysis operations were moved to the programmable logic. This made it possible to achieve real-time image processing, i.e. 50 fps, with a power consumption of less than 4 W.
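The figures quoted in this abstract can be related with a short calculation. The active resolution, frame rates and pixel clock come from the abstract; the total raster of 1980 × 750 (active area plus blanking) is an assumption based on the common 720p50 video timing and is not stated in the source.

```python
# Values taken from the abstract.
width, height, target_fps = 1280, 720, 50
software_fps = 2

# Active pixels that must be processed each second at the target rate.
active_pixel_rate = width * height * target_fps

# Speed-up the programmable logic had to provide over the software version.
required_speedup = target_fps / software_fps

# Assumption: standard 720p50 raster including blanking intervals.
total_width, total_height = 1980, 750
pixel_clock_hz = total_width * total_height * target_fps

print(active_pixel_rate, required_speedup, pixel_clock_hz)
```

Under that assumption the 74.25 MHz pixel clock is higher than the active pixel rate because the clock also runs during blanking.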

Citations: 0

The best of both: High-performance and deterministic real-time executive by application-specific multi-core SoCs

2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)

Pub Date : 2017-09-01 DOI: 10.1109/DASIP.2017.8122107

Steffen Vaas, Peter Ulbrich, M. Reichenbach, D. Fey

Embedded multi-core processors improve performance significantly and are desirable in many application-fields. This in particular includes safety-critical real-time systems, which typically require a deterministic temporal behavior. However, even tasks without dependencies running on different cores can interfere due to, sometimes hidden, shared hardware resources, such as common memories or buses. Consequently, only a pessimistic assumption of the worst-case execution time (WCET) that incorporates interference can be given. Hence, the aspired performance gain fizzles out in the poor temporal analyzability. Based on the fact that in safety-critical systems all tasks and their dependencies are known at compile-time, this paper presents an approach to generate application-specific, deterministic multi-core processor architectures for these systems. Thereby safety-critical tasks are executed on dedicated Deterministic Execution Units (DEUs) including lightweight, deterministic processor cores, bus systems, memories and peripherals. The remaining soft real-time tasks are executed on a general purpose multi-core processor that offers performance over determinism. Consequently, timing analysis for hard real-time tasks is significantly simplified, since interferences caused by shared resources and scheduling are effectively eliminated. To show the benefits of our approach, an application-specific architecture for a flight controller was generated and compared to an ARM Cortex-A9 dual-core as reference. Overall, we were able to significantly improve temporal properties of safety-critical tasks while preserving the overall performance for soft real-time tasks.

Citations: 3

Single-FPGA complete 3D and 2D medical ultrasound imager

2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)

Pub Date : 2017-09-01 DOI: 10.1109/DASIP.2017.8122113

A. Ibrahim, W. Simon, Damien Doy, E. Pignat, F. Angiolini, M. Arditi, J. Thiran, G. Micheli

3D ultrasound (US) acquisition acquires volumetric images, thus alleviating a classical US imaging bottleneck that requires a highly-trained sonographer to operate the US probe. However, this opportunity has not been explored in practice, since 3D US machines are only suitable for hospital usage in terms of cost, size and power requirements. In this work we propose the first fully-digital, single-chip 3D US imager on FPGA. The proposed design is a complete processing pipeline that includes pre-processing, image reconstruction, and post-processing. It supports up to 1024 input channels, which matches or exceeds the state of the art, within an unprecedented estimated power budget of 6.1 W. The imager exploits a highly scalable architecture which can be either downscaled for 2D imaging, or further upscaled on a larger FPGA. Our platform supports either real-time inputs over an optical cable or test data feeds sent over an Ethernet connection by a laptop running Matlab and custom tools. Additionally, the design allows HDMI video output on a screen.

Citations: 4

Embedded fluorescence lifetime determination for high throughput real-time droplet sorting with microfluidics

2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)

Pub Date : 2017-09-01 DOI: 10.1109/DASIP.2017.8122129

T. Lieske, W. Uhring, N. Dumas, J. Léonard, D. Fey

Time-resolved fluorescence (TRF) analysis is considered to be among the primary research tools in biochemistry and biophysics. One application of this method is the investigation of biomolecular interactions with promising applications for biosensing. For the latter context, time-correlated single photon counting (TCSPC) is the most sensitive, hence preferred implementation of TRF. However, high throughput applications are presently limited by the maximum achievable photon acquisition rate, and even more by the data processing rate. The latter rate is actually limited by the computational complexity to estimate accurately the fluorescence lifetime from TCSPC data. Here we propose a solution that would enable the implementation of TRF detection for fluorescence-activated droplet sorting (FADS), a particularly high throughput, microfluidic-based technology. Most fluorescence lifetime algorithms require a large number of detected photons for an accurate lifetime computation. This paper presents an implementation based on a maximum likelihood estimator (MLE), enabling high precision estimation with a limited number of detected photons, significantly reducing the total measurement time. This speedup rapidly increases the input data rate. As a result, off-the-shelf embedded products cannot handle the data rates produced by current TCSPC units that are used to measure the fluorescence. Therefore, a configurable real-time capable hardware architecture is implemented on a field-programmable gate array (FPGA) that can handle the data rates of future TCSPC units, rendering high throughput droplet sorting with microfluidics possible.
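To illustrate why a maximum likelihood estimator can work with few photons, the sketch below covers the simplest textbook case only. It assumes an ideal mono-exponential decay, no background, no instrument response and an unbounded measurement window; under those assumptions the MLE of the lifetime is the mean photon arrival time. The estimator in the paper may differ, and the function name and simulated values are hypothetical.

```python
import random


def mle_lifetime(arrival_times):
    """MLE of the lifetime for an ideal mono-exponential decay.

    For arrival times drawn from p(t) = exp(-t / tau) / tau, setting
    the derivative of the log-likelihood to zero gives
    tau_hat = mean(arrival_times).
    """
    return sum(arrival_times) / len(arrival_times)


# Simulated photon arrival times (ns) for an assumed lifetime of 2.5 ns.
random.seed(0)
true_tau = 2.5
photons = [random.expovariate(1 / true_tau) for _ in range(20000)]
estimate = mle_lifetime(photons)
print(round(estimate, 2))
```

In this idealised model the standard error of the estimate is tau / sqrt(N) for N detected photons, so the photon budget per droplet directly sets the achievable precision.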

Citations: 3

Power efficient dataflow design for a heterogeneous smart camera architecture

2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)

Pub Date : 2017-09-01 DOI: 10.1109/DASIP.2017.8122128

Deepayan Bhowmik, Paulo Garcia, A. Wallace, Robert J. Stewart, G. Michaelson

Visual attention modelling characterises the scene to segment regions of visual interest and is increasingly being used as a pre-processing step in many computer vision applications including surveillance and security. Smart camera architectures are an emerging technology and a foundation of security and safety frameworks in modern vision systems. In this paper, we present a dataflow design of a visual saliency based camera architecture targeting a heterogeneous CPU+FPGA platform to propose a smart camera network infrastructure. The proposed design flow encompasses image processing algorithm implementation, hardware and software integration, and network connectivity through a unified model. By leveraging the properties of the dataflow paradigm, we iteratively refine the algorithm specification into a deployable solution, addressing distinct requirements at each design stage: from algorithm accuracy to hardware-software interactions, real-time execution and power consumption. Our design achieved real-time performance, and the power consumption of the optimised asynchronous design is reported at only 0.25 W. Resource usage on a Xilinx Zynq platform remains low.

Citations: 5

Hardware-based architecture for asymmetric numeral systems entropy decoder

2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)

Pub Date : 2017-09-01 DOI: 10.1109/DASIP.2017.8122109

Seyyed Mahdi Najmabadi, Harsimran Singh Tungal, Trung-Hieu Tran, S. Simon

In this paper, two novel hardware architectures based on the tabled asymmetric numeral systems decoding algorithm are proposed. In the proposed architectures, the decoding throughput depends strongly on how much the data was compressed at encoding time. The synthesis results presented here show that the throughput of the parallel architecture can reach up to 200 MB/s. The benchmarks show that the parallel architecture running on a Xilinx Kintex FPGA provides higher throughput than the same algorithm running on a Core i3 CPU.
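The table-driven decoding step of tabled asymmetric numeral systems (tANS) can be sketched in software. This is a toy model, not the proposed hardware: it assumes symbol frequencies that sum to a power of two, uses a simple contiguous symbol spread (valid but not tuned for compression), and stores the emitted bits as (value, width) chunks on a stack instead of a packed bitstream. All function names are hypothetical.

```python
def build_tables(freqs):
    """Build tANS decoding and encoding tables for the given frequencies."""
    size = sum(freqs.values())            # table size L, must be 2**R
    assert size & (size - 1) == 0, "frequencies must sum to a power of two"
    log_size = size.bit_length() - 1      # R
    spread = [s for s, f in freqs.items() for _ in range(f)]
    next_x = dict(freqs)                  # per-symbol counter, f .. 2f-1
    dec, enc = [], {}
    for index, sym in enumerate(spread):
        x = next_x[sym]
        next_x[sym] += 1
        nbits = log_size - (x.bit_length() - 1)
        dec.append((sym, nbits, (x << nbits) - size))
        enc[(sym, x)] = index
    return size, dec, enc


def encode(symbols, freqs, size, enc):
    """Encode in reverse order so the decoder emits symbols forwards."""
    state, chunks = 0, []
    for sym in reversed(symbols):
        full = size + state               # state in [L, 2L)
        nbits = 0
        while (full >> nbits) >= 2 * freqs[sym]:
            nbits += 1
        chunks.append((full & ((1 << nbits) - 1), nbits))
        state = enc[(sym, full >> nbits)]
    return state, chunks


def decode(state, chunks, count, dec):
    """One table lookup and one variable-width bit read per symbol."""
    chunks = list(chunks)
    out = []
    for _ in range(count):
        sym, nbits, base = dec[state]
        out.append(sym)
        value, width = chunks.pop()
        assert width == nbits
        state = base + value
    return out


freqs = {"a": 5, "b": 2, "c": 1}
size, dec, enc = build_tables(freqs)
message = list("abacabaa")
final_state, chunks = encode(message, freqs, size, enc)
print("".join(decode(final_state, chunks, len(message), dec)))
```

Frequent symbols read few or no bits per step while rare symbols read many, so the number of input bits consumed per decoded symbol varies with how well the data compressed, which is consistent with the throughput dependence the abstract describes.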

Citations: 8

Model-driven reliability evaluation for MPSoC design

2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)

Pub Date : 2017-09-01 DOI: 10.1109/DASIP.2017.8122115

T. Nguyen, A. Mouraud, M. Thévenin, G. Corre, O. Pasquier, S. Pillement

When designing a Multi-Processor System-on-Chip (MPSoC), a very large range of design alternatives arises from a huge space of possible design options and component choices. The literature proposes numerous Design-Space-Exploration (DSE) approaches that mainly focus on cost optimization. In this paper, we present a DSE approach which focuses on the reliability of the whole design. This approach is based on a meta-model of MPSoCs that integrates reliability evaluation. We develop a tool that allows designers to describe and optimize their platform based on the proposed meta-model. Results obtained for an MPSoC are presented, including the improved overall reliability of the system thanks to the automatic selection of fault tolerance strategies for each component.

Citations: 0

Enabling GPU software developers to optimize their applications - The LPGPU2 approach

2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)

Pub Date : 2017-09-01 DOI: 10.1109/DASIP.2017.8122116

B. Juurlink, J. Lucas, Nadjib Mammeri, G. Keramidas, Katerina Pontzolkova, I. Aransay, Chrysa Kokkala, Martyn Bliss, A. Richards

Low-power GPUs have become ubiquitous; they can be found in domains ranging from wearable and mobile computing to automotive systems. With this ubiquity has come a wider range of applications exploiting low-power GPUs, placing ever-increasing demands on the expected performance and power efficiency of the devices. The LPGPU2 project is an EU-funded, 30-month Innovation Action project that aims to develop an analysis and visualization framework that enables GPU application developers to improve the performance and power consumption of their applications. To this end, the project follows a holistic approach. First, several applications (use cases) are being developed for or ported to low-power GPUs. These applications will be optimized using the tooling framework in the last phase of the project. In addition, power measurement devices and power models are devised that are 10× more accurate than the state of the art. The ultimate goal of the project is to promote open vendor-neutral standards via the Khronos group. This paper briefly reports on the achievements made in the first phase of the project (until month 18) and focuses on the progress made in applications; in power measurement, estimation, and modelling; and in the analysis and visualization tool suite.

Citations: 2
