OpenCL pipes offer a powerful construct for synthesizing multi-kernel FPGA applications with inter-kernel communication dependencies. The communication discipline between the FPGA kernels is restricted to producer-consumer style patterns supported with on-chip FPGA FIFOs. While this imposes few restrictions on usage, the OpenCL compiler is unable to provide guarantees on buffering capacity or schedulability of the connected kernels. Without these guarantees, an OpenCL developer may over-provision hardware resources or assume pessimistic timing during scheduling. We propose imposing a communication discipline inspired by models of computation (e.g., Ptolemy) such as synchronous dataflow (SDF) and bulk synchronous parallel (BSP). These models offer a restricted subset of communication patterns that enable implementation tradeoffs and deliver performance and resource guarantees. This is useful for OpenCL developers operating within the constraints of the FPGA device. We provide a preliminary analysis of our proposal and sketch the programmer and compiler responsibilities that would be needed to integrate these features into the FPGA OpenCL environment.
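To illustrate the kind of guarantee an SDF discipline enables (not code from the paper — a generic sketch of the standard SDF balance-equation analysis): solving the balance equations yields integer firing counts for each kernel, from which a compiler could bound FIFO capacities statically instead of over-provisioning.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def repetition_vector(edges, n_actors):
    """Solve the SDF balance equations q[src] * prod == q[dst] * cons
    for the smallest positive integer firing counts q.

    edges: list of (src, dst, prod, cons) -- actor `src` produces `prod`
    tokens per firing on a FIFO from which actor `dst` consumes `cons`
    per firing. The graph is assumed connected and rate-consistent.
    """
    q = [None] * n_actors
    q[0] = Fraction(1)
    changed = True
    while changed:                      # propagate relative rates along edges
        changed = False
        for src, dst, prod, cons in edges:
            if q[src] is not None and q[dst] is None:
                q[dst] = q[src] * prod / cons
                changed = True
            elif q[dst] is not None and q[src] is None:
                q[src] = q[dst] * cons / prod
                changed = True
    # scale the fractional rates to the smallest integer vector
    lcm = reduce(lambda a, b: a * b // gcd(a, b),
                 (f.denominator for f in q), 1)
    return [int(f * lcm) for f in q]

# Producer emits 2 tokens per firing; consumer takes 3 per firing:
# the producer must fire 3 times for every 2 consumer firings.
q = repetition_vector([(0, 1, 2, 3)], 2)
```

Given such a vector, each FIFO moves exactly `prod * q[src]` tokens per schedule iteration, which bounds the buffering the hardware must provide.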
{"title":"Applying Models of Computation to OpenCL Pipes for FPGA Computing","authors":"Nachiket Kapre, Hiren D. Patel","doi":"10.1145/3078155.3078163","DOIUrl":"https://doi.org/10.1145/3078155.3078163","url":null,"abstract":"OpenCL pipes offer a powerful construct for synthesizing multi-kernel FPGA applications with inter-kernel communication dependencies. The communication discipline between the FPGA kernels is restricted to producer-consumer style patterns supported with on-chip FPGA FIFOs. While this imposes few restrictions on usage, the OpenCL compiler is unable to provide guarantees on buffering capacity or schedulability of the connected kernels. Without these guarantees, an OpenCL developer may over-provision hardware resources or assume pessimistic timing during scheduling. We propose imposing a communication discipline inspired by models of computation (e.g., Ptolemy) such as synchronous dataflow (SDF) and bulk synchronous parallel (BSP). These models offer a restricted subset of communication patterns that enable implementation tradeoffs and deliver performance and resource guarantees. This is useful for OpenCL developers operating within the constraints of the FPGA device. We provide a preliminary analysis of our proposal and sketch the programmer and compiler responsibilities that would be needed to integrate these features into the FPGA OpenCL environment.","PeriodicalId":267581,"journal":{"name":"Proceedings of the 5th International Workshop on OpenCL","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126824407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Task scheduling and memory management are challenges that make Heterogeneous Computing difficult for the masses. Several programming models and tools exist that target workload partitioning and data accessibility between the CPU and GPU. We have developed and deployed the Symphony SDK - a framework that makes workload partitioning, scheduling and memory management 'simple' for developers. In this talk, we will introduce the Symphony architecture and elaborate on how existing OpenCL kernels can be reused with the heterogeneous task synchronization, task scheduling, and memory management capabilities of Symphony. We will also share real-world cases where Symphony has provided 2x-6x performance speed-ups.
{"title":"Symphony: Task Scheduling and Memory Management in Heterogeneous Computing","authors":"Amit Jindal, Wenjia Ruan","doi":"10.1145/3078155.3078171","DOIUrl":"https://doi.org/10.1145/3078155.3078171","url":null,"abstract":"Task scheduling and memory management are challenges that make Heterogeneous Computing difficult for the masses. There are several programming models and tools that exist targeting partitioning of workload and accessibility of data between CPU and GPU. We have developed and deployed Symphony SDK - a framework that makes workload partitioning, scheduling and memory management 'simple' for developers. In this talk, we will introduce Symphony architecture, elaborate how existing OpenCL kernels can be reused with heterogeneous task synchronization, task scheduling, and memory management capabilities of Symphony. We will also share real-world cases where Symphony has provided 2x-6x performance speed-ups.","PeriodicalId":267581,"journal":{"name":"Proceedings of the 5th International Workshop on OpenCL","volume":"461 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125810096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The increasing uptake of portable, parallel programming models such as OpenCL has fueled extensive research into performance portability. Automatic performance tuning techniques have shown promise for generating kernels which are highly optimized for specific architectures, but they do not address the issue of performance portability directly. With the range of architectures and possible optimizations continuously growing, the concept of achieving performance portability from a single code base becomes ever more attractive. In this talk, we present an approach for analyzing performance portability that exploits the black-box nature of automatic performance tuning techniques. We demonstrate this approach across a diverse range of GPU and CPU architectures for two simple OpenCL applications. We then discuss the potential for auto-tuning to aid the generation of performance-portable OpenCL kernels by incorporating multi-objective optimization techniques into the tuning process.
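The black-box view the talk relies on can be sketched in a few lines (an illustrative sketch, not the authors' tool): the tuner never inspects the kernel, it only observes a runtime for each candidate configuration. The cost model and parameter names below are invented for the example.

```python
from itertools import product

def autotune(benchmark, param_space):
    """Treat the kernel as a black box: time every configuration in the
    search space and keep the fastest. Real auto-tuners sample or prune
    this space; exhaustive search keeps the sketch deterministic."""
    names = list(param_space)
    best_cfg, best_time = None, float("inf")
    for values in product(*(param_space[n] for n in names)):
        cfg = dict(zip(names, values))
        t = benchmark(cfg)              # e.g. compile and time an OpenCL kernel
        if t < best_time:
            best_cfg, best_time = cfg, t
    return best_cfg, best_time

# Stand-in cost model: pretend runtime is minimized at work-group size 64
# with no vectorization (purely illustrative numbers, not measurements).
mock_runtime = lambda cfg: abs(cfg["wg_size"] - 64) + 10 * (cfg["vec_width"] - 1)
space = {"wg_size": [16, 32, 64, 128], "vec_width": [1, 2, 4]}
best, t = autotune(mock_runtime, space)
```

Extending this toward the multi-objective setting the talk mentions would mean returning a Pareto set over, say, runtime per architecture, rather than a single fastest configuration.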
{"title":"Analyzing and improving performance portability of OpenCL applications via auto-tuning","authors":"J. Price, Simon McIntosh-Smith","doi":"10.1145/3078155.3078173","DOIUrl":"https://doi.org/10.1145/3078155.3078173","url":null,"abstract":"The increasing uptake of portable, parallel programming models such as OpenCL has fueled extensive research into performance portability. Automatic performance tuning techniques have shown promise for generating kernels which are highly optimized for specific architectures, but do not address the issue of performance portability directly. With the range of architectures and possible optimizations continuously growing, the concept of achieving performance portability from a single code base becomes ever more attractive. In this talk, we present an approach for analyzing performance portability that exploits the black-box nature of automatic performance tuning techniques. We demonstrate this approach across a diverse range of GPU and CPU architectures for two simple OpenCL applications. We then discuss the potential for auto-tuning to aid the generation of performance portable OpenCL kernels by incorporating multi-objective optimization techniques into the tuning process.","PeriodicalId":267581,"journal":{"name":"Proceedings of the 5th International Workshop on OpenCL","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125914429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we present our experiences in designing, implementing and evaluating efficient applications of the wavefront pattern for block-level motion estimation in video encoding algorithms using OpenCL™ kernels on Intel® Processor Graphics™. We implement multiple solutions exploring different performance considerations, evaluate their pros and cons, present performance data, and provide our recommendations.
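The wavefront pattern the paper applies can be made concrete with a small scheduling sketch (illustrative, not the authors' implementation): in block-level motion estimation each block depends on its left and top neighbours, so blocks are processed anti-diagonal by anti-diagonal, and all blocks on one anti-diagonal can run in parallel.

```python
def wavefront_order(rows, cols):
    """Yield block coordinates anti-diagonal by anti-diagonal, so each
    block (r, c) is emitted only after its left (r, c-1) and top (r-1, c)
    neighbours -- the dependency pattern of block-level motion estimation.
    Blocks on the same anti-diagonal are mutually independent, so a GPU
    could launch each diagonal as one batch of parallel work-items."""
    for d in range(rows + cols - 1):
        for r in range(max(0, d - cols + 1), min(rows, d + 1)):
            yield r, d - r

# 2x3 grid of blocks: diagonals are
# (0,0) | (0,1),(1,0) | (0,2),(1,1) | (1,2)
order = list(wavefront_order(2, 3))
```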
{"title":"Wavefront Parallel Processing on GPUs with an Application to Video Encoding Algorithms","authors":"Biju George, Ben Ashbaugh","doi":"10.1145/3078155.3078177","DOIUrl":"https://doi.org/10.1145/3078155.3078177","url":null,"abstract":"In this paper, we present our experiences in designing, implementing and evaluating efficient applications of the wavefront pattern for block-level motion estimation in video encoding algorithms using OpenCL™ kernels on Intel® Processor Graphics™. We implement multiple solutions exploring different performance considerations, evaluate their pros and cons, present performance data, and provide our recommendations.","PeriodicalId":267581,"journal":{"name":"Proceedings of the 5th International Workshop on OpenCL","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115064181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The acceptance and success of cloud computing has given application developers access to computing and new customers at a scale never seen before. The inherent ability of an FPGA to reconfigure and be workload optimized is a great advantage given the fast-moving needs of cloud computing applications. In this talk we will discuss how users can develop, accelerate and deploy accelerated applications in the cloud at scale. You will learn how to get started on a turn-key OpenCL development environment in the cloud using Xilinx FPGAs.
{"title":"Accelerating Applications at Cloud Scale using FPGAs","authors":"Spenser Gilliland","doi":"10.1145/3078155.3078179","DOIUrl":"https://doi.org/10.1145/3078155.3078179","url":null,"abstract":"The acceptance and success of cloud computing has given application developers access to computing and new customers at a scale never seen before. The inherent ability of an FPGA to reconfigure and be workload optimized is a great advantage given the fast-moving needs of cloud computing applications. In this talk we will discuss how users can develop, accelerate and deploy accelerated applications in the cloud at scale. You will learn how to get started on a turn-key OpenCL development environment in the cloud using Xilinx FPGAs.","PeriodicalId":267581,"journal":{"name":"Proceedings of the 5th International Workshop on OpenCL","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127341368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SYCL™ is a royalty-free, cross-platform C++ abstraction layer that builds on the underlying concepts, portability and efficiency of OpenCL™, while adding the ease-of-use and flexibility of modern C++11/14. For example, SYCL enables single-source development where C++ template functions are compiled for both host and device to construct complex algorithms that use OpenCL acceleration, and then re-use them throughout their source code on different types of data. Using SYCL can simplify development and reduce the amount of code required for applications using OpenCL devices by over 50% compared to standard OpenCL code. This is because of the use of template functions and a simplified, streamlined host API. This hands-on session will provide an opportunity to get experience with SYCL using ComputeCpp™ Community Edition, a free-to-use implementation of the SYCL 1.2 standard. Attendees will be shown how to set up ComputeCpp and use it to write their own SYCL code to run on supported GPUs and CPUs.
{"title":"Heterogeneous Computing Using Modern C++ with OpenCL Devices: Tutorial at IWOCL 2017","authors":"Rod Burns, Ruymán Reyes","doi":"10.1145/3078155.3078159","DOIUrl":"https://doi.org/10.1145/3078155.3078159","url":null,"abstract":"SYCL™ is a royalty-free, cross-platform C++ abstraction layer that builds on the underlying concepts, portability and efficiency of OpenCL™, while adding the ease-of-use and flexibility of modern C++11/14. For example, SYCL enables single source development where C++ template functions are compiled for both host and device to construct complex algorithms that use OpenCL acceleration, and then re-use them throughout their source code on different types of data. Using SYCL can simplify development and reduce the amount of code required for applications using OpenCL devices by over 50% compared to standard OpenCL code. This is because of the use of template functions and a simplified, streamlined host API. This hands-on session will provide an opportunity to get experience with SYCL using ComputeCpp™ Community Edition, a free to use implementation of the SYCL 1.2 standard. Attendees will be shown how to set up ComputeCpp and use it to write their own SYCL code to run on supported GPUs and CPUs.","PeriodicalId":267581,"journal":{"name":"Proceedings of the 5th International Workshop on OpenCL","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127296047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this tutorial, we will introduce you to the reconfigurable hardware architecture and programming of Field Programmable Gate Arrays (FPGAs). You will learn why FPGAs have become so popular in recent years, and understand the many advantages of using FPGAs in your HPC application. In particular, we will cover architectural features of FPGAs that make them well suited to many complex operations, including matrix multiplications and convolutions. In addition, we will introduce you to programming FPGAs using the Intel FPGA SDK for OpenCL™, and how specific OpenCL coding techniques can lead to efficient circuits implemented on the FPGA. Finally, we will go over several case studies where FPGAs have shown very competitive performance when programmed using OpenCL, including convolutional neural nets, FFTs, and astronomy de-dispersion algorithms.
{"title":"Harnessing the Power of FPGAs with the Intel FPGA SDK for OpenCL™","authors":"Byron Sinclair, A. Ling, Genady Paikin","doi":"10.1145/3078155.3078168","DOIUrl":"https://doi.org/10.1145/3078155.3078168","url":null,"abstract":"In this tutorial, we will introduce you to the reconfigurable hardware architecture and programming of Field Programmable Gate Arrays (FPGAs). You will learn why FPGAs have become so popular in recent years, and understand the many advantages of using FPGAs in your HPC application. In particular, we will cover architectural features of FPGAs that make them well suited to many complex operations, including matrix multiplications and convolutions. In addition, we will introduce you to programming FPGAs using the Intel FPGA SDK for OpenCL™, and how specific OpenCL coding techniques can lead to efficient circuits implemented on the FPGA. Finally, we will go over several case studies where FPGAs have shown very competitive performance when programmed using OpenCL, including convolutional neural nets, FFTs, and astronomy de-dispersion algorithms.","PeriodicalId":267581,"journal":{"name":"Proceedings of the 5th International Workshop on OpenCL","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130725895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The recent advancements in High Performance Computing and ongoing research to reach Exascale have been heavily supported by the introduction of dedicated massively parallel accelerators. Programmers wishing to maximize utilization of current supercomputers must develop software that not only scales across multiple nodes but is also capable of offloading data-parallel computation to dedicated hardware such as graphics processors. The introduction of new types of hardware has been followed by the development of new languages, extensions, compilers and libraries. Unfortunately, none of those solutions seems to be fully portable and independent of the specific vendor and type of hardware. HPX.Compute, a programming model developed on top of HPX, a C++ standard library for concurrency and parallelism, uses existing and proposed C++ language and library capabilities to support various types of parallelism. It aims to provide a generic interface that allows writing code which is portable between hardware architectures. We have implemented a new backend for HPX.Compute based on SYCL, a Khronos standard for single-source programming of OpenCL devices in C++. We present how this runtime may be used to target OpenCL devices through our C++ API. We have evaluated the performance of the new implementation on graphics processors with the STREAM benchmark and compared the results with an existing CUDA-based implementation.
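For readers unfamiliar with the benchmark used in the evaluation: STREAM measures sustained memory bandwidth with simple vector kernels, reporting the best of several timed repetitions. A sketch of its "triad" kernel (pure Python for illustration only; CPython is nowhere near hardware bandwidth, and the real benchmark is C/Fortran):

```python
import array
import time

def stream_triad(n, scalar=3.0, reps=5):
    """The STREAM 'triad' kernel, a[i] = b[i] + scalar * c[i], timed over
    several repetitions with the best run kept, as the STREAM rules do.
    Returns the result array and an effective bandwidth in GB/s."""
    b = array.array("d", [1.0] * n)
    c = array.array("d", [2.0] * n)
    a = array.array("d", [0.0] * n)
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        for i in range(n):
            a[i] = b[i] + scalar * c[i]
        best = min(best, time.perf_counter() - t0)
    gbytes = 3 * 8 * n / 1e9        # bytes moved per rep: read b, read c, write a
    return a, gbytes / best
```

Ports of this kernel to SYCL (or CUDA) make bandwidth comparisons between runtimes straightforward, because the bytes-moved accounting is fixed by the kernel itself.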
{"title":"Using SYCL as an Implementation Framework for HPX.Compute","authors":"Marcin Copik, Hartmut Kaiser","doi":"10.1145/3078155.3078187","DOIUrl":"https://doi.org/10.1145/3078155.3078187","url":null,"abstract":"The recent advancements in High Performance Computing and ongoing research to reach Exascale have been heavily supported by the introduction of dedicated massively parallel accelerators. Programmers wishing to maximize utilization of current supercomputers must develop software that not only scales across multiple nodes but is also capable of offloading data-parallel computation to dedicated hardware such as graphics processors. The introduction of new types of hardware has been followed by the development of new languages, extensions, compilers and libraries. Unfortunately, none of those solutions seems to be fully portable and independent of the specific vendor and type of hardware. HPX.Compute, a programming model developed on top of HPX, a C++ standard library for concurrency and parallelism, uses existing and proposed C++ language and library capabilities to support various types of parallelism. It aims to provide a generic interface that allows writing code which is portable between hardware architectures. We have implemented a new backend for HPX.Compute based on SYCL, a Khronos standard for single-source programming of OpenCL devices in C++. We present how this runtime may be used to target OpenCL devices through our C++ API. We have evaluated the performance of the new implementation on graphics processors with the STREAM benchmark and compared the results with an existing CUDA-based implementation.","PeriodicalId":267581,"journal":{"name":"Proceedings of the 5th International Workshop on OpenCL","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126786219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In our work developing an OpenCL platform for FPGAs, we observed that the way that OpenCL is currently used on FPGAs does not expose the full capability of FPGAs to the programmer. In particular, FPGAs are spatial devices that can be partitioned by area with each partition programmed with a different function. The latest FPGAs can even be reconfigured dynamically such that one partition of the FPGA can be configured while the rest of the FPGA is still in use. The analogy with GPUs is that an OpenCL programmer can partition a GPU into multiple device objects, execute different kernels on each device object, and reprogram the device objects. An OpenCL programmer cannot do this with an FPGA even though the capability exists. As FPGA capacities continue to increase, the ability to partition and partially reconfigure the FPGA will become even more desirable. The fundamental issue is how FPGAs are currently viewed as devices in the OpenCL model. In this paper, we propose a small change to the OpenCL definition of a device that unlocks the full potential of FPGAs to the programmer.
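The proposed device model can be pictured with a toy sketch (all names are illustrative; this is not the OpenCL API and not the authors' implementation): each spatial partition behaves like an independent device object that can be reprogrammed while the other partitions keep running.

```python
class Partition:
    """One spatial region of an FPGA, modeled as an independently
    programmable device object (a toy model of the paper's proposal)."""
    def __init__(self, name):
        self.name, self.bitstream = name, None

    def program(self, bitstream):
        # Stands in for partial reconfiguration of this region only.
        self.bitstream = bitstream

    def run(self, *args):
        if self.bitstream is None:
            raise RuntimeError(f"partition {self.name} is not configured")
        return self.bitstream(*args)

class FPGA:
    """A device exposing its area as several independent partitions."""
    def __init__(self, n):
        self.partitions = [Partition(f"p{i}") for i in range(n)]

fpga = FPGA(2)
fpga.partitions[0].program(lambda x: x * 2)   # region 0: one kernel
fpga.partitions[1].program(lambda x: x + 1)   # region 1: a different kernel
result = fpga.partitions[0].run(21)           # region 0 keeps executing...
fpga.partitions[1].program(lambda x: x - 1)   # ...while region 1 is reprogrammed
```

The GPU analogy in the abstract maps onto this directly: each `Partition` plays the role of one of the multiple device objects a programmer can already create on a GPU.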
{"title":"Enabling FPGAs as a True Device in the OpenCL Standard: Bridging the Gap for FPGAs","authors":"Vincent Mirian, P. Chow","doi":"10.1145/3078155.3078176","DOIUrl":"https://doi.org/10.1145/3078155.3078176","url":null,"abstract":"In our work with developing an OpenCL platform for FPGAs, we observed that the way that OpenCL is currently used on FPGAs does not expose the full capability of FPGAs to the programmer. In particular, FPGAs are spatial devices that can be partitioned by area with each partition programmed with a different function. The latest FPGAs can even be reconfigured dynamically such that one partition of the FPGA can be configured while the rest of the FPGA is still in use. The analogy with GPUs is that an OpenCL programmer can partition a GPU into multiple device objects, execute different kernels on each device object, and reprogram the device objects. An OpenCL programmer cannot do this with an FPGA even though the capability exists. As FPGA capacities continue to increase, the ability to partition and partially reconfigure the FPGA will become even more desirable. The fundamental issue is how FPGAs are currently viewed as devices in the OpenCL model. In this paper, we propose a small change to the OpenCL definition of a device that unlocks the full potential of FPGAs to the programmer.","PeriodicalId":267581,"journal":{"name":"Proceedings of the 5th International Workshop on OpenCL","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123306841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine learning is being used in more and more artificial intelligence applications. While existing machine learning frameworks mostly support NVIDIA CUDA GPUs, there has been little research dedicated to targeting other devices through open standards such as OpenCL. In this paper, we explain how machine learning applications can harness the power of OpenCL using open standards and how, by using SYCL, TensorFlow can be extended to include customized operations running on OpenCL devices.
{"title":"Accelerated Machine Learning Using TensorFlow and SYCL on OpenCL Devices","authors":"M. Goli, L. Iwanski, A. Richards","doi":"10.1145/3078155.3078160","DOIUrl":"https://doi.org/10.1145/3078155.3078160","url":null,"abstract":"Machine learning is being used in more and more artificial intelligence applications. While existing machine learning frameworks mostly support NVIDIA CUDA GPUs, there has been little research dedicated to targeting other devices through open standards such as OpenCL. In this paper, we explain how machine learning applications can harness the power of OpenCL using open standards and how, by using SYCL, TensorFlow can be extended to include customized operations running on OpenCL devices.","PeriodicalId":267581,"journal":{"name":"Proceedings of the 5th International Workshop on OpenCL","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117173660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}