Proceedings of the 5th International Workshop on OpenCL: Latest Papers

OpenCL Interoperability with OpenVX Graphs
Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078183
Ben Ashbaugh, A. Bernal
OpenVX is a computer vision framework that enables embedded and real-time applications to optimize computer vision processing for performance and power. OpenVX addresses system-level optimizations through a graph-based computational API. Although this gives it a clear advantage over traditional computer vision libraries such as OpenCV, which mainly addresses kernel-level optimizations, OpenVX still relies on vendor implementations to optimize individual built-in kernels. OpenVX implements several computer vision kernels, but to increase adoption and user flexibility it added support for C-based user kernels. By default these are single-threaded, and there is no particular way to accelerate them or to offload the computation to an accelerator such as a GPU; the user has to do the heavy lifting of supporting a multi-threaded implementation. We propose two different OpenVX API extensions that allow developers to deploy accelerated user kernels using OpenCL.
Citations: 1
CUDA-on-CL: a compiler and runtime for running NVIDIA® CUDA™ C++11 applications on OpenCL™ 1.2 Devices
Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078156
Hugh Perkins
In the machine learning domain, frameworks are predominantly written and maintained in the NVIDIA® CUDA™ language. There have been attempts to port these frameworks to OpenCL®, notably the ports of the Caffe framework by Gu et al., Tschopp, and Engel, and of the Torch framework by Perkins. The authors of these ports found merging their work into the mainstream framework challenging, and maintain their forks as separate branches or repositories. CUDA-on-CL addresses this problem by leaving the reference implementation entirely in NVIDIA CUDA, both host-side and device-side, and providing a compiler and a runtime component, so that any CUDA C++11 application can in theory be compiled and run on any OpenCL 1.2 device. We use the Tensorflow framework as a case study, and demonstrate the ability to run unary, binary and reduction Tensorflow and Eigen kernels with no modification to the original CUDA source code. Performance studies are undertaken using the Tensorflow kernels. For buffer sizes of 1MB or more, performance is comparable between CUDA and CUDA-on-CL across unary operations, binary operations and single-axis reductions. Full reduction is around 14 times slower on CUDA-on-CL than on CUDA. We think this may be because of the absence of the low-level hardware shfl operation. The asymptotic time for zero buffer sizes is double that of CUDA, possibly because of the overhead of additional kernel boilerplate needed to work around limitations in the OpenCL 1.2 standard.
Citations: 10
The Windsor Build and Testing Framework
Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078184
Shane M. Peelar, P. Preney
Khronos open source components, including the ICD and Clang compiler, require significant time and effort to manually download, build, and install. Source code updates to these components require recompilation, and developers must repeat error-prone steps to build new test environments. Ideally, developers should be able to use a tool that automatically obtains, builds, and installs OpenCL codes, libraries, and tools. The Windsor Build and Testing Framework (WBTF), developed at the University of Windsor, is such a tool. This paper will discuss how the WBTF works, demonstrate how it is used, and show how OpenCL C and C++ programs can be built, run, and used to perform header-only, link, and conformance-style tests against OpenCL reference, host-installed, or device-installed headers and libraries. Those interested in OpenCL C/C++ development, the Khronos OpenCL Clang compiler, and writing conformance tests will be interested in this framework.
Citations: 0
Assessing the feasibility of OpenCL CPU implementations for agent-based simulations
Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078174
Nuno Fachada, A. Rosa
Agent-based modeling (ABM) is a bottom-up modeling approach, where each entity of the system being modeled is uniquely represented as a self-determining agent. Large-scale emergent behavior in ABMs is population sensitive. As such, it is advisable that the number of agents in a simulation reflects the reality of the system being modeled. This means that in domains such as social modeling, ecology, and biology, systems can contain millions or billions of individuals. Such large-scale simulations are only feasible in non-distributed scenarios when the computational power of commodity processors, such as GPUs and multi-core CPUs, is fully exploited. In this paper we evaluate the feasibility of using CPU-oriented OpenCL for high-performance simulations of agent-based models. We compare a CPU-oriented OpenCL implementation of a reference ABM against a parallel Java version of the same model. We show that there are considerable gains in using CPU-based OpenCL for developing and implementing ABMs, with speedups of up to 10x over the parallel Java version on a 10-core hyper-threaded CPU.
Citations: 3
Effective simulation of kinetic equations for bosonic system with two-particle interaction using OpenCL
Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078185
P. Kartsev
The system of kinetic equations for an interacting quantum system (such as excitons or polaritons in a semiconductor) is difficult to study numerically, because multiple summation makes the amount of calculation as high as $L^9$, where $L$ is the linear size of the system. Here we present an efficient algorithm that scales as $L^5 \log L$, obtained using an analytical transformation. Our OpenCL implementation allows simulation of systems as large as $L = 64$.
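To give a feel for the gap between the two scalings quoted in the abstract, the short sketch below evaluates both operation counts at L = 64. It is only an order-of-magnitude illustration: the constant factors are unknown, the function names are hypothetical, and the base-2 logarithm is an assumption (the abstract does not state the base).

```python
import math


def naive_cost(L):
    # Operation count for the direct multiple summation, taken as L^9.
    return L ** 9


def transformed_cost(L):
    # Operation count after the analytical transformation, taken as L^5 * log L.
    # Assumption: base-2 logarithm; the abstract does not specify the base.
    return L ** 5 * math.log2(L)


L = 64
ratio = naive_cost(L) / transformed_cost(L)
print(f"naive:       {naive_cost(L):.3e}")        # about 1.8e16
print(f"transformed: {transformed_cost(L):.3e}")  # about 6.4e9
print(f"ratio:       {ratio:.3e}")                # about 2.8e6
```

Under these assumptions the transformed algorithm needs roughly six orders of magnitude fewer operations at L = 64, which is consistent with that system size becoming tractable.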
Citations: 2
OpenCL in Scientific High Performance Computing: The Good, the Bad, and the Ugly
Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078170
M. Noack
For writing a new scientific application, portability across existing and future hardware should be the major design goal, as there is a multitude of different compute devices, and programme codes typically outlive systems by far. Unlike other programming models that address parallelism or heterogeneity, OpenCL does provide practical portability across a wide range of HPC-relevant architectures. Other than that, it has a range of further advantages like being a library-only implementation, and using runtime kernel-compilation. We present experiences with utilising OpenCL alongside C++, MPI, and CMake in two real-world scientific codes. Our targets are a Cray XC40 supercomputer with multi- and many-core (Xeon Phi) CPUs, as well as multiple smaller systems with Nvidia and AMD GPUs. We shed light on practical issues arising in such a scenario, like the interaction between OpenCL and MPI, discuss solutions, and point out current limitations of OpenCL in the domain of scientific HPC from an application developer's and user's point of view.
Citations: 1
Challenges and Opportunities in Native GPU Debugging
Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078158
Jeff McAllister, Uri Levy
In this technical session we present the open architectural design of the debugger and how it fits into the OpenCL JIT compilation flow. We show how to work natively with the debugger to solve functional bugs, and demonstrate low-level debugging techniques at the SIMD thread level that help to solve complex issues such as misaligned or out-of-range accesses to local/global memory, stack overflows, illegal instructions, etc. Finally, we cover the challenges in debugging.
Citations: 0
A Performance and Energy Evaluation of OpenCL-accelerated Molecular Docking
Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078167
Leonardo Solis-Vasquez, A. Koch
Molecular docking is a methodology used extensively in modern drug design. It aims to predict the binding position of two molecules by calculating the energy of their possible binding poses. One of the most cited docking tools is AutoDock. At its core, it solves an optimization problem by generating a large solution space of possible poses, and searches among them for the one having the lowest energy. These complex algorithms thus benefit from parallelization-based run-time acceleration. This work presents an OpenCL implementation of AutoDock, and a corresponding performance evaluation on two different platforms based on multi-core CPU and GPU accelerators. It shows that OpenCL allows highly efficient docking simulations, achieving speedups of ~4x and ~56x over the original serial AutoDock version, as well as energy efficiency gains of ~2x and ~6x, respectively. To the best of our knowledge, this work is the first one to also consider the energy efficiency of molecular docking programs.
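The speedup and energy figures can be related through energy = average power × run time. The sketch below rests on two assumptions that the abstract does not confirm: that "energy efficiency gain" means the ratio of energy consumed per docking run (serial over OpenCL), and that the first figure of each pair refers to the multi-core CPU platform and the second to the GPU platform. The function name is hypothetical.

```python
def implied_power_ratio(speedup, energy_gain):
    # With E = P * T:
    #   energy_gain = E_serial / E_opencl
    #               = (P_serial * T_serial) / (P_opencl * T_opencl)
    #               = speedup * (P_serial / P_opencl)
    # so the OpenCL run draws P_opencl / P_serial = speedup / energy_gain
    # times the average power of the serial run.
    return speedup / energy_gain


# Rounded figures from the abstract; the platform mapping is an assumption.
cpu_power_ratio = implied_power_ratio(speedup=4, energy_gain=2)
gpu_power_ratio = implied_power_ratio(speedup=56, energy_gain=6)
print(f"multi-core CPU: about {cpu_power_ratio:.1f}x the serial power draw")
print(f"GPU:            about {gpu_power_ratio:.1f}x the serial power draw")
```

Read this way, the accelerated runs finish sooner but draw more power while running, which is why the energy gains are smaller than the speedups.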
Citations: 7
Production-CL library for iterative scientific calculations
Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078162
P. Kartsev
The Production-CL (PCL) library for iterative scientific calculations with OpenCL is presented. The main goal is to get rid of long, repetitive lines of standard code, which slow down the development process, and to implement the typical workflow elements for simulating physics problems. The main entities of the PCL library are: (i) the kernel (called with a single line resembling a CUDA kernel invocation) and (ii) the batch of kernels (to help construct the complex step of each iteration). In addition, PCL implements the standard procedures for scientific calculations 'in production': a typical cycle of iterations with a main step, and regular saving/loading of the whole state so that work is not lost. As an example of library application, we show and compare several projects developed with different approaches.
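The "cycle of iterations with regular save/load of the whole state" can be pictured with a generic checkpointed loop. The sketch below is plain Python and is not the PCL API: every name in it (`run`, `save_state`, `load_state`, `relax`) is hypothetical, and the one-line `relax` update merely stands in for a batch of OpenCL kernels.

```python
import json
import os
import tempfile


def save_state(state, path):
    # Write to a temporary file first so an interrupted save never
    # leaves a truncated checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_state(path):
    with open(path) as f:
        return json.load(f)


def run(initial_state, step, n_iters, path, save_every):
    # Resume from the checkpoint if one exists, otherwise start fresh.
    state = load_state(path) if os.path.exists(path) else dict(initial_state)
    while state["iteration"] < n_iters:
        state = step(state)
        state["iteration"] += 1
        if state["iteration"] % save_every == 0:
            save_state(state, path)
    return state


def relax(state):
    # Stand-in for the main step of one iteration.
    return {"iteration": state["iteration"], "x": 0.5 * state["x"] + 1.0}


workdir = tempfile.mkdtemp()
start = {"iteration": 0, "x": 0.0}

# Uninterrupted run of 10 iterations.
full = run(start, relax, 10, os.path.join(workdir, "a.json"), save_every=5)

# Run that stops after 5 iterations, then resumes from its checkpoint.
path_b = os.path.join(workdir, "b.json")
run(start, relax, 5, path_b, save_every=5)
resumed = run(start, relax, 10, path_b, save_every=5)

print(full == resumed)
```

The resumed run picks up at iteration 5 from the saved file and ends in the same state as the uninterrupted one, which is the property a production checkpoint cycle is meant to guarantee.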
Citations: 4
SYCL-BLAS: Leveraging Expression Trees for Linear Algebra
Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078189
J. Aliaga, Ruymán Reyes, M. Goli
In the current landscape of C++ applications, there is an increasing need to include different levels of support for heterogeneous platforms, where multiple specialised devices collaborate to execute an application. In this context, the SYCL standard[8] has been published by Khronos, providing a C++ abstraction layer on top of OpenCL[9] that enables single-source programming for a large number of heterogeneous devices. SYCL's single-source programming and task data-flow approach enable developers to leverage modern programming techniques on heterogeneous platforms. In this paper, we present SYCL-BLAS, a BLAS implementation using SYCL that uses Expression Tree templates to generate BLAS kernels. This technique is then used to demonstrate seamless kernel fusion via composition of tree nodes. We also demonstrate how SYCL can be used to quickly develop libraries for heterogeneous systems by providing sufficient levels of abstraction.
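The idea of fusing kernels by composing tree nodes can be illustrated with a small Python analogue. This is not SYCL-BLAS code and does not use its API: the classes `Vec`, `Scale` and `Add` and the function `execute` are hypothetical, and a Python loop stands in for a generated device kernel. The point is that building `2.0 * x + y` creates a tree rather than computing anything, and that the whole tree is evaluated in one pass without a temporary vector for `2.0 * x`.

```python
class Node:
    # Operators build tree nodes instead of computing results eagerly.
    def __add__(self, other):
        return Add(self, other)

    def __rmul__(self, scalar):
        return Scale(scalar, self)


class Vec(Node):
    # Leaf node: wraps the actual data.
    def __init__(self, data):
        self.data = list(data)

    def eval(self, i):
        return self.data[i]

    def size(self):
        return len(self.data)


class Scale(Node):
    # alpha * child, evaluated element by element on demand.
    def __init__(self, alpha, child):
        self.alpha = alpha
        self.child = child

    def eval(self, i):
        return self.alpha * self.child.eval(i)

    def size(self):
        return self.child.size()


class Add(Node):
    # left + right, evaluated element by element on demand.
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def eval(self, i):
        return self.left.eval(i) + self.right.eval(i)

    def size(self):
        return self.left.size()


def execute(tree):
    # Single pass over the index space: the "fused kernel".
    return [tree.eval(i) for i in range(tree.size())]


x = Vec([1.0, 2.0, 3.0])
y = Vec([10.0, 20.0, 30.0])
axpy = 2.0 * x + y       # builds Add(Scale(2.0, x), y); nothing is computed yet
result = execute(axpy)   # one loop evaluates the whole expression
print(result)            # [12.0, 24.0, 36.0]
```

In a C++ expression-template design the tree would typically be encoded in types and resolved at compile time, whereas here it is resolved at run time; the composition principle is the same.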
Citations: 10