Automatic Differentiation for Adjoint Stencil Loops

Proceedings of the 48th International Conference on Parallel Processing. Pub Date: 2019-07-05. DOI: 10.1145/3337821.3337906

J. Hückelheim, Navjot Kukreja, S. Narayanan, F. Luporini, G. Gorman, P. Hovland


Citations: 14

Abstract

Stencil loops are a common motif in computations including convolutional neural networks, structured-mesh solvers for partial differential equations, and image processing. Stencil loops are easy to parallelise, and their fast execution is aided by compilers, libraries, and domain-specific languages. Reverse-mode automatic differentiation, also known as algorithmic differentiation, autodiff, adjoint differentiation, or back-propagation, is sometimes used to obtain gradients of programs that contain stencil loops. Unfortunately, conventional automatic differentiation results in a memory access pattern that is not stencil-like and not easily parallelisable. In this paper we present a novel combination of automatic differentiation and loop transformations that preserves the structure and memory access pattern of stencil loops, while computing fully consistent derivatives. The generated loops can be parallelised and optimised for performance in the same way and using the same tools as the original computation. We have implemented this new technique in the Python tool PerforAD, which we release with this paper along with test cases derived from seismic imaging and computational fluid dynamics applications.
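The difference between the two access patterns can be illustrated with a minimal sketch. It is not PerforAD's API or its generated code: the function names, the 1D three-point stencil, and the coefficient array `c` are all assumptions chosen for illustration, and the array length is assumed to be at least 4. Conventional reverse mode turns each primal iteration into a scatter that increments three neighbouring adjoint entries, so parallel iterations would conflict. Re-indexing the loop so that each iteration writes exactly one adjoint entry gives a gather that is again stencil-shaped, with a few boundary iterations handled separately.

```python
import numpy as np

def stencil_forward(u, c):
    """Primal stencil: r[i] = c0*u[i-1] + c1*u[i] + c2*u[i+1] on interior points."""
    n = len(u)
    r = np.zeros(n)
    for i in range(1, n - 1):
        r[i] = c[0] * u[i - 1] + c[1] * u[i] + c[2] * u[i + 1]
    return r

def adjoint_scatter(r_b, c):
    """Conventional reverse mode: every iteration increments three entries of u_b."""
    n = len(r_b)
    u_b = np.zeros(n)
    for i in range(1, n - 1):
        u_b[i - 1] += c[0] * r_b[i]
        u_b[i] += c[1] * r_b[i]
        u_b[i + 1] += c[2] * r_b[i]
    return u_b

def adjoint_gather(r_b, c):
    """Re-indexed adjoint: every iteration writes only u_b[j] (stencil-like)."""
    n = len(r_b)
    u_b = np.zeros(n)
    # Core loop: all three contributing primal iterations exist for these j.
    for j in range(2, n - 2):
        u_b[j] = c[2] * r_b[j - 1] + c[1] * r_b[j] + c[0] * r_b[j + 1]
    # Remainder: near the boundary only some primal iterations touched u[j].
    for j in sorted({0, 1, n - 2, n - 1}):
        if j + 1 <= n - 2:       # primal iteration i = j + 1 read u[j] as u[i-1]
            u_b[j] += c[0] * r_b[j + 1]
        if 1 <= j <= n - 2:      # primal iteration i = j read u[j] as u[i]
            u_b[j] += c[1] * r_b[j]
        if j - 1 >= 1:           # primal iteration i = j - 1 read u[j] as u[i+1]
            u_b[j] += c[2] * r_b[j - 1]
    return u_b
```

Because the core loop of `adjoint_gather` writes a distinct `u_b[j]` per iteration, its iterations are independent and could be parallelised like the primal loop. The two adjoints can be compared against each other, and against the primal with a dot-product test, on random data.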


