{"title":"Smoothing Methods for Automatic Differentiation Across Conditional Branches","authors":"Justin N. Kreikemeyer, Philipp Andelfinger","doi":"arxiv-2310.03585","DOIUrl":null,"url":null,"abstract":"Programs involving discontinuities introduced by control flow constructs such\nas conditional branches pose challenges to mathematical optimization methods\nthat assume a degree of smoothness in the objective function's response\nsurface. Smooth interpretation (SI) is a form of abstract interpretation that\napproximates the convolution of a program's output with a Gaussian kernel, thus\nsmoothing its output in a principled manner. Here, we combine SI with automatic\ndifferentiation (AD) to efficiently compute gradients of smoothed programs. In\ncontrast to AD across a regular program execution, these gradients also capture\nthe effects of alternative control flow paths. The combination of SI with AD\nenables the direct gradient-based parameter synthesis for branching programs,\nallowing for instance the calibration of simulation models or their combination\nwith neural network models in machine learning pipelines. We detail the effects\nof the approximations made for tractability in SI and propose a novel Monte\nCarlo estimator that avoids the underlying assumptions by estimating the\nsmoothed programs' gradients through a combination of AD and sampling. Using\nDiscoGrad, our tool for automatically translating simple C++ programs to a\nsmooth differentiable form, we perform an extensive evaluation. We compare the\ncombination of SI with AD and our Monte Carlo estimator to existing\ngradient-free and stochastic methods on four non-trivial and originally\ndiscontinuous problems ranging from classical simulation-based optimization to\nneural network-driven control. While the optimization progress with the\nSI-based estimator depends on the complexity of the programs' control flow, our\nMonte Carlo estimator is competitive in all problems, exhibiting the fastest\nconvergence by a substantial margin in our highest-dimensional problem.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"10 5","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Mathematical Software","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2310.03585","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Programs involving discontinuities introduced by control-flow constructs such as conditional branches pose challenges to mathematical optimization methods that assume a degree of smoothness in the objective function's response surface. Smooth interpretation (SI) is a form of abstract interpretation that approximates the convolution of a program's output with a Gaussian kernel, thus smoothing the output in a principled manner. Here, we combine SI with automatic differentiation (AD) to efficiently compute gradients of smoothed programs. In contrast to AD across a regular program execution, these gradients also capture the effects of alternative control-flow paths. The combination of SI with AD enables direct gradient-based parameter synthesis for branching programs, allowing, for instance, the calibration of simulation models or their combination with neural network models in machine learning pipelines. We detail the effects of the approximations made for tractability in SI and propose a novel Monte Carlo estimator that avoids the underlying assumptions by estimating the smoothed programs' gradients through a combination of AD and sampling. Using DiscoGrad, our tool for automatically translating simple C++ programs to a smooth differentiable form, we perform an extensive evaluation. We compare the combination of SI with AD and our Monte Carlo estimator to existing gradient-free and stochastic methods on four non-trivial and originally discontinuous problems, ranging from classical simulation-based optimization to neural-network-driven control. While the optimization progress with the SI-based estimator depends on the complexity of the programs' control flow, our Monte Carlo estimator is competitive on all problems, exhibiting the fastest convergence by a substantial margin in our highest-dimensional problem.
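To make the smoothing idea concrete, the following minimal C++ sketch illustrates Gaussian smoothing of a branching program by sampling. This is not DiscoGrad's estimator; it uses the generic score-function identity d/dx E[f(x+e)] = E[f(x+e) * e / sigma^2] for e ~ N(0, sigma^2), and all names (f, sigma, n) are illustrative. The point it demonstrates is the one the abstract makes: the plain derivative of a program with a hard branch is zero almost everywhere, while the smoothed gradient is informative.

```cpp
// Minimal Gaussian-smoothing sketch (illustrative, not the paper's method).
#include <cstdio>
#include <random>

// A "program" with a hard conditional branch: discontinuous at x == 0.
double f(double x) { return x < 0.0 ? 0.0 : 1.0; }

int main() {
    const double x = 0.25;     // parameter at which to evaluate
    const double sigma = 0.5;  // Gaussian kernel width
    const int n = 100000;      // Monte Carlo samples

    std::mt19937 rng(42);
    std::normal_distribution<double> noise(0.0, sigma);

    double mean = 0.0, grad = 0.0;
    for (int i = 0; i < n; ++i) {
        const double e = noise(rng);
        const double y = f(x + e);
        mean += y;
        grad += y * e / (sigma * sigma);  // score-function gradient weight
    }
    mean /= n;
    grad /= n;

    // The smoothed output approximates Phi(x/sigma) and the smoothed
    // gradient approximates phi(x/sigma)/sigma: nonzero despite the branch.
    std::printf("smoothed f: %.4f  smoothed df/dx: %.4f\n", mean, grad);
    return 0;
}
```

With these values, the estimates should approach Phi(0.5) ~ 0.69 and phi(0.5)/0.5 ~ 0.70. The paper's contribution goes beyond this naive scheme: SI with AD propagates Gaussian distributions through the control flow itself, and the proposed Monte Carlo estimator combines sampling with AD gradients rather than the high-variance score-function weighting shown here.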