The recently introduced Genetic Column Generation (GenCol) algorithm has been numerically observed to efficiently and accurately compute high-dimensional optimal transport (OT) plans for general multi-marginal problems, but theoretical results on the algorithm have hitherto been lacking. The algorithm solves the OT linear program on a dynamically updated low-dimensional submanifold consisting of sparse plans. The submanifold dimension exceeds the sparse support of optimal plans only by a fixed factor $\beta$. Here we prove that for $\beta \geq 2$ and in the two-marginal case, GenCol always converges to an exact solution, for arbitrary costs and marginals. The proof relies on the concept of $c$-cyclical monotonicity. As an offshoot, GenCol rigorously reduces the data complexity of numerically solving two-marginal OT problems from $O(\ell^2)$ to $O(\ell)$ without any loss in accuracy, where $\ell$ is the number of discretization points for a single marginal. At the end of the paper we also present some insights into the convergence behavior in the multi-marginal case.
"Convergence proof for the GenCol algorithm in the case of two-marginal optimal transport" by Gero Friesecke and Maximilian Penka. Mathematics of Computation, doi:10.1090/mcom/3968, published 2024-03-26.
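To make the column-generation mechanism concrete, here is a minimal two-marginal sketch in Python: the restricted LP is solved on a sparse working set, one "child" column is proposed per iteration, accepted only if its reduced cost (computed from the LP duals) is negative, and the working set is trimmed to $\beta$ times the active support. The $\pm 1$ mutation rule, the tolerances, and the diagonal initialization (which requires equal marginals) are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np
from scipy.optimize import linprog

def restricted_ot(cost, mu, nu, omega):
    """Solve the two-marginal OT linear program restricted to the sparse
    set omega of admissible pairs (i, j); returns the weights on omega and
    the dual variables (Kantorovich potentials) of the marginal constraints."""
    m, n = len(mu), len(nu)
    c = np.array([cost[i, j] for i, j in omega])
    A = np.zeros((m + n, len(omega)))   # row sums = mu, column sums = nu
    for k, (i, j) in enumerate(omega):
        A[i, k] = 1.0
        A[m + j, k] = 1.0
    res = linprog(c, A_eq=A, b_eq=np.concatenate([mu, nu]), bounds=(0, None))
    y = res.eqlin.marginals             # dual vector (u, v) from the HiGHS solver
    return res.x, y[:m], y[m:]

def gencol(cost, mu, nu, beta=3, iters=40, seed=0):
    """A minimal GenCol-style loop, heavily simplified from the paper:
    starting from a feasible sparse plan, repeatedly solve the restricted LP,
    propose a 'child' of a randomly chosen active column by mutating its
    second index, accept it if its reduced cost is negative, and trim the
    working set to at most beta times the active support."""
    rng = np.random.default_rng(seed)
    m, n = len(mu), len(nu)
    omega = [(i, i) for i in range(m)]   # diagonal plan; feasible when mu == nu
    for _ in range(iters):
        x, u, v = restricted_ot(cost, mu, nu, omega)
        active = [omega[k] for k in range(len(omega)) if x[k] > 1e-12]
        i, j = active[rng.integers(len(active))]
        j_new = int(np.clip(j + rng.choice([-1, 1]), 0, n - 1))
        child = (i, j_new)
        if child not in omega and cost[i, j_new] - u[i] - v[j_new] < -1e-12:
            omega.append(child)          # improving column: add it to the set
        if len(omega) > beta * len(active):
            inactive = [p for p in omega if p not in active]
            omega = active + inactive[len(omega) - beta * len(active):]
    x, _, _ = restricted_ot(cost, mu, nu, omega)
    return {p: w for p, w in zip(omega, x) if w > 1e-12}
```

Since the restricted LP is always solved over a superset of a feasible plan, the objective value never increases, and improving columns are only admitted when the duals certify a negative reduced cost; this is the monotonicity that the paper's convergence proof builds on.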
Matthew de Courcy-Ireland, Maria Dostert, Maryna Viazovska
We prove that the Cohn–Elkies linear programming bound for sphere packing is not sharp in dimension 6. The proof uses duality and optimization over a space of modular forms, generalizing a construction of Cohn–Triantafillou [Math. Comp. 91 (2021), pp. 491–508] to the case of odd weight and non-trivial character.
"Six-dimensional sphere packing and linear programming". Mathematics of Computation, doi:10.1090/mcom/3959, published 2024-03-20.
We study the convergence of Langevin-simulated annealing type algorithms with multiplicative noise, i.e. for $V : \mathbb{R}^d \to \mathbb{R}$ a potential function to minimize, we consider the stochastic differential equation $dY_t = -\sigma\sigma^\top \nabla V(Y_t)\,dt + a(t)\sigma(Y_t)\,dW_t + a(t)^2\Upsilon(Y_t)\,dt$, where $(W_t)$ is a Brownian motion, $\sigma : \mathbb{R}^d \to \mathcal{M}_d(\mathbb{R})$ is an adaptive (multiplicative) noise, $a : \mathbb{R}^+ \to \mathbb{R}^+$ is a function decreasing to $0$, and $\Upsilon$ is a correction term. This setting can be applied to optimization problems arising in machine learning; allowing $\sigma$ to depend on the position brings faster convergence in comparison with the classical Langevin equation $dY_t = -\nabla V(Y_t)\,dt + \sigma\,dW_t$. The case where $\sigma$ is a constant matrix has been extensively studied, but the general case has received little attention. We prove the convergence in $L^1$-Wasserstein distance of $Y_t$ and of its associated Euler scheme $\bar{Y}_t$ to some measure $\nu^\star$ supported on $\operatorname{argmin}(V)$, and give rates of convergence to the instantaneous Gibbs measure $\nu_{a(t)}$ of density $\propto \exp(-2V(x)/a(t)^2)$. To do so, we first consider the case where $a$ is a piecewise constant function, recovering the classical schedule $a(t) = A\log^{-1/2}(t)$. We then prove convergence in the general case by bounding the Wasserstein distance to the piecewise constant case using ergodic properties.
"Convergence of Langevin-simulated annealing algorithms with multiplicative noise" by Pierre Bras and Gilles Pagès. Mathematics of Computation, doi:10.1090/mcom/3899, published 2024-03-15.
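As a hedged illustration of the annealed dynamics, the following Euler-Maruyama sketch simulates a one-dimensional instance of the SDE with a *constant* scalar $\sigma$, for which the correction term $\Upsilon$ vanishes, together with the classical schedule $a(t) \propto \log^{-1/2}(e + t)$ recalled in the abstract. The double-well potential, step size, and constants are all illustrative choices, not taken from the paper.

```python
import numpy as np

def langevin_annealing(grad_V, y0, sigma=1.0, A=0.5, h=1e-2, n_steps=20_000, seed=0):
    """Euler-Maruyama scheme for the annealed Langevin SDE
        dY_t = -sigma^2 grad V(Y_t) dt + a(t) sigma dW_t,
    a 1-d toy instance of the abstract's SDE with constant sigma
    (so the correction term Upsilon is zero), using the schedule
    a(t) = A / sqrt(log(e + t))."""
    rng = np.random.default_rng(seed)
    y, t = float(y0), 0.0
    for _ in range(n_steps):
        a = A / np.sqrt(np.log(np.e + t))          # decreasing noise schedule
        y += -sigma**2 * grad_V(y) * h + a * sigma * np.sqrt(h) * rng.normal()
        t += h
    return y

# Double-well potential V(x) = (x^2 - 1)^2, with argmin(V) = {-1, +1}.
grad_V = lambda x: 4.0 * x * (x * x - 1.0)
```

With the decreasing schedule, the iterate settles into one of the two wells, mimicking the convergence toward a measure supported on $\operatorname{argmin}(V)$ described in the abstract.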
In this paper we study nonconvex constrained composition optimization, in which the objective contains a composition of two expected-value functions whose accurate information is normally expensive to calculate. We propose a STochastic nEsted Primal-dual (STEP) method for such problems. In each iteration, with an auxiliary variable introduced to track the inner-layer function values, we compute stochastic gradients of the nested function using a subsampling strategy. To alleviate difficulties caused by possibly nonconvex constraints, we construct a stochastic approximation to the linearized augmented Lagrangian function to update the primal variable, which further motivates updating the dual variable in a weighted-average way. Moreover, to better understand the asymptotic dynamics of the update schemes, we consider a deterministic continuous-time system from the perspective of ordinary differential equations (ODEs). We analyze the Karush-Kuhn-Tucker measure at the output of the STEP method with constant parameters and establish its iteration and sample complexities for finding an $\epsilon$-stationary point, ensuring that expected stationarity, feasibility, and complementary slackness are all below accuracy $\epsilon$. To leverage the benefit of (near) initial feasibility in the STEP method, we propose a two-stage framework incorporating a feasibility-seeking phase, aiming to locate a nearly feasible initial point. Moreover, to enhance the adaptivity of the STEP algorithm, we propose an adaptive variant that adjusts its parameters on the fly, along with a complexity analysis. Numerical results on a risk-averse portfolio optimization problem and an orthogonal nonnegative matrix decomposition demonstrate the effectiveness of the proposed algorithms.
"Stochastic nested primal-dual method for nonconvex constrained composition optimization" by Lingzi Jin and Xiao Wang. Mathematics of Computation, doi:10.1090/mcom/3965, published 2024-03-13.
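The inner-layer tracking idea can be illustrated on an unconstrained toy composition. The sketch below implements only the auxiliary-variable gradient estimator with subsampling, not the full primal-dual scheme with constraints; the potential, step sizes, and tracking rate are illustrative assumptions.

```python
import numpy as np

def nested_sgd(x0, steps=5000, lr=0.01, tau=0.1, batch=8, seed=0):
    """Stochastic gradient descent on the toy composition
        F(x) = f(E[g(x, xi)])  with  g(x, xi) = x + xi,  f(u) = u**2,
    so that F(x) = x**2 with minimizer x = 0.  An auxiliary variable u
    tracks the inner expectation E[g(x, xi)] via a running average, the
    tracking idea from the abstract; this is only the nested-gradient
    building block of STEP, and all constants here are illustrative."""
    rng = np.random.default_rng(seed)
    x, u = float(x0), 0.0
    for _ in range(steps):
        xi = rng.normal(size=batch)
        g = np.mean(x + xi)            # subsampled inner-layer value
        u = (1 - tau) * u + tau * g    # auxiliary variable tracking E[g(x, xi)]
        grad = 1.0 * (2.0 * u)         # chain rule: dg/dx = 1, f'(u) = 2u
        x -= lr * grad
    return x
```

Because the tracking variable averages out the sampling noise of the inner function, the composite gradient estimate stays usable even though each individual evaluation of $g$ is noisy.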
Implicit-explicit Runge-Kutta (IMEX-RK) schemes are popular methods to treat multiscale equations that contain a stiff part and a non-stiff part, where the stiff part is characterized by a small parameter $\varepsilon$. In this work, we rigorously prove the uniform stability and uniform accuracy of a class of IMEX-RK schemes for a linear hyperbolic system with stiff relaxation. The result we obtain is optimal in the sense that it holds regardless of the value of $\varepsilon$, and the order of accuracy is the same as the design order of the original scheme, i.e., there is no order reduction.
"Uniform accuracy of implicit-explicit Runge-Kutta (IMEX-RK) schemes for hyperbolic systems with relaxation" by Jingwei Hu and Ruiwen Shu. Mathematics of Computation, doi:10.1090/mcom/3967, published 2024-03-13.
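The flavor of the result can be illustrated on the simplest member of the IMEX family applied to a toy relaxation ODE system; the model, step size, and first-order scheme below are illustrative and much simpler than the hyperbolic systems and higher-order IMEX-RK schemes analyzed in the paper.

```python
def imex_euler_relaxation(u0, v0, eps, h, n_steps):
    """First-order IMEX (implicit-explicit) Euler scheme for the toy stiff
    relaxation system
        u' = -v,    v' = -(v - u)/eps,
    treating the non-stiff term explicitly and the stiff relaxation term
    implicitly; since the stiff term is linear, the implicit step is solved
    in closed form.  As eps -> 0, v relaxes onto u and the limit dynamics
    is u' = -u, so the scheme can be run with h independent of eps."""
    u, v = float(u0), float(v0)
    for _ in range(n_steps):
        u_new = u - h * v                              # explicit, non-stiff part
        v = (v + (h / eps) * u_new) / (1.0 + h / eps)  # implicit, stiff part
        u = u_new
    return u, v
```

The point of the uniform-accuracy statement is that the step size $h$ here is not constrained by $\varepsilon$: even with $\varepsilon = 10^{-8}$ and $h = 10^{-2}$, the scheme stably tracks the relaxed limit $u' = -u$.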