Authors: Pierre Bras, Gilles Pagès
DOI: 10.1090/mcom/3899 (https://doi.org/10.1090/mcom/3899)
Published: 2024-03-15
Convergence of Langevin-simulated annealing algorithms with multiplicative noise
We study the convergence of Langevin-simulated annealing type algorithms with multiplicative noise: for a potential function $V : \mathbb{R}^d \to \mathbb{R}$ to minimize, we consider the stochastic differential equation $dY_t = -\sigma\sigma^\top \nabla V(Y_t)\,dt + a(t)\sigma(Y_t)\,dW_t + a(t)^2\Upsilon(Y_t)\,dt$, where $(W_t)$ is a Brownian motion, $\sigma : \mathbb{R}^d \to \mathcal{M}_d(\mathbb{R})$ is an adaptive (multiplicative) noise, $a : \mathbb{R}^+ \to \mathbb{R}^+$ is a function decreasing to $0$, and $\Upsilon$ is a correction term. This setting can be applied to optimization problems arising in machine learning; allowing $\sigma$ to depend on the position brings faster convergence in comparison with the classical Langevin equation $dY_t = -\nabla V(Y_t)\,dt + \sigma\,dW_t$. The case where $\sigma$ is a constant matrix has been extensively studied; however, little attention has been paid to the general case. We prove the convergence in $L^1$-Wasserstein distance of $Y_t$ and of the associated Euler scheme $\bar{Y}_t$ to some measure $\nu^\star$ supported by $\operatorname{argmin}(V)$, and give rates of convergence to the instantaneous Gibbs measure $\nu_{a(t)}$ of density $\propto \exp(-2V(x)/a(t)^2)$. To do so, we first consider the case where $a$ is a piecewise constant function. We recover the classical schedule $a(t) = A\log^{-1/2}(t)$. We then prove the convergence in the general case by bounding the Wasserstein distance to the piecewise constant case using ergodicity properties.
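To make the dynamics above concrete, here is a minimal one-dimensional sketch of an Euler scheme for the equation. It is an illustration only and not the authors' implementation. Several things are assumptions that do not come from the abstract: the double-well potential `V`, the noise coefficient `sigma`, the constant step size `gamma`, and the shift by $e$ inside the logarithm (so that the schedule is finite at $t = 0$). The abstract does not define $\Upsilon$; the sketch assumes the one-dimensional choice $\Upsilon = \tfrac{1}{2}(\sigma^2)' = \sigma\sigma'$, under which the density $\propto \exp(-2V(x)/a^2)$ is stationary when $a$ is held fixed.

```python
import math
import random


def V(x):
    # Illustrative double-well potential (assumption, not from the paper).
    return (x * x - 1.0) ** 2 + 0.3 * x


def grad_V(x):
    # Derivative of V above.
    return 4.0 * x * (x * x - 1.0) + 0.3


def sigma(x):
    # Hypothetical position-dependent noise coefficient, valued in (1, 2].
    return 1.0 + 1.0 / (1.0 + x * x)


def sigma_prime(x):
    # Derivative of sigma above.
    return -2.0 * x / (1.0 + x * x) ** 2


def upsilon(x):
    # Assumed correction term in d = 1: (1/2) d/dx sigma(x)^2 = sigma * sigma'.
    return sigma(x) * sigma_prime(x)


def a(t, A=1.0):
    # Schedule a(t) = A * log^{-1/2}(t), shifted by e so that a(0) = A.
    return A / math.sqrt(math.log(t + math.e))


def euler_scheme(y0, n_steps, gamma, A=1.0, seed=0):
    # Constant-step Euler discretization of
    # dY = -sigma^2 V'(Y) dt + a(t) sigma(Y) dW + a(t)^2 Upsilon(Y) dt.
    rng = random.Random(seed)
    y, t = y0, 0.0
    for _ in range(n_steps):
        at = a(t, A)
        s = sigma(y)
        drift = -s * s * grad_V(y) + at * at * upsilon(y)
        noise = at * s * math.sqrt(gamma) * rng.gauss(0.0, 1.0)
        y += gamma * drift + noise
        t += gamma
    return y


if __name__ == "__main__":
    y_final = euler_scheme(y0=1.0, n_steps=20000, gamma=0.005)
    print("final position:", y_final, "V:", V(y_final))
```

The logarithmic schedule decays very slowly, so a single run of this length only shows the noise level shrinking gradually. It should not be read as evidence for the convergence rates stated in the abstract.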