{"title":"通过热启动算法实现更快的高精度对数凹采样","authors":"Jason M. Altschuler, Sinho Chewi","doi":"10.1145/3653446","DOIUrl":null,"url":null,"abstract":"<p>It is a fundamental problem to understand the complexity of high-accuracy sampling from a strongly log-concave density <i>π</i> on \\(\\mathbb {R}^d \\). Indeed, in practice, high-accuracy samplers such as the Metropolis-adjusted Langevin algorithm (MALA) remain the de facto gold standard; and in theory, via the proximal sampler reduction, it is understood that such samplers are key for sampling even beyond log-concavity (in particular, for sampling under isoperimetric assumptions). This paper improves the dimension dependence of this sampling problem to \\(\\widetilde{O}(d^{1/2}) \\). The previous best result for MALA was \\(\\widetilde{O}(d) \\). This closes the long line of work on the complexity of MALA, and moreover leads to state-of-the-art guarantees for high-accuracy sampling under strong log-concavity and beyond (thanks to the aforementioned reduction). Our starting point is that the complexity of MALA improves to \\(\\widetilde{O}(d^{1/2}) \\), but only under a <i>warm start</i> (an initialization with constant Rényi divergence w.r.t. <i>π</i>). Previous algorithms for finding a warm start took <i>O</i>(<i>d</i>) time and thus dominated the computational effort of sampling. Our main technical contribution resolves this gap by establishing the first \\(\\widetilde{O}(d^{1/2}) \\) Rényi mixing rates for the discretized underdamped Langevin diffusion. For this, we develop new differential-privacy-inspired techniques based on Rényi divergences with Orlicz–Wasserstein shifts, which allow us to sidestep longstanding challenges for proving fast convergence of hypocoercive differential equations.</p>","PeriodicalId":50022,"journal":{"name":"Journal of the ACM","volume":"43 1","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2024-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Faster high-accuracy log-concave sampling via algorithmic warm starts\",\"authors\":\"Jason M. Altschuler, Sinho Chewi\",\"doi\":\"10.1145/3653446\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>It is a fundamental problem to understand the complexity of high-accuracy sampling from a strongly log-concave density <i>π</i> on \\\\(\\\\mathbb {R}^d \\\\). Indeed, in practice, high-accuracy samplers such as the Metropolis-adjusted Langevin algorithm (MALA) remain the de facto gold standard; and in theory, via the proximal sampler reduction, it is understood that such samplers are key for sampling even beyond log-concavity (in particular, for sampling under isoperimetric assumptions). This paper improves the dimension dependence of this sampling problem to \\\\(\\\\widetilde{O}(d^{1/2}) \\\\). The previous best result for MALA was \\\\(\\\\widetilde{O}(d) \\\\). This closes the long line of work on the complexity of MALA, and moreover leads to state-of-the-art guarantees for high-accuracy sampling under strong log-concavity and beyond (thanks to the aforementioned reduction). Our starting point is that the complexity of MALA improves to \\\\(\\\\widetilde{O}(d^{1/2}) \\\\), but only under a <i>warm start</i> (an initialization with constant Rényi divergence w.r.t. <i>π</i>). Previous algorithms for finding a warm start took <i>O</i>(<i>d</i>) time and thus dominated the computational effort of sampling. 
Our main technical contribution resolves this gap by establishing the first \\\\(\\\\widetilde{O}(d^{1/2}) \\\\) Rényi mixing rates for the discretized underdamped Langevin diffusion. For this, we develop new differential-privacy-inspired techniques based on Rényi divergences with Orlicz–Wasserstein shifts, which allow us to sidestep longstanding challenges for proving fast convergence of hypocoercive differential equations.</p>\",\"PeriodicalId\":50022,\"journal\":{\"name\":\"Journal of the ACM\",\"volume\":\"43 1\",\"pages\":\"\"},\"PeriodicalIF\":2.3000,\"publicationDate\":\"2024-03-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of the ACM\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3653446\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the ACM","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3653446","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 0
Abstract
It is a fundamental problem to understand the complexity of high-accuracy sampling from a strongly log-concave density π on \(\mathbb {R}^d \). Indeed, in practice, high-accuracy samplers such as the Metropolis-adjusted Langevin algorithm (MALA) remain the de facto gold standard; and in theory, via the proximal sampler reduction, it is understood that such samplers are key for sampling even beyond log-concavity (in particular, for sampling under isoperimetric assumptions). This paper improves the dimension dependence of this sampling problem to \(\widetilde{O}(d^{1/2}) \). The previous best result for MALA was \(\widetilde{O}(d) \). This closes the long line of work on the complexity of MALA, and moreover leads to state-of-the-art guarantees for high-accuracy sampling under strong log-concavity and beyond (thanks to the aforementioned reduction). Our starting point is that the complexity of MALA improves to \(\widetilde{O}(d^{1/2}) \), but only under a warm start (an initialization with constant Rényi divergence w.r.t. π). Previous algorithms for finding a warm start took O(d) time and thus dominated the computational effort of sampling. Our main technical contribution resolves this gap by establishing the first \(\widetilde{O}(d^{1/2}) \) Rényi mixing rates for the discretized underdamped Langevin diffusion. For this, we develop new differential-privacy-inspired techniques based on Rényi divergences with Orlicz–Wasserstein shifts, which allow us to sidestep longstanding challenges for proving fast convergence of hypocoercive differential equations.
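For readers unfamiliar with the algorithms named in the abstract, the following is a minimal, self-contained Python sketch of the two ingredients: the Metropolis-adjusted Langevin algorithm (MALA), the high-accuracy sampler whose mixing time is analyzed, and a simple Euler–Maruyama discretization of the underdamped Langevin diffusion, used here only to illustrate the idea of an algorithmic warm start. This is not the paper's algorithm or analysis: the discretization scheme, the friction parameter `gamma`, the step sizes, and the iteration counts are illustrative placeholder choices, and the paper's \(\widetilde{O}(d^{1/2}) \) Rényi-divergence guarantees are proved for a carefully chosen discretization and parameter schedule that is not reproduced here.

```python
import numpy as np


def mala(grad_log_pi, log_pi, x0, step_size, n_iters, rng=None):
    """Metropolis-adjusted Langevin algorithm (MALA).

    Proposes a Langevin step y = x + h * grad_log_pi(x) + sqrt(2h) * xi and
    corrects it with a Metropolis-Hastings accept/reject step, so the chain
    targets pi exactly (this is what makes it a high-accuracy sampler).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    h = step_size

    def log_q(y, x):
        # Log density (up to a constant) of the proposal N(x + h*grad_log_pi(x), 2h*I),
        # evaluated at y; the constant cancels in the acceptance ratio.
        mean = x + h * grad_log_pi(x)
        return -np.sum((y - mean) ** 2) / (4.0 * h)

    samples = []
    for _ in range(n_iters):
        noise = rng.standard_normal(x.shape)
        y = x + h * grad_log_pi(x) + np.sqrt(2.0 * h) * noise
        # Metropolis-Hastings log acceptance ratio.
        log_alpha = log_pi(y) - log_pi(x) + log_q(x, y) - log_q(y, x)
        if np.log(rng.uniform()) < log_alpha:
            x = y
        samples.append(x.copy())
    return np.array(samples)


def uld_warm_start(grad_log_pi, x0, step_size, n_iters, gamma=2.0, rng=None):
    """Euler-Maruyama discretization of the underdamped Langevin diffusion
    dx = v dt, dv = (-gamma*v + grad_log_pi(x)) dt + sqrt(2*gamma) dB_t,
    used here only as an illustrative warm-start routine."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    h = step_size
    for _ in range(n_iters):
        xi = rng.standard_normal(x.shape)
        v = v + h * (-gamma * v + grad_log_pi(x)) + np.sqrt(2.0 * gamma * h) * xi
        x = x + h * v
    return x


if __name__ == "__main__":
    # Toy target: a standard Gaussian in d = 10 dimensions.
    d = 10
    log_pi = lambda x: -0.5 * np.sum(x ** 2)
    grad_log_pi = lambda x: -x

    x_warm = uld_warm_start(grad_log_pi, np.ones(d), step_size=0.05, n_iters=500)
    samples = mala(grad_log_pi, log_pi, x_warm, step_size=0.1, n_iters=2000)
    print(samples.mean(axis=0))  # should be close to the zero vector
```

In this sketch the warm-start chain is run first and its final iterate initializes MALA. The paper's contribution is showing that such an initialization can be computed in \(\widetilde{O}(d^{1/2}) \) gradient evaluations with a provable Rényi-divergence bound, so that the warm start no longer dominates the overall cost of sampling.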
Journal introduction:
The best indicator of the scope of the journal is provided by the areas covered by its Editorial Board. These areas change from time to time, as the field evolves. The following areas are currently covered by a member of the Editorial Board: Algorithms and Combinatorial Optimization; Algorithms and Data Structures; Algorithms, Combinatorial Optimization, and Games; Artificial Intelligence; Complexity Theory; Computational Biology; Computational Geometry; Computer Graphics and Computer Vision; Computer-Aided Verification; Cryptography and Security; Cyber-Physical, Embedded, and Real-Time Systems; Database Systems and Theory; Distributed Computing; Economics and Computation; Information Theory; Logic and Computation; Logic, Algorithms, and Complexity; Machine Learning and Computational Learning Theory; Networking; Parallel Computing and Architecture; Programming Languages; Quantum Computing; Randomized Algorithms and Probabilistic Analysis of Algorithms; Scientific Computing and High Performance Computing; Software Engineering; Web Algorithms and Data Mining