SIAM Journal on Optimization, Volume 34, Issue 3, Page 2917-2942, September 2024. Abstract. This paper considers a multiproduct pricing problem under pure characteristics demand models when the probability distribution of the random parameter in the problem is uncertain. We formulate this problem as a distributionally robust optimization (DRO) problem based on a constructive approach to estimating pure characteristics demand models with pricing by Pang, Su, and Lee. In this model, the consumers’ purchase decision is to maximize their utility. We show that the DRO problem is well-defined, and the objective function is upper semicontinuous by using an equivalent hierarchical form. We also use the data-driven approach to analyze the DRO problem when the ambiguity set, i.e., a set of probability distributions that contains some exact information of the underlying probability distribution, is given by a general moment-based case. We give convergence results as the data size tends to infinity and analyze the quantitative statistical robustness in view of the possible contamination of driven data. Furthermore, we use the Lagrange duality to reformulate the DRO problem as a mathematical program with complementarity constraints, and give a numerical procedure for finding a global solution of the DRO problem under certain specific settings. Finally, we report numerical results that validate the effectiveness and scalability of our approach for the distributionally robust multiproduct pricing problem.
{"title":"Data-Driven Distributionally Robust Multiproduct Pricing Problems under Pure Characteristics Demand Models","authors":"Jie Jiang, Hailin Sun, Xiaojun Chen","doi":"10.1137/23m1585131","DOIUrl":"https://doi.org/10.1137/23m1585131","url":null,"abstract":"SIAM Journal on Optimization, Volume 34, Issue 3, Page 2917-2942, September 2024. <br/> Abstract. This paper considers a multiproduct pricing problem under pure characteristics demand models when the probability distribution of the random parameter in the problem is uncertain. We formulate this problem as a distributionally robust optimization (DRO) problem based on a constructive approach to estimating pure characteristics demand models with pricing by Pang, Su, and Lee. In this model, the consumers’ purchase decision is to maximize their utility. We show that the DRO problem is well-defined, and the objective function is upper semicontinuous by using an equivalent hierarchical form. We also use the data-driven approach to analyze the DRO problem when the ambiguity set, i.e., a set of probability distributions that contains some exact information of the underlying probability distribution, is given by a general moment-based case. We give convergence results as the data size tends to infinity and analyze the quantitative statistical robustness in view of the possible contamination of driven data. Furthermore, we use the Lagrange duality to reformulate the DRO problem as a mathematical program with complementarity constraints, and give a numerical procedure for finding a global solution of the DRO problem under certain specific settings. 
Finally, we report numerical results that validate the effectiveness and scalability of our approach for the distributionally robust multiproduct pricing problem.","PeriodicalId":49529,"journal":{"name":"SIAM Journal on Optimization","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142212459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIAM Journal on Optimization, Volume 34, Issue 3, Page 2856-2882, September 2024. Abstract. In this paper, we develop new discrete relaxations for nonlinear expressions in factorable programming. We utilize specialized convexification results as well as composite relaxations to develop mixed-integer programming relaxations. Our relaxations rely on ideal formulations of convex hulls of outer-functions over a combinatorial structure that captures local inner-function structure. The resulting relaxations often require fewer variables and are tighter than currently prevalent ones. Finally, we provide computational evidence to demonstrate that our relaxations close approximately 60%–70% of the gap relative to McCormick relaxations and significantly improve the relaxations used in a state-of-the-art solver on various instances involving polynomial functions.
{"title":"MIP Relaxations in Factorable Programming","authors":"Taotao He, Mohit Tawarmalani","doi":"10.1137/22m1515537","DOIUrl":"https://doi.org/10.1137/22m1515537","url":null,"abstract":"SIAM Journal on Optimization, Volume 34, Issue 3, Page 2856-2882, September 2024. <br/> Abstract. In this paper, we develop new discrete relaxations for nonlinear expressions in factorable programming. We utilize specialized convexification results as well as composite relaxations to develop mixed-integer programming relaxations. Our relaxations rely on ideal formulations of convex hulls of outer-functions over a combinatorial structure that captures local inner-function structure. The resulting relaxations often require fewer variables and are tighter than currently prevalent ones. Finally, we provide computational evidence to demonstrate that our relaxations close approximately 60%–70% of the gap relative to McCormick relaxations and significantly improve the relaxations used in a state-of-the-art solver on various instances involving polynomial functions.","PeriodicalId":49529,"journal":{"name":"SIAM Journal on Optimization","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142212461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
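For context on the baseline named in the abstract above, the following is a minimal sketch of the classical McCormick envelope for a single bilinear term w = x*y over a box. It illustrates only the reference relaxation, not the discrete relaxations the paper develops; the function name `mccormick_bounds` and the example box are assumptions made for illustration.

```python
def mccormick_bounds(x, y, x_lo, x_hi, y_lo, y_hi):
    """Return (lower, upper) McCormick bounds on w = x*y at the point (x, y).

    Assumes x_lo <= x <= x_hi and y_lo <= y <= y_hi.
    """
    # Two linear underestimators of x*y; the envelope is their maximum.
    lower = max(x_lo * y + x * y_lo - x_lo * y_lo,
                x_hi * y + x * y_hi - x_hi * y_hi)
    # Two linear overestimators of x*y; the envelope is their minimum.
    upper = min(x_hi * y + x * y_lo - x_hi * y_lo,
                x_lo * y + x * y_hi - x_lo * y_hi)
    return lower, upper


# Hypothetical box: x in [-1, 2], y in [0, 3].
box = (-1.0, 2.0, 0.0, 3.0)
lo, up = mccormick_bounds(0.5, 1.5, *box)
gap = up - lo  # width of the relaxation at this interior point
```

The bounds are tight at the corners of the box and loosest in the interior, which is the gap that tighter relaxations aim to close.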
Michael Hintermüller, Thomas M. Surowiec, Mike Theiß
SIAM Journal on Optimization, Volume 34, Issue 3, Page 2821-2855, September 2024. Abstract. We consider a class of [math]-player linear quadratic differential generalized Nash equilibrium problems (GNEPs) with bound constraints on the individual control and state variables. In addition, we assume the individual players’ optimal control problems are coupled through their dynamics and objectives via a time-dependent mean field interaction term. This assumption allows us to model the realistic setting that strategic players in large games cannot observe the individual states of their competitors. We observe that the GNEPs require a constraint qualification, which necessitates sufficient robustness of the individuals, in order to prove the existence of an open-loop pure strategy Nash equilibrium and to derive optimality conditions. In order to gain qualitative insight into the [math]-player game, we assume that players are identical and pass to the limit in [math] to derive a type of first-order constrained mean field game (MFG). We prove that the mean field interaction terms converge to an absolutely continuous curve of probability measures on the set of possible state trajectories. Using variational convergence methods, we show that the optimal control problems converge to a representative agent problem. Under additional regularity assumptions, we provide an explicit form for the mean field term as the solution of a continuity equation and demonstrate the link back to the [math]-player GNEP.
{"title":"On a Differential Generalized Nash Equilibrium Problem with Mean Field Interaction","authors":"Michael Hintermüller, Thomas M. Surowiec, Mike Theiß","doi":"10.1137/22m1489952","DOIUrl":"https://doi.org/10.1137/22m1489952","url":null,"abstract":"SIAM Journal on Optimization, Volume 34, Issue 3, Page 2821-2855, September 2024. <br/> Abstract. We consider a class of [math]-player linear quadratic differential generalized Nash equilibrium problems (GNEPs) with bound constraints on the individual control and state variables. In addition, we assume the individual players’ optimal control problems are coupled through their dynamics and objectives via a time-dependent mean field interaction term. This assumption allows us to model the realistic setting that strategic players in large games cannot observe the individual states of their competitors. We observe that the GNEPs require a constraint qualification, which necessitates sufficient robustness of the individuals, in order to prove the existence of an open-loop pure strategy Nash equilibrium and to derive optimality conditions. In order to gain qualitative insight into the [math]-player game, we assume that players are identical and pass to the limit in [math] to derive a type of first-order constrained mean field game (MFG). We prove that the mean field interaction terms converge to an absolutely continuous curve of probability measures on the set of possible state trajectories. Using variational convergence methods, we show that the optimal control problems converge to a representative agent problem. 
Under additional regularity assumptions, we provide an explicit form for the mean field term as the solution of a continuity equation and demonstrate the link back to the [math]-player GNEP.","PeriodicalId":49529,"journal":{"name":"SIAM Journal on Optimization","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142212462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIAM Journal on Optimization, Volume 34, Issue 3, Page 2788-2820, September 2024. Abstract. We study the combination of proximal gradient descent with multigrid for solving a class of possibly nonsmooth strongly convex optimization problems. We propose a multigrid proximal gradient method called MGProx, which accelerates the proximal gradient method by multigrid, based on using hierarchical information of the optimization problem. MGProx applies a newly introduced adaptive restriction operator to simplify the Minkowski sum of subdifferentials of the nondifferentiable objective function across different levels. We provide a theoretical characterization of MGProx. First we show that the MGProx update operator exhibits a fixed-point property. Next, we show that the coarse correction is a descent direction for the fine variable of the original fine level problem in the general nonsmooth case. Last, under some assumptions we provide the convergence rate for the algorithm. In the numerical tests on the elastic obstacle problem, which is an example of a nonsmooth convex optimization problem where the multigrid method can be applied, we show that MGProx has a faster convergence speed than competing methods.
{"title":"MGProx: A Nonsmooth Multigrid Proximal Gradient Method with Adaptive Restriction for Strongly Convex Optimization","authors":"Andersen Ang, Hans De Sterck, Stephen Vavasis","doi":"10.1137/23m1552140","DOIUrl":"https://doi.org/10.1137/23m1552140","url":null,"abstract":"SIAM Journal on Optimization, Volume 34, Issue 3, Page 2788-2820, September 2024. <br/> Abstract. We study the combination of proximal gradient descent with multigrid for solving a class of possibly nonsmooth strongly convex optimization problems. We propose a multigrid proximal gradient method called MGProx, which accelerates the proximal gradient method by multigrid, based on using hierarchical information of the optimization problem. MGProx applies a newly introduced adaptive restriction operator to simplify the Minkowski sum of subdifferentials of the nondifferentiable objective function across different levels. We provide a theoretical characterization of MGProx. First we show that the MGProx update operator exhibits a fixed-point property. Next, we show that the coarse correction is a descent direction for the fine variable of the original fine level problem in the general nonsmooth case. Last, under some assumptions we provide the convergence rate for the algorithm. 
In the numerical tests on the elastic obstacle problem, which is an example of a nonsmooth convex optimization problem where the multigrid method can be applied, we show that MGProx has a faster convergence speed than competing methods.","PeriodicalId":49529,"journal":{"name":"SIAM Journal on Optimization","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142212474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
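As a point of reference for the abstract above, here is a minimal sketch of the plain (single-level) proximal gradient method that MGProx is said to accelerate. It is not MGProx itself: there is no multigrid hierarchy and no adaptive restriction. The test problem, a separable strongly convex quadratic plus an l1 term, and all names (`soft_threshold`, `proximal_gradient`, `a`, `b`, `lam`) are assumptions chosen so that the minimizer has a closed form.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t*|.| evaluated at v."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def proximal_gradient(a, b, lam, steps=200):
    """Minimize sum_i (0.5*a_i*x_i**2 - b_i*x_i + lam*|x_i|) with all a_i > 0."""
    step = 1.0 / max(a)  # 1/L, where L is the Lipschitz constant of the smooth part
    x = [0.0] * len(a)
    for _ in range(steps):
        # Gradient of the smooth quadratic part.
        grad = [ai * xi - bi for ai, xi, bi in zip(a, x, b)]
        # Forward (gradient) step followed by the backward (prox) step.
        x = [soft_threshold(xi - step * gi, step * lam) for xi, gi in zip(x, grad)]
    return x


# Hypothetical data; the exact minimizer is soft_threshold(b_i, lam) / a_i.
a, b, lam = [1.0, 4.0], [3.0, -0.5], 1.0
x_star = proximal_gradient(a, b, lam)
```

On this toy problem the iteration contracts linearly, which is the behavior a multigrid scheme would aim to speed up on larger, ill-conditioned discretizations.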
P. D. Khanh, V. V. H. Khoa, B. S. Mordukhovich, V. T. Phat
SIAM Journal on Optimization, Volume 34, Issue 3, Page 2756-2787, September 2024. Abstract. This paper is devoted to a systematic study and characterizations of the fundamental notions of variational and strong variational convexity for lower semicontinuous functions. While these notions have been quite recently introduced by Rockafellar, the importance of them has already been recognized and documented in finite-dimensional variational analysis and optimization. Here we address general infinite-dimensional settings and derive comprehensive characterizations of both variational and strong variational convexity notions by developing novel techniques, which are essentially different from finite-dimensional counterparts. As a consequence of the obtained characterizations, we establish new quantitative and qualitative relationships between strong variational convexity and tilt stability of local minimizers in appropriate frameworks of Banach spaces.
{"title":"Variational and Strong Variational Convexity in Infinite-Dimensional Variational Analysis","authors":"P. D. Khanh, V. V. H. Khoa, B. S. Mordukhovich, V. T. Phat","doi":"10.1137/23m1604667","DOIUrl":"https://doi.org/10.1137/23m1604667","url":null,"abstract":"SIAM Journal on Optimization, Volume 34, Issue 3, Page 2756-2787, September 2024. <br/> Abstract. This paper is devoted to a systematic study and characterizations of the fundamental notions of variational and strong variational convexity for lower semicontinuous functions. While these notions have been quite recently introduced by Rockafellar, the importance of them has already been recognized and documented in finite-dimensional variational analysis and optimization. Here we address general infinite-dimensional settings and derive comprehensive characterizations of both variational and strong variational convexity notions by developing novel techniques, which are essentially different from finite-dimensional counterparts. As a consequence of the obtained characterizations, we establish new quantitative and qualitative relationships between strong variational convexity and tilt stability of local minimizers in appropriate frameworks of Banach spaces.","PeriodicalId":49529,"journal":{"name":"SIAM Journal on Optimization","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142212475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIAM Journal on Optimization, Volume 34, Issue 3, Page 2729-2755, September 2024. Abstract. Natural policy gradient (NPG) methods, equipped with function approximation and entropy regularization, achieve impressive empirical success in reinforcement learning problems with large state-action spaces. However, their convergence properties and the impact of entropy regularization remain elusive in the function approximation regime. In this paper, we establish finite-time convergence analyses of entropy-regularized NPG with linear function approximation under softmax parameterization. In particular, we prove that entropy-regularized NPG with averaging satisfies the persistence of excitation condition, and achieves a fast convergence rate of [math] up to a function approximation error in regularized Markov decision processes. This convergence result does not require any a priori assumptions on the policies. Furthermore, under mild regularity conditions on the concentrability coefficient and basis vectors, we prove that entropy-regularized NPG exhibits linear convergence up to the compatible function approximation error. Finally, we provide sample complexity results for sample-based NPG with entropy regularization.
{"title":"Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation","authors":"Semih Cayci, Niao He, R. Srikant","doi":"10.1137/22m1540156","DOIUrl":"https://doi.org/10.1137/22m1540156","url":null,"abstract":"SIAM Journal on Optimization, Volume 34, Issue 3, Page 2729-2755, September 2024. <br/> Abstract. Natural policy gradient (NPG) methods, equipped with function approximation and entropy regularization, achieve impressive empirical success in reinforcement learning problems with large state-action spaces. However, their convergence properties and the impact of entropy regularization remain elusive in the function approximation regime. In this paper, we establish finite-time convergence analyses of entropy-regularized NPG with linear function approximation under softmax parameterization. In particular, we prove that entropy-regularized NPG with averaging satisfies the persistence of excitation condition, and achieves a fast convergence rate of [math] up to a function approximation error in regularized Markov decision processes. This convergence result does not require any a priori assumptions on the policies. Furthermore, under mild regularity conditions on the concentrability coefficient and basis vectors, we prove that entropy-regularized NPG exhibits linear convergence up to the compatible function approximation error. 
Finally, we provide sample complexity results for sample-based NPG with entropy regularization.","PeriodicalId":49529,"journal":{"name":"SIAM Journal on Optimization","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141863905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIAM Journal on Optimization, Volume 34, Issue 3, Page 2671-2699, September 2024. Abstract. This work proposes a framework for large-scale stochastic derivative-free optimization (DFO) by introducing STARS, a trust-region method based on iterative minimization in random subspaces. This framework is both an algorithmic and theoretical extension of a random subspace derivative-free optimization (RSDFO) framework, and an algorithm for stochastic optimization with random models (STORM). Moreover, like RSDFO, STARS achieves scalability by minimizing interpolation models that approximate the objective in low-dimensional affine subspaces, thus significantly reducing per-iteration costs in terms of function evaluations and yielding strong performance on large-scale stochastic DFO problems. The user-determined dimension of these subspaces, when the latter are defined, for example, by the columns of so-called Johnson–Lindenstrauss transforms, turns out to be independent of the dimension of the problem. For convergence purposes, inspired by the analyses of RSDFO and STORM, both a particular quality of the subspace and the accuracies of random function estimates and models are required to hold with sufficiently high, but fixed, probabilities. Using martingale theory under the latter assumptions, an almost sure global convergence of STARS to a first-order stationary point is shown, and the expected number of iterations required to reach a desired first-order accuracy is proved to be similar to that of STORM and other stochastic DFO algorithms, up to constants.
{"title":"Stochastic Trust-Region Algorithm in Random Subspaces with Convergence and Expected Complexity Analyses","authors":"K. J. Dzahini, S. M. Wild","doi":"10.1137/22m1524072","DOIUrl":"https://doi.org/10.1137/22m1524072","url":null,"abstract":"SIAM Journal on Optimization, Volume 34, Issue 3, Page 2671-2699, September 2024. <br/> Abstract. This work proposes a framework for large-scale stochastic derivative-free optimization (DFO) by introducing STARS, a trust-region method based on iterative minimization in random subspaces. This framework is both an algorithmic and theoretical extension of a random subspace derivative-free optimization (RSDFO) framework, and an algorithm for stochastic optimization with random models (STORM). Moreover, like RSDFO, STARS achieves scalability by minimizing interpolation models that approximate the objective in low-dimensional affine subspaces, thus significantly reducing per-iteration costs in terms of function evaluations and yielding strong performance on large-scale stochastic DFO problems. The user-determined dimension of these subspaces, when the latter are defined, for example, by the columns of so-called Johnson–Lindenstrauss transforms, turns out to be independent of the dimension of the problem. For convergence purposes, inspired by the analyses of RSDFO and STORM, both a particular quality of the subspace and the accuracies of random function estimates and models are required to hold with sufficiently high, but fixed, probabilities. 
Using martingale theory under the latter assumptions, an almost sure global convergence of STARS to a first-order stationary point is shown, and the expected number of iterations required to reach a desired first-order accuracy is proved to be similar to that of STORM and other stochastic DFO algorithms, up to constants.","PeriodicalId":49529,"journal":{"name":"SIAM Journal on Optimization","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141775205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIAM Journal on Optimization, Volume 34, Issue 3, Page 2700-2728, September 2024. Abstract. In this work, we propose an outer approximation algorithm for solving bounded convex vector optimization problems (CVOPs). The scalarization model solved iteratively within the algorithm is a modification of the norm-minimizing scalarization proposed in [Ç. Ararat, F. Ulus, and M. Umer, J. Optim. Theory Appl., 194 (2022), pp. 681–712]. For a predetermined tolerance [math], we prove that the algorithm terminates after finitely many iterations, and it returns a polyhedral outer approximation to the upper image of the CVOP such that the Hausdorff distance between the two is less than [math]. We show that for an arbitrary norm used in the scalarization models, the approximation error after [math] iterations decreases by the order of [math], where [math] is the dimension of the objective space. An improved convergence rate of [math] is proved for the special case of using the Euclidean norm.
{"title":"Convergence Analysis of a Norm Minimization-Based Convex Vector Optimization Algorithm","authors":"Çağin Ararat, Firdevs Ulus, Muhammad Umer","doi":"10.1137/23m1574580","DOIUrl":"https://doi.org/10.1137/23m1574580","url":null,"abstract":"SIAM Journal on Optimization, Volume 34, Issue 3, Page 2700-2728, September 2024. <br/> Abstract. In this work, we propose an outer approximation algorithm for solving bounded convex vector optimization problems (CVOPs). The scalarization model solved iteratively within the algorithm is a modification of the norm-minimizing scalarization proposed in [Ç. Ararat, F. Ulus, and M. Umer, J. Optim. Theory Appl., 194 (2022), pp. 681–712]. For a predetermined tolerance [math], we prove that the algorithm terminates after finitely many iterations, and it returns a polyhedral outer approximation to the upper image of the CVOP such that the Hausdorff distance between the two is less than [math]. We show that for an arbitrary norm used in the scalarization models, the approximation error after [math] iterations decreases by the order of [math], where [math] is the dimension of the objective space. An improved convergence rate of [math] is proved for the special case of using the Euclidean norm.","PeriodicalId":49529,"journal":{"name":"SIAM Journal on Optimization","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141863976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIAM Journal on Optimization, Volume 34, Issue 3, Page 2638-2670, September 2024. Abstract. Most zeroth-order optimization algorithms mimic a first-order algorithm but replace the gradient of the objective function with some gradient estimator that can be computed from a small number of function evaluations. This estimator is constructed randomly, and its expectation matches the gradient of a smooth approximation of the objective function whose quality improves as the underlying smoothing parameter [math] is reduced. Gradient estimators requiring a smaller number of function evaluations are preferable from a computational point of view. While estimators based on a single function evaluation can be obtained by use of the divergence theorem from vector calculus, their variance explodes as [math] tends to 0. Estimators based on multiple function evaluations, on the other hand, suffer from numerical cancellation when [math] tends to 0. To combat both effects simultaneously, we extend the objective function to the complex domain and construct a gradient estimator that evaluates the objective at a complex point whose coordinates have small imaginary parts of the order [math]. As this estimator requires only one function evaluation, it is immune to cancellation. In addition, its variance remains bounded as [math] tends to 0. We prove that zeroth-order algorithms that use our estimator offer the same theoretical convergence guarantees as the state-of-the-art methods. Numerical experiments suggest, however, that they often converge faster in practice.
{"title":"Small Errors in Random Zeroth-Order Optimization Are Imaginary","authors":"Wouter Jongeneel, Man-Chung Yue, Daniel Kuhn","doi":"10.1137/22m1510261","DOIUrl":"https://doi.org/10.1137/22m1510261","url":null,"abstract":"SIAM Journal on Optimization, Volume 34, Issue 3, Page 2638-2670, September 2024. <br/> Abstract. Most zeroth-order optimization algorithms mimic a first-order algorithm but replace the gradient of the objective function with some gradient estimator that can be computed from a small number of function evaluations. This estimator is constructed randomly, and its expectation matches the gradient of a smooth approximation of the objective function whose quality improves as the underlying smoothing parameter [math] is reduced. Gradient estimators requiring a smaller number of function evaluations are preferable from a computational point of view. While estimators based on a single function evaluation can be obtained by use of the divergence theorem from vector calculus, their variance explodes as [math] tends to 0. Estimators based on multiple function evaluations, on the other hand, suffer from numerical cancellation when [math] tends to 0. To combat both effects simultaneously, we extend the objective function to the complex domain and construct a gradient estimator that evaluates the objective at a complex point whose coordinates have small imaginary parts of the order [math]. As this estimator requires only one function evaluation, it is immune to cancellation. In addition, its variance remains bounded as [math] tends to 0. We prove that zeroth-order algorithms that use our estimator offer the same theoretical convergence guarantees as the state-of-the-art methods. 
Numerical experiments suggest, however, that they often converge faster in practice.","PeriodicalId":49529,"journal":{"name":"SIAM Journal on Optimization","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141738174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
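The idea described in the abstract above can be sketched as follows, assuming the estimator takes the form (n/delta) * Im f(x + i*delta*u) * u with u drawn uniformly from the unit sphere. This form, the function names, and the quadratic test objective are assumptions made for illustration; the paper should be consulted for the exact estimator and its conditions (for example, that the objective extends analytically to the complex domain).

```python
import math
import random


def sample_unit_sphere(n, rng):
    """Draw a direction uniformly from the unit sphere in R^n."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def complex_step_estimate(f, x, u, delta):
    """One-evaluation gradient estimate along direction u (assumed form)."""
    n = len(x)
    # Perturb only the imaginary parts, so no difference of nearby values is formed.
    z = [complex(xi, delta * ui) for xi, ui in zip(x, u)]
    scale = n * f(z).imag / delta
    return [scale * ui for ui in u]


def quadratic(z):
    """Hypothetical analytic test objective f(z) = sum z_i^2, with gradient 2x."""
    return sum(zi * zi for zi in z)


rng = random.Random(0)
x = [1.0, 2.0, 3.0]
samples = 20000
avg = [0.0] * len(x)
for _ in range(samples):
    g = complex_step_estimate(quadratic, x, sample_unit_sphere(len(x), rng), 1e-8)
    avg = [a + gi / samples for a, gi in zip(avg, g)]
```

Because the imaginary part is read off directly rather than obtained by subtracting two function values, delta can be taken very small without the cancellation that affects finite-difference estimators; averaging over random directions should approach the true gradient 2x for this objective.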
SIAM Journal on Optimization, Volume 34, Issue 3, Page 2609-2637, September 2024. Abstract. This paper studies well-posedness and parameter sensitivity of the square root LASSO (SR-LASSO), an optimization model for recovering sparse solutions to linear inverse problems in finite dimension. An advantage of the SR-LASSO (e.g., over the standard LASSO) is that the optimal tuning of the regularization parameter is robust with respect to measurement noise. This paper provides three point-based regularity conditions at a solution of the SR-LASSO: the weak, intermediate, and strong assumptions. It is shown that the weak assumption implies uniqueness of the solution in question. The intermediate assumption yields a directionally differentiable and locally Lipschitz solution map (with explicit Lipschitz bounds), whereas the strong assumption gives continuous differentiability of said map around the point in question. Our analysis leads to new theoretical insights on the comparison between SR-LASSO and LASSO from the viewpoint of tuning parameter sensitivity: noise-robust optimal parameter choice for SR-LASSO comes at the “price” of elevated tuning parameter sensitivity. Numerical results support and showcase the theoretical findings.
{"title":"Square Root LASSO: Well-Posedness, Lipschitz Stability, and the Tuning Trade-Off","authors":"Aaron Berk, Simone Brugiapaglia, Tim Hoheisel","doi":"10.1137/23m1561968","DOIUrl":"https://doi.org/10.1137/23m1561968","url":null,"abstract":"SIAM Journal on Optimization, Volume 34, Issue 3, Page 2609-2637, September 2024. <br/> Abstract. This paper studies well-posedness and parameter sensitivity of the square root LASSO (SR-LASSO), an optimization model for recovering sparse solutions to linear inverse problems in finite dimension. An advantage of the SR-LASSO (e.g., over the standard LASSO) is that the optimal tuning of the regularization parameter is robust with respect to measurement noise. This paper provides three point-based regularity conditions at a solution of the SR-LASSO: the weak, intermediate, and strong assumptions. It is shown that the weak assumption implies uniqueness of the solution in question. The intermediate assumption yields a directionally differentiable and locally Lipschitz solution map (with explicit Lipschitz bounds), whereas the strong assumption gives continuous differentiability of said map around the point in question. Our analysis leads to new theoretical insights on the comparison between SR-LASSO and LASSO from the viewpoint of tuning parameter sensitivity: noise-robust optimal parameter choice for SR-LASSO comes at the “price” of elevated tuning parameter sensitivity. 
Numerical results support and showcase the theoretical findings.","PeriodicalId":49529,"journal":{"name":"SIAM Journal on Optimization","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141745955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
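For readers comparing the two models named in the abstract above, the standard formulations can be written side by side. The symbols $A$ (measurement matrix), $b$ (observations), and $\lambda > 0$ (tuning parameter) are assumptions introduced here, and scaling conventions (such as the factor $\tfrac{1}{2}$) vary across the literature.

```latex
% SR-LASSO: unsquared residual norm plus l1 penalty
\min_{x \in \mathbb{R}^n} \; \|Ax - b\|_2 + \lambda \|x\|_1

% LASSO (unconstrained form): squared residual norm plus l1 penalty
\min_{x \in \mathbb{R}^n} \; \tfrac{1}{2}\|Ax - b\|_2^2 + \lambda \|x\|_1
```

The only difference is whether the data-fidelity term is squared. This is consistent with the abstract's remark that the optimal $\lambda$ for SR-LASSO is robust to the noise level, whereas a good $\lambda$ for LASSO generally has to scale with it.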