Optimal Prescriptive Trees
D. Bertsimas, Jack Dunn, Nishanth Mundru
INFORMS Journal on Optimization. DOI: 10.1287/IJOO.2018.0005

Motivated by personalized decision making, given observational data [Formula: see text] involving features [Formula: see text], assigned treatments or prescriptions [Formula: see text], and outcomes [Formula: see text], we propose a tree-based algorithm, the optimal prescriptive tree (OPT), that uses either constant or linear models in the leaves of the tree to predict counterfactuals and assign optimal treatments to new samples. We propose an objective function that balances optimality and accuracy. OPTs are interpretable and highly scalable, accommodate multiple treatments, and provide high-quality prescriptions. We report results on synthetic and real data showing that OPTs either outperform or are comparable with several state-of-the-art methods. Given their combination of interpretability, scalability, generalizability, and performance, OPTs are an attractive alternative for personalized decision making in areas such as online advertising and personalized medicine.

{"title":"Machine Learning and Optimization: Introduction to the Special Issue","authors":"D. Bertsimas","doi":"10.1287/IJOO.2019.0024","DOIUrl":"https://doi.org/10.1287/IJOO.2019.0024","url":null,"abstract":"","PeriodicalId":73382,"journal":{"name":"INFORMS journal on optimization","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1287/IJOO.2019.0024","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45410396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Distributionally Robust Optimization with Confidence Bands for Probability Density Functions
Xi Chen, Qihang Lin, Guanglin Xu
INFORMS Journal on Optimization. DOI: 10.1287/ijoo.2021.0059

Distributionally robust optimization (DRO) addresses stochastic programs in which the distribution of the random variables is unknown and must be estimated from samples. A key element of DRO is the construction of the ambiguity set, a set of distributions that contains the true distribution with high probability. Assuming that the true distribution has a probability density function, we propose a class of ambiguity sets based on confidence bands of the true density function. As examples, we consider shape-restricted confidence bands and confidence bands constructed with a kernel density estimation technique. The former allow us to incorporate prior knowledge of the shape of the underlying density function (e.g., unimodality and monotonicity), and the latter enable us to handle multidimensional cases. Furthermore, we establish the convergence of the optimal value of the DRO to that of the underlying stochastic program as the sample size increases. The DRO with our ambiguity set involves functional decision variables and infinitely many constraints. To address this challenge, we apply duality theory to reformulate the DRO into a finite-dimensional stochastic program, which is amenable to a stochastic subgradient scheme as a solution method.

{"title":"From the Editor","authors":"D. Bertsimas","doi":"10.1287/ijoo.2019.0011","DOIUrl":"https://doi.org/10.1287/ijoo.2019.0011","url":null,"abstract":"","PeriodicalId":73382,"journal":{"name":"INFORMS journal on optimization","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1287/ijoo.2019.0011","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46729307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Constraint Generation for Two-Stage Robust Network Flow Problems
D. Simchi-Levi, He Wang, Y. Wei
INFORMS Journal on Optimization. DOI: 10.1287/IJOO.2018.0003

In this paper, we propose new constraint generation (CG) algorithms for solving the two-stage robust minimum cost flow problem, a problem that arises in various applications such as transportation...

Robust Classification
D. Bertsimas, Jack Dunn, C. Pawlowski, Ying Daisy Zhuo
INFORMS Journal on Optimization. DOI: 10.1287/ijoo.2018.0001

Semi-proximal Augmented Lagrangian-Based Decomposition Methods for Primal Block-Angular Convex Composite Quadratic Conic Programming Problems
Xin-Yee Lam, Defeng Sun, K. Toh
INFORMS Journal on Optimization. DOI: 10.1287/IJOO.2019.0048

We first propose a semi-proximal augmented Lagrangian-based decomposition method to directly solve the primal form of a convex composite quadratic conic-programming problem with a primal block-angular structure. Within our algorithmic framework, we naturally derive several well-known augmented Lagrangian-based decomposition methods for stochastic programming, such as the diagonal quadratic approximation method of Mulvey and Ruszczyński. Although it is natural to develop an augmented Lagrangian decomposition algorithm based on the primal problem, we demonstrate that it is in fact numerically more economical to solve the dual problem by an appropriately designed decomposition algorithm. In particular, we propose a semi-proximal symmetric Gauss–Seidel-based alternating direction method of multipliers (sGS-ADMM) for solving the corresponding dual problem. Numerical results show that our dual-based sGS-ADMM algorithm can solve some very large instances of primal block-angular convex quadratic-programming problems very efficiently; for example, one instance with more than 300,000 linear constraints and 12.5 million nonnegative variables is solved to an accuracy of 10^-5 in the relative KKT residual in less than a minute on a modest desktop computer.

Adaptive Stochastic Variance Reduction for Subsampled Newton Method with Cubic Regularization
Junyu Zhang, Lin Xiao, Shuzhong Zhang
INFORMS Journal on Optimization. DOI: 10.1287/ijoo.2021.0058

The cubic regularized Newton method of Nesterov and Polyak has become increasingly popular for nonconvex optimization because of its ability to find an approximate local solution with a second-order guarantee and its low iteration complexity. Several recent works extend this method to the setting of minimizing the average of N smooth functions by replacing the exact gradients and Hessians with subsampled approximations, and they show that the total Hessian sample complexity can be made sublinear in N per iteration by leveraging stochastic variance reduction techniques. We present an adaptive variance reduction scheme for a subsampled Newton method with cubic regularization and show that the expected Hessian sample complexity is [Formula: see text] for finding an [Formula: see text]-approximate local solution (in terms of first- and second-order guarantees, respectively). Moreover, we show that the same Hessian sample complexity is retained with fixed sample sizes if exact gradients are used. Our analysis differs from previous works in that we do not rely on high-probability bounds based on matrix concentration inequalities; instead, we derive and use new bounds on the third- and fourth-order moments of the average of random matrices, which are of independent interest.

A Subsampling Line-Search Method with Second-Order Results
E. Bergou, Y. Diouane, V. Kunc, V. Kungurtsev, C. Royer
INFORMS Journal on Optimization. DOI: 10.1287/ijoo.2022.0072

In many contemporary optimization problems, such as those arising in machine learning, it can be computationally challenging or even infeasible to evaluate an entire function or its derivatives. This motivates the use of stochastic algorithms that sample problem data, which can jeopardize the guarantees obtained through classical globalization techniques in optimization, such as a line search. Using subsampled function values is particularly challenging for the latter strategy, which relies upon multiple evaluations. For nonconvex data-related problems, such as training deep learning models, one aims to develop methods that converge to second-order stationary points quickly, that is, that escape saddle points efficiently. This is particularly difficult to ensure when one accesses only subsampled approximations of the objective and its derivatives. In this paper, we describe a stochastic algorithm based on negative curvature and Newton-type directions that are computed for a subsampling model of the objective. A line-search technique is used to enforce suitable decrease for this model; for a sufficiently large sample, a similar amount of reduction holds for the true objective. We then present worst-case complexity guarantees for a notion of stationarity tailored to the subsampling context. Our analysis encompasses the deterministic regime and allows us to identify sampling requirements for second-order line-search paradigms. As we illustrate through real-data experiments, these worst-case estimates need not be satisfied for our method to be competitive with first-order strategies in practice.

Robust Facility Location Under Disruptions
Chun-Lai Cheng, Y. Adulyasak, Louis-Martin Rousseau
INFORMS Journal on Optimization. DOI: 10.1287/IJOO.2021.0054

Facility networks can be disrupted by, for example, power outages, poor weather, or natural disasters, and the probabilities of these events may be difficult to estimate. Such disruptions can force costly recourse decisions because customers can no longer be served by the planned facilities. In this paper, we study a fixed-charge location problem (FLP) that accounts for disruption risks. We adopt a two-stage robust optimization approach in which facility location decisions are made here and now, and recourse decisions to reassign customers are made after the facility availability is revealed. We implement a column-and-constraint generation (C&CG) algorithm to solve the robust models exactly. Instead of relying on dualization or reformulation techniques to handle the subproblem, as is common in the literature, we use a linear programming-based enumeration method that accommodates a discrete uncertainty set of facility failures; this also provides the flexibility to tackle cases in which the dualization technique cannot be applied to the subproblem. We further develop an approximation scheme for instances of realistic size. Numerical experiments show that the proposed C&CG algorithm outperforms existing methods for both the robust FLP and the robust p-median problem.
