Optimal multivariate decision trees
Justin Boutilier, Carla Michini, Zachary Zhou
Constraints, published 2023-12-27
DOI: https://doi.org/10.1007/s10601-023-09367-y
Citation count: 0
Abstract
Recently, mixed-integer programming (MIP) techniques have been applied to learn optimal decision trees. Empirical research has shown that optimal trees typically have better out-of-sample performance than heuristic approaches such as CART. However, the underlying MIP formulations often suffer from weak linear programming (LP) relaxations. Many existing MIP approaches employ big-M constraints to ensure observations are routed through the tree in a feasible manner. This paper introduces new MIP formulations for learning optimal decision trees with multivariate branching rules and no assumptions on the feature types. We first propose a strong baseline MIP formulation that still uses big-M constraints, but yields a stronger LP relaxation than its counterparts in the literature. We then introduce a problem-specific class of valid inequalities called shattering inequalities. Each inequality encodes an inclusion-minimal set of points that cannot be shattered by a multivariate split, and in the context of a MIP formulation, the inequalities are sparse, involving at most the number of features plus two variables. We propose a separation procedure that attempts to find a violated inequality given a (possibly fractional) solution to the LP relaxation; when the solution is integer, the separation is exact. Numerical experiments show that our MIP approach outperforms two other MIP formulations in terms of solution time and relative gap, and improves solution time while remaining competitive with regard to out-of-sample accuracy in comparison to a wider range of approaches from the literature.
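As a rough illustration of what it means for a set of points to be impossible to shatter with a multivariate (hyperplane) split, the sketch below enumerates every left/right assignment of a small point set and searches for a hyperplane that realizes it. This is not the paper's separation procedure: the function names `find_split` and `unrealized_assignments` are hypothetical, and a capped perceptron is used as a stand-in for an exact LP feasibility check. The perceptron is guaranteed to find a split when one exists given enough epochs, so hitting the cap only suggests, and does not prove, that no split exists.

```python
from itertools import product


def find_split(points, sides, max_epochs=1000):
    """Search for (w, b) with w.x + b > 0 on side +1 and < 0 on side -1.

    Perceptron updates; returns (w, b) on success, or None if the epoch
    cap is reached (evidence of infeasibility, not a proof).
    """
    dim = len(points[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, s in zip(points, sides):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if s * score <= 0:  # point is on the wrong side (or on the plane)
                w = [wi + s * xi for wi, xi in zip(w, x)]
                b += s
                mistakes += 1
        if mistakes == 0:
            return w, b
    return None


def unrealized_assignments(points, max_epochs=1000):
    """List the left/right assignments for which no split was found."""
    return [
        sides
        for sides in product((-1, 1), repeat=len(points))
        if find_split(points, sides, max_epochs) is None
    ]


# Three non-collinear points in the plane: every assignment is realizable.
triangle = [(0, 0), (0, 1), (1, 0)]
# Four corners of the unit square: the two XOR-style assignments are not.
corners = [(0, 0), (0, 1), (1, 0), (1, 1)]

print(unrealized_assignments(triangle))
print(unrealized_assignments(corners))
```

With two features, the square's four corners form a set of size "number of features plus two", and removing any one corner leaves three points that can be split in every way, which matches the inclusion-minimal flavor described in the abstract. An exact version would replace the perceptron with an LP feasibility test.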
About the journal:
Constraints provides a common forum for the many disciplines interested in constraint programming and constraint satisfaction and optimization, and the many application domains in which constraint technology is employed. It covers all aspects of computing with constraints: theory and practice, algorithms and systems, reasoning and programming, logics and languages.