Branch Ranking for Efficient Mixed-Integer Programming via Offline Ranking-based Policy Learning

Zeren Huang, Wenhao Chen, Weinan Zhang, Chuhan Shi, Furui Liu, Hui-Ling Zhen, M. Yuan, Jianye Hao, Yong Yu, Jun Wang
{"title":"Branch Ranking for Efficient Mixed-Integer Programming via Offline Ranking-based Policy Learning","authors":"Zeren Huang, Wenhao Chen, Weinan Zhang, Chuhan Shi, Furui Liu, Hui-Ling Zhen, M. Yuan, Jianye Hao, Yong Yu, Jun Wang","doi":"10.48550/arXiv.2207.13701","DOIUrl":null,"url":null,"abstract":"Deriving a good variable selection strategy in branch-and-bound is essential for the efficiency of modern mixed-integer programming (MIP) solvers. With MIP branching data collected during the previous solution process, learning to branch methods have recently become superior over heuristics. As branch-and-bound is naturally a sequential decision making task, one should learn to optimize the utility of the whole MIP solving process instead of being myopic on each step. In this work, we formulate learning to branch as an offline reinforcement learning (RL) problem, and propose a long-sighted hybrid search scheme to construct the offline MIP dataset, which values the long-term utilities of branching decisions. During the policy training phase, we deploy a ranking-based reward assignment scheme to distinguish the promising samples from the long-term or short-term view, and train the branching model named Branch Ranking via offline policy learning. Experiments on synthetic MIP benchmarks and real-world tasks demonstrate that Branch Rankink is more efficient and robust, and can better generalize to large scales of MIP instances compared to the widely used heuristics and state-of-the-art learning-based branching models.","PeriodicalId":74091,"journal":{"name":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2207.13701","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Deriving a good variable selection strategy in branch-and-bound is essential for the efficiency of modern mixed-integer programming (MIP) solvers. Using MIP branching data collected during previous solution processes, learning-to-branch methods have recently outperformed hand-crafted heuristics. Since branch-and-bound is naturally a sequential decision-making task, one should learn to optimize the utility of the whole MIP solving process rather than being myopic at each step. In this work, we formulate learning to branch as an offline reinforcement learning (RL) problem, and propose a long-sighted hybrid search scheme that values the long-term utility of branching decisions to construct the offline MIP dataset. During the policy training phase, we deploy a ranking-based reward assignment scheme to distinguish promising samples from both long-term and short-term views, and train the branching model, named Branch Ranking, via offline policy learning. Experiments on synthetic MIP benchmarks and real-world tasks demonstrate that Branch Ranking is more efficient and robust than widely used heuristics and state-of-the-art learning-based branching models, and generalizes better to large-scale MIP instances.
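To make the ranking-based training idea concrete, below is a minimal sketch, not the paper's implementation. It assumes each offline sample provides per-candidate feature vectors together with integer "reward ranks" (as a ranking-based reward assignment scheme might produce), and trains a scorer with a pairwise hinge ranking loss so that higher-ranked candidates receive higher scores. The BranchScorer network, the feature dimension, and the margin are all illustrative choices.

```python
# Minimal sketch (not the paper's exact implementation) of offline
# ranking-based policy learning for branching. Assumes each offline
# sample supplies per-candidate feature vectors plus integer reward
# ranks from some ranking-based reward assignment; the architecture,
# feature dimension, and margin below are illustrative.
import torch
import torch.nn as nn

class BranchScorer(nn.Module):
    """Scores each candidate branching variable from its features."""
    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (num_candidates, feat_dim) -> scores: (num_candidates,)
        return self.net(feats).squeeze(-1)

def pairwise_ranking_loss(scores, ranks, margin=1.0):
    """Hinge loss pushing higher-ranked candidates above lower-ranked ones."""
    # better[i, j] is True when candidate i outranks candidate j.
    better = ranks.unsqueeze(1) > ranks.unsqueeze(0)
    # Penalize pairs whose score gap falls short of the margin.
    gaps = margin - (scores.unsqueeze(1) - scores.unsqueeze(0))
    return (gaps.clamp(min=0) * better).sum() / better.sum().clamp(min=1)

# Toy training step on random data: 10 candidates, 32 features each.
model = BranchScorer(feat_dim=32)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
feats = torch.randn(10, 32)         # candidate features (offline sample)
ranks = torch.randint(0, 3, (10,))  # assigned reward ranks (offline sample)
loss = pairwise_ranking_loss(model(feats), ranks)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

At solving time, a scorer trained this way would presumably be queried at each branch-and-bound node, with the highest-scoring candidate selected for branching; the ranking loss only requires relative orderings of candidates, which is what makes a reward assignment based on ranks rather than raw values workable.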