The LM-Cut Heuristic Family for Optimal Numeric Planning with Simple Conditions
Ryo Kuroiwa, Alexander Shleyfman, Chiara Piacentini, Margarita P. Castro, J. Christopher Beck
The LM-cut heuristic, both alone and as part of the operator counting framework, represents one of the most successful heuristics for classical planning. In this paper, we generalize LM-cut and its use in operator counting to optimal numeric planning with simple conditions and simple numeric effects, i.e., linear expressions over numeric state variables and actions that increase or decrease such variables by constant quantities. We introduce a variant of h^max_hbd (a previously proposed numeric hmax heuristic) based on the delete-relaxed version of such planning tasks and show that, although inadmissible by itself, our variant yields a numeric version of the classical LM-cut heuristic that is admissible. We classify the three existing families of heuristics for this class of numeric planning tasks and introduce the LM-cut family, proving dominance or incomparability between all pairs of existing max and LM-cut heuristics for numeric planning with simple conditions. Our extensive empirical evaluation shows that the new LM-cut heuristic, both on its own and as part of the operator counting framework, is the state of the art for this class of numeric planning problems.
{"title":"The LM-Cut Heuristic Family for Optimal Numeric Planning with Simple Conditions","authors":"Ryo Kuroiwa, Alexander Shleyfman, Chiara Piacentini, Margarita P. Castro, J. Christopher Beck","doi":"10.1613/jair.1.14034","DOIUrl":"https://doi.org/10.1613/jair.1.14034","url":null,"abstract":"The LM-cut heuristic, both alone and as part of the operator counting framework, represents one of the most successful heuristics for classical planning. In this paper, we generalize LM-cut and its use in operator counting to optimal numeric planning with simple conditions and simple numeric effects, i.e., linear expressions over numeric state variables and actions that increase or decrease such variables by constant quantities. We introduce a variant of hmaxhbd (a previously proposed numeric hmax heuristic) based on the delete-relaxed version of such planning tasks and show that, although inadmissible by itself, our variant yields a numeric version of the classical LM-cut heuristic which is admissible. We classify the three existing families of heuristics for this class of numeric planning tasks and introduce the LM-cut family, proving dominance or incomparability between all pairs of existing max and LM-cut heuristics for numeric planning with simple conditions. Our extensive empirical evaluation shows that the new LM-cut heuristic, both on its own and as part of the operator counting framework, is the state-of-the-art for this class of numeric planning problem.","PeriodicalId":54877,"journal":{"name":"Journal of Artificial Intelligence Research","volume":"15 1","pages":"1477-1548"},"PeriodicalIF":5.0,"publicationDate":"2022-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88041327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data-Driven Revision of Conditional Norms in Multi-Agent Systems
Davide Dell’Anna, N. Alechina, F. Dalpiaz, M. Dastani, B. Logan
In multi-agent systems, norm enforcement is a mechanism for steering the behavior of individual agents in order to achieve desired system-level objectives. Due to the dynamics of multi-agent systems, however, it is hard to design norms that guarantee the achievement of the objectives in every operating context. Also, these objectives may change over time, thereby making previously defined norms ineffective. In this paper, we investigate the use of system execution data to automatically synthesise and revise conditional prohibitions with deadlines, a type of norm aimed at prohibiting agents from exhibiting certain patterns of behavior. We propose DDNR (Data-Driven Norm Revision), a data-driven approach to norm revision that synthesises revised norms with respect to a data set of traces describing the behavior of the agents in the system. We evaluate DDNR using a state-of-the-art, off-the-shelf urban traffic simulator. The results show that DDNR synthesises revised norms that are significantly more accurate than the original norms in distinguishing adequate and inadequate behaviors for the achievement of the system-level objectives.
{"title":"Data-Driven Revision of Conditional Norms in Multi-Agent Systems","authors":"Davide Dell’Anna, N. Alechina, F. Dalpiaz, M. Dastani, B. Logan","doi":"10.1613/jair.1.13683","DOIUrl":"https://doi.org/10.1613/jair.1.13683","url":null,"abstract":"In multi-agent systems, norm enforcement is a mechanism for steering the behavior of individual agents in order to achieve desired system-level objectives. Due to the dynamics of multi-agent systems, however, it is hard to design norms that guarantee the achievement of the objectives in every operating context. Also, these objectives may change over time, thereby making previously defined norms ineffective. In this paper, we investigate the use of system execution data to automatically synthesise and revise conditional prohibitions with deadlines, a type of norms aimed at prohibiting agents from exhibiting certain patterns of behaviors. We propose DDNR (Data-Driven Norm Revision), a data-driven approach to norm revision that synthesises revised norms with respect to a data set of traces describing the behavior of the agents in the system. We evaluate DDNR using a state-of-the-art, off-the-shelf urban traffic simulator. The results show that DDNR synthesises revised norms that are significantly more accurate than the original norms in distinguishing adequate and inadequate behaviors for the achievement of the system-level objectives.","PeriodicalId":54877,"journal":{"name":"Journal of Artificial Intelligence Research","volume":"1 1","pages":"6868-6872"},"PeriodicalIF":5.0,"publicationDate":"2022-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48784946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Proofs and Certificates for Max-SAT
M. Py, Mohamed Sami Cherif, Djamal Habet
Current Max-SAT solvers are able to efficiently compute the optimal value of an input instance but do not provide any certificate of its validity. In this paper, we present a tool, called MS-Builder, which generates certificates for the Max-SAT problem in the particular form of a sequence of equivalence-preserving transformations. To generate a certificate, MS-Builder iteratively calls a SAT oracle to get a SAT resolution refutation which is handled and adapted into a sound refutation for Max-SAT. In particular, we prove that the size of the computed Max-SAT refutation is linear with respect to the size of the initial refutation if it is semi-read-once, tree-like regular, tree-like or semi-tree-like. Additionally, we propose an extendable tool, called MS-Checker, able to verify the validity of any Max-SAT certificate using Max-SAT inference rules. Both tools are evaluated on the unweighted and weighted benchmark instances of the 2020 Max-SAT Evaluation.
{"title":"Proofs and Certificates for Max-SAT","authors":"M. Py, Mohamed Sami Cherif, Djamal Habet","doi":"10.1613/jair.1.13811","DOIUrl":"https://doi.org/10.1613/jair.1.13811","url":null,"abstract":"Current Max-SAT solvers are able to efficiently compute the optimal value of an input instance but they do not provide any certificate of its validity. In this paper, we present a tool, called MS-Builder, which generates certificates for the Max-SAT problem in the particular form of a sequence of equivalence-preserving transformations. To generate a certificate, MS-Builder iteratively calls a SAT oracle to get a SAT resolution refutation which is handled and adapted into a sound refutation for Max-SAT. In particular, we prove that the size of the computed Max-SAT refutation is linear with respect to the size of the initial refutation if it is semi-read-once, tree-like regular, tree-like or semi-tree-like. Additionally, we propose an extendable tool, called MS-Checker, able to verify the validity of any Max-SAT certificate using Max-SAT inference rules. Both tools are evaluated on the unweighted and weighted benchmark instances of the 2020 Max-SAT Evaluation.","PeriodicalId":54877,"journal":{"name":"Journal of Artificial Intelligence Research","volume":"8 1","pages":"1373-1400"},"PeriodicalIF":5.0,"publicationDate":"2022-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88507166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chance-constrained Static Schedules for Temporally Probabilistic Plans
Cheng Fang, Andrew J. Wang, B. Williams
Time management under uncertainty is essential to large-scale projects. From space exploration to industrial production, there is a need to schedule and perform activities given complex specifications on timing. In order to generate schedules that are robust to uncertainty in the duration of activities, prior work has focused on a problem framing that uses an interval-bounded uncertainty representation. However, such approaches are unable to take advantage of known probability distributions over duration.
In this paper we concentrate on a probabilistic formulation of temporal problems with uncertain duration, called the probabilistic simple temporal problem. As distributions often have an unbounded range of outcomes, we consider chance-constrained solutions, with guarantees on the probability of meeting temporal constraints. By considering distributions over uncertain duration, we are able to use risk as a resource, reason over the relative likelihood of outcomes, and derive higher utility solutions. We first demonstrate our approach by encoding the problem as a convex program. We then develop a more efficient hybrid algorithm whose parent solver generates risk allocations and whose child solver generates schedules for a particular risk allocation. The child is made efficient by leveraging existing interval-bounded scheduling algorithms, while the parent is made efficient by extracting conflicts over risk allocations. We perform numerical experiments to show the advantages of reasoning over probabilistic uncertainty, by comparing the utility of schedules generated with risk allocation against those generated from reasoning over bounded uncertainty. We also empirically show that solution time is greatly reduced by incorporating conflict-directed risk allocation.
{"title":"Chance-constrained Static Schedules for Temporally Probabilistic Plans","authors":"Cheng Fang, Andrew J. Wang, B. Williams","doi":"10.1613/jair.1.13636","DOIUrl":"https://doi.org/10.1613/jair.1.13636","url":null,"abstract":"Time management under uncertainty is essential to large scale projects. From space exploration to industrial production, there is a need to schedule and perform activities. given complex specifications on timing. In order to generate schedules that are robust to uncertainty in the duration of activities, prior work has focused on a problem framing that uses an interval-bounded uncertainty representation. However, such approaches are unable to take advantage of known probability distributions over duration.\u0000In this paper we concentrate on a probabilistic formulation of temporal problems with uncertain duration, called the probabilistic simple temporal problem. As distributions often have an unbounded range of outcomes, we consider chance-constrained solutions, with guarantees on the probability of meeting temporal constraints. By considering distributions over uncertain duration, we are able to use risk as a resource, reason over the relative likelihood of outcomes, and derive higher utility solutions. We first demonstrate our approach by encoding the problem as a convex program. We then develop a more efficient hybrid algorithm whose parent solver generates risk allocations and whose child solver generates schedules for a particular risk allocation. The child is made efficient by leveraging existing interval-bounded scheduling algorithms, while the parent is made efficient by extracting conflicts over risk allocations. We perform numerical experiments to show the advantages of reasoning over probabilistic uncertainty, by comparing the utility of schedules generated with risk allocation against those generated from reasoning over bounded uncertainty. We also empirically show that solution time is greatly reduced by incorporating conflict-directed risk allocation.","PeriodicalId":54877,"journal":{"name":"Journal of Artificial Intelligence Research","volume":"103 19 1","pages":"1323-1372"},"PeriodicalIF":5.0,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86533324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards Evidence Retrieval Cost Reduction in Abstract Argumentation Frameworks with Fallible Evidence
Andrea Cohen, Sebastian Gottifredi, A. García, Guillermo R. Simari
Arguments in argumentation systems cannot always be considered standalone entities; the pieces of evidence they rely on must also be taken into account. This evidence might have to be retrieved from external sources such as databases or the web, and each attempt to retrieve a piece of evidence comes with an associated cost. Moreover, a piece of evidence may be available in a given scenario but not in others, and this is not known beforehand. As a result, the collection of active arguments (whose entire set of evidence is available) that can be used by the argumentation machinery of the system may vary from one scenario to another. In this work, we consider an Abstract Argumentation Framework with Fallible Evidence that accounts for these issues, and propose a heuristic measure used as part of the acceptability calculus (specifically, for building pruned dialectical trees) with the aim of minimizing the evidence retrieval cost of the arguments involved in the reasoning process. We provide an algorithmic solution that is empirically tested against two baselines and formally show the correctness of our approach.
{"title":"Towards Evidence Retrieval Cost Reduction in Abstract Argumentation Frameworks with Fallible Evidence","authors":"Andrea Cohen, Sebastian Gottifredi, A. García, Guillermo R. Simari","doi":"10.1613/jair.1.13639","DOIUrl":"https://doi.org/10.1613/jair.1.13639","url":null,"abstract":"Arguments in argumentation systems cannot always be considered as standalone entities, requiring the consideration of the pieces of evidence they rely on. This evidence might have to be retrieved from external sources such as databases or the web, and each attempt to retrieve a piece of evidence comes with an associated cost. Moreover, a piece of evidence may be available in a given scenario but not in others, and this is not known beforehand. As a result, the collection of active arguments (whose entire set of evidence is available) that can be used by the argumentation machinery of the system may vary from one scenario to another. In this work, we consider an Abstract Argumentation Framework with Fallible Evidence that accounts for these issues, and propose a heuristic measure used as part of the acceptability calculus (specifically, for building pruned dialectical trees) with the aim of minimizing the evidence retrieval cost of the arguments involved in the reasoning process. We provide an algorithmic solution that is empirically tested against two baselines and formally show the correctness of our approach.","PeriodicalId":54877,"journal":{"name":"Journal of Artificial Intelligence Research","volume":"28 1","pages":"1293-1322"},"PeriodicalIF":5.0,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73877920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hierarchical Decompositions and Termination Analysis for Generalized Planning
Siddharth Srivastava
This paper presents new methods for analyzing and evaluating generalized plans that can solve broad classes of related planning problems. Although synthesis and learning of generalized plans has been a longstanding goal in AI, it remains challenging due to fundamental gaps in methods for analyzing the scope and utility of a given generalized plan. This paper addresses these gaps by developing a new conceptual framework along with proof techniques and algorithmic processes for assessing termination and goal-reachability related properties of generalized plans. We build upon classic results from graph theory to decompose generalized plans into smaller components that are then used to derive hierarchical termination arguments. These methods can be used to determine the utility of a given generalized plan, as well as to guide the synthesis and learning processes for generalized plans. We present theoretical as well as empirical results illustrating the scope of this new approach. Our analysis shows that this approach significantly extends the class of generalized plans that can be assessed automatically, thereby reducing barriers in the synthesis and learning of reliable generalized plans.
{"title":"Hierarchical Decompositions and Termination Analysis for Generalized Planning","authors":"Siddharth Srivastava","doi":"10.1613/jair.1.14185","DOIUrl":"https://doi.org/10.1613/jair.1.14185","url":null,"abstract":"This paper presents new methods for analyzing and evaluating generalized plans that can solve broad classes of related planning problems. Although synthesis and learning of generalized plans has been a longstanding goal in AI, it remains challenging due to fundamental gaps in methods for analyzing the scope and utility of a given generalized plan. This paper addresses these gaps by developing a new conceptual framework along with proof techniques and algorithmic processes for assessing termination and goal-reachability related properties of generalized plans. We build upon classic results from graph theory to decompose generalized plans into smaller components that are then used to derive hierarchical termination arguments. These methods can be used to determine the utility of a given generalized plan, as well as to guide the synthesis and learning processes for generalized plans. We present theoretical as well as empirical results illustrating the scope of this new approach. Our analysis shows that this approach significantly extends the class of generalized plans that can be assessed automatically, thereby reducing barriers in the synthesis and learning of reliable generalized plans.","PeriodicalId":54877,"journal":{"name":"Journal of Artificial Intelligence Research","volume":"85 1","pages":"1203-1236"},"PeriodicalIF":5.0,"publicationDate":"2022-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85412786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Strategy Graphs for Influence Diagrams
E. Hansen, Jinchuan Shi, James Kastrantas
An influence diagram is a graphical model of a Bayesian decision problem that is solved by finding a strategy that maximizes expected utility. When an influence diagram is solved by variable elimination or a related dynamic programming algorithm, it is traditional to represent a strategy as a sequence of policies, one for each decision variable, where a policy maps the relevant history for a decision to an action. We propose an alternative representation of a strategy as a graph, called a strategy graph, and show how to modify a variable elimination algorithm so that it constructs a strategy graph. We consider both a classic variable elimination algorithm for influence diagrams and a recent extension of this algorithm that has more relaxed constraints on elimination order that allow improved performance. We consider the advantages of representing a strategy as a graph and, in particular, how to simplify a strategy graph so that it is easier to interpret and analyze.
{"title":"Strategy Graphs for Influence Diagrams","authors":"E. Hansen, Jinchuan Shi, James Kastrantas","doi":"10.1613/jair.1.13865","DOIUrl":"https://doi.org/10.1613/jair.1.13865","url":null,"abstract":"\u0000\u0000\u0000An influence diagram is a graphical model of a Bayesian decision problem that is solved by finding a strategy that maximizes expected utility. When an influence diagram is solved by variable elimination or a related dynamic programming algorithm, it is traditional to represent a strategy as a sequence of policies, one for each decision variable, where a policy maps the relevant history for a decision to an action. We propose an alternative representation of a strategy as a graph, called a strategy graph, and show how to modify a variable elimination algorithm so that it constructs a strategy graph. We consider both a classic variable elimination algorithm for influence diagrams and a recent extension of this algorithm that has more relaxed constraints on elimination order that allow improved performance. We consider the advantages of representing a strategy as a graph and, in particular, how to simplify a strategy graph so that it is easier to interpret and analyze.\u0000\u0000\u0000","PeriodicalId":54877,"journal":{"name":"Journal of Artificial Intelligence Research","volume":"231 1","pages":"1177-1221"},"PeriodicalIF":5.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91020447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Asymmetric Action Abstractions for Planning in Real-Time Strategy Games
Rubens O. Moraes, M. Nascimento, Levi H. S. Lelis
Action abstractions restrict the number of legal actions available for real-time planning in zero-sum extensive-form games, thus allowing algorithms to focus their search on a set of promising actions. Even though unabstracted game trees can lead to optimal policies, due to real-time constraints and the tree size, they are not a practical choice. In this context, we introduce an action abstraction scheme which we call asymmetric action abstraction. Asymmetric abstractions allow search algorithms to “pay more attention” to some aspects of the game by unevenly dividing the algorithm’s search effort amongst different aspects of the game. We also introduce four algorithms that search in asymmetrically abstracted game trees to evaluate the effectiveness of our abstraction schemes. Two of our algorithms are adaptations of algorithms developed for searching in action-abstracted spaces, Portfolio Greedy Search and Stratified Strategy Selection, and the other two are adaptations of an algorithm developed for searching in unabstracted spaces, NaïveMCTS. An extensive set of experiments in a real-time strategy game shows that search algorithms using asymmetric abstractions are able to outperform all other search algorithms tested.
{"title":"Asymmetric Action Abstractions for Planning in Real-Time Strategy Games","authors":"Rubens O. Moraes, M. Nascimento, Levi H. S. Lelis","doi":"10.1613/jair.1.13769","DOIUrl":"https://doi.org/10.1613/jair.1.13769","url":null,"abstract":"Action abstractions restrict the number of legal actions available for real-time planning in zero-sum extensive-form games, thus allowing algorithms to focus their search on a set of promising actions. Even though unabstracted game trees can lead to optimal policies, due to real-time constraints and the tree size, they are not a practical choice. In this context, we introduce an action abstraction scheme which we call asymmetric action abstraction. Asymmetric abstractions allow search algorithms to “pay more attention” to some aspects of the game by unevenly dividing the algorithm’s search effort amongst different aspects of the game. We also introduce four algorithms that search in asymmetrically abstracted game trees to evaluate the effectiveness of our abstraction schemes. Two of our algorithms are adaptations of algorithms developed for searching in action-abstracted spaces, Portfolio Greedy Search and Stratified Strategy Selection, and the other two are adaptations of an algorithm developed for searching in unabstracted spaces, NaïveMCTS. An extensive set of experiments in a real-time strategy game shows that search algorithms using asymmetric abstractions are able to outperform all other search algorithms tested.","PeriodicalId":54877,"journal":{"name":"Journal of Artificial Intelligence Research","volume":"18 1","pages":"1103-1137"},"PeriodicalIF":5.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80962716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning to Design Fair and Private Voting Rules
Farhad Mohsin, Ao Liu, Pin-Yu Chen, Francesca Rossi, Lirong Xia
Voting is used widely to identify a collective decision for a group of agents, based on their preferences. In this paper, we focus on evaluating and designing voting rules that support both the privacy of the voting agents and a notion of fairness over such agents. To do this, we introduce a novel notion of group fairness and adopt the existing notion of local differential privacy. We then evaluate the level of group fairness in several existing voting rules, as well as the trade-offs between fairness and privacy, showing that it is not possible to always obtain maximal economic efficiency with high fairness or high privacy levels. Then, we present both a machine learning and a constrained optimization approach to design new voting rules that are fair while maintaining a high level of economic efficiency. Finally, we empirically examine the effect of adding noise to create local differentially private voting rules and discuss the three-way trade-off between economic efficiency, fairness, and privacy. This paper appears in the special track on AI & Society.
{"title":"Learning to Design Fair and Private Voting Rules","authors":"Farhad Mohsin, Ao Liu, Pin-Yu Chen, Francesca Rossi, Lirong Xia","doi":"10.1613/jair.1.13734","DOIUrl":"https://doi.org/10.1613/jair.1.13734","url":null,"abstract":"Voting is used widely to identify a collective decision for a group of agents, based on their preferences. In this paper, we focus on evaluating and designing voting rules that support both the privacy of the voting agents and a notion of fairness over such agents. To do this, we introduce a novel notion of group fairness and adopt the existing notion of local differential privacy. We then evaluate the level of group fairness in several existing voting rules, as well as the trade-offs between fairness and privacy, showing that it is not possible to always obtain maximal economic efficiency with high fairness or high privacy levels. Then, we present both a machine learning and a constrained optimization approach to design new voting rules that are fair while maintaining a high level of economic efficiency. Finally, we empirically examine the effect of adding noise to create local differentially private voting rules and discuss the three-way trade-off between economic efficiency, fairness, and privacy.\u0000This paper appears in the special track on AI & Society.","PeriodicalId":54877,"journal":{"name":"Journal of Artificial Intelligence Research","volume":"50 1","pages":"1139-1176"},"PeriodicalIF":5.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86800576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reinforcement Learning from Optimization Proxy for Ride-Hailing Vehicle Relocation
Enpeng Yuan, Wenbo Chen, P. V. Hentenryck
Idle vehicle relocation is crucial for addressing the demand-supply imbalance that frequently arises in ride-hailing systems. Current mainstream methodologies, optimization and reinforcement learning, suffer from obvious computational drawbacks. Optimization models need to be solved in real-time and often trade off model fidelity (hence quality of solutions) for computational efficiency. Reinforcement learning is expensive to train and often struggles to achieve coordination among a large fleet. This paper designs a hybrid approach that leverages the strengths of the two while overcoming their drawbacks. Specifically, it trains an optimization proxy, i.e., a machine-learning model that approximates an optimization model, and then refines the proxy with reinforcement learning. This Reinforcement Learning from Optimization Proxy (RLOP) approach is computationally efficient to train and deploy, and achieves better results than RL or optimization alone. Numerical experiments on the New York City dataset show that the RLOP approach reduces both the relocation costs and computation time significantly compared to the optimization model, while pure reinforcement learning fails to converge due to computational complexity.
{"title":"Reinforcement Learning from Optimization Proxy for Ride-Hailing Vehicle Relocation","authors":"Enpeng Yuan, Wenbo Chen, P. V. Hentenryck","doi":"10.1613/jair.1.13794","DOIUrl":"https://doi.org/10.1613/jair.1.13794","url":null,"abstract":"\u0000\u0000\u0000Idle vehicle relocation is crucial for addressing demand-supply imbalance that frequently arises in the ride-hailing system. Current mainstream methodologies - optimization and reinforcement learning - suffer from obvious computational drawbacks. Optimization models need to be solved in real-time and often trade off model fidelity (hence quality of solutions) for computational efficiency. Reinforcement learning is expensive to train and often struggles to achieve coordination among a large fleet. This paper designs a hybrid approach that leverages the strengths of the two while overcoming their drawbacks. Specifically, it trains an optimization proxy, i.e., a machine-learning model that approximates an optimization model, and then refines the proxy with reinforcement learning. This Reinforcement Learning from Optimization Proxy (RLOP) approach is computationally efficient to train and deploy, and achieves better results than RL or optimization alone. Numerical experiments on the New York City dataset show that the RLOP approach reduces both the relocation costs and computation time significantly compared to the optimization model, while pure reinforcement learning fails to converge due to computational complexity.\u0000\u0000\u0000","PeriodicalId":54877,"journal":{"name":"Journal of Artificial Intelligence Research","volume":"90 1","pages":"985-1002"},"PeriodicalIF":5.0,"publicationDate":"2022-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83907912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}