{"title":"Distributed reinforcement learning for multiple objective optimization problems","authors":"C. Mariano, E. Morales","doi":"10.1109/CEC.2000.870294","DOIUrl":null,"url":null,"abstract":"This paper describes the application and performance evaluation of a new algorithm for multiple objective optimization problems (MOOP) based on reinforcement learning. The new algorithm, called MDQL, considers a family of agents for each objective function involved in a MOOP. Each agent proposes a solution for its corresponding objective function. Agents leave traces while they construct solutions considering traces made by other agents. The solutions proposed by the agents are evaluated using a non-domination criterion and solutions in the final Pareto set for each iteration are rewarded. A mechanism for the application of MDQL in continuous spaces which considers a fixed set of possible actions for the states (the number of actions depends on the dimensionality of the MOOP), is also proposed. Each action represents a path direction and its magnitude is changed dynamically depending on the evaluation of the state that the agent reached. Constraint handling, based on reinforcement comparison, considers reference values for constraints, penalizing agents violating any of them proportionally to the violation committed. MDQL performance was measured with \"error ratio\" and \"spacing\" metrics on four test bed problems suggested in the literature, showing competitive results with state-of-the-art algorithms.","PeriodicalId":218136,"journal":{"name":"Proceedings of the 2000 Congress on Evolutionary Computation. CEC00 (Cat. No.00TH8512)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2000-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2000 Congress on Evolutionary Computation. CEC00 (Cat. No.00TH8512)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CEC.2000.870294","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 15
Abstract
This paper describes the application and performance evaluation of a new algorithm, based on reinforcement learning, for multiple objective optimization problems (MOOPs). The new algorithm, called MDQL, assigns a family of agents to each objective function involved in a MOOP. Each agent proposes a solution for its corresponding objective function. Agents leave traces as they construct solutions, taking into account the traces left by other agents. The solutions proposed by the agents are evaluated with a non-domination criterion, and the solutions in the final Pareto set of each iteration are rewarded. A mechanism for applying MDQL in continuous spaces is also proposed: each state has a fixed set of possible actions (the number of actions depends on the dimensionality of the MOOP), where each action represents a path direction whose magnitude changes dynamically according to the evaluation of the state the agent reaches. Constraint handling, based on reinforcement comparison, uses reference values for the constraints and penalizes agents that violate any of them in proportion to the violation committed. MDQL performance was measured with the "error ratio" and "spacing" metrics on four test-bed problems suggested in the literature, showing results competitive with state-of-the-art algorithms.
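The non-domination step can be made concrete with a short sketch. This is a minimal illustration under a minimization convention; the function names (`dominates`, `pareto_set`) and the reward step are assumptions for illustration, not the paper's implementation.

```python
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if objective vector `a` Pareto-dominates `b` (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_set(objectives: List[Sequence[float]]) -> List[int]:
    """Indices of the non-dominated solutions among the agents' proposals."""
    return [i for i, a in enumerate(objectives)
            if not any(dominates(b, a) for j, b in enumerate(objectives) if j != i)]

# Illustrative reward step: only agents whose solutions survive the
# non-domination filter receive a positive reinforcement signal.
proposals = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0)]  # one objective vector per agent
rewarded_agents = pareto_set(proposals)           # -> [0, 1]; (3.0, 3.0) is dominated
```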
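For constraint handling, the abstract states only that agents are penalized in proportion to the violation, relative to reference values for the constraints. Below is a minimal sketch assuming a linear penalty normalized by each constraint's reference value; the exact functional form used in the paper is not given in the abstract.

```python
def constraint_penalty(violations, references, base_penalty=-1.0):
    """Negative reinforcement proportional to each constraint violation,
    scaled by that constraint's reference value (hypothetical form).
    `violations[i]` is the amount by which constraint i is exceeded
    (0 or negative if the constraint is satisfied)."""
    return sum(base_penalty * (v / ref)
               for v, ref in zip(violations, references)
               if v > 0.0)
```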
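The two reported metrics have standard definitions in the multi-objective optimization literature: the error ratio is the fraction of found solutions that do not lie on the true Pareto front, and spacing (Schott's metric) measures how evenly the found solutions are distributed along the front. A sketch using those standard forms, on the assumption that the paper follows them:

```python
import math
from typing import List, Sequence

def error_ratio(found: List[Sequence[float]],
                true_front: List[Sequence[float]]) -> float:
    """Fraction of found solutions that are not members of the true Pareto front."""
    true = {tuple(p) for p in true_front}
    return sum(1 for p in found if tuple(p) not in true) / len(found)

def spacing(front: List[Sequence[float]]) -> float:
    """Standard deviation of nearest-neighbour distances (L1 metric in
    objective space); smaller values mean a more evenly spread front."""
    if len(front) < 2:
        return 0.0
    d = [min(sum(abs(x - y) for x, y in zip(a, b))
             for j, b in enumerate(front) if j != i)
         for i, a in enumerate(front)]
    mean = sum(d) / len(d)
    return math.sqrt(sum((mean - di) ** 2 for di in d) / (len(d) - 1))
```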