{"title":"通过策略共享加速学习","authors":"Kao-Shing Hwang, Yu-Jen Chen, Wei-Cheng Jiang","doi":"10.1109/WCICA.2011.5970609","DOIUrl":null,"url":null,"abstract":"Reinforcement learning is one of the more prominent machine learning technologies due to its unsupervised learning structure and ability to continually learn, even in a dynamic operating environment. Applying this learning to cooperative multi-agent systems not only allows each individual agent to learn from its own experience, but also offers the opportunity for the individual agents to learn from the other agents in the system to increase the speed of learning can be accelerated. In the proposed learning algorithm, an agent store its experience in terms of state aggregation implemented with a decision tree, such that policy sharing between multi-agent is eventually accomplished by merging different decision trees between peers. Unlike lookup tables which have homogeneous structure for state aggregations, decision trees carried in agents are with heterogeneous structure. This work executes policy sharing between cooperative agents by means of forming a hyper structure from their trees instead of merging whole trees violently. The proposed scheme initially translates the whole decision tree from an agent to others. Based on the evidence, only partial leaf nodes hold helpful experience for policy sharing. The proposed method inducts a hyper decision tree by a great mount of samples which are sampled from the shared nodes. 
Results from simulations in multi-agent cooperative domain illustrate that the proposed algorithms perform better than the one without sharing.","PeriodicalId":211049,"journal":{"name":"2011 9th World Congress on Intelligent Control and Automation","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2011-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Learning acceleration by policy sharing\",\"authors\":\"Kao-Shing Hwang, Yu-Jen Chen, Wei-Cheng Jiang\",\"doi\":\"10.1109/WCICA.2011.5970609\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reinforcement learning is one of the more prominent machine learning technologies due to its unsupervised learning structure and ability to continually learn, even in a dynamic operating environment. Applying this learning to cooperative multi-agent systems not only allows each individual agent to learn from its own experience, but also offers the opportunity for the individual agents to learn from the other agents in the system to increase the speed of learning can be accelerated. In the proposed learning algorithm, an agent store its experience in terms of state aggregation implemented with a decision tree, such that policy sharing between multi-agent is eventually accomplished by merging different decision trees between peers. Unlike lookup tables which have homogeneous structure for state aggregations, decision trees carried in agents are with heterogeneous structure. This work executes policy sharing between cooperative agents by means of forming a hyper structure from their trees instead of merging whole trees violently. The proposed scheme initially translates the whole decision tree from an agent to others. Based on the evidence, only partial leaf nodes hold helpful experience for policy sharing. 
The proposed method inducts a hyper decision tree by a great mount of samples which are sampled from the shared nodes. Results from simulations in multi-agent cooperative domain illustrate that the proposed algorithms perform better than the one without sharing.\",\"PeriodicalId\":211049,\"journal\":{\"name\":\"2011 9th World Congress on Intelligent Control and Automation\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-06-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 9th World Congress on Intelligent Control and Automation\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/WCICA.2011.5970609\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 9th World Congress on Intelligent Control and Automation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WCICA.2011.5970609","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
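The sharing scheme the abstract describes (keep only the leaf nodes that hold helpful experience, sample states from their regions, and induce a fresh tree from those samples) can be sketched roughly as follows. This is an illustrative toy with a 1-D state space; the `Leaf` class, the value threshold used to select shared leaves, and the trivial split-point induction are all assumptions for the sake of a runnable example, not the authors' implementation:

```python
import random

# Each leaf aggregates an interval of a 1-D state space and stores the
# greedy action for that region plus an estimated value, which is used
# here to decide whether the leaf is worth sharing with peers.
class Leaf:
    def __init__(self, lo, hi, action, value):
        self.lo, self.hi = lo, hi   # state interval covered by the leaf
        self.action = action        # greedy action for this region
        self.value = value          # estimated value of the region

def sample_from_shared(leaves, value_threshold, n_per_leaf, rng):
    """Draw (state, action) examples only from leaves whose estimated
    value exceeds a threshold -- the 'helpful experience' to share."""
    samples = []
    for leaf in leaves:
        if leaf.value >= value_threshold:
            for _ in range(n_per_leaf):
                s = rng.uniform(leaf.lo, leaf.hi)
                samples.append((s, leaf.action))
    return samples

def induce_tree(samples):
    """Toy induction for 1-D states: sort samples by state and emit a
    split wherever the action label changes, yielding a piecewise-
    constant policy (a stand-in for the induced hyper decision tree)."""
    samples.sort()
    regions = []
    start, action = samples[0][0], samples[0][1]
    for s, a in samples[1:]:
        if a != action:
            regions.append((start, s, action))
            start, action = s, a
    regions.append((start, samples[-1][0], action))
    return regions

rng = random.Random(0)
# Two peers with heterogeneous trees over the same state space.
peer_a = [Leaf(0.0, 0.5, "left", 0.9), Leaf(0.5, 1.0, "right", 0.1)]
peer_b = [Leaf(0.0, 0.3, "left", 0.2), Leaf(0.3, 1.0, "right", 0.8)]
samples = sample_from_shared(peer_a + peer_b, 0.5, 20, rng)
hyper = induce_tree(samples)  # policy induced from the shared leaves only
```

Note that only the high-value leaf from each peer contributes samples, so the induced tree blends the best experience of both agents instead of merging their trees wholesale.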