Fast MCVI Based on Improved NSGA2
Authors: Yin Liu, Yingping Zhou, Shuai Chen
Venue: 2014 Sixth International Conference on Intelligent Human-Machine Systems and Cybernetics
Published: 2014-08-26
DOI: 10.1109/IHMSC.2014.38 (https://doi.org/10.1109/IHMSC.2014.38)

Partially observable Markov decision processes (POMDPs) are now widely used in many fields. Solving a POMDP is computationally prohibitive because of the curse of dimensionality, and Monte Carlo Value Iteration (MCVI) is regarded as a promising approach to breaking that curse. Although MCVI is a major advance, it still has defects, notably a slow convergence rate and a policy graph whose node count grows continuously. This paper therefore proposes a fast MCVI based on an improved NSGA2. Unlike the general NSGA2, the improved NSGA2 initializes the population from experiential knowledge and uses self-adjusting values as the crossover and mutation probabilities. Before running MCVI, the algorithm sets a series of thresholds. When an intermediate policy graph reaches one of the thresholds, the algorithm applies a discount operator to update that threshold and uses the improved NSGA2 to update the policy graph. It then runs MCVI again and repeats this process until termination. Numerical experiments on the classic corridor problem show that the fast MCVI improves the convergence rate by about 8% over the original MCVI and reduces the number of policy-graph nodes by about 60%.
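
The alternation described above (run MCVI, check thresholds, update the threshold, update the policy graph, resume MCVI) can be sketched as a small control loop. This is only an illustrative sketch, not the authors' implementation: the abstract does not say what quantity the thresholds are measured on or how the discount operator is defined, so the sketch assumes the thresholds apply to the node count and that the discount operator multiplies the triggered threshold by a constant factor. The names `fast_mcvi_loop`, `mcvi_step`, `nsga2_update`, `size`, and `discount` are hypothetical, and the MCVI backup and the improved NSGA2 are passed in as caller-supplied functions rather than implemented here.

```python
from typing import Callable, List, Tuple, TypeVar

G = TypeVar("G")  # opaque policy-graph type; its real structure is not modeled here


def fast_mcvi_loop(
    graph: G,
    mcvi_step: Callable[[G], G],      # hypothetical: one MCVI iteration
    nsga2_update: Callable[[G], G],   # hypothetical: stand-in for the improved NSGA2
    size: Callable[[G], int],         # assumption: thresholds are on node count
    thresholds: List[float],
    discount: float,                  # assumption: multiplicative discount operator
    max_iters: int,
) -> Tuple[G, List[float]]:
    """Alternate MCVI steps with threshold-triggered policy-graph updates."""
    thresholds = list(thresholds)  # copy so the caller's list is not mutated
    for _ in range(max_iters):
        graph = mcvi_step(graph)
        for i, t in enumerate(thresholds):
            if size(graph) >= t:
                # Update the threshold that was reached, then update the graph.
                thresholds[i] = t * discount
                graph = nsga2_update(graph)
                break  # handle at most one threshold per MCVI step
    return graph, thresholds
```

With toy stand-ins (a list as the graph, a step that appends two nodes, and an update that keeps half of them), the loop can be exercised without any POMDP machinery; a real use would substitute an actual MCVI backup and a multi-objective optimizer for the two callables.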