{"title":"基于无模型多维q函数深度强化学习的乒乓任务策略","authors":"H. Ma, Jianyin Fan, Qiang Wang","doi":"10.1109/ICSAI57119.2022.10005466","DOIUrl":null,"url":null,"abstract":"Deep reinforcement learning has been widely used in table tennis decision-making tasks, but most methods have their own defects, such as relying on high-precision trajectory prediction work or requiring targeted class training, etc. None of the methods can directly get a complete hitting strategy through the initial state of the ball. In this paper, we train a ping-pong hitting policy controller with model-free reinforcement learning. By extending the multi-dimensional Q-function, the prediction part of the table tennis task and the batting strategy part are integrated, the trajectory prediction work and the batting work are completed by a single network, which simplifies the complex prediction process and does not need to build a complex dynamics model or train a neural network to predict the trajectory of ping-pong balls. In this way, the deep reinforcement learning process and supervised trajectory prediction training process are organized into a single process. The experimental results show that the best convergence effect can be basically achieved in 50,000 rounds of training. 10,000 tests were performed as a test set with a success rate of over 99%.","PeriodicalId":339547,"journal":{"name":"2022 8th International Conference on Systems and Informatics (ICSAI)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Novel Ping-pong Task Strategy Based on Model-free Multi-dimensional Q-function Deep Reinforcement Learning\",\"authors\":\"H. Ma, Jianyin Fan, Qiang Wang\",\"doi\":\"10.1109/ICSAI57119.2022.10005466\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep reinforcement learning has been widely used in table tennis decision-making tasks, but most methods have their own defects, such as relying on high-precision trajectory prediction work or requiring targeted class training, etc. None of the methods can directly get a complete hitting strategy through the initial state of the ball. In this paper, we train a ping-pong hitting policy controller with model-free reinforcement learning. By extending the multi-dimensional Q-function, the prediction part of the table tennis task and the batting strategy part are integrated, the trajectory prediction work and the batting work are completed by a single network, which simplifies the complex prediction process and does not need to build a complex dynamics model or train a neural network to predict the trajectory of ping-pong balls. In this way, the deep reinforcement learning process and supervised trajectory prediction training process are organized into a single process. The experimental results show that the best convergence effect can be basically achieved in 50,000 rounds of training. 
10,000 tests were performed as a test set with a success rate of over 99%.\",\"PeriodicalId\":339547,\"journal\":{\"name\":\"2022 8th International Conference on Systems and Informatics (ICSAI)\",\"volume\":\"52 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 8th International Conference on Systems and Informatics (ICSAI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICSAI57119.2022.10005466\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 8th International Conference on Systems and Informatics (ICSAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSAI57119.2022.10005466","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Novel Ping-pong Task Strategy Based on Model-free Multi-dimensional Q-function Deep Reinforcement Learning
Deep reinforcement learning has been widely applied to table tennis decision-making tasks, but most existing methods have drawbacks, such as depending on high-precision trajectory prediction or requiring dedicated per-class training, and none of them can derive a complete hitting strategy directly from the ball's initial state. In this paper, we train a ping-pong hitting policy controller with model-free reinforcement learning. By extending a multi-dimensional Q-function, the prediction and hitting-strategy parts of the table tennis task are integrated, so that trajectory prediction and hitting are handled by a single network. This simplifies the complex prediction process and removes the need to build a dynamics model or train a separate neural network to predict ball trajectories; the deep reinforcement learning process and the supervised trajectory-prediction training are thereby merged into a single procedure. Experimental results show that near-optimal convergence is essentially reached within 50,000 rounds of training, and on a test set of 10,000 trials the success rate exceeds 99%.
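The abstract gives no implementation details, but the core idea, a single Q-network whose output spans several action dimensions and which maps the ball's initial state straight to a hitting strategy, can be sketched as below. Everything in this sketch (the state layout, the three hitting parameters, the per-dimension discretization, the layer sizes, and the name MultiDimQNet) is an illustrative assumption, not the authors' implementation.

```python
# Minimal sketch of a "multi-dimensional Q-function": one network, one head
# per action dimension, mapping the ball's initial state directly to a
# complete hitting action. All dimensions and sizes below are assumptions.
import torch
import torch.nn as nn

STATE_DIM = 6      # assumed: initial ball position (x, y, z) and velocity (vx, vy, vz)
ACTION_DIMS = 3    # assumed: e.g. paddle x-position, y-position, swing speed
BINS_PER_DIM = 11  # assumed discretization of each action dimension

class MultiDimQNet(nn.Module):
    """Q-network with one head per action dimension.

    Instead of one Q-value per joint action (exponential in ACTION_DIMS),
    each head scores the bins of a single action dimension, giving a
    (batch, ACTION_DIMS, BINS_PER_DIM) tensor of Q-values.
    """
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )
        self.heads = nn.ModuleList(
            nn.Linear(128, BINS_PER_DIM) for _ in range(ACTION_DIMS)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.trunk(state)
        return torch.stack([head(h) for head in self.heads], dim=1)

# Greedy policy: a single forward pass from the initial ball state yields a
# complete hitting action, with no intermediate trajectory-prediction stage.
net = MultiDimQNet()
initial_ball_state = torch.randn(1, STATE_DIM)  # placeholder observation
q_values = net(initial_ball_state)              # (1, ACTION_DIMS, BINS_PER_DIM)
action_bins = q_values.argmax(dim=-1)           # one bin index per dimension
print(action_bins)
```

Taking an argmax per head keeps the action space linear in the number of dimensions rather than exponential, which is one plausible reading of how a multi-dimensional Q-function lets a single network subsume both the prediction and the hitting-strategy parts of the task.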