{"title":"基于无模型多维q函数深度强化学习的乒乓任务策略","authors":"H. Ma, Jianyin Fan, Qiang Wang","doi":"10.1109/ICSAI57119.2022.10005466","DOIUrl":null,"url":null,"abstract":"Deep reinforcement learning has been widely used in table tennis decision-making tasks, but most methods have their own defects, such as relying on high-precision trajectory prediction work or requiring targeted class training, etc. None of the methods can directly get a complete hitting strategy through the initial state of the ball. In this paper, we train a ping-pong hitting policy controller with model-free reinforcement learning. By extending the multi-dimensional Q-function, the prediction part of the table tennis task and the batting strategy part are integrated, the trajectory prediction work and the batting work are completed by a single network, which simplifies the complex prediction process and does not need to build a complex dynamics model or train a neural network to predict the trajectory of ping-pong balls. In this way, the deep reinforcement learning process and supervised trajectory prediction training process are organized into a single process. The experimental results show that the best convergence effect can be basically achieved in 50,000 rounds of training. 10,000 tests were performed as a test set with a success rate of over 99%.","PeriodicalId":339547,"journal":{"name":"2022 8th International Conference on Systems and Informatics (ICSAI)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Novel Ping-pong Task Strategy Based on Model-free Multi-dimensional Q-function Deep Reinforcement Learning\",\"authors\":\"H. Ma, Jianyin Fan, Qiang Wang\",\"doi\":\"10.1109/ICSAI57119.2022.10005466\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep reinforcement learning has been widely used in table tennis decision-making tasks, but most methods have their own defects, such as relying on high-precision trajectory prediction work or requiring targeted class training, etc. None of the methods can directly get a complete hitting strategy through the initial state of the ball. In this paper, we train a ping-pong hitting policy controller with model-free reinforcement learning. By extending the multi-dimensional Q-function, the prediction part of the table tennis task and the batting strategy part are integrated, the trajectory prediction work and the batting work are completed by a single network, which simplifies the complex prediction process and does not need to build a complex dynamics model or train a neural network to predict the trajectory of ping-pong balls. In this way, the deep reinforcement learning process and supervised trajectory prediction training process are organized into a single process. The experimental results show that the best convergence effect can be basically achieved in 50,000 rounds of training. 
10,000 tests were performed as a test set with a success rate of over 99%.\",\"PeriodicalId\":339547,\"journal\":{\"name\":\"2022 8th International Conference on Systems and Informatics (ICSAI)\",\"volume\":\"52 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 8th International Conference on Systems and Informatics (ICSAI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICSAI57119.2022.10005466\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 8th International Conference on Systems and Informatics (ICSAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSAI57119.2022.10005466","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Novel Ping-pong Task Strategy Based on Model-free Multi-dimensional Q-function Deep Reinforcement Learning
Deep reinforcement learning has been widely applied to table tennis decision-making tasks, but most existing methods have drawbacks, such as depending on high-precision trajectory prediction or requiring dedicated per-class training, and none of them can derive a complete hitting strategy directly from the ball's initial state. In this paper, we train a ping-pong hitting policy controller with model-free reinforcement learning. By extending a multi-dimensional Q-function, the prediction and hitting-strategy parts of the table tennis task are integrated, so that trajectory prediction and hitting are handled by a single network. This simplifies the complex prediction process and removes the need to build a dynamics model or train a separate neural network to predict ball trajectories; the deep reinforcement learning process and the supervised trajectory-prediction training are thereby merged into a single procedure. Experimental results show that near-optimal convergence is essentially reached within 50,000 rounds of training, and on a test set of 10,000 trials the success rate exceeds 99%.
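The abstract gives no implementation details, but the core idea, a single Q-network whose output spans several action dimensions and which maps the ball's initial state straight to a hitting strategy, can be sketched as below. Everything in this sketch (the state layout, the three hitting parameters, the per-dimension discretization, the layer sizes, and the name MultiDimQNet) is an illustrative assumption, not the authors' implementation.

```python
# Minimal sketch of a "multi-dimensional Q-function": one network, one head
# per action dimension, mapping the ball's initial state directly to a
# complete hitting action. All dimensions and sizes below are assumptions.
import torch
import torch.nn as nn

STATE_DIM = 6      # assumed: initial ball position (x, y, z) and velocity (vx, vy, vz)
ACTION_DIMS = 3    # assumed: e.g. paddle x-position, y-position, swing speed
BINS_PER_DIM = 11  # assumed discretization of each action dimension

class MultiDimQNet(nn.Module):
    """Q-network with one head per action dimension.

    Instead of one Q-value per joint action (exponential in ACTION_DIMS),
    each head scores the bins of a single action dimension, giving a
    (batch, ACTION_DIMS, BINS_PER_DIM) tensor of Q-values.
    """
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )
        self.heads = nn.ModuleList(
            nn.Linear(128, BINS_PER_DIM) for _ in range(ACTION_DIMS)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.trunk(state)
        return torch.stack([head(h) for head in self.heads], dim=1)

# Greedy policy: a single forward pass from the initial ball state yields a
# complete hitting action, with no intermediate trajectory-prediction stage.
net = MultiDimQNet()
initial_ball_state = torch.randn(1, STATE_DIM)  # placeholder observation
q_values = net(initial_ball_state)              # (1, ACTION_DIMS, BINS_PER_DIM)
action_bins = q_values.argmax(dim=-1)           # one bin index per dimension
print(action_bins)
```

Taking an argmax per head keeps the action space linear in the number of dimensions rather than exponential, which is one plausible reading of how a multi-dimensional Q-function lets a single network subsume both the prediction and the hitting-strategy parts of the task.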