{"title":"Mastering table tennis with hierarchy: a reinforcement learning approach with progressive self-play training","authors":"Hongxu Ma, Jianyin Fan, Haoran Xu, Qiang Wang","doi":"10.1007/s10489-025-06450-0","DOIUrl":null,"url":null,"abstract":"<div><p>Hierarchical Reinforcement Learning (HRL) is widely applied in various complex task scenarios. In complex tasks where simple model-free reinforcement learning struggles, hierarchical design allows for more efficient utilization of interactive data, significantly reducing training costs and improving training success rates. This study delves into the use of HRL based on the model-free policy layer to learn complex strategies for a robotic arm playing table tennis. Through processes such as pre-training, self-play training, and self-play training with top-level winning strategies, the robustness of the lower-level hitting strategies has been enhanced. Furthermore, a novel decay reward mechanism has been employed in the training of the higher-level agent to improve the win rate in adversarial matches against other methods. After pre-training and adversarial training, we achieved an average of 52 rally cycles for the forehand strategy and 48 rally cycles for the backhand strategy in testing. The high-level strategy training based on the decay reward mechanism resulted in an advantageous score when competing against other strategies.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 6","pages":""},"PeriodicalIF":3.4000,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Intelligence","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10489-025-06450-0","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Hierarchical Reinforcement Learning (HRL) is widely applied in various complex task scenarios. In complex tasks where simple model-free reinforcement learning struggles, hierarchical design allows for more efficient utilization of interactive data, significantly reducing training costs and improving training success rates. This study delves into the use of HRL based on the model-free policy layer to learn complex strategies for a robotic arm playing table tennis. Through processes such as pre-training, self-play training, and self-play training with top-level winning strategies, the robustness of the lower-level hitting strategies has been enhanced. Furthermore, a novel decay reward mechanism has been employed in the training of the higher-level agent to improve the win rate in adversarial matches against other methods. After pre-training and adversarial training, we achieved an average of 52 rally cycles for the forehand strategy and 48 rally cycles for the backhand strategy in testing. The high-level strategy training based on the decay reward mechanism resulted in an advantageous score when competing against other strategies.
期刊介绍:
With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance.
The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.