Niannian Wu, Zongyu Yang, Rongpeng Li, Ning Wei, Yihang Chen, Qianyun Dong, Jiyuan Li, Guohui Zheng, Xinwen Gong, Feng Gao, Bo Li, Min Xu, Zhifeng Zhao, Wulyu Zhong
arXiv - PHYS - Plasma Physics, published 2024-09-14, arXiv:2409.09238
High-Fidelity Data-Driven Dynamics Model for Reinforcement Learning-based Magnetic Control in HL-3 Tokamak
Controlling tokamaks, a prominent technology in nuclear fusion, is
essential because fusion could provide a virtually unlimited source of clean
energy. Reinforcement learning (RL) promises improved flexibility to manage the
intricate and non-linear dynamics of the plasma encapsulated in a tokamak.
However, RL typically requires substantial interaction with a simulator capable
of accurately evolving the high-dimensional plasma state. Because
first-principles simulators are computationally intensive and make RL
training sluggish, we devise an effective method to obtain a fully data-driven
simulator by mitigating the compounding error that arises from its
underlying autoregressive nature. With high accuracy and appealing
extrapolation capability, this high-fidelity dynamics model subsequently
enables the rapid training of a qualified RL agent that directly generates
engineering-reasonable magnetic coil commands to track the desired long-term
targets of plasma current and last closed flux surface. Together with a
surrogate magnetic equilibrium reconstruction model EFITNN, the RL agent
successfully maintains $100$-ms, $1$ kHz trajectory control with accurate
waveform tracking on the HL-3 tokamak. Furthermore, it demonstrates the
feasibility of zero-shot adaptation to changed triangularity targets,
confirming the robustness of the developed data-driven dynamics model. Our work
underscores the advantage of fully data-driven dynamics models in yielding
RL-based trajectory control policies at a sufficiently fast pace, an
anticipated engineering requirement in daily discharge practices for the
upcoming ITER device.
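
The compounding error mentioned above can be illustrated with a toy example. The sketch below is not the authors' model or method: it assumes a hypothetical scalar linear system and a "learned" model whose single parameter is slightly wrong (`a_true`, `a_model`, `b`, and the constant input are all made up for illustration). It compares the one-step error, where the model always starts from the true state, with the autoregressive error, where the model is fed its own previous prediction. The horizon of 100 steps mirrors a 100-ms trajectory sampled at 1 kHz.

```python
def step(x, u, a, b):
    """One step of a toy scalar linear system: x_next = a * x + b * u."""
    return a * x + b * u


def rollout_errors(a_true=0.95, a_model=0.97, b=0.1, horizon=100):
    """Compare one-step and autoregressive prediction errors.

    All parameters are hypothetical; the model differs from the true
    system only through a small mismatch in the coefficient `a`.
    """
    x_true = 0.0   # state of the true system
    x_auto = 0.0   # state of the model when fed its own predictions
    one_step, autoregressive = [], []
    for _ in range(horizon):
        u = 1.0  # constant input, chosen only for illustration
        next_true = step(x_true, u, a_true, b)
        # One-step prediction: the model starts from the true state.
        one_step.append(abs(step(x_true, u, a_model, b) - next_true))
        # Autoregressive prediction: the model starts from its own output.
        x_auto = step(x_auto, u, a_model, b)
        autoregressive.append(abs(x_auto - next_true))
        x_true = next_true
    return one_step, autoregressive


one_step, autoregressive = rollout_errors()
print(f"max one-step error:         {max(one_step):.4f}")
print(f"final autoregressive error: {autoregressive[-1]:.4f}")
```

In this toy setting the one-step error stays small at every step, while the autoregressive error grows over the rollout because each prediction inherits the error of the previous one. This gap is the reason a data-driven simulator used for RL training has to be evaluated on long rollouts rather than on one-step accuracy alone.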