Niannian Wu, Zongyu Yang, Rongpeng Li, Ning Wei, Yihang Chen, Qianyun Dong, Jiyuan Li, Guohui Zheng, Xinwen Gong, Feng Gao, Bo Li, Min Xu, Zhifeng Zhao, Wulyu Zhong
High-Fidelity Data-Driven Dynamics Model for Reinforcement Learning-based Magnetic Control in HL-3 Tokamak
arXiv - PHYS - Plasma Physics, 2024-09-14. https://doi.org/arxiv-2409.09238
Citation count: 0
Abstract
Controlling tokamaks, a leading approach to nuclear fusion, is
essential because fusion could provide a virtually unlimited source of clean
energy. Reinforcement learning (RL) promises improved flexibility in managing the
intricate, non-linear dynamics of the plasma confined in a tokamak.
However, RL typically requires substantial interaction with a simulator capable
of accurately evolving the high-dimensional plasma state. First-principle-based
simulators are computationally intensive, which makes RL training sluggish.
As an alternative, we devise an effective method for obtaining a fully data-driven
simulator by mitigating the compounding-error issue that arises from its
underlying autoregressive nature. With high accuracy and appealing
extrapolation capability, this high-fidelity dynamics model subsequently
enables the rapid training of a qualified RL agent to directly generate
engineering-reasonable magnetic coil commands, aiming at the desired long-term
targets of plasma current and last closed flux surface. Together with EFITNN, a
surrogate magnetic equilibrium reconstruction model, the RL agent
successfully maintains 100-ms trajectory control at 1 kHz with accurate
waveform tracking on the HL-3 tokamak. It also demonstrates the
feasibility of zero-shot adaptation to changed triangularity targets,
confirming the robustness of the developed data-driven dynamics model. Our work
underscores the advantage of fully data-driven dynamics models in yielding
RL-based trajectory control policies at a sufficiently fast pace, an
anticipated engineering requirement in daily discharge practices for the
upcoming ITER device.
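
The compounding-error issue mentioned in the abstract can be illustrated with a toy sketch. The code below is not the paper's model: it assumes a hypothetical scalar linear system in place of the plasma state, and a "learned" model whose only flaw is a small parameter error. All function names and numbers are invented for illustration. It compares the model's error when it is fed the true state at every step with its error when it is fed its own previous prediction, as happens in an autoregressive simulator.

```python
# Toy illustration only: a scalar linear system stands in for the plasma state.
# All names and numbers here are hypothetical and not taken from the paper.

def true_step(x, u, a=0.98, b=0.1):
    """Ground-truth one-step dynamics x_{t+1} = a*x_t + b*u_t."""
    return a * x + b * u

def model_step(x, u, a_hat=0.99, b_hat=0.1):
    """Learned one-step model with a small error in a_hat."""
    return a_hat * x + b_hat * u

def rollout_errors(x0, actions):
    """Return per-step errors for one-step and autoregressive prediction."""
    x_true = x0
    x_auto = x0
    one_step, autoregressive = [], []
    for u in actions:
        # One-step prediction: the model is given the true current state.
        pred_one_step = model_step(x_true, u)
        # Autoregressive prediction: the model is fed its own last output.
        x_auto = model_step(x_auto, u)
        # Advance the ground truth and record both absolute errors.
        x_true = true_step(x_true, u)
        one_step.append(abs(pred_one_step - x_true))
        autoregressive.append(abs(x_auto - x_true))
    return one_step, autoregressive

actions = [1.0] * 100  # constant hypothetical coil command
one_step, autoregressive = rollout_errors(1.0, actions)
print(f"one-step error at final step:       {one_step[-1]:.4f}")
print(f"autoregressive error at final step: {autoregressive[-1]:.4f}")
```

In this sketch the two errors coincide at the first step, because both predictions start from the same state, and then diverge as the autoregressive rollout accumulates its own mistakes. This is the effect a data-driven simulator has to keep in check before it can stand in for a first-principle-based simulator during RL training.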