High-Fidelity Data-Driven Dynamics Model for Reinforcement Learning-based Magnetic Control in HL-3 Tokamak

Niannian Wu, Zongyu Yang, Rongpeng Li, Ning Wei, Yihang Chen, Qianyun Dong, Jiyuan Li, Guohui Zheng, Xinwen Gong, Feng Gao, Bo Li, Min Xu, Zhifeng Zhao, Wulyu Zhong
Identifier: arxiv-2409.09238
Source: arXiv - PHYS - Plasma Physics
Published: 2024-09-14
Citations: 0

Abstract

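Controlling tokamaks, a prominent nuclear fusion technology, is essential because of their potential to provide a virtually unlimited source of clean energy. Reinforcement learning (RL) promises greater flexibility in managing the intricate, non-linear dynamics of the plasma confined in a tokamak. However, RL typically requires extensive interaction with a simulator that can accurately evolve the high-dimensional plasma state. First-principle-based simulators are computationally intensive and therefore make RL training sluggish; instead, we devise an effective method to obtain a fully data-driven simulator, mitigating the compounding-error problem that arises from its autoregressive nature. With high accuracy and appealing extrapolation capability, this high-fidelity dynamics model enables the rapid training of a qualified RL agent that directly generates engineering-reasonable magnetic coil commands to reach the desired long-term targets for plasma current and the last closed flux surface. Combined with EFITNN, a surrogate magnetic equilibrium reconstruction model, the RL agent successfully maintains 100 ms of trajectory control at 1 kHz with accurate waveform tracking on the HL-3 tokamak. The agent also demonstrates the feasibility of zero-shot adaptation to changed triangularity targets, confirming the robustness of the developed data-driven dynamics model. Our work underscores the advantage of fully data-driven dynamics models in producing RL-based trajectory control policies fast enough to meet an anticipated engineering requirement of daily discharge practice on the upcoming ITER device.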
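The abstract does not describe the model architecture or the specific mitigation method, so the following is only a toy sketch of *why* compounding error matters for an autoregressive simulator. It assumes a hypothetical scalar linear system standing in for the high-dimensional plasma state, with made-up coefficients. The horizon comes from the abstract: 100 ms at a 1 kHz control rate is 100 autoregressive steps. `multistep_loss` is a generic rollout (multi-step) objective often used against compounding error; it is not claimed to be the authors' method.

```python
"""Toy illustration of compounding error in an autoregressive dynamics model.

Assumption: a scalar linear system x_{t+1} = a * x_t + b * u_t stands in for the
plasma state; all coefficients used below are hypothetical.
"""

CONTROL_HZ = 1_000   # 1 kHz control rate (from the abstract)
DURATION_MS = 100    # 100 ms controlled window (from the abstract)
HORIZON = CONTROL_HZ * DURATION_MS // 1_000  # number of control steps = 100


def step(x: float, u: float, a: float, b: float) -> float:
    """One step of the linear dynamics x_{t+1} = a * x_t + b * u_t."""
    return a * x + b * u


def simulate(x0: float, controls: list, a: float, b: float) -> list:
    """Roll the system forward, returning [x_0, x_1, ..., x_H]."""
    xs = [x0]
    for u in controls:
        xs.append(step(xs[-1], u, a, b))
    return xs


def one_step_errors(true_xs, controls, a_hat, b_hat):
    """Teacher forcing: each prediction starts from the *true* state."""
    return [abs(step(x, u, a_hat, b_hat) - x_next)
            for x, u, x_next in zip(true_xs[:-1], controls, true_xs[1:])]


def rollout_errors(true_xs, controls, a_hat, b_hat):
    """Autoregressive rollout: the model consumes its *own* predictions."""
    pred_xs = simulate(true_xs[0], controls, a_hat, b_hat)
    return [abs(p - t) for p, t in zip(pred_xs[1:], true_xs[1:])]


def multistep_loss(true_xs, controls, a_hat, b_hat):
    """Mean squared error over a full rollout (a generic multi-step objective)."""
    errs = rollout_errors(true_xs, controls, a_hat, b_hat)
    return sum(e * e for e in errs) / len(errs)


if __name__ == "__main__":
    A_TRUE, B_TRUE = 0.99, 0.10   # hypothetical "true" plant
    A_HAT, B_HAT = 0.995, 0.10    # hypothetical learned model, 0.5% off in a
    controls = [1.0] * HORIZON    # constant hypothetical coil command
    true_xs = simulate(0.0, controls, A_TRUE, B_TRUE)

    print("steps:", HORIZON)
    print("max one-step error:", max(one_step_errors(true_xs, controls, A_HAT, B_HAT)))
    print("final rollout error:", rollout_errors(true_xs, controls, A_HAT, B_HAT)[-1])
    print("multi-step loss:", multistep_loss(true_xs, controls, A_HAT, B_HAT))
```

In this toy setting, the largest one-step error is only about 0.03, yet after 100 autoregressive steps the prediction has drifted by roughly 1.5. A model that looks accurate under one-step evaluation can therefore be a poor simulator for an RL agent acting over the full 100 ms window, which is why training or evaluating on whole rollouts is a natural remedy to consider.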