{"title":"Automating proton PBS treatment planning for head and neck cancers using policy gradient-based deep reinforcement learning","authors":"Qingqing Wang, Chang Chang","doi":"arxiv-2409.11576","DOIUrl":null,"url":null,"abstract":"Proton pencil beam scanning (PBS) treatment planning for head and neck (H&N)\ncancers is a time-consuming and experience-demanding task where a large number\nof planning objectives are involved. Deep reinforcement learning (DRL) has\nrecently been introduced to the planning processes of intensity-modulated\nradiation therapy and brachytherapy for prostate, lung, and cervical cancers.\nHowever, existing approaches are built upon the Q-learning framework and\nweighted linear combinations of clinical metrics, suffering from poor\nscalability and flexibility and only capable of adjusting a limited number of\nplanning objectives in discrete action spaces. We propose an automatic\ntreatment planning model using the proximal policy optimization (PPO) algorithm\nand a dose distribution-based reward function for proton PBS treatment planning\nof H&N cancers. Specifically, a set of empirical rules is used to create\nauxiliary planning structures from target volumes and organs-at-risk (OARs),\nalong with their associated planning objectives. These planning objectives are\nfed into an in-house optimization engine to generate the spot monitor unit (MU)\nvalues. A decision-making policy network trained using PPO is developed to\niteratively adjust the involved planning objective parameters in a continuous\naction space and refine the PBS treatment plans using a novel dose\ndistribution-based reward function. Proton H&N treatment plans generated by the\nmodel show improved OAR sparing with equal or superior target coverage when\ncompared with human-generated plans. Moreover, additional experiments on liver\ncancer demonstrate that the proposed method can be successfully generalized to\nother treatment sites. To the best of our knowledge, this is the first\nDRL-based automatic treatment planning model capable of achieving human-level\nperformance for H&N cancers.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"22 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - QuanBio - Quantitative Methods","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11576","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Proton pencil beam scanning (PBS) treatment planning for head and neck (H&N)
cancers is a time-consuming and experience-demanding task where a large number
of planning objectives are involved. Deep reinforcement learning (DRL) has
recently been introduced to the planning processes of intensity-modulated
radiation therapy and brachytherapy for prostate, lung, and cervical cancers.
However, existing approaches are built upon the Q-learning framework and
weighted linear combinations of clinical metrics, suffering from poor
scalability and flexibility and only capable of adjusting a limited number of
planning objectives in discrete action spaces. We propose an automatic
treatment planning model using the proximal policy optimization (PPO) algorithm
and a dose distribution-based reward function for proton PBS treatment planning
of H&N cancers. Specifically, a set of empirical rules is used to create
auxiliary planning structures from target volumes and organs-at-risk (OARs),
along with their associated planning objectives. These planning objectives are
fed into an in-house optimization engine to generate the spot monitor unit (MU)
values. A decision-making policy network trained using PPO is developed to
iteratively adjust the involved planning objective parameters in a continuous
action space and refine the PBS treatment plans using a novel dose
distribution-based reward function. Proton H&N treatment plans generated by the
model show improved OAR sparing with equal or superior target coverage when
compared with human-generated plans. Moreover, additional experiments on liver
cancer demonstrate that the proposed method can be successfully generalized to
other treatment sites. To the best of our knowledge, this is the first
DRL-based automatic treatment planning model capable of achieving human-level
performance for H&N cancers.