Reinforcement learning in large, structured action spaces: A simulation study of decision support for spinal cord injury rehabilitation

Intelligence-based medicine. Pub Date: 2024-01-01. Epub Date: 2024-05-11. DOI: 10.1016/j.ibmed.2024.100137

Nathan Phelps, Stephanie Marrocco, Stephanie Cornell, Dalton L. Wolfe, Daniel J. Lizotte

Volume 9, Article 100137. Journal Article. Publisher page: https://www.sciencedirect.com/science/article/pii/S2666521224000048


Abstract

Reinforcement learning (RL) has helped improve decision-making in several domains but can be challenging to apply; this is the case for rehabilitation of people with a spinal cord injury (SCI). Among other factors, applying RL in this domain is difficult because there are many possible treatments (i.e., large action space) and few detailed records of longitudinal treatments and outcomes (i.e., limited training data). Applying Fitted Q Iteration in this domain with linear models and the most natural state and action representation results in problems with convergence and overfitting. However, isolating treatments from one another can mitigate the convergence issue, and treatments for SCIs have meaningful groupings that can be used to combat overfitting. We propose two approaches to grouping treatments so that an RL agent can learn effectively from limited data. One relies on domain knowledge of SCI rehabilitation and the other learns similarities among treatments using an embedding technique. After re-interpreting the data using these treatment grouping approaches in conjunction with our process that isolates the treatment groups, we use Fitted Q Iteration to train an agent that learns to select better treatments. Through a simulation study designed to reflect the properties of SCI rehabilitation, we find that agents trained after using either grouping method can help improve the treatment decisions of individual physiotherapists, but the approach based on domain knowledge offers better performance. Our findings provide a proof of concept that applying RL has the potential to help improve the treatment of those with an SCI and indicate that continued efforts to gather data and apply RL to this domain are worthwhile.
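
For readers unfamiliar with Fitted Q Iteration with linear models, the following is a minimal generic sketch, not the authors' implementation. It assumes a small discrete set of treatment groups and fits one ridge-regularized linear model per group, which is one possible reading of "isolating" groups from one another; the abstract does not specify the actual procedure. The function names, the ridge penalty, the discount factor, and the transition format are all hypothetical choices made for illustration.

```python
import numpy as np


def fitted_q_iteration(transitions, n_actions, gamma=0.9, n_iters=50, ridge=1e-3):
    """Batch Fitted Q Iteration with one linear model per action (treatment group).

    transitions: list of (state, action, reward, next_state, done) tuples,
    where state and next_state are 1-D feature vectors of equal length.
    Returns a weight matrix W of shape (n_actions, n_features + 1);
    the last column holds the bias term.
    """
    S = np.array([t[0] for t in transitions], dtype=float)
    A = np.array([t[1] for t in transitions], dtype=int)
    R = np.array([t[2] for t in transitions], dtype=float)
    S2 = np.array([t[3] for t in transitions], dtype=float)
    D = np.array([t[4] for t in transitions], dtype=bool)

    # Append a constant column so each linear model has an intercept.
    X = np.hstack([S, np.ones((len(S), 1))])
    X2 = np.hstack([S2, np.ones((len(S2), 1))])
    W = np.zeros((n_actions, X.shape[1]))

    for _ in range(n_iters):
        # Bootstrapped targets: r + gamma * max_a' Q(s', a'), zero after terminal steps.
        q_next = X2 @ W.T
        targets = R + gamma * np.where(D, 0.0, q_next.max(axis=1))
        # Assumption: each action's model is refit only on transitions
        # where that action was taken, so the models do not share weights.
        for a in range(n_actions):
            mask = A == a
            if not mask.any():
                continue  # unseen action keeps its previous weights
            Xa = X[mask]
            gram = Xa.T @ Xa + ridge * np.eye(Xa.shape[1])
            W[a] = np.linalg.solve(gram, Xa.T @ targets[mask])
    return W


def greedy_action(W, state):
    """Return the index of the action with the highest estimated Q-value."""
    x = np.append(np.asarray(state, dtype=float), 1.0)
    return int(np.argmax(W @ x))
```

In this sketch, grouping treatments corresponds to reducing `n_actions`: each group's model is fit on more transitions, which is the intuition behind using groupings to combat overfitting when data are limited.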


