Anna L Trella, Kelly W Zhang, Inbal Nahum-Shani, Vivek Shetty, Finale Doshi-Velez, Susan A Murphy
{"title":"支持口腔自我护理的在线强化学习算法的奖励设计","authors":"Anna L Trella, Kelly W Zhang, Inbal Nahum-Shani, Vivek Shetty, Finale Doshi-Velez, Susan A Murphy","doi":"10.1609/aaai.v37i13.26866","DOIUrl":null,"url":null,"abstract":"<p><p>While dental disease is largely preventable, professional advice on optimal oral hygiene practices is often forgotten or abandoned by patients. Therefore patients may benefit from timely and personalized encouragement to engage in oral self-care behaviors. In this paper, we develop an online reinforcement learning (RL) algorithm for use in optimizing the delivery of mobile-based prompts to encourage oral hygiene behaviors. One of the main challenges in developing such an algorithm is ensuring that the algorithm considers the impact of current actions on the effectiveness of future actions (i.e., delayed effects), especially when the algorithm has been designed to run stably and autonomously in a constrained, real-world setting characterized by highly noisy, sparse data. We address this challenge by designing a quality reward that maximizes the desired health outcome (i.e., high-quality brushing) while minimizing user burden. We also highlight a procedure for optimizing the hyperparameters of the reward by building a simulation environment test bed and evaluating candidates using the test bed. The RL algorithm discussed in this paper will be deployed in Oralytics. To the best of our knowledge, Oralytics is the first mobile health study utilizing an RL algorithm designed to prevent dental disease by optimizing the delivery of motivational messages supporting oral self-care behaviors.</p>","PeriodicalId":74524,"journal":{"name":"Proceedings of the ... Innovative Applications of Artificial Intelligence Conference. Innovative Applications of Artificial Intelligence Conference","volume":"37 13","pages":"15724-15730"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10457015/pdf/nihms-1851571.pdf","citationCount":"0","resultStr":"{\"title\":\"Reward Design For An Online Reinforcement Learning Algorithm Supporting Oral Self-Care.\",\"authors\":\"Anna L Trella, Kelly W Zhang, Inbal Nahum-Shani, Vivek Shetty, Finale Doshi-Velez, Susan A Murphy\",\"doi\":\"10.1609/aaai.v37i13.26866\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>While dental disease is largely preventable, professional advice on optimal oral hygiene practices is often forgotten or abandoned by patients. Therefore patients may benefit from timely and personalized encouragement to engage in oral self-care behaviors. In this paper, we develop an online reinforcement learning (RL) algorithm for use in optimizing the delivery of mobile-based prompts to encourage oral hygiene behaviors. One of the main challenges in developing such an algorithm is ensuring that the algorithm considers the impact of current actions on the effectiveness of future actions (i.e., delayed effects), especially when the algorithm has been designed to run stably and autonomously in a constrained, real-world setting characterized by highly noisy, sparse data. We address this challenge by designing a quality reward that maximizes the desired health outcome (i.e., high-quality brushing) while minimizing user burden. We also highlight a procedure for optimizing the hyperparameters of the reward by building a simulation environment test bed and evaluating candidates using the test bed. The RL algorithm discussed in this paper will be deployed in Oralytics. To the best of our knowledge, Oralytics is the first mobile health study utilizing an RL algorithm designed to prevent dental disease by optimizing the delivery of motivational messages supporting oral self-care behaviors.</p>\",\"PeriodicalId\":74524,\"journal\":{\"name\":\"Proceedings of the ... Innovative Applications of Artificial Intelligence Conference. Innovative Applications of Artificial Intelligence Conference\",\"volume\":\"37 13\",\"pages\":\"15724-15730\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10457015/pdf/nihms-1851571.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ... Innovative Applications of Artificial Intelligence Conference. Innovative Applications of Artificial Intelligence Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1609/aaai.v37i13.26866\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... Innovative Applications of Artificial Intelligence Conference. Innovative Applications of Artificial Intelligence Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1609/aaai.v37i13.26866","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Reward Design For An Online Reinforcement Learning Algorithm Supporting Oral Self-Care.
While dental disease is largely preventable, professional advice on optimal oral hygiene practices is often forgotten or abandoned by patients. Therefore patients may benefit from timely and personalized encouragement to engage in oral self-care behaviors. In this paper, we develop an online reinforcement learning (RL) algorithm for use in optimizing the delivery of mobile-based prompts to encourage oral hygiene behaviors. One of the main challenges in developing such an algorithm is ensuring that the algorithm considers the impact of current actions on the effectiveness of future actions (i.e., delayed effects), especially when the algorithm has been designed to run stably and autonomously in a constrained, real-world setting characterized by highly noisy, sparse data. We address this challenge by designing a quality reward that maximizes the desired health outcome (i.e., high-quality brushing) while minimizing user burden. We also highlight a procedure for optimizing the hyperparameters of the reward by building a simulation environment test bed and evaluating candidates using the test bed. The RL algorithm discussed in this paper will be deployed in Oralytics. To the best of our knowledge, Oralytics is the first mobile health study utilizing an RL algorithm designed to prevent dental disease by optimizing the delivery of motivational messages supporting oral self-care behaviors.