Stabilizing Diffusion Model for Robotic Control With Dynamic Programming and Transition Feasibility
Haoran Li; Yaocheng Zhang; Haowei Wen; Yuanheng Zhu; Dongbin Zhao
IEEE Transactions on Artificial Intelligence, vol. 5, no. 9, pp. 4585-4594
Published: 2024-04-10 · DOI: 10.1109/TAI.2024.3387401
https://ieeexplore.ieee.org/document/10496464/
Citations: 0
Abstract
Owing to its strong ability to represent complex distributions, the diffusion model has been incorporated into offline reinforcement learning (RL) to cover the diverse trajectories of a complex behavior policy. However, this also introduces several challenges. Training the diffusion model to imitate behavior from the collected trajectories suffers from limited stitching capability, i.e., the ability to derive better policies by combining segments of suboptimal trajectories. Furthermore, the inherent randomness of the diffusion model can lead to unpredictable control and dangerous behavior for the robot. To address these concerns, we propose the value-learning-based decision diffuser (V-DD), which consists of the trajectory diffusion module (TDM) and the trajectory evaluation module (TEM). During training, the TDM combines state-value and classifier-free guidance to bolster the ability to stitch suboptimal trajectories. During inference, the TEM selects a feasible trajectory from those generated by the diffusion model. Empirical results demonstrate that our method delivers competitive results on the D4RL benchmark and substantially outperforms current diffusion model-based methods on a real-world robot task.
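The abstract's two-stage pipeline (classifier-free guided trajectory diffusion, followed by feasibility-based selection) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the toy `denoise` network, the guidance weight `w`, and the `transition_feasibility` score stand in for the paper's trained TDM and TEM, which are not specified in this record. Only the two structural ideas come from the abstract: blending conditional and unconditional denoiser predictions (classifier-free guidance), and scoring candidate trajectories for transition feasibility before selecting one.

```python
import numpy as np

rng = np.random.default_rng(0)


def denoise(traj, cond):
    # Toy stand-in for a trained diffusion denoiser: pulls the
    # trajectory toward a conditioning signal (hypothetical form).
    return traj - 0.5 * traj + 0.1 * cond


def cfg_step(traj, cond, w=2.0):
    # Classifier-free guidance: blend conditional and unconditional
    # predictions; w is the guidance weight (assumed value).
    cond_pred = denoise(traj, cond)
    uncond_pred = denoise(traj, np.zeros_like(cond))
    return uncond_pred + w * (cond_pred - uncond_pred)


def sample_trajectory(cond, steps=10, horizon=8, dim=2):
    # Iteratively refine a noisy trajectory of `horizon` states.
    traj = rng.normal(size=(horizon, dim))
    for _ in range(steps):
        traj = cfg_step(traj, cond)
    return traj


def transition_feasibility(traj, max_step=0.5):
    # TEM-style score (hypothetical criterion): penalize state-to-state
    # jumps larger than the dynamics plausibly allow. Always <= 0.
    jumps = np.linalg.norm(np.diff(traj, axis=0), axis=1)
    return -np.sum(np.maximum(jumps - max_step, 0.0))


def select_trajectory(cond, n_candidates=16):
    # Sample several candidates, keep the most feasible one.
    candidates = [sample_trajectory(cond) for _ in range(n_candidates)]
    scores = [transition_feasibility(t) for t in candidates]
    return candidates[int(np.argmax(scores))]


best = select_trajectory(np.ones(2))
```

The selection step is what counters the "inherent randomness" concern raised in the abstract: instead of committing to a single stochastic sample, the controller executes only a trajectory that passes the feasibility check.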