{"title":"Multi-Objective Inverse Reinforcement Learning via Non-Negative Matrix Factorization","authors":"Daiko Kishikawa, S. Arai","doi":"10.1109/iiai-aai53430.2021.00078","DOIUrl":null,"url":null,"abstract":"In recent years, inverse reinforcement learning, which estimates the reward from the sequence of states followed by an expert (trajectory), has been attracting attention in terms of imitating complex behaviors and estimating the intentions of people or animals. Existing inverse reinforcement learning methods assume that the expert has a single objective. However, it is more natural to assume that experts have multiple objectives in the real world. A previous paper proposed a method for estimating an expert's preferences for each objective (i.e., weights) when the true multi-objective reward vector is known. In this study, we formulated the simultaneous estimation of the multi-objective reward vector and weights as a multi-objective inverse reinforcement learning (MOIRL) problem where both are unknown. In this paper, we propose a MOIRL method based on non-negative matrix factorization. Through the results of computational experiments, we show that the proposed method can estimate the rewards and weights from trajectories obtained in a multi-objective environment.","PeriodicalId":414070,"journal":{"name":"2021 10th International Congress on Advanced Applied Informatics (IIAI-AAI)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 10th International Congress on Advanced Applied Informatics (IIAI-AAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/iiai-aai53430.2021.00078","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
Abstract
In recent years, inverse reinforcement learning (IRL), which estimates the reward underlying the sequence of states an expert visits (the trajectory), has attracted attention as a way to imitate complex behaviors and to estimate the intentions of people or animals. Existing IRL methods assume that the expert has a single objective; in the real world, however, it is more natural to assume that experts pursue multiple objectives. A previous paper proposed a method for estimating an expert's preference for each objective (i.e., the weights) when the true multi-objective reward vector is known. In this study, we formulate the simultaneous estimation of the multi-objective reward vector and the weights, where both are unknown, as a multi-objective inverse reinforcement learning (MOIRL) problem, and we propose a MOIRL method based on non-negative matrix factorization. Computational experiments show that the proposed method can estimate the rewards and weights from trajectories obtained in a multi-objective environment.
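To make the factorization idea concrete, below is a minimal sketch of how non-negative matrix factorization can recover both factors at once. It is an illustration only, not the paper's method: the abstract does not specify how the observed reward matrix is built, so the shapes, the synthetic data, and the assumption that rows correspond to expert trajectories and columns to states are all hypothetical. The sketch uses scikit-learn's NMF to decompose a non-negative reward matrix into per-trajectory preference weights and per-objective reward vectors.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical setup: a non-negative matrix of scalar rewards, one row per
# expert trajectory and one column per state. How such a matrix is obtained
# (e.g., via a single-objective IRL front end) is an assumption here.
rng = np.random.default_rng(0)
n_trajectories, n_states, n_objectives = 20, 50, 3

# Synthetic ground truth: per-objective reward vectors and per-trajectory weights.
true_rewards = rng.random((n_objectives, n_states))        # objective reward vectors
true_weights = rng.random((n_trajectories, n_objectives))  # preference weights
reward_matrix = true_weights @ true_rewards                # observed scalar rewards

# NMF factorizes reward_matrix ~= weights @ objective_rewards, estimating the
# multi-objective rewards and the weights simultaneously, as in the abstract.
model = NMF(n_components=n_objectives, init="random", random_state=0, max_iter=500)
weights = model.fit_transform(reward_matrix)  # estimated preference weights
objective_rewards = model.components_         # estimated per-objective rewards

print("reconstruction error:",
      np.linalg.norm(reward_matrix - weights @ objective_rewards))
```

Note that NMF is identifiable only up to permutation and scaling of the factors, so any comparison against ground-truth rewards and weights would require matching and normalizing the recovered components first.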