A survey on interpretable reinforcement learning

IF 2.9 | Zone 3 (Computer Science) | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Machine Learning | Pub Date: 2024-04-19 | DOI: 10.1007/s10994-024-06543-w

Claire Glanois, Paul Weng, Matthieu Zimmer, Dong Li, Tianpei Yang, Jianye Hao, Wulong Liu


Citations: 0

Abstract

Although deep reinforcement learning has become a promising machine learning approach for sequential decision-making problems, it is still not mature enough for high-stakes domains such as autonomous driving or medical applications. In such contexts, a learned policy may, for instance, need to be interpretable, so that it can be inspected before any deployment (e.g., for safety and verifiability reasons). This survey provides an overview of various approaches to achieving higher interpretability in reinforcement learning (RL). To that aim, we distinguish interpretability (as an intrinsic property of a model) from explainability (as a post-hoc operation) and discuss both in the context of RL, with an emphasis on the former notion. In particular, we argue that interpretable RL may embrace different facets: interpretable inputs, interpretable (transition/reward) models, and interpretable decision-making. Based on this scheme, we summarize and analyze recent work related to interpretable RL, with an emphasis on papers published in the past 10 years. We also briefly discuss some related research areas and point to potentially promising research directions, notably those related to the recent development of foundation models (e.g., large language models, RL from human feedback).
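To make the idea of an intrinsically interpretable policy concrete, here is a minimal Python sketch, assuming a small decision-tree policy (one common interpretable policy class, used here for illustration and not taken from the survey itself). The feature names `pole_angle` and `pole_velocity`, the action labels, and the thresholds are hypothetical and hand-picked rather than learned. Because the entire policy can be printed as a short list of rules, it can be inspected before deployment, which is the kind of intrinsic interpretability the abstract contrasts with post-hoc explanation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Leaf:
    """Terminal node: the action the policy takes."""
    action: str


@dataclass
class Split:
    """Internal node: test `feature <= threshold` and branch."""
    feature: str
    threshold: float
    left: "Node"   # followed when the test is true
    right: "Node"  # followed when the test is false


Node = Union[Leaf, Split]


def act(node: Node, state: Dict[str, float]) -> str:
    """Follow the tree from the root to a leaf and return that leaf's action."""
    while isinstance(node, Split):
        node = node.left if state[node.feature] <= node.threshold else node.right
    return node.action


def to_rules(node: Node, conditions: Optional[List[str]] = None) -> List[str]:
    """Flatten the tree into human-readable IF ... THEN ... rules, one per leaf."""
    conditions = conditions or []
    if isinstance(node, Leaf):
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN {node.action}"]
    return (to_rules(node.left, conditions + [f"{node.feature} <= {node.threshold}"])
            + to_rules(node.right, conditions + [f"{node.feature} > {node.threshold}"]))


# Hypothetical hand-written policy for a CartPole-like balancing task
# (feature names, thresholds, and actions are illustrative assumptions).
policy = Split("pole_angle", 0.0,
               left=Split("pole_velocity", 0.5,
                          left=Leaf("push_left"), right=Leaf("push_right")),
               right=Leaf("push_right"))

for rule in to_rules(policy):
    print(rule)
print(act(policy, {"pole_angle": -0.1, "pole_velocity": 0.2}))
```

Running it prints three rules, `IF pole_angle <= 0.0 AND pole_velocity <= 0.5 THEN push_left`, `IF pole_angle <= 0.0 AND pole_velocity > 0.5 THEN push_right`, and `IF pole_angle > 0.0 THEN push_right`, followed by the chosen action `push_left`. The rule list is the whole policy, so a reviewer can check it directly; a neural-network policy would instead need a separate, post-hoc explanation step.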


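Source journal

Machine Learning (Engineering and Technology, Computer Science: Artificial Intelligence)

CiteScore: 11.00
Self-citation rate: 2.70%
Articles published: 162
Review time: 3 months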

Journal description: Machine Learning serves as a global platform dedicated to computational approaches to learning. The journal reports substantial findings on diverse learning methods applied to various problems, supported by empirical studies, theoretical analysis, or connections to psychological phenomena. It demonstrates the application of learning methods to significant problems and aims to improve the conduct of machine learning research, with a focus on verifiable and replicable evidence in published papers.

Latest articles in this journal

- Linear Causal Discovery with Interventional Constraints
- Interpretable optimisation-based approach for hyper-box classification
- Deep latent force models: ODE-based process convolutions for Bayesian deep learning
- Offline reinforcement learning for learning to dispatch for job shop scheduling
- Computing the distance between unbalanced distributions: the flat metric