Multi-objective ω-Regular Reinforcement Learning

Formal Aspects of Computing · IF 1.4 · CAS Tier 4 (Computer Science) · JCR Q3 (Computer Science, Software Engineering) · Published: 2023-07-18 · DOI: 10.1145/3605950
Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, Dominik Wojtczak
{"title":"Multi-objective ω-Regular Reinforcement Learning","authors":"Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, Dominik Wojtczak","doi":"https://dl.acm.org/doi/10.1145/3605950","DOIUrl":null,"url":null,"abstract":"<p>The expanding role of reinforcement learning (RL) in safety-critical system design has promoted ω-automata as a way to express learning requirements—often non-Markovian—with greater ease of expression and interpretation than scalar reward signals. However, real-world sequential decision making situations often involve multiple, potentially conflicting, objectives. Two dominant approaches to express relative preferences over multiple objectives are: (1) <i>weighted preference</i>, where the decision maker provides scalar weights for various objectives, and (2) <i>lexicographic preference</i>, where the decision maker provides an order over the objectives such that any amount of satisfaction of a higher-ordered objective is preferable to any amount of a lower-ordered one. In this article, we study and develop RL algorithms to compute optimal strategies in Markov decision processes against multiple ω-regular objectives under weighted and lexicographic preferences. We provide a translation from multiple ω-regular objectives to a scalar reward signal that is both <i>faithful</i> (maximising reward means maximising probability of achieving the objectives under the corresponding preference) and <i>effective</i> (RL quickly converges to optimal strategies). We have implemented the translations in a formal reinforcement learning tool, <span>Mungojerrie</span>, and we present an experimental evaluation of our technique on benchmark learning problems.</p>","PeriodicalId":50432,"journal":{"name":"Formal Aspects of Computing","volume":"23 1","pages":""},"PeriodicalIF":1.4000,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Formal Aspects of Computing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/https://dl.acm.org/doi/10.1145/3605950","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Citations: 0

Abstract

The expanding role of reinforcement learning (RL) in safety-critical system design has promoted ω-automata as a way to express learning requirements—often non-Markovian—with greater ease of expression and interpretation than scalar reward signals. However, real-world sequential decision making situations often involve multiple, potentially conflicting, objectives. Two dominant approaches to express relative preferences over multiple objectives are: (1) weighted preference, where the decision maker provides scalar weights for various objectives, and (2) lexicographic preference, where the decision maker provides an order over the objectives such that any amount of satisfaction of a higher-ordered objective is preferable to any amount of a lower-ordered one. In this article, we study and develop RL algorithms to compute optimal strategies in Markov decision processes against multiple ω-regular objectives under weighted and lexicographic preferences. We provide a translation from multiple ω-regular objectives to a scalar reward signal that is both faithful (maximising reward means maximising probability of achieving the objectives under the corresponding preference) and effective (RL quickly converges to optimal strategies). We have implemented the translations in a formal reinforcement learning tool, Mungojerrie, and we present an experimental evaluation of our technique on benchmark learning problems.
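To make the two preference models from the abstract concrete, the following is a minimal sketch in Python (illustrative only, not the paper's Mungojerrie implementation or its reward translation). It compares two hypothetical policies by their per-objective satisfaction probabilities, once with decision-maker-supplied scalar weights and once lexicographically; the example vectors, weights, and the tolerance `eps` are assumptions made for illustration.

```python
from typing import Sequence


def weighted_value(sat_probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted preference: combine the objectives' satisfaction probabilities
    with scalar weights into a single value to be maximised."""
    return sum(w * p for w, p in zip(weights, sat_probs))


def lexicographic_better(a: Sequence[float], b: Sequence[float], eps: float = 1e-9) -> bool:
    """Lexicographic preference: objectives are ordered, and any gain on a
    higher-ordered objective outweighs any change on lower-ordered ones.
    Returns True if satisfaction vector `a` is strictly preferred to `b`."""
    for pa, pb in zip(a, b):
        if pa > pb + eps:
            return True
        if pa < pb - eps:
            return False
    return False  # equal on every objective


# Hypothetical policies: objective 1 (ordered first) vs. objective 2.
policy_x = [0.90, 0.40]  # satisfaction probabilities of two omega-regular objectives
policy_y = [0.85, 0.99]

print(weighted_value(policy_x, [0.5, 0.5]))      # 0.65
print(weighted_value(policy_y, [0.5, 0.5]))      # 0.92 -> y preferred under equal weights
print(lexicographic_better(policy_x, policy_y))  # True -> x preferred lexicographically
```

The example illustrates why the two preference models can disagree: equal weights favour the policy with the better average, while the lexicographic order favours the policy that does better on the highest-ranked objective regardless of the others.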

Source journal: Formal Aspects of Computing (Engineering & Technology – Computer Science: Software Engineering)
CiteScore: 3.30
Self-citation rate: 0.00%
Articles published per year: 17
Review time: >12 weeks
Journal description: This journal aims to publish contributions at the junction of theory and practice. The objective is to disseminate applicable research. Thus new theoretical contributions are welcome where they are motivated by potential application; applications of existing formalisms are of interest if they show something novel about the approach or application. In particular, the scope of Formal Aspects of Computing includes: well-founded notations for the description of systems; verifiable design methods; elucidation of fundamental computational concepts; approaches to fault-tolerant design; theorem-proving support; state-exploration tools; formal underpinning of widely used notations and methods; formal approaches to requirements analysis.