Discounted fully probabilistic design of decision rules

Information Sciences · IF 8.1 · Zone 1 (Computer Science) · JCR category: COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-10-22 · DOI: 10.1016/j.ins.2024.121578

Miroslav Kárný, Soňa Molnárová

Information Sciences, vol. 690, Article 121578. Full text: https://www.sciencedirect.com/science/article/pii/S0020025524014920


Abstract

Axiomatic fully probabilistic design (FPD) of optimal decision rules strictly extends the decision-making (DM) theory represented by Markov decision processes (MDP): any MDP task can be approximated by an explicitly found FPD task, whereas many FPD tasks have no MDP equivalent. Both MDP and FPD model the closed loop, that is, the coupling of an agent and its environment, via a joint probability density (pd) of the involved random variables, collectively referred to as the behaviour. Unlike MDP, FPD quantifies the agent's aims and constraints by an ideal pd. The ideal pd is high on desired behaviours, small on undesired behaviours and zero on forbidden ones. FPD selects as optimal the decision rules that minimise the Kullback-Leibler divergence of the closed-loop-modelling pd to its ideal counterpart. The choice of this proximity measure follows from the FPD axiomatics.
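The minimisation described above can be illustrated in the simplest possible setting. The following is a minimal sketch only, assuming a single decision step, finite sets of actions and outcomes, and no discounting; it is not the paper's discounted design. All names (`model`, `ideal_model`, `ideal_rule`, `one_step_fpd_rule`) are hypothetical. In this one-step case the minimiser of the Kullback-Leibler divergence has the closed form rule(a) ∝ ideal_rule(a) · exp(−ω(a)), where ω(a) is the divergence of the environment model to its ideal for action a.

```python
import numpy as np


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) of two discrete pds on the same finite set."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    support = p > 0                      # terms with p = 0 contribute nothing
    if np.any(q[support] == 0):
        return np.inf                    # p puts mass where q forbids it
    return float(np.sum(p[support] * np.log(p[support] / q[support])))


def one_step_fpd_rule(model, ideal_model, ideal_rule):
    """One-step, undiscounted sketch of the KL-minimising decision rule.

    model[a, y]       : assumed environment model, pd of outcome y given action a
    ideal_model[a, y] : assumed ideal pd of outcome y given action a
    ideal_rule[a]     : assumed ideal pd of actions
    Assumes at least one action has a finite divergence omega[a].
    """
    n_actions = model.shape[0]
    omega = np.array([kl_divergence(model[a], ideal_model[a]) for a in range(n_actions)])
    weights = ideal_rule * np.exp(-omega)
    return weights / weights.sum()


def closed_loop_kl(rule, model, ideal_model, ideal_rule):
    """Divergence of the closed-loop joint pd rule(a) * model(y|a) to its ideal counterpart."""
    joint = rule[:, None] * model
    ideal_joint = ideal_rule[:, None] * ideal_model
    return kl_divergence(joint.ravel(), ideal_joint.ravel())
```

Actions whose predicted outcomes lie far from the ideal receive exponentially smaller probability, and an action that can lead to a forbidden outcome (ideal pd equal to zero) receives probability zero.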

MDP minimises the expected total loss, which is usually the sum of discounted partial losses. The discounting reflects the decreasing importance of future losses. It also diminishes the influence of errors caused by: (i) the imperfection of the employed environment model; (ii) roughly expressed aims; (iii) the approximate learning and decision-rule design.
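The discounted total loss mentioned above can be written out for a single realised sequence of partial losses. This is a minimal sketch under assumptions not stated in the abstract: a constant discount factor in [0, 1], time indexed from 0, and a finite horizon. The function name and parameters are hypothetical.

```python
def discounted_total_loss(partial_losses, discount):
    """Return sum over t of discount**t * partial_losses[t], with t starting at 0.

    Assumes a constant discount factor in [0, 1] and a finite sequence of losses.
    """
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must lie in [0, 1]")
    total = 0.0
    # Horner-style backward accumulation: total_t = loss_t + discount * total_{t+1}
    for loss in reversed(list(partial_losses)):
        total = loss + discount * total
    return total
```

An error of size e in the partial loss at time t changes the total by discount**t * e, so errors far in the future, where models and aims tend to be least reliable, are weighted least.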

The established FPD cannot currently account for these important features. The paper elaborates the missing discounted version of FPD. Filling this gap is non-trivial and employs an extension of dynamic programming that is of independent interest.


Source journal: Information Sciences (Engineering and Technology, Computer Science: Information Systems)

CiteScore: 14.00
Self-citation rate: 17.30%
Publication volume: 1322
Review time: 10.4 months

Journal description: Informatics and Computer Science Intelligent Systems Applications is an international journal that publishes original and creative research findings in the field of information sciences, together with a limited number of timely tutorial and survey contributions. The journal addresses a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up to date with research in information science, knowledge engineering, and intelligent systems. Readers are expected to share a common interest in information science, but they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.