Pub Date: 2024-08-16; DOI: 10.1016/j.artint.2024.104204
Value-based reinforcement-learning algorithms have shown strong results in games, robotics, and other real-world applications. Overestimation bias is a known threat to those algorithms and can sometimes lead to dramatic performance decreases or even complete algorithmic failure. We frame the bias problem statistically and consider it an instance of estimating the maximum expected value (MEV) of a set of random variables. We propose the T-Estimator (TE) based on two-sample testing for the mean, that flexibly interpolates between over- and underestimation by adjusting the significance level of the underlying hypothesis tests. We also introduce a generalization, termed K-Estimator (KE), that obeys the same bias and variance bounds as the TE and relies on a nearly arbitrary kernel function. We introduce modifications of Q-Learning and the Bootstrapped Deep Q-Network (BDQN) using the TE and the KE, and prove convergence in the tabular setting. Furthermore, we propose an adaptive variant of the TE-based BDQN that dynamically adjusts the significance level to minimize the absolute estimation bias. All proposed estimators and algorithms are thoroughly tested and validated on diverse tasks and environments, illustrating the bias control and performance potential of the TE and KE.
Title: Addressing maximization bias in reinforcement learning with two-sample testing. Journal: Artificial Intelligence.
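To make the idea behind the T-Estimator in the abstract above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' definition: it assumes a one-sided two-sample z-test against the variable with the largest sample mean, and it averages the sample means of all variables that the test does not reject. The function name `t_estimator_sketch` and the parameter `alpha` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def t_estimator_sketch(samples, alpha=0.05):
    """Estimate max_i E[X_i] from per-variable sample lists.

    Assumption (not taken from the paper): a variable is kept if its sample
    mean is not significantly smaller than the largest sample mean at
    level `alpha`; the estimate is the average of the kept sample means.
    Each sample list needs at least two observations.
    """
    if not 0.0 < alpha <= 0.5:
        raise ValueError("alpha must lie in (0, 0.5]")
    means = [mean(s) for s in samples]
    std_errors = [stdev(s) / sqrt(len(s)) for s in samples]
    best = max(range(len(samples)), key=lambda i: means[i])
    z_alpha = NormalDist().inv_cdf(alpha)  # critical value, <= 0 for alpha <= 0.5

    kept = []
    for i, m in enumerate(means):
        denom = sqrt(std_errors[i] ** 2 + std_errors[best] ** 2)
        if denom == 0.0:
            # Degenerate case: no variance in either sample.
            statistic = 0.0 if m == means[best] else float("-inf")
        else:
            statistic = (m - means[best]) / denom
        if statistic >= z_alpha:
            kept.append(m)
    return mean(kept)
```

With `alpha = 0.5` the critical value is zero, so only the largest sample mean survives and the sketch reduces to the plain maximum estimator, which tends to overestimate. Smaller values of `alpha` keep more variables and pull the estimate down, which is the interpolation between over- and underestimation that the abstract describes.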
Pub Date: 2024-08-13; DOI: 10.1016/j.artint.2024.104201
Many autonomous systems are safety-critical, making it essential to have a closed-loop control system that robustly satisfies constraints arising from underlying physical limitations and safety aspects. However, this is often challenging to achieve for real-world systems. For example, autonomous ships at sea have nonlinear and uncertain dynamics and are subject to numerous time-varying environmental disturbances such as waves, currents, and wind. There is increasing interest in using machine learning-based approaches to adapt these systems to more complex scenarios, but there are few standard frameworks that guarantee the safety and stability of such systems. Recently, predictive safety filters (PSF) have emerged as a promising method to ensure constraint satisfaction in learning-based control, bypassing the need for explicit constraint handling in the learning algorithms themselves. The safety filter approach leads to a modular separation of the problem, allowing the use of arbitrary control policies in a task-agnostic way. The filter takes in a potentially unsafe control action from the main controller and solves an optimization problem to compute a minimal perturbation of the proposed action that adheres to both physical and safety constraints. In this work, we combine reinforcement learning (RL) with predictive safety filtering in the context of marine navigation and control. The RL agent is trained on path-following and safety adherence across a wide range of randomly generated environments, while the predictive safety filter continuously monitors the agent's proposed control actions and modifies them if necessary. The combined PSF/RL scheme is implemented on a simulated model of Cybership II, a miniature replica of a typical supply ship. Safety performance and learning rate are evaluated and compared with those of a standard, non-PSF, RL agent. It is demonstrated that the predictive safety filter is able to keep the vessel safe without inhibiting the learning rate or performance of the RL agent.
Title: Modular control architecture for safe marine navigation: Reinforcement learning with predictive safety filters. Journal: Artificial Intelligence.
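The filtering step described in the abstract above (take a proposed action, return the closest action that satisfies the constraints) can be illustrated on a toy problem. The Python sketch below is a deliberately simplified stand-in, not the paper's predictive safety filter: it assumes a one-dimensional integrator `x_next = x + dt * u`, a one-step horizon, and box constraints, so the optimization has a closed-form solution (clipping). All names and default values are hypothetical.

```python
def filter_action(x, u_proposed, dt=0.1, x_max=1.0, u_max=1.0):
    """Return the action closest to `u_proposed` that keeps the toy system safe.

    Toy model (an assumption for illustration): x_next = x + dt * u.
    Constraints: |u| <= u_max (physical limit), |x_next| <= x_max (safety).
    Minimizing |u - u_proposed| over an interval is solved exactly by clipping.
    """
    lower = max(-u_max, (-x_max - x) / dt)  # smallest admissible action
    upper = min(u_max, (x_max - x) / dt)    # largest admissible action
    if lower > upper:
        raise ValueError("no admissible action from this state")
    return min(max(u_proposed, lower), upper)
```

A safe proposal passes through unchanged, for example `filter_action(0.0, 0.3)` returns `0.3`, while a proposal that would push the state past the bound is reduced to the largest admissible value. A real predictive filter would instead solve a constrained optimization over a multi-step prediction of the vessel dynamics.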
Pub Date: 2024-08-08; DOI: 10.1016/j.artint.2024.104194
Quantified conflict-driven clause learning (QCDCL) is one of the main approaches for solving quantified Boolean formulas (QBF). We formalise and investigate several versions of QCDCL that include cube learning and/or pure-literal elimination, and formally compare the resulting solving variants via proof complexity techniques. Our results show that almost all of the QCDCL variants are exponentially incomparable with respect to proof size (and hence solver running time), pointing to different, orthogonal ways of implementing QCDCL in practice.
Title: QCDCL with cube learning or pure literal elimination – What is best? Journal: Artificial Intelligence.
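For readers unfamiliar with pure-literal elimination, which the abstract above names as one of the QCDCL ingredients, the following Python sketch shows the standard rule in isolation. It is not the paper's formalisation and does not model a QCDCL solver: it assumes a prenex CNF formula whose matrix is a list of clauses over DIMACS-style integer literals, and a hypothetical `prefix` dictionary mapping each variable to `'e'` (existential) or `'a'` (universal).

```python
def pure_literal_step(prefix, clauses):
    """Apply one round of pure-literal elimination to a QBF matrix.

    prefix:  dict mapping variable (positive int) to 'e' or 'a'.
    clauses: list of frozensets of non-zero ints (negative = negated).
    A literal is pure if its negation occurs in no clause.
    - A pure existential literal is set to true: clauses containing it vanish.
    - A pure universal literal is set to false: it is deleted from its clauses.
    Returns the simplified clause list and the set of pure literals found.
    """
    literals = {lit for clause in clauses for lit in clause}
    pure = {lit for lit in literals if -lit not in literals}
    existential = {lit for lit in pure if prefix[abs(lit)] == "e"}
    universal = pure - existential

    remaining = [c for c in clauses if not (c & existential)]
    remaining = [c - universal for c in remaining]
    return remaining, pure
```

Only one round is applied here. Removing clauses can make further literals pure, so a solver would typically repeat the step until nothing changes.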
Pub Date: 2024-08-05; DOI: 10.1016/j.artint.2024.104199
It has been increasingly recognized that identifying the roles that the formulas of a knowledge base play in the inconsistency of that base can help us better understand the inconsistency. However, there are few approaches to identifying such roles from the perspective of models in a paraconsistent logic, one of the typical tools used to characterize inconsistency semantically. In this paper, we characterize the role of each formula in the inconsistency arising in a knowledge base from informational as well as causal aspects in the framework of Priest's minimally inconsistent logic of paradox. First, we identify the causal responsibility of a formula for the inconsistency based on the counterfactual dependence of the inconsistency on the formula under some contingency in semantics. Then we incorporate the change in semantic information into the framework of causal responsibility to develop the informational responsibility of a formula for the inconsistency, which captures the contribution made by the formula to the inconsistent information. This incorporation makes the informational responsibility interpretable from the point of view of causality, and capable of capturing the role of a formula in inconsistent information concisely. In addition, we propose notions of naive and quasi-naive responsibilities as two auxiliaries to describe special relations between inconsistency and formulas in a semantic sense. Some intuitive and interesting properties of the two kinds of responsibilities are also discussed.
Title: Identifying roles of formulas in inconsistency under Priest's minimally inconsistent logic of paradox. Journal: Artificial Intelligence.
Pub Date: 2024-08-05; DOI: 10.1016/j.artint.2024.104200
Iterated belief revision requires information about the current beliefs. This information is represented by mathematical structures called doxastic states. Most literature concentrates on how to revise a doxastic state and neglects that it may grow exponentially. This problem is studied for the most common ways of storing a doxastic state. All four of them are able to store every doxastic state, but some do it in less space than others. In particular, the explicit representation (an enumeration of the current beliefs) is the most wasteful of space. The level representation (a sequence of propositional formulae) and the natural representation (a history of natural revisions) are more succinct. The lexicographic representation (a history of lexicographic revisions) is more succinct still.
Title: Representing states in iterated belief revision. Journal: Artificial Intelligence.
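The space gap between storing a revision history and storing the state explicitly, as discussed in the abstract above, can be seen in a small Python sketch. It assumes a common formalisation that the abstract does not spell out: a doxastic state is an ordered list of levels of models (most plausible first), and lexicographic revision by a formula moves every model of the formula ahead of every non-model while keeping the previous order within each group. The function names are hypothetical.

```python
from itertools import product


def initial_state(n_vars):
    """Flat state: all 2**n_vars models are equally plausible (one level)."""
    return [set(product((False, True), repeat=n_vars))]


def lexicographic_revise(levels, formula):
    """Revise an ordered list of levels by `formula` (a predicate on models).

    Models satisfying the formula become more plausible than all others;
    the earlier ordering is preserved inside each of the two groups.
    """
    satisfying = [{m for m in level if formula(m)} for level in levels]
    falsifying = [level - sat for level, sat in zip(levels, satisfying)]
    return [s for s in satisfying if s] + [f for f in falsifying if f]


def replay(n_vars, history):
    """Rebuild the explicit state from a history of revising formulas."""
    levels = initial_state(n_vars)
    for formula in history:
        levels = lexicographic_revise(levels, formula)
    return levels
```

Revising a flat state over two variables by `lambda m: m[0]` and then by `lambda m: m[1]` yields four singleton levels. More generally, a history of n single-variable formulas over n variables stands for 2**n levels, which illustrates why a history can be much shorter than an explicit listing.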
Pub Date: 2024-08-02; DOI: 10.1016/j.artint.2024.104197
Real-world data are often inconsistent. Although a substantial amount of research has been done on measuring inconsistency, this research has concentrated on knowledge bases formalized in propositional logic. Recently, inconsistency measures have been introduced for relational databases. However, real-world information is now increasingly represented by graph-based structures, which offer a more intuitive conceptualization than relational ones. In this paper, we explore inconsistency measures for graph databases with regular path constraints, a class of integrity constraints based on a well-known navigational language for graph data. In this context, we define several inconsistency measures dealing with specific elements contributing to inconsistency in graph databases. We also define some rationality postulates that are desirable properties for an inconsistency measure for graph databases. We analyze the compliance of each measure with each postulate and find various degrees of satisfaction; in fact, one of the measures satisfies all the postulates. Finally, we investigate the data and combined complexity of the calculation of all the measures as well as the complexity of deciding whether a measure is lower than, equal to, or greater than a given threshold. It turns out that for a majority of the measures these problems are tractable, while the others exhibit different levels of intractability.
Title: On measuring inconsistency in graph databases with regular path constraints. Journal: Artificial Intelligence.
Pub Date: 2024-08-02; DOI: 10.1016/j.artint.2024.104195
Autonomous systems are increasingly used in highly variable and uncertain environments giving rise to the pressing need to consider risk in both the synthesis and verification of policies for these systems. This paper first develops a sample-based method to upper bound the risk measure evaluation of a random variable whose distribution is unknown. These bounds permit us to generate high-confidence verification statements for a large class of robotic systems in a sample-efficient manner. Second, we develop a sample-based method to determine solutions to non-convex optimization problems that outperform a large fraction of the decision space of possible solutions. Both sample-based approaches then permit us to rapidly synthesize risk-aware policies that are guaranteed to achieve a minimum level of system performance. To showcase our approach in simulation, we verify a cooperative multi-agent system and develop a risk-aware controller that outperforms the system's baseline controller. Our approach can be extended to account for any g-entropic risk measure.
Title: Sample-based bounds for coherent risk measures: Applications to policy synthesis and verification. Journal: Artificial Intelligence.
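As a rough illustration of what a sample-based upper bound on a risk measure can look like, the Python sketch below treats the simplest case, Value-at-Risk (VaR), which is not itself a coherent measure and is not claimed to be the bound derived in the paper above. It relies on a standard order-statistics fact: for N independent samples of X, the sample maximum is at least the (1 - epsilon)-quantile of X with probability at least 1 - (1 - epsilon)^N. The function names are hypothetical.

```python
import math


def samples_needed(epsilon, delta):
    """Smallest N with (1 - epsilon)**N <= delta.

    With that many i.i.d. samples, the sample maximum upper-bounds the
    (1 - epsilon)-quantile (VaR) with confidence at least 1 - delta.
    """
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie in (0, 1)")
    return math.ceil(math.log(delta) / math.log(1.0 - epsilon))


def var_upper_bound(samples):
    """High-confidence upper bound on VaR: the largest observed sample."""
    return max(samples)


n = samples_needed(epsilon=0.05, delta=1e-3)  # 135 samples
```

The sample count depends only on `epsilon` and `delta`, not on the unknown distribution, which is what makes statements of this kind usable for verification when the system can only be simulated.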
Pub Date: 2024-08-02; DOI: 10.1016/j.artint.2024.104196
In peer mechanisms, the competitors for a prize also determine who wins. Each competitor may be asked to rank, grade, or nominate peers for the prize. Since the prize can be valuable, such as financial aid, course grades, or an award at a conference, competitors may be tempted to manipulate the mechanism. We survey approaches to prevent or discourage the manipulation of peer mechanisms. We conclude our survey by identifying several important research challenges.
Title: Manipulation and peer mechanisms: A survey. Journal: Artificial Intelligence.
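One classic way to remove the incentive to manipulate a peer mechanism is to partition the participants so that nobody's report can affect their own chance of winning. The Python sketch below illustrates that idea only; the abstract above does not say which mechanisms the survey covers, so this should not be read as a summary of it. The names `partition_select`, `voters`, and `candidates` are hypothetical, and the two groups are assumed to be disjoint.

```python
def partition_select(nominations, voters, candidates):
    """Pick a winner among `candidates` using only nominations from `voters`.

    nominations: dict mapping each agent to the agent it nominates.
    voters, candidates: disjoint sets of agents.
    Voters cannot win and candidates' own nominations are ignored, so no
    agent can change whether it wins by changing its own nomination.
    Ties are broken by the sorted order of candidate names.
    """
    if set(voters) & set(candidates):
        raise ValueError("voters and candidates must be disjoint")
    counts = {c: 0 for c in candidates}
    for voter in voters:
        target = nominations.get(voter)
        if target in counts:
            counts[target] += 1
    # max() returns the first maximal element, so sorting fixes the tie-break.
    return max(sorted(counts), key=lambda c: counts[c])
```

The price of this guarantee is that some agents are excluded from winning and some nominations are discarded, a trade-off between strategy-proofness and selection quality.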
Pub Date: 2024-08-02; DOI: 10.1016/j.artint.2024.104198
Due to the emergence of AI systems that interact with the physical environment, there is an increased interest in incorporating physical reasoning capabilities into those AI systems. But is it enough to only have physical reasoning capabilities to operate in a real physical environment? In the real world, we constantly face novel situations we have not encountered before. As humans, we are competent at successfully adapting to those situations. Similarly, an agent needs to have the ability to function under the impact of novelties in order to properly operate in an open-world physical environment. To facilitate the development of such AI systems, we propose a new benchmark, NovPhy, that requires an agent to reason about physical scenarios in the presence of novelties and take actions accordingly. The benchmark consists of tasks that require agents to detect and adapt to novelties in physical scenarios. To create tasks in the benchmark, we develop eight novelties representing a diverse novelty space and apply them to five commonly encountered scenarios in a physical environment, related to applying forces and motions such as rolling, falling, and sliding of objects. According to our benchmark design, we evaluate two capabilities of an agent: the performance on a novelty when it is applied to different physical scenarios and the performance on a physical scenario when different novelties are applied to it. We conduct a thorough evaluation with human players, learning agents, and heuristic agents. Our evaluation shows that human performance far exceeds that of the agents. Some agents, even with good normal task performance, perform significantly worse when there is a novelty, and the agents that can adapt to novelties typically adapt more slowly than humans. We promote the development of intelligent agents capable of performing at the human level or above when operating in open-world physical environments. Benchmark website: https://github.com/phy-q/novphy
Title: NovPhy: A Physical Reasoning Benchmark for Open-world AI Systems. Journal: Artificial Intelligence.
Pub Date: 2024-07-30; DOI: 10.1016/j.artint.2024.104178
We study a participatory budgeting problem, where a set of strategic agents wish to split a divisible budget among different projects, by aggregating their proposals on a single division. Unfortunately, the straightforward rule that divides the budget proportionally is susceptible to manipulation. Recently, a class of truthful mechanisms has been proposed, namely the moving phantom mechanisms. One such mechanism satisfies the proportionality property, in the sense that in the extreme case where all agents prefer a single project to receive the whole amount, the budget is assigned proportionally.
While proportionality is a naturally desired property, it is defined over a limited type of preference profiles. To address this, we expand the notion of proportionality, by proposing a quantitative framework that evaluates a budget aggregation mechanism according to its worst-case distance from the proportional allocation. Crucially, this is defined for every preference profile. We study this measure on the class of moving phantom mechanisms, and we provide approximation guarantees. For two projects, we show that the Uniform Phantom mechanism is optimal among all truthful mechanisms. For three projects, we propose a new, proportional mechanism that is virtually optimal among all moving phantom mechanisms. Finally, we provide impossibility results regarding the approximability of moving phantom mechanisms.
Title: Truthful aggregation of budget proposals with proportionality guarantees. Journal: Artificial Intelligence.
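For the two-project case mentioned in the abstract above, a phantom mechanism can be written in a few lines. The Python sketch below assumes the usual description of the Uniform Phantom mechanism for two projects, which the abstract does not state: each of the n agents reports the share they want for the first project, n + 1 fixed phantom reports are placed at 0, 1/n, ..., 1, and the outcome is the median of all 2n + 1 values. The function name is hypothetical.

```python
from fractions import Fraction


def uniform_phantom_two_projects(votes):
    """Split a unit budget between two projects.

    votes: each agent's preferred share for project 1, a value in [0, 1].
    Assumption: phantoms at k/n for k = 0..n, outcome = median of the
    2n + 1 agent and phantom values. Exact arithmetic via Fraction.
    """
    n = len(votes)
    if n == 0:
        raise ValueError("at least one vote is required")
    phantoms = [Fraction(k, n) for k in range(n + 1)]
    values = sorted([Fraction(v) for v in votes] + phantoms)
    share = values[n]  # middle element of 2n + 1 sorted values
    return share, 1 - share
```

When every agent wants the whole budget for a single project, the outcome is proportional: with two of five agents voting 1 and three voting 0, the call `uniform_phantom_two_projects([1, 1, 0, 0, 0])` gives project 1 a share of 2/5. Because the outcome is a median with fixed phantoms, an agent cannot move it toward their preferred share by misreporting.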