Journal of Artificial Intelligence Research: latest articles, page 9

On Efficient Reinforcement Learning for Full-length Game of StarCraft II

IF 5 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Artificial Intelligence Research

Pub Date : 2022-09-23 DOI: 10.48550/arXiv.2209.11553

Ruo-Ze Liu, Zhen-Jia Pang, Zhou-Yu Meng, Wenhai Wang, Yang Yu, Tong Lu

StarCraft II (SC2) poses a grand challenge for reinforcement learning (RL); the main difficulties include a huge state space, a varying action space, and a long time horizon. In this work, we investigate a set of RL techniques for the full-length game of StarCraft II. We investigate a hierarchical RL approach in which the hierarchy has two aspects. One is macro-actions extracted from experts’ demonstration trajectories, which reduce the action space by an order of magnitude. The other is a hierarchical architecture of neural networks, which is modular and facilitates scaling. We investigate a curriculum transfer training procedure that trains the agent from the simplest level to the hardest level. We train the agent on a single machine with 4 GPUs and 48 CPU threads. On a 64x64 map and using restricted units, we achieve a win rate of 99% against the difficulty level-1 built-in AI. Through the curriculum transfer learning algorithm and a mixture of combat models, we achieve a 93% win rate against the most difficult non-cheating built-in AI (level-7). In this extended version of the paper, we improve our architecture to train the agent against the most difficult cheating-level AIs (level-8, level-9, and level-10). We also test our method on different maps to evaluate the extensibility of our approach. With a final 3-layer hierarchical architecture and several significant training tricks, we increase the win rates against level-8, level-9, and level-10 to 96%, 97%, and 94%, respectively. Our code and models are open-sourced at https://github.com/liuruoze/HierNet-SC2.

To provide an AlphaStar-based baseline for our work as well as for the research and open-source community, we reproduce a scaled-down version of it, mini-AlphaStar (mAS). The latest version of mAS is 1.07, which can be trained using supervised learning and reinforcement learning on the raw action space, which has 564 actions. It is designed to run training on a single common machine by making the hyper-parameters adjustable and simplifying some settings. We can then compare our work with mAS using the same computing resources and training time. Our experimental results show that our method is more effective when resources are limited. The inference and training code of mini-AlphaStar is open-sourced at https://github.com/liuruoze/mini-AlphaStar. We hope our study can shed some light on future research into efficient reinforcement learning for SC2 and other large-scale games.
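
The curriculum transfer procedure mentioned in the abstract can be pictured as a loop that carries the agent's parameters from one difficulty level to the next. The sketch below is illustrative only and is not the authors' code: `train_on_level`, the scalar `skill` standing in for network weights, and the 0.9 promotion threshold are all hypothetical.

```python
import random


def train_on_level(skill, level, episodes, rng):
    """Toy stand-in for RL training (hypothetical): plays `episodes` games,
    nudges a scalar 'skill' upward, and returns (skill, observed win rate)."""
    wins = 0
    for _ in range(episodes):
        # Win probability grows with skill and shrinks with difficulty.
        p_win = skill / (skill + level)
        if rng.random() < p_win:
            wins += 1
        skill += 0.05  # assumed learning progress per episode
    return skill, wins / episodes


def curriculum_transfer(levels, episodes=200, threshold=0.9, max_rounds=50, seed=0):
    """Train on each level in order, reusing the skill learned so far."""
    rng = random.Random(seed)
    skill = 1.0
    history = []
    for level in levels:
        win_rate = 0.0
        for _ in range(max_rounds):
            skill, win_rate = train_on_level(skill, level, episodes, rng)
            if win_rate >= threshold:
                break  # good enough: move on to the next, harder level
        history.append((level, win_rate))
    return skill, history
```

The point of the design is that `skill` is never reset between levels, so training against a harder opponent starts from what was learned against the easier one.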

Pages: 213-260

Citations: 6

sEMG-Based Upper Limb Movement Classifier: Current Scenario and Upcoming Challenges

IF 5 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Artificial Intelligence Research

Pub Date : 2022-09-19 DOI: 10.1613/jair.1.13999

M. C. Tosin, Juliano C. Machado, A. Balbinot

Although state-of-the-art classifiers achieve accuracies higher than 90% in recognizing upper-limb movements from sEMG (surface electromyography) signals in the laboratory, issues remain to be addressed before a myo-controlled prosthesis can achieve similar performance under real-world conditions. The main goal of this review is therefore to present the latest research on the strategies used in each block of the system, giving a global view of the current state of academic research. A systematic review was conducted, and the retrieved papers were organized according to the system step to which the proposed method relates. Then, for each stage of the upper limb motion recognition system, the works were described and compared in terms of strategy, methodology, and issue addressed. An additional section is devoted to works on signal contamination, a topic often neglected in reviews of sEMG-based motion classifiers; this section is the main contribution of this paper. Deep learning methods are a current trend for the classification stage, providing strategies based on time series and transfer learning to address issues related to limb position, temporal and inter-subject variation, and electrode displacement. Despite the promising strategies presented for contaminant detection, identification, and removal, some factors still need to be considered, such as the occurrence of simultaneous contaminants. This review presents the current scenario of the movement classification system, providing valuable information for new researchers and guiding future work towards myo-controlled devices.
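
To make the "blocks of the system" more concrete, here is a minimal sketch of one common front end for sEMG classifiers: sliding-window segmentation followed by time-domain features (mean absolute value, root mean square, zero crossings). The window length, step, and feature set are assumptions chosen for illustration; the review surveys many alternatives and does not prescribe this pipeline.

```python
import math


def sliding_windows(signal, length, step):
    """Split a 1-D signal into overlapping windows of `length` samples."""
    return [signal[i:i + length] for i in range(0, len(signal) - length + 1, step)]


def time_domain_features(window):
    """Compute three classic time-domain features for one window."""
    n = len(window)
    mav = sum(abs(x) for x in window) / n            # mean absolute value
    rms = math.sqrt(sum(x * x for x in window) / n)  # root mean square
    # Zero crossings: count sign changes between consecutive samples.
    zc = sum(1 for a, b in zip(window, window[1:]) if a * b < 0)
    return {"mav": mav, "rms": rms, "zc": zc}


def extract_features(signal, length=4, step=2):
    """Feature dictionaries for every window; these would feed a classifier."""
    return [time_domain_features(w) for w in sliding_windows(signal, length, step)]
```

In practice the windows would be far longer than four samples and would be computed per electrode channel; the tiny defaults only keep the example easy to check by hand.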

Pages: 83-127

Citations: 3

Specifying and Exploiting Non-Monotonic Domain-Specific Declarative Heuristics in Answer Set Programming

IF 5 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Artificial Intelligence Research

Pub Date : 2022-09-19 DOI: 10.1613/jair.1.14091

Richard Comploi-Taupe, G. Friedrich, Konstantin Schekotihin, A. Weinzierl

Domain-specific heuristics are an essential technique for solving combinatorial problems efficiently. Current approaches to integrating domain-specific heuristics with Answer Set Programming (ASP) are unsatisfactory when dealing with heuristics that are specified non-monotonically on the basis of partial assignments. Such heuristics frequently occur in practice, for example, when picking an item that has not yet been placed in bin packing. Therefore, we present novel syntax and semantics for declarative specifications of domain-specific heuristics in ASP. Our approach supports heuristic statements that depend on the partial assignment maintained during solving, which has not been possible before. We provide an implementation in Alpha that makes Alpha the first lazy-grounding ASP system to support declaratively specified domain-specific heuristics. Two practical example domains are used to demonstrate the benefits of our proposal. Additionally, we use our approach to implement informed search with A*, which is tackled within ASP for the first time. A* is applied to two further search problems. The experiments confirm that combining lazy-grounding ASP solving and our novel heuristics can be vital for solving industrial-size problems.
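
The bin-packing example shows why such heuristics are non-monotonic: the set of candidates ("items not yet placed") shrinks as the partial assignment grows. The sketch below illustrates this in plain Python rather than in ASP; it does not use the paper's syntax or Alpha's API, and the function names and the largest-item-first rule are assumptions.

```python
def next_item_to_place(sizes, partial_assignment):
    """Heuristic choice that depends on the current partial assignment:
    pick the largest item that has no bin yet, or None if all are placed."""
    unplaced = [item for item in sizes if item not in partial_assignment]
    if not unplaced:
        return None
    return max(unplaced, key=lambda item: (sizes[item], item))


def first_fit_decreasing(sizes, capacity):
    """Greedy packing driven by the heuristic above.
    An item larger than `capacity` simply gets a bin of its own."""
    assignment = {}  # item -> bin index (the growing partial assignment)
    loads = []       # current load of each open bin
    while (item := next_item_to_place(sizes, assignment)) is not None:
        for b, load in enumerate(loads):
            if load + sizes[item] <= capacity:
                loads[b] += sizes[item]
                assignment[item] = b
                break
        else:
            # No open bin fits: open a new one.
            loads.append(sizes[item])
            assignment[item] = len(loads) - 1
    return assignment, loads
```

Adding a fact to the assignment (an item is now placed) withdraws that item from the heuristic's candidates, which is the kind of dependence on what is not yet assigned that the abstract describes.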

Pages: 59-114

Citations: 1

Motion Planning Under Uncertainty with Complex Agents and Environments via Hybrid Search

IF 5 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Artificial Intelligence Research

Pub Date : 2022-09-19 DOI: 10.1613/jair.1.13361

Daniel Strawser, B. Williams

As autonomous systems and robots are applied to more real-world situations, they must reason about uncertainty when planning actions. Mission success oftentimes cannot be guaranteed and the planner must reason about the probability of failure. Unfortunately, computing a trajectory that satisfies mission goals while constraining the probability of failure is difficult because of the need to reason about complex, multidimensional probability distributions. Recent methods have seen success using chance-constrained, model-based planning. However, the majority of these methods can only handle simple environment and agent models. We argue that there are two main drawbacks of current approaches to goal-directed motion planning under uncertainty. First, current methods suffer from an inability to deal with expressive environment models such as 3D non-convex obstacles. Second, most planners rely on considerable simplifications when computing trajectory risk, including approximating the agent’s dynamics, geometry, and uncertainty. In this article, we apply hybrid search to the risk-bound, goal-directed planning problem. The hybrid search consists of a region planner and a trajectory planner. The region planner makes discrete choices by reasoning about geometric regions that the autonomous agent should visit in order to accomplish its mission. In formulating the region planner, we propose landmark regions that help produce obstacle-free paths. The region planner passes paths through the environment to a trajectory planner; the task of the trajectory planner is to optimize trajectories that respect the agent’s dynamics and the user’s desired risk of mission failure. We discuss three approaches to modeling trajectory risk: a CDF-based approach, a sampling-based collocation method, and an algorithm named Shooting Method Monte Carlo. These models allow computation of trajectory risk with more complex environments, agent dynamics, geometries, and models of uncertainty than past approaches. A variety of 2D and 3D test cases are presented, including a linear case, a Dubins car model, and an underwater autonomous vehicle. The method is shown to outperform other methods in terms of speed and utility of the solution. Additionally, the models of trajectory risk are shown to better approximate risk in simulation.
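
As a rough illustration of what "trajectory risk" and a chance constraint mean, the sketch below estimates the probability that a 2D trajectory enters a circular obstacle by plain Monte Carlo sampling. It is a generic estimator, not the paper's CDF-based, collocation, or Shooting Method Monte Carlo approach; the independent Gaussian noise per waypoint, the circular obstacle, and all names are assumptions.

```python
import math
import random


def estimate_risk(waypoints, obstacle_center, obstacle_radius, sigma,
                  samples=2000, seed=0):
    """Fraction of sampled executions in which any noisy waypoint lies
    inside the obstacle (assumed: independent Gaussian position noise)."""
    rng = random.Random(seed)
    cx, cy = obstacle_center
    failures = 0
    for _ in range(samples):
        for x, y in waypoints:
            nx = x + rng.gauss(0.0, sigma)
            ny = y + rng.gauss(0.0, sigma)
            if math.hypot(nx - cx, ny - cy) <= obstacle_radius:
                failures += 1
                break  # one collision is enough to fail this execution
    return failures / samples


def satisfies_chance_constraint(risk, bound):
    """Chance constraint: estimated failure probability must not exceed the bound."""
    return risk <= bound
```

A trajectory planner would call an estimator like this (or a cheaper analytic surrogate) inside its optimization loop and reject candidate trajectories whose estimated risk exceeds the user's bound.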

Pages: 1-81

Citations: 2

Fairness in Forecasting of Observations of Linear Dynamical Systems

IF 5 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Artificial Intelligence Research

Pub Date : 2022-09-12 DOI: 10.1613/jair.1.14050

Quan Zhou, Jakub Marecek, R. Shorten

In machine learning, training data often capture the behaviour of multiple subgroups of some underlying human population. This behaviour can often be modelled as observations of an unknown dynamical system with an unobserved state. When the training data for the subgroups are not controlled carefully, however, under-representation bias arises. To counter under-representation bias, we introduce two natural notions of fairness in time-series forecasting problems: subgroup fairness and instantaneous fairness. These notions extend predictive parity to the learning of dynamical systems. We also show globally convergent methods for the fairness-constrained learning problems using hierarchies of convexifications of non-commutative polynomial optimisation problems. We also show that by exploiting sparsity in the convexifications, we can reduce the run time of our methods considerably. Our empirical results on a biased data set motivated by insurance applications and the well-known COMPAS data set demonstrate the efficacy of our methods.
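
One plausible way to read the two notions is as two different worst-case losses of a shared forecast: one averaged over the whole horizon per subgroup, the other taken at every single time step. The sketch below encodes that reading only; it is not the paper's formal definition, it assumes one equally long trajectory per subgroup and a squared-error loss, and the function names are hypothetical.

```python
def squared_errors(forecast, observed):
    """Per-time-step squared error between a forecast and one trajectory."""
    return [(f - y) ** 2 for f, y in zip(forecast, observed)]


def subgroup_objective(forecast, observations_by_group):
    """Worst average loss over subgroups, measured across the whole horizon."""
    return max(
        sum(squared_errors(forecast, obs)) / len(obs)
        for obs in observations_by_group.values()
    )


def instantaneous_objective(forecast, observations_by_group):
    """Worst loss suffered by any subgroup at any single time step."""
    return max(
        max(squared_errors(forecast, obs))
        for obs in observations_by_group.values()
    )
```

A fairness-constrained learner would minimise one of these quantities over the model parameters instead of the pooled average loss, which is dominated by the well-represented subgroups.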

Citations: 4

Negative Human Rights as a Basis for Long-term AI Safety and Regulation

IF 5 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Artificial Intelligence Research

Pub Date : 2022-08-31 DOI: 10.1613/jair.1.14020

Ondrej Bajgar, Jan Horenovsky

If autonomous AI systems are to be reliably safe in novel situations, they will need to incorporate general principles guiding them to recognize and avoid harmful behaviours. Such principles may need to be supported by a binding system of regulation, which would need the underlying principles to be widely accepted. They should also be specific enough for technical implementation. Drawing inspiration from law, this article explains how negative human rights could fulfil the role of such principles and serve as a foundation both for an international regulatory system and for building technical safety constraints for future AI systems. This article appears in the AI & Society track.

Citations: 3

Pricing Problems with Buyer Preselection

IF 5 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Artificial Intelligence Research

Pub Date : 2022-08-28 DOI: 10.4230/LIPIcs.MFCS.2018.47

Vittorio Bilò, M. Flammini, G. Monaco, L. Moscardelli

We investigate the problem of preselecting a subset of buyers (also called agents) participating in a market so as to optimize the performance of stable outcomes. We consider four scenarios arising from the combination of two stability notions, namely market envy-freeness and agent envy-freeness, with the two state-of-the-art objective functions of social welfare and seller’s revenue. When insisting on market envy-freeness, we prove that the problem cannot be approximated within n^(1−ε) (with n being the number of buyers) for any ε > 0, under both objective functions; we also provide approximation algorithms with an approximation ratio tight up to subpolynomial multiplicative factors for social welfare and the seller’s revenue. The negative result, in particular, holds even for markets with single-minded buyers. We also prove that maximizing the seller’s revenue is NP-hard even for a single buyer, thus closing a previous open question. Under agent envy-freeness and for both objective functions, instead, we design a polynomial time algorithm transforming any stable outcome for a market involving any subset of buyers into a stable outcome for the whole market without worsening its performance. This result creates an interesting middle-ground situation: on the one hand, buyer preselection cannot improve the performance of agent envy-free outcomes; on the other, it can be used as a tool for simplifying the combinatorial structure of the buyers’ valuation functions in a given market. Finally, we consider the restricted case of multi-unit markets, where all items are of the same type and are assigned the same price. For these markets, we show that preselection may improve the performance of stable outcomes in all four considered scenarios, and design corresponding approximation algorithms.
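
The difference between the two stability notions can be seen on a tiny market. The sketch below uses the definitions as they are commonly stated (an assumption; the paper's formal definitions may differ in detail): an outcome is agent envy-free if no buyer prefers another buyer's bundle at the posted prices, and market envy-free if no buyer prefers any bundle of items at all. The helper names are hypothetical, and the brute-force enumeration is only feasible for very small item sets.

```python
from itertools import chain, combinations


def single_minded(target, value):
    """Valuation of a single-minded buyer: `value` for any bundle that
    contains the target set, 0 otherwise."""
    target = frozenset(target)
    return lambda bundle: value if target <= frozenset(bundle) else 0.0


def utility(valuation, bundle, prices):
    """Value of the bundle minus the total price paid for it."""
    return valuation(bundle) - sum(prices[item] for item in bundle)


def is_agent_envy_free(valuations, allocation, prices):
    """No buyer strictly prefers the bundle allocated to another buyer."""
    for i, val in valuations.items():
        own = utility(val, allocation[i], prices)
        if any(utility(val, allocation[j], prices) > own for j in valuations):
            return False
    return True


def is_market_envy_free(valuations, allocation, prices):
    """No buyer strictly prefers any subset of the items on sale."""
    items = list(prices)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))
    bundles = [frozenset(s) for s in subsets]
    for i, val in valuations.items():
        own = utility(val, allocation[i], prices)
        if any(utility(val, b, prices) > own for b in bundles):
            return False
    return True
```

With buyers A (wants {x}, value 3) and B (wants {x, y}, value 5), giving x to A at prices of 2 per item is agent envy-free but not market envy-free, because B would rather buy both items; raising both prices to 3 makes the same allocation market envy-free as well.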

Pages: 1791-1822

Citations: 4

Synthesis and Properties of Optimally Value-Aligned Normative Systems

IF 5 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Artificial Intelligence Research

Pub Date : 2022-08-21 DOI: 10.1613/jair.1.13487

Nieves Montes, C. Sierra

The value alignment problem is concerned with the design of systems that provably abide by our human values. One approach to this challenge is to leverage prescriptive norms that, if carefully designed, are able to steer a multiagent system away from harmful outcomes and towards more beneficial ones. In this work, we first present a general methodology for the automated synthesis of value-aligned normative systems, based on a consequentialist view of values. In the second part, we provide analytical tools to examine such value-aligned normative systems, namely the Shapley value of individual norms and the compatibility of several values under a fixed set of norms. We illustrate all of our contributions with a running example of a society of agents where taxes are collected and redistributed according to a set of parametrised norms.
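
The Shapley value of a norm can be computed exactly when the set of norms is small. The sketch below applies the standard Shapley formula, treating each norm as a player and a user-supplied `alignment` function as the characteristic function. How the paper scores a subset of active norms is not stated in the abstract, so `alignment` and the example numbers used to exercise it are assumptions.

```python
from itertools import combinations
from math import factorial


def shapley_values(norms, alignment):
    """Exact Shapley value of each norm.

    `alignment` maps a frozenset of active norms to a number (assumed here
    to be the alignment achieved when only those norms are in force).
    Runs in time exponential in len(norms)."""
    n = len(norms)
    values = {}
    for norm in norms:
        others = [m for m in norms if m != norm]
        total = 0.0
        for k in range(n):
            # Probability weight of a coalition of size k preceding `norm`.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of adding `norm` to the coalition.
                total += weight * (alignment(s | {norm}) - alignment(s))
        values[norm] = total
    return values
```

By the efficiency property of the Shapley value, the individual values sum to the alignment of the full normative system minus that of the empty one, so each value can be read as that norm's share of the overall alignment.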

Citations: 6

C-Face: Using Compare Face on Face Hallucination for Low-Resolution Face Recognition

IF 5; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Artificial Intelligence Research

Pub Date : 2022-08-16 DOI: 10.1613/jair.1.13816

F. Han, Xudong Wang, S. Furao, Jian Zhao

Face hallucination is the task of generating high-resolution (HR) face images from low-resolution (LR) inputs, and is a subfield of general image super-resolution. However, most previous methods consider only the visual effect and ignore how to maintain the identity of the face. In this work, we propose a novel face hallucination model, called the C-Face network, which can generate HR images with high visual quality while preserving identity information. A face recognition network is used to extract identity features during training. To make the reconstructed face images retain as much identity information as possible, a novel metric, the C-Face loss, is proposed. We also propose a new training algorithm to deal with the convergence problem. Moreover, since our work mainly focuses on the recognition accuracy of the output, we integrate face recognition into the face hallucination process, which ensures that the model can be used in real scenarios. Extensive experiments on two large-scale face datasets demonstrate that our C-Face network achieves the best performance among the compared state-of-the-art methods.
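
The abstract does not define the C-Face loss, so the Python sketch below is only a generic illustration of the idea it describes: combining a pixel reconstruction term with an identity term computed on embeddings from a face recognition network. The function `identity_aware_loss`, the `identity_weight` parameter, and the use of cosine similarity are all assumptions made for this example and should not be read as the paper's formulation.

```python
from math import sqrt


def mse(a, b):
    """Mean squared error between two equal-length pixel vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cosine_similarity(u, v):
    """Cosine similarity of two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def identity_aware_loss(sr_pixels, hr_pixels, sr_embedding, hr_embedding,
                        identity_weight=0.5):
    """Hypothetical loss: pixel error plus a penalty for identity drift.

    The embeddings are assumed to come from a fixed face recognition
    network applied to the super-resolved and ground-truth images.
    """
    pixel_term = mse(sr_pixels, hr_pixels)
    identity_term = 1.0 - cosine_similarity(sr_embedding, hr_embedding)
    return pixel_term + identity_weight * identity_term


# Same pixels, but the embeddings point in orthogonal directions:
# the pixel term is zero and only the identity penalty remains.
loss = identity_aware_loss([0.1, 0.9], [0.1, 0.9], [1.0, 0.0], [0.0, 1.0])
print(loss)
```

The point of the extra term is that two images can be close pixel-wise yet differ in the features a recognizer relies on, so a pixel-only loss gives no direct pressure to preserve identity.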

Citations: 1

Joint Optimization of Concave Scalarized Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm

IF 5; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Artificial Intelligence Research

Pub Date : 2022-08-09 DOI: 10.1613/jair.1.13981

Qinbo Bai, Mridul Agarwal, V. Aggarwal

Many engineering problems have multiple objectives, and the overall aim is to optimize a non-linear function of these objectives. In this paper, we formulate the problem of maximizing a non-linear concave function of multiple long-term objectives. A policy-gradient-based model-free algorithm is proposed for the problem. To compute an estimate of the gradient, an asymptotically biased estimator is proposed. The proposed algorithm is shown to converge to within $\epsilon$ of the global optimum after sampling $O\left(\frac{M^4\sigma^2}{(1-\gamma)^8\epsilon^4}\right)$ trajectories, where $\gamma$ is the discount factor and $M$ is the number of agents, thus achieving the same dependence on $\epsilon$ as the policy gradient algorithm for standard reinforcement learning.
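
To make the idea of a concave scalarization concrete, the Python sketch below applies the chain rule, gradient of f(J(θ)) = Σ_k (∂f/∂J_k) ∇J_k(θ), to a toy two-action problem with two objectives and f(J) = log J_1 + log J_2. The reward table, the one-parameter sigmoid policy, and the step size are hypothetical choices for illustration. The sketch uses exact gradients, whereas the paper proposes a sampled, model-free estimator, so this is not the authors' algorithm.

```python
from math import exp

# Hypothetical reward vectors (objective 1, objective 2) for actions 0 and 1.
REWARDS = [(1.0, 0.2), (0.3, 1.0)]


def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))


def objectives(theta):
    """Expected value of each objective; p is the probability of action 1."""
    p = sigmoid(theta)
    return [(1 - p) * REWARDS[0][k] + p * REWARDS[1][k] for k in range(2)]


def objective_gradients(theta):
    """dJ_k/dtheta for each objective, using dp/dtheta = p * (1 - p)."""
    p = sigmoid(theta)
    dp = p * (1 - p)
    return [dp * (REWARDS[1][k] - REWARDS[0][k]) for k in range(2)]


def scalarized_gradient(theta):
    """Chain rule for f(J) = sum_k log(J_k), where df/dJ_k = 1 / J_k."""
    values = objectives(theta)
    grads = objective_gradients(theta)
    return sum(g / j for j, g in zip(values, grads))


# Plain gradient ascent on the scalarized objective.
theta = 0.0
for _ in range(2000):
    theta += 0.5 * scalarized_gradient(theta)

p_star = sigmoid(theta)
print(p_star)
```

Setting the derivative of log(1 - 0.7p) + log(0.2 + 0.8p) to zero gives p = 0.66 / 1.12, which the loop approaches. Note that the weights 1 / J_k depend on the current objective values, which is what distinguishes this from optimizing a fixed linear combination of rewards.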

Citations: 1