Artificial Intelligence: Latest Articles, Page 5

Optimizing pathfinding for goal legibility and recognition in cooperative partially observable environments

IF 14.4, Zone 2 (Computer Science), Q1 Arts and Humanities

Artificial Intelligence

Pub Date : 2024-05-21 DOI: 10.1016/j.artint.2024.104148

Sara Bernardini , Fabio Fagnani , Alexandra Neacsu , Santiago Franco

In this paper, we perform a joint design of goal legibility and recognition in a cooperative, multi-agent pathfinding setting with partial observability. More specifically, we consider a set of identical agents (the actors) that move in an environment only partially observable to an observer in the loop. The actors are tasked with reaching a set of locations that need to be serviced in a timely fashion. The observer monitors the actors' behavior from a distance and needs to identify each actor's destination based on the actor's observable movements. Our approach generates legible paths for the actors; namely, it constructs one path from the origin to each destination so that these paths overlap as little as possible while satisfying budget constraints. It also equips the observer with a goal-recognition mapping between unique sequences of observations and destinations, ensuring that the observer can infer an actor's destination by making the minimum number of observations (legibility delay). Our method substantially extends previous work, which is limited to an observer with full observability, showing that optimizing pathfinding for goal legibility and recognition can be performed via a reformulation into a classical minimum cost flow problem in the partially observable case when the algorithms for the fully observable case are appropriately modified. Our empirical evaluation shows that our techniques are as effective in partially observable settings as in fully observable ones.
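
As a rough illustration of the goal-recognition mapping described in the abstract, the following Python sketch computes, for each destination, the shortest observation prefix that no other path shares; the length of that prefix plays the role of the legibility delay. It is not the authors' minimum cost flow formulation: the paths are supplied by hand, and the names `observe`, `recognition_map`, and the `sensor` dictionary (which relabels or hides nodes to mimic partial observability) are assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

Path = List[str]
Sensor = Dict[str, Optional[str]]


def observe(path: Path, sensor: Sensor) -> Tuple[str, ...]:
    """Return what the observer sees along a path (hypothetical sensor model)."""
    seen = []
    for node in path:
        label = sensor.get(node, node)  # unlisted nodes are seen as themselves
        if label is not None:           # None marks a node the observer cannot see
            seen.append(label)
    return tuple(seen)


def recognition_map(paths: Dict[str, Path], sensor: Sensor) -> Dict[Tuple[str, ...], str]:
    """Map each destination's shortest unique observation prefix to that destination.

    Destinations whose observations never become unique are left out.
    """
    observations = {goal: observe(path, sensor) for goal, path in paths.items()}
    mapping = {}
    for goal, obs in observations.items():
        for n in range(1, len(obs) + 1):
            prefix = obs[:n]
            shared = any(
                other != goal and other_obs[:n] == prefix
                for other, other_obs in observations.items()
            )
            if not shared:
                mapping[prefix] = goal  # len(prefix) is the legibility delay
                break
    return mapping


# Toy instance (assumed): three destinations reached from a common origin "s".
paths = {
    "g1": ["s", "a", "b", "g1"],
    "g2": ["s", "a", "c", "g2"],
    "g3": ["s", "d", "g3"],
}
full = recognition_map(paths, sensor={})
hidden = recognition_map(paths, sensor={"a": None, "d": None})
```

In this toy instance, the paths to `g1` and `g2` share the node `a`, so under full observability the observer needs three observations to tell them apart, while `g3` is identified after two. Hiding `a` and `d` from the observer changes the observation sequences, and the mapping must be recomputed on what is actually seen.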

Citations: 0

Acquiring and modeling abstract commonsense knowledge via conceptualization

IF 14.4, Zone 2 (Computer Science), Q1 Arts and Humanities

Artificial Intelligence

Pub Date : 2024-05-17 DOI: 10.1016/j.artint.2024.104149

Mutian He, Tianqing Fang, Weiqi Wang, Yangqiu Song

Conceptualization, or viewing entities and situations as instances of abstract concepts in mind and making inferences based on that, is a vital component of human intelligence for commonsense reasoning. Despite recent progress in acquiring and modeling commonsense knowledge in artificial intelligence, attributed to neural language models and commonsense knowledge graphs (CKGs), conceptualization has yet to be introduced thoroughly, leaving current approaches ineffective at covering knowledge about the countless diverse entities and situations in the real world. To address the problem, we thoroughly study the role of conceptualization in commonsense reasoning, and formulate a framework to replicate human conceptual induction by acquiring abstract knowledge about events regarding abstract concepts, as well as higher-level triples or inferences upon them. We then apply the framework to ATOMIC, a large-scale human-annotated CKG, aided by the taxonomy Probase. We annotate a dataset on the validity of contextualized conceptualizations from ATOMIC on both event and triple levels, develop a series of heuristic rules based on linguistic features, and train a set of neural models to generate and verify abstract knowledge. Based on these components, we build a pipeline to acquire abstract knowledge. A large abstract CKG upon ATOMIC is then induced, ready to be instantiated to make inferences about unseen entities or situations. Finally, we empirically show the benefits of augmenting CKGs with abstract knowledge in downstream tasks like commonsense inference and zero-shot commonsense QA.

Citations: 0

Knowledge is power: Open-world knowledge representation learning for knowledge-based visual reasoning

IF 14.4, Zone 2 (Computer Science), Q1 Arts and Humanities

Artificial Intelligence

Pub Date : 2024-05-13 DOI: 10.1016/j.artint.2024.104147

Wenbo Zheng , Lan Yan , Fei-Yue Wang

Knowledge-based visual reasoning requires the ability to associate outside knowledge that is not present in a given image for cross-modal visual understanding. Existing approaches have two deficiencies: (1) they only employ or construct elementary and explicit but superficial knowledge graphs, while lacking the complex and implicit but indispensable cross-modal knowledge needed for visual reasoning, and (2) they cannot reason about new or unseen images or questions in open environments, and so often fail in real-world applications. How to represent and leverage tacit multimodal knowledge in open-world visual reasoning scenarios has been little studied. In this paper, we propose a novel open-world knowledge representation learning method that not only constructs implicit knowledge representations from the given images and their questions but also enables knowledge transfer from a known scene to an unknown scene for answer prediction. Extensive experiments on six benchmarks demonstrate the superiority of our approach over other state-of-the-art methods. We also apply our approach to other visual reasoning tasks, and the experimental results show that it performs well and can support related reasoning applications.

Citations: 0

Learning spatio-temporal dynamics on mobility networks for adaptation to open-world events

IF 5.1, Zone 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence

Pub Date : 2024-05-08 DOI: 10.1016/j.artint.2024.104120

As a decisive factor in the success of Mobility-as-a-Service (MaaS), spatio-temporal dynamics modeling on mobility networks is a challenging task, particularly in scenarios where open-world events drive mobility behavior away from its routines. While tremendous progress has been made in modeling high-level spatio-temporal regularities with deep learning, most, if not all, existing methods are neither aware of the dynamic interactions among multiple transport modes on mobility networks nor adaptive to the unprecedented volatility brought by potential open-world events. In this paper, we are therefore motivated to improve the canonical spatio-temporal network (ST-Net) from two perspectives: (1) we design a heterogeneous mobility information network (HMIN) to explicitly represent intermodality in multimodal mobility; (2) we propose a memory-augmented dynamic filter generator (MDFG) to generate sequence-specific parameters on the fly for various scenarios. The enhanced event-aware spatio-temporal network, namely EAST-Net, is evaluated on several real-world datasets covering a wide variety of open-world events. Both quantitative and qualitative experimental results verify the superiority of our approach over state-of-the-art baselines. Moreover, experiments show that EAST-Net can generalize to perform zero-shot inference over different open-world events that have not been seen before.

Citations: 0

Exploring the psychology of LLMs’ moral and legal reasoning

IF 14.4, Zone 2 (Computer Science), Q1 Arts and Humanities

Artificial Intelligence

Pub Date : 2024-05-03 DOI: 10.1016/j.artint.2024.104145

Guilherme F.C.F. Almeida , José Luiz Nunes , Neele Engelmann , Alex Wiegmann , Marcelo de Araújo

Large language models (LLMs) exhibit expert-level performance in tasks across a wide range of domains. The ethical issues raised by LLMs and the need to align future versions make it important to know how state-of-the-art models reason about moral and legal issues. In this paper, we employ the methods of experimental psychology to probe this question. We replicate eight studies from the experimental literature with instances of Google's Gemini Pro, Anthropic's Claude 2.1, OpenAI's GPT-4, and Meta's Llama 2 Chat 70b. We find that alignment with human responses shifts from one experiment to another, and that models differ amongst themselves as to their overall alignment, with GPT-4 taking a clear lead over all other models we tested. Nonetheless, even when LLM-generated responses are highly correlated with human responses, there are still systematic differences, with a tendency for models to exaggerate effects that are present among humans, in part by reducing variance. This finding recommends caution with regard to proposals to replace human participants with current state-of-the-art LLMs in psychological research and highlights the need for further research on the distinctive aspects of machine psychology.

Citations: 0

A multi-graph representation for event extraction

IF 14.4, Zone 2 (Computer Science), Q1 Arts and Humanities

Artificial Intelligence

Pub Date : 2024-05-03 DOI: 10.1016/j.artint.2024.104144

Hui Huang , Yanping Chen , Chuan Lin , Ruizhang Huang , Qinghua Zheng , Yongbin Qin

Event extraction increasingly identifies event triggers and arguments in a unified framework, which has the advantage of avoiding the cascading failures of pipeline methods. The main problem is that joint models usually assume a one-to-one relationship between event triggers and arguments. This leads to the argument multiplexing problem, in which an argument mention can serve different roles in an event or be shared by different events. To address this problem, we propose a multigraph-based event extraction framework. It allows parallel edges between any pair of nodes, which is effective for representing the semantic structures of an event. The framework enables the neural network to map one or more sentences into a structured semantic representation that encodes multiple overlapping events. Evaluated on four public datasets, our method achieves state-of-the-art performance, outperforming all compared models. Analytical experiments show that the multigraph representation is effective in addressing the argument multiplexing problem and helps improve the discriminability of the neural network for event extraction.
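
To make the idea of parallel edges concrete, here is a minimal Python sketch of a multigraph in which a trigger and an argument may be linked by more than one role edge, and one argument may be linked to several triggers. The class `EventMultigraph`, its methods, and the example triggers, arguments, and role labels are all hypothetical and chosen only for illustration; the paper's neural model and annotation scheme are not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class EventMultigraph:
    """Toy multigraph: each (trigger, argument) pair may carry several role edges."""

    def __init__(self) -> None:
        # (trigger, argument) -> list of role labels, one per parallel edge
        self._edges: Dict[Tuple[str, str], List[str]] = defaultdict(list)

    def add_edge(self, trigger: str, argument: str, role: str) -> None:
        """Add one role edge; repeated calls on the same pair create parallel edges."""
        self._edges[(trigger, argument)].append(role)

    def roles(self, trigger: str, argument: str) -> List[str]:
        """All roles the argument plays in the event anchored by the trigger."""
        return list(self._edges.get((trigger, argument), []))

    def triggers_of(self, argument: str) -> List[str]:
        """All triggers (events) that share the given argument."""
        return sorted({t for (t, a) in self._edges if a == argument})


# Hypothetical example: one argument with two roles in one event,
# and another argument shared by two events.
graph = EventMultigraph()
graph.add_edge("attack", "bomber", "Attacker")
graph.add_edge("attack", "bomber", "Instrument")
graph.add_edge("attack", "soldier", "Target")
graph.add_edge("die", "soldier", "Victim")
```

A one-to-one structure, such as a dictionary from argument to a single (trigger, role) pair, could not hold both edges for `bomber` or both events for `soldier`, which is the multiplexing case the abstract describes.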


Mitigating social biases of pre-trained language models via contrastive self-debiasing with double data augmentation

IF 14.4, Zone 2 (Computer Science), Q1 Arts and Humanities

Artificial Intelligence

Pub Date : 2024-04-26 DOI: 10.1016/j.artint.2024.104143

Yingji Li , Mengnan Du , Rui Song , Xin Wang , Mingchen Sun , Ying Wang

Pre-trained Language Models (PLMs) have been shown to inherit and even amplify the social biases contained in the training corpus, leading to undesired stereotypes in real-world applications. Existing techniques for mitigating the social biases of PLMs mainly rely on data augmentation with manually designed prior knowledge or on fine-tuning with abundant external corpora to debias. However, these methods are not only limited by human experience, but also consume a lot of resources to access all the parameters of the PLMs and are prone to introducing new external biases when fine-tuning with external corpora. In this paper, we propose a Contrastive Self-Debiasing Model with Double Data Augmentation (named CD³) for mitigating social biases of PLMs. Specifically, CD³ consists of two stages: double data augmentation and contrastive self-debiasing. First, we build on counterfactual data augmentation to perform a secondary augmentation using biased prompts that are automatically searched by maximizing the differences in PLMs' encoding across demographic groups. Double data augmentation further amplifies the biases between sample pairs to break the limitations of previous debiasing models that heavily rely on prior knowledge in data augmentation. We then leverage the augmented data for contrastive learning to train a plug-and-play adapter to mitigate the social biases in PLMs' encoding without tuning the PLMs. Extensive experimental results on BERT, ALBERT, and RoBERTa on several real-world datasets and fairness metrics show that CD³ outperforms baseline models on gender debiasing and race debiasing while retaining the language modeling capabilities of PLMs.
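
The first augmentation stage builds on counterfactual data augmentation, which can be sketched in a few lines of Python: each sentence is paired with a copy in which demographic terms are swapped. The word list `PAIRS` and the function names below are assumptions for illustration only; the prompt search, the adapter, and the contrastive training described in the abstract are not shown.

```python
import re
from typing import List, Tuple

# Hypothetical, deliberately tiny list of gendered term pairs.
PAIRS = [("he", "she"), ("his", "her"), ("man", "woman"), ("father", "mother")]

# Build a two-way lookup so each term maps to its counterpart.
SWAP = {}
for first, second in PAIRS:
    SWAP[first] = second
    SWAP[second] = first


def counterfactual(sentence: str) -> str:
    """Return the sentence with every listed term replaced by its counterpart."""

    def swap(match):
        word = match.group(0)
        replacement = SWAP.get(word.lower(), word)
        # Preserve an initial capital letter, e.g. "He" -> "She".
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", swap, sentence)


def augment(sentences: List[str]) -> List[Tuple[str, str]]:
    """Pair each sentence with its counterfactual version."""
    return [(s, counterfactual(s)) for s in sentences]


pairs = augment(["He thanked his father.", "The weather was fine."])
```

Each resulting pair differs only in its demographic terms, which is what makes such pairs usable as paired samples for contrastive learning. A word-level swap like this is crude (for example, "her" can correspond to either "his" or "him"), which is one reason hand-built lists are a limitation.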

Citations: 0

Iterative voting with partial preferences

IF 14.4, Zone 2 (Computer Science), Q1 Arts and Humanities

Artificial Intelligence

Pub Date : 2024-04-21 DOI: 10.1016/j.artint.2024.104133

Zoi Terzopoulou , Panagiotis Terzopoulos , Ulle Endriss

Voting platforms can offer participants the option to sequentially modify their preferences, whenever they have a reason to do so. But such iterative voting may never converge, meaning that a state where all agents are happy with their submitted preferences may never be reached. This problem has received increasing attention within the area of computational social choice. Yet, the relevant literature hinges on the rather stringent assumption that the agents are able to rank all alternatives they are presented with, i.e., that they hold preferences that are linear orders. We relax this assumption and investigate iterative voting under partial preferences. To that end, we define and study two families of rules that extend the well-known k-approval rules in the standard voting framework. Although we show that convergence is not guaranteed in general for any of these rules, we are also able to identify natural conditions under which such guarantees can be given. Finally, we conduct simulation experiments to test the practical implications of our results.
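
For readers unfamiliar with the setting, the following Python sketch shows the standard k-approval rule and a simple iterative-voting loop for the special case k = 1 (plurality). It stays within the standard framework of linear orders, so it does not implement the paper's rules for partial preferences. The alphabetical tie-breaking, the fixed voter order, the `max_rounds` cut-off, and the function names are assumptions of this example.

```python
from collections import Counter
from typing import List, Optional, Sequence


def k_approval_winner(ballots: Sequence[Sequence[str]], k: int) -> str:
    """Each ballot is a ranking (best first); its top k alternatives get one point each."""
    scores = Counter()
    for ranking in ballots:
        scores.update(ranking[:k])
    # Highest score wins; ties are broken alphabetically (assumed tie-breaking).
    return min(scores, key=lambda alt: (-scores[alt], alt))


def iterate_plurality(prefs: List[List[str]], max_rounds: int = 100) -> Optional[List[str]]:
    """Let voters change their vote in turn until nobody can improve the outcome.

    Returns the stable list of votes, or None if no stable state is reached
    within max_rounds.
    """

    def winner(votes: List[str]) -> str:
        return k_approval_winner([[v] for v in votes], k=1)

    votes = [pref[0] for pref in prefs]  # start from truthful votes
    for _ in range(max_rounds):
        changed = False
        for i, pref in enumerate(prefs):
            best_vote = votes[i]
            best_rank = pref.index(winner(votes))  # lower index = more preferred
            for alt in pref:
                trial = votes[:i] + [alt] + votes[i + 1:]
                rank = pref.index(winner(trial))
                if rank < best_rank:
                    best_vote, best_rank = alt, rank
            if best_vote != votes[i]:
                votes[i] = best_vote
                changed = True
        if not changed:
            return votes
    return None


# Three voters with true preferences over alternatives a, b, c.
prefs = [["a", "b", "c"], ["b", "c", "a"], ["c", "b", "a"]]
stable = iterate_plurality(prefs)
```

In this small profile the second voter abandons `b` for `c` to prevent `a` from winning, after which no voter can improve the outcome by changing a vote. Returning `None` when the round limit is hit reflects the abstract's point that such dynamics are not guaranteed to converge.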



Citations: 0

Probabilistic reach-avoid for Bayesian neural networks

IF 5.1 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence

Pub Date : 2024-04-17 DOI: 10.1016/j.artint.2024.104132

Matthew Wicker , Luca Laurenti , Andrea Patane , Nicola Paoletti , Alessandro Abate , Marta Kwiatkowska

Model-based reinforcement learning seeks to simultaneously learn the dynamics of an unknown stochastic environment and synthesise an optimal policy for acting in it. Ensuring the safety and robustness of sequential decisions made through a policy in such an environment is a key challenge for policies intended for safety-critical scenarios. In this work, we investigate two complementary problems: first, computing reach-avoid probabilities for iterative predictions made with dynamical models, with dynamics described by a Bayesian neural network (BNN); second, synthesising control policies that are optimal with respect to a given reach-avoid specification (reaching a “target” state, while avoiding a set of “unsafe” states) and a learned BNN model. Our solution leverages interval propagation and backward recursion techniques to compute lower bounds for the probability that a policy's sequence of actions leads to satisfying the reach-avoid specification. Such computed lower bounds provide safety certification for the given policy and BNN model. We then introduce control synthesis algorithms to derive policies maximizing said lower bounds on the safety probability. We demonstrate the effectiveness of our method on a series of control benchmarks characterized by learned BNN dynamics models. On our most challenging benchmark, compared to purely data-driven policies, the optimal synthesis algorithm is able to provide more than a four-fold increase in the number of certifiable states and more than a three-fold increase in the average guaranteed reach-avoid probability.
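The backward recursion mentioned above can be illustrated on a much simpler object than a BNN. The sketch below assumes a finite-state Markov chain with a known transition matrix under a fixed policy and computes exact finite-horizon reach-avoid probabilities; the paper instead derives lower bounds for BNN dynamics via interval propagation, which this example does not attempt. The function name and the three-state chain are hypothetical.

```python
def reach_avoid_probability(P, target, unsafe, horizon):
    """Finite-horizon reach-avoid probabilities by backward recursion.

    P[s][t] is the probability of moving from state s to state t in one step
    (assumed known here). Returns, for each state, the probability of
    reaching `target` within `horizon` steps without entering `unsafe` first.
    """
    n = len(P)
    # With zero steps left, only states already in the target set succeed.
    value = [1.0 if s in target else 0.0 for s in range(n)]
    for _ in range(horizon):
        nxt = []
        for s in range(n):
            if s in target:
                nxt.append(1.0)   # specification already satisfied
            elif s in unsafe:
                nxt.append(0.0)   # specification already violated
            else:
                # Expected value of the next-step probabilities.
                nxt.append(sum(P[s][t] * value[t] for t in range(n)))
        value = nxt
    return value


# State 0: start, state 1: target, state 2: unsafe.
P = [
    [0.5, 0.3, 0.2],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
v1 = reach_avoid_probability(P, target={1}, unsafe={2}, horizon=1)
v2 = reach_avoid_probability(P, target={1}, unsafe={2}, horizon=2)
```

From the start state the probability is 0.3 with one step and 0.3 + 0.5 × 0.3 = 0.45 with two steps. When the dynamics are only known through a learned model, quantities like `P[s][t]` have to be bounded rather than read off, which is where certified lower bounds come in.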



Citations: 0

A unified momentum-based paradigm of decentralized SGD for non-convex models and heterogeneous data

IF 14.4 Zone 2 (Computer Science) Q1 Arts and Humanities

Artificial Intelligence

Pub Date : 2024-04-17 DOI: 10.1016/j.artint.2024.104130

Haizhou Du, Chaoqian Cheng, Chengdong Ni

Emerging distributed applications have recently boosted the development of decentralized machine learning, especially in IoT and edge computing fields. In real-world scenarios, the common problems of non-convexity and data heterogeneity result in inefficiency, performance degradation, and development stagnation. The bulk of studies concentrate on one of the issues mentioned above without having a more general framework that has been proven optimal. To this end, we propose a unified paradigm called UMP, which comprises two algorithms, D-SUM and GT-DSUM, based on the momentum technique with decentralized stochastic gradient descent (SGD). The former provides a convergence guarantee for general non-convex objectives, while the latter is extended by introducing gradient tracking, which estimates the global optimization direction to mitigate data heterogeneity (i.e., distribution drift). We can cover most momentum-based variants based on the classical heavy ball or Nesterov's acceleration with different parameters in UMP. We provide a rigorous convergence analysis of both approaches for non-convex objectives and conduct extensive experiments, which demonstrate an improvement in model accuracy of up to 57.6% over other methods in practice.
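As a rough illustration of the ingredients named above, the sketch below runs decentralized gradient descent with heavy-ball momentum on a toy problem. It is a generic textbook-style scheme, not D-SUM or GT-DSUM, whose update rules are not given in the abstract. The ring topology, the mixing weights, the scalar quadratic objectives, and every name and default value are assumptions made for the example.

```python
import random


def decentralized_momentum_sgd(targets, steps=500, lr=0.1, beta=0.9,
                               noise_std=0.0, seed=0):
    """Decentralized heavy-ball SGD on a ring (illustrative only).

    Agent i holds the local objective 0.5 * (x - targets[i]) ** 2, so the
    global minimiser is the mean of `targets`. Unequal targets play the
    role of heterogeneous data.
    """
    rng = random.Random(seed)
    n = len(targets)
    x = [0.0] * n  # local model parameters
    v = [0.0] * n  # local momentum buffers
    for _ in range(steps):
        # Local gradient, optionally perturbed to mimic stochastic sampling.
        grads = [x[i] - targets[i] + rng.gauss(0.0, noise_std)
                 for i in range(n)]
        # Heavy-ball momentum update.
        v = [beta * v[i] + grads[i] for i in range(n)]
        # Gossip step: average with the two ring neighbours
        # (weight 1/2 on self, 1/4 on each neighbour).
        mixed = [0.5 * x[i] + 0.25 * x[(i - 1) % n] + 0.25 * x[(i + 1) % n]
                 for i in range(n)]
        x = [mixed[i] - lr * v[i] for i in range(n)]
    return x


targets = [0.0, 1.0, 2.0, 5.0]
x = decentralized_momentum_sgd(targets)
network_average = sum(x) / len(x)
spread = max(x) - min(x)
```

With noise switched off, the network average settles at the global minimiser 2.0, but the individual agents remain spread apart because each is pulled toward its own local target. That residual disagreement is the kind of heterogeneity effect that gradient tracking, as described in the abstract, is intended to mitigate.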



Citations: 0