Amisha Bhaskar, Zahiruddin Mahammad, Sachin R Jadhav, Pratap Tokekar
Reinforcement Learning (RL) has shown remarkable progress in simulation environments, yet its application to real-world robotic tasks remains limited due to challenges in exploration and generalization. To address these issues, we introduce NAVINACT, a framework that chooses when the robot should use classical motion planning-based navigation and when it should learn a policy. To further improve exploration efficiency, we use imitation data to bootstrap exploration. NAVINACT dynamically switches between two modes of operation: navigating to a waypoint with classical techniques when far from objects, and reinforcement learning for fine-grained manipulation control when about to interact with them. NAVINACT uses a multi-head architecture composed of ModeNet for mode classification, NavNet for waypoint prediction, and InteractNet for precise manipulation. By combining the strengths of RL and Imitation Learning (IL), NAVINACT improves sample efficiency and mitigates distribution shift, ensuring robust task execution. We evaluate our approach across multiple challenging simulation environments and real-world tasks, demonstrating superior adaptability, efficiency, and generalization compared to existing methods. In simulations, NAVINACT surpasses baseline methods by 10-15% in training success rates at 30k samples and by 30-40% during evaluation phases. In real-world scenarios, it achieves a 30-40% higher success rate on simpler tasks than baselines and uniquely succeeds in complex, two-stage manipulation tasks. Datasets and supplementary materials can be found on our website: https://raaslab.org/projects/NAVINACT/.
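As a rough illustration of the switching rule described in the abstract, here is a minimal Python sketch; the learned ModeNet is approximated by a hand-set distance threshold, and every name below is a placeholder rather than the authors' code:

import numpy as np

def mode_net(ee_pos, obj_pos, threshold=0.10):
    # Stand-in for the learned ModeNet: navigate while far from the object,
    # switch to the RL policy once within `threshold` meters (assumed value).
    return "navigate" if np.linalg.norm(ee_pos - obj_pos) > threshold else "interact"

def plan_motion(ee_pos, waypoint, step=0.05):
    # Stand-in for a classical planner: take a bounded step toward the waypoint.
    direction = waypoint - ee_pos
    dist = np.linalg.norm(direction)
    return direction if dist < step else step * direction / dist

def navinact_step(obs, ee_pos, obj_pos, nav_net, interact_net):
    # NavNet predicts a waypoint near the object; InteractNet is the RL policy.
    if mode_net(ee_pos, obj_pos) == "navigate":
        return plan_motion(ee_pos, nav_net(obs))
    return interact_net(obs)

# Toy usage with random stand-ins for the learned heads.
obs = np.zeros(8)
action = navinact_step(obs,
                       ee_pos=np.array([0.0, 0.0, 0.5]),
                       obj_pos=np.array([0.4, 0.2, 0.1]),
                       nav_net=lambda o: np.array([0.35, 0.2, 0.15]),
                       interact_net=lambda o: np.random.uniform(-1, 1, size=4))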
{"title":"NAVINACT: Combining Navigation and Imitation Learning for Bootstrapping Reinforcement Learning","authors":"Amisha Bhaskar, Zahiruddin Mahammad, Sachin R Jadhav, Pratap Tokekar","doi":"arxiv-2408.04054","DOIUrl":"https://doi.org/arxiv-2408.04054","url":null,"abstract":"Reinforcement Learning (RL) has shown remarkable progress in simulation\u0000environments, yet its application to real-world robotic tasks remains limited\u0000due to challenges in exploration and generalisation. To address these issues,\u0000we introduce NAVINACT, a framework that chooses when the robot should use\u0000classical motion planning-based navigation and when it should learn a policy.\u0000To further improve the efficiency in exploration, we use imitation data to\u0000bootstrap the exploration. NAVINACT dynamically switches between two modes of\u0000operation: navigating to a waypoint using classical techniques when away from\u0000the objects and reinforcement learning for fine-grained manipulation control\u0000when about to interact with objects. NAVINACT consists of a multi-head\u0000architecture composed of ModeNet for mode classification, NavNet for waypoint\u0000prediction, and InteractNet for precise manipulation. By combining the\u0000strengths of RL and Imitation Learning (IL), NAVINACT improves sample\u0000efficiency and mitigates distribution shift, ensuring robust task execution. We\u0000evaluate our approach across multiple challenging simulation environments and\u0000real-world tasks, demonstrating superior performance in terms of adaptability,\u0000efficiency, and generalization compared to existing methods. In both simulated\u0000and real-world settings, NAVINACT demonstrates robust performance. In\u0000simulations, NAVINACT surpasses baseline methods by 10-15% in training success\u0000rates at 30k samples and by 30-40% during evaluation phases. In real-world\u0000scenarios, it demonstrates a 30-40% higher success rate on simpler tasks\u0000compared to baselines and uniquely succeeds in complex, two-stage manipulation\u0000tasks. Datasets and supplementary materials can be found on our website:\u0000{https://raaslab.org/projects/NAVINACT/}.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"56 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141930546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Thomy Phan, Benran Zhang, Shao-Hung Chan, Sven Koenig
Anytime multi-agent path finding (MAPF) is a promising approach to scalable path optimization in multi-agent systems. MAPF-LNS, based on Large Neighborhood Search (LNS), is the current state-of-the-art approach, in which a fast initial solution is iteratively improved by destroying and repairing selected paths. Current MAPF-LNS variants commonly use an adaptive selection mechanism to choose among multiple destroy heuristics. However, to determine promising destroy heuristics, MAPF-LNS requires a considerable amount of exploration time. Since common destroy heuristics are non-adaptive, any performance bottleneck caused by these heuristics cannot be overcome via adaptive heuristic selection alone, limiting the overall effectiveness of MAPF-LNS in terms of solution cost. In this paper, we propose Adaptive Delay-based Destroy-and-Repair Enhanced with Success-based Self-Learning (ADDRESS) as a single-destroy-heuristic variant of MAPF-LNS. ADDRESS applies restricted Thompson Sampling to the top-K set of the most delayed agents to select a seed agent for adaptive LNS neighborhood generation. We evaluate ADDRESS on multiple maps from the MAPF benchmark set and demonstrate cost improvements of at least 50% in large-scale scenarios with up to a thousand agents, compared with the original MAPF-LNS and other state-of-the-art methods.
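The seed-selection step lends itself to a compact sketch. The following is our reading of the abstract, not the authors' implementation: each agent carries a Beta posterior over "destroying around this agent improves the solution," and Thompson Sampling is restricted to the K currently most delayed agents:

import random

class AddressSeedSelector:
    def __init__(self, num_agents, k=32):
        self.k = k                        # size of the restricted arm set (assumed value)
        self.alpha = [1.0] * num_agents   # Beta posterior: successes + 1
        self.beta = [1.0] * num_agents    # Beta posterior: failures + 1

    def select(self, delays):
        # Restrict the bandit arms to the K most delayed agents, then draw a
        # sample from each posterior and pick the best (Thompson Sampling).
        top_k = sorted(range(len(delays)), key=lambda a: -delays[a])[:self.k]
        return max(top_k, key=lambda a: random.betavariate(self.alpha[a], self.beta[a]))

    def update(self, agent, improved):
        # Success = the destroy-and-repair round seeded by this agent lowered cost.
        if improved:
            self.alpha[agent] += 1.0
        else:
            self.beta[agent] += 1.0

Inside the LNS loop one would call select(delays) to pick the seed agent, destroy and repair the neighborhood generated from it, and then call update(agent, new_cost < old_cost).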
{"title":"Anytime Multi-Agent Path Finding with an Adaptive Delay-Based Heuristic","authors":"Thomy Phan, Benran Zhang, Shao-Hung Chan, Sven Koenig","doi":"arxiv-2408.02960","DOIUrl":"https://doi.org/arxiv-2408.02960","url":null,"abstract":"Anytime multi-agent path finding (MAPF) is a promising approach to scalable\u0000path optimization in multi-agent systems. MAPF-LNS, based on Large Neighborhood\u0000Search (LNS), is the current state-of-the-art approach where a fast initial\u0000solution is iteratively optimized by destroying and repairing selected paths of\u0000the solution. Current MAPF-LNS variants commonly use an adaptive selection\u0000mechanism to choose among multiple destroy heuristics. However, to determine\u0000promising destroy heuristics, MAPF-LNS requires a considerable amount of\u0000exploration time. As common destroy heuristics are non-adaptive, any\u0000performance bottleneck caused by these heuristics cannot be overcome via\u0000adaptive heuristic selection alone, thus limiting the overall effectiveness of\u0000MAPF-LNS in terms of solution cost. In this paper, we propose Adaptive\u0000Delay-based Destroy-and-Repair Enhanced with Success-based Self-Learning\u0000(ADDRESS) as a single-destroy-heuristic variant of MAPF-LNS. ADDRESS applies\u0000restricted Thompson Sampling to the top-K set of the most delayed agents to\u0000select a seed agent for adaptive LNS neighborhood generation. We evaluate\u0000ADDRESS in multiple maps from the MAPF benchmark set and demonstrate cost\u0000improvements by at least 50% in large-scale scenarios with up to a thousand\u0000agents, compared with the original MAPF-LNS and other state-of-the-art methods.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"14 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141949365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Albert Sawczyn, Katsiaryna Viarenich, Konrad Wojtasik, Aleksandra Domogała, Marcin Oleksy, Maciej Piasecki, Tomasz Kajdanowicz
Advancements in AI and natural language processing have revolutionized machine-human language interactions, with question answering (QA) systems playing a pivotal role. The knowledge base question answering (KBQA) task, utilizing structured knowledge graphs (KGs), allows for handling extensive knowledge-intensive questions. However, a significant gap exists in KBQA datasets, especially for low-resource languages. Many existing construction pipelines for these datasets are outdated and inefficient in their use of human labor, and modern assisting tools like Large Language Models (LLMs) are not utilized to reduce the workload. To address this, we have designed and implemented a modern, semi-automated approach for creating datasets, encompassing tasks such as KBQA, Machine Reading Comprehension (MRC), and Information Retrieval (IR), tailored explicitly for low-resource environments. We executed this pipeline and introduced the PUGG dataset, the first Polish KBQA dataset, along with novel datasets for MRC and IR. Additionally, we provide a comprehensive implementation, insightful findings, detailed statistics, and an evaluation of baseline models.
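One way such a semi-automated pipeline can cut annotation effort is to let an LLM pre-annotate and route only unsupported cases to humans. The sketch below is a generic illustration of that pattern under our own assumptions, not the specific PUGG pipeline:

def build_kbqa_example(question, llm_answer_fn, kg_supports_fn, review_queue):
    # LLM pre-annotation: propose a candidate answer entity.
    proposed = llm_answer_fn(question)
    # Cheap automatic check: is the proposed answer supported by the KG?
    supported = kg_supports_fn(question, proposed)
    if not supported:
        # Human-in-the-loop fallback for the hard cases only.
        review_queue.append((question, proposed))
    return {"question": question, "answer": proposed, "verified": supported}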
{"title":"Developing PUGG for Polish: A Modern Approach to KBQA, MRC, and IR Dataset Construction","authors":"Albert Sawczyn, Katsiaryna Viarenich, Konrad Wojtasik, Aleksandra Domogała, Marcin Oleksy, Maciej Piasecki, Tomasz Kajdanowicz","doi":"arxiv-2408.02337","DOIUrl":"https://doi.org/arxiv-2408.02337","url":null,"abstract":"Advancements in AI and natural language processing have revolutionized\u0000machine-human language interactions, with question answering (QA) systems\u0000playing a pivotal role. The knowledge base question answering (KBQA) task,\u0000utilizing structured knowledge graphs (KG), allows for handling extensive\u0000knowledge-intensive questions. However, a significant gap exists in KBQA\u0000datasets, especially for low-resource languages. Many existing construction\u0000pipelines for these datasets are outdated and inefficient in human labor, and\u0000modern assisting tools like Large Language Models (LLM) are not utilized to\u0000reduce the workload. To address this, we have designed and implemented a\u0000modern, semi-automated approach for creating datasets, encompassing tasks such\u0000as KBQA, Machine Reading Comprehension (MRC), and Information Retrieval (IR),\u0000tailored explicitly for low-resource environments. We executed this pipeline\u0000and introduced the PUGG dataset, the first Polish KBQA dataset, and novel\u0000datasets for MRC and IR. Additionally, we provide a comprehensive\u0000implementation, insightful findings, detailed statistics, and evaluation of\u0000baseline models.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141949508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper introduces a novel approach, Counterfactual Shapley Values (CSV), which enhances explainability in reinforcement learning (RL) by integrating counterfactual analysis with Shapley Values. The approach aims to quantify and compare the contributions of different state dimensions to various action choices. To analyze these impacts more accurately, we introduce new characteristic value functions, the "Counterfactual Difference Characteristic Value" and the "Average Counterfactual Difference Characteristic Value." These functions help calculate the Shapley values that evaluate the differences in contributions between optimal and non-optimal actions. Experiments across several RL domains, such as GridWorld, FrozenLake, and Taxi, demonstrate the effectiveness of the CSV method. The results show that this method not only improves transparency in complex RL systems but also quantifies the differences across various decisions.
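For small state spaces the exact Shapley attribution over state dimensions can be computed directly. The sketch below pairs the standard Shapley formula with one plausible reading of the counterfactual-difference characteristic function (dimensions outside a coalition are reset to a baseline, and a coalition's value is the Q-gap between the optimal and a compared action); the paper's exact definitions may differ:

from itertools import combinations
from math import factorial

def shapley_values(n, value):
    # Exact Shapley values for n players, value(frozenset) -> float.
    # Exponential in n; only practical for low-dimensional states.
    phi = [0.0] * n
    for i in range(n):
        others = [p for p in range(n) if p != i]
        for r in range(len(others) + 1):
            for s in combinations(others, r):
                w = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                phi[i] += w * (value(frozenset(s) | {i}) - value(frozenset(s)))
    return phi

def counterfactual_difference(q, state, baseline, a_opt, a_cmp):
    # Hypothetical characteristic function: dimensions in the coalition keep
    # their observed values, the rest revert to the counterfactual baseline.
    def value(coalition):
        s = [state[i] if i in coalition else baseline[i] for i in range(len(state))]
        return q(s, a_opt) - q(s, a_cmp)
    return value

# phi = shapley_values(len(state), counterfactual_difference(q, state, baseline, a_opt, a_cmp))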
{"title":"Counterfactual Shapley Values for Explaining Reinforcement Learning","authors":"Yiwei Shi, Qi Zhang, Kevin McAreavey, Weiru Liu","doi":"arxiv-2408.02529","DOIUrl":"https://doi.org/arxiv-2408.02529","url":null,"abstract":"This paper introduces a novel approach Counterfactual Shapley Values (CSV),\u0000which enhances explainability in reinforcement learning (RL) by integrating\u0000counterfactual analysis with Shapley Values. The approach aims to quantify and\u0000compare the contributions of different state dimensions to various action\u0000choices. To more accurately analyze these impacts, we introduce new\u0000characteristic value functions, the ``Counterfactual Difference Characteristic\u0000Value\" and the ``Average Counterfactual Difference Characteristic Value.\" These\u0000functions help calculate the Shapley values to evaluate the differences in\u0000contributions between optimal and non-optimal actions. Experiments across\u0000several RL domains, such as GridWorld, FrozenLake, and Taxi, demonstrate the\u0000effectiveness of the CSV method. The results show that this method not only\u0000improves transparency in complex RL systems but also quantifies the differences\u0000across various decisions.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"191 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141930547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Imperfect information games, such as Bridge and Skat, present challenges due to state-space explosion and hidden information, posing formidable obstacles for search algorithms. Determinization-based algorithms offer a resolution by sampling hidden information and solving the game in a perfect information setting, facilitating rapid and effective action estimation. However, transitioning to perfect information introduces challenges, notably one called strategy fusion. This research introduces 'Extended Perfect Information Monte Carlo' (EPIMC), an online algorithm inspired by the state-of-the-art determinization-based approach Perfect Information Monte Carlo (PIMC). EPIMC enhances the capabilities of PIMC by postponing the perfect information resolution, alleviating issues related to strategy fusion. However, the decision to postpone the leaf evaluator introduces novel considerations, such as the interplay between prior levels of reasoning and the newly deferred resolution. In our empirical analysis, we investigate the performance of EPIMC across a range of games, with a particular focus on those characterized by varying degrees of strategy fusion. Our results demonstrate notable performance enhancements, particularly in games where strategy fusion significantly impacts gameplay. Furthermore, our research contributes to the theoretical foundation of determinization-based algorithms addressing challenges associated with strategy fusion.
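For contrast with EPIMC, the plain PIMC baseline it builds on fits in a few lines: sample determinized worlds consistent with the current information set, solve each as a perfect-information game, and average. The sketch below is that baseline only, with placeholder callables; EPIMC's deferred resolution is exactly what it omits:

from collections import defaultdict

def pimc_choose(legal_actions, sample_world, perfect_info_value, n_worlds=100):
    # sample_world(): draw one determinization of the hidden information.
    # perfect_info_value(world, action): e.g. an alpha-beta search value.
    totals = defaultdict(float)
    for _ in range(n_worlds):
        world = sample_world()
        for a in legal_actions:
            totals[a] += perfect_info_value(world, a)
    # Strategy fusion arises here: each world is solved as if the hidden
    # information were known, which EPIMC mitigates by postponing resolution.
    return max(legal_actions, key=lambda a: totals[a])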
{"title":"Perfect Information Monte Carlo with Postponing Reasoning","authors":"Jérôme Arjonilla, Abdallah Saffidine, Tristan Cazenave","doi":"arxiv-2408.02380","DOIUrl":"https://doi.org/arxiv-2408.02380","url":null,"abstract":"Imperfect information games, such as Bridge and Skat, present challenges due\u0000to state-space explosion and hidden information, posing formidable obstacles\u0000for search algorithms. Determinization-based algorithms offer a resolution by\u0000sampling hidden information and solving the game in a perfect information\u0000setting, facilitating rapid and effective action estimation. However,\u0000transitioning to perfect information introduces challenges, notably one called\u0000strategy fusion.This research introduces `Extended Perfect Information Monte\u0000Carlo' (EPIMC), an online algorithm inspired by the state-of-the-art\u0000determinization-based approach Perfect Information Monte Carlo (PIMC). EPIMC\u0000enhances the capabilities of PIMC by postponing the perfect information\u0000resolution, reducing alleviating issues related to strategy fusion. However,\u0000the decision to postpone the leaf evaluator introduces novel considerations,\u0000such as the interplay between prior levels of reasoning and the newly deferred\u0000resolution. In our empirical analysis, we investigate the performance of EPIMC\u0000across a range of games, with a particular focus on those characterized by\u0000varying degrees of strategy fusion. Our results demonstrate notable performance\u0000enhancements, particularly in games where strategy fusion significantly impacts\u0000gameplay. Furthermore, our research contributes to the theoretical foundation\u0000of determinization-based algorithms addressing challenges associated with\u0000strategy fusion.%, thereby enhancing our understanding of these algorithms\u0000within the context of imperfect information game scenarios.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"15 Suppl 1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141930548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sahra Ghalebikesabi, Eugene Bagdasaryan, Ren Yi, Itay Yona, Ilia Shumailov, Aneesh Pappu, Chongyang Shi, Laura Weidinger, Robert Stanforth, Leonard Berrada, Pushmeet Kohli, Po-Sen Huang, Borja Balle
Advanced AI assistants combine frontier LLMs and tool access to autonomously perform complex tasks on behalf of users. While the helpfulness of such assistants can increase dramatically with access to user information including emails and documents, this raises privacy concerns about assistants sharing inappropriate information with third parties without user supervision. To steer information-sharing assistants to behave in accordance with privacy expectations, we propose to operationalize contextual integrity (CI), a framework that equates privacy with the appropriate flow of information in a given context. In particular, we design and evaluate a number of strategies to steer assistants' information-sharing actions to be CI-compliant. Our evaluation is based on a novel form-filling benchmark composed of synthetic data and human annotations, and it reveals that prompting frontier LLMs to perform CI-based reasoning yields strong results.
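A minimal version of such a CI-based prompting strategy might gate each form field on an LLM judgment about the information flow. The prompt wording and function names below are our own illustration, not the paper's benchmark or prompts:

CI_PROMPT = """You are filling a form on behalf of a user.
Judge whether sharing the field below is an appropriate flow of information
in this context (contextual integrity).
Purpose of the form: {context}
Recipient: {recipient}
Information type: {info_type}
Value: {value}
Answer SHARE or WITHHOLD, then give a one-line reason."""

def ci_gate(field, llm_complete):
    # field: dict with keys context, recipient, info_type, value.
    # llm_complete: any completion function for a frontier LLM.
    reply = llm_complete(CI_PROMPT.format(**field))
    return reply.strip().upper().startswith("SHARE")

# Fields the gate withholds are left blank for the user to review.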
{"title":"Operationalizing Contextual Integrity in Privacy-Conscious Assistants","authors":"Sahra Ghalebikesabi, Eugene Bagdasaryan, Ren Yi, Itay Yona, Ilia Shumailov, Aneesh Pappu, Chongyang Shi, Laura Weidinger, Robert Stanforth, Leonard Berrada, Pushmeet Kohli, Po-Sen Huang, Borja Balle","doi":"arxiv-2408.02373","DOIUrl":"https://doi.org/arxiv-2408.02373","url":null,"abstract":"Advanced AI assistants combine frontier LLMs and tool access to autonomously\u0000perform complex tasks on behalf of users. While the helpfulness of such\u0000assistants can increase dramatically with access to user information including\u0000emails and documents, this raises privacy concerns about assistants sharing\u0000inappropriate information with third parties without user supervision. To steer\u0000information-sharing assistants to behave in accordance with privacy\u0000expectations, we propose to operationalize $textit{contextual integrity}$\u0000(CI), a framework that equates privacy with the appropriate flow of information\u0000in a given context. In particular, we design and evaluate a number of\u0000strategies to steer assistants' information-sharing actions to be CI compliant.\u0000Our evaluation is based on a novel form filling benchmark composed of synthetic\u0000data and human annotations, and it reveals that prompting frontier LLMs to\u0000perform CI-based reasoning yields strong results.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"32 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141949512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents and evaluates a new retrieval augmented generation (RAG) and large language model (LLM)-based artificial intelligence (AI) technique: rubric enabled generative artificial intelligence (REGAI). REGAI uses rubrics, which can be created manually or automatically by the system, to enhance the performance of LLMs for evaluation purposes. REGAI improves on the performance of both classical LLMs and RAG-based LLM techniques. This paper describes REGAI, presents data regarding its performance, and discusses several possible application areas for the technology.
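The abstract leaves the scoring mechanics unspecified; one natural shape for rubric-driven LLM evaluation is to score each rubric criterion separately and aggregate. This sketch is our assumption of that pattern, not REGAI's published design:

def rubric_score(answer, rubric, llm_judge):
    # rubric: list of (criterion_text, max_points) pairs.
    # llm_judge(answer, criterion, max_points) -> awarded points (a number).
    total, feedback = 0.0, []
    for criterion, max_points in rubric:
        pts = min(max(llm_judge(answer, criterion, max_points), 0), max_points)
        total += pts
        feedback.append((criterion, pts))
    return total, feedback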
{"title":"Development of REGAI: Rubric Enabled Generative Artificial Intelligence","authors":"Zach Johnson, Jeremy Straub","doi":"arxiv-2408.02811","DOIUrl":"https://doi.org/arxiv-2408.02811","url":null,"abstract":"This paper presents and evaluates a new retrieval augmented generation (RAG)\u0000and large language model (LLM)-based artificial intelligence (AI) technique:\u0000rubric enabled generative artificial intelligence (REGAI). REGAI uses rubrics,\u0000which can be created manually or automatically by the system, to enhance the\u0000performance of LLMs for evaluation purposes. REGAI improves on the performance\u0000of both classical LLMs and RAG-based LLM techniques. This paper describes\u0000REGAI, presents data regarding its performance and discusses several possible\u0000application areas for the technology.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"26 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141930620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The ability of humans to rapidly learn new knowledge while retaining old memories poses a significant challenge for current deep learning models. To handle this challenge, we draw inspiration from human memory and learning mechanisms and propose the Self-Reflective Complementary Incremental System (SR-CIS). Comprising the deconstructed Complementary Inference Module (CIM) and Complementary Memory Module (CMM), SR-CIS features a small model for fast inference and a large model for slow deliberation in the CIM, with efficient collaboration enabled by the Confidence-Aware Online Anomaly Detection (CA-OAD) mechanism. The CMM consists of a task-specific Short-Term Memory (STM) region and a universal Long-Term Memory (LTM) region. By setting task-specific Low-Rank Adaptation (LoRA) modules and corresponding prototype weights and biases, it instantiates external storage for parameter and representation memory, thus decoupling the memory module from the inference module. By storing textual descriptions of images during training and combining them with the Scenario Replay Module (SRM) after training for memory combination, along with periodic short-to-long-term memory restructuring, SR-CIS achieves stable incremental memory with limited storage requirements. Balancing model plasticity and memory stability under constraints of limited storage and low data resources, SR-CIS surpasses existing competitive baselines on multiple standard and few-shot incremental learning benchmarks.
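The fast/slow split can be pictured as confidence-aware routing. In the sketch below a fixed softmax-confidence threshold stands in for the actual CA-OAD online anomaly detector, which the abstract does not specify in detail:

import numpy as np

def route_fast_slow(small_logits, small_answer, large_model, x, threshold=0.8):
    # Accept the small model's fast answer when it is confident; otherwise
    # defer to the large model for slow deliberation. The real CA-OAD
    # mechanism detects anomalies online rather than using a fixed threshold.
    probs = np.exp(small_logits - small_logits.max())
    probs /= probs.sum()
    return small_answer if probs.max() >= threshold else large_model(x)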
{"title":"SR-CIS: Self-Reflective Incremental System with Decoupled Memory and Reasoning","authors":"Biqing Qi, Junqi Gao, Xinquan Chen, Dong Li, Weinan Zhang, Bowen Zhou","doi":"arxiv-2408.01970","DOIUrl":"https://doi.org/arxiv-2408.01970","url":null,"abstract":"The ability of humans to rapidly learn new knowledge while retaining old\u0000memories poses a significant challenge for current deep learning models. To\u0000handle this challenge, we draw inspiration from human memory and learning\u0000mechanisms and propose the Self-Reflective Complementary Incremental System\u0000(SR-CIS). Comprising the deconstructed Complementary Inference Module (CIM) and\u0000Complementary Memory Module (CMM), SR-CIS features a small model for fast\u0000inference and a large model for slow deliberation in CIM, enabled by the\u0000Confidence-Aware Online Anomaly Detection (CA-OAD) mechanism for efficient\u0000collaboration. CMM consists of task-specific Short-Term Memory (STM) region and\u0000a universal Long-Term Memory (LTM) region. By setting task-specific Low-Rank\u0000Adaptive (LoRA) and corresponding prototype weights and biases, it instantiates\u0000external storage for parameter and representation memory, thus deconstructing\u0000the memory module from the inference module. By storing textual descriptions of\u0000images during training and combining them with the Scenario Replay Module (SRM)\u0000post-training for memory combination, along with periodic short-to-long-term\u0000memory restructuring, SR-CIS achieves stable incremental memory with limited\u0000storage requirements. Balancing model plasticity and memory stability under\u0000constraints of limited storage and low data resources, SR-CIS surpasses\u0000existing competitive baselines on multiple standard and few-shot incremental\u0000learning benchmarks.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141930549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generalization is a pivotal challenge for agents following natural language instructions. To approach this goal, we leverage a vision-language model (VLM) for visual grounding and transfer its vision-language knowledge into reinforcement learning (RL) for object-centric tasks, which makes the agent capable of zero-shot generalization to unseen objects and instructions. Through visual grounding, we obtain an object-grounded confidence map for the target object indicated in the instruction. Based on this map, we introduce two routes to transfer VLM knowledge into RL. First, we propose an object-grounded intrinsic reward function derived from the confidence map to more effectively guide the agent towards the target object. Second, the confidence map offers a more unified, accessible task representation for the agent's policy, compared to language embeddings. This enables the agent to process unseen objects and instructions through comprehensible visual confidence maps, facilitating zero-shot object-level generalization. Single-task experiments show that our intrinsic reward significantly improves performance on challenging skill-learning tasks. In multi-task experiments, by testing on tasks beyond the training set, we show that the agent, when provided with the confidence map as the task representation, generalizes better than with language-based conditioning. The code is available at https://github.com/PKU-RL/COPL.
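As a toy illustration of the first route, an intrinsic reward can be derived from how strongly the confidence map supports the instructed object from one frame to the next. The functional form below (change in maximum confidence) is our assumption; the paper's reward may be shaped differently:

import numpy as np

def grounded_intrinsic_reward(conf_map, prev_conf_map=None):
    # conf_map: 2D array from the VLM, one confidence per pixel for the
    # object named in the instruction. Reward the agent as the grounding
    # evidence for the target object strengthens.
    if prev_conf_map is None:
        return 0.0
    return float(np.max(conf_map)) - float(np.max(prev_conf_map))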
{"title":"Visual Grounding for Object-Level Generalization in Reinforcement Learning","authors":"Haobin Jiang, Zongqing Lu","doi":"arxiv-2408.01942","DOIUrl":"https://doi.org/arxiv-2408.01942","url":null,"abstract":"Generalization is a pivotal challenge for agents following natural language\u0000instructions. To approach this goal, we leverage a vision-language model (VLM)\u0000for visual grounding and transfer its vision-language knowledge into\u0000reinforcement learning (RL) for object-centric tasks, which makes the agent\u0000capable of zero-shot generalization to unseen objects and instructions. By\u0000visual grounding, we obtain an object-grounded confidence map for the target\u0000object indicated in the instruction. Based on this map, we introduce two routes\u0000to transfer VLM knowledge into RL. Firstly, we propose an object-grounded\u0000intrinsic reward function derived from the confidence map to more effectively\u0000guide the agent towards the target object. Secondly, the confidence map offers\u0000a more unified, accessible task representation for the agent's policy, compared\u0000to language embeddings. This enables the agent to process unseen objects and\u0000instructions through comprehensible visual confidence maps, facilitating\u0000zero-shot object-level generalization. Single-task experiments prove that our\u0000intrinsic reward significantly improves performance on challenging skill\u0000learning. In multi-task experiments, through testing on tasks beyond the\u0000training set, we show that the agent, when provided with the confidence map as\u0000the task representation, possesses better generalization capabilities than\u0000language-based conditioning. The code is available at\u0000https://github.com/PKU-RL/COPL.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141930649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Antonio De Santis, Marco Balduini, Federico De Santis, Andrea Proia, Arsenio Leo, Marco Brambilla, Emanuele Della Valle
Aerospace manufacturing companies, such as Thales Alenia Space, design, develop, integrate, verify, and validate products characterized by high complexity and low volume. They carefully document all phases for each product, but analyses across products are challenging due to the heterogeneity and unstructured nature of the data in the documents. In this paper, we propose a hybrid methodology that leverages Knowledge Graphs (KGs) in conjunction with Large Language Models (LLMs) to extract and validate the data contained in these documents. We consider a case study focused on test data related to electronic boards for satellites. To do so, we extend the Semantic Sensor Network ontology. We store the metadata of the reports in a KG, while the actual test results are stored in Parquet files accessible via a Virtual Knowledge Graph. The validation process is managed using an LLM-based approach. We also conduct a benchmarking study to evaluate the performance of state-of-the-art LLMs in executing this task. Finally, we analyze the costs and benefits of automating preexisting processes of manual data extraction and validation for subsequent cross-report analyses.
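The KG-plus-Parquet split described here can be sketched with rdflib and pandas: report metadata lives in the graph, while bulky measurements stay in Parquet files that the graph merely points to. The namespace and property names are placeholders; the paper's actual schema extends the Semantic Sensor Network ontology:

import pandas as pd
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/tests#")  # placeholder, not the paper's ontology

def register_report(g, report_id, board, parquet_path):
    # Metadata goes into the KG; the raw test results stay on disk, linked
    # by path so a virtual-KG layer can resolve them on demand.
    g.add((EX[report_id], EX.board, Literal(board)))
    g.add((EX[report_id], EX.results, Literal(parquet_path)))

def load_results(g, report_id):
    path = next(g.objects(EX[report_id], EX.results))
    return pd.read_parquet(str(path))

g = Graph()
register_report(g, "report_001", "board-A7", "tests/report_001.parquet")
# df = load_results(g, "report_001")  # requires the Parquet file to exist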
{"title":"Integrating Large Language Models and Knowledge Graphs for Extraction and Validation of Textual Test Data","authors":"Antonio De Santis, Marco Balduini, Federico De Santis, Andrea Proia, Arsenio Leo, Marco Brambilla, Emanuele Della Valle","doi":"arxiv-2408.01700","DOIUrl":"https://doi.org/arxiv-2408.01700","url":null,"abstract":"Aerospace manufacturing companies, such as Thales Alenia Space, design,\u0000develop, integrate, verify, and validate products characterized by high\u0000complexity and low volume. They carefully document all phases for each product\u0000but analyses across products are challenging due to the heterogeneity and\u0000unstructured nature of the data in documents. In this paper, we propose a\u0000hybrid methodology that leverages Knowledge Graphs (KGs) in conjunction with\u0000Large Language Models (LLMs) to extract and validate data contained in these\u0000documents. We consider a case study focused on test data related to electronic\u0000boards for satellites. To do so, we extend the Semantic Sensor Network\u0000ontology. We store the metadata of the reports in a KG, while the actual test\u0000results are stored in parquet accessible via a Virtual Knowledge Graph. The\u0000validation process is managed using an LLM-based approach. We also conduct a\u0000benchmarking study to evaluate the performance of state-of-the-art LLMs in\u0000executing this task. Finally, we analyze the costs and benefits of automating\u0000preexisting processes of manual data extraction and validation for subsequent\u0000cross-report analyses.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141930655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}