Amisha Bhaskar, Zahiruddin Mahammad, Sachin R Jadhav, Pratap Tokekar
Reinforcement Learning (RL) has shown remarkable progress in simulation environments, yet its application to real-world robotic tasks remains limited due to challenges in exploration and generalization. To address these issues, we introduce NAVINACT, a framework that chooses when the robot should use classical motion planning-based navigation and when it should learn a policy. To further improve exploration efficiency, we use imitation data to bootstrap exploration. NAVINACT dynamically switches between two modes of operation: navigating to a waypoint with classical techniques when far from objects, and reinforcement learning for fine-grained manipulation control when about to interact with them. NAVINACT uses a multi-head architecture composed of ModeNet for mode classification, NavNet for waypoint prediction, and InteractNet for precise manipulation. By combining the strengths of RL and Imitation Learning (IL), NAVINACT improves sample efficiency and mitigates distribution shift, ensuring robust task execution. We evaluate our approach across multiple challenging simulation environments and real-world tasks, demonstrating superior adaptability, efficiency, and generalization compared to existing methods. In simulations, NAVINACT surpasses baseline methods by 10-15% in training success rates at 30k samples and by 30-40% during evaluation phases. In real-world scenarios, it achieves a 30-40% higher success rate on simpler tasks than baselines and uniquely succeeds in complex, two-stage manipulation tasks. Datasets and supplementary materials can be found on our website: https://raaslab.org/projects/NAVINACT/.
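As a rough illustration of the switching rule described in the abstract, here is a minimal Python sketch; the learned ModeNet is approximated by a hand-set distance threshold, and every name below is a placeholder rather than the authors' code:

import numpy as np

def mode_net(ee_pos, obj_pos, threshold=0.10):
    # Stand-in for the learned ModeNet: navigate while far from the object,
    # switch to the RL policy once within `threshold` meters (assumed value).
    return "navigate" if np.linalg.norm(ee_pos - obj_pos) > threshold else "interact"

def plan_motion(ee_pos, waypoint, step=0.05):
    # Stand-in for a classical planner: take a bounded step toward the waypoint.
    direction = waypoint - ee_pos
    dist = np.linalg.norm(direction)
    return direction if dist < step else step * direction / dist

def navinact_step(obs, ee_pos, obj_pos, nav_net, interact_net):
    # NavNet predicts a waypoint near the object; InteractNet is the RL policy.
    if mode_net(ee_pos, obj_pos) == "navigate":
        return plan_motion(ee_pos, nav_net(obs))
    return interact_net(obs)

# Toy usage with random stand-ins for the learned heads.
obs = np.zeros(8)
action = navinact_step(obs,
                       ee_pos=np.array([0.0, 0.0, 0.5]),
                       obj_pos=np.array([0.4, 0.2, 0.1]),
                       nav_net=lambda o: np.array([0.35, 0.2, 0.15]),
                       interact_net=lambda o: np.random.uniform(-1, 1, size=4))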
{"title":"NAVINACT: Combining Navigation and Imitation Learning for Bootstrapping Reinforcement Learning","authors":"Amisha Bhaskar, Zahiruddin Mahammad, Sachin R Jadhav, Pratap Tokekar","doi":"arxiv-2408.04054","DOIUrl":"https://doi.org/arxiv-2408.04054","url":null,"abstract":"Reinforcement Learning (RL) has shown remarkable progress in simulation\u0000environments, yet its application to real-world robotic tasks remains limited\u0000due to challenges in exploration and generalisation. To address these issues,\u0000we introduce NAVINACT, a framework that chooses when the robot should use\u0000classical motion planning-based navigation and when it should learn a policy.\u0000To further improve the efficiency in exploration, we use imitation data to\u0000bootstrap the exploration. NAVINACT dynamically switches between two modes of\u0000operation: navigating to a waypoint using classical techniques when away from\u0000the objects and reinforcement learning for fine-grained manipulation control\u0000when about to interact with objects. NAVINACT consists of a multi-head\u0000architecture composed of ModeNet for mode classification, NavNet for waypoint\u0000prediction, and InteractNet for precise manipulation. By combining the\u0000strengths of RL and Imitation Learning (IL), NAVINACT improves sample\u0000efficiency and mitigates distribution shift, ensuring robust task execution. We\u0000evaluate our approach across multiple challenging simulation environments and\u0000real-world tasks, demonstrating superior performance in terms of adaptability,\u0000efficiency, and generalization compared to existing methods. In both simulated\u0000and real-world settings, NAVINACT demonstrates robust performance. In\u0000simulations, NAVINACT surpasses baseline methods by 10-15% in training success\u0000rates at 30k samples and by 30-40% during evaluation phases. In real-world\u0000scenarios, it demonstrates a 30-40% higher success rate on simpler tasks\u0000compared to baselines and uniquely succeeds in complex, two-stage manipulation\u0000tasks. Datasets and supplementary materials can be found on our website:\u0000{https://raaslab.org/projects/NAVINACT/}.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"56 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141930546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Thomy Phan, Benran Zhang, Shao-Hung Chan, Sven Koenig
Anytime multi-agent path finding (MAPF) is a promising approach to scalable path optimization in multi-agent systems. MAPF-LNS, based on Large Neighborhood Search (LNS), is the current state-of-the-art approach, in which a fast initial solution is iteratively improved by destroying and repairing selected paths. Current MAPF-LNS variants commonly use an adaptive selection mechanism to choose among multiple destroy heuristics. However, to determine promising destroy heuristics, MAPF-LNS requires a considerable amount of exploration time. Since common destroy heuristics are non-adaptive, any performance bottleneck caused by these heuristics cannot be overcome via adaptive heuristic selection alone, limiting the overall effectiveness of MAPF-LNS in terms of solution cost. In this paper, we propose Adaptive Delay-based Destroy-and-Repair Enhanced with Success-based Self-Learning (ADDRESS) as a single-destroy-heuristic variant of MAPF-LNS. ADDRESS applies restricted Thompson Sampling to the top-K set of the most delayed agents to select a seed agent for adaptive LNS neighborhood generation. We evaluate ADDRESS on multiple maps from the MAPF benchmark set and demonstrate cost improvements of at least 50% in large-scale scenarios with up to a thousand agents, compared with the original MAPF-LNS and other state-of-the-art methods.
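The seed-selection step lends itself to a compact sketch. The following is our reading of the abstract, not the authors' implementation: each agent carries a Beta posterior over "destroying around this agent improves the solution," and Thompson Sampling is restricted to the K currently most delayed agents:

import random

class AddressSeedSelector:
    def __init__(self, num_agents, k=32):
        self.k = k                        # size of the restricted arm set (assumed value)
        self.alpha = [1.0] * num_agents   # Beta posterior: successes + 1
        self.beta = [1.0] * num_agents    # Beta posterior: failures + 1

    def select(self, delays):
        # Restrict the bandit arms to the K most delayed agents, then draw a
        # sample from each posterior and pick the best (Thompson Sampling).
        top_k = sorted(range(len(delays)), key=lambda a: -delays[a])[:self.k]
        return max(top_k, key=lambda a: random.betavariate(self.alpha[a], self.beta[a]))

    def update(self, agent, improved):
        # Success = the destroy-and-repair round seeded by this agent lowered cost.
        if improved:
            self.alpha[agent] += 1.0
        else:
            self.beta[agent] += 1.0

Inside the LNS loop one would call select(delays) to pick the seed agent, destroy and repair the neighborhood generated from it, and then call update(agent, new_cost < old_cost).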
{"title":"Anytime Multi-Agent Path Finding with an Adaptive Delay-Based Heuristic","authors":"Thomy Phan, Benran Zhang, Shao-Hung Chan, Sven Koenig","doi":"arxiv-2408.02960","DOIUrl":"https://doi.org/arxiv-2408.02960","url":null,"abstract":"Anytime multi-agent path finding (MAPF) is a promising approach to scalable\u0000path optimization in multi-agent systems. MAPF-LNS, based on Large Neighborhood\u0000Search (LNS), is the current state-of-the-art approach where a fast initial\u0000solution is iteratively optimized by destroying and repairing selected paths of\u0000the solution. Current MAPF-LNS variants commonly use an adaptive selection\u0000mechanism to choose among multiple destroy heuristics. However, to determine\u0000promising destroy heuristics, MAPF-LNS requires a considerable amount of\u0000exploration time. As common destroy heuristics are non-adaptive, any\u0000performance bottleneck caused by these heuristics cannot be overcome via\u0000adaptive heuristic selection alone, thus limiting the overall effectiveness of\u0000MAPF-LNS in terms of solution cost. In this paper, we propose Adaptive\u0000Delay-based Destroy-and-Repair Enhanced with Success-based Self-Learning\u0000(ADDRESS) as a single-destroy-heuristic variant of MAPF-LNS. ADDRESS applies\u0000restricted Thompson Sampling to the top-K set of the most delayed agents to\u0000select a seed agent for adaptive LNS neighborhood generation. We evaluate\u0000ADDRESS in multiple maps from the MAPF benchmark set and demonstrate cost\u0000improvements by at least 50% in large-scale scenarios with up to a thousand\u0000agents, compared with the original MAPF-LNS and other state-of-the-art methods.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"14 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141949365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Albert Sawczyn, Katsiaryna Viarenich, Konrad Wojtasik, Aleksandra Domogała, Marcin Oleksy, Maciej Piasecki, Tomasz Kajdanowicz
Advancements in AI and natural language processing have revolutionized machine-human language interactions, with question answering (QA) systems playing a pivotal role. The knowledge base question answering (KBQA) task, utilizing structured knowledge graphs (KGs), allows for handling extensive knowledge-intensive questions. However, a significant gap exists in KBQA datasets, especially for low-resource languages. Many existing construction pipelines for these datasets are outdated and inefficient in their use of human labor, and modern assisting tools like Large Language Models (LLMs) are not utilized to reduce the workload. To address this, we have designed and implemented a modern, semi-automated approach for creating datasets, encompassing tasks such as KBQA, Machine Reading Comprehension (MRC), and Information Retrieval (IR), tailored explicitly for low-resource environments. We executed this pipeline and introduced the PUGG dataset, the first Polish KBQA dataset, along with novel datasets for MRC and IR. Additionally, we provide a comprehensive implementation, insightful findings, detailed statistics, and an evaluation of baseline models.
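One way such a semi-automated pipeline can cut annotation effort is to let an LLM pre-annotate and route only unsupported cases to humans. The sketch below is a generic illustration of that pattern under our own assumptions, not the specific PUGG pipeline:

def build_kbqa_example(question, llm_answer_fn, kg_supports_fn, review_queue):
    # LLM pre-annotation: propose a candidate answer entity.
    proposed = llm_answer_fn(question)
    # Cheap automatic check: is the proposed answer supported by the KG?
    supported = kg_supports_fn(question, proposed)
    if not supported:
        # Human-in-the-loop fallback for the hard cases only.
        review_queue.append((question, proposed))
    return {"question": question, "answer": proposed, "verified": supported}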
{"title":"Developing PUGG for Polish: A Modern Approach to KBQA, MRC, and IR Dataset Construction","authors":"Albert Sawczyn, Katsiaryna Viarenich, Konrad Wojtasik, Aleksandra Domogała, Marcin Oleksy, Maciej Piasecki, Tomasz Kajdanowicz","doi":"arxiv-2408.02337","DOIUrl":"https://doi.org/arxiv-2408.02337","url":null,"abstract":"Advancements in AI and natural language processing have revolutionized\u0000machine-human language interactions, with question answering (QA) systems\u0000playing a pivotal role. The knowledge base question answering (KBQA) task,\u0000utilizing structured knowledge graphs (KG), allows for handling extensive\u0000knowledge-intensive questions. However, a significant gap exists in KBQA\u0000datasets, especially for low-resource languages. Many existing construction\u0000pipelines for these datasets are outdated and inefficient in human labor, and\u0000modern assisting tools like Large Language Models (LLM) are not utilized to\u0000reduce the workload. To address this, we have designed and implemented a\u0000modern, semi-automated approach for creating datasets, encompassing tasks such\u0000as KBQA, Machine Reading Comprehension (MRC), and Information Retrieval (IR),\u0000tailored explicitly for low-resource environments. We executed this pipeline\u0000and introduced the PUGG dataset, the first Polish KBQA dataset, and novel\u0000datasets for MRC and IR. Additionally, we provide a comprehensive\u0000implementation, insightful findings, detailed statistics, and evaluation of\u0000baseline models.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141949508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper introduces a novel approach, Counterfactual Shapley Values (CSV), which enhances explainability in reinforcement learning (RL) by integrating counterfactual analysis with Shapley Values. The approach aims to quantify and compare the contributions of different state dimensions to various action choices. To analyze these impacts more accurately, we introduce new characteristic value functions, the "Counterfactual Difference Characteristic Value" and the "Average Counterfactual Difference Characteristic Value." These functions help calculate the Shapley values that evaluate the differences in contributions between optimal and non-optimal actions. Experiments across several RL domains, such as GridWorld, FrozenLake, and Taxi, demonstrate the effectiveness of the CSV method. The results show that this method not only improves transparency in complex RL systems but also quantifies the differences across various decisions.
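For small state spaces the exact Shapley attribution over state dimensions can be computed directly. The sketch below pairs the standard Shapley formula with one plausible reading of the counterfactual-difference characteristic function (dimensions outside a coalition are reset to a baseline, and a coalition's value is the Q-gap between the optimal and a compared action); the paper's exact definitions may differ:

from itertools import combinations
from math import factorial

def shapley_values(n, value):
    # Exact Shapley values for n players, value(frozenset) -> float.
    # Exponential in n; only practical for low-dimensional states.
    phi = [0.0] * n
    for i in range(n):
        others = [p for p in range(n) if p != i]
        for r in range(len(others) + 1):
            for s in combinations(others, r):
                w = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                phi[i] += w * (value(frozenset(s) | {i}) - value(frozenset(s)))
    return phi

def counterfactual_difference(q, state, baseline, a_opt, a_cmp):
    # Hypothetical characteristic function: dimensions in the coalition keep
    # their observed values, the rest revert to the counterfactual baseline.
    def value(coalition):
        s = [state[i] if i in coalition else baseline[i] for i in range(len(state))]
        return q(s, a_opt) - q(s, a_cmp)
    return value

# phi = shapley_values(len(state), counterfactual_difference(q, state, baseline, a_opt, a_cmp))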
{"title":"Counterfactual Shapley Values for Explaining Reinforcement Learning","authors":"Yiwei Shi, Qi Zhang, Kevin McAreavey, Weiru Liu","doi":"arxiv-2408.02529","DOIUrl":"https://doi.org/arxiv-2408.02529","url":null,"abstract":"This paper introduces a novel approach Counterfactual Shapley Values (CSV),\u0000which enhances explainability in reinforcement learning (RL) by integrating\u0000counterfactual analysis with Shapley Values. The approach aims to quantify and\u0000compare the contributions of different state dimensions to various action\u0000choices. To more accurately analyze these impacts, we introduce new\u0000characteristic value functions, the ``Counterfactual Difference Characteristic\u0000Value\" and the ``Average Counterfactual Difference Characteristic Value.\" These\u0000functions help calculate the Shapley values to evaluate the differences in\u0000contributions between optimal and non-optimal actions. Experiments across\u0000several RL domains, such as GridWorld, FrozenLake, and Taxi, demonstrate the\u0000effectiveness of the CSV method. The results show that this method not only\u0000improves transparency in complex RL systems but also quantifies the differences\u0000across various decisions.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"191 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141930547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Imperfect information games, such as Bridge and Skat, present challenges due to state-space explosion and hidden information, posing formidable obstacles for search algorithms. Determinization-based algorithms offer a resolution by sampling hidden information and solving the game in a perfect information setting, facilitating rapid and effective action estimation. However, transitioning to perfect information introduces challenges, notably one called strategy fusion. This research introduces 'Extended Perfect Information Monte Carlo' (EPIMC), an online algorithm inspired by the state-of-the-art determinization-based approach Perfect Information Monte Carlo (PIMC). EPIMC enhances the capabilities of PIMC by postponing the perfect information resolution, alleviating issues related to strategy fusion. However, the decision to postpone the leaf evaluator introduces novel considerations, such as the interplay between prior levels of reasoning and the newly deferred resolution. In our empirical analysis, we investigate the performance of EPIMC across a range of games, with a particular focus on those characterized by varying degrees of strategy fusion. Our results demonstrate notable performance enhancements, particularly in games where strategy fusion significantly impacts gameplay. Furthermore, our research contributes to the theoretical foundation of determinization-based algorithms addressing challenges associated with strategy fusion.
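For contrast with EPIMC, the plain PIMC baseline it builds on fits in a few lines: sample determinized worlds consistent with the current information set, solve each as a perfect-information game, and average. The sketch below is that baseline only, with placeholder callables; EPIMC's deferred resolution is exactly what it omits:

from collections import defaultdict

def pimc_choose(legal_actions, sample_world, perfect_info_value, n_worlds=100):
    # sample_world(): draw one determinization of the hidden information.
    # perfect_info_value(world, action): e.g. an alpha-beta search value.
    totals = defaultdict(float)
    for _ in range(n_worlds):
        world = sample_world()
        for a in legal_actions:
            totals[a] += perfect_info_value(world, a)
    # Strategy fusion arises here: each world is solved as if the hidden
    # information were known, which EPIMC mitigates by postponing resolution.
    return max(legal_actions, key=lambda a: totals[a])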
{"title":"Perfect Information Monte Carlo with Postponing Reasoning","authors":"Jérôme Arjonilla, Abdallah Saffidine, Tristan Cazenave","doi":"arxiv-2408.02380","DOIUrl":"https://doi.org/arxiv-2408.02380","url":null,"abstract":"Imperfect information games, such as Bridge and Skat, present challenges due\u0000to state-space explosion and hidden information, posing formidable obstacles\u0000for search algorithms. Determinization-based algorithms offer a resolution by\u0000sampling hidden information and solving the game in a perfect information\u0000setting, facilitating rapid and effective action estimation. However,\u0000transitioning to perfect information introduces challenges, notably one called\u0000strategy fusion.This research introduces `Extended Perfect Information Monte\u0000Carlo' (EPIMC), an online algorithm inspired by the state-of-the-art\u0000determinization-based approach Perfect Information Monte Carlo (PIMC). EPIMC\u0000enhances the capabilities of PIMC by postponing the perfect information\u0000resolution, reducing alleviating issues related to strategy fusion. However,\u0000the decision to postpone the leaf evaluator introduces novel considerations,\u0000such as the interplay between prior levels of reasoning and the newly deferred\u0000resolution. In our empirical analysis, we investigate the performance of EPIMC\u0000across a range of games, with a particular focus on those characterized by\u0000varying degrees of strategy fusion. Our results demonstrate notable performance\u0000enhancements, particularly in games where strategy fusion significantly impacts\u0000gameplay. Furthermore, our research contributes to the theoretical foundation\u0000of determinization-based algorithms addressing challenges associated with\u0000strategy fusion.%, thereby enhancing our understanding of these algorithms\u0000within the context of imperfect information game scenarios.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"15 Suppl 1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141930548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sahra Ghalebikesabi, Eugene Bagdasaryan, Ren Yi, Itay Yona, Ilia Shumailov, Aneesh Pappu, Chongyang Shi, Laura Weidinger, Robert Stanforth, Leonard Berrada, Pushmeet Kohli, Po-Sen Huang, Borja Balle
Advanced AI assistants combine frontier LLMs and tool access to autonomously perform complex tasks on behalf of users. While the helpfulness of such assistants can increase dramatically with access to user information including emails and documents, this raises privacy concerns about assistants sharing inappropriate information with third parties without user supervision. To steer information-sharing assistants to behave in accordance with privacy expectations, we propose to operationalize contextual integrity (CI), a framework that equates privacy with the appropriate flow of information in a given context. In particular, we design and evaluate a number of strategies to steer assistants' information-sharing actions to be CI-compliant. Our evaluation is based on a novel form-filling benchmark composed of synthetic data and human annotations, and it reveals that prompting frontier LLMs to perform CI-based reasoning yields strong results.
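A minimal version of such a CI-based prompting strategy might gate each form field on an LLM judgment about the information flow. The prompt wording and function names below are our own illustration, not the paper's benchmark or prompts:

CI_PROMPT = """You are filling a form on behalf of a user.
Judge whether sharing the field below is an appropriate flow of information
in this context (contextual integrity).
Purpose of the form: {context}
Recipient: {recipient}
Information type: {info_type}
Value: {value}
Answer SHARE or WITHHOLD, then give a one-line reason."""

def ci_gate(field, llm_complete):
    # field: dict with keys context, recipient, info_type, value.
    # llm_complete: any completion function for a frontier LLM.
    reply = llm_complete(CI_PROMPT.format(**field))
    return reply.strip().upper().startswith("SHARE")

# Fields the gate withholds are left blank for the user to review.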
{"title":"Operationalizing Contextual Integrity in Privacy-Conscious Assistants","authors":"Sahra Ghalebikesabi, Eugene Bagdasaryan, Ren Yi, Itay Yona, Ilia Shumailov, Aneesh Pappu, Chongyang Shi, Laura Weidinger, Robert Stanforth, Leonard Berrada, Pushmeet Kohli, Po-Sen Huang, Borja Balle","doi":"arxiv-2408.02373","DOIUrl":"https://doi.org/arxiv-2408.02373","url":null,"abstract":"Advanced AI assistants combine frontier LLMs and tool access to autonomously\u0000perform complex tasks on behalf of users. While the helpfulness of such\u0000assistants can increase dramatically with access to user information including\u0000emails and documents, this raises privacy concerns about assistants sharing\u0000inappropriate information with third parties without user supervision. To steer\u0000information-sharing assistants to behave in accordance with privacy\u0000expectations, we propose to operationalize $textit{contextual integrity}$\u0000(CI), a framework that equates privacy with the appropriate flow of information\u0000in a given context. In particular, we design and evaluate a number of\u0000strategies to steer assistants' information-sharing actions to be CI compliant.\u0000Our evaluation is based on a novel form filling benchmark composed of synthetic\u0000data and human annotations, and it reveals that prompting frontier LLMs to\u0000perform CI-based reasoning yields strong results.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"32 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141949512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents and evaluates a new retrieval augmented generation (RAG) and large language model (LLM)-based artificial intelligence (AI) technique: rubric enabled generative artificial intelligence (REGAI). REGAI uses rubrics, which can be created manually or automatically by the system, to enhance the performance of LLMs for evaluation purposes. REGAI improves on the performance of both classical LLMs and RAG-based LLM techniques. This paper describes REGAI, presents data regarding its performance, and discusses several possible application areas for the technology.
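The abstract leaves the scoring mechanics unspecified; one natural shape for rubric-driven LLM evaluation is to score each rubric criterion separately and aggregate. This sketch is our assumption of that pattern, not REGAI's published design:

def rubric_score(answer, rubric, llm_judge):
    # rubric: list of (criterion_text, max_points) pairs.
    # llm_judge(answer, criterion, max_points) -> awarded points (a number).
    total, feedback = 0.0, []
    for criterion, max_points in rubric:
        pts = min(max(llm_judge(answer, criterion, max_points), 0), max_points)
        total += pts
        feedback.append((criterion, pts))
    return total, feedback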
{"title":"Development of REGAI: Rubric Enabled Generative Artificial Intelligence","authors":"Zach Johnson, Jeremy Straub","doi":"arxiv-2408.02811","DOIUrl":"https://doi.org/arxiv-2408.02811","url":null,"abstract":"This paper presents and evaluates a new retrieval augmented generation (RAG)\u0000and large language model (LLM)-based artificial intelligence (AI) technique:\u0000rubric enabled generative artificial intelligence (REGAI). REGAI uses rubrics,\u0000which can be created manually or automatically by the system, to enhance the\u0000performance of LLMs for evaluation purposes. REGAI improves on the performance\u0000of both classical LLMs and RAG-based LLM techniques. This paper describes\u0000REGAI, presents data regarding its performance and discusses several possible\u0000application areas for the technology.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"26 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141930620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The ability of humans to rapidly learn new knowledge while retaining old memories poses a significant challenge for current deep learning models. To handle this challenge, we draw inspiration from human memory and learning mechanisms and propose the Self-Reflective Complementary Incremental System (SR-CIS). Comprising the deconstructed Complementary Inference Module (CIM) and Complementary Memory Module (CMM), SR-CIS features a small model for fast inference and a large model for slow deliberation in the CIM, with efficient collaboration enabled by the Confidence-Aware Online Anomaly Detection (CA-OAD) mechanism. The CMM consists of a task-specific Short-Term Memory (STM) region and a universal Long-Term Memory (LTM) region. By setting task-specific Low-Rank Adaptation (LoRA) modules and corresponding prototype weights and biases, it instantiates external storage for parameter and representation memory, thus decoupling the memory module from the inference module. By storing textual descriptions of images during training and combining them with the Scenario Replay Module (SRM) after training for memory combination, along with periodic short-to-long-term memory restructuring, SR-CIS achieves stable incremental memory with limited storage requirements. Balancing model plasticity and memory stability under constraints of limited storage and low data resources, SR-CIS surpasses existing competitive baselines on multiple standard and few-shot incremental learning benchmarks.
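The fast/slow split can be pictured as confidence-aware routing. In the sketch below a fixed softmax-confidence threshold stands in for the actual CA-OAD online anomaly detector, which the abstract does not specify in detail:

import numpy as np

def route_fast_slow(small_logits, small_answer, large_model, x, threshold=0.8):
    # Accept the small model's fast answer when it is confident; otherwise
    # defer to the large model for slow deliberation. The real CA-OAD
    # mechanism detects anomalies online rather than using a fixed threshold.
    probs = np.exp(small_logits - small_logits.max())
    probs /= probs.sum()
    return small_answer if probs.max() >= threshold else large_model(x)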
{"title":"SR-CIS: Self-Reflective Incremental System with Decoupled Memory and Reasoning","authors":"Biqing Qi, Junqi Gao, Xinquan Chen, Dong Li, Weinan Zhang, Bowen Zhou","doi":"arxiv-2408.01970","DOIUrl":"https://doi.org/arxiv-2408.01970","url":null,"abstract":"The ability of humans to rapidly learn new knowledge while retaining old\u0000memories poses a significant challenge for current deep learning models. To\u0000handle this challenge, we draw inspiration from human memory and learning\u0000mechanisms and propose the Self-Reflective Complementary Incremental System\u0000(SR-CIS). Comprising the deconstructed Complementary Inference Module (CIM) and\u0000Complementary Memory Module (CMM), SR-CIS features a small model for fast\u0000inference and a large model for slow deliberation in CIM, enabled by the\u0000Confidence-Aware Online Anomaly Detection (CA-OAD) mechanism for efficient\u0000collaboration. CMM consists of task-specific Short-Term Memory (STM) region and\u0000a universal Long-Term Memory (LTM) region. By setting task-specific Low-Rank\u0000Adaptive (LoRA) and corresponding prototype weights and biases, it instantiates\u0000external storage for parameter and representation memory, thus deconstructing\u0000the memory module from the inference module. By storing textual descriptions of\u0000images during training and combining them with the Scenario Replay Module (SRM)\u0000post-training for memory combination, along with periodic short-to-long-term\u0000memory restructuring, SR-CIS achieves stable incremental memory with limited\u0000storage requirements. Balancing model plasticity and memory stability under\u0000constraints of limited storage and low data resources, SR-CIS surpasses\u0000existing competitive baselines on multiple standard and few-shot incremental\u0000learning benchmarks.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141930549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generalization is a pivotal challenge for agents following natural language instructions. To approach this goal, we leverage a vision-language model (VLM) for visual grounding and transfer its vision-language knowledge into reinforcement learning (RL) for object-centric tasks, which makes the agent capable of zero-shot generalization to unseen objects and instructions. Through visual grounding, we obtain an object-grounded confidence map for the target object indicated in the instruction. Based on this map, we introduce two routes to transfer VLM knowledge into RL. First, we propose an object-grounded intrinsic reward function derived from the confidence map to more effectively guide the agent towards the target object. Second, the confidence map offers a more unified, accessible task representation for the agent's policy, compared to language embeddings. This enables the agent to process unseen objects and instructions through comprehensible visual confidence maps, facilitating zero-shot object-level generalization. Single-task experiments show that our intrinsic reward significantly improves performance on challenging skill-learning tasks. In multi-task experiments, by testing on tasks beyond the training set, we show that the agent, when provided with the confidence map as the task representation, generalizes better than with language-based conditioning. The code is available at https://github.com/PKU-RL/COPL.
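As a toy illustration of the first route, an intrinsic reward can be derived from how strongly the confidence map supports the instructed object from one frame to the next. The functional form below (change in maximum confidence) is our assumption; the paper's reward may be shaped differently:

import numpy as np

def grounded_intrinsic_reward(conf_map, prev_conf_map=None):
    # conf_map: 2D array from the VLM, one confidence per pixel for the
    # object named in the instruction. Reward the agent as the grounding
    # evidence for the target object strengthens.
    if prev_conf_map is None:
        return 0.0
    return float(np.max(conf_map)) - float(np.max(prev_conf_map))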
{"title":"Visual Grounding for Object-Level Generalization in Reinforcement Learning","authors":"Haobin Jiang, Zongqing Lu","doi":"arxiv-2408.01942","DOIUrl":"https://doi.org/arxiv-2408.01942","url":null,"abstract":"Generalization is a pivotal challenge for agents following natural language\u0000instructions. To approach this goal, we leverage a vision-language model (VLM)\u0000for visual grounding and transfer its vision-language knowledge into\u0000reinforcement learning (RL) for object-centric tasks, which makes the agent\u0000capable of zero-shot generalization to unseen objects and instructions. By\u0000visual grounding, we obtain an object-grounded confidence map for the target\u0000object indicated in the instruction. Based on this map, we introduce two routes\u0000to transfer VLM knowledge into RL. Firstly, we propose an object-grounded\u0000intrinsic reward function derived from the confidence map to more effectively\u0000guide the agent towards the target object. Secondly, the confidence map offers\u0000a more unified, accessible task representation for the agent's policy, compared\u0000to language embeddings. This enables the agent to process unseen objects and\u0000instructions through comprehensible visual confidence maps, facilitating\u0000zero-shot object-level generalization. Single-task experiments prove that our\u0000intrinsic reward significantly improves performance on challenging skill\u0000learning. In multi-task experiments, through testing on tasks beyond the\u0000training set, we show that the agent, when provided with the confidence map as\u0000the task representation, possesses better generalization capabilities than\u0000language-based conditioning. The code is available at\u0000https://github.com/PKU-RL/COPL.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141930649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Antonio De Santis, Marco Balduini, Federico De Santis, Andrea Proia, Arsenio Leo, Marco Brambilla, Emanuele Della Valle
Aerospace manufacturing companies, such as Thales Alenia Space, design, develop, integrate, verify, and validate products characterized by high complexity and low volume. They carefully document all phases for each product, but analyses across products are challenging due to the heterogeneity and unstructured nature of the data in the documents. In this paper, we propose a hybrid methodology that leverages Knowledge Graphs (KGs) in conjunction with Large Language Models (LLMs) to extract and validate the data contained in these documents. We consider a case study focused on test data related to electronic boards for satellites. To do so, we extend the Semantic Sensor Network ontology. We store the metadata of the reports in a KG, while the actual test results are stored in Parquet files accessible via a Virtual Knowledge Graph. The validation process is managed using an LLM-based approach. We also conduct a benchmarking study to evaluate the performance of state-of-the-art LLMs in executing this task. Finally, we analyze the costs and benefits of automating preexisting processes of manual data extraction and validation for subsequent cross-report analyses.
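The KG-plus-Parquet split described here can be sketched with rdflib and pandas: report metadata lives in the graph, while bulky measurements stay in Parquet files that the graph merely points to. The namespace and property names are placeholders; the paper's actual schema extends the Semantic Sensor Network ontology:

import pandas as pd
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/tests#")  # placeholder, not the paper's ontology

def register_report(g, report_id, board, parquet_path):
    # Metadata goes into the KG; the raw test results stay on disk, linked
    # by path so a virtual-KG layer can resolve them on demand.
    g.add((EX[report_id], EX.board, Literal(board)))
    g.add((EX[report_id], EX.results, Literal(parquet_path)))

def load_results(g, report_id):
    path = next(g.objects(EX[report_id], EX.results))
    return pd.read_parquet(str(path))

g = Graph()
register_report(g, "report_001", "board-A7", "tests/report_001.parquet")
# df = load_results(g, "report_001")  # requires the Parquet file to exist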
{"title":"Integrating Large Language Models and Knowledge Graphs for Extraction and Validation of Textual Test Data","authors":"Antonio De Santis, Marco Balduini, Federico De Santis, Andrea Proia, Arsenio Leo, Marco Brambilla, Emanuele Della Valle","doi":"arxiv-2408.01700","DOIUrl":"https://doi.org/arxiv-2408.01700","url":null,"abstract":"Aerospace manufacturing companies, such as Thales Alenia Space, design,\u0000develop, integrate, verify, and validate products characterized by high\u0000complexity and low volume. They carefully document all phases for each product\u0000but analyses across products are challenging due to the heterogeneity and\u0000unstructured nature of the data in documents. In this paper, we propose a\u0000hybrid methodology that leverages Knowledge Graphs (KGs) in conjunction with\u0000Large Language Models (LLMs) to extract and validate data contained in these\u0000documents. We consider a case study focused on test data related to electronic\u0000boards for satellites. To do so, we extend the Semantic Sensor Network\u0000ontology. We store the metadata of the reports in a KG, while the actual test\u0000results are stored in parquet accessible via a Virtual Knowledge Graph. The\u0000validation process is managed using an LLM-based approach. We also conduct a\u0000benchmarking study to evaluate the performance of state-of-the-art LLMs in\u0000executing this task. Finally, we analyze the costs and benefits of automating\u0000preexisting processes of manual data extraction and validation for subsequent\u0000cross-report analyses.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141930655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}