We introduce Cognitive Kernel, an open-source agent system towards the goal of generalist autopilots. Unlike copilot systems, which primarily rely on users to provide essential state information (e.g., task descriptions) and assist users by answering questions or auto-completing content, autopilot systems must complete tasks from start to finish independently, which requires the system to actively acquire state information from its environment. To achieve this, an autopilot system should be capable of understanding user intents, actively gathering necessary information from various real-world sources, and making sound decisions. Cognitive Kernel adopts a model-centric design. In our implementation, the central policy model (a fine-tuned LLM) initiates interactions with the environment using a combination of atomic actions, such as opening files, clicking buttons, saving intermediate results to memory, or calling the LLM itself. This differs from the widely used environment-centric design, in which a task-specific environment with predefined actions is fixed and the policy model is limited to selecting the correct action from a given set of options. Our design facilitates seamless information flow across various sources and provides greater flexibility. We evaluate our system in three use cases: real-time information management, private information management, and long-term memory management. The results demonstrate that Cognitive Kernel achieves performance better than or comparable to other closed-source systems in these scenarios. Cognitive Kernel is fully dockerized, so anyone can deploy it privately and securely. We open-source the system and the backbone model to encourage further research on LLM-driven autopilot systems.
{"title":"Cognitive Kernel: An Open-source Agent System towards Generalist Autopilots","authors":"Hongming Zhang, Xiaoman Pan, Hongwei Wang, Kaixin Ma, Wenhao Yu, Dong Yu","doi":"arxiv-2409.10277","DOIUrl":"https://doi.org/arxiv-2409.10277","url":null,"abstract":"We introduce Cognitive Kernel, an open-source agent system towards the goal\u0000of generalist autopilots. Unlike copilot systems, which primarily rely on users\u0000to provide essential state information (e.g., task descriptions) and assist\u0000users by answering questions or auto-completing contents, autopilot systems\u0000must complete tasks from start to finish independently, which requires the\u0000system to acquire the state information from the environments actively. To\u0000achieve this, an autopilot system should be capable of understanding user\u0000intents, actively gathering necessary information from various real-world\u0000sources, and making wise decisions. Cognitive Kernel adopts a model-centric\u0000design. In our implementation, the central policy model (a fine-tuned LLM)\u0000initiates interactions with the environment using a combination of atomic\u0000actions, such as opening files, clicking buttons, saving intermediate results\u0000to memory, or calling the LLM itself. This differs from the widely used\u0000environment-centric design, where a task-specific environment with predefined\u0000actions is fixed, and the policy model is limited to selecting the correct\u0000action from a given set of options. Our design facilitates seamless information\u0000flow across various sources and provides greater flexibility. We evaluate our\u0000system in three use cases: real-time information management, private\u0000information management, and long-term memory management. The results\u0000demonstrate that Cognitive Kernel achieves better or comparable performance to\u0000other closed-source systems in these scenarios. Cognitive Kernel is fully\u0000dockerized, ensuring everyone can deploy it privately and securely. We\u0000open-source the system and the backbone model to encourage further research on\u0000LLM-driven autopilot systems.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142252657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent developments in language models have created new opportunities in air traffic control studies. The current focus is primarily on text and language-based use cases. However, these language models may offer a higher potential impact in the air traffic control domain, thanks to their ability to interact with air traffic environments in an embodied agent form. They also provide a language-based reasoning capability to explain their decisions, the lack of which has been a significant roadblock for the implementation of automatic air traffic control. This paper investigates the application of a language model-based agent with function-calling and learning capabilities to resolve air traffic conflicts without human intervention. The main components of this research are foundational large language models, tools that allow the agent to interact with the simulator, and a new concept, the experience library. An innovative part of this research, the experience library, is a vector database that stores synthesized knowledge that agents have learned from interactions with the simulations and language models. To evaluate the performance of our language model-based agent, both open-source and closed-source models were tested. The results of our study reveal significant differences in performance across various configurations of the language model-based agents. The best-performing configuration was able to solve all but one of the 120 imminent conflict scenarios, including scenarios involving up to four aircraft at once. Most importantly, the agents are able to provide human-level text explanations of traffic situations and conflict resolution strategies.
{"title":"Automatic Control With Human-Like Reasoning: Exploring Language Model Embodied Air Traffic Agents","authors":"Justas Andriuškevičius, Junzi Sun","doi":"arxiv-2409.09717","DOIUrl":"https://doi.org/arxiv-2409.09717","url":null,"abstract":"Recent developments in language models have created new opportunities in air\u0000traffic control studies. The current focus is primarily on text and\u0000language-based use cases. However, these language models may offer a higher\u0000potential impact in the air traffic control domain, thanks to their ability to\u0000interact with air traffic environments in an embodied agent form. They also\u0000provide a language-like reasoning capability to explain their decisions, which\u0000has been a significant roadblock for the implementation of automatic air\u0000traffic control. This paper investigates the application of a language model-based agent with\u0000function-calling and learning capabilities to resolve air traffic conflicts\u0000without human intervention. The main components of this research are\u0000foundational large language models, tools that allow the agent to interact with\u0000the simulator, and a new concept, the experience library. An innovative part of\u0000this research, the experience library, is a vector database that stores\u0000synthesized knowledge that agents have learned from interactions with the\u0000simulations and language models. To evaluate the performance of our language model-based agent, both\u0000open-source and closed-source models were tested. The results of our study\u0000reveal significant differences in performance across various configurations of\u0000the language model-based agents. The best-performing configuration was able to\u0000solve almost all 120 but one imminent conflict scenarios, including up to four\u0000aircraft at the same time. Most importantly, the agents are able to provide\u0000human-level text explanations on traffic situations and conflict resolution\u0000strategies.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142252821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The goal of aligning language models to human preferences requires data that reveal these preferences. Ideally, time and money would be spent carefully collecting and tailoring bespoke preference data to each downstream application. However, in practice, a select few publicly available preference datasets are often used to train reward models for reinforcement learning from human feedback (RLHF). While new preference datasets are being introduced with increasing frequency, there are currently no efforts to measure and compare these datasets. In this paper, we systematically study preference datasets through three perspectives: scale, label noise, and information content. We propose specific metrics for each of these perspectives and uncover different axes of comparison for a better understanding of preference datasets. Our work is a first step towards a data-centric approach to alignment by providing perspectives that aid in training efficiency and iterative data collection for RLHF.
{"title":"Towards Data-Centric RLHF: Simple Metrics for Preference Dataset Comparison","authors":"Judy Hanwen Shen, Archit Sharma, Jun Qin","doi":"arxiv-2409.09603","DOIUrl":"https://doi.org/arxiv-2409.09603","url":null,"abstract":"The goal of aligning language models to human preferences requires data that\u0000reveal these preferences. Ideally, time and money can be spent carefully\u0000collecting and tailoring bespoke preference data to each downstream\u0000application. However, in practice, a select few publicly available preference\u0000datasets are often used to train reward models for reinforcement learning from\u0000human feedback (RLHF). While new preference datasets are being introduced with\u0000increasing frequency, there are currently no existing efforts to measure and\u0000compare these datasets. In this paper, we systematically study preference\u0000datasets through three perspectives: scale, label noise, and information\u0000content. We propose specific metrics for each of these perspectives and uncover\u0000different axes of comparison for a better understanding of preference datasets.\u0000Our work is a first step towards a data-centric approach to alignment by\u0000providing perspectives that aid in training efficiency and iterative data\u0000collection for RLHF.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142252658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Agents significantly enhance the capabilities of standalone Large Language Models (LLMs) by perceiving environments, making decisions, and executing actions. However, LLM agents still face challenges in tasks that require multiple decision-making steps. Estimating the value of actions in specific tasks is difficult when intermediate actions are neither appropriately rewarded nor penalized. In this paper, we propose leveraging a task-relevant Q-value model to guide action selection. Specifically, we first collect decision-making trajectories annotated with step-level Q values via Monte Carlo Tree Search (MCTS) and construct preference data. We then use another LLM to fit these preferences through step-level Direct Preference Optimization (DPO); the resulting model serves as the Q-value model. During inference, at each decision-making step, LLM agents select the action with the highest Q value before interacting with the environment. We apply our method to various open-source and API-based LLM agents, demonstrating that Q-value models significantly improve their performance. Notably, the performance of the agent built with Phi-3-mini-4k-instruct improved by 103% on WebShop and 75% on HotPotQA when enhanced with Q-value models, even surpassing GPT-4o-mini. Additionally, Q-value models offer several advantages, such as generalization to different LLM agents and seamless integration with existing prompting strategies.
{"title":"Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models","authors":"Yuanzhao Zhai, Tingkai Yang, Kele Xu, Feng Dawei, Cheng Yang, Bo Ding, Huaimin Wang","doi":"arxiv-2409.09345","DOIUrl":"https://doi.org/arxiv-2409.09345","url":null,"abstract":"Agents significantly enhance the capabilities of standalone Large Language\u0000Models (LLMs) by perceiving environments, making decisions, and executing\u0000actions. However, LLM agents still face challenges in tasks that require\u0000multiple decision-making steps. Estimating the value of actions in specific\u0000tasks is difficult when intermediate actions are neither appropriately rewarded\u0000nor penalized. In this paper, we propose leveraging a task-relevant Q-value\u0000model to guide action selection. Specifically, we first collect decision-making\u0000trajectories annotated with step-level Q values via Monte Carlo Tree Search\u0000(MCTS) and construct preference data. We then use another LLM to fit these\u0000preferences through step-level Direct Policy Optimization (DPO), which serves\u0000as the Q-value model. During inference, at each decision-making step, LLM\u0000agents select the action with the highest Q value before interacting with the\u0000environment. We apply our method to various open-source and API-based LLM\u0000agents, demonstrating that Q-value models significantly improve their\u0000performance. Notably, the performance of the agent built with\u0000Phi-3-mini-4k-instruct improved by 103% on WebShop and 75% on HotPotQA when\u0000enhanced with Q-value models, even surpassing GPT-4o-mini. Additionally,\u0000Q-value models offer several advantages, such as generalization to different\u0000LLM agents and seamless integration with existing prompting strategies.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142252699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reinforcement Learning has revolutionized decision-making processes in dynamic environments, yet it often struggles with autonomously detecting and achieving goals without clear feedback signals. For example, in a Source Term Estimation problem, the lack of precise environmental information makes it challenging to provide clear feedback signals and to define and evaluate how the source's location is determined. To address this challenge, the Autonomous Goal Detection and Cessation (AGDC) module was developed, enhancing various RL algorithms by incorporating a self-feedback mechanism for autonomous goal detection and cessation upon task completion. Our method effectively detects otherwise undefined goals, and ceases activity once they are achieved, by approximating the agent's belief, significantly enhancing the capabilities of RL algorithms in environments with limited feedback. To validate the effectiveness of our approach, we integrated AGDC with deep Q-Network, proximal policy optimization, and deep deterministic policy gradient algorithms, and evaluated its performance on the Source Term Estimation problem. The experimental results showed that AGDC-enhanced RL algorithms significantly outperformed traditional statistical methods such as infotaxis, entrotaxis, and dual control for exploitation and exploration, as well as a non-statistical random action selection method. These improvements were evident in terms of success rate, mean traveled distance, and search time, highlighting AGDC's effectiveness and efficiency in complex, real-world scenarios.
{"title":"Autonomous Goal Detection and Cessation in Reinforcement Learning: A Case Study on Source Term Estimation","authors":"Yiwei Shi, Muning Wen, Qi Zhang, Weinan Zhang, Cunjia Liu, Weiru Liu","doi":"arxiv-2409.09541","DOIUrl":"https://doi.org/arxiv-2409.09541","url":null,"abstract":"Reinforcement Learning has revolutionized decision-making processes in\u0000dynamic environments, yet it often struggles with autonomously detecting and\u0000achieving goals without clear feedback signals. For example, in a Source Term\u0000Estimation problem, the lack of precise environmental information makes it\u0000challenging to provide clear feedback signals and to define and evaluate how\u0000the source's location is determined. To address this challenge, the Autonomous\u0000Goal Detection and Cessation (AGDC) module was developed, enhancing various RL\u0000algorithms by incorporating a self-feedback mechanism for autonomous goal\u0000detection and cessation upon task completion. Our method effectively identifies\u0000and ceases undefined goals by approximating the agent's belief, significantly\u0000enhancing the capabilities of RL algorithms in environments with limited\u0000feedback. To validate effectiveness of our approach, we integrated AGDC with\u0000deep Q-Network, proximal policy optimization, and deep deterministic policy\u0000gradient algorithms, and evaluated its performance on the Source Term\u0000Estimation problem. The experimental results showed that AGDC-enhanced RL\u0000algorithms significantly outperformed traditional statistical methods such as\u0000infotaxis, entrotaxis, and dual control for exploitation and exploration, as\u0000well as a non-statistical random action selection method. These improvements\u0000were evident in terms of success rate, mean traveled distance, and search time,\u0000highlighting AGDC's effectiveness and efficiency in complex, real-world\u0000scenarios.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142268784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Job Shop Scheduling Problem (JSP) is central to operations research, and optimizing it for energy efficiency carries profound environmental and economic implications. Efficient scheduling enhances production metrics and mitigates energy consumption, thus effectively balancing productivity and sustainability objectives. Given the intricate and diverse nature of JSP instances, along with the array of algorithms developed to tackle these challenges, an intelligent algorithm selection tool becomes paramount. This paper introduces a framework designed to identify key problem features that characterize an instance's complexity and guide the selection of suitable algorithms. Leveraging machine learning techniques, particularly XGBoost, the framework recommends optimal solvers such as GUROBI, CPLEX, and GECODE for efficient JSP scheduling. GUROBI excels with smaller instances, while GECODE demonstrates robust scalability for complex scenarios. The proposed algorithm selector achieves an accuracy of 84.51% in recommending the best algorithm for solving new JSP instances, highlighting its efficacy in algorithm selection. By refining feature extraction methodologies, the framework aims to broaden its applicability across diverse JSP scenarios, thereby advancing efficiency and sustainability in manufacturing logistics.
{"title":"Developing an Algorithm Selector for Green Configuration in Scheduling Problems","authors":"Carlos March, Christian Perez, Miguel A. Salido","doi":"arxiv-2409.08641","DOIUrl":"https://doi.org/arxiv-2409.08641","url":null,"abstract":"The Job Shop Scheduling Problem (JSP) is central to operations research,\u0000primarily optimizing energy efficiency due to its profound environmental and\u0000economic implications. Efficient scheduling enhances production metrics and\u0000mitigates energy consumption, thus effectively balancing productivity and\u0000sustainability objectives. Given the intricate and diverse nature of JSP\u0000instances, along with the array of algorithms developed to tackle these\u0000challenges, an intelligent algorithm selection tool becomes paramount. This\u0000paper introduces a framework designed to identify key problem features that\u0000characterize its complexity and guide the selection of suitable algorithms.\u0000Leveraging machine learning techniques, particularly XGBoost, the framework\u0000recommends optimal solvers such as GUROBI, CPLEX, and GECODE for efficient JSP\u0000scheduling. GUROBI excels with smaller instances, while GECODE demonstrates\u0000robust scalability for complex scenarios. The proposed algorithm selector\u0000achieves an accuracy of 84.51% in recommending the best algorithm for solving\u0000new JSP instances, highlighting its efficacy in algorithm selection. By\u0000refining feature extraction methodologies, the framework aims to broaden its\u0000applicability across diverse JSP scenarios, thereby advancing efficiency and\u0000sustainability in manufacturing logistics.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142252707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Post-training large language models (LLMs) to develop reasoning capabilities has proven effective across diverse domains, such as mathematical reasoning and code generation. However, existing methods primarily focus on improving task-specific reasoning but have not adequately addressed the model's generalization capabilities across a broader range of reasoning tasks. To tackle this challenge, we introduce Critical Planning Step Learning (CPL), which leverages Monte Carlo Tree Search (MCTS) to explore diverse planning steps in multi-step reasoning tasks. Based on long-term outcomes, CPL learns step-level planning preferences to improve the model's planning capabilities and, consequently, its general reasoning capabilities. Furthermore, while effective in many scenarios for aligning LLMs, existing preference learning approaches like Direct Preference Optimization (DPO) struggle with complex multi-step reasoning tasks due to their inability to capture fine-grained supervision at each step. We propose Step-level Advantage Preference Optimization (Step-APO), which integrates an advantage estimate for step-level preference pairs obtained via MCTS into the DPO. This enables the model to more effectively learn critical intermediate planning steps, thereby further improving its generalization in reasoning tasks. Experimental results demonstrate that our method, trained exclusively on GSM8K and MATH, not only significantly improves performance on GSM8K (+10.5%) and MATH (+6.5%), but also enhances out-of-domain reasoning benchmarks, such as ARC-C (+4.0%), BBH (+1.8%), MMLU-STEM (+2.2%), and MMLU (+0.9%).
{"title":"CPL: Critical Planning Step Learning Boosts LLM Generalization in Reasoning Tasks","authors":"Tianlong Wang, Xueting Han, Jing Bai","doi":"arxiv-2409.08642","DOIUrl":"https://doi.org/arxiv-2409.08642","url":null,"abstract":"Post-training large language models (LLMs) to develop reasoning capabilities\u0000has proven effective across diverse domains, such as mathematical reasoning and\u0000code generation. However, existing methods primarily focus on improving\u0000task-specific reasoning but have not adequately addressed the model's\u0000generalization capabilities across a broader range of reasoning tasks. To\u0000tackle this challenge, we introduce Critical Planning Step Learning (CPL),\u0000which leverages Monte Carlo Tree Search (MCTS) to explore diverse planning\u0000steps in multi-step reasoning tasks. Based on long-term outcomes, CPL learns\u0000step-level planning preferences to improve the model's planning capabilities\u0000and, consequently, its general reasoning capabilities. Furthermore, while\u0000effective in many scenarios for aligning LLMs, existing preference learning\u0000approaches like Direct Preference Optimization (DPO) struggle with complex\u0000multi-step reasoning tasks due to their inability to capture fine-grained\u0000supervision at each step. We propose Step-level Advantage Preference\u0000Optimization (Step-APO), which integrates an advantage estimate for step-level\u0000preference pairs obtained via MCTS into the DPO. This enables the model to more\u0000effectively learn critical intermediate planning steps, thereby further\u0000improving its generalization in reasoning tasks. Experimental results\u0000demonstrate that our method, trained exclusively on GSM8K and MATH, not only\u0000significantly improves performance on GSM8K (+10.5%) and MATH (+6.5%), but also\u0000enhances out-of-domain reasoning benchmarks, such as ARC-C (+4.0%), BBH\u0000(+1.8%), MMLU-STEM (+2.2%), and MMLU (+0.9%).","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142252820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Competency question (CQ) formulation is central to several ontology development and evaluation methodologies. Traditionally, the task of crafting these competency questions relies heavily on the effort of domain experts and knowledge engineers, a process that is often time-consuming and labor-intensive. With the emergence of Large Language Models (LLMs), it becomes possible to automate and enhance this process. Unlike other similar works that use existing ontologies or knowledge graphs as input to LLMs, we present a retrieval-augmented generation (RAG) approach that uses LLMs for the automatic generation of CQs given a set of scientific papers considered to be a domain knowledge base. We investigate its performance and, specifically, study the impact of the number of papers supplied to the RAG and of different temperature settings of the LLM. We conduct experiments using GPT-4 on two domain ontology engineering tasks and compare results against ground-truth CQs constructed by domain experts. Empirical assessments of the results, using evaluation metrics (precision and consistency), reveal that compared to zero-shot prompting, adding relevant domain knowledge through the RAG improves the performance of LLMs on generating CQs for concrete ontology engineering tasks.
{"title":"A RAG Approach for Generating Competency Questions in Ontology Engineering","authors":"Xueli Pan, Jacco van Ossenbruggen, Victor de Boer, Zhisheng Huang","doi":"arxiv-2409.08820","DOIUrl":"https://doi.org/arxiv-2409.08820","url":null,"abstract":"Competency question (CQ) formulation is central to several ontology\u0000development and evaluation methodologies. Traditionally, the task of crafting\u0000these competency questions heavily relies on the effort of domain experts and\u0000knowledge engineers which is often time-consuming and labor-intensive. With the\u0000emergence of Large Language Models (LLMs), there arises the possibility to\u0000automate and enhance this process. Unlike other similar works which use\u0000existing ontologies or knowledge graphs as input to LLMs, we present a\u0000retrieval-augmented generation (RAG) approach that uses LLMs for the automatic\u0000generation of CQs given a set of scientific papers considered to be a domain\u0000knowledge base. We investigate its performance and specifically, we study the\u0000impact of different number of papers to the RAG and different temperature\u0000setting of the LLM. We conduct experiments using GPT-4 on two domain ontology\u0000engineering tasks and compare results against ground-truth CQs constructed by\u0000domain experts. Empirical assessments on the results, utilizing evaluation\u0000metrics (precision and consistency), reveal that compared to zero-shot\u0000prompting, adding relevant domain knowledge to the RAG improves the performance\u0000of LLMs on generating CQs for concrete ontology engineering tasks.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142252700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This study investigates scheduling strategies for the stochastic resource-constrained project scheduling problem with maximal time lags (SRCPSP/max). Recent advances in Constraint Programming (CP) and Temporal Networks have renewed interest in evaluating the advantages and drawbacks of various proactive and reactive scheduling methods. First, we present a new, CP-based fully proactive method. Second, we show how a reactive approach can be constructed using an online rescheduling procedure. A third contribution is based on partial order schedules and uses Simple Temporal Networks with Uncertainty (STNUs). Our statistical analysis shows that the STNU-based algorithm performs best in terms of solution quality, while also showing competitive offline and online computation times.
{"title":"Proactive and Reactive Constraint Programming for Stochastic Project Scheduling with Maximal Time-Lags","authors":"Kim van den Houten, Léon Planken, Esteban Freydell, David M. J. Tax, Mathijs de Weerdt","doi":"arxiv-2409.09107","DOIUrl":"https://doi.org/arxiv-2409.09107","url":null,"abstract":"This study investigates scheduling strategies for the stochastic\u0000resource-constrained project scheduling problem with maximal time lags\u0000(SRCPSP/max)). Recent advances in Constraint Programming (CP) and Temporal\u0000Networks have reinvoked interest in evaluating the advantages and drawbacks of\u0000various proactive and reactive scheduling methods. First, we present a new,\u0000CP-based fully proactive method. Second, we show how a reactive approach can be\u0000constructed using an online rescheduling procedure. A third contribution is\u0000based on partial order schedules and uses Simple Temporal Networks with\u0000Uncertainty (STNUs). Our statistical analysis shows that the STNU-based\u0000algorithm performs best in terms of solution quality, while also showing good\u0000relative offline and online computation time.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142252771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large language models (LLMs) show remarkable potential to act as computer agents, enhancing human productivity and software accessibility in multi-modal tasks that require planning and reasoning. However, measuring agent performance in realistic environments remains a challenge since: (i) most benchmarks are limited to specific modalities or domains (e.g. text-only, web navigation, Q&A, coding) and (ii) full benchmark evaluations are slow (on the order of days) given the multi-step sequential nature of tasks. To address these challenges, we introduce the Windows Agent Arena: a reproducible, general environment focusing exclusively on the Windows operating system (OS) where agents can operate freely within a real Windows OS and use the same wide range of applications, tools, and web browsers available to human users when solving tasks. We adapt the OSWorld framework (Xie et al., 2024) to create 150+ diverse Windows tasks across representative domains that require agent abilities in planning, screen understanding, and tool usage. Our benchmark is scalable and can be seamlessly parallelized in Azure for a full benchmark evaluation in as little as 20 minutes. To demonstrate Windows Agent Arena's capabilities, we also introduce a new multi-modal agent, Navi. Our agent achieves a success rate of 19.5% in the Windows domain, compared to 74.5% for an unassisted human. Navi also demonstrates strong performance on another popular web-based benchmark, Mind2Web. We offer extensive quantitative and qualitative analysis of Navi's performance, and provide insights into the opportunities for future research in agent development and data generation using Windows Agent Arena. Webpage: https://microsoft.github.io/WindowsAgentArena Code: https://github.com/microsoft/WindowsAgentArena
{"title":"Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale","authors":"Rogerio Bonatti, Dan Zhao, Francesco Bonacci, Dillon Dupont, Sara Abdali, Yinheng Li, Justin Wagle, Kazuhito Koishida, Arthur Bucker, Lawrence Jang, Zack Hui","doi":"arxiv-2409.08264","DOIUrl":"https://doi.org/arxiv-2409.08264","url":null,"abstract":"Large language models (LLMs) show remarkable potential to act as computer\u0000agents, enhancing human productivity and software accessibility in multi-modal\u0000tasks that require planning and reasoning. However, measuring agent performance\u0000in realistic environments remains a challenge since: (i) most benchmarks are\u0000limited to specific modalities or domains (e.g. text-only, web navigation, Q&A,\u0000coding) and (ii) full benchmark evaluations are slow (on order of magnitude of\u0000days) given the multi-step sequential nature of tasks. To address these\u0000challenges, we introduce the Windows Agent Arena: a reproducible, general\u0000environment focusing exclusively on the Windows operating system (OS) where\u0000agents can operate freely within a real Windows OS and use the same wide range\u0000of applications, tools, and web browsers available to human users when solving\u0000tasks. We adapt the OSWorld framework (Xie et al., 2024) to create 150+ diverse\u0000Windows tasks across representative domains that require agent abilities in\u0000planning, screen understanding, and tool usage. Our benchmark is scalable and\u0000can be seamlessly parallelized in Azure for a full benchmark evaluation in as\u0000little as 20 minutes. To demonstrate Windows Agent Arena's capabilities, we\u0000also introduce a new multi-modal agent, Navi. Our agent achieves a success rate\u0000of 19.5% in the Windows domain, compared to 74.5% performance of an unassisted\u0000human. Navi also demonstrates strong performance on another popular web-based\u0000benchmark, Mind2Web. We offer extensive quantitative and qualitative analysis\u0000of Navi's performance, and provide insights into the opportunities for future\u0000research in agent development and data generation using Windows Agent Arena. Webpage: https://microsoft.github.io/WindowsAgentArena Code: https://github.com/microsoft/WindowsAgentArena","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142194049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}