arXiv - CS - Artificial Intelligence: Latest Papers


An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting

arXiv - CS - Artificial Intelligence

Pub Date : 2024-09-04 DOI: arxiv-2409.02760

Zhuolin Li, Zhen Zhang, Witold Pedrycz

This paper introduces a novel incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting (MCS) problems, enabling decision makers to progressively provide assignment example preference information. Specifically, we first construct a max-margin optimization-based model to model potentially non-monotonic preferences and inconsistent assignment example preference information in each iteration of the incremental preference elicitation process. Using the optimal objective function value of the max-margin optimization-based model, we devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration within the framework of uncertainty sampling in active learning. Once the termination criterion is satisfied, the sorting result for non-reference alternatives can be determined through the use of two optimization models, i.e., the max-margin optimization-based model and the complexity controlling optimization model. Subsequently, two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences, considering different termination criteria. Ultimately, we apply the proposed approach to a credit rating problem to elucidate the detailed implementation steps, and perform computational experiments on both artificial and real-world data sets to compare the proposed question selection strategies with several benchmark strategies.
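The question selection step described in this abstract follows the general pattern of uncertainty sampling. The sketch below is only an illustration of that general pattern, not the authors' method: the paper derives its information measures from the optimal objective value of a max-margin model, whereas this example assumes each candidate alternative already comes with a hypothetical list of per-category scores and simply asks about the alternative whose two best categories are hardest to tell apart. The names `assignment_gap` and `select_question` are invented for this example.

```python
from typing import Dict, List


def assignment_gap(scores: List[float]) -> float:
    """Gap between the best and second-best category scores (small = uncertain)."""
    if len(scores) < 2:
        raise ValueError("need scores for at least two categories")
    top, second = sorted(scores, reverse=True)[:2]
    return top - second


def select_question(candidates: Dict[str, List[float]]) -> str:
    """Return the non-reference alternative with the most ambiguous assignment.

    `candidates` maps an alternative's name to hypothetical per-category
    scores; in the paper these would come from the optimization model.
    """
    if not candidates:
        raise ValueError("no candidate alternatives left to ask about")
    return min(candidates, key=lambda name: assignment_gap(candidates[name]))
```

In an elicitation loop, the selected alternative would be shown to the decision maker, the answer added to the reference set, and the scores recomputed before the next question.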


Citations: 0

Intensional FOL: Many-Sorted Extension

arXiv - CS - Artificial Intelligence

Pub Date : 2024-09-03 DOI: arxiv-2409.04469

Zoran Majkic

The concepts used in IFOL have associated with them a list of sorted attributes, and the sorts are the intensional concepts as well. The requirement to extend the unsorted IFOL (Intensional FOL) to many-sorted IFOL is mainly based on the fact that a natural language is implicitly many-sorted and that we intend to use IFOL to support applications that use natural languages. Thus, the proposed version of many-sorted IFOL is just the completion of this conceptual feature of the IFOL.

Citations: 0

A Deployed Online Reinforcement Learning Algorithm In An Oral Health Clinical Trial

arXiv - CS - Artificial Intelligence

Pub Date : 2024-09-03 DOI: arxiv-2409.02069

Anna L. Trella, Kelly W. Zhang, Hinal Jajal, Inbal Nahum-Shani, Vivek Shetty, Finale Doshi-Velez, Susan A. Murphy

Dental disease is a prevalent chronic condition associated with substantial financial burden, personal suffering, and increased risk of systemic diseases. Despite widespread recommendations for twice-daily tooth brushing, adherence to recommended oral self-care behaviors remains sub-optimal due to factors such as forgetfulness and disengagement. To address this, we developed Oralytics, an mHealth intervention system designed to complement clinician-delivered preventative care for marginalized individuals at risk for dental disease. Oralytics incorporates an online reinforcement learning algorithm to determine optimal times to deliver intervention prompts that encourage oral self-care behaviors. We have deployed Oralytics in a registered clinical trial. The deployment required careful design to manage challenges specific to the clinical trials setting in the U.S. In this paper, we (1) highlight key design decisions of the RL algorithm that address these challenges and (2) conduct a re-sampling analysis to evaluate algorithm design decisions. A second phase (randomized control trial) of Oralytics is planned to start in spring 2025.


Citations: 0

Learning State-Dependent Policy Parametrizations for Dynamic Technician Routing with Rework

arXiv - CS - Artificial Intelligence

Pub Date : 2024-09-03 DOI: arxiv-2409.01815

Jonas Stein, Florentin D Hildebrandt, Barrett W Thomas, Marlin W Ulmer

Home repair and installation services require technicians to visit customers and resolve tasks of different complexity. Technicians often have heterogeneous skills and working experiences. The geographical spread of customers makes achieving only perfect matches between technician skills and task requirements impractical. Additionally, technicians are regularly absent due to sickness. With non-perfect assignments regarding task requirement and technician skill, some tasks may remain unresolved and require a revisit and rework. Companies seek to minimize customer inconvenience due to delay. We model the problem as a sequential decision process where, over a number of service days, customers request service while heterogeneously skilled technicians are routed to serve customers in the system. Each day, our policy iteratively builds tours by adding "important" customers. The importance is based on analytical considerations and is measured by respecting routing efficiency, urgency of service, and risk of rework in an integrated fashion. We propose a state-dependent balance of these factors via reinforcement learning. A comprehensive study shows that taking a few non-perfect assignments can be quite beneficial for the overall service quality. We further demonstrate the value provided by a state-dependent parametrization.
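The idea of building a tour by repeatedly adding the most "important" customer, with the balance between factors depending on the system state, can be sketched roughly as follows. Everything here is an assumption made for illustration: the `Customer` fields, the linear form of `importance`, and the hand-written `weights_for_state` rule (which stands in for the mapping the paper learns with reinforcement learning) are not taken from the paper. The detour cost is also treated as fixed per customer, whereas in a real routing problem it would depend on the tour built so far.

```python
from dataclasses import dataclass
from typing import List, Tuple

Weights = Tuple[float, float, float]  # (routing, urgency, rework risk)


@dataclass
class Customer:
    name: str
    detour: float       # hypothetical extra travel time to serve this customer
    days_waiting: int   # hypothetical proxy for urgency of service
    rework_risk: float  # hypothetical probability in [0, 1] that the visit fails


def weights_for_state(backlog: int) -> Weights:
    """Hand-written stand-in for a learned, state-dependent parametrization."""
    return (1.0, 2.0, 1.0) if backlog > 10 else (1.0, 0.5, 3.0)


def importance(c: Customer, weights: Weights) -> float:
    """Higher is better: urgent customers gain, long detours and risky visits lose."""
    w_route, w_urgency, w_risk = weights
    return w_urgency * c.days_waiting - w_route * c.detour - w_risk * c.rework_risk


def build_tour(customers: List[Customer], backlog: int, capacity: int) -> List[str]:
    """Greedily add the most important remaining customer until the tour is full."""
    weights = weights_for_state(backlog)
    remaining = list(customers)
    tour: List[str] = []
    while remaining and len(tour) < capacity:
        best = max(remaining, key=lambda c: importance(c, weights))
        tour.append(best.name)
        remaining.remove(best)
    return tour
```

With a large backlog the example weights favor urgency, so long-waiting customers are served first; with a small backlog they penalize rework risk more heavily, so the same customers can produce a different tour.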


Citations: 0

Lexicographic optimization-based approaches to learning a representative model for multi-criteria sorting with non-monotonic criteria

arXiv - CS - Artificial Intelligence

Pub Date : 2024-09-03 DOI: arxiv-2409.01612

Zhen Zhang, Zhuolin Li, Wenyu Yu

Deriving a representative model using value function-based methods from the perspective of preference disaggregation has emerged as a prominent and growing topic in multi-criteria sorting (MCS) problems. A noteworthy observation is that many existing approaches to learning a representative model for MCS problems traditionally assume the monotonicity of criteria, which may not always align with the complexities found in real-world MCS scenarios. Consequently, this paper proposes some approaches to learning a representative model for MCS problems with non-monotonic criteria through the integration of the threshold-based value-driven sorting procedure. To do so, we first define some transformation functions to map the marginal values and category thresholds into a UTA-like functional space. Subsequently, we construct constraint sets to model non-monotonic criteria in MCS problems and develop optimization models to check and rectify the inconsistency of the decision maker's assignment example preference information. By simultaneously considering the complexity and discriminative power of the models, two distinct lexicographic optimization-based approaches are developed to derive a representative model for MCS problems with non-monotonic criteria. Eventually, we offer an illustrative example and conduct comprehensive simulation experiments to elaborate the feasibility and validity of the proposed approaches.


Citations: 0

LASP: Surveying the State-of-the-Art in Large Language Model-Assisted AI Planning

arXiv - CS - Artificial Intelligence

Pub Date : 2024-09-03 DOI: arxiv-2409.01806

Haoming Li, Zhaoliang Chen, Jonathan Zhang, Fei Liu

Effective planning is essential for the success of any task, from organizing a vacation to routing autonomous vehicles and developing corporate strategies. It involves setting goals, formulating plans, and allocating resources to achieve them. LLMs are particularly well-suited for automated planning due to their strong capabilities in commonsense reasoning. They can deduce a sequence of actions needed to achieve a goal from a given state and identify an effective course of action. However, it is frequently observed that plans generated through direct prompting often fail upon execution. Our survey aims to highlight the existing challenges in planning with language models, focusing on key areas such as embodied environments, optimal scheduling, competitive and cooperative games, task decomposition, reasoning, and planning. Through this study, we explore how LLMs transform AI planning and provide unique insights into the future of LM-assisted planning.

Citations: 0

Here's Charlie! Realising the Semantic Web vision of Agents in the age of LLMs

arXiv - CS - Artificial Intelligence

Pub Date : 2024-09-03 DOI: arxiv-2409.04465

Jesse Wright

This paper presents our research towards a near-term future in which legal entities, such as individuals and organisations, can entrust semi-autonomous AI-driven agents to carry out online interactions on their behalf. The author's research concerns the development of semi-autonomous Web agents, which consult users if and only if the system does not have sufficient context or confidence to proceed working autonomously. This creates a user-agent dialogue that allows the user to teach the agent about the information sources they trust, their data-sharing preferences, and their decision-making preferences. Ultimately, this enables the user to maximise control over their data and decisions while retaining the convenience of using agents, including those driven by LLMs. In view of developing near-term solutions, the research seeks to answer the question: "How do we build a trustworthy and reliable network of semi-autonomous agents which represent individuals and organisations on the Web?". After identifying key requirements, the paper presents a demo for a sample use case of a generic personal assistant. This is implemented using (Notation3) rules to enforce safety guarantees around belief, data sharing and data usage and LLMs to allow natural language interaction with users and serendipitous dialogues between software agents.


Citations: 0

H-ARC: A Robust Estimate of Human Performance on the Abstraction and Reasoning Corpus Benchmark

arXiv - CS - Artificial Intelligence

Pub Date : 2024-09-02 DOI: arxiv-2409.01374

Solim LeGris, Wai Keen Vong, Brenden M. Lake, Todd M. Gureckis

The Abstraction and Reasoning Corpus (ARC) is a visual program synthesis benchmark designed to test challenging out-of-distribution generalization in humans and machines. Since 2019, limited progress has been observed on the challenge using existing artificial intelligence methods. Comparing human and machine performance is important for the validity of the benchmark. While previous work explored how well humans can solve tasks from the ARC benchmark, they either did so using only a subset of tasks from the original dataset, or from variants of ARC, and therefore only provided a tentative estimate of human performance. In this work, we obtain a more robust estimate of human performance by evaluating 1729 humans on the full set of 400 training and 400 evaluation tasks from the original ARC problem set. We estimate that average human performance lies between 73.3% and 77.2% correct with a reported empirical average of 76.2% on the training set, and between 55.9% and 68.9% correct with a reported empirical average of 64.2% on the public evaluation set. However, we also find that 790 out of the 800 tasks were solvable by at least one person in three attempts, suggesting that the vast majority of the publicly available ARC tasks are in principle solvable by typical crowd-workers recruited over the internet. Notably, while these numbers are slightly lower than earlier estimates, human performance still greatly exceeds current state-of-the-art approaches for solving ARC. To facilitate research on ARC, we publicly release our dataset, called H-ARC (human-ARC), which includes all of the submissions and action traces from human participants.


Citations: 0

Integrating End-to-End and Modular Driving Approaches for Online Corner Case Detection in Autonomous Driving

arXiv - CS - Artificial Intelligence

Pub Date : 2024-09-02 DOI: arxiv-2409.01178

Gemb Kaljavesi, Xiyan Su, Frank Diermeyer

Online corner case detection is crucial for ensuring safety in autonomous driving vehicles. Current autonomous driving approaches can be categorized into modular approaches and end-to-end approaches. To leverage the advantages of both, we propose a method for online corner case detection that integrates an end-to-end approach into a modular system. The modular system takes over the primary driving task and the end-to-end network runs in parallel as a secondary one; the disagreement between the systems is then used for corner case detection. We implement this method on a real vehicle and evaluate it qualitatively. Our results demonstrate that end-to-end networks, known for their superior situational awareness, as secondary driving systems, can effectively contribute to corner case detection. These findings suggest that such an approach holds potential for enhancing the safety of autonomous vehicles.
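The disagreement-based detection described in this abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes, purely for illustration, that both the modular system and the end-to-end network output a short planned trajectory as a list of (x, y) waypoints at matching time steps, that disagreement is the mean distance between matched waypoints, and that a fixed threshold (the 1.0 default is an arbitrary placeholder) flags a corner case.

```python
import math
from typing import List, Tuple

Waypoint = Tuple[float, float]


def trajectory_disagreement(primary: List[Waypoint], secondary: List[Waypoint]) -> float:
    """Mean Euclidean distance between waypoints at the same time step."""
    if not primary or len(primary) != len(secondary):
        raise ValueError("trajectories must be non-empty and of equal length")
    total = sum(math.dist(p, q) for p, q in zip(primary, secondary))
    return total / len(primary)


def is_corner_case(
    primary: List[Waypoint],
    secondary: List[Waypoint],
    threshold: float = 1.0,  # hypothetical value; would need tuning on real data
) -> bool:
    """Flag a potential corner case when the two planners disagree strongly."""
    return trajectory_disagreement(primary, secondary) > threshold
```

In this arrangement the primary (modular) trajectory is still the one executed; the secondary output is used only as a monitoring signal.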

Citations: 0

Unlocking the Wisdom of Large Language Models: An Introduction to The Path to Artificial General Intelligence

arXiv - CS - Artificial Intelligence

Pub Date : 2024-09-02 DOI: arxiv-2409.01007

Edward Y. Chang

This booklet, "Unlocking the Wisdom of Large Language Models," serves as an introduction to the comprehensive work "The Path to Artificial General Intelligence." Through a series of nine aphorisms, we distill key insights and principles that underpin the larger exploration of AI's future through adversarial LLM dialogue. We propose this approach as a potential path to realizing artificial general intelligence (AGI). This booklet also includes the titles, abstracts, and introductions of the chapters in the main book, and presents the first two chapters in their entirety.

Citations: 0
