Can We Count on LLMs? The Fixed-Effect Fallacy and Claims of GPT-4 Capabilities
Thomas Ball, Shuo Chen, Cormac Herley
arXiv:2409.07638 (2024-09-11)

In this paper we explore the evaluation of LLM capabilities. We present measurements of GPT-4 performance on several deterministic tasks; each task involves a basic calculation and takes as input a parameter drawn from a large, well-defined population (e.g., count the elements in a list, multiply two k-digit numbers). We examine several conditions per task and perform enough trials that statistically significant differences can be detected. This allows us to investigate the sensitivity of task accuracy both to query phrasing and to the input parameter population. We find that seemingly trivial modifications to the task prompt or input population can yield differences far larger than sampling effects can explain. For example, performance on a simple list-counting task varies with query phrasing and list length, but also with list composition (i.e., the thing to be counted) and object frequency (e.g., success when an element accounts for roughly 50% of a list differs from when it accounts for roughly 70%). We conclude that efforts to quantify LLM capabilities easily succumb to the language-as-fixed-effect fallacy, in which experimental observations are improperly generalized beyond what the data supports. One consequence appears to be that intuitions formed through interactions with humans are a very unreliable guide to which input modifications should "make no difference" to LLM performance.
{"title":"Can We Count on LLMs? The Fixed-Effect Fallacy and Claims of GPT-4 Capabilities","authors":"Thomas Ball, Shuo Chen, Cormac Herley","doi":"arxiv-2409.07638","DOIUrl":"https://doi.org/arxiv-2409.07638","url":null,"abstract":"In this paper we explore evaluation of LLM capabilities. We present\u0000measurements of GPT-4 performance on several deterministic tasks; each task\u0000involves a basic calculation and takes as input parameter some element drawn\u0000from a large well-defined population (e.g., count elements in a list, multiply\u0000two k-digit numbers, etc). We examine several conditions per-task and perform\u0000enough trials so that statistically significant differences can be detected.\u0000This allows us to investigate the sensitivity of task-accuracy both to query\u0000phrasing and input parameter population. We find that seemingly trivial\u0000modifications in the task-prompt or input population can yield differences far\u0000larger than can be explained by sampling effects. For example, performance on a\u0000simple list-counting task varies with query-phrasing and list-length, but also\u0000with list composition (i.e., the thing-to-be-counted) and object frequency\u0000(e.g., success when an element accounts for $approx$ 50% of a list is\u0000different from when it accounts for $approx$ 70% etc). We conclude that efforts to quantify LLM capabilities easily succumb to the\u0000language-as-fixed-effect fallacy, where experimental observations are\u0000improperly generalized beyond what the data supports. A consequence appears to\u0000be that intuitions that have been formed based on interactions with humans form\u0000a very unreliable guide as to which input modifications should ``make no\u0000difference'' to LLM performance.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"34 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142194057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Traceable LLM-based validation of statements in knowledge graphs
Daniel Adam, Tomáš Kliegr
arXiv:2409.07507 (2024-09-11)

This article presents a method for verifying RDF triples using LLMs, with an emphasis on providing traceable arguments. Because LLMs currently cannot reliably identify the origin of the information used to construct their responses, our approach avoids using internal LLM factual knowledge altogether. Instead, the RDF statements to be verified are compared against chunks of external documents retrieved through a web search or from Wikipedia. To assess the applicability of this workflow to biosciences content, we evaluated 1,719 positive statements from the BioRED dataset and an equal number of newly generated negative statements. The resulting precision is 88% and recall is 44%, indicating that the method requires human oversight. We demonstrate the method on Wikidata, where a SPARQL query is used to automatically retrieve statements needing verification. Overall, the results suggest that LLMs could be used for large-scale verification of statements in knowledge graphs (KGs), a task previously unfeasible due to human annotation costs.
{"title":"Traceable LLM-based validation of statements in knowledge graphs","authors":"Daniel Adam, Tomáš Kliegr","doi":"arxiv-2409.07507","DOIUrl":"https://doi.org/arxiv-2409.07507","url":null,"abstract":"This article presents a method for verifying RDF triples using LLMs, with an\u0000emphasis on providing traceable arguments. Because the LLMs cannot currently\u0000reliably identify the origin of the information used to construct the response\u0000to the user query, our approach is to avoid using internal LLM factual\u0000knowledge altogether. Instead, verified RDF statements are compared to chunks\u0000of external documents retrieved through a web search or Wikipedia. To assess\u0000the possible application of this workflow on biosciences content, we evaluated\u00001,719 positive statements from the BioRED dataset and the same number of newly\u0000generated negative statements. The resulting precision is 88%, and recall is\u000044%. This indicates that the method requires human oversight. We demonstrate\u0000the method on Wikidata, where a SPARQL query is used to automatically retrieve\u0000statements needing verification. Overall, the results suggest that LLMs could\u0000be used for large-scale verification of statements in KGs, a task previously\u0000unfeasible due to human annotation costs.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"96 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142224950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Novel Mathematical Framework for Objective Evaluation of Ideas using a Conversational AI (CAI) System
B. Sankar, Dibakar Sen
arXiv:2409.07578 (2024-09-11)

The demand for innovation in product design necessitates a prolific ideation phase. Conversational AI (CAI) systems that use Large Language Models (LLMs) such as GPT (Generative Pre-trained Transformer) have been shown to be fruitful in augmenting human creativity, providing numerous novel and diverse ideas. Despite this success in ideation quantity, the qualitative assessment of the ideas remains challenging and traditionally reliant on expert human evaluation, which suffers from limitations such as human judgment errors, bias, and oversight. Addressing this gap, our study introduces a comprehensive mathematical framework for automated analysis that objectively evaluates the plethora of ideas generated by CAI systems and/or humans. The framework is particularly advantageous for novice designers who lack experience in selecting promising ideas. By converting ideas into higher-dimensional vectors and quantitatively measuring the diversity between them using tools such as UMAP, DBSCAN, and PCA, the proposed method provides a reliable and objective way of selecting the most promising ideas, thereby enhancing the efficiency of the ideation phase.
{"title":"A Novel Mathematical Framework for Objective Evaluation of Ideas using a Conversational AI (CAI) System","authors":"B. Sankar, Dibakar Sen","doi":"arxiv-2409.07578","DOIUrl":"https://doi.org/arxiv-2409.07578","url":null,"abstract":"The demand for innovation in product design necessitates a prolific ideation\u0000phase. Conversational AI (CAI) systems that use Large Language Models (LLMs)\u0000such as GPT (Generative Pre-trained Transformer) have been shown to be fruitful\u0000in augmenting human creativity, providing numerous novel and diverse ideas.\u0000Despite the success in ideation quantity, the qualitative assessment of these\u0000ideas remains challenging and traditionally reliant on expert human evaluation.\u0000This method suffers from limitations such as human judgment errors, bias, and\u0000oversight. Addressing this gap, our study introduces a comprehensive\u0000mathematical framework for automated analysis to objectively evaluate the\u0000plethora of ideas generated by CAI systems and/or humans. This framework is\u0000particularly advantageous for novice designers who lack experience in selecting\u0000promising ideas. By converting the ideas into higher dimensional vectors and\u0000quantitatively measuring the diversity between them using tools such as UMAP,\u0000DBSCAN and PCA, the proposed method provides a reliable and objective way of\u0000selecting the most promising ideas, thereby enhancing the efficiency of the\u0000ideation phase.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"26 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142194059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MAGDA: Multi-agent guideline-driven diagnostic assistance
David Bani-Harouni, Nassir Navab, Matthias Keicher
arXiv:2409.06351 (2024-09-10)
In emergency departments, rural hospitals, or clinics in less developed regions, clinicians often lack fast image analysis by trained radiologists, which can have a detrimental effect on patients' healthcare. Large Language Models (LLMs) have the potential to alleviate some of the pressure on these clinicians by providing insights that support their decision-making. While LLMs achieve high scores on medical exams, showcasing their broad theoretical medical knowledge, they tend not to follow medical guidelines. In this work, we introduce a new approach for zero-shot, guideline-driven decision support. We model a system of multiple LLM agents, augmented with a contrastive vision-language model, that collaborate to reach a patient diagnosis. After being provided with simple diagnostic guidelines, the agents synthesize prompts and screen the image for findings according to those guidelines. Finally, they provide understandable chain-of-thought reasoning for their diagnosis, which is then self-refined to account for interdependencies between diseases. Because our method is zero-shot, it is adaptable to settings with rare diseases, where training data is limited but expert-crafted disease descriptions are available. We evaluate the method on two chest X-ray datasets, CheXpert and ChestX-ray 14 Longtail, showing improvements over existing zero-shot methods and generalizability to rare diseases.
{"title":"MAGDA: Multi-agent guideline-driven diagnostic assistance","authors":"David Bani-Harouni, Nassir Navab, Matthias Keicher","doi":"arxiv-2409.06351","DOIUrl":"https://doi.org/arxiv-2409.06351","url":null,"abstract":"In emergency departments, rural hospitals, or clinics in less developed\u0000regions, clinicians often lack fast image analysis by trained radiologists,\u0000which can have a detrimental effect on patients' healthcare. Large Language\u0000Models (LLMs) have the potential to alleviate some pressure from these\u0000clinicians by providing insights that can help them in their decision-making.\u0000While these LLMs achieve high test results on medical exams showcasing their\u0000great theoretical medical knowledge, they tend not to follow medical\u0000guidelines. In this work, we introduce a new approach for zero-shot\u0000guideline-driven decision support. We model a system of multiple LLM agents\u0000augmented with a contrastive vision-language model that collaborate to reach a\u0000patient diagnosis. After providing the agents with simple diagnostic\u0000guidelines, they will synthesize prompts and screen the image for findings\u0000following these guidelines. Finally, they provide understandable\u0000chain-of-thought reasoning for their diagnosis, which is then self-refined to\u0000consider inter-dependencies between diseases. As our method is zero-shot, it is\u0000adaptable to settings with rare diseases, where training data is limited, but\u0000expert-crafted disease descriptions are available. We evaluate our method on\u0000two chest X-ray datasets, CheXpert and ChestX-ray 14 Longtail, showcasing\u0000performance improvement over existing zero-shot methods and generalizability to\u0000rare diseases.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"69 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142194065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shadowed AHP for multi-criteria supplier selection
Mohamed Abdel Hameed El-Hawy
arXiv:2409.09082 (2024-09-10)

Numerous multi-criteria decision-making (MCDM) techniques have been proposed in a variety of business domains; one of the best known is the Analytic Hierarchy Process (AHP). Various kinds of uncertain numbers are commonly used to represent preference values in AHP problems, and several methods have been proposed to address AHP problems involving multi-granularity linguistic information. This paper introduces a novel method for solving such problems using shadowed fuzzy numbers (SFNs), which approximate fuzzy numbers of different types while preserving their uncertainty properties. The proposed Shadowed AHP method handles preference values represented by multiple types of uncertain numbers: it converts multi-granular preference values into a unified shadowed-fuzzy-number model and exploits the properties of that model. A new ranking approach is also introduced to order the aggregated preferences. We apply the new approach to a supplier selection problem in which multi-granular information is used. The features of the new approach are significant for decision-making applications.
{"title":"Shadowed AHP for multi-criteria supplier selection","authors":"Mohamed Abdel Hameed El-Hawy","doi":"arxiv-2409.09082","DOIUrl":"https://doi.org/arxiv-2409.09082","url":null,"abstract":"Numerous techniques of multi-criteria decision-making (MCDM) have been\u0000proposed in a variety of business domains. One of the well-known methods is the\u0000Analytical Hierarchical Process (AHP). Various uncertain numbers are commonly\u0000used to represent preference values in AHP problems. In the case of\u0000multi-granularity linguistic information, several methods have been proposed to\u0000address this type of AHP problem. This paper introduces a novel method to solve\u0000this problem using shadowed fuzzy numbers (SFNs). These numbers are\u0000characterized by approximating different types of fuzzy numbers and preserving\u0000their uncertainty properties. The new Shadowed AHP method is proposed to handle\u0000preference values which are represented by multi-types of uncertain numbers.\u0000The new approach converts multi-granular preference values into unified model\u0000of shadowed fuzzy numbers and utilizes their properties. A new ranking approach\u0000is introduced to order the results of aggregation preferences. The new approach\u0000is applied to solve a supplier selection problem in which multi-granular\u0000information are used. The features of the new approach are significant for\u0000decision-making applications.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"40 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142252702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Case Study: Leveraging GenAI to Build AI-based Surrogates and Regressors for Modeling Radio Frequency Heating in Fusion Energy Science
E. Wes Bethel, Vianna Cramer, Alexander del Rio, Lothar Narins, Chris Pestano, Satvik Verma, Erick Arias, Nicola Bertelli, Talita Perciano, Syun'ichi Shiraiwa, Álvaro Sánchez Villar, Greg Wallace, John C. Wright
arXiv:2409.06122 (2024-09-10)
This work presents a detailed case study on using Generative AI (GenAI) to develop AI surrogates for simulation models in fusion energy research. The scope covers the methodology, implementation, and results of using GenAI to assist in model development and optimization, and compares these results with models developed manually in earlier work.
{"title":"Case Study: Leveraging GenAI to Build AI-based Surrogates and Regressors for Modeling Radio Frequency Heating in Fusion Energy Science","authors":"E. Wes Bethel, Vianna Cramer, Alexander del Rio, Lothar Narins, Chris Pestano, Satvik Verma, Erick Arias, Nicola Bertelli, Talita Perciano, Syun'ichi Shiraiwa, Álvaro Sánchez Villar, Greg Wallace, John C. Wright","doi":"arxiv-2409.06122","DOIUrl":"https://doi.org/arxiv-2409.06122","url":null,"abstract":"This work presents a detailed case study on using Generative AI (GenAI) to\u0000develop AI surrogates for simulation models in fusion energy research. The\u0000scope includes the methodology, implementation, and results of using GenAI to\u0000assist in model development and optimization, comparing these results with\u0000previous manually developed models.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142194064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Applying Attribution Explanations in Truth-Discovery Quantitative Bipolar Argumentation Frameworks
Xiang Yin, Nico Potyka, Francesca Toni
arXiv:2409.05831 (2024-09-09)

Explaining the strength of arguments under gradual semantics is receiving increasing attention. For example, various studies in the literature offer explanations by computing the attribution scores of arguments or edges in Quantitative Bipolar Argumentation Frameworks (QBAFs). These explanations, known as Argument Attribution Explanations (AAEs) and Relation Attribution Explanations (RAEs), commonly employ removal-based and Shapley-based techniques for computing the attribution scores. While AAEs and RAEs have proven useful in several applications with acyclic QBAFs, they remain largely unexplored for cyclic QBAFs. Furthermore, existing applications tend to focus solely on either AAEs or RAEs rather than comparing them directly. In this paper, we apply both AAEs and RAEs to Truth-Discovery QBAFs (TD-QBAFs), which assess the trustworthiness of sources (e.g., websites) and their claims (e.g., the severity of a virus) and feature complex cycles. We find that both AAEs and RAEs can provide interesting explanations and yield non-trivial, surprising insights.
{"title":"Applying Attribution Explanations in Truth-Discovery Quantitative Bipolar Argumentation Frameworks","authors":"Xiang Yin, Nico Potyka, Francesca Toni","doi":"arxiv-2409.05831","DOIUrl":"https://doi.org/arxiv-2409.05831","url":null,"abstract":"Explaining the strength of arguments under gradual semantics is receiving\u0000increasing attention. For example, various studies in the literature offer\u0000explanations by computing the attribution scores of arguments or edges in\u0000Quantitative Bipolar Argumentation Frameworks (QBAFs). These explanations,\u0000known as Argument Attribution Explanations (AAEs) and Relation Attribution\u0000Explanations (RAEs), commonly employ removal-based and Shapley-based techniques\u0000for computing the attribution scores. While AAEs and RAEs have proven useful in\u0000several applications with acyclic QBAFs, they remain largely unexplored for\u0000cyclic QBAFs. Furthermore, existing applications tend to focus solely on either\u0000AAEs or RAEs, but do not compare them directly. In this paper, we apply both\u0000AAEs and RAEs, to Truth Discovery QBAFs (TD-QBAFs), which assess the\u0000trustworthiness of sources (e.g., websites) and their claims (e.g., the\u0000severity of a virus), and feature complex cycles. We find that both AAEs and\u0000RAEs can provide interesting explanations and can give non-trivial and\u0000surprising insights.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142194068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MLLM-FL: Multimodal Large Language Model Assisted Federated Learning on Heterogeneous and Long-tailed Data
Jianyi Zhang, Hao Frank Yang, Ang Li, Xin Guo, Pu Wang, Haiming Wang, Yiran Chen, Hai Li
arXiv:2409.06067 (2024-09-09)
Previous studies on federated learning (FL) often encounter performance degradation due to data heterogeneity among clients. In light of recent advances in multimodal large language models (MLLMs) such as GPT-4V and LLaVA, which demonstrate exceptional proficiency in multimodal tasks such as image captioning and multimodal question answering, we introduce a novel federated learning framework, Multimodal Large Language Model Assisted Federated Learning (MLLM-FL), which employs powerful MLLMs at the server end to address the heterogeneity and long-tail challenges. Owing to the advanced cross-modality representation capabilities and extensive open-vocabulary prior knowledge of MLLMs, our framework is adept at harnessing the extensive, yet previously underexploited, open-source data accessible from websites, together with powerful server-side computational resources. Hence, MLLM-FL not only enhances performance but also avoids increasing the risk of privacy leakage and the computational burden on local devices, distinguishing it from prior methodologies. Our framework has three key stages. First, prior to local training on the clients' local datasets, we conduct global visual-text pretraining of the model, facilitated by the extensive open-source data available online and the assistance of MLLMs. The pretrained model is then distributed to the clients for local training. Finally, once the locally trained models are transmitted back to the server, a global alignment is carried out under the supervision of MLLMs to further enhance performance. Experimental evaluations on established benchmarks show that our framework delivers promising performance in typical FL scenarios with data heterogeneity and long-tailed distributions across clients.
{"title":"MLLM-FL: Multimodal Large Language Model Assisted Federated Learning on Heterogeneous and Long-tailed Data","authors":"Jianyi Zhang, Hao Frank Yang, Ang Li, Xin Guo, Pu Wang, Haiming Wang, Yiran Chen, Hai Li","doi":"arxiv-2409.06067","DOIUrl":"https://doi.org/arxiv-2409.06067","url":null,"abstract":"Previous studies on federated learning (FL) often encounter performance\u0000degradation due to data heterogeneity among different clients. In light of the\u0000recent advances in multimodal large language models (MLLMs), such as GPT-4v and\u0000LLaVA, which demonstrate their exceptional proficiency in multimodal tasks,\u0000such as image captioning and multimodal question answering. We introduce a\u0000novel federated learning framework, named Multimodal Large Language Model\u0000Assisted Federated Learning (MLLM-FL), which which employs powerful MLLMs at\u0000the server end to address the heterogeneous and long-tailed challenges. Owing\u0000to the advanced cross-modality representation capabilities and the extensive\u0000open-vocabulary prior knowledge of MLLMs, our framework is adept at harnessing\u0000the extensive, yet previously underexploited, open-source data accessible from\u0000websites and powerful server-side computational resources. Hence, the MLLM-FL\u0000not only enhances the performance but also avoids increasing the risk of\u0000privacy leakage and the computational burden on local devices, distinguishing\u0000it from prior methodologies. Our framework has three key stages. Initially,\u0000prior to local training on local datasets of clients, we conduct global\u0000visual-text pretraining of the model. This pretraining is facilitated by\u0000utilizing the extensive open-source data available online, with the assistance\u0000of multimodal large language models. Subsequently, the pretrained model is\u0000distributed among various clients for local training. Finally, once the locally\u0000trained models are transmitted back to the server, a global alignment is\u0000carried out under the supervision of MLLMs to further enhance the performance.\u0000Experimental evaluations on established benchmarks, show that our framework\u0000delivers promising performance in the typical scenarios with data heterogeneity\u0000and long-tail distribution across different clients in FL.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"156 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142194067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Generative Model for Mechanical System Configuration Design
Yasaman Etesam, Hyunmin Cheong, Mohammadmehdi Ataei, Pradeep Kumar Jayaraman
arXiv:2409.06016 (2024-09-09)

Generative AI has made remarkable progress in addressing various design challenges. One prominent area where it could bring significant value is engineering design. In particular, selecting an optimal set of components and their interfaces to create a mechanical system that meets design requirements is one of the most challenging and time-consuming tasks for engineers. This configuration design task is inherently challenging due to its categorical nature, the multiple design requirements a solution must satisfy, and the reliance on physics simulations for evaluating potential solutions. These characteristics entail solving a combinatorial optimization problem with multiple constraints involving black-box functions. To address this challenge, we propose a deep generative model to predict the optimal combination of components and interfaces for a given design problem. To demonstrate our approach, we solve a gear-train synthesis problem by first creating a synthetic dataset using a grammar, a parts catalogue, and a physics simulator. We then train a Transformer, named GearFormer, on this dataset; it can not only generate quality solutions on its own but also augment search methods such as evolutionary algorithms and Monte Carlo tree search. We show that GearFormer outperforms such search methods on their own in terms of satisfying the specified design requirements, with orders-of-magnitude faster generation time. Additionally, we showcase the benefit of hybrid methods that combine GearFormer with search methods, further improving the quality of the solutions.
{"title":"Deep Generative Model for Mechanical System Configuration Design","authors":"Yasaman Etesam, Hyunmin Cheong, Mohammadmehdi Ataei, Pradeep Kumar Jayaraman","doi":"arxiv-2409.06016","DOIUrl":"https://doi.org/arxiv-2409.06016","url":null,"abstract":"Generative AI has made remarkable progress in addressing various design\u0000challenges. One prominent area where generative AI could bring significant\u0000value is in engineering design. In particular, selecting an optimal set of\u0000components and their interfaces to create a mechanical system that meets design\u0000requirements is one of the most challenging and time-consuming tasks for\u0000engineers. This configuration design task is inherently challenging due to its\u0000categorical nature, multiple design requirements a solution must satisfy, and\u0000the reliance on physics simulations for evaluating potential solutions. These\u0000characteristics entail solving a combinatorial optimization problem with\u0000multiple constraints involving black-box functions. To address this challenge,\u0000we propose a deep generative model to predict the optimal combination of\u0000components and interfaces for a given design problem. To demonstrate our\u0000approach, we solve a gear train synthesis problem by first creating a synthetic\u0000dataset using a grammar, a parts catalogue, and a physics simulator. We then\u0000train a Transformer using this dataset, named GearFormer, which can not only\u0000generate quality solutions on its own, but also augment search methods such as\u0000an evolutionary algorithm and Monte Carlo tree search. We show that GearFormer\u0000outperforms such search methods on their own in terms of satisfying the\u0000specified design requirements with orders of magnitude faster generation time.\u0000Additionally, we showcase the benefit of hybrid methods that leverage both\u0000GearFormer and search methods, which further improve the quality of the\u0000solutions.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"79 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142194066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semifactual Explanations for Reinforcement Learning
Jasmina Gajcin, Jovan Jeromela, Ivana Dusparic
arXiv:2409.05435 (2024-09-09)

Reinforcement Learning (RL) is a learning paradigm in which the agent learns from its environment through trial and error. Deep reinforcement learning (DRL) algorithms represent the agent's policies using neural networks, making their decisions difficult to interpret. Explaining the behaviour of DRL agents is necessary to advance user trust, increase engagement, and facilitate integration with real-life tasks. Semifactual explanations aim to explain an outcome by providing "even if" scenarios, such as "even if the car were moving twice as slowly, it would still have to swerve to avoid crashing". Semifactuals help users understand the effects of different factors on the outcome and support the optimisation of resources. While extensively studied in psychology and even utilised in supervised learning, semifactuals have not been used to explain the decisions of RL systems. In this work, we develop a first approach to generating semifactual explanations for RL agents. We start by defining five properties of desirable semifactual explanations in RL and then introduce SGRL-Rewind and SGRL-Advance, the first algorithms for generating semifactual explanations in RL. We evaluate the algorithms in two standard RL environments and find that they generate semifactuals that are easier to reach, represent the agent's policy better, and are more diverse than those of the baselines. Lastly, we conduct and analyse a user study to assess participants' perception of semifactual explanations of the agent's actions.
{"title":"Semifactual Explanations for Reinforcement Learning","authors":"Jasmina Gajcin, Jovan Jeromela, Ivana Dusparic","doi":"arxiv-2409.05435","DOIUrl":"https://doi.org/arxiv-2409.05435","url":null,"abstract":"Reinforcement Learning (RL) is a learning paradigm in which the agent learns\u0000from its environment through trial and error. Deep reinforcement learning (DRL)\u0000algorithms represent the agent's policies using neural networks, making their\u0000decisions difficult to interpret. Explaining the behaviour of DRL agents is\u0000necessary to advance user trust, increase engagement, and facilitate\u0000integration with real-life tasks. Semifactual explanations aim to explain an\u0000outcome by providing \"even if\" scenarios, such as \"even if the car were moving\u0000twice as slowly, it would still have to swerve to avoid crashing\". Semifactuals\u0000help users understand the effects of different factors on the outcome and\u0000support the optimisation of resources. While extensively studied in psychology\u0000and even utilised in supervised learning, semifactuals have not been used to\u0000explain the decisions of RL systems. In this work, we develop a first approach\u0000to generating semifactual explanations for RL agents. We start by defining five\u0000properties of desirable semifactual explanations in RL and then introducing\u0000SGRL-Rewind and SGRL-Advance, the first algorithms for generating semifactual\u0000explanations in RL. We evaluate the algorithms in two standard RL environments\u0000and find that they generate semifactuals that are easier to reach, represent\u0000the agent's policy better, and are more diverse compared to baselines. Lastly,\u0000we conduct and analyse a user study to assess the participant's perception of\u0000semifactual explanations of the agent's actions.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142193901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}