
Latest ArXiv Publications

POBEVM: Real-time Video Matting via Progressively Optimize the Target Body and Edge
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09731
Jianming Xian
Approaches based on deep convolutional neural networks (CNNs) have achieved great performance in video matting. Many of these methods can produce accurate alpha estimation for the target body but typically yield fuzzy or incorrect target edges. This is usually caused by the following reasons: 1) current methods treat the target body and edge indiscriminately; 2) the target body dominates the whole target, while the target edge accounts for only a tiny proportion. For the first problem, we propose a CNN-based module that separately optimizes the matting target body and edge (SOBE). On this basis, we introduce a real-time, trimap-free video matting method via progressively optimizing the matting target body and edge (POBEVM) that is much lighter than previous approaches and achieves significant improvements in the predicted target edge. For the second problem, we propose an Edge-L1-Loss (ELL) function that focuses our network on the matting target edge. Experiments demonstrate that our method outperforms prior trimap-free matting methods on both the Distinctions-646 (D646) and VideoMatte240K (VM) datasets, especially in edge optimization.
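The edge-weighted loss idea can be sketched in a few lines. This is an illustrative reconstruction, not the paper's ELL definition: the edge band is taken as pixels whose neighborhood mixes foreground and background, and the window size and edge weight are assumed.

```python
import numpy as np

def edge_region(alpha, k=3):
    """Binary mask of the matte's edge band: pixels whose local k x k
    neighborhood contains both foreground (>0.5) and background (<=0.5)."""
    h, w = alpha.shape
    pad = k // 2
    padded = np.pad(alpha, pad, mode="edge")
    mask = np.zeros((h, w), dtype=bool)
    for i in range(h):
        for j in range(w):
            win = padded[i:i + k, j:j + k]
            mask[i, j] = win.max() > 0.5 and win.min() <= 0.5
    return mask

def edge_l1_loss(pred, gt, edge_weight=4.0):
    """L1 loss with extra weight on the edge band of the ground-truth matte,
    so edge errors count more despite their tiny pixel share."""
    err = np.abs(pred - gt)
    weights = np.where(edge_region(gt), edge_weight, 1.0)
    return float((weights * err).mean())
```

With a hard vertical boundary, an error on an edge pixel costs four times as much as the same error inside the target body, matching the imbalance the abstract describes.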
Citations: 0
Agents Need Not Know Their Purpose
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09734
Paulo Garcia
Ensuring that artificial intelligence behaves in a way that is aligned with human values is commonly referred to as the alignment challenge. Prior work has shown that rational agents, behaving so as to maximize a utility function, will inevitably behave in ways that are not aligned with human values, especially as their level of intelligence goes up. Prior work has also shown that there is no "one true utility function"; solutions must include a more holistic approach to alignment. This paper describes oblivious agents: agents architected so that their effective utility function is an aggregation of known and hidden sub-functions. The hidden component, to be maximized, is internally implemented as a black box, preventing the agent from examining it. The known component, to be minimized, is knowledge of the hidden sub-function. Architectural constraints further influence how agent actions can evolve its internal environment model. We show that an oblivious agent, behaving rationally, constructs an internal approximation of the designers' intentions (i.e., infers alignment) and, as a consequence of its architecture and effective utility function, behaves in a way that maximizes alignment, i.e., maximizes the approximated intention function. We show that, paradoxically, it does this for whatever utility function is used as the hidden component and that, in contrast with extant techniques, the chances of alignment actually improve as agent intelligence grows.
Citations: 0
FedAnchor: Enhancing Federated Semi-Supervised Learning with Label Contrastive Loss for Unlabeled Clients
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.10191
Xinchi Qiu, Yan Gao, Lorenzo Sani, Heng Pan, Wanru Zhao, Pedro Gusmão, Mina Alibeigi, Alexandru Iacob, Nicholas D. Lane
Federated learning (FL) is a distributed learning paradigm that facilitates collaborative training of a shared global model across devices while keeping data localized. The deployment of FL in numerous real-world applications faces delays, primarily due to the prevalent reliance on supervised tasks. Generating detailed labels at edge devices, if feasible, is demanding, given resource constraints and the imperative for continuous data updates. In addressing these challenges, solutions such as federated semi-supervised learning (FSSL), which relies on unlabeled clients' data and a limited amount of labeled data on the server, become pivotal. In this paper, we propose FedAnchor, an innovative FSSL method that introduces a unique double-head structure, called anchor head, paired with the classification head trained exclusively on labeled anchor data on the server. The anchor head is empowered with a newly designed label contrastive loss based on the cosine similarity metric. Our approach mitigates the confirmation bias and overfitting issues associated with pseudo-labeling techniques based on high-confidence model prediction samples. Extensive experiments on CIFAR10/100 and SVHN datasets demonstrate that our method outperforms the state-of-the-art method by a significant margin in terms of convergence rate and model accuracy.
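A label contrastive loss built on cosine similarity can be sketched as follows. This is a generic stand-in for the anchor-head objective, under assumed details: one anchor embedding per class and a softmax over cosine similarities with an assumed temperature.

```python
import numpy as np

def cosine_sim(a, b):
    """Row-wise cosine similarity between two embedding matrices."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def label_contrastive_loss(embeddings, labels, anchors, temperature=0.5):
    """Cross-entropy over cosine similarities to per-class anchors:
    each embedding should be most similar to its own label's anchor."""
    sims = cosine_sim(embeddings, anchors) / temperature     # (N, C)
    sims = sims - sims.max(axis=1, keepdims=True)            # numerical stability
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())
```

Embeddings that sit on their own label's anchor incur a lower loss than the same embeddings paired with the wrong labels, which is the property the anchor head exploits.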
Citations: 0
A cross-talk robust multichannel VAD model for multiparty agent interactions trained using synthetic re-recordings
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09797
Hyewon Han, Naveen Kumar
In this work, we propose a novel cross-talk rejection framework for a multi-channel, multi-talker setup for a live multiparty interactive show. Our far-field audio setup is required to be hands-free during live interaction and comprises four adjacent talkers with directional microphones in the same space. Such setups often introduce heavy cross-talk between channels, resulting in reduced automatic speech recognition (ASR) and natural language understanding (NLU) performance. To address this problem, we propose a voice activity detection (VAD) model for all talkers using multichannel information, which is then used to filter audio for downstream tasks. We adopt a synthetic training-data generation approach through playback and re-recording for such scenarios, simulating challenging speech-overlap conditions. We train our models on this synthetic data and demonstrate that our approach outperforms single-channel VAD models and an energy-based multi-channel VAD algorithm in various acoustic environments. In addition to VAD results, we also present multiparty ASR evaluation results to highlight the impact of using our VAD model for filtering audio in downstream tasks by significantly reducing the insertion error.
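The energy-based multi-channel baseline the authors compare against can be approximated in a few lines: attribute each frame to its loudest channel and gate on absolute energy. The threshold and frame layout here are assumptions for illustration.

```python
import numpy as np

def energy_vad(frames, threshold=0.01):
    """frames: array of shape (channels, n_frames, frame_len).
    Marks a frame as speech only on the loudest channel, and only if
    that channel clears an absolute energy threshold -- a crude way to
    reject cross-talk bleeding into a neighboring directional mic."""
    energy = (frames ** 2).mean(axis=-1)        # (channels, n_frames)
    peak = energy.max(axis=0, keepdims=True)    # loudest channel per frame
    return (energy > threshold) & (energy == peak)
```

A quieter copy of the same speech on an adjacent channel is rejected because it is never the per-frame maximum, which is exactly the failure mode this baseline (and the paper's learned model) targets.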
Citations: 0
Rethinking Information Structures in RLHF: Reward Generalization from a Graph Theory Perspective
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.10184
Tianyi Qiu, Fanzhi Zeng, Jiaming Ji, Dong Yan, Kaile Wang, Jiayi Zhou, Han Yang, Josef Dai, Xuehai Pan, Yaodong Yang
There is a trilemma in reinforcement learning from human feedback (RLHF): the incompatibility between highly diverse contexts, low labeling cost, and reliable alignment performance. Here we aim to mitigate such incompatibility through the design of dataset information structures during reward modeling, and meanwhile propose new, generalizable methods of analysis that have wider applications, including potentially shedding light on goal misgeneralization. Specifically, we first reexamine the RLHF process and propose a theoretical framework portraying it as an autoencoding process over text distributions. Our framework formalizes the RLHF objective of ensuring distributional consistency between human preference and large language model (LLM) behavior. Based on this framework, we introduce a new method to model generalization in the reward modeling stage of RLHF, the induced Bayesian network (IBN). Drawing from random graph theory and causal analysis, it enables empirically grounded derivation of generalization error bounds, a key improvement over classical methods of generalization analysis. An insight from our analysis is the superiority of the tree-based information structure in reward modeling, compared to chain-based baselines in conventional RLHF methods. We derive that in complex contexts with limited data, the tree-based reward model (RM) induces up to $\Theta(\log n/\log\log n)$ times less variance than chain-based RM, where $n$ is the dataset size. As validation, we demonstrate that on three NLP tasks, the tree-based RM achieves 65% win rate on average against chain-based baselines. Looking ahead, we hope to extend the IBN analysis to help understand the phenomenon of goal misgeneralization.
Citations: 0
Persuading a Learning Agent
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09721
Tao Lin, Yiling Chen
We study a repeated Bayesian persuasion problem (and more generally, any generalized principal-agent problem with complete information) where the principal does not have commitment power and the agent uses algorithms to learn to respond to the principal's signals. We reduce this problem to a one-shot generalized principal-agent problem with an approximately-best-responding agent. This reduction allows us to show that: if the agent uses contextual no-regret learning algorithms, then the principal can guarantee a utility that is arbitrarily close to the principal's optimal utility in the classic non-learning model with commitment; if the agent uses contextual no-swap-regret learning algorithms, then the principal cannot obtain any utility significantly more than the optimal utility in the non-learning model with commitment. The difference between the principal's obtainable utility in the learning model and the non-learning model is bounded by the agent's regret (swap-regret). If the agent uses mean-based learning algorithms (which can be no-regret but not no-swap-regret), then the principal can do significantly better than the non-learning model. These conclusions hold not only for Bayesian persuasion, but also for any generalized principal-agent problem with complete information, including Stackelberg games and contract design.
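For a concrete picture of the learner side, multiplicative weights (Hedge) is a canonical no-regret algorithm of the kind such an agent might run. This sketch is generic, not specific to the paper's contextual setting, and the learning rate is an assumed constant.

```python
import numpy as np

def hedge(payoffs, eta=0.1):
    """Multiplicative-weights (Hedge) learner. payoffs[t, a] is the
    payoff of action a at round t; returns the sequence of mixed
    strategies the learner plays, updated multiplicatively by reward."""
    n_rounds, n_actions = payoffs.shape
    w = np.ones(n_actions)
    strategies = []
    for t in range(n_rounds):
        p = w / w.sum()
        strategies.append(p)
        w = w * np.exp(eta * payoffs[t])   # reward-weighted update
    return np.array(strategies)
```

Against a fixed environment the strategy starts uniform and concentrates on the better action, the approximately-best-responding behavior the reduction in the abstract relies on.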
Citations: 0
FedRDF: A Robust and Dynamic Aggregation Function against Poisoning Attacks in Federated Learning
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.10082
Enrique Mármol Campos, Aurora González-Vidal, José Luis Hernández Ramos, A. Gómez-Skarmeta
Federated Learning (FL) represents a promising approach to the typical privacy concerns associated with centralized Machine Learning (ML) deployments. Despite its well-known advantages, FL is vulnerable to security attacks such as Byzantine behaviors and poisoning attacks, which can significantly degrade model performance and hinder convergence. The effectiveness of existing approaches to mitigating complex attacks, such as median, trimmed-mean, or Krum aggregation functions, has been demonstrated only partially, and only for specific attacks. Our study introduces a novel robust aggregation mechanism utilizing the Fourier Transform (FT), which is able to handle sophisticated attacks effectively without prior knowledge of the number of attackers. Employing this technique, the weights generated by FL clients are projected into the frequency domain to ascertain their density function, and the one exhibiting the highest frequency is selected. Consequently, malicious clients' weights are excluded. Our proposed approach was tested against various model-poisoning attacks, demonstrating superior performance over state-of-the-art aggregation methods.
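The density-based selection step can be illustrated with a histogram as a deliberately simplified stand-in for the paper's Fourier-domain density estimate; the bin count and per-coordinate treatment are assumptions.

```python
import numpy as np

def modal_aggregate(client_weights, bins=16):
    """Per-coordinate robust aggregation: estimate the density of the
    clients' values, keep the densest (modal) bin, and average inside it,
    so a minority of poisoned clients cannot drag the aggregate."""
    client_weights = np.asarray(client_weights)   # (n_clients, n_params)
    agg = np.empty(client_weights.shape[1])
    for j in range(client_weights.shape[1]):
        vals = client_weights[:, j]
        hist, edges = np.histogram(vals, bins=bins)
        k = hist.argmax()                         # densest region
        in_mode = (vals >= edges[k]) & (vals <= edges[k + 1])
        agg[j] = vals[in_mode].mean()
    return agg
```

With eight honest clients near 1.0 and two poisoned clients at 100.0, the densest bin contains only the honest values, so the aggregate stays near 1.0 where a plain mean would be pulled to about 20.8.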
Citations: 0
OptiMUS: Scalable Optimization Modeling with (MI)LP Solvers and Large Language Models
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.10172
Ali AhmadiTeshnizi, Wenzhi Gao, Madeleine Udell
Optimization problems are pervasive in sectors from manufacturing and distribution to healthcare. However, most such problems are still solved heuristically by hand rather than optimally by state-of-the-art solvers because the expertise required to formulate and solve these problems limits the widespread adoption of optimization tools and techniques. This paper introduces OptiMUS, a Large Language Model (LLM)-based agent designed to formulate and solve (mixed integer) linear programming problems from their natural language descriptions. OptiMUS can develop mathematical models, write and debug solver code, evaluate the generated solutions, and improve its model and code based on these evaluations. OptiMUS utilizes a modular structure to process problems, allowing it to handle problems with long descriptions and complex data without long prompts. Experiments demonstrate that OptiMUS outperforms existing state-of-the-art methods on easy datasets by more than 20% and on hard datasets (including a new dataset, NLP4LP, released with this paper that features long and complex problems) by more than 30%.
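The kind of toy instance such an agent formulates from text (e.g. "maximize 3x + 2y with x + y ≤ 4, x ≤ 2, x and y non-negative integers") can be checked by brute force. This hypothetical helper only illustrates the problem class; OptiMUS itself delegates to real (MI)LP solvers.

```python
from itertools import product

def brute_force_milp(c, constraints, bounds):
    """Exhaustively maximize c . x over small integer boxes, keeping
    feasible points only -- viable for toy instances, unlike a solver."""
    best_val, best_x = None, None
    for x in product(*(range(lo, hi + 1) for lo, hi in bounds)):
        if all(g(x) for g in constraints):
            val = sum(ci * xi for ci, xi in zip(c, x))
            if best_val is None or val > best_val:
                best_val, best_x = val, x
    return best_val, best_x
```

For the toy instance above the optimum is x = 2, y = 2 with objective value 10, the kind of ground truth an agent's generated solver code can be validated against.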
Citations: 0
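The formulate-then-solve loop that OptiMUS automates can be illustrated with a toy instance. The problem data below (a two-product production plan) is an invented example, not taken from the paper, and the brute-force integer search stands in for the MILP solver code an agent like OptiMUS would actually emit:

```python
# Toy MILP an agent might formulate from a description such as:
#   "A workshop makes chairs (profit 30) and tables (profit 50).
#    A chair uses 1 unit of wood, a table 2; 40 units of wood are
#    available and at most 25 chairs can be built. Maximize profit."
# Solved here by enumerating the (small) integer grid so the sketch
# needs no solver dependency.
best = max(
    (30 * c + 50 * t, c, t)        # (objective value, chairs, tables)
    for c in range(26)             # chair capacity: c <= 25
    for t in range(21)             # t <= 20 follows from 2t <= 40
    if c + 2 * t <= 40             # wood budget constraint
)
profit, chairs, tables = best
print(profit, chairs, tables)      # -> 1120 24 8
```

On a real problem the enumeration would be replaced by generated code for a MILP solver; the formulation step (variables, objective, constraints) is the part the agent extracts from the natural-language description.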
Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.10210
Huizhuo Yuan, Zixiang Chen, Kaixuan Ji, Quanquan Gu
Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI), especially when compared with the remarkable progress made in fine-tuning Large Language Models (LLMs). While cutting-edge diffusion models such as Stable Diffusion (SD) and SDXL rely on supervised fine-tuning, their performance inevitably plateaus after seeing a certain volume of data. Recently, reinforcement learning (RL) has been employed to fine-tune diffusion models with human preference data, but it requires at least two images ("winner" and "loser" images) for each text prompt. In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion), where the diffusion model engages in competition with its earlier versions, facilitating an iterative self-improvement process. Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment. Our experiments on the Pick-a-Pic dataset reveal that SPIN-Diffusion outperforms the existing supervised fine-tuning method in aspects of human preference alignment and visual appeal right from its first iteration. By the second iteration, it exceeds the performance of RLHF-based methods across all metrics, achieving these results with less data.
{"title":"Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation","authors":"Huizhuo Yuan, Zixiang Chen, Kaixuan Ji, Quanquan Gu","doi":"10.48550/arXiv.2402.10210","DOIUrl":"https://doi.org/10.48550/arXiv.2402.10210","url":null,"abstract":"Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI), especially when compared with the remarkable progress made in fine-tuning Large Language Models (LLMs). While cutting-edge diffusion models such as Stable Diffusion (SD) and SDXL rely on supervised fine-tuning, their performance inevitably plateaus after seeing a certain volume of data. Recently, reinforcement learning (RL) has been employed to fine-tune diffusion models with human preference data, but it requires at least two images (\"winner\"and\"loser\"images) for each text prompt. In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion), where the diffusion model engages in competition with its earlier versions, facilitating an iterative self-improvement process. Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment. Our experiments on the Pick-a-Pic dataset reveal that SPIN-Diffusion outperforms the existing supervised fine-tuning method in aspects of human preference alignment and visual appeal right from its first iteration. 
By the second iteration, it exceeds the performance of RLHF-based methods across all metrics, achieving these results with less data.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"1 5","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139963019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
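The self-play objective described above can be sketched numerically. The logistic (DPO-style) form and the beta temperature below are illustrative assumptions; in SPIN-Diffusion the log-likelihood terms are approximated via denoising-error differences rather than computed exactly:

```python
import math

def spin_loss(cur_real, ref_real, cur_gen, ref_gen, beta=1.0):
    """Self-play loss on one (real image, self-generated image) pair.

    cur_* are log-likelihood proxies under the model being trained,
    ref_* under the frozen previous-iteration checkpoint. The loss
    shrinks when the current model scores real data higher, and its
    own earlier samples lower, than the checkpoint does.
    """
    margin = (cur_real - ref_real) - (cur_gen - ref_gen)
    return math.log1p(math.exp(-beta * margin))

# A zero margin gives the maximum-uncertainty loss log(2); widening
# the margin on real data drives the loss toward zero.
print(spin_loss(0.0, 0.0, 0.0, 0.0))   # -> 0.693... (log 2)
```

Note that only real images and the model's own previous samples enter the loss, which is why, unlike RLHF, no "winner"/"loser" preference pairs are needed.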
Two-Timescale Design for Active STAR-RIS Aided Massive MIMO Systems
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09896
Anastasios K. Papazafeiropoulos, Hanxiao Ge, P. Kourtessis, T. Ratnarajah, S. Chatzinotas, S. Papavassiliou
Simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) is a promising implementation of RIS-assisted systems that enables full-space coverage. However, STAR-RIS, like conventional RIS, suffers from the double-fading effect. Thus, in this paper, we propose the marriage of active RIS and STAR-RIS, denoted as ASTARS, for massive multiple-input multiple-output (mMIMO) systems, and we focus on the energy splitting (ES) and mode switching (MS) protocols. Compared to prior literature, we consider the impact of correlated fading, and we base our analysis on the two-timescale protocol, which depends on statistical channel state information (CSI). On this ground, we propose a channel estimation method for ASTARS with reduced overhead that accounts for its architecture. Next, we derive a closed-form expression for the achievable sum-rate for both types of users in the transmission and reflection regions in a unified approach with significant practical advantages, such as reduced complexity and overhead, which result in a lower number of required iterations for convergence compared to an alternating optimization (AO) approach. Notably, we maximize simultaneously the amplitudes, the phase shifts, and the active amplifying coefficients of the ASTARS by applying the projected gradient ascent method (PGAM). Remarkably, the proposed optimization can be executed every several coherence intervals, which reduces the processing burden considerably. Simulations corroborate the analytical results, provide insight into the effects of fundamental variables on the sum achievable SE, and present the superiority of ASTARS compared to passive STAR-RIS for a practical number of surface elements.
{"title":"Two-Timescale Design for Active STAR-RIS Aided Massive MIMO Systems","authors":"Anastasios K. Papazafeiropoulos, Hanxiao Ge, P. Kourtessis, T. Ratnarajah, S. Chatzinotas, S. Papavassiliou","doi":"10.48550/arXiv.2402.09896","DOIUrl":"https://doi.org/10.48550/arXiv.2402.09896","url":null,"abstract":"Simultaneously transmitting and reflecting textcolor{black}{reconfigurable intelligent surface} (STAR-RIS) is a promising implementation of RIS-assisted systems that enables full-space coverage. However, STAR-RIS as well as conventional RIS suffer from the double-fading effect. Thus, in this paper, we propose the marriage of active RIS and STAR-RIS, denoted as ASTARS for massive multiple-input multiple-output (mMIMO) systems, and we focus on the energy splitting (ES) and mode switching (MS) protocols. Compared to prior literature, we consider the impact of correlated fading, and we rely our analysis on the two timescale protocol, being dependent on statistical channel state information (CSI). On this ground, we propose a channel estimation method for ASTARS with reduced overhead that accounts for its architecture. Next, we derive a textcolor{black}{closed-form expression} for the achievable sum-rate for both types of users in the transmission and reflection regions in a unified approach with significant practical advantages such as reduced complexity and overhead, which result in a lower number of required iterations for convergence compared to an alternating optimization (AO) approach. Notably, we maximize simultaneously the amplitudes, the phase shifts, and the active amplifying coefficients of the ASTARS by applying the projected gradient ascent method (PGAM). Remarkably, the proposed optimization can be executed at every several coherence intervals that reduces the processing burden considerably. 
Simulations corroborate the analytical results, provide insight into the effects of fundamental variables on the sum achievable SE, and present the superiority of 16 ASTARS compared to passive STAR-RIS for a practical number of surface elements.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"19 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139963118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
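The projected-gradient step at the heart of a method like PGAM can be sketched on a toy phase-only problem. The single-tap objective |Σ v_n c_n|² and the channel coefficients below are illustrative assumptions; the paper's algorithm additionally optimizes amplitudes and active amplification coefficients from statistical CSI, which are omitted here:

```python
# Toy cascaded-channel coefficients, one per surface element (assumed values).
c = [1 + 1j, 2 + 0j, -1j]
upper = sum(abs(x) for x in c) ** 2     # bound, reached when all v_n * c_n align

def objective(v):
    # received power |sum_n v_n c_n|^2 for phase configuration v
    return abs(sum(vn * cn for vn, cn in zip(v, c))) ** 2

def pgam_step(v, step=0.2):
    s = sum(vn * cn for vn, cn in zip(v, c))
    # Wirtinger gradient of |s|^2 w.r.t. conj(v_n) is s * conj(c_n):
    # move uphill, then project back onto the unit-modulus set.
    moved = [vn + step * s * cn.conjugate() for vn, cn in zip(v, c)]
    return [m / abs(m) if abs(m) > 0 else 1 + 0j for m in moved]

v = [1 + 0j] * len(c)                   # feasible start: unit-modulus entries
for _ in range(300):
    v = pgam_step(v)
print(objective(v) > 0.99 * upper)      # phases have (nearly) aligned -> True
```

The projection is simply phase extraction, v_n / |v_n|, which is what makes the unit-modulus constraint cheap to enforce at every iteration.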
Copyright © 2023 Book学术 All rights reserved.