
Latest Articles in Machine Learning

FairMOE: counterfactually-fair mixture of experts with levels of interpretability
IF 7.5 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-08 | DOI: 10.1007/s10994-024-06583-2
Joe Germino, Nuno Moniz, Nitesh V. Chawla

With the rise of artificial intelligence in our everyday lives, the need for human interpretation of machine learning models’ predictions emerges as a critical issue. Generally, interpretability is viewed as a binary notion with a performance trade-off. Either a model is fully-interpretable but lacks the ability to capture more complex patterns in the data, or it is a black box. In this paper, we argue that this view is severely limiting and that instead interpretability should be viewed as a continuous domain-informed concept. We leverage the well-known Mixture of Experts architecture with user-defined limits on non-interpretability. We extend this idea with a counterfactual fairness module to ensure the selection of consistently fair experts: FairMOE. We perform an extensive experimental evaluation with fairness-related data sets and compare our proposal against state-of-the-art methods. Our results demonstrate that FairMOE is competitive with the leading fairness-aware algorithms in both fairness and predictive measures while providing more consistent performance, competitive scalability, and, most importantly, greater interpretability.
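The gating idea described above can be sketched as follows. This is a minimal illustration of a mixture-of-experts gate with a user-defined cap on non-interpretable predictions, not the authors' FairMOE implementation; the function names, the confidence threshold, and the budget parameter are all hypothetical.

```python
# Hypothetical sketch: route each sample to an interpretable expert unless its
# confidence is low, and fall back to the black-box expert only while the
# user-defined non-interpretability budget allows it.

def gate_predictions(samples, interp_expert, blackbox_expert, max_blackbox_frac=0.2):
    budget = int(max_blackbox_frac * len(samples))  # cap on black-box calls
    used = 0
    preds = []
    for x in samples:
        pred, conf = interp_expert(x)
        if conf < 0.6 and used < budget:  # low confidence: defer to black box
            pred, _ = blackbox_expert(x)
            used += 1
        preds.append(pred)
    return preds, used

# toy experts: a threshold rule (confidence = margin from the threshold)
# versus a "stronger" black-box oracle
interp = lambda x: (int(x > 0.5), abs(x - 0.5) * 2)
oracle = lambda x: (int(x > 0.45), 1.0)

preds, used = gate_predictions([0.1, 0.48, 0.9, 0.52], interp, oracle, 0.5)
```

With a 50% budget, only the two borderline samples (0.48 and 0.52) are handed to the black box; the rest stay fully interpretable.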

Citations: 0
Fast linear model trees by PILOT
IF 7.5 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-08 | DOI: 10.1007/s10994-024-06590-3
Jakob Raymaekers, Peter J. Rousseeuw, Tim Verdonck, Ruicong Yao

Linear model trees are regression trees that incorporate linear models in the leaf nodes. This preserves the intuitive interpretation of decision trees and at the same time enables them to better capture linear relationships, which is hard for standard decision trees. But most existing methods for fitting linear model trees are time-consuming and therefore not scalable to large data sets. In addition, they are more prone to overfitting and extrapolation issues than standard regression trees. In this paper we introduce PILOT, a new algorithm for linear model trees that is fast, regularized, stable and interpretable. PILOT trains in a greedy fashion like classic regression trees, but incorporates an $L^2$ boosting approach and a model selection rule for fitting linear models in the nodes. The abbreviation PILOT stands for PIecewise Linear Organic Tree, where 'organic' refers to the fact that no pruning is carried out. PILOT has the same low time and space complexity as CART without its pruning. An empirical study indicates that PILOT tends to outperform standard decision trees and other linear model trees on a variety of data sets. Moreover, we prove its consistency in an additive model setting under weak assumptions. When the data is generated by a linear model, the convergence rate is polynomial.
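The core leaf-model idea can be shown in a few lines. This is a generic sketch of a linear model tree's prediction step (a fixed split with an ordinary least-squares line in each leaf), not the PILOT algorithm itself; the split point and helper names are illustrative.

```python
# Sketch: instead of predicting a leaf constant as a standard regression tree
# would, fit an OLS line per leaf and evaluate it at the query point.

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return slope, my - slope * mx  # (slope, intercept)

def leaf_tree_predict(xs, ys, split, x_new):
    left = [(x, y) for x, y in zip(xs, ys) if x <= split]
    right = [(x, y) for x, y in zip(xs, ys) if x > split]
    leaf = left if x_new <= split else right
    a, b = fit_line([x for x, _ in leaf], [y for _, y in leaf])
    return a * x_new + b

# piecewise-linear data: y = 2x below the split, a flatter slope above it
xs = [1, 2, 3, 4, 6, 7, 8, 9]
ys = [2, 4, 6, 8, 10.5, 11, 11.5, 12]
pred = leaf_tree_predict(xs, ys, split=5, x_new=3.5)  # left leaf is exact: 7.0
```

A constant-leaf tree would predict the left-leaf mean (5.0) here; the leaf line recovers the local linear trend exactly.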

Citations: 0
A systematic approach for learning imbalanced data: enhancing zero-inflated models through boosting
IF 7.5 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-08 | DOI: 10.1007/s10994-024-06558-3
Yeasung Jeong, Kangbok Lee, Young Woong Park, Sumin Han

In this paper, we propose systematic approaches for learning imbalanced data based on a two-regime process: regime 0, which generates excess zeros (majority class), and regime 1, which contributes to generating an outcome of one (minority class). The proposed model contains two latent equations: a split probit (logit) equation in the first stage and an ordinary probit (logit) equation in the second stage. Because boosting improves the accuracy of prediction versus using a single classifier, we combined a boosting strategy with the two-regime process. Thus, we developed the zero-inflated probit boost (ZIPBoost) and zero-inflated logit boost (ZILBoost) methods. We show that the weight functions of ZIPBoost have the desired properties for good predictive performance. Like AdaBoost, the weight functions upweight misclassified examples and downweight correctly classified examples. We show that the weight functions of ZILBoost have similar properties to those of LogitBoost. The algorithm will focus more on examples that are hard to classify in the next iteration, resulting in improved prediction accuracy. We provide the relative performance of ZIPBoost and ZILBoost, which rely on the excess kurtosis of the data distribution. Furthermore, we show the convergence and time complexity of our proposed methods. We demonstrate the performance of our proposed methods using a Monte Carlo simulation, mergers and acquisitions (M&A) data application, and imbalanced datasets from the Keel repository. The results of the experiments show that our proposed methods yield better prediction accuracy compared to other learning algorithms.
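The AdaBoost-style reweighting the abstract refers to can be sketched directly. This is the standard AdaBoost weight update, shown only to illustrate the upweight/downweight behaviour the paper's ZIPBoost/ZILBoost weight functions are said to share; it is not the proposed zero-inflated method.

```python
import math

# One AdaBoost reweighting step: misclassified examples are upweighted,
# correctly classified ones downweighted, then weights are renormalized.
# Assumes the weighted error lies strictly in (0, 1).

def reweight(weights, y_true, y_pred):
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    alpha = 0.5 * math.log((1 - err) / err)  # vote of this weak classifier
    new = [w * math.exp(-alpha if t == p else alpha)
           for w, t, p in zip(weights, y_true, y_pred)]
    z = sum(new)                              # normalizer
    return [w / z for w in new], alpha

w0 = [0.25] * 4
w1, alpha = reweight(w0, [1, 1, -1, -1], [1, 1, 1, -1])  # one mistake
```

After one round the single misclassified example carries half of the total weight (the classic AdaBoost property), so the next weak learner concentrates on it.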

Citations: 0
Rule learning by modularity
IF 7.5 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-03 | DOI: 10.1007/s10994-024-06556-5
Albert Nössig, Tobias Hell, Georg Moser

In this paper, we present a modular methodology that combines state-of-the-art methods in (stochastic) machine learning with well-established methods in inductive logic programming (ILP) and rule induction to provide efficient and scalable algorithms for the classification of vast data sets. By construction, these classifications are based on the synthesis of simple rules, thus providing direct explanations of the obtained classifications. Apart from evaluating our approach on the common large-scale data sets MNIST, Fashion-MNIST and IMDB, we present novel results on explainable classifications of dental bills. The latter case study stems from an industrial collaboration with Allianz Private Krankenversicherung, an insurance company offering diverse services in Germany.
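Classification by synthesized rules, as described above, has the appeal that a prediction and its explanation coincide. A toy sketch of the general idea (an ordered rule list of attribute tests, not the paper's ILP pipeline; rules and labels are invented):

```python
# Toy rule-based classifier: each rule is a conjunction of attribute tests;
# the first matching rule fires, and the matched rule *is* the explanation.

RULES = [
    ({"shape": "round", "size": "small"}, "ball"),
    ({"shape": "round"}, "wheel"),
]
DEFAULT = "unknown"

def classify(example, rules=RULES, default=DEFAULT):
    for conditions, label in rules:
        if all(example.get(k) == v for k, v in conditions.items()):
            return label
    return default

label = classify({"shape": "round", "size": "large"})  # → "wheel"
```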

Citations: 0
PROUD: PaRetO-gUided diffusion model for multi-objective generation
IF 7.5 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-02 | DOI: 10.1007/s10994-024-06575-2
Yinghua Yao, Yuangang Pan, Jing Li, Ivor Tsang, Xin Yao

Recent advancements in the realm of deep generative models focus on generating samples that satisfy multiple desired properties. However, prevalent approaches optimize these property functions independently, thus omitting the trade-offs among them. In addition, the property optimization is often improperly integrated into the generative models, resulting in an unnecessary compromise on generation quality (i.e., the quality of generated samples). To address these issues, we formulate a constrained optimization problem: it seeks to optimize generation quality while ensuring that generated samples reside on the Pareto front of multiple property objectives. Such a formulation enables the generation of samples that cannot be simultaneously improved on the conflicting property functions, while preserving the quality of the generated samples. Building upon this formulation, we introduce the ParetO-gUided Diffusion model (PROUD), wherein the gradients in the denoising process are dynamically adjusted to enhance generation quality while the generated samples adhere to Pareto optimality. Experimental evaluations on image generation and protein generation tasks demonstrate that PROUD consistently maintains superior generation quality while approaching Pareto optimality across multiple property functions, compared to various baselines.
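The Pareto-front constraint at the heart of this formulation is easy to state in code. This is the standard dominance definition (assuming all objectives are minimized), shown only to illustrate the notion PROUD's constraint relies on; it is not the paper's guidance mechanism.

```python
# A point is on the Pareto front if no other point is at least as good on
# every objective and strictly better on at least one (minimization).

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    return [p for p in points if not any(dominates(q, p) for q in points)]

pts = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
front = pareto_front(pts)  # (3.0, 3.0) is dominated by (2.0, 2.0)
```

Samples on the front are exactly those that "cannot be further improved simultaneously on the conflicting property functions".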

Citations: 0
Secure and fast asynchronous Vertical Federated Learning via cascaded hybrid optimization
IF 7.5 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-06-27 | DOI: 10.1007/s10994-024-06541-y
Ganyu Wang, Qingsong Zhang, Xiang Li, Boyu Wang, Bin Gu, Charles X. Ling

Vertical Federated Learning (VFL) is gaining increasing attention due to its ability to enable multiple parties to collaboratively train a privacy-preserving model using vertically partitioned data. Recent research has highlighted the advantages of using zeroth-order optimization (ZOO) in developing practical VFL algorithms. However, a significant drawback of ZOO-based VFL is its slow convergence rate, which limits its applicability in handling large modern models. To address this issue, we propose a cascaded hybrid optimization method for VFL. In this method, the downstream models (clients) are trained using ZOO to ensure privacy and prevent the sharing of internal information. Simultaneously, the upstream model (server) is updated locally using first-order optimization, which significantly improves the convergence rate. This approach allows for the training of large models without compromising privacy and security. We theoretically prove that our VFL method achieves faster convergence compared to ZOO-based VFL because the convergence rate of our framework is not limited by the size of the server model, making it effective for training large models. Extensive experiments demonstrate that our method achieves faster convergence than ZOO-based VFL while maintaining an equivalent level of privacy protection. Additionally, we demonstrate the feasibility of training large models using our method.
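The zeroth-order optimization the clients rely on replaces backpropagation with finite differences of the loss. Below is the generic two-point ZOO gradient estimate (textbook form, not this paper's protocol); the perturbation direction is normally drawn at random, but is fixed here so the illustration is reproducible.

```python
# Two-point zeroth-order gradient estimate: perturb the parameters along a
# direction u, difference the two loss values, and scale u by the result.
# No access to the loss's internals (and hence no gradient sharing) is needed.

def zoo_grad(loss, theta, u, mu=1e-4):
    plus = loss([t + mu * ui for t, ui in zip(theta, u)])
    minus = loss([t - mu * ui for t, ui in zip(theta, u)])
    scale = (plus - minus) / (2 * mu)
    return [scale * ui for ui in u]

quad = lambda th: sum(t * t for t in th)      # true gradient: 2 * theta
g = zoo_grad(quad, [1.0, -2.0], [1.0, 0.0])   # ≈ [2.0, 0.0]
```

With a coordinate direction the estimate reduces to a central finite difference of one partial derivative; with random Gaussian directions it is an unbiased gradient estimate in expectation, which is what makes the slower convergence rate the abstract mentions unavoidable for large models.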

Citations: 0
Evidential uncertainty sampling strategies for active learning
IF 7.5 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-06-27 | DOI: 10.1007/s10994-024-06567-2
Arthur Hoarau, Vincent Lemaire, Yolande Le Gall, Jean-Christophe Dubois, Arnaud Martin

Recent studies in active learning, particularly in uncertainty sampling, have focused on the decomposition of model uncertainty into reducible and irreducible uncertainties. In this paper, the aim is to simplify the computational process while eliminating the dependence on observations. Crucially, the inherent uncertainty in the labels is considered, i.e. the uncertainty of the oracles. Two strategies are proposed, sampling by Klir uncertainty, which tackles the exploration–exploitation dilemma, and sampling by evidential epistemic uncertainty, which extends the concept of reducible uncertainty within the evidential framework, both using the theory of belief functions. Experimental results in active learning demonstrate that our proposed method can outperform uncertainty sampling.
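For contrast with the evidential strategies proposed above, the baseline they are compared against is classical uncertainty sampling. A minimal least-confidence query step (generic active-learning baseline, not the paper's belief-function criteria):

```python
# Least-confidence uncertainty sampling: query the unlabeled example whose
# top predicted class probability is lowest, i.e. closest to the boundary.

def least_confident(probs_per_example):
    return min(range(len(probs_per_example)),
               key=lambda i: max(probs_per_example[i]))

pool = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]]
query_idx = least_confident(pool)  # example 1 is the most uncertain
```

The evidential strategies replace this probability-based score with measures built on belief functions, which separate reducible (epistemic) from irreducible uncertainty.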

Citations: 0
Sample complexity of variance-reduced policy gradient: weaker assumptions and lower bounds
IF 7.5 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-06-27 | DOI: 10.1007/s10994-024-06573-4
Gabor Paczolay, Matteo Papini, Alberto Maria Metelli, Istvan Harmati, Marcello Restelli

Several variance-reduced versions of REINFORCE based on importance sampling achieve an improved $O(\epsilon^{-3})$ sample complexity for finding an $\epsilon$-stationary point, under an unrealistic assumption on the variance of the importance weights. In this paper, we propose the Defensive Policy Gradient (DEF-PG) algorithm, based on defensive importance sampling, which achieves the same result without any assumption on the variance of the importance weights. We also show that this rate is not improvable, by establishing a matching $\Omega(\epsilon^{-3})$ lower bound, and that REINFORCE, with its $O(\epsilon^{-4})$ sample complexity, is actually optimal under weaker assumptions on the policy class. Numerical simulations show promising results for the proposed technique compared to similar algorithms based on vanilla importance sampling.
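The reason no variance assumption is needed can be seen from the defensive importance-sampling weight itself. This is the standard defensive-mixture construction (Hesterberg-style) that DEF-PG builds on, shown generically; the densities and the mixing coefficient here are illustrative numbers, not the paper's setting.

```python
# Defensive importance sampling: sample from the mixture
# (1 - beta) * q + beta * p instead of q alone. The weight
#   w = p / ((1 - beta) * q + beta * p)
# is then bounded above by 1 / beta, no matter how small q gets.

def defensive_weight(p_target, q_behaviour, beta):
    return p_target / ((1 - beta) * q_behaviour + beta * p_target)

# even where the behaviour density is vanishingly small, the weight
# stays capped at 1 / beta = 10:
w = defensive_weight(p_target=0.5, q_behaviour=1e-9, beta=0.1)
```

A vanilla importance weight p/q would blow up to ~5e8 in the same situation, which is exactly the unbounded-variance pathology the assumption in prior work rules out.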

Citations: 0
Quantitative Gaussian approximation of randomly initialized deep neural networks
IF 7.5 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-06-25 | DOI: 10.1007/s10994-024-06578-z
Andrea Basteri, Dario Trevisan

Given any deep fully connected neural network initialized with random Gaussian parameters, we bound from above the quadratic Wasserstein distance between its output distribution and a suitable Gaussian process. Our explicit inequalities indicate how the hidden and output layer sizes affect the Gaussian behaviour of the network, and they quantitatively recover the distributional convergence results in the wide limit, i.e., as all the hidden layer sizes become large.
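The metric being bounded has a simple closed form in the easiest special case. For two one-dimensional Gaussians $N(m_1, s_1^2)$ and $N(m_2, s_2^2)$, the quadratic Wasserstein distance is $W_2^2 = (m_1 - m_2)^2 + (s_1 - s_2)^2$ (a standard result, used here only to illustrate the quantity, not the paper's bound):

```python
# Closed-form quadratic (2-)Wasserstein distance between two 1-D Gaussians.

def w2_gaussian_1d(m1, s1, m2, s2):
    return ((m1 - m2) ** 2 + (s1 - s2) ** 2) ** 0.5

d = w2_gaussian_1d(0.0, 1.0, 3.0, 5.0)  # sqrt(9 + 16) = 5.0
```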

Citations: 0
Discrete-time graph neural networks for transaction prediction in Web3 social platforms
IF 7.5 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-06-25 DOI : 10.1007/s10994-024-06579-y
Manuel Dileo, Matteo Zignani

In Web3 social platforms, i.e. social web applications that rely on blockchain technology to support their functionalities, interactions among users are usually multimodal, from common social interactions such as following, liking, or posting, to specific relations given by crypto-token transfers facilitated by the blockchain. In this dynamic and intertwined networked context, modeled as a financial network, our main goals are (i) to predict whether a pair of users will be involved in a financial transaction, i.e. the transaction prediction task, even using textual information produced by users, and (ii) to verify whether performances may be enhanced by textual content. To address the above issues, we compared current snapshot-based temporal graph learning methods and developed T3GNN, a solution based on state-of-the-art temporal graph neural networks’ design, which integrates fine-tuned sentence embeddings and a simple yet effective graph-augmentation strategy for representing content, and historical negative sampling. We evaluated models in a Web3 context by leveraging a novel high-resolution temporal dataset, collected from one of the most used Web3 social platforms, which spans more than one year of financial interactions as well as published textual content. The experimental evaluation has shown that T3GNN consistently achieved the best performance over time and for most of the snapshots. Furthermore, through an extensive analysis of the performance of our model, we show that, despite the graph structure being crucial for making predictions, textual content contains useful information for forecasting transactions, highlighting an interplay between users’ interests and economic relationships in Web3 platforms. Finally, the evaluation has also highlighted the importance of adopting sampling methods alternative to random negative sampling when dealing with prediction tasks on temporal networks.

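The "historical negative sampling" the abstract contrasts with random negative sampling can be sketched generically: negatives for snapshot t are drawn partly from node pairs that transacted in *earlier* snapshots but not at t, which are harder examples than uniformly random pairs. This is a plain-Python illustration under assumed data shapes (edge lists of `(u, v)` tuples), not the T3GNN code:

```python
import random

def historical_negative_samples(positive_edges, past_edges, num_nodes, k, rng,
                                hist_frac=0.5):
    """Draw k negative pairs per positive edge of snapshot t. With probability
    hist_frac a negative is 'historical': a pair seen in an earlier snapshot
    but absent at t; otherwise it is a uniformly random non-edge."""
    current = set(positive_edges)
    historical_pool = [e for e in set(past_edges) if e not in current]
    negatives = []
    for _ in positive_edges:
        for _ in range(k):
            if historical_pool and rng.random() < hist_frac:
                negatives.append(rng.choice(historical_pool))
            else:
                while True:  # rejection-sample a pair outside the current snapshot
                    u = rng.randrange(num_nodes)
                    v = rng.randrange(num_nodes)
                    if u != v and (u, v) not in current:
                        negatives.append((u, v))
                        break
    return negatives

rng = random.Random(0)
pos = [(0, 1), (1, 2), (2, 0)]   # transactions observed at snapshot t
past = [(0, 1), (3, 4), (1, 4)]  # transactions from earlier snapshots
negs = historical_negative_samples(pos, past, num_nodes=5, k=2, rng=rng)
```

The `hist_frac` mixing weight is a hypothetical knob for this sketch; the point, as the abstract's evaluation suggests, is that scoring a model against previously-active pairs is a more demanding test on temporal networks than scoring it against random non-edges.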
Citations: 0
Journal: Machine Learning
Book学术 — literature sharing, smart journal selection, latest publications. Contact us: info@booksci.cn
Book学术 provides a free academic search service for Chinese- and English-language literature, serving scholars in China and abroad with a convenient, high-quality experience.
Copyright © 2023 Book学术 All rights reserved.
京公网安备 11010802042870号 · 京ICP备2023020795号-1