FairMOE: counterfactually-fair mixture of experts with levels of interpretability
Pub Date: 2024-07-08 | DOI: 10.1007/s10994-024-06583-2
Joe Germino, Nuno Moniz, Nitesh V. Chawla
With the rise of artificial intelligence in our everyday lives, the need for human interpretation of machine learning models’ predictions emerges as a critical issue. Generally, interpretability is viewed as a binary notion with a performance trade-off: either a model is fully interpretable but unable to capture more complex patterns in the data, or it is a black box. In this paper, we argue that this view is severely limiting and that interpretability should instead be viewed as a continuous, domain-informed concept. We leverage the well-known Mixture of Experts architecture with user-defined limits on non-interpretability. We extend this idea with a counterfactual fairness module to ensure the selection of consistently fair experts: FairMOE. We perform an extensive experimental evaluation on fairness-related data sets and compare our proposal against state-of-the-art methods. Our results demonstrate that FairMOE is competitive with the leading fairness-aware algorithms in both fairness and predictive measures while providing more consistent performance, competitive scalability, and, most importantly, greater interpretability.
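A minimal sketch of the "budgeted non-interpretability" idea: route each test point to an interpretable expert unless it falls among the cases where that expert is least confident, up to a user-defined cap on black-box usage. The models, routing rule, and budget value below are illustrative assumptions, not the authors' FairMOE implementation (which additionally screens experts with a counterfactual fairness module).

```python
# Illustrative sketch only: an interpretable and a black-box expert, with a
# user-defined cap on the fraction of predictions the black box may handle.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

interpretable = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
black_box = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

non_interpretability_budget = 0.3          # hypothetical: at most 30% of cases may be non-interpretable

# Route to the black box only where the interpretable expert is least
# confident, and only up to the allowed budget.
conf = np.max(interpretable.predict_proba(X_te), axis=1)
k = int(non_interpretability_budget * len(X_te))
to_black_box = np.argsort(conf)[:k]

pred = interpretable.predict(X_te)
pred[to_black_box] = black_box.predict(X_te[to_black_box])
print("blended accuracy:", (pred == y_te).mean())
```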
{"title":"FairMOE: counterfactually-fair mixture of experts with levels of interpretability","authors":"Joe Germino, Nuno Moniz, Nitesh V. Chawla","doi":"10.1007/s10994-024-06583-2","DOIUrl":"https://doi.org/10.1007/s10994-024-06583-2","url":null,"abstract":"<p>With the rise of artificial intelligence in our everyday lives, the need for human interpretation of machine learning models’ predictions emerges as a critical issue. Generally, interpretability is viewed as a binary notion with a performance trade-off. Either a model is fully-interpretable but lacks the ability to capture more complex patterns in the data, or it is a black box. In this paper, we argue that this view is severely limiting and that instead interpretability should be viewed as a continuous domain-informed concept. We leverage the well-known Mixture of Experts architecture with user-defined limits on non-interpretability. We extend this idea with a counterfactual fairness module to ensure the selection of consistently <i>fair</i> experts: <b>FairMOE</b>. We perform an extensive experimental evaluation with fairness-related data sets and compare our proposal against state-of-the-art methods. Our results demonstrate that FairMOE is competitive with the leading fairness-aware algorithms in both fairness and predictive measures while providing more consistent performance, competitive scalability, and, most importantly, greater interpretability.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141574798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fast linear model trees by PILOT
Pub Date: 2024-07-08 | DOI: 10.1007/s10994-024-06590-3
Jakob Raymaekers, Peter J. Rousseeuw, Tim Verdonck, Ruicong Yao
Linear model trees are regression trees that incorporate linear models in the leaf nodes. This preserves the intuitive interpretation of decision trees and at the same time enables them to better capture linear relationships, which is hard for standard decision trees. However, most existing methods for fitting linear model trees are time-consuming and therefore do not scale to large data sets. In addition, they are more prone to overfitting and extrapolation issues than standard regression trees. In this paper we introduce PILOT, a new algorithm for linear model trees that is fast, regularized, stable and interpretable. PILOT trains in a greedy fashion like classic regression trees, but incorporates an L2 boosting approach and a model selection rule for fitting linear models in the nodes. The abbreviation PILOT stands for PIecewise Linear Organic Tree, where ‘organic’ refers to the fact that no pruning is carried out. PILOT has the same low time and space complexity as CART without its pruning. An empirical study indicates that PILOT tends to outperform standard decision trees and other linear model trees on a variety of data sets. Moreover, we prove its consistency in an additive model setting under weak assumptions. When the data is generated by a linear model, the convergence rate is polynomial.
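For intuition, a toy sketch of the piecewise-linear idea underlying linear model trees: choose a split so that a least-squares linear fit in each child minimizes the combined squared error. This only illustrates the leaf-model concept with assumed helper names; it omits PILOT's L2 boosting, model selection rule, and recursion.

```python
# Sketch: one split whose children each carry a least-squares linear fit.
import numpy as np

def leaf_fit_sse(X, y):
    # least-squares linear fit with intercept, and its sum of squared errors
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return coef, float(resid @ resid)

def best_split(X, y, feature=0, min_leaf=5):
    # scan candidate thresholds on one feature; a real linear model tree
    # considers all features and recurses on the children
    best_t, best_sse = None, np.inf
    for t in np.quantile(X[:, feature], np.linspace(0.1, 0.9, 9)):
        left = X[:, feature] <= t
        if left.sum() < min_leaf or (~left).sum() < min_leaf:
            continue
        sse = leaf_fit_sse(X[left], y[left])[1] + leaf_fit_sse(X[~left], y[~left])[1]
        if sse < best_sse:
            best_t, best_sse = t, sse
    return best_t, best_sse

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 1))
y = np.where(X[:, 0] < 0, 1 + 2 * X[:, 0], 3 - 1.5 * X[:, 0]) + rng.normal(0, 0.1, 500)
print("chosen threshold, combined SSE:", best_split(X, y))   # threshold should land near 0
```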
{"title":"Fast linear model trees by PILOT","authors":"Jakob Raymaekers, Peter J. Rousseeuw, Tim Verdonck, Ruicong Yao","doi":"10.1007/s10994-024-06590-3","DOIUrl":"https://doi.org/10.1007/s10994-024-06590-3","url":null,"abstract":"<p>Linear model trees are regression trees that incorporate linear models in the leaf nodes. This preserves the intuitive interpretation of decision trees and at the same time enables them to better capture linear relationships, which is hard for standard decision trees. But most existing methods for fitting linear model trees are time consuming and therefore not scalable to large data sets. In addition, they are more prone to overfitting and extrapolation issues than standard regression trees. In this paper we introduce PILOT, a new algorithm for linear model trees that is fast, regularized, stable and interpretable. PILOT trains in a greedy fashion like classic regression trees, but incorporates an <i>L</i><sup>2</sup> boosting approach and a model selection rule for fitting linear models in the nodes. The abbreviation PILOT stands for PIecewise Linear Organic Tree, where ‘organic’ refers to the fact that no pruning is carried out. PILOT has the same low time and space complexity as CART without its pruning. An empirical study indicates that PILOT tends to outperform standard decision trees and other linear model trees on a variety of data sets. Moreover, we prove its consistency in an additive model setting under weak assumptions. When the data is generated by a linear model, the convergence rate is polynomial.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141574800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A systematic approach for learning imbalanced data: enhancing zero-inflated models through boosting
Pub Date: 2024-07-08 | DOI: 10.1007/s10994-024-06558-3
Yeasung Jeong, Kangbok Lee, Young Woong Park, Sumin Han
In this paper, we propose systematic approaches for learning imbalanced data based on a two-regime process: regime 0, which generates excess zeros (majority class), and regime 1, which contributes to generating an outcome of one (minority class). The proposed model contains two latent equations: a split probit (logit) equation in the first stage and an ordinary probit (logit) equation in the second stage. Because boosting improves the accuracy of prediction versus using a single classifier, we combined a boosting strategy with the two-regime process. Thus, we developed the zero-inflated probit boost (ZIPBoost) and zero-inflated logit boost (ZILBoost) methods. We show that the weight functions of ZIPBoost have the desired properties for good predictive performance. Like AdaBoost, the weight functions upweight misclassified examples and downweight correctly classified examples. We show that the weight functions of ZILBoost have similar properties to those of LogitBoost. The algorithm will focus more on examples that are hard to classify in the next iteration, resulting in improved prediction accuracy. We provide the relative performance of ZIPBoost and ZILBoost, which relies on the excess kurtosis of the data distribution. Furthermore, we show the convergence and time complexity of our proposed methods. We demonstrate the performance of our proposed methods using a Monte Carlo simulation, a mergers and acquisitions (M&A) data application, and imbalanced datasets from the KEEL repository. The results of the experiments show that our proposed methods yield better prediction accuracy compared to other learning algorithms.
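The two-regime structure can be made concrete with a small simulation: the minority-class probability factorizes as P(y = 1 | x) = P(regime 1 | x) * P(y = 1 | regime 1, x), here with two logit equations and hypothetical coefficients. This shows only the zero-inflated data-generating structure, not the ZIPBoost/ZILBoost boosting procedure.

```python
# Sketch of the two-regime (zero-inflated) latent structure with logit links.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 2))

# hypothetical coefficients for the split equation and the outcome equation
w_split, w_outcome = np.array([1.5, -0.5]), np.array([0.8, 1.2])

p_regime1 = sigmoid(X @ w_split - 2.0)     # most mass stays in regime 0 -> excess zeros
p_one_given_r1 = sigmoid(X @ w_outcome)    # outcome equation, active only in regime 1
p_one = p_regime1 * p_one_given_r1

y = rng.binomial(1, p_one)
print("minority-class rate:", y.mean())    # strongly imbalanced by construction
```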
{"title":"A systematic approach for learning imbalanced data: enhancing zero-inflated models through boosting","authors":"Yeasung Jeong, Kangbok Lee, Young Woong Park, Sumin Han","doi":"10.1007/s10994-024-06558-3","DOIUrl":"https://doi.org/10.1007/s10994-024-06558-3","url":null,"abstract":"<p>In this paper, we propose systematic approaches for learning imbalanced data based on a two-regime process: regime 0, which generates excess zeros (majority class), and regime 1, which contributes to generating an outcome of one (minority class). The proposed model contains two latent equations: a split probit (logit) equation in the first stage and an ordinary probit (logit) equation in the second stage. Because boosting improves the accuracy of prediction versus using a single classifier, we combined a boosting strategy with the two-regime process. Thus, we developed the zero-inflated probit boost (ZIPBoost) and zero-inflated logit boost (ZILBoost) methods. We show that the weight functions of ZIPBoost have the desired properties for good predictive performance. Like AdaBoost, the weight functions upweight misclassified examples and downweight correctly classified examples. We show that the weight functions of ZILBoost have similar properties to those of LogitBoost. The algorithm will focus more on examples that are hard to classify in the next iteration, resulting in improved prediction accuracy. We provide the relative performance of ZIPBoost and ZILBoost, which rely on the excess kurtosis of the data distribution. Furthermore, we show the convergence and time complexity of our proposed methods. We demonstrate the performance of our proposed methods using a Monte Carlo simulation, mergers and acquisitions (M&A) data application, and imbalanced datasets from the Keel repository. The results of the experiments show that our proposed methods yield better prediction accuracy compared to other learning algorithms.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141574796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rule learning by modularity
Pub Date: 2024-07-03 | DOI: 10.1007/s10994-024-06556-5
Albert Nössig, Tobias Hell, Georg Moser
In this paper, we present a modular methodology that combines state-of-the-art methods in (stochastic) machine learning with well-established methods in inductive logic programming (ILP) and rule induction to provide efficient and scalable algorithms for the classification of vast data sets. By construction, these classifications are based on the synthesis of simple rules, thus providing direct explanations of the obtained classifications. Apart from evaluating our approach on the common large-scale data sets MNIST, Fashion-MNIST and IMDB, we present novel results on explainable classifications of dental bills. The latter case study stems from an industrial collaboration with Allianz Private Krankenversicherung, an insurance company offering diverse services in Germany.
{"title":"Rule learning by modularity","authors":"Albert Nössig, Tobias Hell, Georg Moser","doi":"10.1007/s10994-024-06556-5","DOIUrl":"https://doi.org/10.1007/s10994-024-06556-5","url":null,"abstract":"<p>In this paper, we present a modular methodology that combines state-of-the-art methods in (stochastic) machine learning with well-established methods in inductive logic programming (ILP) and rule induction to provide efficient and scalable algorithms for the classification of vast data sets. By construction, these classifications are based on the synthesis of simple rules, thus providing direct explanations of the obtained classifications. Apart from evaluating our approach on the common large scale data sets <i>MNIST</i>, <i>Fashion-MNIST</i> and <i>IMDB</i>, we present novel results on explainable classifications of dental bills. The latter case study stems from an industrial collaboration with <i>Allianz Private Krankenversicherung</i> which is an insurance company offering diverse services in Germany.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141551766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PROUD: PaRetO-gUided diffusion model for multi-objective generation
Pub Date: 2024-07-02 | DOI: 10.1007/s10994-024-06575-2
Yinghua Yao, Yuangang Pan, Jing Li, Ivor Tsang, Xin Yao
Recent advancements in the realm of deep generative models focus on generating samples that satisfy multiple desired properties. However, prevalent approaches optimize these property functions independently, thus omitting the trade-offs among them. In addition, the property optimization is often improperly integrated into the generative models, resulting in an unnecessary compromise on generation quality (i.e., the quality of generated samples). To address these issues, we formulate a constrained optimization problem. It seeks to optimize generation quality while ensuring that generated samples reside at the Pareto front of multiple property objectives. Such a formulation enables the generation of samples that cannot be simultaneously improved further on the conflicting property functions, while preserving good quality of the generated samples. Building upon this formulation, we introduce the ParetO-gUided Diffusion model (PROUD), wherein the gradients in the denoising process are dynamically adjusted to enhance generation quality while the generated samples adhere to Pareto optimality. Experimental evaluations on image generation and protein generation tasks demonstrate that our PROUD consistently maintains superior generation quality while approaching Pareto optimality across multiple property functions compared to various baselines.
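A generic illustration of Pareto-guided updates (not the PROUD diffusion model itself): with two property objectives, the closed-form two-objective MGDA weighting picks the minimum-norm convex combination of their gradients, which is a common descent direction whenever one exists; iterating it drives a point toward the Pareto set. The toy objectives below are assumptions for illustration.

```python
# Generic Pareto-guided update for two objectives via the two-task MGDA weights.
import numpy as np

def pareto_direction(g1, g2):
    # minimum-norm point in the convex hull of {g1, g2}
    diff = g2 - g1
    denom = float(diff @ diff)
    alpha = np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0) if denom > 0 else 0.5
    return alpha * g1 + (1 - alpha) * g2

# two toy property objectives over x in R^2 with conflicting minima
f1 = lambda x: (x[0] - 1) ** 2 + x[1] ** 2
f2 = lambda x: x[0] ** 2 + (x[1] - 1) ** 2
grad_f1 = lambda x: np.array([2 * (x[0] - 1), 2 * x[1]])
grad_f2 = lambda x: np.array([2 * x[0], 2 * (x[1] - 1)])

x = np.array([2.0, 2.0])
for _ in range(200):
    x = x - 0.05 * pareto_direction(grad_f1(x), grad_f2(x))
print("approx. Pareto point:", x, "objective values:", f1(x), f2(x))
```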
{"title":"PROUD: PaRetO-gUided diffusion model for multi-objective generation","authors":"Yinghua Yao, Yuangang Pan, Jing Li, Ivor Tsang, Xin Yao","doi":"10.1007/s10994-024-06575-2","DOIUrl":"https://doi.org/10.1007/s10994-024-06575-2","url":null,"abstract":"<p>Recent advancements in the realm of deep generative models focus on generating samples that satisfy multiple desired properties. However, prevalent approaches optimize these property functions independently, thus omitting the trade-offs among them. In addition, the property optimization is often improperly integrated into the generative models, resulting in an unnecessary compromise on generation quality (i.e., the quality of generated samples). To address these issues, we formulate a constrained optimization problem. It seeks to optimize generation quality while ensuring that generated samples reside at the Pareto front of multiple property objectives. Such a formulation enables the generation of samples that cannot be further improved simultaneously on the conflicting property functions and preserves good quality of generated samples.Building upon this formulation, we introduce the ParetO-gUided Diffusion model (PROUD), wherein the gradients in the denoising process are dynamically adjusted to enhance generation quality while the generated samples adhere to Pareto optimality. Experimental evaluations on image generation and protein generation tasks demonstrate that our PROUD consistently maintains superior generation quality while approaching Pareto optimality across multiple property functions compared to various baselines</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141525355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Secure and fast asynchronous Vertical Federated Learning via cascaded hybrid optimization
Pub Date: 2024-06-27 | DOI: 10.1007/s10994-024-06541-y
Ganyu Wang, Qingsong Zhang, Xiang Li, Boyu Wang, Bin Gu, Charles X. Ling
Vertical Federated Learning (VFL) is gaining increasing attention due to its ability to enable multiple parties to collaboratively train a privacy-preserving model using vertically partitioned data. Recent research has highlighted the advantages of using zeroth-order optimization (ZOO) in developing practical VFL algorithms. However, a significant drawback of ZOO-based VFL is its slow convergence rate, which limits its applicability in handling large modern models. To address this issue, we propose a cascaded hybrid optimization method for VFL. In this method, the downstream models (clients) are trained using ZOO to ensure privacy and prevent the sharing of internal information. Simultaneously, the upstream model (server) is updated locally using first-order optimization, which significantly improves the convergence rate. This approach allows for the training of large models without compromising privacy and security. We theoretically prove that our VFL method achieves faster convergence compared to ZOO-based VFL because the convergence rate of our framework is not limited by the size of the server model, making it effective for training large models. Extensive experiments demonstrate that our method achieves faster convergence than ZOO-based VFL while maintaining an equivalent level of privacy protection. Additionally, we demonstrate the feasibility of training large models using our method.
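The contrast between the two optimization regimes combined in the cascade can be sketched on a toy objective: a two-point zeroth-order gradient estimate (as a party that cannot backpropagate through the others might use) versus an exact first-order gradient. The objective, step sizes, and number of random directions below are assumptions for illustration only, not the paper's VFL protocol.

```python
# Zeroth-order (two-point, random-direction) gradient estimate vs. exact gradient.
import numpy as np

def zoo_grad(f, w, mu=1e-3, n_dirs=20, rng=None):
    # averaged two-point estimator: E[(f(w + mu*u) - f(w - mu*u)) / (2*mu) * u] ~ grad f(w)
    rng = rng or np.random.default_rng(0)
    g = np.zeros_like(w)
    for _ in range(n_dirs):
        u = rng.normal(size=w.shape)
        g += (f(w + mu * u) - f(w - mu * u)) / (2 * mu) * u
    return g / n_dirs

f = lambda w: np.sum((w - 3.0) ** 2)          # toy loss with minimum at w = 3
first_order_grad = lambda w: 2 * (w - 3.0)

w_zoo, w_fo = np.zeros(5), np.zeros(5)
for _ in range(200):
    w_zoo -= 0.05 * zoo_grad(f, w_zoo)        # gradient-free update (client side in spirit)
    w_fo -= 0.05 * first_order_grad(w_fo)     # exact-gradient update (server side in spirit)
print("ZOO iterate:", np.round(w_zoo, 2), "| first-order iterate:", np.round(w_fo, 2))
```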
{"title":"Secure and fast asynchronous Vertical Federated Learning via cascaded hybrid optimization","authors":"Ganyu Wang, Qingsong Zhang, Xiang Li, Boyu Wang, Bin Gu, Charles X. Ling","doi":"10.1007/s10994-024-06541-y","DOIUrl":"https://doi.org/10.1007/s10994-024-06541-y","url":null,"abstract":"<p>Vertical Federated Learning (VFL) is gaining increasing attention due to its ability to enable multiple parties to collaboratively train a privacy-preserving model using vertically partitioned data. Recent research has highlighted the advantages of using zeroth-order optimization (ZOO) in developing practical VFL algorithms. However, a significant drawback of ZOO-based VFL is its slow convergence rate, which limits its applicability in handling large modern models. To address this issue, we propose a cascaded hybrid optimization method for VFL. In this method, the downstream models (clients) are trained using ZOO to ensure privacy and prevent the sharing of internal information. Simultaneously, the upstream model (server) is updated locally using first-order optimization, which significantly improves the convergence rate. This approach allows for the training of large models without compromising privacy and security. We theoretically prove that our VFL method achieves faster convergence compared to ZOO-based VFL because the convergence rate of our framework is not limited by the size of the server model, making it effective for training large models. Extensive experiments demonstrate that our method achieves faster convergence than ZOO-based VFL while maintaining an equivalent level of privacy protection. Additionally, we demonstrate the feasibility of training large models using our method.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141525356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evidential uncertainty sampling strategies for active learning
Pub Date: 2024-06-27 | DOI: 10.1007/s10994-024-06567-2
Arthur Hoarau, Vincent Lemaire, Yolande Le Gall, Jean-Christophe Dubois, Arnaud Martin
Recent studies in active learning, particularly in uncertainty sampling, have focused on the decomposition of model uncertainty into reducible and irreducible uncertainties. In this paper, the aim is to simplify the computational process while eliminating the dependence on observations. Crucially, the inherent uncertainty in the labels is considered, i.e. the uncertainty of the oracles. Two strategies are proposed, sampling by Klir uncertainty, which tackles the exploration–exploitation dilemma, and sampling by evidential epistemic uncertainty, which extends the concept of reducible uncertainty within the evidential framework, both using the theory of belief functions. Experimental results in active learning demonstrate that our proposed method can outperform uncertainty sampling.
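As a rough analogue of the reducible/irreducible split mentioned above, a common probabilistic decomposition computes total predictive uncertainty from an ensemble and splits it into an aleatoric part (average member entropy) and an epistemic part (member disagreement). This is only an analogue for intuition; the paper works with belief functions and Klir uncertainty rather than Shannon entropy.

```python
# Entropy-based decomposition: total = aleatoric (irreducible) + epistemic (reducible).
import numpy as np

def entropy(p, axis=-1):
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=axis)

# ensemble of 5 predictors, each giving class probabilities for 3 inputs
ensemble = np.array([
    [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]],
    [[0.8, 0.2], [0.5, 0.5], [0.9, 0.1]],
    [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]],
    [[0.7, 0.3], [0.6, 0.4], [0.8, 0.2]],
    [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]],
])

total = entropy(ensemble.mean(axis=0))       # uncertainty of the averaged prediction
aleatoric = entropy(ensemble).mean(axis=0)   # average member uncertainty (irreducible)
epistemic = total - aleatoric                # disagreement between members (reducible)
print("total:", total)
print("aleatoric:", aleatoric)
print("epistemic:", epistemic)               # highest where members disagree most
```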
{"title":"Evidential uncertainty sampling strategies for active learning","authors":"Arthur Hoarau, Vincent Lemaire, Yolande Le Gall, Jean-Christophe Dubois, Arnaud Martin","doi":"10.1007/s10994-024-06567-2","DOIUrl":"https://doi.org/10.1007/s10994-024-06567-2","url":null,"abstract":"<p>Recent studies in active learning, particularly in uncertainty sampling, have focused on the decomposition of model uncertainty into reducible and irreducible uncertainties. In this paper, the aim is to simplify the computational process while eliminating the dependence on observations. Crucially, the inherent uncertainty in the labels is considered, i.e. the uncertainty of the oracles. Two strategies are proposed, sampling by Klir uncertainty, which tackles the exploration–exploitation dilemma, and sampling by evidential epistemic uncertainty, which extends the concept of reducible uncertainty within the evidential framework, both using the theory of belief functions. Experimental results in active learning demonstrate that our proposed method can outperform uncertainty sampling.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141506118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sample complexity of variance-reduced policy gradient: weaker assumptions and lower bounds
Pub Date: 2024-06-27 | DOI: 10.1007/s10994-024-06573-4
Gabor Paczolay, Matteo Papini, Alberto Maria Metelli, Istvan Harmati, Marcello Restelli
Several variance-reduced versions of REINFORCE based on importance sampling achieve an improved $O(\epsilon^{-3})$ sample complexity to find an $\epsilon$-stationary point, under an unrealistic assumption on the variance of the importance weights. In this paper, we propose the Defensive Policy Gradient (DEF-PG) algorithm, based on defensive importance sampling, achieving the same result without any assumption on the variance of the importance weights. We also show that this is not improvable by establishing a matching $\Omega(\epsilon^{-3})$ lower bound, and that REINFORCE with its $O(\epsilon^{-4})$ sample complexity is actually optimal under weaker assumptions on the policy class. Numerical simulations show promising results for the proposed technique compared to similar algorithms based on vanilla importance sampling.
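The key mechanism of defensive importance sampling can be shown numerically: mixing a fraction $\alpha$ of the target density into the denominator bounds the importance weight by $1/\alpha$, which is why no assumption on the weight variance is needed. The Gaussian densities below are toy stand-ins, not the policy distributions of the paper.

```python
# Defensive importance weights are bounded by 1/alpha; vanilla weights are not.
import numpy as np
from scipy.stats import norm

alpha = 0.2
target, proposal = norm(0.0, 1.0), norm(2.0, 1.0)   # toy densities
x = np.linspace(-5, 5, 1001)

vanilla_w = target.pdf(x) / proposal.pdf(x)
defensive_w = target.pdf(x) / (alpha * target.pdf(x) + (1 - alpha) * proposal.pdf(x))

print("max vanilla weight on the grid:   %.1f" % vanilla_w.max())
print("max defensive weight on the grid: %.2f  (bound 1/alpha = %.1f)"
      % (defensive_w.max(), 1 / alpha))
```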
{"title":"Sample complexity of variance-reduced policy gradient: weaker assumptions and lower bounds","authors":"Gabor Paczolay, Matteo Papini, Alberto Maria Metelli, Istvan Harmati, Marcello Restelli","doi":"10.1007/s10994-024-06573-4","DOIUrl":"https://doi.org/10.1007/s10994-024-06573-4","url":null,"abstract":"<p>Several variance-reduced versions of REINFORCE based on importance sampling achieve an improved <span>(O(epsilon ^{-3}))</span> sample complexity to find an <span>(epsilon)</span>-stationary point, under an unrealistic assumption on the variance of the importance weights. In this paper, we propose the Defensive Policy Gradient (DEF-PG) algorithm, based on defensive importance sampling, achieving the same result without any assumption on the variance of the importance weights. We also show that this is not improvable by establishing a matching <span>(Omega (epsilon ^{-3}))</span> lower bound, and that REINFORCE with its <span>(O(epsilon ^{-4}))</span> sample complexity is actually optimal under weaker assumptions on the policy class. Numerical simulations show promising results for the proposed technique compared to similar algorithms based on vanilla importance sampling.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141506119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quantitative Gaussian approximation of randomly initialized deep neural networks
Pub Date: 2024-06-25 | DOI: 10.1007/s10994-024-06578-z
Andrea Basteri, Dario Trevisan
Given any deep fully connected neural network, initialized with random Gaussian parameters, we bound from above the quadratic Wasserstein distance between its output distribution and a suitable Gaussian process. Our explicit inequalities indicate how the hidden and output layer sizes affect the Gaussian behaviour of the network and quantitatively recover the distributional convergence results in the wide limit, i.e., when all the hidden layer sizes become large.
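A rough numerical illustration of the phenomenon the bound quantifies, under assumed scalings: as the hidden width grows, the output of a randomly initialized ReLU network at a fixed input (over random Gaussian parameter draws) gets closer in 2-Wasserstein distance to a variance-matched Gaussian. The architecture, scaling, and sampled reference are illustrative choices, not the paper's exact setting.

```python
# Empirical W2 between random-network outputs and a variance-matched Gaussian.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10)                      # one fixed 10-dimensional input

def output_samples(width, n_draws=5000):
    # scalar output of a one-hidden-layer ReLU net over random Gaussian
    # parameter draws, with 1/sqrt(fan_in) scaling
    W1 = rng.normal(size=(n_draws, width, 10)) / np.sqrt(10)
    w2 = rng.normal(size=(n_draws, width)) / np.sqrt(width)
    return np.sum(w2 * np.maximum(W1 @ x, 0.0), axis=1)

def w2_to_gaussian(samples):
    # empirical 2-Wasserstein distance to a variance-matched centred Gaussian,
    # computed from sorted, equal-size samples
    ref = np.sort(rng.normal(0.0, samples.std(), size=samples.size))
    return float(np.sqrt(np.mean((np.sort(samples) - ref) ** 2)))

for width in (2, 8, 32, 128):
    print(f"width {width:4d}: W2 to Gaussian ~ {w2_to_gaussian(output_samples(width)):.3f}")
```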
{"title":"Quantitative Gaussian approximation of randomly initialized deep neural networks","authors":"Andrea Basteri, Dario Trevisan","doi":"10.1007/s10994-024-06578-z","DOIUrl":"https://doi.org/10.1007/s10994-024-06578-z","url":null,"abstract":"<p>Given any deep fully connected neural network, initialized with random Gaussian parameters, we bound from above the quadratic Wasserstein distance between its output distribution and a suitable Gaussian process. Our explicit inequalities indicate how the hidden and output layers sizes affect the Gaussian behaviour of the network and quantitatively recover the distributional convergence results in the wide limit, i.e., if all the hidden layers sizes become large.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141532684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Discrete-time graph neural networks for transaction prediction in Web3 social platforms
Pub Date: 2024-06-25 | DOI: 10.1007/s10994-024-06579-y
Manuel Dileo, Matteo Zignani
In Web3 social platforms, i.e. social web applications that rely on blockchain technology to support their functionalities, interactions among users are usually multimodal, ranging from common social interactions such as following, liking, or posting, to specific relations given by crypto-token transfers facilitated by the blockchain. In this dynamic and intertwined networked context, modeled as a financial network, our main goals are (i) to predict whether a pair of users will be involved in a financial transaction, i.e. the transaction prediction task, also using textual information produced by users, and (ii) to verify whether performance can be enhanced by textual content. To address these issues, we compared current snapshot-based temporal graph learning methods and developed T3GNN, a solution based on the design of state-of-the-art temporal graph neural networks, which integrates fine-tuned sentence embeddings, a simple yet effective graph-augmentation strategy for representing content, and historical negative sampling. We evaluated models in a Web3 context by leveraging a novel high-resolution temporal dataset, collected from one of the most used Web3 social platforms, which spans more than one year of financial interactions as well as published textual content. The experimental evaluation has shown that T3GNN consistently achieved the best performance over time and for most of the snapshots. Furthermore, through an extensive analysis of the performance of our model, we show that, despite the graph structure being crucial for making predictions, textual content contains useful information for forecasting transactions, highlighting an interplay between users’ interests and economic relationships in Web3 platforms. Finally, the evaluation has also highlighted the importance of adopting sampling methods alternative to random negative sampling when dealing with prediction tasks on temporal networks.
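Historical negative sampling, one ingredient mentioned above, can be sketched in a few lines: negatives are drawn from pairs that interacted in earlier snapshots but not in the current one, which gives harder negatives than uniformly random pairs. The snapshot data structures here are hypothetical, not the T3GNN code.

```python
# Historical negative sampling for snapshot-based link prediction.
import random

snapshots = [
    {(0, 1), (1, 2), (2, 3)},          # t = 0
    {(0, 1), (2, 3), (3, 4)},          # t = 1
    {(0, 1), (3, 4)},                  # t = 2 (current snapshot)
]

current = snapshots[-1]
history = set().union(*snapshots[:-1])

historical_negatives = sorted(history - current)   # pairs seen before, absent now
random.seed(0)
print("positive edges:", sorted(current))
print("sampled historical negative:", random.choice(historical_negatives))
```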
{"title":"Discrete-time graph neural networks for transaction prediction in Web3 social platforms","authors":"Manuel Dileo, Matteo Zignani","doi":"10.1007/s10994-024-06579-y","DOIUrl":"https://doi.org/10.1007/s10994-024-06579-y","url":null,"abstract":"<p>In Web3 social platforms, i.e. social web applications that rely on blockchain technology to support their functionalities, interactions among users are usually multimodal, from common social interactions such as following, liking, or posting, to specific relations given by crypto-token transfers facilitated by the blockchain. In this dynamic and intertwined networked context, modeled as a financial network, our main goals are (i) to predict whether a pair of users will be involved in a financial transaction, i.e. the <i>transaction prediction task</i>, even using textual information produced by users, and (ii) to verify whether performances may be enhanced by textual content. To address the above issues, we compared current snapshot-based temporal graph learning methods and developed T3GNN, a solution based on state-of-the-art temporal graph neural networks’ design, which integrates fine-tuned sentence embeddings and a simple yet effective graph-augmentation strategy for representing content, and historical negative sampling. We evaluated models in a Web3 context by leveraging a novel high-resolution temporal dataset, collected from one of the most used Web3 social platforms, which spans more than one year of financial interactions as well as published textual content. The experimental evaluation has shown that T3GNN consistently achieved the best performance over time and for most of the snapshots. Furthermore, through an extensive analysis of the performance of our model, we show that, despite the graph structure being crucial for making predictions, textual content contains useful information for forecasting transactions, highlighting an interplay between users’ interests and economic relationships in Web3 platforms. Finally, the evaluation has also highlighted the importance of adopting sampling methods alternative to random negative sampling when dealing with prediction tasks on temporal networks.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141506120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}