Pub Date : 2024-07-12DOI: 10.1007/s10994-024-06576-1
Giuseppe Serra, Mathias Niepert
Graph Neural Networks (GNNs) are a popular class of machine learning models. Inspired by the learning to explain (L2X) paradigm, we propose L2xGnn, a framework for explainable GNNs which provides faithful explanations by design. L2xGnn learns a mechanism for selecting explanatory subgraphs (motifs) which are exclusively used in the GNNs message-passing operations. L2xGnn is able to select, for each input graph, a subgraph with specific properties such as being sparse and connected. Imposing such constraints on the motifs often leads to more interpretable and effective explanations. Experiments on several datasets suggest that L2xGnn achieves the same classification accuracy as baseline methods using the entire input graph while ensuring that only the provided explanations are used to make predictions. Moreover, we show that L2xGnn is able to identify motifs responsible for the graph’s properties it is intended to predict.
{"title":"L2XGNN: learning to explain graph neural networks","authors":"Giuseppe Serra, Mathias Niepert","doi":"10.1007/s10994-024-06576-1","DOIUrl":"https://doi.org/10.1007/s10994-024-06576-1","url":null,"abstract":"<p>Graph Neural Networks (GNNs) are a popular class of machine learning models. Inspired by the learning to explain (L2X) paradigm, we propose <span>L2xGnn</span>, a framework for explainable GNNs which provides <i>faithful</i> explanations by design. <span>L2xGnn</span> learns a mechanism for selecting explanatory subgraphs (motifs) which are exclusively used in the GNNs message-passing operations. <span>L2xGnn</span> is able to select, for each input graph, a subgraph with specific properties such as being sparse and connected. Imposing such constraints on the motifs often leads to more interpretable and effective explanations. Experiments on several datasets suggest that <span>L2xGnn</span> achieves the same classification accuracy as baseline methods using the entire input graph while ensuring that only the provided explanations are used to make predictions. Moreover, we show that <span>L2xGnn</span> is able to identify motifs responsible for the graph’s properties it is intended to predict.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"32 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141612076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-11DOI: 10.1007/s10994-024-06577-0
Dimitris Bertsimas, Nicholas A. G. Johnson
We study the Compressed Sensing (CS) problem, which is the problem of finding the most sparse vector that satisfies a set of linear measurements up to some numerical tolerance. CS is a central problem in Statistics, Operations Research and Machine Learning which arises in applications such as signal processing, data compression, image reconstruction, and multi-label learning. We introduce an (ell _2) regularized formulation of CS which we reformulate as a mixed integer second order cone program. We derive a second order cone relaxation of this problem and show that under mild conditions on the regularization parameter, the resulting relaxation is equivalent to the well studied basis pursuit denoising problem. We present a semidefinite relaxation that strengthens the second order cone relaxation and develop a custom branch-and-bound algorithm that leverages our second order cone relaxation to solve small-scale instances of CS to certifiable optimality. When compared against solutions produced by three state of the art benchmark methods on synthetic data, our numerical results show that our approach produces solutions that are on average (6.22%) more sparse. When compared only against the experiment-wise best performing benchmark method on synthetic data, our approach produces solutions that are on average (3.10%) more sparse. On real world ECG data, for a given (ell _2) reconstruction error our approach produces solutions that are on average (9.95%) more sparse than benchmark methods ((3.88%) more sparse if only compared against the best performing benchmark), while for a given sparsity level our approach produces solutions that have on average (10.77%) lower reconstruction error than benchmark methods ((1.42%) lower error if only compared against the best performing benchmark). When used as a component of a multi-label classification algorithm, our approach achieves greater classification accuracy than benchmark compressed sensing methods. This improved accuracy comes at the cost of an increase in computation time by several orders of magnitude. Thus, for applications where runtime is not of critical importance, leveraging integer optimization can yield sparser and lower error solutions to CS than existing benchmarks.
{"title":"Compressed sensing: a discrete optimization approach","authors":"Dimitris Bertsimas, Nicholas A. G. Johnson","doi":"10.1007/s10994-024-06577-0","DOIUrl":"https://doi.org/10.1007/s10994-024-06577-0","url":null,"abstract":"<p>We study the Compressed Sensing (CS) problem, which is the problem of finding the most sparse vector that satisfies a set of linear measurements up to some numerical tolerance. CS is a central problem in Statistics, Operations Research and Machine Learning which arises in applications such as signal processing, data compression, image reconstruction, and multi-label learning. We introduce an <span>(ell _2)</span> regularized formulation of CS which we reformulate as a mixed integer second order cone program. We derive a second order cone relaxation of this problem and show that under mild conditions on the regularization parameter, the resulting relaxation is equivalent to the well studied basis pursuit denoising problem. We present a semidefinite relaxation that strengthens the second order cone relaxation and develop a custom branch-and-bound algorithm that leverages our second order cone relaxation to solve small-scale instances of CS to certifiable optimality. When compared against solutions produced by three state of the art benchmark methods on synthetic data, our numerical results show that our approach produces solutions that are on average <span>(6.22%)</span> more sparse. When compared only against the experiment-wise best performing benchmark method on synthetic data, our approach produces solutions that are on average <span>(3.10%)</span> more sparse. On real world ECG data, for a given <span>(ell _2)</span> reconstruction error our approach produces solutions that are on average <span>(9.95%)</span> more sparse than benchmark methods (<span>(3.88%)</span> more sparse if only compared against the best performing benchmark), while for a given sparsity level our approach produces solutions that have on average <span>(10.77%)</span> lower reconstruction error than benchmark methods (<span>(1.42%)</span> lower error if only compared against the best performing benchmark). When used as a component of a multi-label classification algorithm, our approach achieves greater classification accuracy than benchmark compressed sensing methods. This improved accuracy comes at the cost of an increase in computation time by several orders of magnitude. Thus, for applications where runtime is not of critical importance, leveraging integer optimization can yield sparser and lower error solutions to CS than existing benchmarks.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"56 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141611977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-11DOI: 10.1007/s10994-024-06589-w
John Pavlopoulos, Maria Konstantinidou, Elpida Perdiki, Isabelle Marthot-Santaniello, Holger Essler, Georgios Vardakas, Aristidis Likas
Greek literary papyri, which are unique witnesses of antique literature, do not usually bear a date. They are thus currently dated based on palaeographical methods, with broad approximations which often span more than a century. We created a dataset of 242 images of papyri written in “bookhand” scripts whose date can be securely assigned, and we used it to train algorithms for the task of dating, showing its challenging nature. To address data scarcity, we extended our dataset by segmenting each image into its respective text lines. By using the line-based version of our dataset, we trained a Convolutional Neural Network, equipped with a fragmentation-based augmentation strategy, and we achieved a mean absolute error of 54 years. The results improve further when the task is cast as a multi-class classification problem, predicting the century. Using our network, we computed precise date estimations for papyri whose date is disputed or vaguely defined, employing explainability to understand dating-driving features.
{"title":"Explainable dating of greek papyri images","authors":"John Pavlopoulos, Maria Konstantinidou, Elpida Perdiki, Isabelle Marthot-Santaniello, Holger Essler, Georgios Vardakas, Aristidis Likas","doi":"10.1007/s10994-024-06589-w","DOIUrl":"https://doi.org/10.1007/s10994-024-06589-w","url":null,"abstract":"<p>Greek literary papyri, which are unique witnesses of antique literature, do not usually bear a date. They are thus currently dated based on palaeographical methods, with broad approximations which often span more than a century. We created a dataset of 242 images of papyri written in “bookhand” scripts whose date can be securely assigned, and we used it to train algorithms for the task of dating, showing its challenging nature. To address data scarcity, we extended our dataset by segmenting each image into its respective text lines. By using the line-based version of our dataset, we trained a Convolutional Neural Network, equipped with a fragmentation-based augmentation strategy, and we achieved a mean absolute error of 54 years. The results improve further when the task is cast as a multi-class classification problem, predicting the century. Using our network, we computed precise date estimations for papyri whose date is disputed or vaguely defined, employing explainability to understand dating-driving features.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"29 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141612077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-10DOI: 10.1007/s10994-024-06586-z
Dai Hai Nguyen, Tetsuya Sakurai
We address a general optimization problem involving the minimization of a composite objective functional defined over a class of probability distributions. The objective function consists of two components: one assumed to have a variational representation, and the other expressed in terms of the expectation operator of a possibly nonsmooth convex regularizer function. Such a regularized distributional optimization problem widely appears in machine learning and statistics, including proximal Monte-Carlo sampling, Bayesian inference, and generative modeling for regularized estimation and generation. Our proposed method, named Moreau-Yoshida Variational Transport (MYVT), introduces a novel approach to tackle this regularized distributional optimization problem. First, as the name suggests, our method utilizes the Moreau-Yoshida envelope to provide a smooth approximation of the nonsmooth function in the objective. Second, we reformulate the approximate problem as a concave-convex saddle point problem by leveraging the variational representation. Subsequently, we develop an efficient primal–dual algorithm to approximate the saddle point. Furthermore, we provide theoretical analyses and present experimental results to showcase the effectiveness of the proposed method.
{"title":"Moreau-Yoshida variational transport: a general framework for solving regularized distributional optimization problems","authors":"Dai Hai Nguyen, Tetsuya Sakurai","doi":"10.1007/s10994-024-06586-z","DOIUrl":"https://doi.org/10.1007/s10994-024-06586-z","url":null,"abstract":"<p>We address a general optimization problem involving the minimization of a composite objective functional defined over a class of probability distributions. The objective function consists of two components: one assumed to have a variational representation, and the other expressed in terms of the expectation operator of a possibly nonsmooth convex regularizer function. Such a regularized distributional optimization problem widely appears in machine learning and statistics, including proximal Monte-Carlo sampling, Bayesian inference, and generative modeling for regularized estimation and generation. Our proposed method, named Moreau-Yoshida Variational Transport (MYVT), introduces a novel approach to tackle this regularized distributional optimization problem. First, as the name suggests, our method utilizes the Moreau-Yoshida envelope to provide a smooth approximation of the nonsmooth function in the objective. Second, we reformulate the approximate problem as a concave-convex saddle point problem by leveraging the variational representation. Subsequently, we develop an efficient primal–dual algorithm to approximate the saddle point. Furthermore, we provide theoretical analyses and present experimental results to showcase the effectiveness of the proposed method.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"20 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141584981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-09DOI: 10.1007/s10994-024-06561-8
Ludwig Lausser, Robin Szekely, Hans A. Kestler
Invariant concept classes form the backbone of classification algorithms immune to specific data transformations, ensuring consistent predictions regardless of these alterations. However, this robustness can come at the cost of limited access to the original sample information, potentially impacting generalization performance. This study introduces an addition to these classes—the permutation-invariant linear classifiers. Distinguished by their structural characteristics, permutation-invariant linear classifiers are unaffected by permutations on feature vectors, a property not guaranteed by other non-constant linear classifiers. The study characterizes this new concept class, highlighting its constant capacity, independent of input dimensionality. In practical assessments using linear support vector machines, the permutation-invariant classifiers exhibit superior performance in permutation experiments on artificial datasets and real mutation profiles. Interestingly, they outperform general linear classifiers not only in permutation experiments but also in permutation-free settings, surpassing unconstrained counterparts. Additionally, findings from real mutation profiles support the significance of tumor mutational burden as a biomarker.
{"title":"Permutation-invariant linear classifiers","authors":"Ludwig Lausser, Robin Szekely, Hans A. Kestler","doi":"10.1007/s10994-024-06561-8","DOIUrl":"https://doi.org/10.1007/s10994-024-06561-8","url":null,"abstract":"<p>Invariant concept classes form the backbone of classification algorithms immune to specific data transformations, ensuring consistent predictions regardless of these alterations. However, this robustness can come at the cost of limited access to the original sample information, potentially impacting generalization performance. This study introduces an addition to these classes—the permutation-invariant linear classifiers. Distinguished by their structural characteristics, permutation-invariant linear classifiers are unaffected by permutations on feature vectors, a property not guaranteed by other non-constant linear classifiers. The study characterizes this new concept class, highlighting its constant capacity, independent of input dimensionality. In practical assessments using linear support vector machines, the permutation-invariant classifiers exhibit superior performance in permutation experiments on artificial datasets and real mutation profiles. Interestingly, they outperform general linear classifiers not only in permutation experiments but also in permutation-free settings, surpassing unconstrained counterparts. Additionally, findings from real mutation profiles support the significance of tumor mutational burden as a biomarker.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"65 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141574756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-09DOI: 10.1007/s10994-024-06555-6
Jiachen Lyu, Katharina Dost, Yun Sing Koh, Jörg Wicker
In Natural Language Processing (NLP), pre-trained language models (LLMs) are widely employed and refined for various tasks. These models have shown considerable social and geographic biases creating skewed or even unfair representations of certain groups. Research focuses on biases toward L2 (English as a second language) regions but neglects bias within L1 (first language) regions. In this work, we ask if there is regional bias within L1 regions already inherent in pre-trained LLMs and, if so, what the consequences are in terms of downstream model performance. We contribute an investigation framework specifically tailored for low-resource regions, offering a method to identify bias without imposing strict requirements for labeled datasets. Our research reveals subtle geographic variations in the word embeddings of BERT, even in cultures traditionally perceived as similar. These nuanced features, once captured, have the potential to significantly impact downstream tasks. Generally, models exhibit comparable performance on datasets that share similarities, and conversely, performance may diverge when datasets differ in their nuanced features embedded within the language. It is crucial to note that estimating model performance solely based on standard benchmark datasets may not necessarily apply to the datasets with distinct features from the benchmark datasets. Our proposed framework plays a pivotal role in identifying and addressing biases detected in word embeddings, particularly evident in low-resource regions such as New Zealand.
{"title":"Regional bias in monolingual English language models","authors":"Jiachen Lyu, Katharina Dost, Yun Sing Koh, Jörg Wicker","doi":"10.1007/s10994-024-06555-6","DOIUrl":"https://doi.org/10.1007/s10994-024-06555-6","url":null,"abstract":"<p>In Natural Language Processing (NLP), pre-trained language models (LLMs) are widely employed and refined for various tasks. These models have shown considerable social and geographic biases creating skewed or even unfair representations of certain groups. Research focuses on biases toward L2 (English as a second language) regions but neglects bias within L1 (first language) regions. In this work, we ask if there is regional bias within L1 regions already inherent in pre-trained LLMs and, if so, what the consequences are in terms of downstream model performance. We contribute an investigation framework specifically tailored for low-resource regions, offering a method to identify bias without imposing strict requirements for labeled datasets. Our research reveals subtle geographic variations in the word embeddings of BERT, even in cultures traditionally perceived as similar. These nuanced features, once captured, have the potential to significantly impact downstream tasks. Generally, models exhibit comparable performance on datasets that share similarities, and conversely, performance may diverge when datasets differ in their nuanced features embedded within the language. It is crucial to note that estimating model performance solely based on standard benchmark datasets may not necessarily apply to the datasets with distinct features from the benchmark datasets. Our proposed framework plays a pivotal role in identifying and addressing biases detected in word embeddings, particularly evident in low-resource regions such as New Zealand.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"35 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141574755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-09DOI: 10.1007/s10994-024-06571-6
Alberto Carlevaro, Teodoro Alamo, Fabrizio Dabbene, Maurizio Mongelli
Conformal predictions make it possible to define reliable and robust learning algorithms. But they are essentially a method for evaluating whether an algorithm is good enough to be used in practice. To define a reliable learning framework for classification from the very beginning of its design, the concept of scalable classifier was introduced to generalize the concept of classical classifier by linking it to statistical order theory and probabilistic learning theory. In this paper, we analyze the similarities between scalable classifiers and conformal predictions by introducing a new definition of a score function and defining a special set of input variables, the conformal safety set, which can identify patterns in the input space that satisfy the error coverage guarantee, i.e., that the probability of observing the wrong (possibly unsafe) label for points belonging to this set is bounded by a predefined (varepsilon) error level. We demonstrate the practical implications of this framework through an application in cybersecurity for identifying DNS tunneling attacks. Our work contributes to the development of probabilistically robust and reliable machine learning models.
共形预测使我们有可能定义可靠、稳健的学习算法。但是,它们本质上是一种评估算法是否足以在实践中使用的方法。为了从设计之初就定义可靠的分类学习框架,我们引入了可扩展分类器的概念,通过将其与统计秩理论和概率学习理论联系起来,对经典分类器的概念进行了概括。在本文中,我们分析了可扩展分类器和保形预测之间的相似性,引入了得分函数的新定义,并定义了一个特殊的输入变量集--保形安全集,它可以识别输入空间中满足误差覆盖保证的模式,即观察到属于该集的点的错误(可能不安全)标签的概率被预定义的(varepsilon)误差水平所限制。我们通过在网络安全中识别 DNS 隧道攻击的应用,展示了这一框架的实际意义。我们的工作有助于开发概率上稳健可靠的机器学习模型。
{"title":"Conformal predictions for probabilistically robust scalable machine learning classification","authors":"Alberto Carlevaro, Teodoro Alamo, Fabrizio Dabbene, Maurizio Mongelli","doi":"10.1007/s10994-024-06571-6","DOIUrl":"https://doi.org/10.1007/s10994-024-06571-6","url":null,"abstract":"<p>Conformal predictions make it possible to define reliable and robust learning algorithms. But they are essentially a method for evaluating whether an algorithm is good enough to be used in practice. To define a reliable learning framework for classification from the very beginning of its design, the concept of scalable classifier was introduced to generalize the concept of classical classifier by linking it to statistical order theory and probabilistic learning theory. In this paper, we analyze the similarities between scalable classifiers and conformal predictions by introducing a new definition of a score function and defining a special set of input variables, the conformal safety set, which can identify patterns in the input space that satisfy the error coverage guarantee, i.e., that the probability of observing the wrong (possibly unsafe) label for points belonging to this set is bounded by a predefined <span>(varepsilon)</span> error level. We demonstrate the practical implications of this framework through an application in cybersecurity for identifying DNS tunneling attacks. Our work contributes to the development of probabilistically robust and reliable machine learning models.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"72 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141574797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-09DOI: 10.1007/s10994-024-06581-4
Francesco Gullo, Domenico Mandaglio, Andrea Tagarelli
Signed graphs are a model to depict friendly (positive) or antagonistic (negative) interactions (edges) among users (nodes). 2-Polarized-Communities (2pc) is a well-established combinatorial-optimization problem whose goal is to find two polarized communities from a signed graph, i.e., two subsets of nodes (disjoint, but not necessarily covering the entire node set) which exhibit a high number of both intra-community positive edges and negative inter-community edges. The state of the art in 2pc suffers from the limitations that (i) existing methods rely on a single (optimal) solution to a continuous relaxation of the problem in order to produce the ultimate discrete solution via rounding, and (ii) 2pc objective function comes with no control on size balance among communities. In this paper, we provide advances to the 2pc problem by addressing both these limitations, with a twofold contribution. First, we devise a novel neural approach that allows for soundly and elegantly explore a variety of suboptimal solutions to the relaxed 2pc problem, so as to pick the one that leads to the best discrete solution after rounding. Second, we introduce a generalization of 2pc objective function – termed (gamma )-polarity – which fosters size balance among communities, and we incorporate it into the proposed machine-learning framework. Extensive experiments attest high accuracy of our approach, its superiority over the state of the art, and capability of function (gamma )-polarity to discover high-quality size-balanced communities.
{"title":"Neural discovery of balance-aware polarized communities","authors":"Francesco Gullo, Domenico Mandaglio, Andrea Tagarelli","doi":"10.1007/s10994-024-06581-4","DOIUrl":"https://doi.org/10.1007/s10994-024-06581-4","url":null,"abstract":"<p><i>Signed graphs</i> are a model to depict friendly (<i>positive</i>) or antagonistic (<i>negative</i>) interactions (edges) among users (nodes). <span>2-Polarized-Communities</span> (<span>2pc</span>) is a well-established combinatorial-optimization problem whose goal is to find two <i>polarized</i> communities from a signed graph, i.e., two subsets of nodes (disjoint, but not necessarily covering the entire node set) which exhibit a high number of both intra-community positive edges and negative inter-community edges. The state of the art in <span>2pc</span> suffers from the limitations that (<i>i</i>) existing methods rely on a single (optimal) solution to a continuous relaxation of the problem in order to produce the ultimate discrete solution via rounding, and (<i>ii</i>) <span>2pc</span> objective function comes with no control on size balance among communities. In this paper, we provide advances to the <span>2pc</span> problem by addressing both these limitations, with a twofold contribution. First, we devise a novel neural approach that allows for soundly and elegantly explore a variety of suboptimal solutions to the relaxed <span>2pc</span> problem, so as to pick the one that leads to the best discrete solution after rounding. Second, we introduce a generalization of <span>2pc</span> objective function – termed <span>(gamma )</span>-<i>polarity </i>– which fosters size balance among communities, and we incorporate it into the proposed machine-learning framework. Extensive experiments attest high accuracy of our approach, its superiority over the state of the art, and capability of function <span>(gamma )</span>-polarity to discover high-quality size-balanced communities.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"179 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141577849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-08DOI: 10.1007/s10994-024-06583-2
Joe Germino, Nuno Moniz, Nitesh V. Chawla
With the rise of artificial intelligence in our everyday lives, the need for human interpretation of machine learning models’ predictions emerges as a critical issue. Generally, interpretability is viewed as a binary notion with a performance trade-off. Either a model is fully-interpretable but lacks the ability to capture more complex patterns in the data, or it is a black box. In this paper, we argue that this view is severely limiting and that instead interpretability should be viewed as a continuous domain-informed concept. We leverage the well-known Mixture of Experts architecture with user-defined limits on non-interpretability. We extend this idea with a counterfactual fairness module to ensure the selection of consistently fair experts: FairMOE. We perform an extensive experimental evaluation with fairness-related data sets and compare our proposal against state-of-the-art methods. Our results demonstrate that FairMOE is competitive with the leading fairness-aware algorithms in both fairness and predictive measures while providing more consistent performance, competitive scalability, and, most importantly, greater interpretability.
{"title":"FairMOE: counterfactually-fair mixture of experts with levels of interpretability","authors":"Joe Germino, Nuno Moniz, Nitesh V. Chawla","doi":"10.1007/s10994-024-06583-2","DOIUrl":"https://doi.org/10.1007/s10994-024-06583-2","url":null,"abstract":"<p>With the rise of artificial intelligence in our everyday lives, the need for human interpretation of machine learning models’ predictions emerges as a critical issue. Generally, interpretability is viewed as a binary notion with a performance trade-off. Either a model is fully-interpretable but lacks the ability to capture more complex patterns in the data, or it is a black box. In this paper, we argue that this view is severely limiting and that instead interpretability should be viewed as a continuous domain-informed concept. We leverage the well-known Mixture of Experts architecture with user-defined limits on non-interpretability. We extend this idea with a counterfactual fairness module to ensure the selection of consistently <i>fair</i> experts: <b>FairMOE</b>. We perform an extensive experimental evaluation with fairness-related data sets and compare our proposal against state-of-the-art methods. Our results demonstrate that FairMOE is competitive with the leading fairness-aware algorithms in both fairness and predictive measures while providing more consistent performance, competitive scalability, and, most importantly, greater interpretability.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"29 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141574798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-08DOI: 10.1007/s10994-024-06590-3
Jakob Raymaekers, Peter J. Rousseeuw, Tim Verdonck, Ruicong Yao
Linear model trees are regression trees that incorporate linear models in the leaf nodes. This preserves the intuitive interpretation of decision trees and at the same time enables them to better capture linear relationships, which is hard for standard decision trees. But most existing methods for fitting linear model trees are time consuming and therefore not scalable to large data sets. In addition, they are more prone to overfitting and extrapolation issues than standard regression trees. In this paper we introduce PILOT, a new algorithm for linear model trees that is fast, regularized, stable and interpretable. PILOT trains in a greedy fashion like classic regression trees, but incorporates an L2 boosting approach and a model selection rule for fitting linear models in the nodes. The abbreviation PILOT stands for PIecewise Linear Organic Tree, where ‘organic’ refers to the fact that no pruning is carried out. PILOT has the same low time and space complexity as CART without its pruning. An empirical study indicates that PILOT tends to outperform standard decision trees and other linear model trees on a variety of data sets. Moreover, we prove its consistency in an additive model setting under weak assumptions. When the data is generated by a linear model, the convergence rate is polynomial.
线性模型树是在叶节点中加入线性模型的回归树。这既保留了决策树的直观解释,又能使其更好地捕捉线性关系,而标准决策树很难做到这一点。但是,现有的大多数拟合线性模型树的方法都很耗时,因此无法扩展到大型数据集。此外,与标准回归树相比,它们更容易出现过拟合和外推问题。在本文中,我们介绍了 PILOT,一种快速、正则化、稳定和可解释的线性模型树新算法。PILOT 与经典回归树一样采用贪婪方式进行训练,但在节点中加入了 L2 提升方法和拟合线性模型的模型选择规则。缩写 PILOT 是 PIecewise Linear Organic Tree 的缩写,其中的 "organic "指的是不进行修剪。PILOT 与不进行剪枝的 CART 一样,具有较低的时间和空间复杂度。实证研究表明,PILOT 在各种数据集上的表现往往优于标准决策树和其他线性模型树。此外,我们还证明了它在弱假设条件下的加法模型设置中的一致性。当数据由线性模型生成时,收敛速率为多项式。
{"title":"Fast linear model trees by PILOT","authors":"Jakob Raymaekers, Peter J. Rousseeuw, Tim Verdonck, Ruicong Yao","doi":"10.1007/s10994-024-06590-3","DOIUrl":"https://doi.org/10.1007/s10994-024-06590-3","url":null,"abstract":"<p>Linear model trees are regression trees that incorporate linear models in the leaf nodes. This preserves the intuitive interpretation of decision trees and at the same time enables them to better capture linear relationships, which is hard for standard decision trees. But most existing methods for fitting linear model trees are time consuming and therefore not scalable to large data sets. In addition, they are more prone to overfitting and extrapolation issues than standard regression trees. In this paper we introduce PILOT, a new algorithm for linear model trees that is fast, regularized, stable and interpretable. PILOT trains in a greedy fashion like classic regression trees, but incorporates an <i>L</i><sup>2</sup> boosting approach and a model selection rule for fitting linear models in the nodes. The abbreviation PILOT stands for PIecewise Linear Organic Tree, where ‘organic’ refers to the fact that no pruning is carried out. PILOT has the same low time and space complexity as CART without its pruning. An empirical study indicates that PILOT tends to outperform standard decision trees and other linear model trees on a variety of data sets. Moreover, we prove its consistency in an additive model setting under weak assumptions. When the data is generated by a linear model, the convergence rate is polynomial.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"10 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141574800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}