
Latest Publications in Machine Learning

Classification with costly features in hierarchical deep sets
IF 7.5 CAS Region 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-05-22 DOI: 10.1007/s10994-024-06565-4
Jaromír Janisch, Tomáš Pevný, Viliam Lisý

Classification with costly features (CwCF) is a classification problem that includes the cost of features in the optimization criterion. For each sample individually, features are acquired sequentially so as to maximize accuracy while minimizing the cost of the acquired features. However, existing approaches can only process data that can be expressed as fixed-length vectors. In real life, data often possesses rich and complex structure that is more precisely described by formats such as XML or JSON: it is hierarchical and often contains nested lists of objects. In this work, we extend an existing deep reinforcement learning-based algorithm with hierarchical deep sets and hierarchical softmax so that it can process such data directly. The extended method has finer control over which features it acquires and, in experiments with seven datasets, we show that this leads to superior performance. To showcase its real-world usage, we apply the new method to the problem of classifying malicious web domains using an online service.
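The hierarchical deep-sets idea, embedding each element of a nested list and pooling with a permutation-invariant sum, can be sketched in a few lines; the weights, sizes, and toy JSON-like sample below are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# One-layer "phi" (per-element) and "rho" (after-pooling) networks with
# made-up weights; D is an arbitrary embedding width for the sketch.
D = 4
W_phi = rng.normal(size=(D, D))
W_rho = rng.normal(size=(D, D))

def encode(node):
    """Encode a leaf feature vector (ndarray) or a nested list of nodes."""
    if isinstance(node, list):                          # a set of children
        pooled = sum(encode(child) for child in node)   # permutation-invariant sum
        return np.tanh(W_rho @ pooled)                  # rho network
    return np.tanh(W_phi @ node)                        # phi network on a leaf

# A JSON-like sample: a record holding a leaf and a nested list of objects.
leaf_a = np.array([1.0, 0.0, 0.0, 0.0])
leaf_b = np.array([0.0, 1.0, 0.0, 0.0])
leaf_c = np.array([0.0, 0.0, 1.0, 0.0])
sample = [leaf_a, [leaf_b, leaf_c]]
z = encode(sample)   # fixed-size embedding of an arbitrarily nested record
```

Because the pooling is a sum, the embedding is invariant to the order of siblings, which is what lets the encoder consume unordered nested lists of objects.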

{"title":"Classification with costly features in hierarchical deep sets","authors":"Jaromír Janisch, Tomáš Pevný, Viliam Lisý","doi":"10.1007/s10994-024-06565-4","DOIUrl":"https://doi.org/10.1007/s10994-024-06565-4","url":null,"abstract":"<p>Classification with costly features (CwCF) is a classification problem that includes the cost of features in the optimization criteria. Individually for each sample, its features are sequentially acquired to maximize accuracy while minimizing the acquired features’ cost. However, existing approaches can only process data that can be expressed as vectors of fixed length. In real life, the data often possesses rich and complex structure, which can be more precisely described with formats such as XML or JSON. The data is hierarchical and often contains nested lists of objects. In this work, we extend an existing deep reinforcement learning-based algorithm with hierarchical deep sets and hierarchical softmax, so that it can directly process this data. The extended method has greater control over which features it can acquire and, in experiments with seven datasets, we show that this leads to superior performance. To showcase the real usage of the new method, we apply it to a real-life problem of classifying malicious web domains, using an online service.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"29 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141151499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CoMadOut—a robust outlier detection algorithm based on CoMAD
IF 7.5 CAS Region 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-05-07 DOI: 10.1007/s10994-024-06521-2
Andreas Lohrer, Daniyal Kazempour, Maximilian Hünemörder, Peer Kröger

Unsupervised learning methods are well established in the area of anomaly detection and achieve state-of-the-art performance on outlier datasets. Outliers play a significant role, since they can distort the predictions of a machine learning algorithm on a given dataset. Among PCA-based methods in particular, outliers have an additional destructive potential: they not only distort the orientation and translation of the principal components, but also make it more complicated to detect outliers at all. To address this problem, we propose the robust outlier detection algorithm CoMadOut, which satisfies two required properties: (1) being robust towards outliers and (2) detecting them. Depending on the variant, our CoMadOut outlier detection methods based on comedian PCA define an inlier region with a robust noise margin using measures of in-distribution (variant CMO), or optimize the scores using measures of out-of-distribution (variants CMO*), e.g. kurtosis weighting in CMO+k. These measures allow distribution-based outlier scoring for each principal component and, thus, an appropriate alignment of the degree of outlierness between normal and abnormal instances. Experiments comparing CoMadOut with traditional, deep, and other comparable robust outlier detection methods show that the performance of the introduced CoMadOut approach is competitive with well-established methods in terms of average precision (AP), area under the precision-recall curve (AUPRC), and area under the receiver operating characteristic curve (AUROC). In summary, our approach can be seen as a robust alternative for outlier detection tasks.
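The comedian (co-median) matrix underlying the method, a median-based surrogate for the covariance matrix, can be sketched as follows; the scoring step is a simplified stand-in in the spirit of the CMO variants, not the authors' exact formulation, and all data and names are toy choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Inliers stretched along x (scale 3) and thin along y (scale 0.3),
# plus one gross outlier far along y.
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.3]])
X = np.vstack([X, [[0.0, 50.0]]])

def comad_matrix(X):
    """Comedian matrix: medians replace means, so a single gross
    outlier barely moves the estimate (unlike the covariance)."""
    C = X - np.median(X, axis=0)
    d = X.shape[1]
    M = np.empty((d, d))
    for i in range(d):
        for j in range(d):
            M[i, j] = np.median(C[:, i] * C[:, j])
    return M

M = comad_matrix(X)
_, eigvecs = np.linalg.eigh(M)                  # robust principal directions
proj = np.abs((X - np.median(X, axis=0)) @ eigvecs)
# Normalize each component's projections by a robust scale (their MAD)
# and take the worst component as the outlier score.
mad = np.median(np.abs(proj - np.median(proj, axis=0)), axis=0)
outlier_score = (proj / (mad + 1e-12)).max(axis=1)
```

The point appended last dominates the score because its deviation along the low-spread direction is hundreds of robust scale units, while the median-based estimates themselves remain essentially unaffected by it.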

{"title":"CoMadOut—a robust outlier detection algorithm based on CoMAD","authors":"Andreas Lohrer, Daniyal Kazempour, Maximilian Hünemörder, Peer Kröger","doi":"10.1007/s10994-024-06521-2","DOIUrl":"https://doi.org/10.1007/s10994-024-06521-2","url":null,"abstract":"<p>Unsupervised learning methods are well established in the area of anomaly detection and achieve state of the art performances on outlier datasets. Outliers play a significant role, since they bear the potential to distort the predictions of a machine learning algorithm on a given dataset. Especially among PCA-based methods, outliers have an additional destructive potential regarding the result: they may not only distort the orientation and translation of the principal components, they also make it more complicated to detect outliers. To address this problem, we propose the robust outlier detection algorithm CoMadOut, which satisfies two required properties: (1) being robust towards outliers and (2) detecting them. Our CoMadOut outlier detection variants using comedian PCA define, dependent on its variant, an inlier region with a robust noise margin by measures of in-distribution (variant CMO) and optimized scores by measures of out-of-distribution (variants CMO*), e.g. kurtosis-weighting by CMO+k. These measures allow distribution based outlier scoring for each principal component, and thus, an appropriate alignment of the degree of outlierness between normal and abnormal instances. Experiments comparing CoMadOut with traditional, deep and other comparable robust outlier detection methods showed that the performance of the introduced CoMadOut approach is competitive to well established methods related to average precision (AP), area under the precision recall curve (AUPRC) and area under the receiver operating characteristic (AUROC) curve. 
In summary our approach can be seen as a robust alternative for outlier detection tasks.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"1 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140884304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SWoTTeD: an extension of tensor decomposition to temporal phenotyping
IF 7.5 CAS Region 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-04-30 DOI: 10.1007/s10994-024-06545-8
Hana Sebia, Thomas Guyet, Etienne Audureau

Tensor decomposition has recently been gaining attention in the machine learning community for the analysis of individual traces, such as Electronic Health Records. However, the task becomes significantly more difficult when the data follows complex temporal patterns. This paper introduces the notion of a temporal phenotype, an arrangement of features over time, and proposes SWoTTeD (Sliding Window for Temporal Tensor Decomposition), a novel method to discover hidden temporal patterns. SWoTTeD integrates several constraints and regularizations to enhance the interpretability of the extracted phenotypes. We validate our proposal on both synthetic and real-world datasets, and we present an original use case with data from the Greater Paris University Hospital. The results show that SWoTTeD achieves reconstruction at least as accurate as recent state-of-the-art tensor decomposition models, and extracts temporal phenotypes that are meaningful to clinicians.
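The sliding-window reconstruction at the heart of such a decomposition can be illustrated with a single phenotype: a small (feature x time) pattern placed at each time step, weighted by an activation signal. The names and one-phenotype setup are ours; the actual model learns several phenotypes and per-patient activations under additional constraints.

```python
import numpy as np

F, T, w = 3, 10, 3            # features, time horizon, window length

# A temporal phenotype: feature 0 fires, then feature 2 two steps later.
phenotype = np.zeros((F, w))
phenotype[0, 0] = 1.0
phenotype[2, 2] = 1.0

def reconstruct(activation, phenotype, T):
    """Sum shifted, activation-weighted copies of the phenotype."""
    F, w = phenotype.shape
    X_hat = np.zeros((F, T))
    for t, a in enumerate(activation):   # slide the window over time
        X_hat[:, t:t + w] += a * phenotype
    return X_hat

# "Observed" data generated by one occurrence starting at time 2 ...
activation_true = np.zeros(T - w + 1)
activation_true[2] = 1.0
X = reconstruct(activation_true, phenotype, T)

# ... so the decomposition objective (squared reconstruction error)
# is zero for the true phenotype/activation pair.
loss = ((X - reconstruct(activation_true, phenotype, T)) ** 2).sum()
```

Fitting the model amounts to minimizing this loss jointly over phenotypes and activations, with regularizers promoting interpretable patterns.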

{"title":"SWoTTeD: an extension of tensor decomposition to temporal phenotyping","authors":"Hana Sebia, Thomas Guyet, Etienne Audureau","doi":"10.1007/s10994-024-06545-8","DOIUrl":"https://doi.org/10.1007/s10994-024-06545-8","url":null,"abstract":"<p>Tensor decomposition has recently been gaining attention in the machine learning community for the analysis of individual traces, such as Electronic Health Records. However, this task becomes significantly more difficult when the data follows complex temporal patterns. This paper introduces the notion of a temporal phenotype as an arrangement of features over time and it proposes <span>SWoTTeD</span> (<b>S</b>liding <b>W</b>ind<b>o</b>w for <b>T</b>emporal <b>Te</b>nsor <b>D</b>ecomposition), a novel method to discover hidden temporal patterns. <span>SWoTTeD</span> integrates several constraints and regularizations to enhance the interpretability of the extracted phenotypes. We validate our proposal using both synthetic and real-world datasets, and we present an original usecase using data from the Greater Paris University Hospital. The results show that <span>SWoTTeD</span> achieves at least as accurate reconstruction as recent state-of-the-art tensor decomposition models, and extracts temporal phenotypes that are meaningful for clinicians.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"12 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140840995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Finite-time error bounds for Greedy-GQ
IF 7.5 CAS Region 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-04-30 DOI: 10.1007/s10994-024-06542-x
Yue Wang, Yi Zhou, Shaofeng Zou

Greedy-GQ with linear function approximation, originally proposed in Maei et al. (in: Proceedings of the International Conference on Machine Learning (ICML), 2010), is a value-based off-policy algorithm for optimal control in reinforcement learning; it has a non-linear two-timescale structure with a non-convex objective function. This paper develops its tightest finite-time error bounds. We show that the Greedy-GQ algorithm converges as fast as $\mathcal{O}(1/\sqrt{T})$ under the i.i.d. setting and $\mathcal{O}(\log T/\sqrt{T})$ under the Markovian setting. We further design a variant of the vanilla Greedy-GQ algorithm using a nested-loop approach, and show that its sample complexity is $\mathcal{O}(\log(1/\epsilon)\,\epsilon^{-2})$, which matches that of the vanilla Greedy-GQ. Our finite-time error bounds match those of the stochastic gradient descent algorithm for general smooth non-convex optimization problems, despite the additional challenge posed by the two-timescale updates. Our finite-sample analysis provides theoretical guidance on choosing step sizes for faster convergence in practice, and suggests a trade-off between the convergence rate and the quality of the obtained policy. Our techniques provide a general approach for the finite-sample analysis of non-convex two-timescale value-based reinforcement learning algorithms.

{"title":"Finite-time error bounds for Greedy-GQ","authors":"Yue Wang, Yi Zhou, Shaofeng Zou","doi":"10.1007/s10994-024-06542-x","DOIUrl":"https://doi.org/10.1007/s10994-024-06542-x","url":null,"abstract":"<p>Greedy-GQ with linear function approximation, originally proposed in Maei et al. (in: Proceedings of the international conference on machine learning (ICML), 2010), is a value-based off-policy algorithm for optimal control in reinforcement learning, and it has a non-linear two timescale structure with non-convex objective function. This paper develops its tightest finite-time error bounds. We show that the Greedy-GQ algorithm converges as fast as <span>(mathcal {O}({1}/{sqrt{T}}))</span> under the i.i.d. setting and <span>(mathcal {O}({log T}/{sqrt{T}}))</span> under the Markovian setting. We further design variant of the vanilla Greedy-GQ algorithm using the nested-loop approach, and show that its sample complexity is <span>(mathcal {O}({log (1/epsilon )epsilon ^{-2}}))</span>, which matches with the one of the vanilla Greedy-GQ. Our finite-time error bounds match with the one of the stochastic gradient descent algorithm for general smooth non-convex optimization problems, despite of its additonal challenge in the two time-scale updates. Our finite-sample analysis provides theoretical guidance on choosing step-sizes for faster convergence in practice, and suggests the trade-off between the convergence rate and the quality of the obtained policy. 
Our techniques provide a general approach for finite-sample analysis of non-convex two timescale value-based reinforcement learning algorithms.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"41 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140841521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Semantic-enhanced graph neural networks with global context representation
IF 7.5 CAS Region 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-04-29 DOI: 10.1007/s10994-024-06523-0
Youcheng Qian, Xueyan Yin

Node classification is a crucial task for efficiently analyzing graph-structured data. Related semi-supervised methods have been extensively studied to address the scarcity of labeled data in emerging classes. However, two fundamental weaknesses hinder performance: the inability to mine latent semantic information between nodes, and the failure to simultaneously capture local and global coupling dependencies between different nodes. To overcome these limitations, we propose a novel semantic-enhanced graph neural network with global context representation for semi-supervised node classification. Specifically, we first use a graph convolutional network to learn short-range local dependencies, which not only considers the spatial topological relationship between nodes but also takes into account their semantic correlation, enhancing the representational power of the nodes. Second, an improved Transformer model is introduced to reason about long-range global pairwise relationships; it has linear computational complexity, which is particularly important for large datasets. Finally, the proposed model shows strong performance on various open datasets, demonstrating the superiority of our solution.
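The two components the abstract combines can be sketched in a few lines of numpy: a GCN layer with symmetric normalization for short-range local structure, and a dense softmax attention step in which every node attends to every other node for global context. Weights, sizes, and the concatenation fusion are illustrative assumptions; note also that the paper's improved Transformer is linear-complexity, which this dense O(N^2) sketch deliberately does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(2)

N, D = 5, 8                        # nodes, feature width (toy sizes)
X = rng.normal(size=(N, D))
A = np.eye(N)                      # adjacency with self-loops
A[0, 1] = A[1, 0] = A[1, 2] = A[2, 1] = 1.0

def gcn_layer(A, X, W):
    """One GCN layer: D^{-1/2} A D^{-1/2} X W followed by ReLU."""
    deg = A.sum(axis=1)
    A_hat = A / np.sqrt(np.outer(deg, deg))
    return np.maximum(A_hat @ X @ W, 0.0)

def global_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over all node pairs."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    logits = Q @ K.T / np.sqrt(K.shape[1])
    attn = np.exp(logits - logits.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)    # row-wise softmax
    return attn @ V                            # every node sees every node

W = rng.normal(size=(D, D))
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))
H_local = gcn_layer(A, X, W)
H_global = global_attention(X, Wq, Wk, Wv)
H = np.concatenate([H_local, H_global], axis=1)   # fused node representation
```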

{"title":"Semantic-enhanced graph neural networks with global context representation","authors":"Youcheng Qian, Xueyan Yin","doi":"10.1007/s10994-024-06523-0","DOIUrl":"https://doi.org/10.1007/s10994-024-06523-0","url":null,"abstract":"<p>Node classification is a crucial task for efficiently analyzing graph-structured data. Related semi-supervised methods have been extensively studied to address the scarcity of labeled data in emerging classes. However, two fundamental weaknesses hinder the performance: lacking the ability to mine latent semantic information between nodes, or ignoring to simultaneously capture local and global coupling dependencies between different nodes. To solve these limitations, we propose a novel semantic-enhanced graph neural networks with global context representation for semi-supervised node classification. Specifically, we first use graph convolution network to learn short-range local dependencies, which not only considers the spatial topological structure relationship between nodes, but also takes into account the semantic correlation between nodes to enhance the representation ability of nodes. Second, an improved Transformer model is introduced to reasoning the long-range global pairwise relationships, which has linear computational complexity and is particularly important for large datasets. 
Finally, the proposed model shows strong performance on various open datasets, demonstrating the superiority of our solutions.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"53 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140841106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Explaining Siamese networks in few-shot learning
IF 7.5 CAS Region 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-04-29 DOI: 10.1007/s10994-024-06529-8
Andrea Fedele, Riccardo Guidotti, Dino Pedreschi

Machine learning models often struggle to generalize accurately when tested on new class distributions that were not present in their training data. This is a significant challenge for real-world applications that require quick adaptation without retraining. To address this issue, few-shot learning frameworks, which include models such as Siamese Networks, have been proposed. Siamese Networks learn similarity between pairs of records through a metric that can easily be extended to new, unseen classes. However, these systems lack interpretability, which can hinder their use in certain applications. To address this, we propose a data-agnostic method to explain the outcomes of Siamese Networks in the context of few-shot learning. Our explanation method is based on a post-hoc, perturbation-based procedure that evaluates the contribution of individual input features to the final outcome; as such, it falls under the category of post-hoc explanation methods. We present two variants, one that considers each input feature independently and another that evaluates the interplay between features, and we propose two perturbation procedures to evaluate feature contributions. Qualitative and quantitative results demonstrate that our method is able to identify highly discriminant intra-class and inter-class characteristics, as well as predictive behaviors that lead to misclassification through reliance on incorrect features.
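The post-hoc perturbation idea can be illustrated on a toy similarity model: zero out one input feature at a time and record how the pairwise similarity changes. The linear embedding, the zero baseline, and all names here are our simplifying assumptions, not the authors' procedure.

```python
import numpy as np

D = 4
W = np.eye(D)
W[3, 3] = 0.0                    # this toy embedding ignores feature 3

def embed(x):
    return W @ x

def similarity(x, y):
    """Cosine similarity of the two embeddings (the Siamese metric)."""
    a, b = embed(x), embed(y)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def feature_contributions(x, y):
    """Contribution of feature i = drop in similarity when it is removed."""
    base = similarity(x, y)
    contrib = np.zeros(len(x))
    for i in range(len(x)):
        x_pert = x.copy()
        x_pert[i] = 0.0          # zero-baseline perturbation
        contrib[i] = base - similarity(x_pert, y)
    return contrib

x = np.array([1.0, 2.0, 3.0, 9.0])
y = np.array([1.0, 2.0, 3.0, -9.0])
c = feature_contributions(x, y)
```

The attribution correctly reports zero contribution for the feature the model ignores, and positive contribution for features the two records agree on; this corresponds to the variant that treats features independently, while the interplay variant would perturb feature subsets jointly.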

{"title":"Explaining Siamese networks in few-shot learning","authors":"Andrea Fedele, Riccardo Guidotti, Dino Pedreschi","doi":"10.1007/s10994-024-06529-8","DOIUrl":"https://doi.org/10.1007/s10994-024-06529-8","url":null,"abstract":"<p>Machine learning models often struggle to generalize accurately when tested on new class distributions that were not present in their training data. This is a significant challenge for real-world applications that require quick adaptation without the need for retraining. To address this issue, few-shot learning frameworks, which includes models such as Siamese Networks, have been proposed. Siamese Networks learn similarity between pairs of records through a metric that can be easily extended to new, unseen classes. However, these systems lack interpretability, which can hinder their use in certain applications. To address this, we propose a data-agnostic method to explain the outcomes of Siamese Networks in the context of few-shot learning. Our explanation method is based on a post-hoc perturbation-based procedure that evaluates the contribution of individual input features to the final outcome. As such, it falls under the category of post-hoc explanation methods. We present two variants, one that considers each input feature independently, and another that evaluates the interplay between features. Additionally, we propose two perturbation procedures to evaluate feature contributions. 
Qualitative and quantitative results demonstrate that our method is able to identify highly discriminant intra-class and inter-class characteristics, as well as predictive behaviors that lead to misclassification by relying on incorrect features.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"38 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140841001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Reversible jump attack to textual classifiers with modification reduction
IF 7.5 CAS Region 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-04-22 DOI: 10.1007/s10994-024-06539-6
Mingze Ni, Zhensu Sun, Wei Liu

Recent studies on adversarial examples expose vulnerabilities of natural language processing models. Existing techniques for generating adversarial examples are typically driven by deterministic hierarchical rules that are agnostic to the optimal adversarial examples, a strategy that often yields adversarial samples with a suboptimal balance between the magnitude of the changes and attack success. To this end, in this research we propose two algorithms, Reversible Jump Attack (RJA) and Metropolis-Hastings Modification Reduction (MMR), to generate highly effective adversarial examples and to improve their imperceptibility, respectively. RJA utilizes a novel randomization mechanism to enlarge the search space and efficiently adapts the number of perturbed words in the adversarial examples. Given these generated adversarial examples, MMR applies a Metropolis-Hastings sampler to enhance their imperceptibility. Extensive experiments demonstrate that RJA-MMR outperforms current state-of-the-art methods in attack performance, imperceptibility, fluency, and grammatical correctness.
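The Metropolis-Hastings machinery behind the modification-reduction stage can be illustrated with a generic accept/reject step: propose a small change to the current adversarial candidate and accept it with probability min(1, target ratio). The toy target density over the number of modified words below is our stand-in for the paper's attack objective, which we do not reproduce.

```python
import math
import random

random.seed(0)

def target(n_modifications):
    """Toy unnormalized density: fewer edits -> higher probability."""
    return math.exp(-n_modifications)

def mh_step(state):
    """One Metropolis-Hastings step with a symmetric +/-1 proposal."""
    proposal = state + random.choice([-1, 1])
    if proposal < 0:                         # reject invalid states outright
        return state
    accept_prob = min(1.0, target(proposal) / target(state))
    return proposal if random.random() < accept_prob else state

state = 10                                   # start with 10 modified words
samples = []
for _ in range(2000):
    state = mh_step(state)
    samples.append(state)
```

Run long enough, the chain spends most of its time at small modification counts, which is the sense in which the sampler trades attack edits for imperceptibility while still exploring the neighborhood of the current candidate.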

{"title":"Reversible jump attack to textual classifiers with modification reduction","authors":"Mingze Ni, Zhensu Sun, Wei Liu","doi":"10.1007/s10994-024-06539-6","DOIUrl":"https://doi.org/10.1007/s10994-024-06539-6","url":null,"abstract":"<p>Recent studies on adversarial examples expose vulnerabilities of natural language processing models. Existing techniques for generating adversarial examples are typically driven by deterministic hierarchical rules that are agnostic to the optimal adversarial examples, a strategy that often results in adversarial samples with a suboptimal balance between magnitudes of changes and attack successes. To this end, in this research we propose two algorithms, Reversible Jump Attack (RJA) and Metropolis–Hasting Modification Reduction (MMR), to generate highly effective adversarial examples and to improve the imperceptibility of the examples, respectively. RJA utilizes a novel randomization mechanism to enlarge the search space and efficiently adapts to a number of perturbed words for adversarial examples. With these generated adversarial examples, MMR applies the Metropolis–Hasting sampler to enhance the imperceptibility of adversarial examples. Extensive experiments demonstrate that RJA-MMR outperforms current state-of-the-art methods in attack performance, imperceptibility, fluency and grammar correctness.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"279 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140806544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Coresets for kernel clustering
IF 7.5 CAS Region 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-04-22 DOI: 10.1007/s10994-024-06540-z
Shaofeng H. -C. Jiang, Robert Krauthgamer, Jianing Lou, Yubo Zhang

We devise coresets for kernel $k$-Means with a general kernel, and use them to obtain new, more efficient algorithms. Kernel $k$-Means has superior clustering capability compared to classical $k$-Means, particularly when clusters are non-linearly separable, but it also introduces significant computational challenges. We address this computational issue by constructing a coreset, which is a reduced dataset that accurately preserves the clustering costs. Our main result is a coreset for kernel $k$-Means that works for a general kernel and has size $\mathrm{poly}(k\epsilon^{-1})$. Our new coreset both generalizes and greatly improves all previous results; moreover, it can be constructed in time near-linear in $n$. This result immediately implies new algorithms for kernel $k$-Means, such as a $(1+\epsilon)$-approximation in time near-linear in $n$, and a streaming algorithm using space and update time $\mathrm{poly}(k\epsilon^{-1}\log n)$. We validate our coreset on various datasets with different kernels. Our coreset performs consistently well, achieving small errors while using very few points. We show that our coresets can speed up kernel k-Means++ (the kernelized version of the widely used k-Means++ algorithm), and we further use this faster kernel k-Means++ for spectral clustering. In both applications, we achieve significant speedup and better asymptotic growth while the error is comparable to baselines that do not use coresets.
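As a concrete anchor for the objective a coreset must preserve, here is the kernel $k$-Means cost computed purely from kernel entries via the kernel trick; the RBF kernel, the two-blob data, and all names are our toy choices, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(4)

# Two tight, well-separated blobs of 10 points each.
X = np.vstack([rng.normal(0.0, 0.1, size=(10, 2)),
               rng.normal(5.0, 0.1, size=(10, 2))])
labels = np.array([0] * 10 + [1] * 10)

def rbf_kernel(X, gamma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_kmeans_cost(K, labels):
    """Sum over clusters of sum_i ||phi(x_i) - mu_c||^2, using only kernel
    entries: trace(K_c) - (1/|C|) * sum(K_c) for each cluster c."""
    cost = 0.0
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        Kc = K[np.ix_(idx, idx)]
        cost += Kc.trace() - Kc.sum() / len(idx)
    return cost

K = rbf_kernel(X)
cost_true = kernel_kmeans_cost(K, labels)               # correct clustering
cost_mixed = kernel_kmeans_cost(K, np.arange(20) % 2)   # blobs mixed up
```

A coreset in this setting replaces `X` by a small weighted subset whose cost, for every candidate clustering, stays within a $(1 \pm \epsilon)$ factor of the full-data cost, so any algorithm run on the coreset approximately optimizes the objective above.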

{"title":"Coresets for kernel clustering","authors":"Shaofeng H. -C. Jiang, Robert Krauthgamer, Jianing Lou, Yubo Zhang","doi":"10.1007/s10994-024-06540-z","DOIUrl":"https://doi.org/10.1007/s10994-024-06540-z","url":null,"abstract":"<p>We devise coresets for kernel <span>(k)</span>-<span>Means</span> with a general kernel, and use them to obtain new, more efficient, algorithms. Kernel <span>(k)</span>-<span>Means</span> has superior clustering capability compared to classical <span>(k)</span>-<span>Means</span>, particularly when clusters are non-linearly separable, but it also introduces significant computational challenges. We address this computational issue by constructing a coreset, which is a reduced dataset that accurately preserves the clustering costs. Our main result is a coreset for kernel <span>(k)</span>-<span>Means</span> that works for a general kernel and has size <span>({{,textrm{poly},}}(kepsilon ^{-1}))</span>. Our new coreset both generalizes and greatly improves all previous results; moreover, it can be constructed in time near-linear in <i>n</i>. This result immediately implies new algorithms for kernel <span>(k)</span>-<span>Means</span>, such as a <span>((1+epsilon ))</span>-approximation in time near-linear in <i>n</i>, and a streaming algorithm using space and update time <span>({{,textrm{poly},}}(k epsilon ^{-1} log n))</span>. We validate our coreset on various datasets with different kernels. Our coreset performs consistently well, achieving small errors while using very few points. We show that our coresets can speed up kernel <span>(textsc {k-Means++})</span> (the kernelized version of the widely used <span>(textsc {k-Means++})</span> algorithm), and we further use this faster kernel <span>(textsc {k-Means++})</span> for spectral clustering. 
In both applications, we achieve significant speedup and a better asymptotic growth while the error is comparable to baselines that do not use coresets.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"2 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140806646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
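The abstract above centers on the kernel k-Means objective and on coresets that preserve its cost. As a rough illustration (not the paper's construction — the function names, the RBF kernel choice, and the toy data below are assumptions made for this sketch), the clustering cost in feature space can be evaluated from the Gram matrix alone, and a weighted variant of the same formula is what a coreset's cost-preservation guarantee is stated over:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def kernel_kmeans_cost(K, labels):
    """Kernel k-Means cost of a labeling, from the Gram matrix only.

    For a cluster C with feature-space mean mu,
    sum_{i in C} ||phi(x_i) - mu||^2
      = sum_{i in C} K[i, i] - (1/|C|) * sum_{i, j in C} K[i, j].
    """
    cost = 0.0
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        cost += K[idx, idx].sum() - K[np.ix_(idx, idx)].sum() / len(idx)
    return cost

def weighted_kernel_kmeans_cost(K, w, labels):
    """Same cost on a weighted point set -- the form a coreset is evaluated with."""
    cost = 0.0
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        wc = w[idx]
        Kc = K[np.ix_(idx, idx)]
        cost += (wc * K[idx, idx]).sum() - (wc[:, None] * wc[None, :] * Kc).sum() / wc.sum()
    return cost
```

With unit weights the two functions agree, and a well-separated labeling scores lower than a mixed one. The paper's contribution is a weighted subset of size poly(kε⁻¹) on which the weighted cost stays within a (1±ε) factor of the full cost for every candidate clustering.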
From MNIST to ImageNet and back: benchmarking continual curriculum learning 从 MNIST 到 ImageNet 再回来：持续课程学习的基准测试
IF 7.5 3区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-04-22 DOI: 10.1007/s10994-024-06524-z
Kamil Faber, Dominik Zurek, Marcin Pietron, Nathalie Japkowicz, Antonio Vergari, Roberto Corizzo

Continual learning (CL) is one of the most promising trends in recent machine learning research. Its goal is to go beyond classical assumptions in machine learning and develop models and learning strategies that present high robustness in dynamic environments. This goal is realized by designing strategies that simultaneously foster the incorporation of new knowledge while avoiding forgetting past knowledge. The landscape of CL research is fragmented into several learning evaluation protocols, comprising different learning tasks, datasets, and evaluation metrics. Additionally, the benchmarks adopted so far are still distant from the complexity of real-world scenarios, and are usually tailored to highlight capabilities specific to certain strategies. In such a landscape, it is hard to clearly and objectively assess models and strategies. In this work, we fill this gap for CL on image data by introducing two novel CL benchmarks that involve multiple heterogeneous tasks from six image datasets, with varying levels of complexity and quality. Our aim is to fairly evaluate current state-of-the-art CL strategies on a common ground that is closer to complex real-world scenarios. We additionally structure our benchmarks so that tasks are presented in increasing and decreasing order of complexity—according to a curriculum—in order to evaluate if current CL models are able to exploit structure across tasks. We devote particular emphasis to providing the CL community with a rigorous and reproducible evaluation protocol for measuring the ability of a model to generalize and not to forget while learning. Furthermore, we provide an extensive experimental evaluation showing that popular CL strategies, when challenged with our proposed benchmarks, yield sub-par performance, high levels of forgetting, and present a limited ability to effectively leverage curriculum task ordering. We believe that these results highlight the need for rigorous comparisons in future CL works as well as pave the way to design new CL strategies that are able to deal with more complex scenarios.

持续学习（CL）是近期机器学习研究中最有前途的方向之一。其目标是超越机器学习的经典假设，开发在动态环境中具有高鲁棒性的模型和学习策略。实现这一目标的途径是设计既能促进吸收新知识、又能避免遗忘旧知识的策略。CL 研究的版图被割裂为多个学习评估协议，包括不同的学习任务、数据集和评估指标。此外，迄今采用的基准仍与真实场景的复杂性相去甚远，而且通常是为突出特定策略的能力而量身定制的。在这种情况下，很难对模型和策略进行清晰客观的评估。在这项工作中，我们通过引入两个新颖的 CL 基准来填补图像数据上 CL 的这一空白，这些基准涉及来自六个图像数据集的多个异构任务，复杂度和质量各不相同。我们的目标是在更接近复杂真实场景的共同基础上，公平地评估当前最先进的 CL 策略。此外，我们按照课程（curriculum）将任务以复杂度递增和递减的顺序组织基准，以评估当前的 CL 模型能否利用跨任务的结构。我们特别强调为 CL 社区提供一个严格且可复现的评估协议，用于衡量模型在学习过程中的泛化能力与抗遗忘能力。此外，我们提供了广泛的实验评估，结果表明流行的 CL 策略在面对我们提出的基准时性能欠佳、遗忘严重，并且有效利用课程任务排序的能力有限。我们认为，这些结果凸显了在未来的 CL 工作中进行严格比较的必要性，也为设计能够应对更复杂场景的新 CL 策略铺平了道路。
引用次数: 0
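The evaluation protocol this abstract emphasizes — measuring generalization and forgetting as tasks arrive in curriculum order — is usually summarized from an accuracy matrix R, where R[i][j] is the accuracy on task j after training through task i. A minimal sketch of two metrics that are standard in the CL literature (the function names and the toy numbers are illustrative, not the paper's benchmark code):

```python
import numpy as np

def average_accuracy(R):
    """Mean accuracy over all tasks after the final training stage (last row of R)."""
    R = np.asarray(R, dtype=float)
    return float(R[-1].mean())

def average_forgetting(R):
    """Average drop from each earlier task's best accuracy to its final accuracy."""
    R = np.asarray(R, dtype=float)
    n_tasks = R.shape[1]
    drops = [R[:, j].max() - R[-1, j] for j in range(n_tasks - 1)]
    return float(np.mean(drops))

# Toy matrix: row i = accuracies after training task i; column j = accuracy on task j.
R = [[0.9, 0.1, 0.1],
     [0.7, 0.8, 0.2],
     [0.5, 0.6, 0.9]]
```

Running the same task stream in increasing versus decreasing complexity order and comparing these two numbers is, in spirit, the comparison the proposed benchmarks are built to make.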
A survey on interpretable reinforcement learning 可解释强化学习综述
IF 7.5 3区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-04-19 DOI: 10.1007/s10994-024-06543-w
Claire Glanois, Paul Weng, Matthieu Zimmer, Dong Li, Tianpei Yang, Jianye Hao, Wulong Liu

Although deep reinforcement learning has become a promising machine learning approach for sequential decision-making problems, it is still not mature enough for high-stake domains such as autonomous driving or medical applications. In such contexts, a learned policy needs for instance to be interpretable, so that it can be inspected before any deployment (e.g., for safety and verifiability reasons). This survey provides an overview of various approaches to achieve higher interpretability in reinforcement learning (RL). To that aim, we distinguish interpretability (as an intrinsic property of a model) and explainability (as a post-hoc operation) and discuss them in the context of RL with an emphasis on the former notion. In particular, we argue that interpretable RL may embrace different facets: interpretable inputs, interpretable (transition/reward) models, and interpretable decision-making. Based on this scheme, we summarize and analyze recent work related to interpretable RL with an emphasis on papers published in the past 10 years. We also discuss briefly some related research areas and point to some potential promising research directions, notably related to the recent development of foundation models (e.g., large language models, RL from human feedback).

虽然深度强化学习已成为顺序决策问题中一种很有前途的机器学习方法，但对于自动驾驶或医疗应用等高风险领域而言，它还不够成熟。在这类场景中，学习到的策略需要具备可解释性，以便在部署前对其进行检查（例如出于安全性和可验证性的考虑）。本综述概述了在强化学习（RL）中实现更高可解释性的各种方法。为此，我们区分了可解释性（interpretability，作为模型的内在属性）和可说明性（explainability，作为事后操作），并在 RL 的背景下讨论二者，重点放在前一概念上。特别地，我们认为可解释 RL 可能包含不同的方面：可解释的输入、可解释的（转移/奖励）模型和可解释的决策。基于这一框架，我们总结并分析了与可解释 RL 相关的近期工作，重点是过去 10 年发表的论文。我们还简要讨论了一些相关研究领域，并指出了一些有前景的潜在研究方向，特别是与基础模型（如大型语言模型、基于人类反馈的 RL）的最新发展相关的方向。
引用次数: 0