Machine Learning: Latest Articles
Meta-learning for heterogeneous treatment effect estimation with closed-form solvers
IF 7.5 | Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-05-29 | DOI: 10.1007/s10994-024-06546-7
Tomoharu Iwata, Yoichi Chikahara

This article proposes a meta-learning method for estimating the conditional average treatment effect (CATE) from a small amount of observational data. The proposed method learns how to estimate CATEs from multiple tasks and uses that knowledge for unseen tasks. Based on the meta-learner framework, it decomposes the CATE estimation problem into sub-problems. For each sub-problem, we formulate estimation models using neural networks with task-shared and task-specific parameters. With this formulation, the optimal task-specific parameters can be obtained in closed form and are differentiable with respect to the task-shared parameters, which makes effective meta-learning possible. The task-shared parameters are trained to improve the expected CATE estimation performance in few-shot settings by minimizing the difference between the CATE estimated from a large amount of data and the CATE estimated from only a few data points. Our experimental results demonstrate that our method outperforms existing meta-learning approaches and CATE estimation methods.
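To make the "meta-learner decomposition plus closed-form solver" idea concrete, here is a minimal sketch of one common meta-learner, the T-learner, which fits one outcome model per treatment arm and estimates CATE(x) = mu1(x) - mu0(x). It assumes a single scalar covariate and ridge-regression heads whose parameters have a closed-form solution; the function names are hypothetical. This is not the authors' method: per the abstract, their models are neural networks whose task-specific closed-form parameters sit on top of task-shared parameters learned across tasks, and their exact decomposition may differ.

```python
from typing import Callable, List, Sequence, Tuple


def ridge_fit_1d(xs: Sequence[float], ys: Sequence[float], lam: float = 1e-3) -> Tuple[float, float]:
    """Closed-form ridge regression for y ~ a*x + b, with an L2 penalty `lam` on both a and b."""
    n = len(xs)
    sxx = sum(x * x for x in xs)
    sx = sum(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sy = sum(ys)
    # Normal equations [[sxx + lam, sx], [sx, n + lam]] @ [a, b] = [sxy, sy], solved by Cramer's rule.
    m11, m12, m22 = sxx + lam, sx, n + lam
    det = m11 * m22 - m12 * m12
    a = (sxy * m22 - m12 * sy) / det
    b = (m11 * sy - m12 * sxy) / det
    return a, b


def t_learner_cate(x: Sequence[float], t: Sequence[int], y: Sequence[float],
                   lam: float = 1e-3) -> Callable[[float], float]:
    """Hypothetical T-learner: one closed-form ridge head per arm; CATE(x) = mu1(x) - mu0(x)."""
    x1: List[float] = [xi for xi, ti in zip(x, t) if ti == 1]
    y1: List[float] = [yi for yi, ti in zip(y, t) if ti == 1]
    x0: List[float] = [xi for xi, ti in zip(x, t) if ti == 0]
    y0: List[float] = [yi for yi, ti in zip(y, t) if ti == 0]
    a1, b1 = ridge_fit_1d(x1, y1, lam)  # treated outcome model mu1
    a0, b0 = ridge_fit_1d(x0, y0, lam)  # control outcome model mu0
    return lambda x_new: (a1 * x_new + b1) - (a0 * x_new + b0)
```

For example, with treated outcomes y = 3x + 4 and control outcomes y = 2x + 1 observed at x = 0, 1, 2, 3, the true effect is x + 3, and `t_learner_cate(...)(2.0)` comes out very close to 5. Because the heads are solved in closed form, their solution is a differentiable function of the inputs, which is the property the abstract relies on for meta-learning.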

Citations: 0
Probabilistic grammars for modeling dynamical systems from coarse, noisy, and partial data
IF 7.5 | Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-05-29 | DOI: 10.1007/s10994-024-06522-1
Nina Omejc, Boštjan Gec, Jure Brence, Ljupčo Todorovski, Sašo Džeroski

Ordinary differential equations (ODEs) are a widely used formalism for the mathematical modeling of dynamical systems, a task that is omnipresent in the sciences. The paper introduces a novel method for inferring ODEs from data that extends ProGED, an equation-discovery method that lets users formalize domain-specific knowledge as probabilistic context-free grammars and use it to constrain the space of candidate equations. The extended method can discover ODEs from partial observations of dynamical systems, where only a subset of the state variables is observed. To evaluate the new method, we perform a systematic empirical comparison with state-of-the-art methods for equation discovery and system identification from complete and partial observations. The comparison uses Dynobench, a set of ten dynamical systems that extends the standard Strogatz benchmark. We compare the ability of the methods to reconstruct the known ODEs from synthetic data simulated at different temporal resolutions, and we also consider data with different levels of noise, i.e., different signal-to-noise ratios. The improved ProGED compares favourably with state-of-the-art methods for inferring ODEs from data in terms of reconstruction ability and robustness to coarse, noisy, and incomplete data.
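As an illustration of how a probabilistic context-free grammar constrains candidate equations, the sketch below samples right-hand sides for dx/dt = f(x) from a small hand-written grammar. The grammar, its probabilities, and the function name are hypothetical and chosen only for the example; ProGED has its own grammars and API, which are not used here, and fitting the constant placeholder `C` to data is omitted.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy grammar for the right-hand side of dx/dt = f(x).
# Each nonterminal maps to (right-hand side, probability) pairs that sum to 1.
# The LAST production of each nonterminal is non-recursive, so it can force termination.
GRAMMAR: Dict[str, List[Tuple[List[str], float]]] = {
    "E": [(["E", "+", "F"], 0.4), (["F"], 0.6)],
    "F": [(["F", "*", "T"], 0.3), (["T"], 0.7)],
    "T": [(["x"], 0.5), (["C"], 0.5)],  # "C" stands for a constant to be fitted later
}


def sample_expression(symbol: str, rng: random.Random, depth: int = 0, max_depth: int = 10) -> str:
    """Sample one candidate expression from the grammar by top-down expansion."""
    if symbol not in GRAMMAR:
        return symbol  # terminal symbol
    productions = GRAMMAR[symbol]
    if depth >= max_depth:
        rhs = productions[-1][0]  # non-recursive fallback guarantees termination
    else:
        rhs = rng.choices([p for p, _ in productions],
                          weights=[w for _, w in productions])[0]
    return " ".join(sample_expression(s, rng, depth + 1, max_depth) for s in rhs)
```

Every sampled string, such as `x + C * x`, is by construction a sentence of the grammar, so domain knowledge encoded in the productions (which operators and variables may appear, and how likely they are) directly shapes the search space.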

Citations: 0
Evaluating feature attribution methods in the image domain
IF 7.5 | Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-05-24 | DOI: 10.1007/s10994-024-06550-x
Arne Gevaert, Axel-Jan Rousseau, Thijs Becker, Dirk Valkenborg, Tijl De Bie, Yvan Saeys

Feature attribution maps are a popular approach for highlighting the pixels of an image that are most important to a given model prediction. Despite the recent growth in their popularity and in the number of available methods, objectively evaluating such attribution maps remains an open problem. Building on previous work in this domain, we investigate existing quality metrics and propose new metric variants for evaluating attribution maps. We confirm a recent finding that different quality metrics seem to measure different underlying properties of attribution maps, and we extend this finding to a larger selection of attribution methods, quality metrics, and datasets. We also find that metric results on one dataset do not necessarily generalize to other datasets, and that methods with desirable theoretical properties do not necessarily outperform computationally cheaper alternatives in practice. Based on these findings, we propose a general benchmarking approach to help guide the selection of attribution methods for a given use case. Implementations of the attribution metrics and our experiments are available online (https://github.com/arnegevaert/benchmark-general-imaging).
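One widely used family of attribution-quality metrics is deletion-style: remove pixels in order of decreasing attribution and track how quickly the model output drops. The sketch below assumes a toy linear model on a flattened image and a fixed baseline value for "deleted" pixels; the function names are hypothetical, and this is not the code in the repository linked above.

```python
from typing import Callable, List, Sequence


def deletion_curve(model: Callable[[List[float]], float], x: Sequence[float],
                   attribution: Sequence[float], baseline: float = 0.0) -> List[float]:
    """Model output after deleting 0, 1, ..., n pixels in order of decreasing attribution."""
    order = sorted(range(len(x)), key=lambda i: attribution[i], reverse=True)
    current = list(x)
    scores = [model(current)]
    for i in order:
        current[i] = baseline  # "delete" the pixel by replacing it with the baseline value
        scores.append(model(current))
    return scores


def normalized_auc(scores: Sequence[float]) -> float:
    """Trapezoidal area under the curve, with the x-axis rescaled to [0, 1]."""
    steps = len(scores) - 1
    return sum((scores[k] + scores[k + 1]) / 2 for k in range(steps)) / steps
```

With `weights = [3.0, 1.0, 0.5, 0.0]`, a linear model `sum(w * p)` and an all-ones input, the faithful attribution `weights` gives an area of 1.0625, while the reversed attribution gives 3.4375; for deletion metrics, a lower area indicates a more faithful map.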

Citations: 0
Classification with costly features in hierarchical deep sets
IF 7.5 | Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-05-22 | DOI: 10.1007/s10994-024-06565-4
Jaromír Janisch, Tomáš Pevný, Viliam Lisý

Classification with costly features (CwCF) is a classification problem that includes the cost of features in the optimization criterion. For each sample individually, features are acquired sequentially to maximize accuracy while minimizing the cost of the acquired features. However, existing approaches can only process data that can be expressed as fixed-length vectors. In real life, data often has a rich and complex structure that is more precisely described by formats such as XML or JSON: it is hierarchical and often contains nested lists of objects. In this work, we extend an existing deep reinforcement learning-based algorithm with hierarchical deep sets and hierarchical softmax so that it can process such data directly. The extended method has finer control over which features it acquires, and in experiments with seven datasets we show that this leads to superior performance. To demonstrate the method's practical use, we apply it to the real-life problem of classifying malicious web domains using an online service.
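The trade-off between accuracy and feature cost is usually encoded in the reinforcement-learning reward. The sketch below computes the return of one episode under an assumed reward scheme: each acquired feature costs -lam times its price, and the final classification earns 0 if correct and -1 otherwise. The feature names (written as JSON-like paths to suggest hierarchical data), the costs, and the function name are hypothetical; the paper's exact reward and action space may differ.

```python
from typing import Dict, Iterable


def cwcf_episode_return(acquired: Iterable[str], costs: Dict[str, float],
                        predicted: int, label: int, lam: float) -> float:
    """Undiscounted return of one CwCF episode under an assumed reward scheme.

    Each acquired feature contributes -lam * cost; the final classification action
    contributes 0 if correct and -1 otherwise.
    """
    acquisition_penalty = -lam * sum(costs[f] for f in acquired)
    classification_reward = 0.0 if predicted == label else -1.0
    return acquisition_penalty + classification_reward


# Hypothetical per-feature costs for nested web-domain data, keyed by path.
COSTS = {"whois.age": 2.0, "dns.records.ttl": 1.0, "content.title": 0.5}
```

The weight `lam` sets how much accuracy the agent should be willing to give up per unit of feature cost; a larger `lam` pushes the policy to classify earlier with fewer acquisitions.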

Citations: 0
CoMadOut—a robust outlier detection algorithm based on CoMAD
IF 7.5 | Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-05-07 | DOI: 10.1007/s10994-024-06521-2
Andreas Lohrer, Daniyal Kazempour, Maximilian Hünemörder, Peer Kröger

Unsupervised learning methods are well established in anomaly detection and achieve state-of-the-art performance on outlier datasets. Outliers matter because they can distort the predictions of a machine learning algorithm on a given dataset. Among PCA-based methods in particular, outliers are additionally destructive: they can not only distort the orientation and translation of the principal components but also make the outliers themselves harder to detect. To address this problem, we propose CoMadOut, a robust outlier detection algorithm that satisfies two required properties: (1) it is robust to outliers, and (2) it detects them. Depending on the variant, CoMadOut, which uses comedian PCA, defines either an inlier region with a robust noise margin based on in-distribution measures (variant CMO) or optimized scores based on out-of-distribution measures (variants CMO*), e.g., kurtosis weighting in CMO+k. These measures allow distribution-based outlier scoring for each principal component and thus an appropriate alignment of the degree of outlierness between normal and abnormal instances. Experiments comparing CoMadOut with traditional, deep, and other comparable robust outlier detection methods showed that CoMadOut is competitive with well-established methods in terms of average precision (AP), area under the precision-recall curve (AUPRC), and area under the receiver operating characteristic curve (AUROC). In summary, our approach can be seen as a robust alternative for outlier detection tasks.
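Comedian PCA replaces the covariance matrix with the comedian matrix, whose entry (i, j) is the median of (x_i - med(x_i)) * (x_j - med(x_j)), so a single extreme point barely moves it. The sketch below computes that matrix plus a simple median/MAD-based robust score per row. It deliberately stops short of CoMadOut itself: the eigendecomposition, the projection onto comedian principal components, and the CMO / CMO+k scoring described above are omitted, and the function names are hypothetical.

```python
import statistics
from typing import List, Sequence


def comedian_matrix(data: Sequence[Sequence[float]]) -> List[List[float]]:
    """COM[i][j] = median over rows of (x_i - med(x_i)) * (x_j - med(x_j))."""
    cols = [list(c) for c in zip(*data)]
    meds = [statistics.median(c) for c in cols]
    d = len(cols)
    com = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i, d):
            prods = [(a - meds[i]) * (b - meds[j]) for a, b in zip(cols[i], cols[j])]
            com[i][j] = com[j][i] = statistics.median(prods)
    return com


def mad_scores(data: Sequence[Sequence[float]]) -> List[float]:
    """Per-row outlier score: the largest robust z-score |x - med| / MAD across features."""
    cols = [list(c) for c in zip(*data)]
    meds = [statistics.median(c) for c in cols]
    # Guard against a zero MAD (e.g. a constant feature) with a tiny floor.
    mads = [statistics.median([abs(v - m) for v in c]) or 1e-12 for c, m in zip(cols, meds)]
    return [max(abs(v - m) / s for v, m, s in zip(row, meds, mads)) for row in data]
```

On five inliers near (2, 2) plus the point (100, -100), the comedian entry for the first feature stays at 1.0, whereas the ordinary variance would be dominated by the outlier; the MAD score ranks the extreme point highest.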

Citations: 0
SWoTTeD: an extension of tensor decomposition to temporal phenotyping
IF 7.5 | Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-04-30 | DOI: 10.1007/s10994-024-06545-8
Hana Sebia, Thomas Guyet, Etienne Audureau

Tensor decomposition has recently been gaining attention in the machine learning community for the analysis of individual traces, such as Electronic Health Records. However, this task becomes significantly more difficult when the data follow complex temporal patterns. This paper introduces the notion of a temporal phenotype as an arrangement of features over time and proposes SWoTTeD (Sliding Window for Temporal Tensor Decomposition), a novel method for discovering hidden temporal patterns. SWoTTeD integrates several constraints and regularizations to enhance the interpretability of the extracted phenotypes. We validate our proposal on both synthetic and real-world datasets, and we present an original use case based on data from the Greater Paris University Hospital. The results show that SWoTTeD achieves reconstructions at least as accurate as recent state-of-the-art tensor decomposition models and extracts temporal phenotypes that are meaningful to clinicians.
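A temporal phenotype, as described above, is a small features-by-time pattern rather than a single feature vector. The sketch below shows one way such phenotypes can rebuild a patient's features-by-time matrix: each phenotype of window width w is "stamped" onto the timeline wherever its pathway weight is non-zero, i.e. x_hat[f][t] = sum over r and tau of pathway[r][t - tau] * phenotypes[r][f][tau]. This convolution-style formulation, the argument layout, and the function names are assumptions made for illustration; the paper's exact model, constraints, and regularizers may differ, and fitting the factors is not shown.

```python
from typing import List

Matrix = List[List[float]]


def sliding_window_reconstruction(phenotypes: List[Matrix], pathway: Matrix, n_time: int) -> Matrix:
    """Rebuild a (features x time) matrix from temporal phenotypes (assumed formulation).

    phenotypes[r][f][tau]: weight of feature f at offset tau inside phenotype r (window width w).
    pathway[r][s]: how strongly phenotype r starts at time step s.
    """
    n_feat = len(phenotypes[0])
    width = len(phenotypes[0][0])
    x_hat = [[0.0] * n_time for _ in range(n_feat)]
    for r, pheno in enumerate(phenotypes):
        for s, activation in enumerate(pathway[r]):
            for tau in range(width):
                t = s + tau
                if t >= n_time:
                    break  # the window would run past the end of the record
                for f in range(n_feat):
                    x_hat[f][t] += activation * pheno[f][tau]
    return x_hat


def squared_error(x: Matrix, x_hat: Matrix) -> float:
    """Reconstruction loss that a decomposition would minimize."""
    return sum((a - b) ** 2 for row, row_hat in zip(x, x_hat) for a, b in zip(row, row_hat))
```

For instance, a phenotype "feature 0 on day 0, then feature 1 on day 1" activated at time step 1 produces feature 0 at t = 1 and feature 1 at t = 2, which is the kind of ordered pattern a static (non-temporal) phenotype cannot express.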

Citations: 0
Finite-time error bounds for Greedy-GQ
IF 7.5 | Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-04-30 | DOI: 10.1007/s10994-024-06542-x
Yue Wang, Yi Zhou, Shaofeng Zou

Greedy-GQ with linear function approximation, originally proposed by Maei et al. (in: Proceedings of the International Conference on Machine Learning (ICML), 2010), is a value-based off-policy algorithm for optimal control in reinforcement learning; it has a non-linear two-timescale structure and a non-convex objective function. This paper develops the tightest finite-time error bounds for it. We show that Greedy-GQ converges as fast as \(\mathcal{O}(1/\sqrt{T})\) under the i.i.d. setting and \(\mathcal{O}(\log T/\sqrt{T})\) under the Markovian setting. We further design a variant of the vanilla Greedy-GQ algorithm using a nested-loop approach and show that its sample complexity is \(\mathcal{O}(\log(1/\epsilon)\,\epsilon^{-2})\), which matches that of vanilla Greedy-GQ. Our finite-time error bounds match those of stochastic gradient descent for general smooth non-convex optimization problems, despite the additional challenge posed by the two-timescale updates. Our finite-sample analysis provides theoretical guidance on choosing step sizes for faster convergence in practice and suggests a trade-off between the convergence rate and the quality of the obtained policy. Our techniques provide a general approach for the finite-sample analysis of non-convex, two-timescale, value-based reinforcement learning algorithms.
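The "two-timescale structure" refers to two coupled parameter vectors updated with different step sizes: the main weights theta (step size alpha) and auxiliary weights w (step size beta). Below is a minimal sketch of a single Greedy-GQ update in the form we understand from Maei et al. (2010), with a greedy target action at the next state. The function and argument names are hypothetical, and the paper analyzed here may use a smoothed target policy and different step-size schedules.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def greedy_gq_step(theta: Vector, w: Vector, phi: Sequence[float], reward: float,
                   next_action_features: Sequence[Sequence[float]], gamma: float,
                   alpha: float, beta: float) -> Tuple[Vector, Vector]:
    """One linear Greedy-GQ update (sketch).

    phi: features phi(s, a) of the current state-action pair.
    next_action_features: phi(s', a') for every action a' at the next state.
    """
    # Target-policy features at s': greedy action with respect to the current theta.
    phi_hat = max(next_action_features, key=lambda f: dot(theta, f))
    delta = reward + gamma * dot(theta, phi_hat) - dot(theta, phi)  # TD error
    w_phi = dot(w, phi)
    # Main parameters (slow timescale): correction term uses the auxiliary estimate w.
    new_theta = [th + alpha * (delta * p - gamma * w_phi * ph)
                 for th, p, ph in zip(theta, phi, phi_hat)]
    # Auxiliary parameters (fast timescale): track the projected TD error.
    new_w = [wi + beta * (delta - w_phi) * p for wi, p in zip(w, phi)]
    return new_theta, new_w
```

In two-timescale schemes, beta is typically chosen larger (or decaying more slowly) than alpha, so that w effectively tracks its target for the current theta; the abstract's step-size guidance concerns exactly this choice.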

Citations: 0
Semantic-enhanced graph neural networks with global context representation
IF 7.5 | Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-04-29 | DOI: 10.1007/s10994-024-06523-0
Youcheng Qian, Xueyan Yin

Node classification is a crucial task for efficiently analyzing graph-structured data. Semi-supervised methods have been extensively studied to address the scarcity of labeled data in emerging classes. However, two fundamental weaknesses limit their performance: they either lack the ability to mine latent semantic information between nodes, or fail to capture local and global coupling dependencies between nodes simultaneously. To address these limitations, we propose a novel semantic-enhanced graph neural network with global context representation for semi-supervised node classification. Specifically, we first use a graph convolutional network to learn short-range local dependencies, considering not only the spatial topological relationships between nodes but also their semantic correlations, to enhance the representation ability of nodes. Second, we introduce an improved Transformer model to reason about long-range global pairwise relationships; it has linear computational complexity, which is particularly important for large datasets. Finally, the proposed model shows strong performance on various open datasets, demonstrating the superiority of our solution.
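The local branch described above builds on graph convolution. For reference, here is a minimal sketch of the standard GCN propagation rule of Kipf and Welling, H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W), in plain Python. It shows only the generic layer; the paper's semantic-correlation enhancement and its linear-complexity Transformer branch are not reproduced, and the helper names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj: Matrix, h: Matrix, weight: Matrix) -> Matrix:
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 @ H @ W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    # Symmetric normalization keeps feature scales comparable across node degrees.
    a_norm = [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(a_norm, h), weight)
    return [[max(0.0, v) for v in row] for row in out]
```

For two connected nodes with features 1 and 3 and an identity weight, each node's output is the normalized average 2.0, illustrating why stacked GCN layers capture mainly short-range, local dependencies.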

Citations: 0
Explaining Siamese networks in few-shot learning
IF 7.5 | Zone 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-04-29 | DOI: 10.1007/s10994-024-06529-8
Andrea Fedele, Riccardo Guidotti, Dino Pedreschi

Machine learning models often struggle to generalize accurately when tested on new class distributions that were not present in their training data. This is a significant challenge for real-world applications that require quick adaptation without retraining. To address this issue, few-shot learning frameworks, which include models such as Siamese Networks, have been proposed. Siamese Networks learn the similarity between pairs of records through a metric that can easily be extended to new, unseen classes. However, these systems lack interpretability, which can hinder their use in certain applications. To address this, we propose a data-agnostic method to explain the outcomes of Siamese Networks in the context of few-shot learning. Our explanation method is a post-hoc, perturbation-based procedure that evaluates the contribution of individual input features to the final outcome. We present two variants: one that considers each input feature independently, and another that evaluates the interplay between features. Additionally, we propose two perturbation procedures to evaluate feature contributions. Qualitative and quantitative results demonstrate that our method is able to identify highly discriminant intra-class and inter-class characteristics, as well as predictive behaviors that lead to misclassification by relying on incorrect features.
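The abstract describes the explanation method only at a high level. As a rough illustration of perturbation-based attribution for a Siamese model, the sketch below scores each feature of a query by how much its similarity to a support example drops when that feature is replaced by a baseline value (the independent variant), and scores each feature pair by the joint drop minus the two individual drops (an interplay variant). The linear-plus-tanh encoder `embed`, the cosine similarity, the zero baseline, and all function names are hypothetical choices for illustration; the authors' two perturbation procedures may work differently.

```python
import itertools

import numpy as np


def embed(x, W):
    """Hypothetical shared encoder: a single linear layer followed by tanh."""
    return np.tanh(W @ x)


def similarity(a, b, W):
    """Siamese score: cosine similarity of the two embeddings from the shared encoder."""
    ea, eb = embed(a, W), embed(b, W)
    return float(ea @ eb / (np.linalg.norm(ea) * np.linalg.norm(eb) + 1e-12))


def single_feature_contributions(query, support, W, baseline=0.0):
    """Contribution of feature i = drop in similarity when query[i] is set to `baseline`."""
    base = similarity(query, support, W)
    contrib = np.empty(len(query))
    for i in range(len(query)):
        perturbed = np.array(query, dtype=float)  # fresh copy for each perturbation
        perturbed[i] = baseline
        contrib[i] = base - similarity(perturbed, support, W)
    return contrib


def pairwise_interactions(query, support, W, baseline=0.0):
    """Interaction of (i, j) = joint similarity drop minus the two individual drops."""
    base = similarity(query, support, W)
    single = single_feature_contributions(query, support, W, baseline)
    interactions = {}
    for i, j in itertools.combinations(range(len(query)), 2):
        perturbed = np.array(query, dtype=float)
        perturbed[[i, j]] = baseline  # perturb both features at once
        joint = base - similarity(perturbed, support, W)
        interactions[(i, j)] = joint - single[i] - single[j]
    return interactions
```

With a random encoder matrix `W`, `single_feature_contributions(q, s, W)` returns one score per feature. A convenient sanity check is that a feature whose column in `W` is all zeros cannot affect the embedding, so its contribution and all of its pairwise interactions come out as zero.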

Citations: 0
Reversible jump attack to textual classifiers with modification reduction
IF 7.5 | Zone 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-04-22 | DOI: 10.1007/s10994-024-06539-6
Mingze Ni, Zhensu Sun, Wei Liu

Recent studies on adversarial examples expose vulnerabilities of natural language processing models. Existing techniques for generating adversarial examples are typically driven by deterministic hierarchical rules that are agnostic to the optimal adversarial examples, a strategy that often yields adversarial samples with a suboptimal balance between the magnitude of changes and attack success. To this end, we propose two algorithms, Reversible Jump Attack (RJA) and Metropolis–Hasting Modification Reduction (MMR), to generate highly effective adversarial examples and to improve the imperceptibility of those examples, respectively. RJA uses a novel randomization mechanism to enlarge the search space and efficiently adapts the number of perturbed words in each adversarial example. Starting from these generated adversarial examples, MMR applies a Metropolis–Hastings sampler to enhance their imperceptibility. Extensive experiments demonstrate that RJA-MMR outperforms current state-of-the-art methods in attack performance, imperceptibility, fluency, and grammatical correctness.
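To make the Metropolis–Hastings step more concrete, here is a minimal sketch of sampling-based modification reduction, assuming a word-substitution attack has already produced a position-aligned adversarial sentence. The sampler's state is the set of substitutions that are kept; each step proposes toggling one of them. States that no longer change the classifier's prediction get zero probability, and the rest are weighted by exp(-λ·k) for k kept substitutions, so the chain drifts toward sparser edits. Because the toggle proposal is symmetric, the Hastings correction cancels. The toy classifier, this target distribution, `lam`, and all names are hypothetical; this is not the authors' MMR, and it does not cover RJA.

```python
import math
import random


def toy_classifier(tokens):
    """Hypothetical sentiment classifier: label 1 if the summed word scores are positive."""
    scores = {"great": 2.0, "good": 1.0, "fine": 0.5, "bad": -1.0, "awful": -2.0, "dull": -0.5}
    return int(sum(scores.get(t, 0.0) for t in tokens) > 0)


def mh_modification_reduction(original, adversarial, classify, lam=1.0, steps=500, seed=0):
    """Sample subsets of kept substitutions, favoring fewer edits, and return the
    sparsest adversarial sentence visited. Assumes equal-length, aligned token lists."""
    rng = random.Random(seed)
    label = classify(original)
    diff = [i for i, (o, a) in enumerate(zip(original, adversarial)) if o != a]
    if not diff:
        return list(adversarial)

    def build(kept):
        # Use the adversarial word only at positions whose substitution is kept.
        return [adversarial[i] if i in kept else tok for i, tok in enumerate(original)]

    def log_target(kept):
        # log-probability -inf once the sentence no longer fools the classifier;
        # otherwise each kept substitution costs `lam`.
        if classify(build(kept)) == label:
            return -math.inf
        return -lam * len(kept)

    current = frozenset(diff)  # start from the full adversarial example
    cur_lp = log_target(current)
    if cur_lp == -math.inf:
        raise ValueError("the starting example must already change the prediction")
    best = current
    for _ in range(steps):
        i = rng.choice(diff)
        proposal = current ^ {i}  # symmetric proposal: toggle one substitution
        prop_lp = log_target(proposal)
        if prop_lp == -math.inf:
            continue  # reject: proposal is no longer adversarial
        # Metropolis acceptance; the proposal ratio q(x|x')/q(x'|x) equals 1 here.
        if rng.random() < math.exp(min(0.0, prop_lp - cur_lp)):
            current, cur_lp = proposal, prop_lp
            if len(current) < len(best):
                best = current
    return build(best)
```

For example, with `original = "the movie was good and fine".split()` and `adversarial = "the film was bad and dull".split()`, only the `good` to `bad` substitution is needed to flip the toy classifier, so the sampler returns the sentence with that single edit and the other two substitutions reverted.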

Citations: 0