Machine Learning: latest articles, page 7

Exploiting residual errors in nonlinear online prediction

IF 7.5 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Machine Learning

Pub Date : 2024-05-29 DOI: 10.1007/s10994-024-06554-7

Emirhan Ilhan, Ahmet B. Koc, Suleyman S. Kozat

We introduce a novel online (or sequential) nonlinear prediction approach that incorporates the residuals, i.e., prediction errors in the past observations, as additional features for the current data. Including the past error terms in an online prediction algorithm naturally improves prediction performance significantly since this information is essential for an algorithm to adjust itself based on its past errors. These terms are well exploited in many linear statistical models such as ARMA, SES, and Holt-Winters models. However, the past error terms are rarely or in a certain sense not optimally exploited in nonlinear prediction models since training them requires complex nonlinear state-space modeling. To this end, for the first time in the literature, we introduce a nonlinear prediction framework that utilizes not only the current features but also the past error terms as additional features, thereby exploiting the residual state information in the error terms, i.e., the model’s performance on the past samples. Since the new feature vectors contain error terms that change with every update, our algorithm jointly optimizes the model parameters and the feature vectors. We achieve this by introducing new update equations that handle the effects resulting from the changes in the feature vectors in an online manner. We use soft decision trees and neural networks as the nonlinear prediction algorithms since these are the most widely used methods in highly publicized competitions. However, as we show, our methods are generic and any algorithm supporting gradient calculations can be straightforwardly used. We show through our experiments on the well-known real-life competition datasets that our method significantly outperforms the state-of-the-art. We also provide the implementation of our approach including the source code to facilitate reproducibility (https://github.com/ahmetberkerkoc/SDT-ARMA).
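The idea of feeding past residuals back in as features can be illustrated with a minimal sketch. It is not the authors' algorithm: the function name `online_predict_with_residuals`, the one-hidden-layer network, and all hyperparameters are assumptions made for illustration. In particular, the sketch treats the residual features as constants during each gradient step, whereas the abstract describes dedicated update equations that account for how those features change with the parameters.

```python
import numpy as np

def online_predict_with_residuals(X, y, n_lags=2, lr=0.05, hidden=8, seed=0):
    """Online regressor whose input is [x_t, last n_lags residuals] (illustrative)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1] + n_lags
    W1 = rng.normal(scale=0.1, size=(hidden, d))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=0.1, size=hidden)
    b2 = 0.0
    residuals = np.zeros(n_lags)          # most recent residual first
    preds = np.empty(len(y))
    for t in range(len(y)):
        z = np.concatenate([X[t], residuals])   # augmented feature vector
        h = np.tanh(W1 @ z + b1)
        preds[t] = w2 @ h + b2                  # predict before seeing y[t]
        err = y[t] - preds[t]
        # One SGD step on 0.5 * err**2; residual features are held fixed here
        # (a simplification, not the paper's update equations).
        grad_pre = -err * w2 * (1.0 - h ** 2)
        w2 += lr * err * h
        b2 += lr * err
        W1 -= lr * np.outer(grad_pre, z)
        b1 -= lr * grad_pre
        # Shift the newest residual into the lag buffer.
        residuals = np.concatenate([[err], residuals])[:n_lags]
    return preds
```

Each prediction is made before the label is revealed, so the returned array can be scored directly against `y` as a sequential forecast.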



Citations: 0

Meta-learning for heterogeneous treatment effect estimation with closed-form solvers

IF 7.5 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Machine Learning

Pub Date : 2024-05-29 DOI: 10.1007/s10994-024-06546-7

Tomoharu Iwata, Yoichi Chikahara

This article proposes a meta-learning method for estimating the conditional average treatment effect (CATE) from a small amount of observational data. The proposed method learns how to estimate CATEs from multiple tasks and uses the knowledge for unseen tasks. In the proposed method, based on the meta-learner framework, we decompose the CATE estimation problem into sub-problems. For each sub-problem, we formulate our estimation models using neural networks with task-shared and task-specific parameters. With our formulation, we can obtain optimal task-specific parameters in a closed form that are differentiable with respect to task-shared parameters, making it possible to perform effective meta-learning. The task-shared parameters are trained such that the expected CATE estimation performance in few-shot settings is improved by minimizing the difference between a CATE estimated with a large amount of data and one estimated with only a few data points. Our experimental results demonstrate that our method outperforms the existing meta-learning approaches and CATE estimation methods.
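A closed-form task-specific solver on top of shared features can be sketched as follows. This is an illustration under stated assumptions, not the paper's model: it uses a T-learner decomposition (one outcome model per treatment arm), a single `tanh` feature layer `shared_W` as the task-shared part, and a ridge-regression head as the task-specific part. The names `ridge_head` and `cate_t_learner` are hypothetical.

```python
import numpy as np

def ridge_head(features, targets, lam=1e-2):
    """Closed-form ridge solution w = (F^T F + lam I)^{-1} F^T y."""
    k = features.shape[1]
    gram = features.T @ features + lam * np.eye(k)
    return np.linalg.solve(gram, features.T @ targets)

def cate_t_learner(shared_W, X, treatment, y, X_query, lam=1e-2):
    """Estimate CATE as the difference of two ridge heads on shared features."""
    def phi(A):
        return np.tanh(A @ shared_W)      # task-shared feature map (assumed form)
    w1 = ridge_head(phi(X[treatment == 1]), y[treatment == 1], lam)  # treated arm
    w0 = ridge_head(phi(X[treatment == 0]), y[treatment == 0], lam)  # control arm
    return phi(X_query) @ (w1 - w0)
```

Because the head is obtained by a linear solve, rewriting the same computation in an automatic-differentiation framework would let gradients flow from the CATE estimate back to `shared_W`, which is the property the abstract relies on for meta-learning.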


Citations: 0

Probabilistic grammars for modeling dynamical systems from coarse, noisy, and partial data

IF 7.5 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Machine Learning

Pub Date : 2024-05-29 DOI: 10.1007/s10994-024-06522-1

Nina Omejc, Boštjan Gec, Jure Brence, Ljupčo Todorovski, Sašo Džeroski

Ordinary differential equations (ODEs) are a widely used formalism for the mathematical modeling of dynamical systems, a task omnipresent in scientific domains. The paper introduces a novel method for inferring ODEs from data, which extends ProGED, a method for equation discovery that allows users to formalize domain-specific knowledge as probabilistic context-free grammars and use it for constraining the space of candidate equations. The extended method can discover ODEs from partial observations of dynamical systems, where only a subset of state variables can be observed. To evaluate the performance of the newly proposed method, we perform a systematic empirical comparison with alternative state-of-the-art methods for equation discovery and system identification from complete and partial observations. The comparison uses Dynobench, a set of ten dynamical systems that extends the standard Strogatz benchmark. We compare the ability of the considered methods to reconstruct the known ODEs from synthetic data simulated at different temporal resolutions. We also consider data with different levels of noise, i.e., signal-to-noise ratios. The improved ProGED compares favourably to state-of-the-art methods for inferring ODEs from data regarding reconstruction abilities and robustness to data coarseness, noise, and completeness.
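How a probabilistic context-free grammar constrains the space of candidate equations can be shown with a small sampler. The grammar below is a made-up toy (sums of products over variables `x`, `y` and a constant placeholder `c`), and `sample_expression` is a hypothetical helper; neither is taken from ProGED.

```python
import random

# Toy grammar: each nonterminal maps to (probability, right-hand side) pairs.
GRAMMAR = {
    "E": [(0.4, ["E", "+", "T"]), (0.6, ["T"])],
    "T": [(0.3, ["T", "*", "V"]), (0.7, ["V"])],
    "V": [(0.5, ["x"]), (0.3, ["y"]), (0.2, ["c"])],
}

def sample_expression(symbol="E", rng=None, grammar=GRAMMAR):
    """Sample one expression string by expanding `symbol` top-down."""
    rng = rng or random.Random()
    if symbol not in grammar:              # terminal symbol: emit as-is
        return symbol
    probs, rules = zip(*grammar[symbol])
    rule = rng.choices(rules, weights=probs, k=1)[0]
    return " ".join(sample_expression(s, rng, grammar) for s in rule)
```

Lowering the probability of the recursive rules biases sampling toward shorter right-hand sides, which is one way domain knowledge about plausible model complexity can be encoded.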



Citations: 0

Evaluating feature attribution methods in the image domain

IF 7.5 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Machine Learning

Pub Date : 2024-05-24 DOI: 10.1007/s10994-024-06550-x

Arne Gevaert, Axel-Jan Rousseau, Thijs Becker, Dirk Valkenborg, Tijl De Bie, Yvan Saeys

Feature attribution maps are a popular approach to highlight the most important pixels in an image for a given prediction of a model. Despite a recent growth in popularity and available methods, the objective evaluation of such attribution maps remains an open problem. Building on previous work in this domain, we investigate existing quality metrics and propose new variants of metrics for the evaluation of attribution maps. We confirm a recent finding that different quality metrics seem to measure different underlying properties of attribution maps, and extend this finding to a larger selection of attribution methods, quality metrics, and datasets. We also find that metric results on one dataset do not necessarily generalize to other datasets, and methods with desirable theoretical properties do not necessarily outperform computationally cheaper alternatives in practice. Based on these findings, we propose a general benchmarking approach to help guide the selection of attribution methods for a given use case. Implementations of attribution metrics and our experiments are available online (https://github.com/arnegevaert/benchmark-general-imaging).
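One widely used family of quality metrics removes pixels in order of attributed importance and tracks the model output. The sketch below illustrates that deletion-style idea only; it is not claimed to be one of the metrics studied in the paper, and `deletion_curve`, the callable `model`, and the constant `baseline` are assumptions.

```python
import numpy as np

def deletion_curve(model, image, attribution, steps=10, baseline=0.0):
    """Model score as the most-attributed pixels are progressively masked."""
    order = np.argsort(attribution.ravel())[::-1]     # most important first
    masked = image.astype(float).ravel().copy()
    scores = [model(masked.reshape(image.shape))]     # score on the intact image
    for chunk in np.array_split(order, steps):
        masked[chunk] = baseline                      # mask the next batch of pixels
        scores.append(model(masked.reshape(image.shape)))
    return np.array(scores)
```

A faithful attribution map should make the score drop quickly, so a smaller area under this curve is usually read as better.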




Citations: 0

Classification with costly features in hierarchical deep sets

IF 7.5 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Machine Learning

Pub Date : 2024-05-22 DOI: 10.1007/s10994-024-06565-4

Jaromír Janisch, Tomáš Pevný, Viliam Lisý

Classification with costly features (CwCF) is a classification problem that includes the cost of features in the optimization criteria. Individually for each sample, its features are sequentially acquired to maximize accuracy while minimizing the acquired features’ cost. However, existing approaches can only process data that can be expressed as vectors of fixed length. In real life, the data often possesses rich and complex structure, which can be more precisely described with formats such as XML or JSON. The data is hierarchical and often contains nested lists of objects. In this work, we extend an existing deep reinforcement learning-based algorithm with hierarchical deep sets and hierarchical softmax, so that it can directly process this data. The extended method has greater control over which features it can acquire and, in experiments with seven datasets, we show that this leads to superior performance. To showcase the real usage of the new method, we apply it to a real-life problem of classifying malicious web domains, using an online service.
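The trade-off between accuracy and acquisition cost can be written as a per-sample return. The sketch assumes a simple form: a penalty of `lam` times the cost of every acquired feature, plus a unit penalty for a wrong final prediction. The function name `episode_return`, the trade-off weight `lam`, and the cost table are illustrative assumptions, not details given in the abstract.

```python
def episode_return(acquired, costs, predicted, true_label, lam=0.1):
    """Negative total cost of one acquisition episode (higher is better)."""
    # Cost paid for each feature the agent chose to acquire.
    acquisition_penalty = lam * sum(costs[f] for f in acquired)
    # Unit loss if the final classification is wrong (assumed 0/1 loss).
    misclassification = 0.0 if predicted == true_label else 1.0
    return -(acquisition_penalty + misclassification)
```

With this form, acquiring an extra feature pays off only when it raises the chance of a correct prediction by more than `lam` times its cost.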


Citations: 0

CoMadOut—a robust outlier detection algorithm based on CoMAD

IF 7.5 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Machine Learning

Pub Date : 2024-05-07 DOI: 10.1007/s10994-024-06521-2

Andreas Lohrer, Daniyal Kazempour, Maximilian Hünemörder, Peer Kröger

Unsupervised learning methods are well established in the area of anomaly detection and achieve state-of-the-art performance on outlier datasets. Outliers play a significant role, since they bear the potential to distort the predictions of a machine learning algorithm on a given dataset. Especially among PCA-based methods, outliers have an additional destructive potential regarding the result: they may not only distort the orientation and translation of the principal components, they also make it more complicated to detect outliers. To address this problem, we propose the robust outlier detection algorithm CoMadOut, which satisfies two required properties: (1) being robust towards outliers and (2) detecting them. Our CoMadOut outlier detection variants using comedian PCA define, depending on the variant, an inlier region with a robust noise margin by measures of in-distribution (variant CMO) and optimized scores by measures of out-of-distribution (variants CMO*), e.g. kurtosis-weighting by CMO+k. These measures allow distribution-based outlier scoring for each principal component, and thus, an appropriate alignment of the degree of outlierness between normal and abnormal instances. Experiments comparing CoMadOut with traditional, deep and other comparable robust outlier detection methods showed that the performance of the introduced CoMadOut approach is competitive with well-established methods in terms of average precision (AP), area under the precision recall curve (AUPRC) and area under the receiver operating characteristic (AUROC) curve. In summary, our approach can be seen as a robust alternative for outlier detection tasks.
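The comedian underlying comedian PCA is the median-based analogue of the covariance, com(X, Y) = med((X - med X)(Y - med Y)). A minimal sketch of the comedian matrix follows; the helper name `comedian_matrix` is hypothetical, and the sketch covers only this matrix, not the CoMadOut scoring variants.

```python
from statistics import median

def comedian_matrix(rows):
    """Comedian matrix of a dataset given as a list of equally long rows."""
    columns = [list(col) for col in zip(*rows)]
    centered = []
    for col in columns:
        m = median(col)                         # robust location estimate
        centered.append([v - m for v in col])
    # Entry (i, j) is the median of the element-wise product of two centered columns.
    return [
        [median([a * b for a, b in zip(ci, cj)]) for cj in centered]
        for ci in centered
    ]
```

Because every entry is a median, a single extreme row has limited influence on the matrix, which is the robustness property a PCA built on it inherits.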



Citations: 0

SWoTTeD: an extension of tensor decomposition to temporal phenotyping

IF 7.5 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Machine Learning

Pub Date : 2024-04-30 DOI: 10.1007/s10994-024-06545-8

Hana Sebia, Thomas Guyet, Etienne Audureau

Tensor decomposition has recently been gaining attention in the machine learning community for the analysis of individual traces, such as Electronic Health Records. However, this task becomes significantly more difficult when the data follows complex temporal patterns. This paper introduces the notion of a temporal phenotype as an arrangement of features over time and proposes SWoTTeD (Sliding Window for Temporal Tensor Decomposition), a novel method to discover hidden temporal patterns. SWoTTeD integrates several constraints and regularizations to enhance the interpretability of the extracted phenotypes. We validate our proposal using both synthetic and real-world datasets, and we present an original use case using data from the Greater Paris University Hospital. The results show that SWoTTeD achieves at least as accurate reconstruction as recent state-of-the-art tensor decomposition models, and extracts temporal phenotypes that are meaningful for clinicians.
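One plausible reading of a temporal phenotype with a sliding window is a small features-by-time pattern that is stamped onto a patient's timeline wherever it occurs. The sketch below reconstructs one patient matrix under that reading; the model form, the function name `reconstruct`, and the array shapes are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def reconstruct(phenotypes, pathways, n_times):
    """Rebuild a (features x time) matrix from windowed phenotypes.

    phenotypes: (n_pheno, n_features, width) temporal patterns
    pathways:   (n_pheno, n_times - width + 1) occurrence weights per start time
    """
    n_pheno, n_features, width = phenotypes.shape
    X_hat = np.zeros((n_features, n_times))
    for r in range(n_pheno):
        for t in range(n_times - width + 1):
            # Add phenotype r, scaled by its weight, to the window starting at t.
            X_hat[:, t:t + width] += pathways[r, t] * phenotypes[r]
    return X_hat
```

A decomposition method would fit `phenotypes` and `pathways` so that `reconstruct` matches the observed data, with constraints such as non-negativity or sparsity added for interpretability.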


Citations: 0

Finite-time error bounds for Greedy-GQ

IF 7.5 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Machine Learning

Pub Date : 2024-04-30 DOI: 10.1007/s10994-024-06542-x

Yue Wang, Yi Zhou, Shaofeng Zou

Greedy-GQ with linear function approximation, originally proposed in Maei et al. (in: Proceedings of the international conference on machine learning (ICML), 2010), is a value-based off-policy algorithm for optimal control in reinforcement learning, and it has a nonlinear two-timescale structure with a non-convex objective function. This paper develops its tightest finite-time error bounds. We show that the Greedy-GQ algorithm converges as fast as \(\mathcal{O}(1/\sqrt{T})\) under the i.i.d. setting and \(\mathcal{O}(\log T/\sqrt{T})\) under the Markovian setting. We further design a variant of the vanilla Greedy-GQ algorithm using the nested-loop approach, and show that its sample complexity is \(\mathcal{O}(\log(1/\epsilon)\,\epsilon^{-2})\), which matches that of the vanilla Greedy-GQ. Our finite-time error bounds match those of the stochastic gradient descent algorithm for general smooth non-convex optimization problems, despite the additional challenge of the two-timescale updates. Our finite-sample analysis provides theoretical guidance on choosing step sizes for faster convergence in practice, and suggests a trade-off between the convergence rate and the quality of the obtained policy. Our techniques provide a general approach for finite-sample analysis of non-convex two-timescale value-based reinforcement learning algorithms.
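The two-timescale structure can be made concrete with a single update step. The sketch follows the commonly cited form of the Greedy-GQ update with a main weight vector `theta` and an auxiliary vector `omega`; the function name `greedy_gq_step`, the step sizes, and the discount factor are illustrative assumptions, and details such as projections or the nested-loop variant are omitted.

```python
import numpy as np

def greedy_gq_step(theta, omega, phi, reward, next_phis,
                   alpha=0.01, beta=0.1, gamma=0.9):
    """One Greedy-GQ style update.

    phi:       features of the current state-action pair
    next_phis: (n_actions, d) features of every action in the next state
    """
    # Features of the greedy action under the current value estimate.
    greedy = next_phis[np.argmax(next_phis @ theta)]
    # Temporal-difference error with respect to the greedy target.
    delta = reward + gamma * (greedy @ theta) - phi @ theta
    # Slow timescale: main parameters, with a gradient-correction term.
    new_theta = theta + alpha * (delta * phi - gamma * (omega @ phi) * greedy)
    # Fast timescale: auxiliary parameters tracking the projected TD error.
    new_omega = omega + beta * (delta - phi @ omega) * phi
    return new_theta, new_omega
```

The two step sizes `alpha` and `beta` are what the finite-time bounds give guidance on; the greedy action selection inside the update is the source of the non-convexity mentioned in the abstract.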



Citations: 0

Semantic-enhanced graph neural networks with global context representation

IF 7.5 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Machine Learning

Pub Date : 2024-04-29 DOI: 10.1007/s10994-024-06523-0

Youcheng Qian, Xueyan Yin

Node classification is a crucial task for efficiently analyzing graph-structured data. Related semi-supervised methods have been extensively studied to address the scarcity of labeled data in emerging classes. However, two fundamental weaknesses hinder their performance: an inability to mine latent semantic information between nodes, and a failure to capture local and global coupling dependencies between different nodes simultaneously. To address these limitations, we propose a novel semantic-enhanced graph neural network with global context representation for semi-supervised node classification. Specifically, we first use a graph convolution network to learn short-range local dependencies, which considers not only the spatial topological relationships between nodes but also the semantic correlation between nodes to enhance their representation ability. Second, an improved Transformer model is introduced to reason about the long-range global pairwise relationships; it has linear computational complexity, which is particularly important for large datasets. Finally, the proposed model shows strong performance on various open datasets, demonstrating the superiority of our solutions.
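The short-range aggregation performed by a graph convolution can be sketched with one standard layer, using the symmetric normalization with self-loops that is common for graph convolutional networks. This shows only the generic local step; the paper's semantic enhancement and its linear-complexity Transformer are not reproduced, and `gcn_layer` is a hypothetical name.

```python
import numpy as np

def gcn_layer(adj, H, W):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))   # inverse sqrt of node degrees
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ H @ W, 0.0)            # aggregate neighbors, then ReLU
```

Stacking k such layers mixes information from nodes up to k hops away, which is why a separate mechanism is needed for long-range pairwise relationships.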



Citations: 0

Explaining Siamese networks in few-shot learning

IF 7.5 Zone 3 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Machine Learning

Pub Date : 2024-04-29 DOI: 10.1007/s10994-024-06529-8

Andrea Fedele, Riccardo Guidotti, Dino Pedreschi

Machine learning models often struggle to generalize accurately when tested on new class distributions that were not present in their training data. This is a significant challenge for real-world applications that require quick adaptation without the need for retraining. To address this issue, few-shot learning frameworks, which include models such as Siamese Networks, have been proposed. Siamese Networks learn similarity between pairs of records through a metric that can be easily extended to new, unseen classes. However, these systems lack interpretability, which can hinder their use in certain applications. To address this, we propose a data-agnostic method to explain the outcomes of Siamese Networks in the context of few-shot learning. Our explanation method is based on a post-hoc perturbation-based procedure that evaluates the contribution of individual input features to the final outcome. As such, it falls under the category of post-hoc explanation methods. We present two variants, one that considers each input feature independently, and another that evaluates the interplay between features. Additionally, we propose two perturbation procedures to evaluate feature contributions. Qualitative and quantitative results demonstrate that our method is able to identify highly discriminant intra-class and inter-class characteristics, as well as predictive behaviors that lead to misclassification by relying on incorrect features.
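The general idea of perturbation-based attribution for a similarity model can be illustrated with a small sketch. This is not the authors' implementation: the toy encoder (`embed`), the distance-based score (`similarity`), the zero `baseline`, and the helper name `feature_contributions` are all assumptions made for illustration, and only the variant that perturbs each feature independently is shown.

```python
import numpy as np

def embed(x, W):
    """Toy shared encoder (assumption): one linear layer followed by tanh."""
    return np.tanh(W @ x)

def similarity(a, b, W):
    """Siamese-style score: negative Euclidean distance between embeddings."""
    return -np.linalg.norm(embed(a, W) - embed(b, W))

def feature_contributions(query, support, W, baseline=0.0):
    """Per-feature contribution: drop in similarity when one query feature
    is replaced by a baseline value (hypothetical perturbation choice)."""
    base_score = similarity(query, support, W)
    contributions = np.zeros(len(query))
    for i in range(len(query)):
        perturbed = query.copy()
        perturbed[i] = baseline  # perturb a single feature, keep the rest
        contributions[i] = base_score - similarity(perturbed, support, W)
    return contributions

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))                  # random encoder weights
query = rng.normal(size=5)                   # record to be explained
support = query + 0.1 * rng.normal(size=5)   # nearby record from a class
scores = feature_contributions(query, support, W)
print(np.round(scores, 3))
```

A positive entry means that replacing the feature lowered the similarity to the support record, so the feature supported the match; a negative entry means the feature worked against it.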



Citations: 0