Machine Learning: Latest Articles

Deep latent force models: ODE-based process convolutions for Bayesian deep learning.
IF 4.3; Zone 3 (Computer Science); Q2 (Computer Science, Artificial Intelligence); Pub Date: 2025-01-01; Epub Date: 2025-07-15; DOI: 10.1007/s10994-025-06824-y
Thomas Baldwin-McDonald, Xinxing Shi, Mingxin Shen, Mauricio A Álvarez

Modelling the behaviour of highly nonlinear dynamical systems with robust uncertainty quantification is a challenging task which typically requires approaches specifically designed to address the problem at hand. We introduce a domain-agnostic model to address this issue termed the deep latent force model (DLFM), a deep Gaussian process with physics-informed kernels at each layer, derived from ordinary differential equations using the framework of process convolutions. Two distinct formulations of the DLFM are presented which utilise weight-space and variational inducing points-based Gaussian process approximations, both of which are amenable to doubly stochastic variational inference. We present empirical evidence of the capability of the DLFM to capture the dynamics present in highly nonlinear real-world multi-output time series data. Additionally, we find that the DLFM is capable of achieving comparable performance to a range of non-physics-informed probabilistic models on benchmark univariate regression tasks. We also empirically assess the negative impact of the inducing points framework on the extrapolation capabilities of LFM-based models.
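
To make "kernels derived from ordinary differential equations using process convolutions" concrete, here is a minimal worked form. The abstract does not say which ODEs the DLFM uses, so the first-order equation, the zero initial condition $f(0) = 0$, and the symbols $\gamma$ (decay), $S$ (sensitivity), $u$ (latent force), $G$ (Green's function) and $k_u$ (latent kernel) are all assumptions chosen for illustration.

```latex
\begin{align}
  \frac{\mathrm{d}f(t)}{\mathrm{d}t} + \gamma f(t) &= S\,u(t),
    \qquad u \sim \mathcal{GP}\bigl(0,\, k_u(t, t')\bigr), \\
  f(t) &= \int_0^t G(t - \tau)\,u(\tau)\,\mathrm{d}\tau,
    \qquad G(s) = S\,e^{-\gamma s}, \\
  k_f(t, t') &= \int_0^t \!\! \int_0^{t'} G(t - \tau)\,G(t' - \tau')\,
    k_u(\tau, \tau')\,\mathrm{d}\tau'\,\mathrm{d}\tau .
\end{align}
```

Because $f$ is a linear functional of the Gaussian process $u$, it is itself a Gaussian process, and its kernel $k_f$ carries the ODE's dynamics. A deep model in this spirit would presumably stack such layers.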

Machine Learning, vol. 114, no. 8, art. 192. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12263784/pdf/
Citations: 0
Ensuring medical AI safety: interpretability-driven detection and mitigation of spurious model behavior and associated data.
IF 2.9; Zone 3 (Computer Science); Q2 (Computer Science, Artificial Intelligence); Pub Date: 2025-01-01; Epub Date: 2025-08-12; DOI: 10.1007/s10994-025-06834-w
Frederik Pahde, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek

Deep neural networks are increasingly employed in high-stakes medical applications, despite their tendency for shortcut learning in the presence of spurious correlations, which can have potentially fatal consequences in practice. Whereas a multitude of works address either the detection or mitigation of such shortcut behavior in isolation, the Reveal2Revise approach provides a comprehensive bias mitigation framework combining these steps. However, effectively addressing these biases often requires substantial labeling efforts from domain experts. In this work, we review the steps of the Reveal2Revise framework and enhance it with semi-automated interpretability-based bias annotation capabilities. This includes methods for the sample- and feature-level bias annotation, providing valuable information for bias mitigation methods to unlearn the undesired shortcut behavior. We show the applicability of the framework using four medical datasets across two modalities, featuring controlled and real-world spurious correlations caused by data artifacts. We successfully identify and mitigate these biases in VGG16, ResNet50, and contemporary Vision Transformer models, ultimately increasing their robustness and applicability for real-world medical tasks. Our code is available at https://github.com/frederikpahde/medical-ai-safety.

Machine Learning, vol. 114, no. 9, art. 206. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12343733/pdf/
Citations: 0
Mining exceptional social behavior on attributed interaction networks.
IF 2.9; Zone 3 (Computer Science); Q2 (Computer Science, Artificial Intelligence); Pub Date: 2025-01-01; Epub Date: 2025-10-10; DOI: 10.1007/s10994-025-06831-z
Martin Atzmueller, Carolina Centeio Jorge, Cláudio Rebelo de Sá, Behzad M Heravi, Jenny L Gibson, Rosaldo J F Rossetti

Social interactions are prevalent in our lives. They can be observed online, e.g., via social media, but also offline, specifically using sensors. In such contexts, time-stamped interactions are typically recorded; these can also be inferred from the real-time location of humans. Such interaction data can then be modeled as so-called social interaction networks, which can be analyzed with a variety of approaches. A prominent research direction is the detection of patterns describing specific subgroups with exceptional behavioral characteristics, given some measure of interest. In the standard case of plain graphs modeling the interaction networks, methods for identifying such subgroups mainly focus on structural characteristics of the network and/or the induced subgraph. For attributed social networks, additional attribute information can be exploited. This paper proposes to focus on the dyadic structure of attributed social interaction networks, thus enabling a compositional perspective for identifying interesting subgroup patterns. Specifically, we can then analyze spatio-temporal data modeled as attributed social interaction networks to identify exceptional social behavior. The presented approach adapts local pattern mining using subgroup discovery to the dyadic setting, exploiting attribute information of the spatio-temporal attributed interaction networks. Specific characteristics of social interactions, i.e., duration and frequency, are considered for identifying subgroups that capture social behavior deviating from the norm. For subgroup discovery, we propose corresponding interestingness measures in the form of seven novel quality functions and discuss their properties. In our experiments, we evaluate the presented approach on four real-world datasets of face-to-face interactions in academic conferencing and school playground contexts, demonstrating its efficacy. Our results indicate that the proposed method returns interesting, meaningful, and valid findings.

Machine Learning, vol. 114, no. 11, art. 243. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12513876/pdf/
Citations: 0
Boolean matrix logic programming for active learning of gene functions in genome-scale metabolic network models.
IF 2.9; Zone 3 (Computer Science); Q2 (Computer Science, Artificial Intelligence); Pub Date: 2025-01-01; Epub Date: 2025-10-19; DOI: 10.1007/s10994-025-06868-0
Lun Ai, Stephen H Muggleton, Shi-Shun Liang, Geoff S Baldwin

Reasoning about hypotheses and updating knowledge through empirical observations are central to scientific discovery. In this work, we applied logic-based machine learning methods to drive biological discovery by guiding experimentation. Genome-scale metabolic network models (GEMs) - comprehensive representations of metabolic genes and reactions - are widely used to evaluate genetic engineering of biological systems. However, GEMs often fail to accurately predict the behaviour of genetically engineered cells, primarily due to incomplete annotations of gene interactions. The task of learning the intricate genetic interactions within GEMs presents computational and empirical challenges. To efficiently predict using GEM, we describe a novel approach called Boolean Matrix Logic Programming (BMLP) by leveraging Boolean matrices to evaluate large logic programs. We developed a new system, [Formula: see text], which guides cost-effective experimentation and uses interpretable logic programs to encode a state-of-the-art GEM of a model bacterial organism. Notably, [Formula: see text] successfully learned the interaction between a gene pair with fewer training examples than random experimentation, overcoming the increase in experimental design space. [Formula: see text] enables rapid optimisation of metabolic models to reliably engineer biological systems for producing useful compounds. It offers a realistic approach to creating a self-driving lab for biological discovery, which would then facilitate microbial engineering for practical applications.
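
The idea of using Boolean matrices to evaluate a logic program can be illustrated on a small recursive program. The sketch below is not the authors' system; it only shows, under our own assumptions, how the two-clause reachability program `path(X,Y) :- edge(X,Y).` and `path(X,Y) :- edge(X,Z), path(Z,Y).` can be evaluated by repeated Boolean matrix products until a fixpoint is reached. The function names and the toy graph are hypothetical.

```python
def bool_matmul(a, b):
    """Boolean matrix product: result[i][j] is True if some k links i to j."""
    inner, cols = len(b), len(b[0])
    return [
        [any(row[k] and b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def bool_or(a, b):
    """Element-wise OR of two Boolean matrices of equal shape."""
    return [[x or y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def evaluate_path(edge):
    """Least fixpoint of: path = edge OR (edge * path)."""
    path = [row[:] for row in edge]
    while True:
        updated = bool_or(path, bool_matmul(edge, path))
        if updated == path:  # no new facts derived, fixpoint reached
            return path
        path = updated


# Hypothetical toy graph: 0 -> 1 -> 2, node 3 isolated.
T, F = True, False
edge = [
    [F, T, F, F],
    [F, F, T, F],
    [F, F, F, F],
    [F, F, F, F],
]
closure = evaluate_path(edge)
print(closure[0][2])  # path(0, 2) holds via node 1
```

Each matrix entry stands for one ground fact, so a whole derivation step for all pairs is a single matrix product rather than one resolution step per fact.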

Machine Learning, vol. 114, no. 11, art. 254. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12535945/pdf/
Citations: 0
Computing the distance between unbalanced distributions: the flat metric.
IF 2.9; Zone 3 (Computer Science); Q2 (Computer Science, Artificial Intelligence); Pub Date: 2025-01-01; Epub Date: 2025-07-24; DOI: 10.1007/s10994-025-06828-8
Henri Schmidt, Christian Düll

We provide an implementation to compute the flat metric in any dimension. The flat metric, also called dual bounded Lipschitz distance, generalizes the well-known Wasserstein distance $W_1$ to the case that the distributions are of unequal total mass. Thus, our implementation adapts very well to mass differences and uses them to distinguish between different distributions. This is of particular interest for unbalanced optimal transport tasks and for the analysis of data distributions where the sample size is important or normalization is not possible. The core of the method is based on a neural network to determine an optimal test function realizing the distance between two given measures. Special focus was put on achieving comparability of pairwise computed distances from independently trained networks. We tested the quality of the output in several experiments where ground truth was available as well as with simulated data.
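
For orientation, the flat metric between two finite measures $\mu$ and $\nu$ is the supremum of $\int f \,\mathrm{d}(\mu - \nu)$ over test functions $f$ with $\|f\|_\infty \le 1$ and Lipschitz constant at most 1. The sketch below does not reproduce the neural-network method of the paper. It only evaluates the simplest case with a closed form, two weighted point masses $a\,\delta_x$ and $b\,\delta_y$ on the real line, to show how mass difference and transport cost both enter. The function name is hypothetical.

```python
def flat_distance_two_diracs(a, x, b, y):
    """Flat metric between a*delta_x and b*delta_y on the real line (a, b >= 0).

    The optimal test function takes the value +1 at the heavier point and
    drops as far as the constraints |f| <= 1 and Lip(f) <= 1 allow at the other.
    """
    if a < 0 or b < 0:
        raise ValueError("masses must be non-negative")
    gap = abs(x - y)
    # |a - b|: unmatched mass; min(a, b) * min(gap, 2): capped transport cost.
    return abs(a - b) + min(a, b) * min(gap, 2.0)


print(flat_distance_two_diracs(1.0, 0.0, 1.0, 0.5))   # equal masses, close: equals W_1
print(flat_distance_two_diracs(1.0, 0.0, 1.0, 10.0))  # equal masses, far: capped at 2
print(flat_distance_two_diracs(3.0, 0.0, 1.0, 0.0))   # same location: mass difference
```

With equal masses and nearby points the value coincides with $W_1$. With unequal masses the distance stays finite and positive, which is what lets the metric separate distributions by total mass.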

Machine Learning, vol. 114, no. 8, art. 195. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12289810/pdf/
Citations: 0
Interpretable optimisation-based approach for hyper-box classification.
IF 4.3; Zone 3 (Computer Science); Q2 (Computer Science, Artificial Intelligence); Pub Date: 2025-01-01; Epub Date: 2025-02-06; DOI: 10.1007/s10994-024-06643-7
Georgios I Liapis, Sophia Tsoka, Lazaros G Papageorgiou

Data classification is considered a fundamental research subject within the machine learning community. Researchers seek the improvement of machine learning algorithms in not only accuracy, but also interpretability. Interpretable algorithms allow humans to easily understand the decisions that a machine learning model makes, which is challenging for black box models. Mathematical programming-based classification algorithms have attracted considerable attention due to their ability to effectively compete with leading-edge algorithms in terms of both accuracy and interpretability. Meanwhile, the training of a hyper-box classifier can be mathematically formulated as a Mixed Integer Linear Programming (MILP) model and the predictions combine accuracy and interpretability. In this work, an optimisation-based approach is proposed for multi-class data classification using a hyper-box representation, thus facilitating the extraction of compact IF-THEN rules. The key novelty of our approach lies in the minimisation of the number and length of the generated rules for enhanced interpretability. Through a number of real-world datasets, it is demonstrated that the algorithm exhibits favorable performance when compared to well-known alternatives in terms of prediction accuracy and rule set simplicity.
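
The link between hyper-boxes and IF-THEN rules can be sketched as follows. This is only an illustration of the representation: the boxes below are hand-written and hypothetical, whereas the paper obtains them by solving a MILP that also minimises the number and length of rules. The class and function names are ours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HyperBox:
    """Axis-aligned box with a class label; one box reads as one rule."""
    label: str
    lower: tuple
    upper: tuple

    def contains(self, sample):
        return all(lo <= v <= hi
                   for lo, v, hi in zip(self.lower, sample, self.upper))

    def as_rule(self, feature_names):
        conditions = [f"{lo} <= {name} <= {hi}"
                      for name, lo, hi in zip(feature_names, self.lower, self.upper)]
        return "IF " + " AND ".join(conditions) + f" THEN class = {self.label}"


def predict(boxes, sample, default=None):
    """Return the label of the first box containing the sample."""
    for box in boxes:
        if box.contains(sample):
            return box.label
    return default


# Hypothetical boxes over two features.
boxes = [
    HyperBox("A", lower=(0.0, 0.0), upper=(0.5, 1.0)),
    HyperBox("B", lower=(0.6, 0.0), upper=(1.0, 1.0)),
]
print(boxes[0].as_rule(("x1", "x2")))
print(predict(boxes, (0.2, 0.3)))
```

Fewer boxes means fewer rules, and a box that leaves a feature unconstrained yields a shorter rule, which is presumably why rule count and length serve as the interpretability objective.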

Machine Learning, vol. 114, no. 3, art. 51. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11861270/pdf/
Citations: 0
Offline reinforcement learning for learning to dispatch for job shop scheduling.
IF 4.3; Zone 3 (Computer Science); Q2 (Computer Science, Artificial Intelligence); Pub Date: 2025-01-01; Epub Date: 2025-07-15; DOI: 10.1007/s10994-025-06826-w
Jesse van Remmerden, Zaharah Bukhsh, Yingqian Zhang

The Job Shop Scheduling Problem (JSSP) is a complex combinatorial optimization problem. While online Reinforcement Learning (RL) has shown promise by quickly finding acceptable solutions for JSSP, it faces key limitations: it requires extensive training interactions from scratch leading to sample inefficiency, cannot leverage existing high-quality solutions from traditional methods like Constraint Programming (CP), and requires simulated environments to train in, which are impracticable to build for complex scheduling environments. We introduce Offline Learned Dispatching (Offline-LD), an offline reinforcement learning approach for JSSP, which addresses these limitations by learning from historical scheduling data. Our approach is motivated by scenarios where historical scheduling data and expert solutions are available or scenarios where online training of RL approaches with simulated environments is impracticable. Offline-LD introduces maskable variants of two Q-learning methods, namely, Maskable Quantile Regression DQN (mQRDQN) and discrete maskable Soft Actor-Critic (d-mSAC), that are able to learn from historical data, through Conservative Q-Learning (CQL), whereby we present a novel entropy bonus modification for d-mSAC, for maskable action spaces. Moreover, we introduce a novel reward normalization method for JSSP in an offline RL setting. Our experiments demonstrate that Offline-LD outperforms online RL on both generated and benchmark instances when trained on only 100 solutions generated by CP. Notably, introducing noise to the expert dataset yields comparable or superior results to using the expert dataset, with the same number of instances, a promising finding for real-world applications, where data is inherently noisy and imperfect.
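
A minimal sketch of what "maskable" could mean for a Q-learning method trained with CQL: infeasible dispatching actions are excluded both when picking the greedy action and when computing the conservative penalty. The abstract does not spell out how Offline-LD combines masking with CQL, so restricting the log-sum-exp to valid actions is our assumption, and the function names and numbers are hypothetical.

```python
import math


def masked_greedy_action(q_values, mask):
    """Index of the highest-valued action among those the mask allows."""
    valid = [i for i, allowed in enumerate(mask) if allowed]
    if not valid:
        raise ValueError("at least one action must be valid")
    return max(valid, key=lambda i: q_values[i])


def masked_cql_penalty(q_values, mask, dataset_action):
    """CQL-style regulariser: logsumexp over valid actions minus Q of the logged action."""
    if not mask[dataset_action]:
        raise ValueError("logged action must be valid under the mask")
    valid_q = [q for q, allowed in zip(q_values, mask) if allowed]
    peak = max(valid_q)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(q - peak) for q in valid_q))
    return log_sum_exp - q_values[dataset_action]


# Hypothetical state with three candidate operations; the second is infeasible.
q_values = [1.0, 5.0, 2.0]
mask = [True, False, True]
action = masked_greedy_action(q_values, mask)
penalty = masked_cql_penalty(q_values, mask, dataset_action=2)
print(action, penalty)
```

The penalty is never negative and shrinks as the logged action's value dominates the other valid actions, which pushes the learned Q-function to stay close to the behaviour seen in the historical schedules.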

{"title":"Offline reinforcement learning for learning to dispatch for job shop scheduling.","authors":"Jesse van Remmerden, Zaharah Bukhsh, Yingqian Zhang","doi":"10.1007/s10994-025-06826-w","DOIUrl":"10.1007/s10994-025-06826-w","url":null,"abstract":"<p><p>The Job Shop Scheduling Problem (JSSP) is a complex combinatorial optimization problem. While online Reinforcement Learning (RL) has shown promise by quickly finding acceptable solutions for JSSP, it faces key limitations: it requires extensive training interactions from scratch leading to sample inefficiency, cannot leverage existing high-quality solutions from traditional methods like Constraint Programming (CP), and require simulated environments to train in, which are impracticable to build for complex scheduling environments. We introduce Offline Learned Dispatching (Offline-LD), an offline reinforcement learning approach for JSSP, which addresses these limitations by learning from historical scheduling data. Our approach is motivated by scenarios where historical scheduling data and expert solutions are available or scenarios where online training of RL approaches with simulated environments is impracticable. Offline-LD introduces maskable variants of two Q-learning methods, namely, Maskable Quantile Regression DQN (mQRDQN) and discrete maskable Soft Actor-Critic (d-mSAC), that are able to learn from historical data, through Conservative Q-Learning (CQL), whereby we present a novel entropy bonus modification for d-mSAC, for maskable action spaces. Moreover, we introduce a novel reward normalization method for JSSP in an offline RL setting. Our experiments demonstrate that Offline-LD outperforms online RL on both generated and benchmark instances when trained on only 100 solutions generated by CP. 
Notably, introducing noise to the expert dataset yields comparable or superior results to using the expert dataset, with the same amount of instances, a promising finding for real-world applications, where data is inherently noisy and imperfect.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"114 8","pages":"191"},"PeriodicalIF":4.3,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12263752/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144660910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
On metafeatures’ ability of implicit concept identification
IF 7.5 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-09-18 DOI: 10.1007/s10994-024-06612-0
Joanna Komorniczak, Paweł Ksieniewicz

Concept drift in data stream processing remains an intriguing challenge and a popular research topic. Methods that actively process data streams usually employ drift detectors, whose performance is often based on monitoring the variability of different stream properties. This publication provides an overview and analysis of the variability of metafeatures describing data streams with concept drift. Five experiments conducted on synthetic, semi-synthetic, and real-world data streams examine the ability of over 160 metafeatures from 9 categories to recognize concepts in non-stationary data streams. The work reveals the distinctions between the considered sources of streams and singles out 17 metafeatures with a high capacity for concept identification.

Citations: 0
Towards a foundation large events model for soccer
IF 7.5 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-09-13 DOI: 10.1007/s10994-024-06606-y
Tiago Mendes-Neves, Luís Meireles, João Mendes-Moreira

This paper introduces the Large Events Model (LEM) for soccer, a novel deep learning framework for generating and analyzing soccer matches. The framework can simulate games from a given game state, with its primary output being the ensuing probabilities and events from multiple simulations. These can provide insights into match dynamics and underlying mechanisms. We discuss the framework’s design, features, and methodologies, including model optimization, data processing, and evaluation techniques. The models within this framework are developed to predict specific aspects of soccer events, such as event type, success likelihood, and further details. In an applied context, we showcase the estimation of xP+, a metric estimating a player’s contribution to the team’s points earned. This work ultimately advances sports event prediction and its practical applications, and highlights the potential of this kind of method.

Citations: 0
Persistent Laplacian-enhanced algorithm for scarcely labeled data classification
IF 7.5 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-09-13 DOI: 10.1007/s10994-024-06616-w
Gokul Bhusal, Ekaterina Merkurjev, Guo-Wei Wei

The success of many machine learning (ML) methods depends crucially on having large amounts of labeled data. However, obtaining enough labeled data can be expensive, time-consuming, and subject to ethical constraints for many applications. One approach that has shown tremendous value in addressing this challenge is semi-supervised learning (SSL); this technique utilizes both labeled and unlabeled data during training, typically with far less labeled data than unlabeled data, since the latter is often relatively easy and inexpensive to obtain. In fact, SSL methods are particularly useful in applications where labeling data is especially expensive, such as medical analysis, natural language processing, or speech recognition. A subset of SSL methods that have achieved great success in various domains involves algorithms that integrate graph-based techniques. These procedures are popular due to the vast amount of information provided by the graphical framework. In this work, we propose an algebraic topology-based semi-supervised method called persistent Laplacian-enhanced graph MBO by integrating persistent spectral graph theory with the classical Merriman–Bence–Osher (MBO) scheme. Specifically, we use a filtration procedure to generate a sequence of chain complexes and associated families of simplicial complexes, from which we construct a family of persistent Laplacians. Overall, it is a very efficient procedure that requires much less labeled data to perform well compared to many ML techniques, and it can be adapted for both small and large datasets. We evaluate the performance of our method on classification, and the results indicate that the technique outperforms other existing semi-supervised algorithms.

Citations: 0