
Latest articles in Machine Learning

Distribution-free conformal joint prediction regions for neural marked temporal point processes
IF 7.5 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-07-23 · DOI: 10.1007/s10994-024-06594-z
Victor Dheur, Tanguy Bosser, Rafael Izbicki, Souhaib Ben Taieb

Sequences of labeled events observed at irregular intervals in continuous time are ubiquitous across various fields. Temporal Point Processes (TPPs) provide a mathematical framework for modeling these sequences, enabling inferences such as predicting the arrival time of future events and their associated label, called the mark. However, due to model misspecification or lack of training data, these probabilistic models may provide a poor approximation of the true, unknown underlying process, so that prediction regions extracted from them are unreliable estimates of the underlying uncertainty. This paper develops more reliable methods for uncertainty quantification in neural TPP models via the framework of conformal prediction. A primary objective is to generate a distribution-free joint prediction region for an event’s arrival time and mark, with a finite-sample marginal coverage guarantee. A key challenge is to handle both a strictly positive, continuous response and a categorical response, without distributional assumptions. We first consider a simple but overly conservative approach that combines individual prediction regions for the event’s arrival time and mark. Then, we introduce a more effective method based on bivariate highest density regions derived from the joint predictive density of arrival times and marks. By leveraging the dependencies between these two variables, this method excludes unlikely combinations of the two, resulting in sharper prediction regions while still attaining the pre-specified coverage level. We also explore the generation of individual univariate prediction regions for events’ arrival times and marks through conformal regression and classification techniques. Moreover, we evaluate the stronger notion of conditional coverage. Finally, through extensive experimentation on both simulated and real-world datasets, we assess the validity and efficiency of these methods.
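The finite-sample marginal coverage guarantee mentioned in the abstract comes from the conformal calibration step. As a minimal illustration of that mechanism (a generic split-conformal interval for a single continuous response, not the paper's bivariate highest-density-region method; the constant predictor, data, and `alpha` below are invented for the sketch):

```python
import numpy as np

def split_conformal_interval(cal_y, cal_pred, test_pred, alpha=0.1):
    # Nonconformity scores on a held-out calibration set.
    scores = np.abs(cal_y - cal_pred)
    n = len(scores)
    # Finite-sample-corrected quantile: under exchangeability this yields
    # marginal coverage of at least 1 - alpha on a fresh test point.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    return test_pred - q, test_pred + q

rng = np.random.default_rng(0)
y = rng.normal(size=1000)
point_pred = np.zeros(1000)  # a deliberately crude constant predictor
lo, hi = split_conformal_interval(y[:500], point_pred[:500], 0.0, alpha=0.1)
coverage = np.mean((y[500:] >= lo) & (y[500:] <= hi))
```

Even with a poor point predictor, the interval attains roughly the requested 90% coverage; the paper's contribution is to sharpen such regions for the joint (time, mark) response by exploiting the joint predictive density.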

Citations: 0
Extrapolation is not the same as interpolation
IF 7.5 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-07-23 · DOI: 10.1007/s10994-024-06591-2
Yuxuan Wang, Ross D. King

We propose a new machine learning formulation designed specifically for extrapolation. The textbook way to apply machine learning to drug design is to learn a univariate function that maps a drug (structure) to a real number (the activity): f(drug) → activity. However, experience in real-world drug design suggests that this formulation of the problem is not quite correct. Specifically, what one is really interested in is extrapolation: predicting the activity of new drugs with higher activity than any existing ones. Our new formulation for extrapolation is based on learning a bivariate function that predicts the difference in activities of two drugs, F(drug1, drug2) → difference in activity, followed by the use of ranking algorithms. This formulation is general and model-agnostic, suitable for finding samples with target values beyond the target-value range of the training set. We applied the formulation with support vector machines, random forests, and gradient boosting machines, and compared it with standard regression on thousands of drug design, gene expression, and material property datasets. The test-set extrapolation metric was the identification of examples with greater values than any in the training set, as well as of top-performing examples (within the top 10% of the whole dataset). On this metric, our pairwise formulation vastly outperformed standard regression, and its proposed variations showed a consistent outperformance as well. An application to the stock selection problem further confirmed the advantage of the pairwise formulation.
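The pairwise idea can be made concrete with a toy sketch (ours, not the paper's code): learn F on feature differences, then rank unseen candidates against a known high-activity anchor. The synthetic linear "activity" function and all variable names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: activity y = 2*x0 + x1 on features drawn from [0, 1].
X = rng.uniform(0, 1, size=(200, 2))
y = 2 * X[:, 0] + X[:, 1]

# Pairwise formulation: learn F(x_i, x_j) ~ y_i - y_j from feature differences.
i, j = rng.integers(0, 200, 2000), rng.integers(0, 200, 2000)
D = X[i] - X[j]                     # paired feature differences
t = y[i] - y[j]                     # activity differences
w, *_ = np.linalg.lstsq(D, t, rcond=None)

# Rank unseen candidates by predicted activity gain over a known anchor.
anchor = X[np.argmax(y)]
candidates = rng.uniform(0, 2, size=(100, 2))  # beyond the training range
scores = (candidates - anchor) @ w             # predicted difference in activity
best = candidates[np.argmax(scores)]
```

Because only differences are modeled, the ranking remains meaningful for candidates whose target values lie outside the training range, which is exactly the extrapolation setting the formulation targets.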

Citations: 0
Towards efficient AutoML: a pipeline synthesis approach leveraging pre-trained transformers for multimodal data
IF 7.5 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-07-19 · DOI: 10.1007/s10994-024-06568-1
Ambarish Moharil, Joaquin Vanschoren, Prabhant Singh, Damian Tamburri

This paper introduces an Automated Machine Learning (AutoML) framework specifically designed to efficiently synthesize end-to-end multimodal machine learning pipelines. Traditional reliance on the computationally demanding Neural Architecture Search is minimized through the strategic integration of pre-trained transformer models. This innovative approach enables the effective unification of diverse data modalities into high-dimensional embeddings, streamlining the pipeline development process. We leverage an advanced Bayesian Optimization strategy, informed by meta-learning, to facilitate the warm-starting of the pipeline synthesis, thereby enhancing computational efficiency. Our methodology demonstrates its potential to create advanced and custom multimodal pipelines within limited computational resources. Extensive testing across 23 varied multimodal datasets indicates the promise and utility of our framework in diverse scenarios. The results contribute to the ongoing efforts in the AutoML field, suggesting new possibilities for efficiently handling complex multimodal data. This research represents a step towards developing more efficient and versatile tools in multimodal machine learning pipeline development, acknowledging the collaborative and ever-evolving nature of this field.

Citations: 0
ICM ensemble with novel betting functions for concept drift
IF 7.5 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-07-17 · DOI: 10.1007/s10994-024-06593-0
Charalambos Eliades, Harris Papadopoulos

This study builds upon our previous work by introducing a refined Inductive Conformal Martingale (ICM) approach for addressing Concept Drift. Specifically, we enhance our previously proposed CAUTIOUS betting function to incorporate multiple density estimators for improving detection ability. We also combine this betting function with two base estimators that have not been previously utilized within the ICM framework: the Interpolated Histogram and Nearest Neighbor Density Estimators. We assess these extensions using both a single ICM and an ensemble of ICMs. For the latter, we conduct a comprehensive experimental investigation into the influence of the ensemble size on prediction accuracy and the number of available predictions. Our experimental results on four benchmark datasets demonstrate that the proposed approach surpasses our previous methodology in terms of performance while matching or in many cases exceeding that of three contemporary state-of-the-art techniques.
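The betting-function mechanism behind conformal test martingales can be illustrated with the classic "power" betting function (a much simpler relative of the CAUTIOUS function studied in the paper; the drift scenario and all constants below are invented for the sketch). The martingale multiplies a bet b(p) = ε·p^(ε−1) on each conformal p-value and grows only when p-values depart from uniformity:

```python
import numpy as np

def power_martingale(p_values, eps=0.92):
    # Test martingale with betting function b(p) = eps * p**(eps - 1).
    # Under exchangeability p-values are uniform and E[b(p)] = 1, so the
    # martingale stays flat; persistently small p-values make it grow.
    log_m = np.cumsum(np.log(eps) + (eps - 1) * np.log(p_values))
    return np.exp(log_m)

rng = np.random.default_rng(2)
# 300 exchangeable steps (uniform p-values), then 50 "drifted" steps
# where the p-values collapse to a small value.
p = np.concatenate([rng.uniform(0.0, 1.0, 300), np.full(50, 0.01)])
M = power_martingale(p)
```

A large martingale value is evidence against exchangeability and hence of concept drift; ensembling several such martingales, as the paper does with ICMs, trades off detection delay against robustness of the alarm.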

Citations: 0
Variable selection for both outcomes and predictors: sparse multivariate principal covariates regression
IF 7.5 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-07-17 · DOI: 10.1007/s10994-024-06520-3
Soogeun Park, Eva Ceulemans, Katrijn Van Deun

Datasets comprising large sets of both predictor and outcome variables are becoming more widely used in research. In addition to the well-known problems of model complexity and predictor variable selection, predictive modelling with such large data also presents a relatively novel and under-studied challenge: outcome variable selection. Certain outcome variables in the data may not be adequately predicted by the given set of predictors. In this paper, we propose the method of Sparse Multivariate Principal Covariates Regression, which addresses these issues jointly by expanding the Principal Covariates Regression model to incorporate sparsity penalties on both predictor and outcome variables. Our method is one of the first to perform variable selection for predictors and outcomes simultaneously. Moreover, by relying on summary variables that explain the variance in both predictor and outcome variables, the method offers a sparse and succinct model representation of the data. In a simulation study, the method outperformed methods with similar aims, such as sparse Partial Least Squares, at predicting the outcome variables and recovering the population parameters. Lastly, we applied the method to an empirical dataset to illustrate its use in practice.

Citations: 0
Methodology and evaluation in sports analytics: challenges, approaches, and lessons learned
IF 7.5 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-07-17 · DOI: 10.1007/s10994-024-06585-0
Jesse Davis, Lotte Bransen, Laurens Devos, Arne Jaspers, Wannes Meert, Pieter Robberechts, Jan Van Haaren, Maaike Van Roy

There has been an explosion of data collected about sports. Because such data is extremely rich and complex, machine learning is increasingly being used to extract actionable insights from it. Typically, machine learning is used to build models and indicators that capture the skills, capabilities, and tendencies of athletes and teams. Such indicators and models are in turn used to inform decision-making at professional clubs. Designing these indicators requires paying careful attention to a number of subtle issues from a methodological and evaluation perspective. In this paper, we highlight these challenges in sports and discuss a variety of approaches for handling them. Methodologically, we highlight that dependencies affect how to perform data partitioning for evaluation as well as the need to consider contextual factors. From an evaluation perspective, we draw a distinction between evaluating the developed indicators themselves versus the underlying models that power them. We argue that both aspects must be considered, but that they require different approaches. We hope that this article helps bridge the gap between traditional sports expertise and modern data analytics by providing a structured framework with practical examples.

Citations: 0
Spatial entropy as an inductive bias for vision transformers
IF 7.5 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-07-17 · DOI: 10.1007/s10994-024-06570-7
Elia Peruzzo, Enver Sangineto, Yahui Liu, Marco De Nadai, Wei Bi, Bruno Lepri, Nicu Sebe

Recent work on Vision Transformers (VTs) showed that introducing a local inductive bias in the VT architecture helps reduce the number of samples necessary for training. However, the architecture modifications lead to a loss of generality of the Transformer backbone, partially contradicting the push towards uniform architectures shared, e.g., by both the Computer Vision and Natural Language Processing areas. In this work, we propose a different and complementary direction, in which a local bias is introduced using an auxiliary self-supervised task performed jointly with standard supervised training. Specifically, we exploit the observation that the attention maps of VTs, when trained with self-supervision, can contain a semantic segmentation structure which does not spontaneously emerge under supervised training. We therefore explicitly encourage the emergence of this spatial clustering as a form of training regularization. In more detail, we exploit the assumption that, in a given image, objects usually correspond to a few connected regions, and we propose a spatial formulation of the information entropy to quantify this object-based inductive bias. By minimizing the proposed spatial entropy, we include an additional self-supervised signal during training. Using extensive experiments, we show that the proposed regularization leads to equivalent or better results than other VT proposals which introduce a local bias by changing the basic Transformer architecture, and that it can drastically boost the final VT accuracy when using small-to-medium training sets. The code is available at https://github.com/helia95/SAR.
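The object-based bias admits a minimal numeric reading (our generic sketch; the paper's actual spatial entropy additionally accounts for connectedness of the attended regions): treat an attention map as a probability distribution over spatial locations and compute its Shannon entropy, which is low when the mass concentrates on a few cells. The 14x14 grid and the toy maps below are illustrative assumptions:

```python
import numpy as np

def spatial_entropy(attn):
    # Shannon entropy of an attention map normalised into a spatial
    # probability distribution; lower values mean attention concentrates
    # on fewer locations.
    p = attn / attn.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Diffuse attention over a 14x14 token grid vs. attention concentrated
# on a single 4x4 "object" region.
diffuse = np.ones((14, 14))
focused = np.full((14, 14), 1e-6)
focused[5:9, 5:9] = 1.0
```

Minimizing such an entropy term during training pushes the attention maps toward the concentrated, object-like pattern, which is the regularization effect the paper exploits.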

Citations: 0
XAI-TRIS: non-linear image benchmarks to quantify false positive post-hoc attribution of feature importance
IF 7.5 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-07-16 · DOI: 10.1007/s10994-024-06574-3
Benedict Clark, Rick Wilming, Stefan Haufe

The field of ‘explainable’ artificial intelligence (XAI) has produced highly acclaimed methods that seek to make the decisions of complex machine learning (ML) models ‘understandable’ to humans, for example by attributing ‘importance’ scores to input features. Yet a lack of formal underpinning leaves it unclear what conclusions can safely be drawn from the results of a given XAI method, and has so far hindered both the theoretical verification and the empirical validation of XAI methods. This means that challenging non-linear problems, typically solved by deep neural networks, presently lack appropriate remedies. Here, we craft benchmark datasets for one linear and three different non-linear classification scenarios in which the important class-conditional features are known by design, serving as ground-truth explanations. Using novel quantitative metrics, we benchmark the explanation performance of a wide set of XAI methods across three deep learning model architectures. We show that popular XAI methods are often unable to significantly outperform random performance baselines and edge detection methods, attributing false-positive importance to features with no statistical relationship to the prediction target rather than to truly important features. Moreover, we demonstrate that explanations derived from different model architectures can be vastly different and are thus prone to misinterpretation even under controlled conditions.

Partitioned least squares
IF 7.5 CAS Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-07-15 DOI: 10.1007/s10994-024-06582-3
Roberto Esposito, Mattia Cerrato, Marco Locatelli

Linear least squares is one of the most widely used regression methods in many fields. The simplicity of the model allows this method to be used when data is scarce and allows practitioners to gather some insight into the problem by inspecting the values of the learnt parameters. In this paper we propose a variant of the linear least squares model allowing practitioners to partition the input features into groups of variables that they require to contribute similarly to the final result. We show that the new formulation is not convex and provide two alternative methods to deal with the problem: one non-exact method based on an alternating least squares approach; and one exact method based on a reformulation of the problem. We show the correctness of the exact method and compare the two solutions showing that the exact solution provides better results in a fraction of the time required by the alternating least squares solution (when the number of partitions is small). We also provide a branch and bound algorithm that can be used in place of the exact method when the number of partitions is too large as well as a proof of NP-completeness of the optimization problem.
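The alternating scheme can be sketched as follows: with the within-group weights fixed, each group collapses to a single synthetic column and the group weights are a small least-squares problem; with the group weights fixed, the model is again linear in the stacked within-group weights. This is a simplified, unconstrained toy version of alternating least squares for a grouped model y ≈ Σₖ βₖ(Xₖαₖ) — the paper's actual formulation and its constraints differ, and all names here are illustrative:

```python
import numpy as np

def partitioned_als(X_blocks, y, iters=50):
    """Alternating least squares for y ≈ sum_k beta_k * (X_k @ alpha_k):
    alternately re-fit the group weights beta and the per-feature weights alpha."""
    rng = np.random.default_rng(0)
    alphas = [rng.standard_normal(Xk.shape[1]) for Xk in X_blocks]
    for _ in range(iters):
        # With alphas fixed, each group collapses to a single column.
        Z = np.column_stack([Xk @ a for Xk, a in zip(X_blocks, alphas)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        # With beta fixed, the model is linear in the stacked alphas.
        W = np.hstack([b * Xk for b, Xk in zip(beta, X_blocks)])
        stacked, *_ = np.linalg.lstsq(W, y, rcond=None)
        splits = np.cumsum([Xk.shape[1] for Xk in X_blocks])[:-1]
        alphas = np.split(stacked, splits)
    return beta, alphas

# Two feature groups with a known noiseless target.
rng = np.random.default_rng(1)
X1, X2 = rng.standard_normal((100, 3)), rng.standard_normal((100, 2))
y = X1 @ np.array([1.0, 2.0, 3.0]) + X2 @ np.array([-1.0, 0.5])

beta, alphas = partitioned_als([X1, X2], y)
pred = sum(b * (Xk @ a) for b, Xk, a in zip(beta, [X1, X2], alphas))
print(float(np.max(np.abs(pred - y))))  # essentially zero on this noiseless problem
```

Note the scale indeterminacy of the product βₖαₖ, which is one reason the paper's exact method imposes constraints that this sketch omits.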

L2XGNN: learning to explain graph neural networks
IF 7.5 CAS Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-07-12 DOI: 10.1007/s10994-024-06576-1
Giuseppe Serra, Mathias Niepert

Graph Neural Networks (GNNs) are a popular class of machine learning models. Inspired by the learning to explain (L2X) paradigm, we propose L2xGnn, a framework for explainable GNNs which provides faithful explanations by design. L2xGnn learns a mechanism for selecting explanatory subgraphs (motifs) which are exclusively used in the GNNs message-passing operations. L2xGnn is able to select, for each input graph, a subgraph with specific properties such as being sparse and connected. Imposing such constraints on the motifs often leads to more interpretable and effective explanations. Experiments on several datasets suggest that L2xGnn achieves the same classification accuracy as baseline methods using the entire input graph while ensuring that only the provided explanations are used to make predictions. Moreover, we show that L2xGnn is able to identify motifs responsible for the graph’s properties it is intended to predict.
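The selection idea can be illustrated with a small hand-written sketch: keep a budgeted, connected set of high-scoring edges and restrict message passing to that subgraph only. Edge scores here are supplied by hand rather than learned, and none of this is the paper's actual mechanism — it only illustrates "sparse, connected motif in, masked aggregation out":

```python
import numpy as np

def select_connected_subgraph(edges, scores, budget):
    """Greedily keep up to `budget` high-scoring edges while keeping the
    selected subgraph connected (a stand-in for a learned selection)."""
    order = np.argsort(scores)[::-1]      # edge indices, best score first
    chosen, nodes = [], set()
    for i in order:
        u, v = edges[i]
        if not chosen or u in nodes or v in nodes:
            chosen.append((u, v))
            nodes |= {u, v}
        if len(chosen) == budget:
            break
    return chosen

def masked_message_passing(x, chosen_edges):
    """One mean-aggregation step that uses only the selected (undirected) edges."""
    out = x.copy()
    for u in range(len(x)):
        nbrs = [b for a, b in chosen_edges if a == u] + \
               [a for a, b in chosen_edges if b == u]
        if nbrs:
            out[u] = x[nbrs].mean(axis=0)
    return out

edges = [(0, 1), (1, 2), (2, 3), (0, 3), (3, 4)]
scores = np.array([0.9, 0.8, 0.1, 0.2, 0.7])

chosen = select_connected_subgraph(edges, scores, budget=2)
print(chosen)  # [(0, 1), (1, 2)]

x = np.arange(5, dtype=float).reshape(-1, 1)  # one scalar feature per node
out = masked_message_passing(x, chosen)
print(out.ravel())  # nodes 3 and 4 are untouched: no selected edge reaches them
```

Because only the chosen edges carry messages, the prediction depends on the motif alone, which is what makes the resulting explanation faithful by construction.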
