Pub Date: 2025-01-01 | Epub Date: 2025-07-15 | DOI: 10.1007/s10994-025-06824-y
Thomas Baldwin-McDonald, Xinxing Shi, Mingxin Shen, Mauricio A Álvarez
Modelling the behaviour of highly nonlinear dynamical systems with robust uncertainty quantification is a challenging task that typically requires approaches specifically designed for the problem at hand. To address this issue, we introduce a domain-agnostic model termed the deep latent force model (DLFM): a deep Gaussian process with physics-informed kernels at each layer, derived from ordinary differential equations using the framework of process convolutions. Two distinct formulations of the DLFM are presented, utilising weight-space and variational inducing points-based Gaussian process approximations, both of which are amenable to doubly stochastic variational inference. We present empirical evidence of the DLFM's capability to capture the dynamics present in highly nonlinear real-world multi-output time series data. Additionally, we find that the DLFM achieves performance comparable to a range of non-physics-informed probabilistic models on benchmark univariate regression tasks. We also empirically assess the negative impact of the inducing points framework on the extrapolation capabilities of LFM-based models.
{"title":"Deep latent force models: ODE-based process convolutions for Bayesian deep learning.","authors":"Thomas Baldwin-McDonald, Xinxing Shi, Mingxin Shen, Mauricio A Álvarez","doi":"10.1007/s10994-025-06824-y","DOIUrl":"10.1007/s10994-025-06824-y","url":null,"abstract":"<p><p>Modelling the behaviour of highly nonlinear dynamical systems with robust uncertainty quantification is a challenging task which typically requires approaches specifically designed to address the problem at hand. We introduce a domain-agnostic model to address this issue termed the deep latent force model (DLFM), a deep Gaussian process with physics-informed kernels at each layer, derived from ordinary differential equations using the framework of process convolutions. Two distinct formulations of the DLFM are presented which utilise weight-space and variational inducing points-based Gaussian process approximations, both of which are amenable to doubly stochastic variational inference. We present empirical evidence of the capability of the DLFM to capture the dynamics present in highly nonlinear real-world multi-output time series data. Additionally, we find that the DLFM is capable of achieving comparable performance to a range of non-physics-informed probabilistic models on benchmark univariate regression tasks. We also empirically assess the negative impact of the inducing points framework on the extrapolation capabilities of LFM-based models.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"114 8","pages":"192"},"PeriodicalIF":4.3,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12263784/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144660909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01 | Epub Date: 2025-08-12 | DOI: 10.1007/s10994-025-06834-w
Frederik Pahde, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek
Deep neural networks are increasingly employed in high-stakes medical applications, despite their tendency towards shortcut learning in the presence of spurious correlations, which can have potentially fatal consequences in practice. Whereas a multitude of works address either the detection or the mitigation of such shortcut behavior in isolation, the Reveal2Revise approach provides a comprehensive bias mitigation framework combining these steps. However, effectively addressing these biases often requires substantial labeling effort from domain experts. In this work, we review the steps of the Reveal2Revise framework and enhance it with semi-automated interpretability-based bias annotation capabilities. This includes methods for sample- and feature-level bias annotation, providing valuable information for bias mitigation methods to unlearn the undesired shortcut behavior. We show the applicability of the framework using four medical datasets across two modalities, featuring controlled and real-world spurious correlations caused by data artifacts. We successfully identify and mitigate these biases in VGG16, ResNet50, and contemporary Vision Transformer models, ultimately increasing their robustness and applicability for real-world medical tasks. Our code is available at https://github.com/frederikpahde/medical-ai-safety.
{"title":"Ensuring medical AI safety: interpretability-driven detection and mitigation of spurious model behavior and associated data.","authors":"Frederik Pahde, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek","doi":"10.1007/s10994-025-06834-w","DOIUrl":"10.1007/s10994-025-06834-w","url":null,"abstract":"<p><p>Deep neural networks are increasingly employed in high-stakes medical applications, despite their tendency for shortcut learning in the presence of spurious correlations, which can have potentially fatal consequences in practice. Whereas a multitude of works address either the detection or mitigation of such shortcut behavior in isolation, the Reveal2Revise approach provides a comprehensive bias mitigation framework combining these steps. However, effectively addressing these biases often requires substantial labeling efforts from domain experts. In this work, we review the steps of the Reveal2Revise framework and enhance it with semi-automated interpretability-based bias annotation capabilities. This includes methods for the sample- and feature-level bias annotation, providing valuable information for bias mitigation methods to unlearn the undesired shortcut behavior. We show the applicability of the framework using four medical datasets across two modalities, featuring controlled and real-world spurious correlations caused by data artifacts. We successfully identify and mitigate these biases in VGG16, ResNet50, and contemporary Vision Transformer models, ultimately increasing their robustness and applicability for real-world medical tasks. Our code is available at https://github.com/frederikpahde/medical-ai-safety.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"114 9","pages":"206"},"PeriodicalIF":2.9,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12343733/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144856810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01 | Epub Date: 2025-10-10 | DOI: 10.1007/s10994-025-06831-z
Martin Atzmueller, Carolina Centeio Jorge, Cláudio Rebelo de Sá, Behzad M Heravi, Jenny L Gibson, Rosaldo J F Rossetti
Social interactions are prevalent in our lives. They can be observed online, e.g. via social media, but also offline, in particular using sensors. In such contexts, time-stamped interactions are typically recorded, and they can also be inferred from the real-time locations of humans. Such interaction data can then be modeled as so-called social interaction networks, which can be analyzed with a variety of approaches. A prominent research direction is the detection of patterns describing subgroups with exceptional behavioral characteristics, given some measure of interest. In the standard case of plain graphs modeling the interaction networks, methods for identifying such subgroups focus mainly on structural characteristics of the network and/or the induced subgraph; for attributed social networks, additional attributive information can be exploited. This paper proposes to focus on the dyadic structure of attributed social interaction networks, thus enabling a compositional perspective for identifying interesting subgroup patterns. Specifically, we analyze spatio-temporal data modeled as attributed social interaction networks to identify exceptional social behavior. The presented approach adapts local pattern mining using subgroup discovery to the dyadic setting, exploiting the attribute information of the spatio-temporal attributed interaction networks. Specific characteristics of social interactions, i.e. duration and frequency, are considered to identify subgroups capturing social behavior that deviates from the norm. For subgroup discovery, we propose corresponding interestingness measures in the form of seven novel quality functions and discuss their properties. In our experiments, we demonstrate the efficacy of the presented approach using four real-world datasets on face-to-face interactions in academic conferencing as well as school playground contexts. Our results indicate that the proposed method returns interesting, meaningful, and valid findings.
{"title":"Mining exceptional social behavior on attributed interaction networks.","authors":"Martin Atzmueller, Carolina Centeio Jorge, Cláudio Rebelo de Sá, Behzad M Heravi, Jenny L Gibson, Rosaldo J F Rossetti","doi":"10.1007/s10994-025-06831-z","DOIUrl":"10.1007/s10994-025-06831-z","url":null,"abstract":"<p><p>Social interactions are prevalent in our lives. These can be observed, e. g., online using social media, however, also offline specifically using sensors. In such contexts, typically time-stamped interactions are recorded, which can also be inferred from real-time location of humans. Such interaction data can then be modeled as so-called social interaction networks. For their analysis, a variety of different approaches can be applied. A prominent research direction is then the detection of patterns describing specific subgroups with exceptional behavioral characteristics, given some measure of interest. In the standard case of plain graphs modeling the interaction networks, methods for identifying such subgroups mainly focus on structural characteristics of the network and/or the induced subgraph. For attributed social networks, then additional attributive information can be exploited. This paper proposes to focus on the dyadic structure of the attributed social interaction networks, thus enabling a compositional perspective for identifying interesting subgroup patterns. Specifically, we can then analyze spatio-temporal data modeled as attributed social interaction networks for identifying exceptional social behavior. The presented approach adapts local pattern mining using subgroup discovery to the dyadic setting, exploiting attribute information of the spatio-temporal attributed interaction networks. With this, specific characteristics of social interactions are considered, i. e., duration and frequency, for identifying subgroups capturing social behavior that deviates from the norm. For subgroup discovery, we propose according interestingness measures in the form of seven novel quality functions and discuss their properties. In our experimentation, we perform an evaluation demonstrating the efficacy of the presented approach using four real-world datasets on face-to-face interactions in academic conferencing as well as school playground contexts. Our results indicate that the proposed method returns interesting, meaningful, and valid findings and results.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"114 11","pages":"243"},"PeriodicalIF":2.9,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12513876/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145281580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01 | Epub Date: 2025-10-19 | DOI: 10.1007/s10994-025-06868-0
Lun Ai, Stephen H Muggleton, Shi-Shun Liang, Geoff S Baldwin
Reasoning about hypotheses and updating knowledge through empirical observations are central to scientific discovery. In this work, we applied logic-based machine learning methods to drive biological discovery by guiding experimentation. Genome-scale metabolic network models (GEMs) - comprehensive representations of metabolic genes and reactions - are widely used to evaluate genetic engineering of biological systems. However, GEMs often fail to accurately predict the behaviour of genetically engineered cells, primarily due to incomplete annotations of gene interactions. The task of learning the intricate genetic interactions within GEMs presents computational and empirical challenges. To enable efficient prediction with GEMs, we describe a novel approach called Boolean Matrix Logic Programming (BMLP), which leverages Boolean matrices to evaluate large logic programs. We developed a new system, [Formula: see text], which guides cost-effective experimentation and uses interpretable logic programs to encode a state-of-the-art GEM of a model bacterial organism. Notably, [Formula: see text] successfully learned the interaction between a gene pair with fewer training examples than random experimentation, overcoming the increase in experimental design space. [Formula: see text] enables rapid optimisation of metabolic models to reliably engineer biological systems for producing useful compounds. It offers a realistic approach to creating a self-driving lab for biological discovery, which would in turn facilitate microbial engineering for practical applications.
{"title":"Boolean matrix logic programming for active learning of gene functions in genome-scale metabolic network models.","authors":"Lun Ai, Stephen H Muggleton, Shi-Shun Liang, Geoff S Baldwin","doi":"10.1007/s10994-025-06868-0","DOIUrl":"10.1007/s10994-025-06868-0","url":null,"abstract":"<p><p>Reasoning about hypotheses and updating knowledge through empirical observations are central to scientific discovery. In this work, we applied logic-based machine learning methods to drive biological discovery by guiding experimentation. Genome-scale metabolic network models (GEMs) - comprehensive representations of metabolic genes and reactions - are widely used to evaluate genetic engineering of biological systems. However, GEMs often fail to accurately predict the behaviour of genetically engineered cells, primarily due to incomplete annotations of gene interactions. The task of learning the intricate genetic interactions within GEMs presents computational and empirical challenges. To efficiently predict using GEM, we describe a novel approach called Boolean Matrix Logic Programming (BMLP) by leveraging Boolean matrices to evaluate large logic programs. We developed a new system, [Formula: see text], which guides cost-effective experimentation and uses interpretable logic programs to encode a state-of-the-art GEM of a model bacterial organism. Notably, [Formula: see text] successfully learned the interaction between a gene pair with fewer training examples than random experimentation, overcoming the increase in experimental design space. [Formula: see text] enables rapid optimisation of metabolic models to reliably engineer biological systems for producing useful compounds. It offers a realistic approach to creating a self-driving lab for biological discovery, which would then facilitate microbial engineering for practical applications.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"114 11","pages":"254"},"PeriodicalIF":2.9,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12535945/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145349575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01 | Epub Date: 2025-07-24 | DOI: 10.1007/s10994-025-06828-8
Henri Schmidt, Christian Düll
We provide an implementation to compute the flat metric in any dimension. The flat metric, also called the dual bounded Lipschitz distance, generalizes the well-known Wasserstein distance W1 to the case where the distributions are of unequal total mass. Thus, our implementation adapts very well to mass differences and uses them to distinguish between different distributions. This is of particular interest for unbalanced optimal transport tasks and for the analysis of data distributions where the sample size is important or normalization is not possible. The core of the method is a neural network that determines an optimal test function realizing the distance between two given measures. Special focus was put on achieving comparability of pairwise computed distances from independently trained networks. We tested the quality of the output in several experiments where ground truth was available, as well as with simulated data.
{"title":"Computing the distance between unbalanced distributions: the flat metric.","authors":"Henri Schmidt, Christian Düll","doi":"10.1007/s10994-025-06828-8","DOIUrl":"10.1007/s10994-025-06828-8","url":null,"abstract":"<p><p>We provide an implementation to compute the flat metric in any dimension. The flat metric, also called dual bounded Lipschitz distance, generalizes the well-known Wasserstein distance <math><msub><mi>W</mi> <mn>1</mn></msub> </math> to the case that the distributions are of unequal total mass. Thus, our implementation adapts very well to mass differences and uses them to distinguish between different distributions. This is of particular interest for unbalanced optimal transport tasks and for the analysis of data distributions where the sample size is important or normalization is not possible. The core of the method is based on a neural network to determine an optimal test function realizing the distance between two given measures. Special focus was put on achieving comparability of pairwise computed distances from independently trained networks. We tested the quality of the output in several experiments where ground truth was available as well as with simulated data.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"114 8","pages":"195"},"PeriodicalIF":2.9,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12289810/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144734905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01 | Epub Date: 2025-02-06 | DOI: 10.1007/s10994-024-06643-7
Georgios I Liapis, Sophia Tsoka, Lazaros G Papageorgiou
Data classification is considered a fundamental research subject within the machine learning community. Researchers seek to improve machine learning algorithms not only in accuracy but also in interpretability. Interpretable algorithms allow humans to easily understand the decisions that a machine learning model makes, which is challenging for black-box models. Mathematical programming-based classification algorithms have attracted considerable attention due to their ability to compete effectively with leading-edge algorithms in terms of both accuracy and interpretability. In particular, the training of a hyper-box classifier can be mathematically formulated as a Mixed Integer Linear Programming (MILP) model, and its predictions combine accuracy and interpretability. In this work, an optimisation-based approach is proposed for multi-class data classification using a hyper-box representation, thus facilitating the extraction of compact IF-THEN rules. The key novelty of our approach lies in the minimisation of the number and length of the generated rules for enhanced interpretability. Through a number of real-world datasets, it is demonstrated that the algorithm exhibits favorable performance compared to well-known alternatives in terms of prediction accuracy and rule set simplicity.
{"title":"Interpretable optimisation-based approach for hyper-box classification.","authors":"Georgios I Liapis, Sophia Tsoka, Lazaros G Papageorgiou","doi":"10.1007/s10994-024-06643-7","DOIUrl":"10.1007/s10994-024-06643-7","url":null,"abstract":"<p><p>Data classification is considered a fundamental research subject within the machine learning community. Researchers seek the improvement of machine learning algorithms in not only accuracy, but also interpretability. Interpretable algorithms allow humans to easily understand the decisions that a machine learning model makes, which is challenging for black box models. Mathematical programming-based classification algorithms have attracted considerable attention due to their ability to effectively compete with leading-edge algorithms in terms of both accuracy and interpretability. Meanwhile, the training of a hyper-box classifier can be mathematically formulated as a Mixed Integer Linear Programming (MILP) model and the predictions combine accuracy and interpretability. In this work, an optimisation-based approach is proposed for multi-class data classification using a hyper-box representation, thus facilitating the extraction of compact IF-THEN rules. The key novelty of our approach lies in the minimisation of the number and length of the generated rules for enhanced interpretability. Through a number of real-world datasets, it is demonstrated that the algorithm exhibits favorable performance when compared to well-known alternatives in terms of prediction accuracy and rule set simplicity.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"114 3","pages":"51"},"PeriodicalIF":4.3,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11861270/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143525101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01 | Epub Date: 2025-07-15 | DOI: 10.1007/s10994-025-06826-w
Jesse van Remmerden, Zaharah Bukhsh, Yingqian Zhang
The Job Shop Scheduling Problem (JSSP) is a complex combinatorial optimization problem. While online Reinforcement Learning (RL) has shown promise by quickly finding acceptable solutions for JSSP, it faces key limitations: it requires extensive training interactions from scratch, leading to sample inefficiency; it cannot leverage existing high-quality solutions from traditional methods such as Constraint Programming (CP); and it requires simulated environments to train in, which are impractical to build for complex scheduling environments. We introduce Offline Learned Dispatching (Offline-LD), an offline reinforcement learning approach for JSSP that addresses these limitations by learning from historical scheduling data. Our approach is motivated by scenarios where historical scheduling data and expert solutions are available, or where online training of RL approaches with simulated environments is impracticable. Offline-LD introduces maskable variants of two Q-learning methods, namely Maskable Quantile Regression DQN (mQRDQN) and discrete maskable Soft Actor-Critic (d-mSAC), that are able to learn from historical data through Conservative Q-Learning (CQL), and we present a novel entropy-bonus modification of d-mSAC for maskable action spaces. Moreover, we introduce a novel reward normalization method for JSSP in an offline RL setting. Our experiments demonstrate that Offline-LD outperforms online RL on both generated and benchmark instances when trained on only 100 solutions generated by CP. Notably, introducing noise to the expert dataset yields comparable or superior results to using the expert dataset with the same number of instances, a promising finding for real-world applications, where data is inherently noisy and imperfect.
{"title":"Offline reinforcement learning for learning to dispatch for job shop scheduling.","authors":"Jesse van Remmerden, Zaharah Bukhsh, Yingqian Zhang","doi":"10.1007/s10994-025-06826-w","DOIUrl":"10.1007/s10994-025-06826-w","url":null,"abstract":"<p><p>The Job Shop Scheduling Problem (JSSP) is a complex combinatorial optimization problem. While online Reinforcement Learning (RL) has shown promise by quickly finding acceptable solutions for JSSP, it faces key limitations: it requires extensive training interactions from scratch leading to sample inefficiency, cannot leverage existing high-quality solutions from traditional methods like Constraint Programming (CP), and require simulated environments to train in, which are impracticable to build for complex scheduling environments. We introduce Offline Learned Dispatching (Offline-LD), an offline reinforcement learning approach for JSSP, which addresses these limitations by learning from historical scheduling data. Our approach is motivated by scenarios where historical scheduling data and expert solutions are available or scenarios where online training of RL approaches with simulated environments is impracticable. Offline-LD introduces maskable variants of two Q-learning methods, namely, Maskable Quantile Regression DQN (mQRDQN) and discrete maskable Soft Actor-Critic (d-mSAC), that are able to learn from historical data, through Conservative Q-Learning (CQL), whereby we present a novel entropy bonus modification for d-mSAC, for maskable action spaces. Moreover, we introduce a novel reward normalization method for JSSP in an offline RL setting. Our experiments demonstrate that Offline-LD outperforms online RL on both generated and benchmark instances when trained on only 100 solutions generated by CP. Notably, introducing noise to the expert dataset yields comparable or superior results to using the expert dataset, with the same amount of instances, a promising finding for real-world applications, where data is inherently noisy and imperfect.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"114 8","pages":"191"},"PeriodicalIF":4.3,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12263752/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144660910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-18 | DOI: 10.1007/s10994-024-06612-0
Joanna Komorniczak, Paweł Ksieniewicz
Concept drift in data stream processing remains an intriguing challenge and a popular research topic. Methods that actively process data streams usually employ drift detectors, whose performance is often based on monitoring the variability of different stream properties. This publication provides an overview and analysis of the variability of metafeatures describing data streams with concept drift. Five experiments conducted on synthetic, semi-synthetic, and real-world data streams examine the ability of over 160 metafeatures from 9 categories to recognize concepts in non-stationary data streams. The work reveals distinctions between the considered sources of streams and specifies 17 metafeatures with a strong ability to identify concepts.
{"title":"On metafeatures’ ability of implicit concept identification","authors":"Joanna Komorniczak, Paweł Ksieniewicz","doi":"10.1007/s10994-024-06612-0","DOIUrl":"https://doi.org/10.1007/s10994-024-06612-0","url":null,"abstract":"<p>Concept drift in data stream processing remains an intriguing challenge and states a popular research topic. Methods that actively process data streams usually employ drift detectors, whose performance is often based on monitoring the variability of different stream properties. This publication provides an overview and analysis of metafeatures variability describing data streams with concept drifts. Five experiments conducted on synthetic, semi-synthetic, and real-world data streams examine the ability of over 160 metafeatures from 9 categories to recognize concepts in non-stationary data streams. The work reveals the distinctions in the considered sources of streams and specifies 17 metafeatures with a high ability of concept identification.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"51 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142253817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-13 | DOI: 10.1007/s10994-024-06606-y
Tiago Mendes-Neves, Luís Meireles, João Mendes-Moreira
This paper introduces the Large Events Model (LEM) for soccer, a novel deep learning framework for generating and analyzing soccer matches. The framework can simulate games from a given game state, with its primary output being the ensuing probabilities and events from multiple simulations. These can provide insights into match dynamics and underlying mechanisms. We discuss the framework’s design, features, and methodologies, including model optimization, data processing, and evaluation techniques. The models within this framework are developed to predict specific aspects of soccer events, such as event type, success likelihood, and further details. In an applied context, we showcase the estimation of xP+, a metric estimating a player’s contribution to the team’s points earned. This work ultimately advances the field of sports event prediction and its practical applications, and emphasizes the potential of this kind of method.
{"title":"Towards a foundation large events model for soccer","authors":"Tiago Mendes-Neves, Luís Meireles, João Mendes-Moreira","doi":"10.1007/s10994-024-06606-y","DOIUrl":"https://doi.org/10.1007/s10994-024-06606-y","url":null,"abstract":"<p>This paper introduces the Large Events Model (LEM) for soccer, a novel deep learning framework for generating and analyzing soccer matches. The framework can simulate games from a given game state, with its primary output being the ensuing probabilities and events from multiple simulations. These can provide insights into match dynamics and underlying mechanisms. We discuss the framework’s design, features, and methodologies, including model optimization, data processing, and evaluation techniques. The models within this framework are developed to predict specific aspects of soccer events, such as event type, success likelihood, and further details. In an applied context, we showcase the estimation of xP+, a metric estimating a player’s contribution to the team’s points earned. This work ultimately enhances the field of sports event prediction and practical applications and emphasizes the potential for this kind of method.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"23 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-13 | DOI: 10.1007/s10994-024-06616-w
Gokul Bhusal, Ekaterina Merkurjev, Guo-Wei Wei
The success of many machine learning (ML) methods depends crucially on having large amounts of labeled data. However, obtaining enough labeled data can be expensive, time-consuming, and subject to ethical constraints for many applications. One approach that has shown tremendous value in addressing this challenge is semi-supervised learning (SSL); this technique utilizes both labeled and unlabeled data during training, typically with far less labeled data than unlabeled data, since the latter is relatively easy and inexpensive to obtain. In fact, SSL methods are particularly useful in applications where labeling data is especially expensive, such as medical analysis, natural language processing, or speech recognition. A subset of SSL methods that have achieved great success in various domains involves algorithms that integrate graph-based techniques, which are popular due to the vast amount of information provided by the graphical framework. In this work, we propose an algebraic topology-based semi-supervised method called persistent Laplacian-enhanced graph MBO, which integrates persistent spectral graph theory with the classical Merriman–Bence–Osher (MBO) scheme. Specifically, we use a filtration procedure to generate a sequence of chain complexes and associated families of simplicial complexes, from which we construct a family of persistent Laplacians. Overall, it is a very efficient procedure that requires much less labeled data to perform well compared to many ML techniques, and it can be adapted for both small and large datasets. We evaluate the performance of our method on classification, and the results indicate that the technique outperforms other existing semi-supervised algorithms.
{"title":"Persistent Laplacian-enhanced algorithm for scarcely labeled data classification","authors":"Gokul Bhusal, Ekaterina Merkurjev, Guo-Wei Wei","doi":"10.1007/s10994-024-06616-w","DOIUrl":"https://doi.org/10.1007/s10994-024-06616-w","url":null,"abstract":"<p>The success of many machine learning (ML) methods depends crucially on having large amounts of labeled data. However, obtaining enough labeled data can be expensive, time-consuming, and subject to ethical constraints for many applications. One approach that has shown tremendous value in addressing this challenge is semi-supervised learning (SSL); this technique utilizes both labeled and unlabeled data during training, often with much less labeled data than unlabeled data, which is often relatively easy and inexpensive to obtain. In fact, SSL methods are particularly useful in applications where the cost of labeling data is especially expensive, such as medical analysis, natural language processing, or speech recognition. A subset of SSL methods that have achieved great success in various domains involves algorithms that integrate graph-based techniques. These procedures are popular due to the vast amount of information provided by the graphical framework. In this work, we propose an algebraic topology-based semi-supervised method called persistent Laplacian-enhanced graph MBO by integrating persistent spectral graph theory with the classical Merriman–Bence–Osher (MBO) scheme. Specifically, we use a filtration procedure to generate a sequence of chain complexes and associated families of simplicial complexes, from which we construct a family of persistent Laplacians. Overall, it is a very efficient procedure that requires much less labeled data to perform well compared to many ML techniques, and it can be adapted for both small and large datasets. We evaluate the performance of our method on classification, and the results indicate that the technique outperforms other existing semi-supervised algorithms.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"176 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}