On metafeatures’ ability of implicit concept identification
Pub Date: 2024-09-18, DOI: 10.1007/s10994-024-06612-0
Joanna Komorniczak, Paweł Ksieniewicz
Concept drift in data stream processing remains an intriguing challenge and a popular research topic. Methods that actively process data streams usually employ drift detectors, whose performance is often based on monitoring the variability of different stream properties. This publication provides an overview and analysis of the variability of metafeatures that describe data streams with concept drifts. Five experiments conducted on synthetic, semi-synthetic, and real-world data streams examine the ability of over 160 metafeatures from 9 categories to recognize concepts in non-stationary data streams. The work reveals distinctions among the considered stream sources and identifies 17 metafeatures with a high ability to identify concepts.
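A minimal sketch of the monitored signal (an illustration only, not the authors' implementation; the window length, the three statistical metafeatures, and the synthetic stream are assumptions): computing metafeatures over consecutive windows of a drifting stream makes their values jump at the drift point, which is the kind of variability a detector would track.

```python
import numpy as np

def window_metafeatures(X):
    """A few simple statistical metafeatures of one data window."""
    return np.array([
        X.mean(),                         # overall location
        X.std(),                          # overall dispersion
        np.abs(np.corrcoef(X.T)).mean(),  # mean absolute feature correlation
    ])

rng = np.random.default_rng(0)
# Synthetic stream with an abrupt concept drift half-way through.
stream = np.vstack([rng.normal(0, 1, size=(5000, 4)),
                    rng.normal(2, 3, size=(5000, 4))])

window = 500
for start in range(0, len(stream), window):
    mf = window_metafeatures(stream[start:start + window])
    print(start, np.round(mf, 3))  # values change visibly at the drift point
```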
Towards a foundation large events model for soccer
Pub Date: 2024-09-13, DOI: 10.1007/s10994-024-06606-y
Tiago Mendes-Neves, Luís Meireles, João Mendes-Moreira
This paper introduces the Large Events Model (LEM) for soccer, a novel deep learning framework for generating and analyzing soccer matches. The framework can simulate games from a given game state, with its primary output being the ensuing probabilities and events from multiple simulations. These can provide insights into match dynamics and underlying mechanisms. We discuss the framework’s design, features, and methodologies, including model optimization, data processing, and evaluation techniques. The models within this framework are developed to predict specific aspects of soccer events, such as event type, success likelihood, and further details. In an applied context, we showcase the estimation of xP+, a metric estimating a player’s contribution to the team’s points earned. This work ultimately advances the field of sports event prediction, demonstrates practical applications, and emphasizes the potential of this kind of method.
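The simulation idea can be sketched as Monte-Carlo roll-outs of a next-event model; the toy transition matrix and tiny event vocabulary below are placeholders for the learned deep models (which also predict success likelihood and event details), so this only illustrates how simulated futures are aggregated into probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)
events = ["pass", "shot", "goal_home", "goal_away", "turnover"]
# Toy next-event transition probabilities, standing in for the learned model heads.
P = np.array([[0.70, 0.15, 0.01, 0.01, 0.13],
              [0.40, 0.05, 0.10, 0.02, 0.43],
              [0.80, 0.05, 0.00, 0.00, 0.15],
              [0.80, 0.05, 0.00, 0.00, 0.15],
              [0.65, 0.10, 0.01, 0.02, 0.22]])

def simulate(start_event, n_events=60, n_sims=5000):
    """Roll the event model forward many times from the current state and
    aggregate the simulated outcomes into win/draw/loss probabilities."""
    wins = draws = 0
    for _ in range(n_sims):
        e, score = start_event, [0, 0]
        for _ in range(n_events):
            e = rng.choice(len(events), p=P[e])
            if events[e] == "goal_home":
                score[0] += 1
            elif events[e] == "goal_away":
                score[1] += 1
        wins += score[0] > score[1]
        draws += score[0] == score[1]
    return wins / n_sims, draws / n_sims, 1 - (wins + draws) / n_sims

print("P(home win), P(draw), P(away win):", simulate(start_event=0))
```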
Persistent Laplacian-enhanced algorithm for scarcely labeled data classification
Pub Date: 2024-09-13, DOI: 10.1007/s10994-024-06616-w
Gokul Bhusal, Ekaterina Merkurjev, Guo-Wei Wei
The success of many machine learning (ML) methods depends crucially on having large amounts of labeled data. However, obtaining enough labeled data can be expensive, time-consuming, and subject to ethical constraints for many applications. One approach that has shown tremendous value in addressing this challenge is semi-supervised learning (SSL); this technique utilizes both labeled and unlabeled data during training, typically with far less labeled data than unlabeled data, the latter being relatively easy and inexpensive to obtain. In fact, SSL methods are particularly useful in applications where labeling data is especially expensive, such as medical analysis, natural language processing, or speech recognition. A subset of SSL methods that has achieved great success in various domains involves algorithms that integrate graph-based techniques. These procedures are popular due to the vast amount of information provided by the graphical framework. In this work, we propose an algebraic topology-based semi-supervised method called persistent Laplacian-enhanced graph MBO by integrating persistent spectral graph theory with the classical Merriman–Bence–Osher (MBO) scheme. Specifically, we use a filtration procedure to generate a sequence of chain complexes and associated families of simplicial complexes, from which we construct a family of persistent Laplacians. Overall, the procedure is very efficient, requires much less labeled data to perform well than many ML techniques, and can be adapted for both small and large datasets. We evaluate the performance of our method on classification tasks, and the results indicate that the technique outperforms other existing semi-supervised algorithms.
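For intuition, here is a minimal graph MBO iteration with the ordinary combinatorial graph Laplacian standing in for the family of persistent Laplacians used in the paper; the diffusion time, iteration count, Gaussian affinities, and dense matrix exponential are simplifying assumptions suited only to small graphs.

```python
import numpy as np
from scipy.linalg import expm

def graph_mbo(W, labels, known, n_classes, dt=0.1, iters=30):
    """Classical graph MBO scheme: heat diffusion of class indicators on the
    graph, thresholding to the nearest vertex of the simplex, and re-imposing
    the known labels after every iteration."""
    L = np.diag(W.sum(axis=1)) - W       # combinatorial graph Laplacian
    heat = expm(-dt * L)                 # dense diffusion operator (small graphs only)
    u = np.full((len(W), n_classes), 1.0 / n_classes)
    u[known] = np.eye(n_classes)[labels[known]]
    for _ in range(iters):
        u = heat @ u                                  # 1) diffuse
        u = np.eye(n_classes)[u.argmax(axis=1)]       # 2) threshold
        u[known] = np.eye(n_classes)[labels[known]]   # 3) keep labeled points fixed
    return u.argmax(axis=1)

# Toy usage: two noisy clusters with a single labeled point per class.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(2, 0.3, (30, 2))])
W = np.exp(-np.square(pts[:, None] - pts[None]).sum(axis=-1))  # Gaussian affinities
labels = np.array([0] * 30 + [1] * 30)
known = np.zeros(60, dtype=bool)
known[[0, 30]] = True
print("accuracy:", (graph_mbo(W, labels, known, n_classes=2) == labels).mean())
```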
Conformal prediction for regression models with asymmetrically distributed errors: application to aircraft navigation during landing maneuver
Pub Date: 2024-09-09, DOI: 10.1007/s10994-024-06615-x
Solène Vilfroy, Lionel Bombrun, Thierry Urruty, Florence De Grancey, Jean-Philippe Lebrat, Philippe Carré
Semi-autonomous aircraft navigation is a high-risk domain where confidence in the predictions is required. To that end, this paper introduces conformal prediction strategies for regression problems. While standard approaches use absolute nonconformity scores, we introduce a signed version of the nonconformity scores. Experimental results on synthetic data show their benefit for non-centered errors. Moreover, in order to reduce the width of the prediction interval, we introduce an optimization procedure that learns the optimal alpha risks for the lower and upper bounds of the interval, and we show that a line search algorithm can be employed to solve it. In practice, this novel adaptive conformal prediction strategy proves well suited to skewed error distributions. In addition, an extension of these conformal prediction strategies is introduced to incorporate numeric and categorical auxiliary variables describing the acquisition context. Based on a quantile regression model, it maintains the coverage for each metadata value. All these strategies are then applied to a real use case of runway localization from data acquired by an aircraft during the landing maneuver. Extensive experiments on multiple airports show the benefit of the proposed conformal prediction strategies, in particular for runways with a very long ramp approach, where asymmetric angular deviation errors are observed.
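A sketch of the asymmetric interval construction (not the authors' procedure; the synthetic skewed errors, the grid over the split of the risk, and the omission of finite-sample corrections are simplifications): signed residuals give separate lower and upper corrections, and a search over the split of alpha picks the narrowest interval.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic predictions with a right-skewed (non-centered) error distribution.
pred_cal, pred_test = rng.normal(size=500), rng.normal(size=200)
y_cal = pred_cal + rng.gamma(2.0, 1.0, size=500) - 1.0

def signed_conformal_interval(y_cal, pred_cal, pred_test, alpha_lo, alpha_up):
    """Asymmetric split-conformal interval from signed nonconformity scores
    s = y - prediction (finite-sample corrections omitted for brevity)."""
    scores = y_cal - pred_cal
    q_lo = np.quantile(scores, alpha_lo)        # lower-tail correction
    q_up = np.quantile(scores, 1.0 - alpha_up)  # upper-tail correction
    return pred_test + q_lo, pred_test + q_up

# Line/grid search over how the total risk alpha is split between the tails,
# keeping the split that gives the narrowest intervals.
alpha, best = 0.1, None
for alpha_lo in np.linspace(0.005, 0.095, 19):
    lo, up = signed_conformal_interval(y_cal, pred_cal, pred_test,
                                       alpha_lo, alpha - alpha_lo)
    width = np.mean(up - lo)
    if best is None or width < best[0]:
        best = (width, alpha_lo)
print("best alpha_lo:", round(best[1], 3), "mean width:", round(best[0], 3))
```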
In-game soccer outcome prediction with offline reinforcement learning
Pub Date: 2024-09-06, DOI: 10.1007/s10994-024-06611-1
Pegah Rahimian, Balazs Mark Mihalyi, Laszlo Toka
Predicting outcomes in soccer is crucial for various stakeholders, including teams, leagues, bettors, the betting industry, media, and fans. With advancements in computer vision, player tracking data has become abundant, leading to the development of sophisticated soccer analytics models. However, existing models often rely solely on spatiotemporal features derived from player tracking data, which may not fully capture the complexities of in-game dynamics. In this paper, we present an end-to-end system that leverages raw event and tracking data to predict both offensive and defensive actions, along with the optimal decision for each game scenario, based solely on historical game data. Our model incorporates the effectiveness of these actions to accurately predict win probabilities at every minute of the game. Experimental results demonstrate the effectiveness of our approach, achieving an accuracy of 87% in predicting offensive and defensive actions. Furthermore, our in-game outcome prediction model exhibits an error rate of 0.1, outperforming counterpart models and bookmakers’ odds.
Evaluating large language models for user stance detection on X (Twitter)
Pub Date: 2024-09-06, DOI: 10.1007/s10994-024-06587-y
Margherita Gambini, Caterina Senette, Tiziano Fagni, Maurizio Tesconi
Current stance detection methods employ topic-aligned data, leaving many topics unexplored due to insufficient training samples. Large Language Models (LLMs) pre-trained on vast amounts of web data offer a viable solution when training data is unavailable. This work introduces Tweets2Stance (T2S), an unsupervised stance detection framework based on zero-shot classification, i.e. leveraging an LLM pre-trained on Natural Language Inference tasks. T2S detects a user’s stance on social-political statements, on a five-valued scale, by analyzing their X (Twitter) timeline. The ground truth for a user’s stance is obtained from Voting Advice Applications (VAAs). Through comprehensive experiments, an optimal T2S setting was identified for each election. Linguistic limitations of the underlying language model are further addressed by integrating state-of-the-art LLMs such as GPT-4 and Mixtral into the T2S framework. The T2S framework’s generalization potential is demonstrated by measuring its performance (F1 and MAE scores) across nine datasets, built by collecting tweets from competing parties’ Twitter accounts in nine political elections held in different countries from 2019 to 2021. The results, in terms of F1 and MAE scores, outperformed all baselines and approached the best scores for each election. This showcases the ability of T2S, particularly when combined with state-of-the-art LLMs, to generalize across different cultural-political contexts.
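The zero-shot, NLI-based building block can be sketched as follows; the model checkpoint, statement, candidate labels, and hypothesis template are illustrative assumptions, not the exact T2S configuration.

```python
from transformers import pipeline

# An off-the-shelf NLI model used as a zero-shot classifier (assumed checkpoint).
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

statement = "The government should invest more in renewable energy."
tweets = [
    "Solar farms are the best public investment we have made in years.",
    "Another subsidy for wind power? What a waste of taxpayer money.",
]

for tweet in tweets:
    result = classifier(
        tweet,
        candidate_labels=["supports", "opposes", "is neutral about"],
        hypothesis_template="The author of this tweet {} the statement: " + statement,
    )
    print(result["labels"][0], "->", tweet)
# Tweet-level outputs are then aggregated over the whole timeline and mapped
# to the five-valued agreement scale used by the Voting Advice Applications.
```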
Nested barycentric coordinate system as an explicit feature map for polyhedra approximation and learning tasks
Pub Date: 2024-09-05, DOI: 10.1007/s10994-024-06596-x
Lee-Ad Gottlieb, Eran Kaufman, Aryeh Kontorovich, Gabriel Nivasch, Ofir Pele
We introduce a new embedding technique based on a nested barycentric coordinate system. We show that our embedding can be used to transform the problems of polyhedron approximation, piecewise linear classification and convex regression into one of finding a linear classifier or regressor in a higher dimensional (but nevertheless quite sparse) representation. Our embedding maps a piecewise linear function into an everywhere-linear function, and allows us to invoke well-known algorithms for the latter problem to solve the former. We explain the applications of our embedding to the problems of approximating separating polyhedra—in fact, it can approximate any convex body and unions of convex bodies—as well as to classification by separating polyhedra, and to piecewise linear regression.
Self-organizing maps with adaptive distances for multiple dissimilarity matrices
Pub Date: 2024-09-03, DOI: 10.1007/s10994-024-06607-x
Laura Maria Palomino Mariño, Francisco de Assis Tenorio de Carvalho
There has been increasing interest in multi-view approaches owing to their ability to manage data from several sources. However, regarding unsupervised learning, most multi-view approaches are clustering algorithms suited to vector data. Currently, only relatively few SOM algorithms can manage multi-view dissimilarity data, despite their usefulness. This paper proposes two new families of batch SOM algorithms for multi-view dissimilarity data: multi-medoids SOM and relational SOM. Both are designed to provide a crisp partition and to learn the relevance weight of each dissimilarity matrix by optimizing an objective function, aiming to preserve the topological properties of the mapped data. In both families, the weight represents the relevance of each dissimilarity matrix for the learning task and is computed either locally, for each cluster, or globally, for the whole partition. The proposed algorithms were compared with single-view SOM and set-medoids SOM algorithms from the literature for multi-view dissimilarity data. Experiments on 14 datasets, evaluated with F-measure, NMI, Topographic Error, and Silhouette, show that the relevance weights of the dissimilarity matrices must be taken into account. In addition, the multi-medoids and relational SOM performed better than the set-medoids SOM. An application study on a dermatology dataset also shows that the proposed methods achieve the best performance.
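The role of the relevance weights can be illustrated with the standard adaptive-distance updates below; this is a simplified medoid-style sketch under a product-to-one constraint on global weights, and the SOM topology, neighborhood function, and the paper's exact objective are not reproduced.

```python
import numpy as np

def weighted_assignment(D_views, weights, medoids):
    """Assign each object to the cluster whose medoid is closest under the
    relevance-weighted combination of the dissimilarity matrices."""
    D = sum(w * D for w, D in zip(weights, D_views))
    return D[:, medoids].argmin(axis=1)

def relevance_weights(D_views, assign, medoids):
    """Closed-form global weights under a product-to-one constraint: a view
    with small within-cluster dispersion receives a large relevance weight."""
    J = np.array([sum(D[i, medoids[assign[i]]] for i in range(len(D)))
                  for D in D_views])
    return np.prod(J) ** (1.0 / len(J)) / J

# Toy usage with two single-feature views of six objects.
rng = np.random.default_rng(0)
pts = rng.normal(size=(6, 2))
D_views = [np.abs(pts[:, [k]] - pts[:, [k]].T) for k in range(2)]
medoids = np.array([0, 3])
assign = weighted_assignment(D_views, np.ones(2), medoids)
print("relevance weights:", relevance_weights(D_views, assign, medoids))
```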
Deep negative correlation classification
Pub Date: 2024-08-28, DOI: 10.1007/s10994-024-06604-0
Le Zhang, Qibin Hou, Yun Liu, Jia-Wang Bian, Xun Xu, Joey Tianyi Zhou, Ce Zhu
Ensemble learning serves as a straightforward way to improve the performance of almost any machine learning algorithm. Existing deep ensemble methods usually naïvely train many different models and then aggregate their predictions. In our view, this is not optimal in two respects: (1) naïvely training multiple models adds a substantial computational burden, especially in the deep learning era; (2) purely optimizing each base model without considering their interactions limits the diversity of the ensemble and the performance gains. We tackle these issues by proposing deep negative correlation classification (DNCC), in which the accuracy and diversity trade-off is systematically controlled by seamlessly decomposing the loss function into individual accuracy and the “correlation” between individual models and the ensemble. DNCC yields a deep classification ensemble in which each individual estimator is both accurate and “negatively correlated”. Thanks to the optimized diversity, DNCC works well even when utilizing a shared network backbone, which significantly improves its efficiency compared with most existing ensemble systems, as illustrated in Fig. 2. Extensive experiments on multiple benchmark datasets and network structures demonstrate the superiority of the proposed method.
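A simplified stand-in for the accuracy/diversity decomposition is the classic negative-correlation-style loss sketched below; the trade-off coefficient lam and the use of softmax probabilities are assumptions, and the shared-backbone architecture of DNCC is not reproduced.

```python
import torch
import torch.nn.functional as F

def negative_correlation_loss(member_logits, targets, lam=0.5):
    """Joint ensemble loss: each member's cross-entropy keeps it accurate,
    while a penalty on closeness to the ensemble mean encourages diversity."""
    probs = torch.stack([F.softmax(z, dim=1) for z in member_logits])  # (M, B, C)
    ensemble = probs.mean(dim=0)                                       # (B, C)
    accuracy = sum(F.cross_entropy(z, targets) for z in member_logits)
    diversity = ((probs - ensemble) ** 2).sum(dim=2).mean()            # spread around the ensemble
    return accuracy - lam * diversity

# Toy usage: three ensemble members on a batch of 8 examples with 5 classes.
members = [torch.randn(8, 5, requires_grad=True) for _ in range(3)]
loss = negative_correlation_loss(members, torch.randint(0, 5, (8,)))
loss.backward()
```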
Generalization of temporal logic tasks via future dependent options
Pub Date: 2024-08-26, DOI: 10.1007/s10994-024-06614-y
Duo Xu, Faramarz Fekri
Temporal logic (TL) tasks consist of complex and temporally extended subgoals, and they are common in many real-world applications, such as service and navigation robots. However, it is often inefficient or even infeasible to train reinforcement learning (RL) agents to solve multiple TL tasks, since rewards are sparse and non-Markovian in these tasks. A promising solution to this problem is to learn task-conditioned policies which can zero-shot generalize to new TL tasks without further training. However, affected by practical issues such as lossy symbolic observations and the long time horizon of completing TL tasks, previous works suffer from sample inefficiency in training and sub-optimality (or even infeasibility) in task execution. In order to tackle these issues, this paper proposes an option-based framework to generalize TL tasks, consisting of option training and task execution parts, with innovations in both. In option training, we propose a novel approach to learn options dependent on future subgoals. Additionally, we propose to train a multi-step value function which can propagate the rewards of satisfying future subgoals more efficiently in long-horizon tasks. In task execution, in order to ensure optimality and safety, we propose a model-free MPC planner for option selection, circumventing the learning of a transition model required by previous MPC planners. In experiments on three different domains, we evaluate the generalization capability of the agent trained by the proposed method, showing its significant advantage over previous methods.