Federated learning offers a paradigm for addressing the challenge of preserving privacy in distributed machine learning. However, the datasets held by individual clients in the real world are inevitably heterogeneous, and when aggregated globally they tend to follow a long-tailed distribution, which greatly degrades model performance. Traditional federated learning approaches primarily address data heterogeneity among clients, yet they fail to address the class-wise bias caused by globally long-tailed data. As a result, the trained model focuses on the head classes while neglecting the equally important tail classes. Consequently, it is essential to develop a methodology that considers all classes holistically. To address these problems, we propose a new method, FedLF, which introduces three modifications in the local training phase: adaptive logit adjustment, continuous class-centred optimization, and feature decorrelation. We compare against seven state-of-the-art methods under varying degrees of data heterogeneity and long-tailed distribution. Extensive experiments on the benchmark datasets CIFAR-10-LT and CIFAR-100-LT demonstrate that our approach effectively mitigates the performance degradation caused by data heterogeneity and long-tailed distribution. Our code is available at https://github.com/18sym/FedLF.
{"title":"FedLF: Adaptive Logit Adjustment and Feature Optimization in Federated Long-Tailed Learning","authors":"Xiuhua Lu, Peng Li, Xuefeng Jiang","doi":"arxiv-2409.12105","DOIUrl":"https://doi.org/arxiv-2409.12105","url":null,"abstract":"Federated learning offers a paradigm to the challenge of preserving privacy\u0000in distributed machine learning. However, datasets distributed across each\u0000client in the real world are inevitably heterogeneous, and if the datasets can\u0000be globally aggregated, they tend to be long-tailed distributed, which greatly\u0000affects the performance of the model. The traditional approach to federated\u0000learning primarily addresses the heterogeneity of data among clients, yet it\u0000fails to address the phenomenon of class-wise bias in global long-tailed data.\u0000This results in the trained model focusing on the head classes while neglecting\u0000the equally important tail classes. Consequently, it is essential to develop a\u0000methodology that considers classes holistically. To address the above problems,\u0000we propose a new method FedLF, which introduces three modifications in the\u0000local training phase: adaptive logit adjustment, continuous class centred\u0000optimization, and feature decorrelation. We compare seven state-of-the-art\u0000methods with varying degrees of data heterogeneity and long-tailed\u0000distribution. Extensive experiments on benchmark datasets CIFAR-10-LT and\u0000CIFAR-100-LT demonstrate that our approach effectively mitigates the problem of\u0000model performance degradation due to data heterogeneity and long-tailed\u0000distribution. our code is available at https://github.com/18sym/FedLF.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Out-of-distribution (OOD) detection aims to detect test samples outside the training category space and is an essential component in building reliable machine learning systems. Existing reviews on OOD detection primarily focus on method taxonomy, surveying the field by categorizing various approaches. However, many recent works concentrate on non-traditional OOD detection scenarios, such as test-time adaptation, multi-modal data sources, and other novel contexts. In this survey, we review recent advances in OOD detection from the problem-scenario perspective for the first time. According to whether the training process is completely controlled, we divide OOD detection methods into training-driven and training-agnostic. In addition, considering the rapid development of pre-trained models, large pre-trained model-based OOD detection is treated as an important category and discussed separately. Furthermore, we provide a discussion of evaluation scenarios, a variety of applications, and several future research directions. We believe this survey, with its new taxonomy, will benefit the proposal of new methods and the expansion to more practical scenarios. A curated list of related papers is provided in the GitHub repository: https://github.com/shuolucs/Awesome-Out-Of-Distribution-Detection
{"title":"Recent Advances in OOD Detection: Problems and Approaches","authors":"Shuo Lu, YingSheng Wang, LuJun Sheng, AiHua Zheng, LinXiao He, Jian Liang","doi":"arxiv-2409.11884","DOIUrl":"https://doi.org/arxiv-2409.11884","url":null,"abstract":"Out-of-distribution (OOD) detection aims to detect test samples outside the\u0000training category space, which is an essential component in building reliable\u0000machine learning systems. Existing reviews on OOD detection primarily focus on\u0000method taxonomy, surveying the field by categorizing various approaches.\u0000However, many recent works concentrate on non-traditional OOD detection\u0000scenarios, such as test-time adaptation, multi-modal data sources and other\u0000novel contexts. In this survey, we uniquely review recent advances in OOD\u0000detection from the problem scenario perspective for the first time. According\u0000to whether the training process is completely controlled, we divide OOD\u0000detection methods into training-driven and training-agnostic. Besides,\u0000considering the rapid development of pre-trained models, large pre-trained\u0000model-based OOD detection is also regarded as an important category and\u0000discussed separately. Furthermore, we provide a discussion of the evaluation\u0000scenarios, a variety of applications, and several future research directions.\u0000We believe this survey with new taxonomy will benefit the proposal of new\u0000methods and the expansion of more practical scenarios. A curated list of\u0000related papers is provided in the Github repository:\u0000url{https://github.com/shuolucs/Awesome-Out-Of-Distribution-Detection}","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The ever-growing scale of deep neural networks (DNNs) has led to an equally rapid growth in computational resource requirements. Many recent architectures, most prominently Large Language Models, have to be trained on supercomputers with thousands of accelerators, such as GPUs or TPUs. In addition to the vast number of floating-point operations, the memory footprint of DNNs is also exploding. In contrast, GPU architectures are notoriously short on memory. Even comparatively small architectures like some EfficientNet variants cannot be trained on a single consumer-grade GPU at reasonable mini-batch sizes. During training, intermediate input activations have to be stored until backpropagation for gradient calculation, and these make up the vast majority of the memory footprint. In this work we therefore consider compressing activation maps for the backward pass using pooling, which can reduce both the memory footprint and the amount of data movement. The forward computation remains uncompressed. We empirically show convergence and study the effects on feature detection using the common vision architecture ResNet as an example. With this approach we are able to reduce peak memory consumption by 29% at the cost of a longer training schedule, while maintaining prediction accuracy compared to an uncompressed baseline.
{"title":"Less Memory Means smaller GPUs: Backpropagation with Compressed Activations","authors":"Daniel Barley, Holger Fröning","doi":"arxiv-2409.11902","DOIUrl":"https://doi.org/arxiv-2409.11902","url":null,"abstract":"The ever-growing scale of deep neural networks (DNNs) has lead to an equally\u0000rapid growth in computational resource requirements. Many recent architectures,\u0000most prominently Large Language Models, have to be trained using supercomputers\u0000with thousands of accelerators, such as GPUs or TPUs. Next to the vast number\u0000of floating point operations the memory footprint of DNNs is also exploding. In\u0000contrast, GPU architectures are notoriously short on memory. Even comparatively\u0000small architectures like some EfficientNet variants cannot be trained on a\u0000single consumer-grade GPU at reasonable mini-batch sizes. During training,\u0000intermediate input activations have to be stored until backpropagation for\u0000gradient calculation. These make up the vast majority of the memory footprint.\u0000In this work we therefore consider compressing activation maps for the backward\u0000pass using pooling, which can reduce both the memory footprint and amount of\u0000data movement. The forward computation remains uncompressed. We empirically\u0000show convergence and study effects on feature detection at the example of the\u0000common vision architecture ResNet. With this approach we are able to reduce the\u0000peak memory consumption by 29% at the cost of a longer training schedule, while\u0000maintaining prediction accuracy compared to an uncompressed baseline.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maarten Meire, Quinten Van Baelen, Ted Ooijevaar, Peter Karsmakers
The main goal of machine condition monitoring is, as the name implies, to monitor the condition of industrial applications. This monitoring objective can be split into two main problems: a diagnostic problem, where normal data must be distinguished from anomalous data, also called Anomaly Detection (AD); and a prognostic problem, where the aim is to predict the evolution of a Condition Indicator (CI) that reflects the condition of an asset throughout its lifetime. In machine condition monitoring, this CI is expected to show monotonic behavior, as the condition of a machine gradually degrades over time. This work proposes an extension to Constraint Guided AutoEncoders (CGAE), a robust AD method, that enables building a single model usable for both AD and CI estimation. To improve CI estimation, the extension incorporates a constraint that enforces monotonically increasing CI predictions over time. Experimental results indicate that the proposed algorithm performs similarly to, or slightly better than, CGAE with regard to AD, while improving the monotonic behavior of the CI.
{"title":"Constraint Guided AutoEncoders for Joint Optimization of Condition Indicator Estimation and Anomaly Detection in Machine Condition Monitoring","authors":"Maarten Meire, Quinten Van Baelen, Ted Ooijevaar, Peter Karsmakers","doi":"arxiv-2409.11807","DOIUrl":"https://doi.org/arxiv-2409.11807","url":null,"abstract":"The main goal of machine condition monitoring is, as the name implies, to\u0000monitor the condition of industrial applications. The objective of this\u0000monitoring can be mainly split into two problems. A diagnostic problem, where\u0000normal data should be distinguished from anomalous data, otherwise called\u0000Anomaly Detection (AD), or a prognostic problem, where the aim is to predict\u0000the evolution of a Condition Indicator (CI) that reflects the condition of an\u0000asset throughout its life time. When considering machine condition monitoring,\u0000it is expected that this CI shows a monotonic behavior, as the condition of a\u0000machine gradually degrades over time. This work proposes an extension to\u0000Constraint Guided AutoEncoders (CGAE), which is a robust AD method, that\u0000enables building a single model that can be used for both AD and CI estimation.\u0000For the purpose of improved CI estimation the extension incorporates a\u0000constraint that enforces the model to have monotonically increasing CI\u0000predictions over time. Experimental results indicate that the proposed\u0000algorithm performs similar, or slightly better, than CGAE, with regards to AD,\u0000while improving the monotonic behavior of the CI.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Snelleman, B. M. Renting, H. H. Hoos, J. N. van Rijn
Graph-structured data occurs naturally in many research fields, such as chemistry and sociology. The relational information it contains can be leveraged to statistically model graph properties through geometric deep learning. Graph neural networks employ techniques such as message-passing layers to propagate local features through a graph. However, message-passing layers can be computationally expensive when dealing with large and sparse graphs. Graph pooling operators offer the possibility of removing or merging nodes in such graphs, thus lowering computational costs. However, pooling operators that remove nodes cause data loss, and pooling operators that merge nodes are often computationally expensive. We propose a pooling operator that merges nodes without causing data loss and is also conceptually simple and computationally inexpensive. We empirically demonstrate that the proposed pooling operator performs statistically significantly better than edge pooling on four popular benchmark datasets while reducing time complexity and the number of trainable parameters by 70.6% on average. Compared to another maximally powerful method, the Graph Isomorphism Network, we show that our method performs better on two popular benchmark datasets while reducing the number of learnable parameters by 60.9% on average.
{"title":"Edge-Based Graph Component Pooling","authors":"T. Snelleman, B. M. Renting, H. H. Hoos, J. N. van Rijn","doi":"arxiv-2409.11856","DOIUrl":"https://doi.org/arxiv-2409.11856","url":null,"abstract":"Graph-structured data naturally occurs in many research fields, such as\u0000chemistry and sociology. The relational information contained therein can be\u0000leveraged to statistically model graph properties through geometrical deep\u0000learning. Graph neural networks employ techniques, such as message-passing\u0000layers, to propagate local features through a graph. However, message-passing\u0000layers can be computationally expensive when dealing with large and sparse\u0000graphs. Graph pooling operators offer the possibility of removing or merging\u0000nodes in such graphs, thus lowering computational costs. However, pooling\u0000operators that remove nodes cause data loss, and pooling operators that merge\u0000nodes are often computationally expensive. We propose a pooling operator that\u0000merges nodes so as not to cause data loss but is also conceptually simple and\u0000computationally inexpensive. We empirically demonstrate that the proposed\u0000pooling operator performs statistically significantly better than edge pool on\u0000four popular benchmark datasets while reducing time complexity and the number\u0000of trainable parameters by 70.6% on average. Compared to another maximally\u0000powerful method named Graph Isomporhic Network, we show that we outperform them\u0000on two popular benchmark datasets while reducing the number of learnable\u0000parameters on average by 60.9%.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"32 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The removal of carefully selected examples from training data has recently emerged as an effective way of improving the robustness of machine learning models. However, the best way to select these examples remains an open question. In this paper, we consider the problem from the perspective of unsupervised domain adaptation (UDA). We propose AdaPrune, a method for UDA whereby training examples are removed in order to align the training distribution to that of the target data. By adopting the maximum mean discrepancy (MMD) as the criterion for alignment, the problem can be neatly formulated and solved as an integer quadratic program. We evaluate our approach on a real-world domain-shift task of bioacoustic event detection. As a method for UDA, we show that AdaPrune outperforms related techniques and is complementary to other UDA algorithms such as CORAL. Our analysis of the relationship between the MMD and model accuracy, along with t-SNE plots, validates the proposed method as a principled and well-founded way of performing data pruning.
{"title":"Unsupervised Domain Adaptation Via Data Pruning","authors":"Andrea Napoli, Paul White","doi":"arxiv-2409.12076","DOIUrl":"https://doi.org/arxiv-2409.12076","url":null,"abstract":"The removal of carefully-selected examples from training data has recently\u0000emerged as an effective way of improving the robustness of machine learning\u0000models. However, the best way to select these examples remains an open\u0000question. In this paper, we consider the problem from the perspective of\u0000unsupervised domain adaptation (UDA). We propose AdaPrune, a method for UDA\u0000whereby training examples are removed to attempt to align the training\u0000distribution to that of the target data. By adopting the maximum mean\u0000discrepancy (MMD) as the criterion for alignment, the problem can be neatly\u0000formulated and solved as an integer quadratic program. We evaluate our approach\u0000on a real-world domain shift task of bioacoustic event detection. As a method\u0000for UDA, we show that AdaPrune outperforms related techniques, and is\u0000complementary to other UDA algorithms such as CORAL. Our analysis of the\u0000relationship between the MMD and model accuracy, along with t-SNE plots,\u0000validate the proposed method as a principled and well-founded way of performing\u0000data pruning.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"35 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Advances in finite element methods have become essential in various disciplines, and in particular for Computational Fluid Dynamics (CFD), driving research efforts toward improved precision and efficiency. While Convolutional Neural Networks (CNNs) have found success in CFD by mapping meshes into images, recent attention has turned to leveraging Graph Neural Networks (GNNs) for direct mesh processing. This paper introduces a novel model merging Self-Attention with Message Passing in GNNs, achieving a 15% reduction in RMSE on the well-known flow-past-a-cylinder benchmark. Furthermore, a dynamic mesh pruning technique based on Self-Attention is proposed that leads to a robust GNN-based multigrid approach, also reducing RMSE by 15%. Additionally, a new self-supervised training method based on BERT is presented, resulting in a 25% RMSE reduction. The paper includes an ablation study and outperforms state-of-the-art models on several challenging datasets, promising advancements similar to those recently achieved in natural language and image processing. Finally, the paper introduces a dataset with meshes at least an order of magnitude larger than existing ones. Code and datasets will be released at https://github.com/DonsetPG/multigrid-gnn.
{"title":"Multi-Grid Graph Neural Networks with Self-Attention for Computational Mechanics","authors":"Paul Garnier, Jonathan Viquerat, Elie Hachem","doi":"arxiv-2409.11899","DOIUrl":"https://doi.org/arxiv-2409.11899","url":null,"abstract":"Advancement in finite element methods have become essential in various\u0000disciplines, and in particular for Computational Fluid Dynamics (CFD), driving\u0000research efforts for improved precision and efficiency. While Convolutional\u0000Neural Networks (CNNs) have found success in CFD by mapping meshes into images,\u0000recent attention has turned to leveraging Graph Neural Networks (GNNs) for\u0000direct mesh processing. This paper introduces a novel model merging\u0000Self-Attention with Message Passing in GNNs, achieving a 15% reduction in RMSE\u0000on the well known flow past a cylinder benchmark. Furthermore, a dynamic mesh\u0000pruning technique based on Self-Attention is proposed, that leads to a robust\u0000GNN-based multigrid approach, also reducing RMSE by 15%. Additionally, a new\u0000self-supervised training method based on BERT is presented, resulting in a 25%\u0000RMSE reduction. The paper includes an ablation study and outperforms\u0000state-of-the-art models on several challenging datasets, promising advancements\u0000similar to those recently achieved in natural language and image processing.\u0000Finally, the paper introduces a dataset with meshes larger than existing ones\u0000by at least an order of magnitude. Code and Datasets will be released at\u0000https://github.com/DonsetPG/multigrid-gnn.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This study explores the potential of utilizing administrative claims data, combined with advanced machine learning and deep learning techniques, to predict the progression of Chronic Kidney Disease (CKD) to End-Stage Renal Disease (ESRD). We analyze a comprehensive, 10-year dataset provided by a major health insurance organization to develop prediction models for multiple observation windows using traditional machine learning methods such as Random Forest and XGBoost as well as deep learning approaches such as Long Short-Term Memory (LSTM) networks. Our findings demonstrate that the LSTM model, particularly with a 24-month observation window, exhibits superior performance in predicting ESRD progression, outperforming existing models in the literature. We further apply SHapley Additive exPlanations (SHAP) analysis to enhance interpretability, providing insights into the impact of individual features on predictions at the individual patient level. This study underscores the value of leveraging administrative claims data for CKD management and predicting ESRD progression.
{"title":"Towards Interpretable End-Stage Renal Disease (ESRD) Prediction: Utilizing Administrative Claims Data with Explainable AI Techniques","authors":"Yubo Li, Saba Al-Sayouri, Rema Padman","doi":"arxiv-2409.12087","DOIUrl":"https://doi.org/arxiv-2409.12087","url":null,"abstract":"This study explores the potential of utilizing administrative claims data,\u0000combined with advanced machine learning and deep learning techniques, to\u0000predict the progression of Chronic Kidney Disease (CKD) to End-Stage Renal\u0000Disease (ESRD). We analyze a comprehensive, 10-year dataset provided by a major\u0000health insurance organization to develop prediction models for multiple\u0000observation windows using traditional machine learning methods such as Random\u0000Forest and XGBoost as well as deep learning approaches such as Long Short-Term\u0000Memory (LSTM) networks. Our findings demonstrate that the LSTM model,\u0000particularly with a 24-month observation window, exhibits superior performance\u0000in predicting ESRD progression, outperforming existing models in the\u0000literature. We further apply SHapley Additive exPlanations (SHAP) analysis to\u0000enhance interpretability, providing insights into the impact of individual\u0000features on predictions at the individual patient level. This study underscores\u0000the value of leveraging administrative claims data for CKD management and\u0000predicting ESRD progression.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semi-Supervised Learning (SSL) has become a preferred paradigm in many deep learning tasks, as it reduces the need for human labor. Previous studies primarily focus on effectively utilising the labelled and unlabelled data to improve performance. However, we observe that how samples are selected for labelling also significantly impacts performance, particularly under extremely low-budget settings. The sample selection task in SSL has long been under-explored. To fill this gap, we propose a Representative and Diverse Sample Selection approach (RDSS). By adopting a modified Frank-Wolfe algorithm to minimise a novel criterion, the $\alpha$-Maximum Mean Discrepancy ($\alpha$-MMD), RDSS samples a representative and diverse subset for annotation from the unlabelled data. We demonstrate that minimising $\alpha$-MMD enhances the generalisation ability of low-budget learning. Experimental results show that RDSS consistently improves the performance of several popular SSL frameworks and outperforms the state-of-the-art sample selection approaches used in Active Learning (AL) and Semi-Supervised Active Learning (SSAL), even with constrained annotation budgets.
{"title":"Enhancing Semi-Supervised Learning via Representative and Diverse Sample Selection","authors":"Qian Shao, Jiangrui Kang, Qiyuan Chen, Zepeng Li, Hongxia Xu, Yiwen Cao, Jiajuan Liang, Jian Wu","doi":"arxiv-2409.11653","DOIUrl":"https://doi.org/arxiv-2409.11653","url":null,"abstract":"Semi-Supervised Learning (SSL) has become a preferred paradigm in many deep\u0000learning tasks, which reduces the need for human labor. Previous studies\u0000primarily focus on effectively utilising the labelled and unlabeled data to\u0000improve performance. However, we observe that how to select samples for\u0000labelling also significantly impacts performance, particularly under extremely\u0000low-budget settings. The sample selection task in SSL has been under-explored\u0000for a long time. To fill in this gap, we propose a Representative and Diverse\u0000Sample Selection approach (RDSS). By adopting a modified Frank-Wolfe algorithm\u0000to minimise a novel criterion $alpha$-Maximum Mean Discrepancy ($alpha$-MMD),\u0000RDSS samples a representative and diverse subset for annotation from the\u0000unlabeled data. We demonstrate that minimizing $alpha$-MMD enhances the\u0000generalization ability of low-budget learning. Experimental results show that\u0000RDSS consistently improves the performance of several popular SSL frameworks\u0000and outperforms the state-of-the-art sample selection approaches used in Active\u0000Learning (AL) and Semi-Supervised Active Learning (SSAL), even with constrained\u0000annotation budgets.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mohammad Wazed Ali (Intelligent Embedded Systems), Asif bin Mustafa (School of CIT, Technical University of Munich, Munich, Germany), Md. Aukerul Moin Shuvo (Dept. of Computer Science and Engineering, Rajshahi University of Engg. & Technology, Rajshahi, Bangladesh), Bernhard Sick (Intelligent Embedded Systems)
Electrification of vehicles is a potential way of reducing fossil fuel usage and thus lessening environmental pollution. Electric Vehicles (EVs) of various types for different transport modes (including air, water, and land) are evolving. Moreover, different EV user groups (commuters, commercial or domestic users, drivers) may use different charging infrastructures (public, private, home, and workplace) at various times. Therefore, usage patterns and energy demand are highly stochastic. Characterizing and forecasting the charging demand of these diverse EV usage profiles is essential in preventing power outages. Previously developed data-driven load models are limited to specific use cases and locations, and none of them is simultaneously adaptive enough to transfer day-ahead forecasting knowledge among EV charging sites at diverse locations, trainable with limited data, and cost-effective. This article presents location-based load forecasting for EV charging sites using a deep Multi-Quantile Temporal Convolutional Network (MQ-TCN) to overcome the limitations of earlier models. We conducted our experiments on data from four charging sites, namely Caltech, JPL, Office-1, and NREL, which have diverse EV user types such as students, full-time and part-time employees, and random visitors. With a Prediction Interval Coverage Probability (PICP) score of 93.62%, our proposed deep MQ-TCN model exhibited a remarkable 28.93% improvement over the XGBoost model for day-ahead load forecasting at the JPL charging site. By transferring knowledge with the inductive Transfer Learning (TL) approach, the MQ-TCN model achieved a 96.88% PICP score for the load forecasting task at the NREL site using only two weeks of data.
{"title":"Location based Probabilistic Load Forecasting of EV Charging Sites: Deep Transfer Learning with Multi-Quantile Temporal Convolutional Network","authors":"Mohammad Wazed AliIntelligent Embedded Systems, Asif bin MustafaSchool of CIT, Technical University of Munich, Munich, Germany, Md. Aukerul Moin ShuvoDept. of Computer Science and Engineering, Rajshahi University of Engg. & Technology, Rajshahi, Bangladesh, Bernhard SickIntelligent Embedded Systems","doi":"arxiv-2409.11862","DOIUrl":"https://doi.org/arxiv-2409.11862","url":null,"abstract":"Electrification of vehicles is a potential way of reducing fossil fuel usage\u0000and thus lessening environmental pollution. Electric Vehicles (EVs) of various\u0000types for different transport modes (including air, water, and land) are\u0000evolving. Moreover, different EV user groups (commuters, commercial or domestic\u0000users, drivers) may use different charging infrastructures (public, private,\u0000home, and workplace) at various times. Therefore, usage patterns and energy\u0000demand are very stochastic. Characterizing and forecasting the charging demand\u0000of these diverse EV usage profiles is essential in preventing power outages.\u0000Previously developed data-driven load models are limited to specific use cases\u0000and locations. None of these models are simultaneously adaptive enough to\u0000transfer knowledge of day-ahead forecasting among EV charging sites of diverse\u0000locations, trained with limited data, and cost-effective. This article presents\u0000a location-based load forecasting of EV charging sites using a deep\u0000Multi-Quantile Temporal Convolutional Network (MQ-TCN) to overcome the\u0000limitations of earlier models. We conducted our experiments on data from four\u0000charging sites, namely Caltech, JPL, Office-1, and NREL, which have diverse EV\u0000user types like students, full-time and part-time employees, random visitors,\u0000etc. With a Prediction Interval Coverage Probability (PICP) score of 93.62%,\u0000our proposed deep MQ-TCN model exhibited a remarkable 28.93% improvement over\u0000the XGBoost model for a day-ahead load forecasting at the JPL charging site. By\u0000transferring knowledge with the inductive Transfer Learning (TL) approach, the\u0000MQ-TCN model achieved a 96.88% PICP score for the load forecasting task at the\u0000NREL site using only two weeks of data.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}