Federated learning offers a paradigm for addressing the challenge of preserving privacy in distributed machine learning. However, the datasets held by individual clients in the real world are inevitably heterogeneous, and when aggregated globally they tend to follow a long-tailed distribution, which greatly degrades model performance. Traditional federated learning approaches primarily address data heterogeneity among clients, yet they fail to address the class-wise bias caused by globally long-tailed data. As a result, the trained model focuses on the head classes while neglecting the equally important tail classes. Consequently, it is essential to develop a methodology that considers all classes holistically. To address these problems, we propose a new method, FedLF, which introduces three modifications in the local training phase: adaptive logit adjustment, continuous class-centred optimization, and feature decorrelation. We compare against seven state-of-the-art methods under varying degrees of data heterogeneity and long-tailed distribution. Extensive experiments on the benchmark datasets CIFAR-10-LT and CIFAR-100-LT demonstrate that our approach effectively mitigates the performance degradation caused by data heterogeneity and long-tailed distribution. Our code is available at https://github.com/18sym/FedLF.
{"title":"FedLF: Adaptive Logit Adjustment and Feature Optimization in Federated Long-Tailed Learning","authors":"Xiuhua Lu, Peng Li, Xuefeng Jiang","doi":"arxiv-2409.12105","DOIUrl":"https://doi.org/arxiv-2409.12105","url":null,"abstract":"Federated learning offers a paradigm to the challenge of preserving privacy\u0000in distributed machine learning. However, datasets distributed across each\u0000client in the real world are inevitably heterogeneous, and if the datasets can\u0000be globally aggregated, they tend to be long-tailed distributed, which greatly\u0000affects the performance of the model. The traditional approach to federated\u0000learning primarily addresses the heterogeneity of data among clients, yet it\u0000fails to address the phenomenon of class-wise bias in global long-tailed data.\u0000This results in the trained model focusing on the head classes while neglecting\u0000the equally important tail classes. Consequently, it is essential to develop a\u0000methodology that considers classes holistically. To address the above problems,\u0000we propose a new method FedLF, which introduces three modifications in the\u0000local training phase: adaptive logit adjustment, continuous class centred\u0000optimization, and feature decorrelation. We compare seven state-of-the-art\u0000methods with varying degrees of data heterogeneity and long-tailed\u0000distribution. Extensive experiments on benchmark datasets CIFAR-10-LT and\u0000CIFAR-100-LT demonstrate that our approach effectively mitigates the problem of\u0000model performance degradation due to data heterogeneity and long-tailed\u0000distribution. our code is available at https://github.com/18sym/FedLF.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Out-of-distribution (OOD) detection aims to detect test samples outside the training category space and is an essential component in building reliable machine learning systems. Existing reviews on OOD detection primarily focus on method taxonomy, surveying the field by categorizing various approaches. However, many recent works concentrate on non-traditional OOD detection scenarios, such as test-time adaptation, multi-modal data sources, and other novel contexts. In this survey, we review recent advances in OOD detection from the problem-scenario perspective for the first time. According to whether the training process is completely controlled, we divide OOD detection methods into training-driven and training-agnostic. In addition, considering the rapid development of pre-trained models, large pre-trained model-based OOD detection is treated as an important category and discussed separately. Furthermore, we provide a discussion of evaluation scenarios, a variety of applications, and several future research directions. We believe this survey, with its new taxonomy, will benefit the proposal of new methods and the expansion to more practical scenarios. A curated list of related papers is provided in the GitHub repository: https://github.com/shuolucs/Awesome-Out-Of-Distribution-Detection
{"title":"Recent Advances in OOD Detection: Problems and Approaches","authors":"Shuo Lu, YingSheng Wang, LuJun Sheng, AiHua Zheng, LinXiao He, Jian Liang","doi":"arxiv-2409.11884","DOIUrl":"https://doi.org/arxiv-2409.11884","url":null,"abstract":"Out-of-distribution (OOD) detection aims to detect test samples outside the\u0000training category space, which is an essential component in building reliable\u0000machine learning systems. Existing reviews on OOD detection primarily focus on\u0000method taxonomy, surveying the field by categorizing various approaches.\u0000However, many recent works concentrate on non-traditional OOD detection\u0000scenarios, such as test-time adaptation, multi-modal data sources and other\u0000novel contexts. In this survey, we uniquely review recent advances in OOD\u0000detection from the problem scenario perspective for the first time. According\u0000to whether the training process is completely controlled, we divide OOD\u0000detection methods into training-driven and training-agnostic. Besides,\u0000considering the rapid development of pre-trained models, large pre-trained\u0000model-based OOD detection is also regarded as an important category and\u0000discussed separately. Furthermore, we provide a discussion of the evaluation\u0000scenarios, a variety of applications, and several future research directions.\u0000We believe this survey with new taxonomy will benefit the proposal of new\u0000methods and the expansion of more practical scenarios. A curated list of\u0000related papers is provided in the Github repository:\u0000url{https://github.com/shuolucs/Awesome-Out-Of-Distribution-Detection}","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The ever-growing scale of deep neural networks (DNNs) has led to an equally rapid growth in computational resource requirements. Many recent architectures, most prominently Large Language Models, have to be trained on supercomputers with thousands of accelerators, such as GPUs or TPUs. In addition to the vast number of floating-point operations, the memory footprint of DNNs is also exploding. In contrast, GPU architectures are notoriously short on memory. Even comparatively small architectures like some EfficientNet variants cannot be trained on a single consumer-grade GPU at reasonable mini-batch sizes. During training, intermediate input activations have to be stored until backpropagation for gradient calculation, and these make up the vast majority of the memory footprint. In this work we therefore consider compressing activation maps for the backward pass using pooling, which can reduce both the memory footprint and the amount of data movement. The forward computation remains uncompressed. We empirically show convergence and study the effects on feature detection using the common vision architecture ResNet as an example. With this approach we are able to reduce peak memory consumption by 29% at the cost of a longer training schedule, while maintaining prediction accuracy compared to an uncompressed baseline.
{"title":"Less Memory Means smaller GPUs: Backpropagation with Compressed Activations","authors":"Daniel Barley, Holger Fröning","doi":"arxiv-2409.11902","DOIUrl":"https://doi.org/arxiv-2409.11902","url":null,"abstract":"The ever-growing scale of deep neural networks (DNNs) has lead to an equally\u0000rapid growth in computational resource requirements. Many recent architectures,\u0000most prominently Large Language Models, have to be trained using supercomputers\u0000with thousands of accelerators, such as GPUs or TPUs. Next to the vast number\u0000of floating point operations the memory footprint of DNNs is also exploding. In\u0000contrast, GPU architectures are notoriously short on memory. Even comparatively\u0000small architectures like some EfficientNet variants cannot be trained on a\u0000single consumer-grade GPU at reasonable mini-batch sizes. During training,\u0000intermediate input activations have to be stored until backpropagation for\u0000gradient calculation. These make up the vast majority of the memory footprint.\u0000In this work we therefore consider compressing activation maps for the backward\u0000pass using pooling, which can reduce both the memory footprint and amount of\u0000data movement. The forward computation remains uncompressed. We empirically\u0000show convergence and study effects on feature detection at the example of the\u0000common vision architecture ResNet. With this approach we are able to reduce the\u0000peak memory consumption by 29% at the cost of a longer training schedule, while\u0000maintaining prediction accuracy compared to an uncompressed baseline.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maarten Meire, Quinten Van Baelen, Ted Ooijevaar, Peter Karsmakers
The main goal of machine condition monitoring is, as the name implies, to monitor the condition of industrial applications. This monitoring objective can be split into two main problems: a diagnostic problem, where normal data must be distinguished from anomalous data, also called Anomaly Detection (AD); and a prognostic problem, where the aim is to predict the evolution of a Condition Indicator (CI) that reflects the condition of an asset throughout its lifetime. In machine condition monitoring, this CI is expected to show monotonic behavior, as the condition of a machine gradually degrades over time. This work proposes an extension to Constraint Guided AutoEncoders (CGAE), a robust AD method, that enables building a single model usable for both AD and CI estimation. To improve CI estimation, the extension incorporates a constraint that enforces monotonically increasing CI predictions over time. Experimental results indicate that the proposed algorithm performs similarly to, or slightly better than, CGAE with regard to AD, while improving the monotonic behavior of the CI.
{"title":"Constraint Guided AutoEncoders for Joint Optimization of Condition Indicator Estimation and Anomaly Detection in Machine Condition Monitoring","authors":"Maarten Meire, Quinten Van Baelen, Ted Ooijevaar, Peter Karsmakers","doi":"arxiv-2409.11807","DOIUrl":"https://doi.org/arxiv-2409.11807","url":null,"abstract":"The main goal of machine condition monitoring is, as the name implies, to\u0000monitor the condition of industrial applications. The objective of this\u0000monitoring can be mainly split into two problems. A diagnostic problem, where\u0000normal data should be distinguished from anomalous data, otherwise called\u0000Anomaly Detection (AD), or a prognostic problem, where the aim is to predict\u0000the evolution of a Condition Indicator (CI) that reflects the condition of an\u0000asset throughout its life time. When considering machine condition monitoring,\u0000it is expected that this CI shows a monotonic behavior, as the condition of a\u0000machine gradually degrades over time. This work proposes an extension to\u0000Constraint Guided AutoEncoders (CGAE), which is a robust AD method, that\u0000enables building a single model that can be used for both AD and CI estimation.\u0000For the purpose of improved CI estimation the extension incorporates a\u0000constraint that enforces the model to have monotonically increasing CI\u0000predictions over time. Experimental results indicate that the proposed\u0000algorithm performs similar, or slightly better, than CGAE, with regards to AD,\u0000while improving the monotonic behavior of the CI.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Snelleman, B. M. Renting, H. H. Hoos, J. N. van Rijn
Graph-structured data occurs naturally in many research fields, such as chemistry and sociology. The relational information it contains can be leveraged to statistically model graph properties through geometric deep learning. Graph neural networks employ techniques such as message-passing layers to propagate local features through a graph. However, message-passing layers can be computationally expensive when dealing with large and sparse graphs. Graph pooling operators offer the possibility of removing or merging nodes in such graphs, thus lowering computational costs. However, pooling operators that remove nodes cause data loss, and pooling operators that merge nodes are often computationally expensive. We propose a pooling operator that merges nodes without causing data loss and is also conceptually simple and computationally inexpensive. We empirically demonstrate that the proposed pooling operator performs statistically significantly better than edge pooling on four popular benchmark datasets while reducing time complexity and the number of trainable parameters by 70.6% on average. Compared to another maximally powerful method, the Graph Isomorphism Network, we show that our method performs better on two popular benchmark datasets while reducing the number of learnable parameters by 60.9% on average.
{"title":"Edge-Based Graph Component Pooling","authors":"T. Snelleman, B. M. Renting, H. H. Hoos, J. N. van Rijn","doi":"arxiv-2409.11856","DOIUrl":"https://doi.org/arxiv-2409.11856","url":null,"abstract":"Graph-structured data naturally occurs in many research fields, such as\u0000chemistry and sociology. The relational information contained therein can be\u0000leveraged to statistically model graph properties through geometrical deep\u0000learning. Graph neural networks employ techniques, such as message-passing\u0000layers, to propagate local features through a graph. However, message-passing\u0000layers can be computationally expensive when dealing with large and sparse\u0000graphs. Graph pooling operators offer the possibility of removing or merging\u0000nodes in such graphs, thus lowering computational costs. However, pooling\u0000operators that remove nodes cause data loss, and pooling operators that merge\u0000nodes are often computationally expensive. We propose a pooling operator that\u0000merges nodes so as not to cause data loss but is also conceptually simple and\u0000computationally inexpensive. We empirically demonstrate that the proposed\u0000pooling operator performs statistically significantly better than edge pool on\u0000four popular benchmark datasets while reducing time complexity and the number\u0000of trainable parameters by 70.6% on average. Compared to another maximally\u0000powerful method named Graph Isomporhic Network, we show that we outperform them\u0000on two popular benchmark datasets while reducing the number of learnable\u0000parameters on average by 60.9%.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"32 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The removal of carefully selected examples from training data has recently emerged as an effective way of improving the robustness of machine learning models. However, the best way to select these examples remains an open question. In this paper, we consider the problem from the perspective of unsupervised domain adaptation (UDA). We propose AdaPrune, a method for UDA whereby training examples are removed in order to align the training distribution to that of the target data. By adopting the maximum mean discrepancy (MMD) as the criterion for alignment, the problem can be neatly formulated and solved as an integer quadratic program. We evaluate our approach on a real-world domain-shift task of bioacoustic event detection. As a method for UDA, we show that AdaPrune outperforms related techniques and is complementary to other UDA algorithms such as CORAL. Our analysis of the relationship between the MMD and model accuracy, along with t-SNE plots, validates the proposed method as a principled and well-founded way of performing data pruning.
{"title":"Unsupervised Domain Adaptation Via Data Pruning","authors":"Andrea Napoli, Paul White","doi":"arxiv-2409.12076","DOIUrl":"https://doi.org/arxiv-2409.12076","url":null,"abstract":"The removal of carefully-selected examples from training data has recently\u0000emerged as an effective way of improving the robustness of machine learning\u0000models. However, the best way to select these examples remains an open\u0000question. In this paper, we consider the problem from the perspective of\u0000unsupervised domain adaptation (UDA). We propose AdaPrune, a method for UDA\u0000whereby training examples are removed to attempt to align the training\u0000distribution to that of the target data. By adopting the maximum mean\u0000discrepancy (MMD) as the criterion for alignment, the problem can be neatly\u0000formulated and solved as an integer quadratic program. We evaluate our approach\u0000on a real-world domain shift task of bioacoustic event detection. As a method\u0000for UDA, we show that AdaPrune outperforms related techniques, and is\u0000complementary to other UDA algorithms such as CORAL. Our analysis of the\u0000relationship between the MMD and model accuracy, along with t-SNE plots,\u0000validate the proposed method as a principled and well-founded way of performing\u0000data pruning.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"35 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Advances in finite element methods have become essential in various disciplines, and in particular for Computational Fluid Dynamics (CFD), driving research efforts toward improved precision and efficiency. While Convolutional Neural Networks (CNNs) have found success in CFD by mapping meshes into images, recent attention has turned to leveraging Graph Neural Networks (GNNs) for direct mesh processing. This paper introduces a novel model merging Self-Attention with Message Passing in GNNs, achieving a 15% reduction in RMSE on the well-known flow-past-a-cylinder benchmark. Furthermore, a dynamic mesh pruning technique based on Self-Attention is proposed that leads to a robust GNN-based multigrid approach, also reducing RMSE by 15%. Additionally, a new self-supervised training method based on BERT is presented, resulting in a 25% RMSE reduction. The paper includes an ablation study and outperforms state-of-the-art models on several challenging datasets, promising advancements similar to those recently achieved in natural language and image processing. Finally, the paper introduces a dataset with meshes at least an order of magnitude larger than existing ones. Code and datasets will be released at https://github.com/DonsetPG/multigrid-gnn.
{"title":"Multi-Grid Graph Neural Networks with Self-Attention for Computational Mechanics","authors":"Paul Garnier, Jonathan Viquerat, Elie Hachem","doi":"arxiv-2409.11899","DOIUrl":"https://doi.org/arxiv-2409.11899","url":null,"abstract":"Advancement in finite element methods have become essential in various\u0000disciplines, and in particular for Computational Fluid Dynamics (CFD), driving\u0000research efforts for improved precision and efficiency. While Convolutional\u0000Neural Networks (CNNs) have found success in CFD by mapping meshes into images,\u0000recent attention has turned to leveraging Graph Neural Networks (GNNs) for\u0000direct mesh processing. This paper introduces a novel model merging\u0000Self-Attention with Message Passing in GNNs, achieving a 15% reduction in RMSE\u0000on the well known flow past a cylinder benchmark. Furthermore, a dynamic mesh\u0000pruning technique based on Self-Attention is proposed, that leads to a robust\u0000GNN-based multigrid approach, also reducing RMSE by 15%. Additionally, a new\u0000self-supervised training method based on BERT is presented, resulting in a 25%\u0000RMSE reduction. The paper includes an ablation study and outperforms\u0000state-of-the-art models on several challenging datasets, promising advancements\u0000similar to those recently achieved in natural language and image processing.\u0000Finally, the paper introduces a dataset with meshes larger than existing ones\u0000by at least an order of magnitude. Code and Datasets will be released at\u0000https://github.com/DonsetPG/multigrid-gnn.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This study explores the potential of utilizing administrative claims data, combined with advanced machine learning and deep learning techniques, to predict the progression of Chronic Kidney Disease (CKD) to End-Stage Renal Disease (ESRD). We analyze a comprehensive, 10-year dataset provided by a major health insurance organization to develop prediction models for multiple observation windows using traditional machine learning methods such as Random Forest and XGBoost as well as deep learning approaches such as Long Short-Term Memory (LSTM) networks. Our findings demonstrate that the LSTM model, particularly with a 24-month observation window, exhibits superior performance in predicting ESRD progression, outperforming existing models in the literature. We further apply SHapley Additive exPlanations (SHAP) analysis to enhance interpretability, providing insights into the impact of individual features on predictions at the individual patient level. This study underscores the value of leveraging administrative claims data for CKD management and predicting ESRD progression.
{"title":"Towards Interpretable End-Stage Renal Disease (ESRD) Prediction: Utilizing Administrative Claims Data with Explainable AI Techniques","authors":"Yubo Li, Saba Al-Sayouri, Rema Padman","doi":"arxiv-2409.12087","DOIUrl":"https://doi.org/arxiv-2409.12087","url":null,"abstract":"This study explores the potential of utilizing administrative claims data,\u0000combined with advanced machine learning and deep learning techniques, to\u0000predict the progression of Chronic Kidney Disease (CKD) to End-Stage Renal\u0000Disease (ESRD). We analyze a comprehensive, 10-year dataset provided by a major\u0000health insurance organization to develop prediction models for multiple\u0000observation windows using traditional machine learning methods such as Random\u0000Forest and XGBoost as well as deep learning approaches such as Long Short-Term\u0000Memory (LSTM) networks. Our findings demonstrate that the LSTM model,\u0000particularly with a 24-month observation window, exhibits superior performance\u0000in predicting ESRD progression, outperforming existing models in the\u0000literature. We further apply SHapley Additive exPlanations (SHAP) analysis to\u0000enhance interpretability, providing insights into the impact of individual\u0000features on predictions at the individual patient level. This study underscores\u0000the value of leveraging administrative claims data for CKD management and\u0000predicting ESRD progression.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semi-Supervised Learning (SSL) has become a preferred paradigm in many deep learning tasks, as it reduces the need for human labor. Previous studies primarily focus on effectively utilising the labelled and unlabelled data to improve performance. However, we observe that how samples are selected for labelling also significantly impacts performance, particularly under extremely low-budget settings. The sample selection task in SSL has long been under-explored. To fill this gap, we propose a Representative and Diverse Sample Selection approach (RDSS). By adopting a modified Frank-Wolfe algorithm to minimise a novel criterion, the $\alpha$-Maximum Mean Discrepancy ($\alpha$-MMD), RDSS samples a representative and diverse subset for annotation from the unlabelled data. We demonstrate that minimising $\alpha$-MMD enhances the generalisation ability of low-budget learning. Experimental results show that RDSS consistently improves the performance of several popular SSL frameworks and outperforms the state-of-the-art sample selection approaches used in Active Learning (AL) and Semi-Supervised Active Learning (SSAL), even with constrained annotation budgets.
{"title":"Enhancing Semi-Supervised Learning via Representative and Diverse Sample Selection","authors":"Qian Shao, Jiangrui Kang, Qiyuan Chen, Zepeng Li, Hongxia Xu, Yiwen Cao, Jiajuan Liang, Jian Wu","doi":"arxiv-2409.11653","DOIUrl":"https://doi.org/arxiv-2409.11653","url":null,"abstract":"Semi-Supervised Learning (SSL) has become a preferred paradigm in many deep\u0000learning tasks, which reduces the need for human labor. Previous studies\u0000primarily focus on effectively utilising the labelled and unlabeled data to\u0000improve performance. However, we observe that how to select samples for\u0000labelling also significantly impacts performance, particularly under extremely\u0000low-budget settings. The sample selection task in SSL has been under-explored\u0000for a long time. To fill in this gap, we propose a Representative and Diverse\u0000Sample Selection approach (RDSS). By adopting a modified Frank-Wolfe algorithm\u0000to minimise a novel criterion $alpha$-Maximum Mean Discrepancy ($alpha$-MMD),\u0000RDSS samples a representative and diverse subset for annotation from the\u0000unlabeled data. We demonstrate that minimizing $alpha$-MMD enhances the\u0000generalization ability of low-budget learning. Experimental results show that\u0000RDSS consistently improves the performance of several popular SSL frameworks\u0000and outperforms the state-of-the-art sample selection approaches used in Active\u0000Learning (AL) and Semi-Supervised Active Learning (SSAL), even with constrained\u0000annotation budgets.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mohammad Wazed Ali (Intelligent Embedded Systems), Asif bin Mustafa (School of CIT, Technical University of Munich, Munich, Germany), Md. Aukerul Moin Shuvo (Dept. of Computer Science and Engineering, Rajshahi University of Engg. & Technology, Rajshahi, Bangladesh), Bernhard Sick (Intelligent Embedded Systems)
Electrification of vehicles is a potential way of reducing fossil fuel usage and thus lessening environmental pollution. Electric Vehicles (EVs) of various types for different transport modes (including air, water, and land) are evolving. Moreover, different EV user groups (commuters, commercial or domestic users, drivers) may use different charging infrastructures (public, private, home, and workplace) at various times. Therefore, usage patterns and energy demand are highly stochastic. Characterizing and forecasting the charging demand of these diverse EV usage profiles is essential in preventing power outages. Previously developed data-driven load models are limited to specific use cases and locations, and none of them is simultaneously adaptive enough to transfer day-ahead forecasting knowledge among EV charging sites at diverse locations, trainable with limited data, and cost-effective. This article presents location-based load forecasting for EV charging sites using a deep Multi-Quantile Temporal Convolutional Network (MQ-TCN) to overcome the limitations of earlier models. We conducted our experiments on data from four charging sites, namely Caltech, JPL, Office-1, and NREL, which have diverse EV user types such as students, full-time and part-time employees, and random visitors. With a Prediction Interval Coverage Probability (PICP) score of 93.62%, our proposed deep MQ-TCN model exhibited a remarkable 28.93% improvement over the XGBoost model for day-ahead load forecasting at the JPL charging site. By transferring knowledge with the inductive Transfer Learning (TL) approach, the MQ-TCN model achieved a 96.88% PICP score for the load forecasting task at the NREL site using only two weeks of data.
{"title":"Location based Probabilistic Load Forecasting of EV Charging Sites: Deep Transfer Learning with Multi-Quantile Temporal Convolutional Network","authors":"Mohammad Wazed AliIntelligent Embedded Systems, Asif bin MustafaSchool of CIT, Technical University of Munich, Munich, Germany, Md. Aukerul Moin ShuvoDept. of Computer Science and Engineering, Rajshahi University of Engg. & Technology, Rajshahi, Bangladesh, Bernhard SickIntelligent Embedded Systems","doi":"arxiv-2409.11862","DOIUrl":"https://doi.org/arxiv-2409.11862","url":null,"abstract":"Electrification of vehicles is a potential way of reducing fossil fuel usage\u0000and thus lessening environmental pollution. Electric Vehicles (EVs) of various\u0000types for different transport modes (including air, water, and land) are\u0000evolving. Moreover, different EV user groups (commuters, commercial or domestic\u0000users, drivers) may use different charging infrastructures (public, private,\u0000home, and workplace) at various times. Therefore, usage patterns and energy\u0000demand are very stochastic. Characterizing and forecasting the charging demand\u0000of these diverse EV usage profiles is essential in preventing power outages.\u0000Previously developed data-driven load models are limited to specific use cases\u0000and locations. None of these models are simultaneously adaptive enough to\u0000transfer knowledge of day-ahead forecasting among EV charging sites of diverse\u0000locations, trained with limited data, and cost-effective. This article presents\u0000a location-based load forecasting of EV charging sites using a deep\u0000Multi-Quantile Temporal Convolutional Network (MQ-TCN) to overcome the\u0000limitations of earlier models. We conducted our experiments on data from four\u0000charging sites, namely Caltech, JPL, Office-1, and NREL, which have diverse EV\u0000user types like students, full-time and part-time employees, random visitors,\u0000etc. With a Prediction Interval Coverage Probability (PICP) score of 93.62%,\u0000our proposed deep MQ-TCN model exhibited a remarkable 28.93% improvement over\u0000the XGBoost model for a day-ahead load forecasting at the JPL charging site. By\u0000transferring knowledge with the inductive Transfer Learning (TL) approach, the\u0000MQ-TCN model achieved a 96.88% PICP score for the load forecasting task at the\u0000NREL site using only two weeks of data.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}