
arXiv - CS - Machine Learning: Latest Publications

Learning Generalized Hamiltonians using fully Symplectic Mappings
Pub Date : 2024-09-17 DOI: arxiv-2409.11138
Harsh Choudhary, Chandan Gupta, Vyacheslav Kungurtsev, Melvin Leok, Georgios Korpas
Many important physical systems can be described as the evolution of a Hamiltonian system, which has the important property of being conservative: energy is conserved throughout the evolution. Physics-Informed Neural Networks, and in particular Hamiltonian Neural Networks, have emerged as a mechanism to incorporate structural inductive bias into the NN model. By ensuring physical invariances are conserved, these models exhibit significantly better sample complexity and out-of-distribution accuracy than standard NNs. Learning the Hamiltonian as a function of its canonical variables, typically position and velocity, from sample observations of the system thus becomes a critical task in system identification and long-term prediction of system behavior. However, to truly preserve the long-run physical conservation properties of Hamiltonian systems, one must use symplectic integrators for the forward pass of the system's simulation. While symplectic schemes have been used in the literature, they have so far been limited to situations where they reduce to explicit algorithms, which include the case of separable Hamiltonians or augmented non-separable Hamiltonians. We extend this approach to generalized non-separable Hamiltonians and, noting the self-adjoint property of symplectic integrators, bypass computationally intensive backpropagation through an ODE solver. We show that the method is robust to noise and provides a good approximation of the system Hamiltonian when the state variables are sampled from a noisy observation. In the numerical results, we show the performance of the method concerning Hamiltonian reconstruction and conservation, indicating its particular advantage for non-separable systems.
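The distinction the abstract draws between symplectic and ordinary integrators shows up even in the simplest setting. A minimal sketch (not the paper's method), assuming the separable Hamiltonian H(q, p) = (p² + q²)/2 of a unit-mass harmonic oscillator, contrasting the long-run energy behavior of symplectic Euler with explicit Euler:

```python
import math

def energy(q, p):
    # H(q, p) = p^2/2 + q^2/2 for a unit-mass harmonic oscillator
    return 0.5 * (p * p + q * q)

def simulate(symplectic, steps=10000, dt=0.01):
    q, p = 1.0, 0.0
    for _ in range(steps):
        if symplectic:
            # Symplectic Euler: update p with the old q, then q with the new p.
            p -= dt * q
            q += dt * p
        else:
            # Explicit Euler: both updates use the old state; energy drifts.
            q_new = q + dt * p
            p_new = p - dt * q
            q, p = q_new, p_new
    return energy(q, p)

e0 = energy(1.0, 0.0)                  # initial energy, 0.5
e_sym = simulate(symplectic=True)
e_exp = simulate(symplectic=False)
print(abs(e_sym - e0), abs(e_exp - e0))
```

Over 10,000 steps the symplectic update keeps the energy error bounded at its O(dt) level, while explicit Euler's energy grows steadily; this bounded drift is the long-run conservation property the abstract refers to.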
Citations: 0
A logical alarm for misaligned binary classifiers
Pub Date : 2024-09-17 DOI: arxiv-2409.11052
Andrés Corrada-Emmanuel, Ilya Parker, Ramesh Bharadwaj
If two agents disagree in their decisions, we may suspect they are not both correct. This intuition is formalized for evaluating agents that have carried out a binary classification task. Their agreements and disagreements on a joint test allow us to establish the only group evaluations logically consistent with their responses. This is done by establishing a set of axioms (algebraic relations) that must be universally obeyed by all evaluations of binary responders. A complete set of such axioms is possible for each ensemble of size N. The axioms for $N = 1, 2$ are used to construct a fully logical alarm, one that can prove that at least one ensemble member is malfunctioning using only unlabeled data. The similarities of this approach to formal software verification, and its utility for recent agendas of safe, guaranteed AI, are discussed.
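The paper's full axiom set is not reproduced in the abstract, but its core intuition admits a simple sketch: two classifiers can disagree on an item only if at least one errs on it, so the observed disagreement rate is bounded by the sum of their true error rates. A hedged illustration of such an alarm using only unlabeled data, with hypothetical claimed error rates:

```python
def disagreement_alarm(preds_a, preds_b, claimed_err_a, claimed_err_b):
    """Alarm (return True) when the observed disagreement rate between two
    binary classifiers exceeds what their claimed error rates permit.
    Since disagreement on an item requires at least one mistake on it,
    P(disagree) <= err_a + err_b must hold if the claims are honest."""
    assert len(preds_a) == len(preds_b) and preds_a
    disagree = sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)
    return disagree > claimed_err_a + claimed_err_b

# Two classifiers each claiming <= 5% error, yet disagreeing on 30% of items:
a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
b = [1, 0, 0, 1, 0, 0, 0, 1, 1, 1]
print(disagreement_alarm(a, b, 0.05, 0.05))  # 0.3 > 0.1, so the alarm fires
```

This union-bound check is weaker than the paper's complete axiomatic construction, but it shows how unlabeled data alone can falsify at least one classifier's claim.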
Citations: 0
WaterQualityNeT: Prediction of Seasonal Water Quality of Nepal Using Hybrid Deep Learning Models
Pub Date : 2024-09-17 DOI: arxiv-2409.10898
Biplov Paneru, Bishwash Paneru
Ensuring a safe and uncontaminated water supply is contingent upon the monitoring of water quality, especially in developing countries such as Nepal, where water sources are susceptible to pollution. This paper presents a hybrid deep learning model for predicting Nepal's seasonal water quality using a small dataset with many water quality parameters. The model integrates convolutional neural networks (CNN) and recurrent neural networks (RNN) to exploit temporal and spatial patterns in the data. The results demonstrate significant improvements in forecast accuracy over traditional methods, providing a reliable tool for proactive control of water quality. The model that used WQI parameters to classify water into good, poor, and average groups was correct 92% of the time in testing. Similarly, the R2 score was 0.97 and the root mean square error was 2.87 when predicting WQI values using regression analysis. Additionally, a multifunctional application that uses both a regression and a classification approach is built to predict WQI values.
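The regression metrics reported above (R² of 0.97, RMSE of 2.87) are standard; a minimal sketch of how they are computed, on hypothetical WQI values rather than the paper's data:

```python
import math

def rmse(y_true, y_pred):
    # Root mean square error: average squared residual, then square root.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2_score(y_true, y_pred):
    # R^2: one minus residual sum of squares over total sum of squares.
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical WQI predictions, for illustration only
y_true = [70.0, 55.0, 82.0, 64.0]
y_pred = [68.0, 57.0, 80.0, 66.0]
print(rmse(y_true, y_pred))          # 2.0
print(round(r2_score(y_true, y_pred), 3))  # 0.958
```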
Citations: 0
Chess Rating Estimation from Moves and Clock Times Using a CNN-LSTM
Pub Date : 2024-09-17 DOI: arxiv-2409.11506
Michael Omori, Prasad Tadepalli
Current rating systems update ratings incrementally and may not always accurately reflect a player's true strength at all times, especially for rapidly improving players or very rusty players. To overcome this, we explore a method to estimate player ratings directly from game moves and clock times. We compiled a benchmark dataset from Lichess, encompassing various time controls and including move sequences and clock times. Our model architecture comprises a CNN to learn positional features, which are then integrated with clock-time data into a bidirectional LSTM, predicting player ratings after each move. The model achieved an MAE of 182 rating points on the test data. Additionally, we applied our model to the 2024 IEEE Big Data Cup Chess Puzzle Difficulty Competition dataset, predicted puzzle ratings, and achieved competitive results. This model is the first to use no hand-crafted features to estimate chess ratings and also the first to output a rating prediction for each move. Our method highlights the potential of move-based rating estimation for enhancing rating systems and potentially other applications such as cheating detection.
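Games vary in length, so batching per-move sequences for an LSTM typically requires padding and masking. A sketch of that common preprocessing step (an assumption for illustration; the paper does not specify its pipeline):

```python
def pad_sequences(seqs, pad_value=0.0):
    """Right-pad variable-length games to a common length and return a mask
    marking real timesteps (1) vs padding (0), for batched LSTM input."""
    max_len = max(len(s) for s in seqs)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in seqs]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]
    return padded, mask

# Hypothetical per-move clock-time features for two games of different length
games = [[0.1, 0.4, 0.2], [0.3]]
padded, mask = pad_sequences(games)
print(padded)  # [[0.1, 0.4, 0.2], [0.3, 0.0, 0.0]]
print(mask)    # [[1, 1, 1], [1, 0, 0]]
```

The mask lets the loss ignore padded positions, so per-move rating predictions are only scored on real moves.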
Citations: 0
D2Vformer: A Flexible Time Series Prediction Model Based on Time Position Embedding
Pub Date : 2024-09-17 DOI: arxiv-2409.11024
Xiaobao Song, Hao Wang, Liwei Deng, Yuxin He, Wenming Cao, Chi-Sing Leung
Time position embeddings capture the positional information of time steps, often serving as auxiliary inputs to enhance the predictive capabilities of time series models. However, existing models exhibit limitations in capturing intricate time positional information and effectively utilizing these embeddings. To address these limitations, this paper proposes a novel model called D2Vformer. Unlike typical prediction methods that rely on RNNs or Transformers, this approach can directly handle scenarios where the predicted sequence is not adjacent to the input sequence or where its length dynamically changes. In comparison to conventional methods, D2Vformer saves a significant amount of training resources. In D2Vformer, the Date2Vec module uses the timestamp information and feature sequences to generate time position embeddings. Afterward, D2Vformer introduces a new fusion block that utilizes an attention mechanism to explore the similarity in time positions between the embeddings of the input sequence and the predicted sequence, thereby generating predictions based on this similarity. Through extensive experiments on six datasets, we demonstrate that Date2Vec outperforms other time position embedding methods, and D2Vformer surpasses state-of-the-art methods in both fixed-length and variable-length prediction tasks.
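The abstract does not detail Date2Vec; a plausible sketch, assuming a Time2Vec-style embedding with one linear component and learned sinusoidal components (here `omega` and `phi` stand in for learned parameters):

```python
import math

def time2vec(t, omega, phi):
    """Time2Vec-style position embedding: the first component is linear in t
    (captures trend), the rest are periodic (capture seasonality)."""
    out = [omega[0] * t + phi[0]]                        # linear term
    out += [math.sin(w * t + p) for w, p in zip(omega[1:], phi[1:])]
    return out

omega = [0.5, 1.0, 2.0]        # hypothetical learned frequencies
phi   = [0.0, 0.0, math.pi / 2]
emb = time2vec(3.0, omega, phi)
print(emb[0])   # 1.5, the linear component at t = 3
```

Because the embedding is a function of the timestamp itself rather than an index, it can be evaluated at arbitrary future time positions, which is consistent with D2Vformer's claim of handling non-adjacent or variable-length prediction windows.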
Citations: 0
Cost-informed dimensionality reduction for structural digital twin technologies
Pub Date : 2024-09-17 DOI: arxiv-2409.11236
Aidan J. Hughes, Keith Worden, Nikolaos Dervilis, Timothy J. Rogers
Classification models are a key component of structural digital twin technologies used for supporting asset management decision-making. An important consideration when developing classification models is the dimensionality of the input, or feature, space used. If the dimensionality is too high, then the 'curse of dimensionality' may rear its ugly head, manifesting as reduced predictive performance. To mitigate such effects, practitioners can employ dimensionality reduction techniques. The current paper formulates a decision-theoretic approach to dimensionality reduction for structural asset management. In this approach, the aim is to keep incurred misclassification costs to a minimum as the dimensionality is reduced and discriminatory information may be lost. This formulation is constructed as an eigenvalue problem, with separabilities between classes weighted according to the cost of misclassifying them when considered in the context of a decision process. The approach is demonstrated using a synthetic case study.
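One way to read the eigenvalue formulation is as a Fisher-discriminant-style problem in which between-class separations are weighted by misclassification costs. A sketch under that assumption (an interpretation for illustration, not the paper's exact formulation):

```python
import numpy as np

def cost_weighted_projection(X_by_class, cost, n_components=1):
    """Fisher-style reduction where the separation between each pair of
    classes is weighted by the cost of confusing them: directions that
    distinguish expensive-to-confuse classes are preserved first."""
    means = [X.mean(axis=0) for X in X_by_class]
    d = means[0].shape[0]
    # Within-class scatter (regularized for invertibility)
    Sw = sum(np.cov(X.T) for X in X_by_class) + 1e-6 * np.eye(d)
    # Cost-weighted between-class scatter
    Sb = np.zeros((d, d))
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            diff = (means[i] - means[j])[:, None]
            Sb += cost[i][j] * diff @ diff.T
    vals, vecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    order = np.argsort(vals.real)[::-1]
    return vecs.real[:, order[:n_components]]

rng = np.random.default_rng(0)
A = rng.normal([0, 0], 0.1, (50, 2))
B = rng.normal([1, 0], 0.1, (50, 2))   # separated from A along axis 0
W = cost_weighted_projection([A, B], cost=[[0, 10], [10, 0]])
print(W.shape)  # (2, 1): one retained direction in a 2-D feature space
```

With only two classes the cost weight rescales a single rank-one term, so the recovered direction aligns with the class-mean difference; the weighting only changes the ranking of directions once three or more classes compete.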
Citations: 0
FedNE: Surrogate-Assisted Federated Neighbor Embedding for Dimensionality Reduction
Pub Date : 2024-09-17 DOI: arxiv-2409.11509
Ziwei Li, Xiaoqi Wang, Hong-You Chen, Han-Wei Shen, Wei-Lun Chao
Federated learning (FL) has rapidly evolved as a promising paradigm that enables collaborative model training across distributed participants without exchanging their local data. Despite its broad applications in fields such as computer vision, graph learning, and natural language processing, the development of a data projection model that can be effectively used to visualize data in the context of FL is crucial yet remains heavily under-explored. Neighbor embedding (NE) is an essential technique for visualizing complex high-dimensional data, but collaboratively learning a joint NE model is difficult. The key challenge lies in the objective function, as effective visualization algorithms like NE require computing loss functions among pairs of data. In this paper, we introduce FedNE, a novel approach that integrates the FedAvg framework with the contrastive NE technique, without any requirement of shareable data. To address the lack of inter-client repulsion, which is crucial for the alignment in the global embedding space, we develop a surrogate loss function that each client learns and shares with the others. Additionally, we propose a data-mixing strategy to augment the local data, aiming to relax the problems of invisible neighbors and false neighbors constructed by the local $k$NN graphs. We conduct comprehensive experiments on both synthetic and real-world datasets. The results demonstrate that our FedNE can effectively preserve the neighborhood data structures and enhance the alignment in the global embedding space compared to several baseline methods.
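FedNE builds on the FedAvg framework; FedAvg's core aggregation step, a sample-count-weighted average of client parameters, can be sketched as:

```python
def fedavg(client_weights, client_sizes):
    """FedAvg aggregation: average each parameter across clients, weighted
    by the number of local samples, so larger clients contribute more.
    client_weights are flattened parameter vectors, one per client."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(n_params)
    ]

# Two clients with 2-parameter models and 100 vs 300 local samples
w_global = fedavg([[1.0, 2.0], [3.0, 6.0]], [100, 300])
print(w_global)  # [2.5, 5.0]
```

Only parameters cross the network; the raw local data never leaves a client, which is the property FedNE's surrogate loss must also preserve.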
Citations: 0
Advances in APPFL: A Comprehensive and Extensible Federated Learning Framework
Pub Date : 2024-09-17 DOI: arxiv-2409.11585
Zilinghan Li, Shilan He, Ze Yang, Minseok Ryu, Kibaek Kim, Ravi Madduri
Federated learning (FL) is a distributed machine learning paradigm enabling collaborative model training while preserving data privacy. In today's landscape, where most data is proprietary, confidential, and distributed, FL has become a promising approach to leverage such data effectively, particularly in sensitive domains such as medicine and the electric grid. However, heterogeneity and security are the key challenges in FL; most existing FL frameworks either fail to address these challenges adequately or lack the flexibility to incorporate new solutions. To this end, we present recent advances in developing APPFL, an extensible framework and benchmarking suite for federated learning, which offers comprehensive solutions for heterogeneity and security concerns, as well as user-friendly interfaces for integrating new algorithms or adapting to new applications. We demonstrate the capabilities of APPFL through extensive experiments evaluating various aspects of FL, including communication efficiency, privacy preservation, computational performance, and resource utilization. We further highlight the extensibility of APPFL through case studies in vertical, hierarchical, and decentralized FL. APPFL is open-sourced at https://github.com/APPFL/APPFL.
Citations: 0
Optimizing TinyML: The Impact of Reduced Data Acquisition Rates for Time Series Classification on Microcontrollers
Pub Date : 2024-09-17 DOI: arxiv-2409.10942
Riya Samanta, Bidyut Saha, Soumya K. Ghosh, Ram Babu Roy
Tiny Machine Learning (TinyML) enables efficient, low-cost, and privacy-preserving machine learning inference directly on microcontroller units (MCUs) connected to sensors. Optimizing models for these constrained environments is crucial. This paper investigates how reducing data acquisition rates affects TinyML models for time series classification, focusing on resource-constrained, battery-operated IoT devices. By lowering the data sampling frequency, we aim to reduce computational demands, RAM usage, energy consumption, latency, and MAC operations by approximately fourfold while maintaining similar classification accuracies. Our experiments with six benchmark datasets (UCIHAR, WISDM, PAMAP2, MHEALTH, MITBIH, and PTB) showed that reducing data acquisition rates significantly cut energy consumption and computational load, with minimal accuracy loss. For example, a 75% reduction in acquisition rate for the MITBIH and PTB datasets led to a 60% decrease in RAM usage, a 75% reduction in MAC operations, a 74% decrease in latency, and a 70% reduction in energy consumption, without accuracy loss. These results offer valuable insights for deploying efficient TinyML models in constrained environments.
{"title":"Optimizing TinyML: The Impact of Reduced Data Acquisition Rates for Time Series Classification on Microcontrollers","authors":"Riya Samanta, Bidyut Saha, Soumya K. Ghosh, Ram Babu Roy","doi":"arxiv-2409.10942","DOIUrl":"https://doi.org/arxiv-2409.10942","url":null,"abstract":"Tiny Machine Learning (TinyML) enables efficient, lowcost, and privacy\u0000preserving machine learning inference directly on microcontroller units (MCUs)\u0000connected to sensors. Optimizing models for these constrained environments is\u0000crucial. This paper investigates how reducing data acquisition rates affects\u0000TinyML models for time series classification, focusing on resource-constrained,\u0000battery operated IoT devices. By lowering data sampling frequency, we aim to\u0000reduce computational demands RAM usage, energy consumption, latency, and MAC\u0000operations by approximately fourfold while maintaining similar classification\u0000accuracies. Our experiments with six benchmark datasets (UCIHAR, WISDM, PAMAP2,\u0000MHEALTH, MITBIH, and PTB) showed that reducing data acquisition rates\u0000significantly cut energy consumption and computational load, with minimal\u0000accuracy loss. For example, a 75% reduction in acquisition rate for MITBIH and\u0000PTB datasets led to a 60% decrease in RAM usage, 75% reduction in MAC\u0000operations, 74% decrease in latency, and 70% reduction in energy consumption,\u0000without accuracy loss. 
These results offer valuable insights for deploying\u0000efficient TinyML models in constrained environments.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
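The fourfold-reduction arithmetic in the abstract can be shown with a back-of-the-envelope sketch: decimating the input window shrinks the flattened input to a dense layer, cutting that layer's multiply-accumulate (MAC) count proportionally. The window length, layer width, and 4x decimation factor below are illustrative assumptions, not the paper's exact setup.

```python
# Lowering the sampling rate shrinks the input window, which cuts the MAC
# cost of a fully connected layer roughly in proportion.

def downsample(signal, factor):
    """Naive decimation: keep every `factor`-th sample."""
    return signal[::factor]

def dense_layer_macs(input_len, hidden_units):
    """MACs for a dense layer applied to a flattened window."""
    return input_len * hidden_units

window = list(range(128))        # e.g. 128 samples at the original rate
reduced = downsample(window, 4)  # 75% lower acquisition rate -> 32 samples

full_cost = dense_layer_macs(len(window), 64)
reduced_cost = dense_layer_macs(len(reduced), 64)
print(full_cost // reduced_cost)  # 4x fewer MACs in this layer
```

A production pipeline would use a proper anti-aliased decimator (e.g. `scipy.signal.decimate`) rather than bare stride slicing, since naive decimation can alias high-frequency content into the retained samples.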
Citations: 0
Beyond LoRA: Exploring Efficient Fine-Tuning Techniques for Time Series Foundational Models
Pub Date : 2024-09-17 DOI: arxiv-2409.11302
Divij Gupta, Anubhav Bhatti, Surajsinh Parmar
Time Series Foundation Models (TSFMs) have recently garnered attention for their ability to model complex, large-scale time series data across domains such as retail, finance, and transportation. However, their application to sensitive, domain-specific fields like healthcare remains challenging, primarily due to the difficulty of fine-tuning these models for specialized, out-of-domain tasks with scarce publicly available datasets. In this work, we explore the use of Parameter-Efficient Fine-Tuning (PEFT) techniques to address these limitations, focusing on healthcare applications, particularly ICU vitals forecasting for sepsis patients. We introduce and evaluate two selective (BitFit and LayerNorm Tuning) and two additive (VeRA and FourierFT) PEFT techniques on multiple configurations of the Chronos TSFM for forecasting vital signs of sepsis patients. Our comparative analysis demonstrates that some of these PEFT methods outperform LoRA in terms of parameter efficiency and domain adaptation, establishing state-of-the-art (SOTA) results in ICU vitals forecasting tasks. Interestingly, FourierFT applied to the Chronos (Tiny) variant surpasses the SOTA model while fine-tuning only 2,400 parameters, compared to the 700K parameters of the benchmark.
{"title":"Beyond LoRA: Exploring Efficient Fine-Tuning Techniques for Time Series Foundational Models","authors":"Divij Gupta, Anubhav Bhatti, Surajsinh Parmar","doi":"arxiv-2409.11302","DOIUrl":"https://doi.org/arxiv-2409.11302","url":null,"abstract":"Time Series Foundation Models (TSFMs) have recently garnered attention for\u0000their ability to model complex, large-scale time series data across domains\u0000such as retail, finance, and transportation. However, their application to\u0000sensitive, domain-specific fields like healthcare remains challenging,\u0000primarily due to the difficulty of fine-tuning these models for specialized,\u0000out-of-domain tasks with scarce publicly available datasets. In this work, we\u0000explore the use of Parameter-Efficient Fine-Tuning (PEFT) techniques to address\u0000these limitations, focusing on healthcare applications, particularly ICU vitals\u0000forecasting for sepsis patients. We introduce and evaluate two selective\u0000(BitFit and LayerNorm Tuning) and two additive (VeRA and FourierFT) PEFT\u0000techniques on multiple configurations of the Chronos TSFM for forecasting vital\u0000signs of sepsis patients. Our comparative analysis demonstrates that some of\u0000these PEFT methods outperform LoRA in terms of parameter efficiency and domain\u0000adaptation, establishing state-of-the-art (SOTA) results in ICU vital\u0000forecasting tasks. 
Interestingly, FourierFT applied to the Chronos (Tiny)\u0000variant surpasses the SOTA model while fine-tuning only 2,400 parameters\u0000compared to the 700K parameters of the benchmark.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
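The parameter-efficiency argument behind selective PEFT can be illustrated with a BitFit-style sketch: only bias vectors are marked trainable, so the tuned parameter count is a tiny fraction of the model. The layer shapes below are invented for illustration and are not the Chronos configuration; only the 2,400 vs. 700K figure comes from the abstract.

```python
# BitFit-style selective fine-tuning: freeze all weight matrices and mark
# only bias vectors as trainable, then compare parameter counts.

layers = {
    "attn.weight": (512, 512),   # hypothetical shapes, for illustration only
    "attn.bias": (512,),
    "ffn.weight": (512, 2048),
    "ffn.bias": (2048,),
}

def n_params(shape):
    n = 1
    for d in shape:
        n *= d
    return n

trainable = sum(n_params(s) for name, s in layers.items()
                if name.endswith("bias"))        # biases only
total = sum(n_params(s) for s in layers.values())  # full model
print(trainable, total, f"{100 * trainable / total:.2f}%")
```

In a framework like PyTorch the same selection is typically expressed by setting `requires_grad = False` on every parameter whose name does not end in `bias`, so the optimizer only ever sees the bias terms.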
Citations: 0