Many important physical systems can be described as the evolution of a Hamiltonian system, which has the important property of being conservative: energy is conserved throughout the evolution. Physics-Informed Neural Networks, and in particular Hamiltonian Neural Networks, have emerged as a mechanism to incorporate structural inductive bias into the NN model. By ensuring that physical invariances are conserved, these models exhibit significantly better sample complexity and out-of-distribution accuracy than standard NNs. Learning the Hamiltonian as a function of its canonical variables, typically position and momentum, from sample observations of the system thus becomes a critical task in system identification and long-term prediction of system behavior. However, to truly preserve the long-run physical conservation properties of Hamiltonian systems, one must use symplectic integrators for the forward pass of the system's simulation. While symplectic schemes have been used in the literature, they have so far been limited to situations in which they reduce to explicit algorithms, namely separable Hamiltonians and augmented non-separable Hamiltonians. We extend this approach to generalized non-separable Hamiltonians and, by exploiting the self-adjoint property of symplectic integrators, bypass computationally intensive backpropagation through an ODE solver. We show that the method is robust to noise and provides a good approximation of the system Hamiltonian when the state variables are sampled from noisy observations. Our numerical results demonstrate the method's performance on Hamiltonian reconstruction and energy conservation, indicating its particular advantage for non-separable systems.
{"title":"Learning Generalized Hamiltonians using fully Symplectic Mappings","authors":"Harsh Choudhary, Chandan Gupta, Vyacheslav kungrutsev, Melvin Leok, Georgios Korpas","doi":"arxiv-2409.11138","DOIUrl":"https://doi.org/arxiv-2409.11138","url":null,"abstract":"Many important physical systems can be described as the evolution of a\u0000Hamiltonian system, which has the important property of being conservative,\u0000that is, energy is conserved throughout the evolution. Physics Informed Neural\u0000Networks and in particular Hamiltonian Neural Networks have emerged as a\u0000mechanism to incorporate structural inductive bias into the NN model. By\u0000ensuring physical invariances are conserved, the models exhibit significantly\u0000better sample complexity and out-of-distribution accuracy than standard NNs.\u0000Learning the Hamiltonian as a function of its canonical variables, typically\u0000position and velocity, from sample observations of the system thus becomes a\u0000critical task in system identification and long-term prediction of system\u0000behavior. However, to truly preserve the long-run physical conservation\u0000properties of Hamiltonian systems, one must use symplectic integrators for a\u0000forward pass of the system's simulation. While symplectic schemes have been\u0000used in the literature, they are thus far limited to situations when they\u0000reduce to explicit algorithms, which include the case of separable Hamiltonians\u0000or augmented non-separable Hamiltonians. We extend it to generalized\u0000non-separable Hamiltonians, and noting the self-adjoint property of symplectic\u0000integrators, we bypass computationally intensive backpropagation through an ODE\u0000solver. We show that the method is robust to noise and provides a good\u0000approximation of the system Hamiltonian when the state variables are sampled\u0000from a noisy observation. In the numerical results, we show the performance of\u0000the method concerning Hamiltonian reconstruction and conservation, indicating\u0000its particular advantage for non-separable systems.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"95 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
If two agents disagree in their decisions, we may suspect they are not both correct. This intuition is formalized for evaluating agents that have carried out a binary classification task. Their agreements and disagreements on a joint test allow us to establish the only group evaluations logically consistent with their responses. This is done by establishing a set of axioms (algebraic relations) that must be universally obeyed by all evaluations of binary responders. A complete set of such axioms exists for each ensemble of size $N$. The axioms for $N = 1, 2$ are used to construct a fully logical alarm: one that can prove that at least one ensemble member is malfunctioning using only unlabeled data. The similarities of this approach to formal software verification, and its utility for recent agendas of guaranteed safe AI, are discussed.
{"title":"A logical alarm for misaligned binary classifiers","authors":"Andrés Corrada-Emmanuel, Ilya Parker, Ramesh Bharadwaj","doi":"arxiv-2409.11052","DOIUrl":"https://doi.org/arxiv-2409.11052","url":null,"abstract":"If two agents disagree in their decisions, we may suspect they are not both\u0000correct. This intuition is formalized for evaluating agents that have carried\u0000out a binary classification task. Their agreements and disagreements on a joint\u0000test allow us to establish the only group evaluations logically consistent with\u0000their responses. This is done by establishing a set of axioms (algebraic\u0000relations) that must be universally obeyed by all evaluations of binary\u0000responders. A complete set of such axioms are possible for each ensemble of\u0000size N. The axioms for $N = 1, 2$ are used to construct a fully logical alarm -\u0000one that can prove that at least one ensemble member is malfunctioning using\u0000only unlabeled data. The similarities of this approach to formal software\u0000verification and its utility for recent agendas of safe guaranteed AI are\u0000discussed.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"45 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ensuring a safe and uncontaminated water supply is contingent upon the monitoring of water quality, especially in developing countries such as Nepal, where water sources are susceptible to pollution. This paper presents a hybrid deep learning model for predicting Nepal's seasonal water quality using a small dataset with many water quality parameters. The model integrates convolutional neural networks (CNN) and recurrent neural networks (RNN) to exploit temporal and spatial patterns in the data. The results demonstrate significant improvements in forecast accuracy over traditional methods, providing a reliable tool for proactive control of water quality. The model that used WQI parameters to classify water quality into good, poor, and average categories achieved 92% accuracy in testing. Similarly, regression analysis for predicting WQI values achieved an R^2 score of 0.97 and a root mean square error of 2.87. Additionally, a multifunctional application that uses both a regression and a classification approach was built to predict WQI values.
{"title":"WaterQualityNeT: Prediction of Seasonal Water Quality of Nepal Using Hybrid Deep Learning Models","authors":"Biplov Paneru, Bishwash Paneru","doi":"arxiv-2409.10898","DOIUrl":"https://doi.org/arxiv-2409.10898","url":null,"abstract":"Ensuring a safe and uncontaminated water supply is contingent upon the\u0000monitoring of water quality, especially in developing countries such as Nepal,\u0000where water sources are susceptible to pollution. This paper presents a hybrid\u0000deep learning model for predicting Nepal's seasonal water quality using a small\u0000dataset with many water quality parameters. The model integrates convolutional\u0000neural networks (CNN) and recurrent neural networks (RNN) to exploit temporal\u0000and spatial patterns in the data. The results demonstrate significant\u0000improvements in forecast accuracy over traditional methods, providing a\u0000reliable tool for proactive control of water quality. The model that used WQI\u0000parameters to classify people into good, poor, and average groups performed 92%\u0000of the time in testing. Similarly, the R2 score was 0.97 and the root mean\u0000square error was 2.87 when predicting WQI values using regression analysis.\u0000Additionally, a multifunctional application that uses both a regression and a\u0000classification approach is built to predict WQI values.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Current rating systems update ratings incrementally and may not always accurately reflect a player's true strength at all times, especially for rapidly improving or long-inactive players. To overcome this, we explore a method to estimate player ratings directly from game moves and clock times. We compiled a benchmark dataset from Lichess, encompassing various time controls and including move sequences and clock times. Our model architecture comprises a CNN to learn positional features, which are then integrated with clock-time data into a bidirectional LSTM, predicting player ratings after each move. The model achieved an MAE of 182 rating points on the test data. Additionally, we applied our model to the 2024 IEEE Big Data Cup Chess Puzzle Difficulty Competition dataset, predicting puzzle ratings and achieving competitive results. This model is the first to estimate chess ratings without hand-crafted features and the first to output a rating prediction after each move. Our method highlights the potential of move-based rating estimation for enhancing rating systems and for other applications such as cheating detection.
{"title":"Chess Rating Estimation from Moves and Clock Times Using a CNN-LSTM","authors":"Michael Omori, Prasad Tadepalli","doi":"arxiv-2409.11506","DOIUrl":"https://doi.org/arxiv-2409.11506","url":null,"abstract":"Current rating systems update ratings incrementally and may not always\u0000accurately reflect a player's true strength at all times, especially for\u0000rapidly improving players or very rusty players. To overcome this, we explore a\u0000method to estimate player ratings directly from game moves and clock times. We\u0000compiled a benchmark dataset from Lichess, encompassing various time controls\u0000and including move sequences and clock times. Our model architecture comprises\u0000a CNN to learn positional features, which are then integrated with clock-time\u0000data into a bidirectional LSTM, predicting player ratings after each move. The\u0000model achieved an MAE of 182 rating points in the test data. Additionally, we\u0000applied our model to the 2024 IEEE Big Data Cup Chess Puzzle Difficulty\u0000Competition dataset, predicted puzzle ratings and achieved competitive results.\u0000This model is the first to use no hand-crafted features to estimate chess\u0000ratings and also the first to output a rating prediction for each move. Our\u0000method highlights the potential of using move-based rating estimation for\u0000enhancing rating systems and potentially other applications such as cheating\u0000detection.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Time position embeddings capture the positional information of time steps, often serving as auxiliary inputs to enhance the predictive capabilities of time series models. However, existing models exhibit limitations in capturing intricate time positional information and in effectively utilizing these embeddings. To address these limitations, this paper proposes a novel model called D2Vformer. Unlike typical prediction methods that rely on RNNs or Transformers, this approach can directly handle scenarios where the predicted sequence is not adjacent to the input sequence or where its length changes dynamically; compared to conventional methods, it thereby saves a significant amount of training resources. In D2Vformer, the Date2Vec module uses timestamp information and feature sequences to generate time position embeddings. D2Vformer then introduces a new fusion block that uses an attention mechanism to explore the similarity in time positions between the embeddings of the input sequence and the predicted sequence, generating predictions based on this similarity. Through extensive experiments on six datasets, we demonstrate that Date2Vec outperforms other time position embedding methods, and that D2Vformer surpasses state-of-the-art methods in both fixed-length and variable-length prediction tasks.
{"title":"D2Vformer: A Flexible Time Series Prediction Model Based on Time Position Embedding","authors":"Xiaobao Song, Hao Wang, Liwei Deng, Yuxin He, Wenming Cao, Chi-Sing Leungc","doi":"arxiv-2409.11024","DOIUrl":"https://doi.org/arxiv-2409.11024","url":null,"abstract":"Time position embeddings capture the positional information of time steps,\u0000often serving as auxiliary inputs to enhance the predictive capabilities of\u0000time series models. However, existing models exhibit limitations in capturing\u0000intricate time positional information and effectively utilizing these\u0000embeddings. To address these limitations, this paper proposes a novel model\u0000called D2Vformer. Unlike typical prediction methods that rely on RNNs or\u0000Transformers, this approach can directly handle scenarios where the predicted\u0000sequence is not adjacent to the input sequence or where its length dynamically\u0000changes. In comparison to conventional methods, D2Vformer undoubtedly saves a\u0000significant amount of training resources. In D2Vformer, the Date2Vec module\u0000uses the timestamp information and feature sequences to generate time position\u0000embeddings. Afterward, D2Vformer introduces a new fusion block that utilizes an\u0000attention mechanism to explore the similarity in time positions between the\u0000embeddings of the input sequence and the predicted sequence, thereby generating\u0000predictions based on this similarity. Through extensive experiments on six\u0000datasets, we demonstrate that Date2Vec outperforms other time position\u0000embedding methods, and D2Vformer surpasses state-of-the-art methods in both\u0000fixed-length and variable-length prediction tasks.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"59 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Classification models are a key component of structural digital twin technologies used for supporting asset management decision-making. An important consideration when developing classification models is the dimensionality of the input, or feature, space used. If the dimensionality is too high, then the `curse of dimensionality' may rear its ugly head, manifesting as reduced predictive performance. To mitigate such effects, practitioners can employ dimensionality reduction techniques. The current paper formulates a decision-theoretic approach to dimensionality reduction for structural asset management. In this approach, the aim is to keep incurred misclassification costs to a minimum as the dimensionality is reduced and discriminatory information is potentially lost. The formulation is constructed as an eigenvalue problem, with separabilities between classes weighted according to the cost of misclassifying them in the context of a decision process. The approach is demonstrated using a synthetic case study.
{"title":"Cost-informed dimensionality reduction for structural digital twin technologies","authors":"Aidan J. Hughes, Keith Worden, Nikolaos Dervilis, Timothy J. Rogers","doi":"arxiv-2409.11236","DOIUrl":"https://doi.org/arxiv-2409.11236","url":null,"abstract":"Classification models are a key component of structural digital twin\u0000technologies used for supporting asset management decision-making. An important\u0000consideration when developing classification models is the dimensionality of\u0000the input, or feature space, used. If the dimensionality is too high, then the\u0000`curse of dimensionality' may rear its ugly head; manifesting as reduced\u0000predictive performance. To mitigate such effects, practitioners can employ\u0000dimensionality reduction techniques. The current paper formulates a\u0000decision-theoretic approach to dimensionality reduction for structural asset\u0000management. In this approach, the aim is to keep incurred misclassification\u0000costs to a minimum, as the dimensionality is reduced and discriminatory\u0000information may be lost. This formulation is constructed as an eigenvalue\u0000problem, with separabilities between classes weighted according to the cost of\u0000misclassifying them when considered in the context of a decision process. The\u0000approach is demonstrated using a synthetic case study.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"49 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Federated learning (FL) has rapidly evolved as a promising paradigm that enables collaborative model training across distributed participants without exchanging their local data. Despite its broad applications in fields such as computer vision, graph learning, and natural language processing, the development of a data projection model that can be effectively used to visualize data in the context of FL is crucial yet remains heavily under-explored. Neighbor embedding (NE) is an essential technique for visualizing complex high-dimensional data, but collaboratively learning a joint NE model is difficult. The key challenge lies in the objective function, as effective visualization algorithms like NE require computing loss functions among pairs of data. In this paper, we introduce FedNE, a novel approach that integrates the FedAvg framework with the contrastive NE technique, without requiring any shareable data. To address the lack of inter-client repulsion, which is crucial for alignment in the global embedding space, we develop a surrogate loss function that each client learns and shares with the others. Additionally, we propose a data-mixing strategy to augment the local data, aiming to mitigate the invisible-neighbor and false-neighbor problems introduced by the local $k$NN graphs. We conduct comprehensive experiments on both synthetic and real-world datasets. The results demonstrate that FedNE can effectively preserve the neighborhood data structures and enhance alignment in the global embedding space compared to several baseline methods.
{"title":"FedNE: Surrogate-Assisted Federated Neighbor Embedding for Dimensionality Reduction","authors":"Ziwei Li, Xiaoqi Wang, Hong-You Chen, Han-Wei Shen, Wei-Lun Chao","doi":"arxiv-2409.11509","DOIUrl":"https://doi.org/arxiv-2409.11509","url":null,"abstract":"Federated learning (FL) has rapidly evolved as a promising paradigm that\u0000enables collaborative model training across distributed participants without\u0000exchanging their local data. Despite its broad applications in fields such as\u0000computer vision, graph learning, and natural language processing, the\u0000development of a data projection model that can be effectively used to\u0000visualize data in the context of FL is crucial yet remains heavily\u0000under-explored. Neighbor embedding (NE) is an essential technique for\u0000visualizing complex high-dimensional data, but collaboratively learning a joint\u0000NE model is difficult. The key challenge lies in the objective function, as\u0000effective visualization algorithms like NE require computing loss functions\u0000among pairs of data. In this paper, we introduce textsc{FedNE}, a novel\u0000approach that integrates the textsc{FedAvg} framework with the contrastive NE\u0000technique, without any requirements of shareable data. To address the lack of\u0000inter-client repulsion which is crucial for the alignment in the global\u0000embedding space, we develop a surrogate loss function that each client learns\u0000and shares with each other. Additionally, we propose a data-mixing strategy to\u0000augment the local data, aiming to relax the problems of invisible neighbors and\u0000false neighbors constructed by the local $k$NN graphs. We conduct comprehensive\u0000experiments on both synthetic and real-world datasets. The results demonstrate\u0000that our textsc{FedNE} can effectively preserve the neighborhood data\u0000structures and enhance the alignment in the global embedding space compared to\u0000several baseline methods.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Federated learning (FL) is a distributed machine learning paradigm enabling collaborative model training while preserving data privacy. In today's landscape, where most data is proprietary, confidential, and distributed, FL has become a promising approach to leverage such data effectively, particularly in sensitive domains such as medicine and the electric grid. However, heterogeneity and security remain the key challenges in FL; most existing FL frameworks either fail to address these challenges adequately or lack the flexibility to incorporate new solutions. To this end, we present recent advances in developing APPFL, an extensible framework and benchmarking suite for federated learning, which offers comprehensive solutions for heterogeneity and security concerns, as well as user-friendly interfaces for integrating new algorithms or adapting to new applications. We demonstrate the capabilities of APPFL through extensive experiments evaluating various aspects of FL, including communication efficiency, privacy preservation, computational performance, and resource utilization. We further highlight the extensibility of APPFL through case studies in vertical, hierarchical, and decentralized FL. APPFL is open-sourced at https://github.com/APPFL/APPFL.
{"title":"Advances in APPFL: A Comprehensive and Extensible Federated Learning Framework","authors":"Zilinghan Li, Shilan He, Ze Yang, Minseok Ryu, Kibaek Kim, Ravi Madduri","doi":"arxiv-2409.11585","DOIUrl":"https://doi.org/arxiv-2409.11585","url":null,"abstract":"Federated learning (FL) is a distributed machine learning paradigm enabling\u0000collaborative model training while preserving data privacy. In today's\u0000landscape, where most data is proprietary, confidential, and distributed, FL\u0000has become a promising approach to leverage such data effectively, particularly\u0000in sensitive domains such as medicine and the electric grid. Heterogeneity and\u0000security are the key challenges in FL, however; most existing FL frameworks\u0000either fail to address these challenges adequately or lack the flexibility to\u0000incorporate new solutions. To this end, we present the recent advances in\u0000developing APPFL, an extensible framework and benchmarking suite for federated\u0000learning, which offers comprehensive solutions for heterogeneity and security\u0000concerns, as well as user-friendly interfaces for integrating new algorithms or\u0000adapting to new applications. We demonstrate the capabilities of APPFL through\u0000extensive experiments evaluating various aspects of FL, including communication\u0000efficiency, privacy preservation, computational performance, and resource\u0000utilization. We further highlight the extensibility of APPFL through case\u0000studies in vertical, hierarchical, and decentralized FL. APPFL is open-sourced\u0000at https://github.com/APPFL/APPFL.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tiny Machine Learning (TinyML) enables efficient, low-cost, and privacy-preserving machine learning inference directly on microcontroller units (MCUs) connected to sensors. Optimizing models for these constrained environments is crucial. This paper investigates how reducing data acquisition rates affects TinyML models for time series classification, focusing on resource-constrained, battery-operated IoT devices. By lowering the data sampling frequency, we aim to reduce computational demands, RAM usage, energy consumption, latency, and MAC operations approximately fourfold while maintaining similar classification accuracies. Our experiments with six benchmark datasets (UCIHAR, WISDM, PAMAP2, MHEALTH, MITBIH, and PTB) showed that reducing data acquisition rates significantly cut energy consumption and computational load, with minimal accuracy loss. For example, a 75% reduction in acquisition rate for the MITBIH and PTB datasets led to a 60% decrease in RAM usage, a 75% reduction in MAC operations, a 74% decrease in latency, and a 70% reduction in energy consumption, without accuracy loss. These results offer valuable insights for deploying efficient TinyML models in constrained environments.
{"title":"Optimizing TinyML: The Impact of Reduced Data Acquisition Rates for Time Series Classification on Microcontrollers","authors":"Riya Samanta, Bidyut Saha, Soumya K. Ghosh, Ram Babu Roy","doi":"arxiv-2409.10942","DOIUrl":"https://doi.org/arxiv-2409.10942","url":null,"abstract":"Tiny Machine Learning (TinyML) enables efficient, lowcost, and privacy\u0000preserving machine learning inference directly on microcontroller units (MCUs)\u0000connected to sensors. Optimizing models for these constrained environments is\u0000crucial. This paper investigates how reducing data acquisition rates affects\u0000TinyML models for time series classification, focusing on resource-constrained,\u0000battery operated IoT devices. By lowering data sampling frequency, we aim to\u0000reduce computational demands RAM usage, energy consumption, latency, and MAC\u0000operations by approximately fourfold while maintaining similar classification\u0000accuracies. Our experiments with six benchmark datasets (UCIHAR, WISDM, PAMAP2,\u0000MHEALTH, MITBIH, and PTB) showed that reducing data acquisition rates\u0000significantly cut energy consumption and computational load, with minimal\u0000accuracy loss. For example, a 75% reduction in acquisition rate for MITBIH and\u0000PTB datasets led to a 60% decrease in RAM usage, 75% reduction in MAC\u0000operations, 74% decrease in latency, and 70% reduction in energy consumption,\u0000without accuracy loss. These results offer valuable insights for deploying\u0000efficient TinyML models in constrained environments.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Time Series Foundation Models (TSFMs) have recently garnered attention for their ability to model complex, large-scale time series data across domains such as retail, finance, and transportation. However, their application to sensitive, domain-specific fields like healthcare remains challenging, primarily due to the difficulty of fine-tuning these models for specialized, out-of-domain tasks with scarce publicly available datasets. In this work, we explore the use of Parameter-Efficient Fine-Tuning (PEFT) techniques to address these limitations, focusing on healthcare applications, particularly ICU vitals forecasting for sepsis patients. We introduce and evaluate two selective (BitFit and LayerNorm Tuning) and two additive (VeRA and FourierFT) PEFT techniques on multiple configurations of the Chronos TSFM for forecasting vital signs of sepsis patients. Our comparative analysis demonstrates that some of these PEFT methods outperform LoRA in terms of parameter efficiency and domain adaptation, establishing state-of-the-art (SOTA) results on ICU vitals forecasting tasks. Interestingly, FourierFT applied to the Chronos (Tiny) variant surpasses the SOTA model while fine-tuning only 2,400 parameters, compared to the 700K parameters of the benchmark.
{"title":"Beyond LoRA: Exploring Efficient Fine-Tuning Techniques for Time Series Foundational Models","authors":"Divij Gupta, Anubhav Bhatti, Surajsinh Parmar","doi":"arxiv-2409.11302","DOIUrl":"https://doi.org/arxiv-2409.11302","url":null,"abstract":"Time Series Foundation Models (TSFMs) have recently garnered attention for\u0000their ability to model complex, large-scale time series data across domains\u0000such as retail, finance, and transportation. However, their application to\u0000sensitive, domain-specific fields like healthcare remains challenging,\u0000primarily due to the difficulty of fine-tuning these models for specialized,\u0000out-of-domain tasks with scarce publicly available datasets. In this work, we\u0000explore the use of Parameter-Efficient Fine-Tuning (PEFT) techniques to address\u0000these limitations, focusing on healthcare applications, particularly ICU vitals\u0000forecasting for sepsis patients. We introduce and evaluate two selective\u0000(BitFit and LayerNorm Tuning) and two additive (VeRA and FourierFT) PEFT\u0000techniques on multiple configurations of the Chronos TSFM for forecasting vital\u0000signs of sepsis patients. Our comparative analysis demonstrates that some of\u0000these PEFT methods outperform LoRA in terms of parameter efficiency and domain\u0000adaptation, establishing state-of-the-art (SOTA) results in ICU vital\u0000forecasting tasks. Interestingly, FourierFT applied to the Chronos (Tiny)\u0000variant surpasses the SOTA model while fine-tuning only 2,400 parameters\u0000compared to the 700K parameters of the benchmark.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}