Neural algorithmic reasoning (NAR) is an emerging field that seeks to design neural networks that mimic classical algorithmic computations. Today, graph neural networks (GNNs) are widely used in neural algorithmic reasoners due to their message passing framework and permutation equivariance. In this extended abstract, we challenge this design choice, and replace the equivariant aggregation function with a recurrent neural network. While seemingly counter-intuitive, this approach has appropriate grounding when nodes have a natural ordering -- and this is frequently the case in established reasoning benchmarks like CLRS-30. Indeed, our recurrent NAR (RNAR) model performs very strongly on such tasks, while handling many others gracefully. A notable achievement of RNAR is its decisive state-of-the-art result on the Heapsort and Quickselect tasks, both deemed significant challenges for contemporary neural algorithmic reasoners -- especially the latter, where RNAR achieves a mean micro-F1 score of 87%.
{"title":"Recurrent Aggregators in Neural Algorithmic Reasoning","authors":"Kaijia Xu, Petar Veličković","doi":"arxiv-2409.07154","DOIUrl":"https://doi.org/arxiv-2409.07154","url":null,"abstract":"Neural algorithmic reasoning (NAR) is an emerging field that seeks to design\u0000neural networks that mimic classical algorithmic computations. Today, graph\u0000neural networks (GNNs) are widely used in neural algorithmic reasoners due to\u0000their message passing framework and permutation equivariance. In this extended\u0000abstract, we challenge this design choice, and replace the equivariant\u0000aggregation function with a recurrent neural network. While seemingly\u0000counter-intuitive, this approach has appropriate grounding when nodes have a\u0000natural ordering -- and this is the case frequently in established reasoning\u0000benchmarks like CLRS-30. Indeed, our recurrent NAR (RNAR) model performs very\u0000strongly on such tasks, while handling many others gracefully. A notable\u0000achievement of RNAR is its decisive state-of-the-art result on the Heapsort and\u0000Quickselect tasks, both deemed as a significant challenge for contemporary\u0000neural algorithmic reasoners -- especially the latter, where RNAR achieves a\u0000mean micro-F1 score of 87%.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present a lightweight approach to sequence classification using Ensemble Methods for Hidden Markov Models (HMMs). HMMs offer significant advantages in scenarios with imbalanced or smaller datasets due to their simplicity, interpretability, and efficiency. These models are particularly effective in domains such as finance and biology, where traditional methods struggle with high feature dimensionality and varied sequence lengths. Our ensemble-based scoring method enables the comparison of sequences of any length and improves performance on imbalanced datasets. This study focuses on the binary classification problem, particularly in scenarios with data imbalance, where the negative class is the majority (e.g., normal data) and the positive class is the minority (e.g., anomalous data), often with extreme distribution skews. We propose a novel training approach for HMM Ensembles that generalizes to multi-class problems and supports classification and anomaly detection. Our method fits class-specific groups of diverse models using random data subsets, and compares likelihoods across classes to produce composite scores, achieving high average precisions and AUCs. In addition, we compare our approach with neural network-based methods such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks (LSTMs), highlighting the efficiency and robustness of HMMs in data-scarce environments. Motivated by real-world use cases, our method demonstrates robust performance across various benchmarks, offering a flexible framework for diverse applications.
{"title":"Ensemble Methods for Sequence Classification with Hidden Markov Models","authors":"Maxime Kawawa-Beaudan, Srijan Sood, Soham Palande, Ganapathy Mani, Tucker Balch, Manuela Veloso","doi":"arxiv-2409.07619","DOIUrl":"https://doi.org/arxiv-2409.07619","url":null,"abstract":"We present a lightweight approach to sequence classification using Ensemble\u0000Methods for Hidden Markov Models (HMMs). HMMs offer significant advantages in\u0000scenarios with imbalanced or smaller datasets due to their simplicity,\u0000interpretability, and efficiency. These models are particularly effective in\u0000domains such as finance and biology, where traditional methods struggle with\u0000high feature dimensionality and varied sequence lengths. Our ensemble-based\u0000scoring method enables the comparison of sequences of any length and improves\u0000performance on imbalanced datasets. This study focuses on the binary classification problem, particularly in\u0000scenarios with data imbalance, where the negative class is the majority (e.g.,\u0000normal data) and the positive class is the minority (e.g., anomalous data),\u0000often with extreme distribution skews. We propose a novel training approach for\u0000HMM Ensembles that generalizes to multi-class problems and supports\u0000classification and anomaly detection. Our method fits class-specific groups of\u0000diverse models using random data subsets, and compares likelihoods across\u0000classes to produce composite scores, achieving high average precisions and\u0000AUCs. In addition, we compare our approach with neural network-based methods such\u0000as Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks\u0000(LSTMs), highlighting the efficiency and robustness of HMMs in data-scarce\u0000environments. Motivated by real-world use cases, our method demonstrates robust\u0000performance across various benchmarks, offering a flexible framework for\u0000diverse applications.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This study introduces TinyPropv2, an innovative algorithm optimized for on-device learning in deep neural networks, specifically designed for low-power microcontroller units. TinyPropv2 refines sparse backpropagation by dynamically adjusting the level of sparsity, including the ability to selectively skip training steps. This feature significantly lowers computational effort without substantially compromising accuracy. Our comprehensive evaluation across diverse datasets -- CIFAR-10, CIFAR-100, Flower, Food, Speech Command, MNIST, HAR, and DCASE2020 -- reveals that TinyPropv2 achieves near-parity with full training methods, with an average accuracy drop of only around 1 percent in most cases: for example, 0.82 percent on CIFAR-10 and 1.07 percent on CIFAR-100. In terms of computational effort, TinyPropv2 shows a marked reduction, requiring as little as 10 percent of the computational effort needed for full training in some scenarios, and consistently outperforms other sparse training methodologies. These findings underscore TinyPropv2's capacity to efficiently manage computational resources while maintaining high accuracy, positioning it as an advantageous solution for advanced embedded device applications in the IoT ecosystem.
{"title":"Advancing On-Device Neural Network Training with TinyPropv2: Dynamic, Sparse, and Efficient Backpropagation","authors":"Marcus Rüb, Axel Sikora, Daniel Mueller-Gritschneder","doi":"arxiv-2409.07109","DOIUrl":"https://doi.org/arxiv-2409.07109","url":null,"abstract":"This study introduces TinyPropv2, an innovative algorithm optimized for\u0000on-device learning in deep neural networks, specifically designed for low-power\u0000microcontroller units. TinyPropv2 refines sparse backpropagation by dynamically\u0000adjusting the level of sparsity, including the ability to selectively skip\u0000training steps. This feature significantly lowers computational effort without\u0000substantially compromising accuracy. Our comprehensive evaluation across\u0000diverse datasets CIFAR 10, CIFAR100, Flower, Food, Speech Command, MNIST, HAR,\u0000and DCASE2020 reveals that TinyPropv2 achieves near-parity with full training\u0000methods, with an average accuracy drop of only around 1 percent in most cases.\u0000For instance, against full training, TinyPropv2's accuracy drop is minimal, for\u0000example, only 0.82 percent on CIFAR 10 and 1.07 percent on CIFAR100. In terms\u0000of computational effort, TinyPropv2 shows a marked reduction, requiring as\u0000little as 10 percent of the computational effort needed for full training in\u0000some scenarios, and consistently outperforms other sparse training\u0000methodologies. These findings underscore TinyPropv2's capacity to efficiently\u0000manage computational resources while maintaining high accuracy, positioning it\u0000as an advantageous solution for advanced embedded device applications in the\u0000IoT ecosystem.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Inverse Constrained Reinforcement Learning (ICRL) is the task of inferring the implicit constraints followed by expert agents from their demonstration data. As an emerging research topic, ICRL has received considerable attention in recent years. This article presents a categorical survey of the latest advances in ICRL. It serves as a comprehensive reference for machine learning researchers and practitioners, as well as newcomers seeking to understand the definitions, advancements, and important challenges in ICRL. We begin by formally defining the problem and outlining the algorithmic framework that facilitates constraint inference across various scenarios. These include deterministic or stochastic environments, environments with limited demonstrations, and multiple agents. For each context, we illustrate the critical challenges and introduce a series of fundamental methods to tackle these issues. This survey encompasses discrete, virtual, and realistic environments for evaluating ICRL agents. We also delve into the most pertinent applications of ICRL, such as autonomous driving, robot control, and sports analytics. To stimulate continuing research, we conclude the survey with a discussion of key unresolved questions in ICRL that can effectively foster a bridge between theoretical understanding and practical industrial applications.
{"title":"A Survey of Inverse Constrained Reinforcement Learning: Definitions, Progress and Challenges","authors":"Guiliang Liu, Sheng Xu, Shicheng Liu, Ashish Gaurav, Sriram Ganapathi Subramanian, Pascal Poupart","doi":"arxiv-2409.07569","DOIUrl":"https://doi.org/arxiv-2409.07569","url":null,"abstract":"Inverse Constrained Reinforcement Learning (ICRL) is the task of inferring\u0000the implicit constraints followed by expert agents from their demonstration\u0000data. As an emerging research topic, ICRL has received considerable attention\u0000in recent years. This article presents a categorical survey of the latest\u0000advances in ICRL. It serves as a comprehensive reference for machine learning\u0000researchers and practitioners, as well as starters seeking to comprehend the\u0000definitions, advancements, and important challenges in ICRL. We begin by\u0000formally defining the problem and outlining the algorithmic framework that\u0000facilitates constraint inference across various scenarios. These include\u0000deterministic or stochastic environments, environments with limited\u0000demonstrations, and multiple agents. For each context, we illustrate the\u0000critical challenges and introduce a series of fundamental methods to tackle\u0000these issues. This survey encompasses discrete, virtual, and realistic\u0000environments for evaluating ICRL agents. We also delve into the most pertinent\u0000applications of ICRL, such as autonomous driving, robot control, and sports\u0000analytics. To stimulate continuing research, we conclude the survey with a\u0000discussion of key unresolved questions in ICRL that can effectively foster a\u0000bridge between theoretical understanding and practical industrial applications.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
STAND is a data-efficient and computationally efficient machine learning approach that produces better classification accuracy than popular approaches like XGBoost on small-data tabular classification problems like learning rule preconditions from interactive training. STAND accounts for a complete set of good candidate generalizations instead of selecting a single generalization by breaking ties randomly. STAND can use any greedy concept construction strategy, like decision tree learning or sequential covering, and build a structure that approximates a version space over statements in disjunctive normal form. Unlike candidate elimination approaches to version-space learning, STAND does not suffer from issues of version-space collapse from noisy data nor is it restricted to learning strictly conjunctive concepts. More importantly, STAND can produce a measure called instance certainty that can predict increases in holdout set performance and has high utility as an active-learning heuristic. Instance certainty enables STAND to be self-aware of its own learning: it knows when it learns and what example will help it learn the most. We illustrate that instance certainty has desirable properties that can help users select the next training problems, and estimate when training is complete in applications where users interactively teach an AI a complex program.
{"title":"STAND: Data-Efficient and Self-Aware Precondition Induction for Interactive Task Learning","authors":"Daniel Weitekamp, Kenneth Koedinger","doi":"arxiv-2409.07653","DOIUrl":"https://doi.org/arxiv-2409.07653","url":null,"abstract":"STAND is a data-efficient and computationally efficient machine learning\u0000approach that produces better classification accuracy than popular approaches\u0000like XGBoost on small-data tabular classification problems like learning rule\u0000preconditions from interactive training. STAND accounts for a complete set of\u0000good candidate generalizations instead of selecting a single generalization by\u0000breaking ties randomly. STAND can use any greedy concept construction strategy,\u0000like decision tree learning or sequential covering, and build a structure that\u0000approximates a version space over disjunctive normal logical statements. Unlike\u0000candidate elimination approaches to version-space learning, STAND does not\u0000suffer from issues of version-space collapse from noisy data nor is it\u0000restricted to learning strictly conjunctive concepts. More importantly, STAND\u0000can produce a measure called instance certainty that can predict increases in\u0000holdout set performance and has high utility as an active-learning heuristic.\u0000Instance certainty enables STAND to be self-aware of its own learning: it knows\u0000when it learns and what example will help it learn the most. We illustrate that\u0000instance certainty has desirable properties that can help users select next\u0000training problems, and estimate when training is complete in applications where\u0000users interactively teach an AI a complex program.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neural Algorithmic Reasoning (NAR) aims to optimize classical algorithms. However, canonical implementations of NAR train neural networks to return only a single solution, even when there are multiple correct solutions to a problem, such as single-source shortest paths. For some applications, it is desirable to recover more than one correct solution. To that end, we give the first method for NAR with multiple solutions. We demonstrate our method on two classical algorithms: Bellman-Ford (BF) and Depth-First Search (DFS), favouring deeper insight into two algorithms over a broader survey of algorithms. This method involves generating appropriate training data as well as sampling and validating solutions from model output. Each step of our method, which can serve as a framework for neural algorithmic reasoning beyond the tasks presented in this paper, might be of independent interest to the field and our results represent the first attempt at this task in the NAR literature.
{"title":"Neural Algorithmic Reasoning with Multiple Correct Solutions","authors":"Zeno Kujawa, John Poole, Dobrik Georgiev, Danilo Numeroso, Pietro Liò","doi":"arxiv-2409.06953","DOIUrl":"https://doi.org/arxiv-2409.06953","url":null,"abstract":"Neural Algorithmic Reasoning (NAR) aims to optimize classical algorithms.\u0000However, canonical implementations of NAR train neural networks to return only\u0000a single solution, even when there are multiple correct solutions to a problem,\u0000such as single-source shortest paths. For some applications, it is desirable to\u0000recover more than one correct solution. To that end, we give the first method\u0000for NAR with multiple solutions. We demonstrate our method on two classical\u0000algorithms: Bellman-Ford (BF) and Depth-First Search (DFS), favouring deeper\u0000insight into two algorithms over a broader survey of algorithms. This method\u0000involves generating appropriate training data as well as sampling and\u0000validating solutions from model output. Each step of our method, which can\u0000serve as a framework for neural algorithmic reasoning beyond the tasks\u0000presented in this paper, might be of independent interest to the field and our\u0000results represent the first attempt at this task in the NAR literature.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comparing datasets is a fundamental task in machine learning, essential for various learning paradigms, from evaluating train and test datasets for model generalization to using dataset similarity for detecting data drift. While traditional notions of dataset distances offer principled measures of similarity, their utility has largely been assessed through prediction error minimization. However, in Predict-then-Optimize (PtO) frameworks, where predictions serve as inputs for downstream optimization tasks, model performance is measured through decision regret minimization rather than prediction error minimization. In this work, we (i) show that traditional dataset distances, which rely solely on feature and label dimensions, lack informativeness in the PtO context, and (ii) propose a new dataset distance that incorporates the impacts of downstream decisions. Our results show that this decision-aware dataset distance effectively captures adaptation success in PtO contexts, providing a PtO adaptation bound in terms of dataset distance. Empirically, we show that our proposed distance measure accurately predicts transferability across three different PtO tasks from the literature.
{"title":"What is the Right Notion of Distance between Predict-then-Optimize Tasks?","authors":"Paula Rodriguez-Diaz, Lingkai Kong, Kai Wang, David Alvarez-Melis, Milind Tambe","doi":"arxiv-2409.06997","DOIUrl":"https://doi.org/arxiv-2409.06997","url":null,"abstract":"Comparing datasets is a fundamental task in machine learning, essential for\u0000various learning paradigms; from evaluating train and test datasets for model\u0000generalization to using dataset similarity for detecting data drift. While\u0000traditional notions of dataset distances offer principled measures of\u0000similarity, their utility has largely been assessed through prediction error\u0000minimization. However, in Predict-then-Optimize (PtO) frameworks, where\u0000predictions serve as inputs for downstream optimization tasks, model\u0000performance is measured through decision regret minimization rather than\u0000prediction error minimization. In this work, we (i) show that traditional\u0000dataset distances, which rely solely on feature and label dimensions, lack\u0000informativeness in the PtO context, and (ii) propose a new dataset distance\u0000that incorporates the impacts of downstream decisions. Our results show that\u0000this decision-aware dataset distance effectively captures adaptation success in\u0000PtO contexts, providing a PtO adaptation bound in terms of dataset distance.\u0000Empirically, we show that our proposed distance measure accurately predicts\u0000transferability across three different PtO tasks from the literature.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Self-training methods have proven to be effective in exploiting abundant unlabeled data in semi-supervised learning, particularly when labeled data is scarce. While many of these approaches rely on a cross-entropy loss function (CE), recent advances have shown that the supervised contrastive loss function (SupCon) can be more effective. Additionally, unsupervised contrastive learning approaches have also been shown to capture high quality data representations in the unsupervised setting. To benefit from these advantages in a semi-supervised setting, we propose a general framework to enhance self-training methods, which replaces all instances of CE losses with a unique contrastive loss. By using class prototypes, which are a set of class-wise trainable parameters, we recover the probability distributions of the CE setting and show a theoretical equivalence with it. Our framework, when applied to popular self-training methods, results in significant performance improvements across three different datasets with a limited number of labeled data. Additionally, we demonstrate further improvements in convergence speed, transfer ability, and hyperparameter stability. The code is available at https://github.com/AurelienGauffre/semisupcon/.
{"title":"A Unified Contrastive Loss for Self-Training","authors":"Aurelien Gauffre, Julien Horvat, Massih-Reza Amini","doi":"arxiv-2409.07292","DOIUrl":"https://doi.org/arxiv-2409.07292","url":null,"abstract":"Self-training methods have proven to be effective in exploiting abundant\u0000unlabeled data in semi-supervised learning, particularly when labeled data is\u0000scarce. While many of these approaches rely on a cross-entropy loss function\u0000(CE), recent advances have shown that the supervised contrastive loss function\u0000(SupCon) can be more effective. Additionally, unsupervised contrastive learning\u0000approaches have also been shown to capture high quality data representations in\u0000the unsupervised setting. To benefit from these advantages in a semi-supervised\u0000setting, we propose a general framework to enhance self-training methods, which\u0000replaces all instances of CE losses with a unique contrastive loss. By using\u0000class prototypes, which are a set of class-wise trainable parameters, we\u0000recover the probability distributions of the CE setting and show a theoretical\u0000equivalence with it. Our framework, when applied to popular self-training\u0000methods, results in significant performance improvements across three different\u0000datasets with a limited number of labeled data. Additionally, we demonstrate\u0000further improvements in convergence speed, transfer ability, and hyperparameter\u0000stability. The code is available at\u0000url{https://github.com/AurelienGauffre/semisupcon/}.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this study, we present a non-invasive glucose prediction system that integrates Near-Infrared (NIR) spectroscopy and millimeter-wave (mm-wave) sensing. We employ a Mixed Linear Model (MixedLM) to analyze the association between mm-wave frequency S_21 parameters and blood glucose levels within a heterogeneous dataset. The MixedLM method considers inter-subject variability and integrates multiple predictors, offering a more comprehensive analysis than traditional correlation analysis. Additionally, we incorporate a Domain Generalization (DG) model, Meta-forests, to effectively handle domain variance in the dataset, enhancing the model's adaptability to individual differences. Our results demonstrate promising accuracy in glucose prediction for unseen subjects, with a mean absolute error (MAE) of 17.47 mg/dL, a root mean square error (RMSE) of 31.83 mg/dL, and a mean absolute percentage error (MAPE) of 10.88%, highlighting its potential for clinical application. This study marks a significant step towards developing accurate, personalized, and non-invasive glucose monitoring systems, contributing to improved diabetes management.
{"title":"Non-Invasive Glucose Prediction System Enhanced by Mixed Linear Models and Meta-Forests for Domain Generalization","authors":"Yuyang Sun, Panagiotis Kosmas","doi":"arxiv-2409.07308","DOIUrl":"https://doi.org/arxiv-2409.07308","url":null,"abstract":"In this study, we present a non-invasive glucose prediction system that\u0000integrates Near-Infrared (NIR) spectroscopy and millimeter-wave (mm-wave)\u0000sensing. We employ a Mixed Linear Model (MixedLM) to analyze the association\u0000between mm-wave frequency S_21 parameters and blood glucose levels within a\u0000heterogeneous dataset. The MixedLM method considers inter-subject variability\u0000and integrates multiple predictors, offering a more comprehensive analysis than\u0000traditional correlation analysis. Additionally, we incorporate a Domain\u0000Generalization (DG) model, Meta-forests, to effectively handle domain variance\u0000in the dataset, enhancing the model's adaptability to individual differences.\u0000Our results demonstrate promising accuracy in glucose prediction for unseen\u0000subjects, with a mean absolute error (MAE) of 17.47 mg/dL, a root mean square\u0000error (RMSE) of 31.83 mg/dL, and a mean absolute percentage error (MAPE) of\u000010.88%, highlighting its potential for clinical application. This study marks a\u0000significant step towards developing accurate, personalized, and non-invasive\u0000glucose monitoring systems, contributing to improved diabetes management.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analyzing data from past clinical trials is part of the ongoing effort to optimize the design, implementation, and execution of new clinical trials and more efficiently bring life-saving interventions to market. While there have been recent advances in the generation of static context synthetic clinical trial data, due to both limited patient availability and constraints imposed by patient privacy needs, the generation of fine-grained synthetic time-sequential clinical trial data has been challenging. Given that patient trajectories over an entire clinical trial are of high importance for optimizing trial design and efforts to prevent harmful adverse events, there is a significant need for the generation of high-fidelity time-sequence clinical trial data. Here we introduce TrialSynth, a Variational Autoencoder (VAE) designed to address the specific challenges of generating synthetic time-sequence clinical trial data. Distinct from related clinical data VAE methods, the core of our method leverages Hawkes Processes (HP), which are particularly well-suited for modeling the event-type and time-gap prediction needed to capture the structure of sequential clinical trial data. Our experiments demonstrate that TrialSynth surpasses the performance of other comparable methods that can generate sequential clinical trial data, both in fidelity and in enabling the generation of highly accurate event sequences across multiple real-world sequential event datasets with small patient source populations when using minimal external information. Notably, our empirical findings highlight that TrialSynth not only outperforms existing clinical sequence-generating methods but also produces data with superior utility while empirically preserving patient privacy.
{"title":"TrialSynth: Generation of Synthetic Sequential Clinical Trial Data","authors":"Chufan Gao, Mandis Beigi, Afrah Shafquat, Jacob Aptekar, Jimeng Sun","doi":"arxiv-2409.07089","DOIUrl":"https://doi.org/arxiv-2409.07089","url":null,"abstract":"Analyzing data from past clinical trials is part of the ongoing effort to\u0000optimize the design, implementation, and execution of new clinical trials and\u0000more efficiently bring life-saving interventions to market. While there have\u0000been recent advances in the generation of static context synthetic clinical\u0000trial data, due to both limited patient availability and constraints imposed by\u0000patient privacy needs, the generation of fine-grained synthetic time-sequential\u0000clinical trial data has been challenging. Given that patient trajectories over\u0000an entire clinical trial are of high importance for optimizing trial design and\u0000efforts to prevent harmful adverse events, there is a significant need for the\u0000generation of high-fidelity time-sequence clinical trial data. Here we\u0000introduce TrialSynth, a Variational Autoencoder (VAE) designed to address the\u0000specific challenges of generating synthetic time-sequence clinical trial data.\u0000Distinct from related clinical data VAE methods, the core of our method\u0000leverages Hawkes Processes (HP), which are particularly well-suited for\u0000modeling event-type and time gap prediction needed to capture the structure of\u0000sequential clinical trial data. Our experiments demonstrate that TrialSynth\u0000surpasses the performance of other comparable methods that can generate\u0000sequential clinical trial data, in terms of both fidelity and in enabling the\u0000generation of highly accurate event sequences across multiple real-world\u0000sequential event datasets with small patient source populations when using\u0000minimal external information. Notably, our empirical findings highlight that\u0000TrialSynth not only outperforms existing clinical sequence-generating methods\u0000but also produces data with superior utility while empirically preserving\u0000patient privacy.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}