Neural algorithmic reasoning (NAR) is an emerging field that seeks to design neural networks that mimic classical algorithmic computations. Today, graph neural networks (GNNs) are widely used in neural algorithmic reasoners due to their message passing framework and permutation equivariance. In this extended abstract, we challenge this design choice, and replace the equivariant aggregation function with a recurrent neural network. While seemingly counter-intuitive, this approach has appropriate grounding when nodes have a natural ordering -- and this is frequently the case in established reasoning benchmarks like CLRS-30. Indeed, our recurrent NAR (RNAR) model performs very strongly on such tasks, while handling many others gracefully. A notable achievement of RNAR is its decisive state-of-the-art result on the Heapsort and Quickselect tasks, both deemed significant challenges for contemporary neural algorithmic reasoners -- especially the latter, where RNAR achieves a mean micro-F1 score of 87%.
{"title":"Recurrent Aggregators in Neural Algorithmic Reasoning","authors":"Kaijia Xu, Petar Veličković","doi":"arxiv-2409.07154","DOIUrl":"https://doi.org/arxiv-2409.07154","url":null,"abstract":"Neural algorithmic reasoning (NAR) is an emerging field that seeks to design\u0000neural networks that mimic classical algorithmic computations. Today, graph\u0000neural networks (GNNs) are widely used in neural algorithmic reasoners due to\u0000their message passing framework and permutation equivariance. In this extended\u0000abstract, we challenge this design choice, and replace the equivariant\u0000aggregation function with a recurrent neural network. While seemingly\u0000counter-intuitive, this approach has appropriate grounding when nodes have a\u0000natural ordering -- and this is the case frequently in established reasoning\u0000benchmarks like CLRS-30. Indeed, our recurrent NAR (RNAR) model performs very\u0000strongly on such tasks, while handling many others gracefully. A notable\u0000achievement of RNAR is its decisive state-of-the-art result on the Heapsort and\u0000Quickselect tasks, both deemed as a significant challenge for contemporary\u0000neural algorithmic reasoners -- especially the latter, where RNAR achieves a\u0000mean micro-F1 score of 87%.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present a lightweight approach to sequence classification using Ensemble Methods for Hidden Markov Models (HMMs). HMMs offer significant advantages in scenarios with imbalanced or smaller datasets due to their simplicity, interpretability, and efficiency. These models are particularly effective in domains such as finance and biology, where traditional methods struggle with high feature dimensionality and varied sequence lengths. Our ensemble-based scoring method enables the comparison of sequences of any length and improves performance on imbalanced datasets. This study focuses on the binary classification problem, particularly in scenarios with data imbalance, where the negative class is the majority (e.g., normal data) and the positive class is the minority (e.g., anomalous data), often with extreme distribution skews. We propose a novel training approach for HMM Ensembles that generalizes to multi-class problems and supports classification and anomaly detection. Our method fits class-specific groups of diverse models using random data subsets, and compares likelihoods across classes to produce composite scores, achieving high average precisions and AUCs. In addition, we compare our approach with neural network-based methods such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks (LSTMs), highlighting the efficiency and robustness of HMMs in data-scarce environments. Motivated by real-world use cases, our method demonstrates robust performance across various benchmarks, offering a flexible framework for diverse applications.
{"title":"Ensemble Methods for Sequence Classification with Hidden Markov Models","authors":"Maxime Kawawa-Beaudan, Srijan Sood, Soham Palande, Ganapathy Mani, Tucker Balch, Manuela Veloso","doi":"arxiv-2409.07619","DOIUrl":"https://doi.org/arxiv-2409.07619","url":null,"abstract":"We present a lightweight approach to sequence classification using Ensemble\u0000Methods for Hidden Markov Models (HMMs). HMMs offer significant advantages in\u0000scenarios with imbalanced or smaller datasets due to their simplicity,\u0000interpretability, and efficiency. These models are particularly effective in\u0000domains such as finance and biology, where traditional methods struggle with\u0000high feature dimensionality and varied sequence lengths. Our ensemble-based\u0000scoring method enables the comparison of sequences of any length and improves\u0000performance on imbalanced datasets. This study focuses on the binary classification problem, particularly in\u0000scenarios with data imbalance, where the negative class is the majority (e.g.,\u0000normal data) and the positive class is the minority (e.g., anomalous data),\u0000often with extreme distribution skews. We propose a novel training approach for\u0000HMM Ensembles that generalizes to multi-class problems and supports\u0000classification and anomaly detection. Our method fits class-specific groups of\u0000diverse models using random data subsets, and compares likelihoods across\u0000classes to produce composite scores, achieving high average precisions and\u0000AUCs. In addition, we compare our approach with neural network-based methods such\u0000as Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks\u0000(LSTMs), highlighting the efficiency and robustness of HMMs in data-scarce\u0000environments. Motivated by real-world use cases, our method demonstrates robust\u0000performance across various benchmarks, offering a flexible framework for\u0000diverse applications.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This study introduces TinyPropv2, an innovative algorithm optimized for on-device learning in deep neural networks, specifically designed for low-power microcontroller units. TinyPropv2 refines sparse backpropagation by dynamically adjusting the level of sparsity, including the ability to selectively skip training steps. This feature significantly lowers computational effort without substantially compromising accuracy. Our comprehensive evaluation across diverse datasets -- CIFAR-10, CIFAR-100, Flower, Food, Speech Command, MNIST, HAR, and DCASE2020 -- reveals that TinyPropv2 achieves near-parity with full training methods, with an average accuracy drop of only around 1 percent in most cases: for example, 0.82 percent on CIFAR-10 and 1.07 percent on CIFAR-100. In terms of computational effort, TinyPropv2 shows a marked reduction, requiring as little as 10 percent of the computational effort needed for full training in some scenarios, and consistently outperforms other sparse training methodologies. These findings underscore TinyPropv2's capacity to efficiently manage computational resources while maintaining high accuracy, positioning it as an advantageous solution for advanced embedded device applications in the IoT ecosystem.
{"title":"Advancing On-Device Neural Network Training with TinyPropv2: Dynamic, Sparse, and Efficient Backpropagation","authors":"Marcus Rüb, Axel Sikora, Daniel Mueller-Gritschneder","doi":"arxiv-2409.07109","DOIUrl":"https://doi.org/arxiv-2409.07109","url":null,"abstract":"This study introduces TinyPropv2, an innovative algorithm optimized for\u0000on-device learning in deep neural networks, specifically designed for low-power\u0000microcontroller units. TinyPropv2 refines sparse backpropagation by dynamically\u0000adjusting the level of sparsity, including the ability to selectively skip\u0000training steps. This feature significantly lowers computational effort without\u0000substantially compromising accuracy. Our comprehensive evaluation across\u0000diverse datasets CIFAR 10, CIFAR100, Flower, Food, Speech Command, MNIST, HAR,\u0000and DCASE2020 reveals that TinyPropv2 achieves near-parity with full training\u0000methods, with an average accuracy drop of only around 1 percent in most cases.\u0000For instance, against full training, TinyPropv2's accuracy drop is minimal, for\u0000example, only 0.82 percent on CIFAR 10 and 1.07 percent on CIFAR100. In terms\u0000of computational effort, TinyPropv2 shows a marked reduction, requiring as\u0000little as 10 percent of the computational effort needed for full training in\u0000some scenarios, and consistently outperforms other sparse training\u0000methodologies. These findings underscore TinyPropv2's capacity to efficiently\u0000manage computational resources while maintaining high accuracy, positioning it\u0000as an advantageous solution for advanced embedded device applications in the\u0000IoT ecosystem.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Inverse Constrained Reinforcement Learning (ICRL) is the task of inferring the implicit constraints followed by expert agents from their demonstration data. As an emerging research topic, ICRL has received considerable attention in recent years. This article presents a categorical survey of the latest advances in ICRL. It serves as a comprehensive reference for machine learning researchers and practitioners, as well as newcomers seeking to understand the definitions, advancements, and important challenges in ICRL. We begin by formally defining the problem and outlining the algorithmic framework that facilitates constraint inference across various scenarios. These include deterministic or stochastic environments, environments with limited demonstrations, and multiple agents. For each context, we illustrate the critical challenges and introduce a series of fundamental methods to tackle these issues. This survey encompasses discrete, virtual, and realistic environments for evaluating ICRL agents. We also delve into the most pertinent applications of ICRL, such as autonomous driving, robot control, and sports analytics. To stimulate continuing research, we conclude the survey with a discussion of key unresolved questions in ICRL that can effectively foster a bridge between theoretical understanding and practical industrial applications.
{"title":"A Survey of Inverse Constrained Reinforcement Learning: Definitions, Progress and Challenges","authors":"Guiliang Liu, Sheng Xu, Shicheng Liu, Ashish Gaurav, Sriram Ganapathi Subramanian, Pascal Poupart","doi":"arxiv-2409.07569","DOIUrl":"https://doi.org/arxiv-2409.07569","url":null,"abstract":"Inverse Constrained Reinforcement Learning (ICRL) is the task of inferring\u0000the implicit constraints followed by expert agents from their demonstration\u0000data. As an emerging research topic, ICRL has received considerable attention\u0000in recent years. This article presents a categorical survey of the latest\u0000advances in ICRL. It serves as a comprehensive reference for machine learning\u0000researchers and practitioners, as well as starters seeking to comprehend the\u0000definitions, advancements, and important challenges in ICRL. We begin by\u0000formally defining the problem and outlining the algorithmic framework that\u0000facilitates constraint inference across various scenarios. These include\u0000deterministic or stochastic environments, environments with limited\u0000demonstrations, and multiple agents. For each context, we illustrate the\u0000critical challenges and introduce a series of fundamental methods to tackle\u0000these issues. This survey encompasses discrete, virtual, and realistic\u0000environments for evaluating ICRL agents. We also delve into the most pertinent\u0000applications of ICRL, such as autonomous driving, robot control, and sports\u0000analytics. To stimulate continuing research, we conclude the survey with a\u0000discussion of key unresolved questions in ICRL that can effectively foster a\u0000bridge between theoretical understanding and practical industrial applications.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
STAND is a data-efficient and computationally efficient machine learning approach that produces better classification accuracy than popular approaches like XGBoost on small-data tabular classification problems like learning rule preconditions from interactive training. STAND accounts for a complete set of good candidate generalizations instead of selecting a single generalization by breaking ties randomly. STAND can use any greedy concept construction strategy, like decision tree learning or sequential covering, and build a structure that approximates a version space over statements in disjunctive normal form. Unlike candidate elimination approaches to version-space learning, STAND does not suffer from issues of version-space collapse from noisy data nor is it restricted to learning strictly conjunctive concepts. More importantly, STAND can produce a measure called instance certainty that can predict increases in holdout set performance and has high utility as an active-learning heuristic. Instance certainty enables STAND to be self-aware of its own learning: it knows when it learns and what example will help it learn the most. We illustrate that instance certainty has desirable properties that can help users select the next training problems, and estimate when training is complete in applications where users interactively teach an AI a complex program.
{"title":"STAND: Data-Efficient and Self-Aware Precondition Induction for Interactive Task Learning","authors":"Daniel Weitekamp, Kenneth Koedinger","doi":"arxiv-2409.07653","DOIUrl":"https://doi.org/arxiv-2409.07653","url":null,"abstract":"STAND is a data-efficient and computationally efficient machine learning\u0000approach that produces better classification accuracy than popular approaches\u0000like XGBoost on small-data tabular classification problems like learning rule\u0000preconditions from interactive training. STAND accounts for a complete set of\u0000good candidate generalizations instead of selecting a single generalization by\u0000breaking ties randomly. STAND can use any greedy concept construction strategy,\u0000like decision tree learning or sequential covering, and build a structure that\u0000approximates a version space over disjunctive normal logical statements. Unlike\u0000candidate elimination approaches to version-space learning, STAND does not\u0000suffer from issues of version-space collapse from noisy data nor is it\u0000restricted to learning strictly conjunctive concepts. More importantly, STAND\u0000can produce a measure called instance certainty that can predict increases in\u0000holdout set performance and has high utility as an active-learning heuristic.\u0000Instance certainty enables STAND to be self-aware of its own learning: it knows\u0000when it learns and what example will help it learn the most. We illustrate that\u0000instance certainty has desirable properties that can help users select next\u0000training problems, and estimate when training is complete in applications where\u0000users interactively teach an AI a complex program.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neural Algorithmic Reasoning (NAR) aims to optimize classical algorithms. However, canonical implementations of NAR train neural networks to return only a single solution, even when there are multiple correct solutions to a problem, such as single-source shortest paths. For some applications, it is desirable to recover more than one correct solution. To that end, we give the first method for NAR with multiple solutions. We demonstrate our method on two classical algorithms: Bellman-Ford (BF) and Depth-First Search (DFS), favouring deeper insight into two algorithms over a broader survey of algorithms. This method involves generating appropriate training data as well as sampling and validating solutions from model output. Each step of our method, which can serve as a framework for neural algorithmic reasoning beyond the tasks presented in this paper, might be of independent interest to the field and our results represent the first attempt at this task in the NAR literature.
{"title":"Neural Algorithmic Reasoning with Multiple Correct Solutions","authors":"Zeno Kujawa, John Poole, Dobrik Georgiev, Danilo Numeroso, Pietro Liò","doi":"arxiv-2409.06953","DOIUrl":"https://doi.org/arxiv-2409.06953","url":null,"abstract":"Neural Algorithmic Reasoning (NAR) aims to optimize classical algorithms.\u0000However, canonical implementations of NAR train neural networks to return only\u0000a single solution, even when there are multiple correct solutions to a problem,\u0000such as single-source shortest paths. For some applications, it is desirable to\u0000recover more than one correct solution. To that end, we give the first method\u0000for NAR with multiple solutions. We demonstrate our method on two classical\u0000algorithms: Bellman-Ford (BF) and Depth-First Search (DFS), favouring deeper\u0000insight into two algorithms over a broader survey of algorithms. This method\u0000involves generating appropriate training data as well as sampling and\u0000validating solutions from model output. Each step of our method, which can\u0000serve as a framework for neural algorithmic reasoning beyond the tasks\u0000presented in this paper, might be of independent interest to the field and our\u0000results represent the first attempt at this task in the NAR literature.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comparing datasets is a fundamental task in machine learning, essential for various learning paradigms, from evaluating train and test datasets for model generalization to using dataset similarity for detecting data drift. While traditional notions of dataset distances offer principled measures of similarity, their utility has largely been assessed through prediction error minimization. However, in Predict-then-Optimize (PtO) frameworks, where predictions serve as inputs for downstream optimization tasks, model performance is measured through decision regret minimization rather than prediction error minimization. In this work, we (i) show that traditional dataset distances, which rely solely on feature and label dimensions, lack informativeness in the PtO context, and (ii) propose a new dataset distance that incorporates the impacts of downstream decisions. Our results show that this decision-aware dataset distance effectively captures adaptation success in PtO contexts, providing a PtO adaptation bound in terms of dataset distance. Empirically, we show that our proposed distance measure accurately predicts transferability across three different PtO tasks from the literature.
{"title":"What is the Right Notion of Distance between Predict-then-Optimize Tasks?","authors":"Paula Rodriguez-Diaz, Lingkai Kong, Kai Wang, David Alvarez-Melis, Milind Tambe","doi":"arxiv-2409.06997","DOIUrl":"https://doi.org/arxiv-2409.06997","url":null,"abstract":"Comparing datasets is a fundamental task in machine learning, essential for\u0000various learning paradigms; from evaluating train and test datasets for model\u0000generalization to using dataset similarity for detecting data drift. While\u0000traditional notions of dataset distances offer principled measures of\u0000similarity, their utility has largely been assessed through prediction error\u0000minimization. However, in Predict-then-Optimize (PtO) frameworks, where\u0000predictions serve as inputs for downstream optimization tasks, model\u0000performance is measured through decision regret minimization rather than\u0000prediction error minimization. In this work, we (i) show that traditional\u0000dataset distances, which rely solely on feature and label dimensions, lack\u0000informativeness in the PtO context, and (ii) propose a new dataset distance\u0000that incorporates the impacts of downstream decisions. Our results show that\u0000this decision-aware dataset distance effectively captures adaptation success in\u0000PtO contexts, providing a PtO adaptation bound in terms of dataset distance.\u0000Empirically, we show that our proposed distance measure accurately predicts\u0000transferability across three different PtO tasks from the literature.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142180665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Self-training methods have proven to be effective in exploiting abundant unlabeled data in semi-supervised learning, particularly when labeled data is scarce. While many of these approaches rely on a cross-entropy loss function (CE), recent advances have shown that the supervised contrastive loss function (SupCon) can be more effective. Additionally, unsupervised contrastive learning approaches have also been shown to capture high quality data representations in the unsupervised setting. To benefit from these advantages in a semi-supervised setting, we propose a general framework to enhance self-training methods, which replaces all instances of CE losses with a unique contrastive loss. By using class prototypes, which are a set of class-wise trainable parameters, we recover the probability distributions of the CE setting and show a theoretical equivalence with it. Our framework, when applied to popular self-training methods, results in significant performance improvements across three different datasets with a limited number of labeled data. Additionally, we demonstrate further improvements in convergence speed, transfer ability, and hyperparameter stability. The code is available at https://github.com/AurelienGauffre/semisupcon/.
{"title":"A Unified Contrastive Loss for Self-Training","authors":"Aurelien Gauffre, Julien Horvat, Massih-Reza Amini","doi":"arxiv-2409.07292","DOIUrl":"https://doi.org/arxiv-2409.07292","url":null,"abstract":"Self-training methods have proven to be effective in exploiting abundant\u0000unlabeled data in semi-supervised learning, particularly when labeled data is\u0000scarce. While many of these approaches rely on a cross-entropy loss function\u0000(CE), recent advances have shown that the supervised contrastive loss function\u0000(SupCon) can be more effective. Additionally, unsupervised contrastive learning\u0000approaches have also been shown to capture high quality data representations in\u0000the unsupervised setting. To benefit from these advantages in a semi-supervised\u0000setting, we propose a general framework to enhance self-training methods, which\u0000replaces all instances of CE losses with a unique contrastive loss. By using\u0000class prototypes, which are a set of class-wise trainable parameters, we\u0000recover the probability distributions of the CE setting and show a theoretical\u0000equivalence with it. Our framework, when applied to popular self-training\u0000methods, results in significant performance improvements across three different\u0000datasets with a limited number of labeled data. Additionally, we demonstrate\u0000further improvements in convergence speed, transfer ability, and hyperparameter\u0000stability. The code is available at\u0000url{https://github.com/AurelienGauffre/semisupcon/}.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this study, we present a non-invasive glucose prediction system that integrates Near-Infrared (NIR) spectroscopy and millimeter-wave (mm-wave) sensing. We employ a Mixed Linear Model (MixedLM) to analyze the association between mm-wave frequency S_21 parameters and blood glucose levels within a heterogeneous dataset. The MixedLM method considers inter-subject variability and integrates multiple predictors, offering a more comprehensive analysis than traditional correlation analysis. Additionally, we incorporate a Domain Generalization (DG) model, Meta-forests, to effectively handle domain variance in the dataset, enhancing the model's adaptability to individual differences. Our results demonstrate promising accuracy in glucose prediction for unseen subjects, with a mean absolute error (MAE) of 17.47 mg/dL, a root mean square error (RMSE) of 31.83 mg/dL, and a mean absolute percentage error (MAPE) of 10.88%, highlighting its potential for clinical application. This study marks a significant step towards developing accurate, personalized, and non-invasive glucose monitoring systems, contributing to improved diabetes management.
{"title":"Non-Invasive Glucose Prediction System Enhanced by Mixed Linear Models and Meta-Forests for Domain Generalization","authors":"Yuyang Sun, Panagiotis Kosmas","doi":"arxiv-2409.07308","DOIUrl":"https://doi.org/arxiv-2409.07308","url":null,"abstract":"In this study, we present a non-invasive glucose prediction system that\u0000integrates Near-Infrared (NIR) spectroscopy and millimeter-wave (mm-wave)\u0000sensing. We employ a Mixed Linear Model (MixedLM) to analyze the association\u0000between mm-wave frequency S_21 parameters and blood glucose levels within a\u0000heterogeneous dataset. The MixedLM method considers inter-subject variability\u0000and integrates multiple predictors, offering a more comprehensive analysis than\u0000traditional correlation analysis. Additionally, we incorporate a Domain\u0000Generalization (DG) model, Meta-forests, to effectively handle domain variance\u0000in the dataset, enhancing the model's adaptability to individual differences.\u0000Our results demonstrate promising accuracy in glucose prediction for unseen\u0000subjects, with a mean absolute error (MAE) of 17.47 mg/dL, a root mean square\u0000error (RMSE) of 31.83 mg/dL, and a mean absolute percentage error (MAPE) of\u000010.88%, highlighting its potential for clinical application. This study marks a\u0000significant step towards developing accurate, personalized, and non-invasive\u0000glucose monitoring systems, contributing to improved diabetes management.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analyzing data from past clinical trials is part of the ongoing effort to optimize the design, implementation, and execution of new clinical trials and more efficiently bring life-saving interventions to market. While there have been recent advances in the generation of static context synthetic clinical trial data, due to both limited patient availability and constraints imposed by patient privacy needs, the generation of fine-grained synthetic time-sequential clinical trial data has been challenging. Given that patient trajectories over an entire clinical trial are of high importance for optimizing trial design and efforts to prevent harmful adverse events, there is a significant need for the generation of high-fidelity time-sequence clinical trial data. Here we introduce TrialSynth, a Variational Autoencoder (VAE) designed to address the specific challenges of generating synthetic time-sequence clinical trial data. Distinct from related clinical data VAE methods, the core of our method leverages Hawkes Processes (HP), which are particularly well-suited for modeling the event-type and time-gap prediction needed to capture the structure of sequential clinical trial data. Our experiments demonstrate that TrialSynth surpasses the performance of other comparable methods that can generate sequential clinical trial data, both in fidelity and in enabling the generation of highly accurate event sequences across multiple real-world sequential event datasets with small patient source populations when using minimal external information. Notably, our empirical findings highlight that TrialSynth not only outperforms existing clinical sequence-generating methods but also produces data with superior utility while empirically preserving patient privacy.
{"title":"TrialSynth: Generation of Synthetic Sequential Clinical Trial Data","authors":"Chufan Gao, Mandis Beigi, Afrah Shafquat, Jacob Aptekar, Jimeng Sun","doi":"arxiv-2409.07089","DOIUrl":"https://doi.org/arxiv-2409.07089","url":null,"abstract":"Analyzing data from past clinical trials is part of the ongoing effort to\u0000optimize the design, implementation, and execution of new clinical trials and\u0000more efficiently bring life-saving interventions to market. While there have\u0000been recent advances in the generation of static context synthetic clinical\u0000trial data, due to both limited patient availability and constraints imposed by\u0000patient privacy needs, the generation of fine-grained synthetic time-sequential\u0000clinical trial data has been challenging. Given that patient trajectories over\u0000an entire clinical trial are of high importance for optimizing trial design and\u0000efforts to prevent harmful adverse events, there is a significant need for the\u0000generation of high-fidelity time-sequence clinical trial data. Here we\u0000introduce TrialSynth, a Variational Autoencoder (VAE) designed to address the\u0000specific challenges of generating synthetic time-sequence clinical trial data.\u0000Distinct from related clinical data VAE methods, the core of our method\u0000leverages Hawkes Processes (HP), which are particularly well-suited for\u0000modeling event-type and time gap prediction needed to capture the structure of\u0000sequential clinical trial data. Our experiments demonstrate that TrialSynth\u0000surpasses the performance of other comparable methods that can generate\u0000sequential clinical trial data, in terms of both fidelity and in enabling the\u0000generation of highly accurate event sequences across multiple real-world\u0000sequential event datasets with small patient source populations when using\u0000minimal external information. Notably, our empirical findings highlight that\u0000TrialSynth not only outperforms existing clinical sequence-generating methods\u0000but also produces data with superior utility while empirically preserving\u0000patient privacy.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}