arXiv - STAT - Machine Learning: Latest Papers

The Sample Complexity of Smooth Boosting and the Tightness of the Hardcore Theorem
Pub Date : 2024-09-17 DOI: arxiv-2409.11597
Guy Blanc, Alexandre Hayderi, Caleb Koch, Li-Yang Tan
Smooth boosters generate distributions that do not place too much weight on any given example. Originally introduced for their noise-tolerant properties, such boosters have also found applications in differential privacy, reproducibility, and quantum learning theory. We study and settle the sample complexity of smooth boosting: we exhibit a class that can be weak learned to $\gamma$-advantage over smooth distributions with $m$ samples, for which strong learning over the uniform distribution requires $\tilde{\Omega}(1/\gamma^2)\cdot m$ samples. This matches the overhead of existing smooth boosters and provides the first separation from the setting of distribution-independent boosting, for which the corresponding overhead is $O(1/\gamma)$. Our work also sheds new light on Impagliazzo's hardcore theorem from complexity theory, all known proofs of which can be cast in the framework of smooth boosting. For a function $f$ that is mildly hard against size-$s$ circuits, the hardcore theorem provides a set of inputs on which $f$ is extremely hard against size-$s'$ circuits. A downside of this important result is the loss in circuit size, i.e. that $s' \ll s$. Answering a question of Trevisan, we show that this size loss is necessary and in fact, the parameters achieved by known proofs are the best possible.
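As a rough illustration of what "not placing too much weight on any given example" can mean, the sketch below measures how far a distribution over m examples is from uniform. The helper name `smoothness_ratio` and the convention of comparing the largest weight with the uniform weight 1/m are assumptions made here for illustration; the paper may parameterize smoothness differently.

```python
def smoothness_ratio(weights):
    """Return m * max_i D(i) for a distribution D given as non-negative weights.

    A ratio of 1.0 means D is uniform; larger values mean that some example
    carries more than its uniform share of the mass.
    Hypothetical helper: the convention is assumed, not taken from the paper.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    m = len(weights)
    # Normalize by the total so unnormalized weights are accepted too.
    return m * max(weights) / total
```

Under this convention, a booster would count as smooth if the ratio stays below some fixed bound for every distribution it generates.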
Fractional Naive Bayes (FNB): non-convex optimization for a parsimonious weighted selective naive Bayes classifier
Pub Date : 2024-09-17 DOI: arxiv-2409.11100
Carine Hue, Marc Boullé
We study supervised classification for datasets with a very large number of input variables. The naïve Bayes classifier is attractive for its simplicity, scalability and effectiveness in many real data applications. When the strong naïve Bayes assumption of conditional independence of the input variables given the target variable is not valid, variable selection and model averaging are two common ways to improve the performance. In the case of the naïve Bayes classifier, the resulting weighting scheme on the models reduces to a weighting scheme on the variables. Here we focus on direct estimation of variable weights in such a weighted naïve Bayes classifier. We propose a sparse regularization of the model log-likelihood, which takes into account prior penalization costs related to each input variable. Compared to averaging-based classifiers used up until now, our main goal is to obtain parsimonious robust models with fewer variables and equivalent performance. The direct estimation of the variable weights amounts to a non-convex optimization problem for which we propose and compare several two-stage algorithms. First, the criterion obtained by convex relaxation is minimized using several variants of standard gradient methods. Then, the initial non-convex optimization problem is solved using local optimization methods initialized with the result of the first stage. The various proposed algorithms result in optimization-based weighted naïve Bayes classifiers, which are evaluated on benchmark datasets and positioned with respect to a reference averaging-based classifier.
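To make the idea of a weighting scheme on the variables concrete, here is a minimal sketch of how a weighted naïve Bayes classifier could score classes: each variable's conditional log-probability is multiplied by its weight before being added to the log-prior. The function name `weighted_nb_posterior`, the argument layout, and the assumption that weights lie in [0, 1] are illustrative choices made here, not the authors' implementation; the weight estimation itself (the paper's actual contribution) is not shown.

```python
import math


def weighted_nb_posterior(log_prior, log_cond, weights):
    """Class posteriors for one instance under a weighted naive Bayes model.

    log_prior[c]   : log P(class c)
    log_cond[c][j] : log P(x_j | class c) for the observed value of variable j
    weights[j]     : weight of variable j (assumed in [0, 1]; 0 drops it)
    """
    scores = [
        lp + sum(w * lc for w, lc in zip(weights, row))
        for lp, row in zip(log_prior, log_cond)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores)
    exp_scores = [math.exp(s - top) for s in scores]
    total = sum(exp_scores)
    return [e / total for e in exp_scores]
```

With all weights equal to 1 this reduces to the standard naïve Bayes posterior, and a weight of 0 removes a variable entirely, which is how sparse weights yield a model with fewer variables.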
Partially Observable Contextual Bandits with Linear Payoffs
Pub Date : 2024-09-17 DOI: arxiv-2409.11521
Sihan Zeng, Sujay Bhatt, Alec Koppel, Sumitra Ganesh
The standard contextual bandit framework assumes fully observable and actionable contexts. In this work, we consider a new bandit setting with partially observable, correlated contexts and linear payoffs, motivated by the applications in finance where decision making is based on market information that typically displays temporal correlation and is not fully observed. We make the following contributions marrying ideas from statistical signal processing with bandits: (i) We propose an algorithmic pipeline named EMKF-Bandit, which integrates system identification, filtering, and classic contextual bandit algorithms into an iterative method alternating between latent parameter estimation and decision making. (ii) We analyze EMKF-Bandit when we select Thompson sampling as the bandit algorithm and show that it incurs a sub-linear regret under conditions on filtering. (iii) We conduct numerical simulations that demonstrate the benefits and practical applicability of the proposed pipeline.
Kolmogorov-Arnold Networks in Low-Data Regimes: A Comparative Study with Multilayer Perceptrons
Pub Date : 2024-09-16 DOI: arxiv-2409.10463
Farhad Pourkamali-Anaraki
Multilayer Perceptrons (MLPs) have long been a cornerstone in deep learning, known for their capacity to model complex relationships. Recently, Kolmogorov-Arnold Networks (KANs) have emerged as a compelling alternative, utilizing highly flexible learnable activation functions directly on network edges, a departure from the neuron-centric approach of MLPs. However, KANs significantly increase the number of learnable parameters, raising concerns about their effectiveness in data-scarce environments. This paper presents a comprehensive comparative study of MLPs and KANs from both algorithmic and experimental perspectives, with a focus on low-data regimes. We introduce an effective technique for designing MLPs with unique, parameterized activation functions for each neuron, enabling a more balanced comparison with KANs. Using empirical evaluations on simulated data and two real-world data sets from medicine and engineering, we explore the trade-offs between model complexity and accuracy, with particular attention to the role of network depth. Our findings show that MLPs with individualized activation functions achieve significantly higher predictive accuracy with only a modest increase in parameters, especially when the sample size is limited to around one hundred. For example, in a three-class classification problem within additive manufacturing, MLPs achieve a median accuracy of 0.91, significantly outperforming KANs, which only reach a median accuracy of 0.53 with default hyperparameters. These results offer valuable insights into the impact of activation function selection in neural networks.
Multidimensional Deconvolution with Profiling
Pub Date : 2024-09-16 DOI: arxiv-2409.10421
Huanbiao Zhu, Krish Desai, Mikael Kuusela, Vinicius Mikuni, Benjamin Nachman, Larry Wasserman
In many experimental contexts, it is necessary to statistically remove the impact of instrumental effects in order to physically interpret measurements. This task has been extensively studied in particle physics, where the deconvolution task is called unfolding. A number of recent methods have shown how to perform high-dimensional, unbinned unfolding using machine learning. However, one of the assumptions in all of these methods is that the detector response is accurately modeled in the Monte Carlo simulation. In practice, the detector response depends on a number of nuisance parameters that can be constrained with data. We propose a new algorithm called Profile OmniFold (POF), which works in a similar iterative manner as the OmniFold (OF) algorithm while being able to simultaneously profile the nuisance parameters. We illustrate the method with a Gaussian example as a proof of concept, highlighting its promising capabilities.
Variance-reduced first-order methods for deterministically constrained stochastic nonconvex optimization with strong convergence guarantees
Pub Date : 2024-09-16 DOI: arxiv-2409.09906
Zhaosong Lu, Sanyou Mei, Yifeng Xiao
In this paper, we study a class of deterministically constrained stochastic optimization problems. Existing methods typically aim to find an $\epsilon$-stochastic stationary point, where the expected violations of both the constraints and first-order stationarity are within a prescribed accuracy of $\epsilon$. However, in many practical applications, it is crucial that the constraints be nearly satisfied with certainty, making such an $\epsilon$-stochastic stationary point potentially undesirable due to the risk of significant constraint violations. To address this issue, we propose single-loop variance-reduced stochastic first-order methods, where the stochastic gradient of the stochastic component is computed using either a truncated recursive momentum scheme or a truncated Polyak momentum scheme for variance reduction, while the gradient of the deterministic component is computed exactly. Under the error bound condition with a parameter $\theta \geq 1$ and other suitable assumptions, we establish that the proposed methods achieve a sample complexity and first-order operation complexity of $\widetilde{O}(\epsilon^{-\max\{4, 2\theta\}})$ for finding a stronger $\epsilon$-stochastic stationary point, where the constraint violation is within $\epsilon$ with certainty, and the expected violation of first-order stationarity is within $\epsilon$. To the best of our knowledge, this is the first work to develop methods with provable complexity guarantees for finding an approximate stochastic stationary point of such problems that nearly satisfies all constraints with certainty.
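For readers unfamiliar with recursive momentum, the sketch below shows one step of the standard, untruncated estimator on a scalar problem: the new gradient estimate reuses the previous estimate, corrected by the gradient difference evaluated on the same sample. The truncation step used in the paper is not described in the abstract and is not reproduced here; the function name, the scalar setting, and the parameter name `a` are assumptions made for illustration.

```python
def recursive_momentum_step(grad, x_new, x_old, d_old, sample, a):
    """One update of a standard (untruncated) recursive momentum estimator.

    grad(x, sample) : stochastic gradient at x for the given sample
    d_old           : previous gradient estimate (at x_old)
    a               : momentum parameter in (0, 1]; a = 1 gives plain SGD
    Hypothetical sketch; not the paper's truncated scheme.
    """
    # Both gradients use the SAME sample, so their noise largely cancels.
    g_new = grad(x_new, sample)
    g_old = grad(x_old, sample)
    return g_new + (1.0 - a) * (d_old - g_old)
```

If the previous estimate was exact and the gradient is noise-free, the update returns the exact gradient at the new point, which is the sense in which the correction term tracks the gradient between iterates.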
Robust Reinforcement Learning with Dynamic Distortion Risk Measures
Pub Date : 2024-09-16 DOI: arxiv-2409.10096
Anthony Coache, Sebastian Jaimungal
In a reinforcement learning (RL) setting, the agent's optimal strategy heavily depends on her risk preferences and the underlying model dynamics of the training environment. These two aspects influence the agent's ability to make well-informed and time-consistent decisions when facing testing environments. In this work, we devise a framework to solve robust risk-aware RL problems where we simultaneously account for environmental uncertainty and risk with a class of dynamic robust distortion risk measures. Robustness is introduced by considering all models within a Wasserstein ball around a reference model. We estimate such dynamic robust risk measures using neural networks by making use of strictly consistent scoring functions, derive policy gradient formulae using the quantile representation of distortion risk measures, and construct an actor-critic algorithm to solve this class of robust risk-aware RL problems. We demonstrate the performance of our algorithm on a portfolio allocation example.
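The quantile representation mentioned above can be illustrated with a static, sample-based estimate: sort the outcomes and weight each order statistic by the increment of a distortion function g over its probability interval. This is only a sketch of a plain (non-dynamic, non-robust) distortion risk measure under one sign convention, where larger values are worse; the function names and the choice of CVaR as the example distortion are assumptions made here.

```python
def distortion_risk(samples, g):
    """Empirical distortion risk: sum_i x_(i) * (g(i/n) - g((i-1)/n)).

    samples : observed costs (larger = worse, by the convention assumed here)
    g       : distortion function on [0, 1], non-decreasing, g(0)=0, g(1)=1
    """
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    # enumerate is 0-based, so the i-th order statistic covers (i/n, (i+1)/n].
    return sum(x * (g((i + 1) / n) - g(i / n)) for i, x in enumerate(xs))


def cvar_distortion(alpha):
    """Distortion function whose risk measure averages the worst (1 - alpha) tail."""
    def g(u):
        return max(0.0, (u - alpha) / (1.0 - alpha))
    return g
```

For example, `distortion_risk([4, 1, 3, 2], lambda u: u)` recovers the sample mean, while `distortion_risk([4, 1, 3, 2], cvar_distortion(0.5))` averages the two largest outcomes.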
A Bayesian Interpretation of Adaptive Low-Rank Adaptation
Pub Date : 2024-09-16 DOI: arxiv-2409.10673
Haolin Chen, Philip N. Garner
Motivated by the sensitivity-based importance score of the adaptive low-rank adaptation (AdaLoRA), we utilize more theoretically supported metrics, including the signal-to-noise ratio (SNR), along with the Improved Variational Online Newton (IVON) optimizer, for adaptive parameter budget allocation. The resulting Bayesian counterpart not only has matched or surpassed the performance of using the sensitivity-based importance metric but is also a faster alternative to AdaLoRA with Adam. Our theoretical analysis reveals a significant connection between the two metrics, providing a Bayesian perspective on the efficacy of sensitivity as an importance score. Furthermore, our findings suggest that the magnitude, rather than the variance, is the primary indicator of the importance of parameters.
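A minimal sketch of SNR-based budget allocation follows, assuming each parameter (or parameter group) comes with a posterior mean and a positive posterior variance, and taking the SNR as the absolute mean divided by the standard deviation. The helper names and the simple keep-the-top-k rule are assumptions for illustration; the abstract does not spell out the paper's exact scoring or pruning schedule.

```python
import math


def snr_scores(means, variances):
    """Signal-to-noise ratio |mean| / std for each parameter (variances > 0)."""
    return [abs(m) / math.sqrt(v) for m, v in zip(means, variances)]


def top_k_by_score(scores, k):
    """Indices of the k highest-scoring parameters, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]
```

Parameters outside the returned index set would be the candidates for pruning when the budget shrinks.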
Spatiotemporal Covariance Neural Networks
Pub Date : 2024-09-16 DOI: arxiv-2409.10068
Andrea Cavallo, Mohammad Sabbaqi, Elvin Isufi
Modeling spatiotemporal interactions in multivariate time series is key to their effective processing, but challenging because of their irregular and often unknown structure. Statistical properties of the data provide useful biases to model interdependencies and are leveraged by correlation and covariance-based networks as well as by processing pipelines relying on principal component analysis (PCA). However, PCA and its temporal extensions suffer instabilities in the covariance eigenvectors when the corresponding eigenvalues are close to each other, making their application to dynamic and streaming data settings challenging. To address these issues, we exploit the analogy between PCA and graph convolutional filters to introduce the SpatioTemporal coVariance Neural Network (STVNN), a relational learning model that operates on the sample covariance matrix of the time series and leverages joint spatiotemporal convolutions to model the data. To account for the streaming and non-stationary setting, we consider an online update of the parameters and sample covariance matrix. We prove the STVNN is stable to the uncertainties introduced by these online estimations, thus improving over temporal PCA-based methods. Experimental results corroborate our theoretical findings and show that STVNN is competitive for multivariate time series processing, it adapts to changes in the data distribution, and it is orders of magnitude more stable than online temporal PCA.
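The abstract does not specify how the sample covariance matrix is updated online, so the sketch below substitutes a standard Welford-style running estimate as one plausible way to maintain it from streaming observations without storing past samples. The class name `RunningCovariance` and the choice of the biased (divide-by-n) estimator are assumptions made here, not the authors' update rule.

```python
class RunningCovariance:
    """Streaming mean and covariance via a Welford-style update (illustrative)."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        # Accumulated sum of outer products of deviations.
        self._m2 = [[0.0] * dim for _ in range(dim)]

    def update(self, x):
        """Incorporate one observation x (a sequence of length dim)."""
        self.n += 1
        delta_old = [xi - mi for xi, mi in zip(x, self.mean)]
        self.mean = [mi + d / self.n for mi, d in zip(self.mean, delta_old)]
        delta_new = [xi - mi for xi, mi in zip(x, self.mean)]
        for i, di in enumerate(delta_old):
            for j, dj in enumerate(delta_new):
                self._m2[i][j] += di * dj

    def covariance(self):
        """Biased sample covariance (divides by n)."""
        if self.n == 0:
            raise ValueError("no observations yet")
        return [[v / self.n for v in row] for row in self._m2]
```

A non-stationary stream would more likely call for a forgetting factor that down-weights old samples; that variant is left out to keep the sketch short.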
Citations: 0
Manifold-Constrained Nucleus-Level Denoising Diffusion Model for Structure-Based Drug Design
Pub Date : 2024-09-16 DOI: arxiv-2409.10584
Shengchao Liu, Divin Yan, Weitao Du, Weiyang Liu, Zhuoxinran Li, Hongyu Guo, Christian Borgs, Jennifer Chayes, Anima Anandkumar
Artificial intelligence models have shown great potential in structure-based drug design, generating ligands with high binding affinities. However, existing models have often overlooked a crucial physical constraint: atoms must maintain a minimum pairwise distance to avoid separation violation, a phenomenon governed by the balance of attractive and repulsive forces. To mitigate such separation violations, we propose NucleusDiff. It models the interactions between atomic nuclei and their surrounding electron clouds by enforcing the distance constraint between the nuclei and manifolds. We quantitatively evaluate NucleusDiff using the CrossDocked2020 dataset and a COVID-19 therapeutic target, demonstrating that NucleusDiff reduces violation rate by up to 100.00% and enhances binding affinity by up to 22.16%, surpassing state-of-the-art models for structure-based drug design. We also provide qualitative analysis through manifold sampling, visually confirming the effectiveness of NucleusDiff in reducing separation violations and improving binding affinities.
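To make the minimum pairwise distance constraint concrete, the sketch below counts the fraction of atom pairs that sit closer than a threshold. It is a simplified illustration, not the paper's metric: the function name `violation_rate`, the threshold value, and the use of plain atom-to-atom distances (rather than distances between nuclei and manifolds) are all assumptions.

```python
import math
from itertools import combinations


def violation_rate(coords, min_dist):
    """Fraction of atom pairs whose Euclidean distance is below min_dist.

    coords: list of (x, y, z) tuples, one per atom.
    min_dist: hypothetical minimum allowed separation, same units as coords.
    """
    pairs = list(combinations(coords, 2))
    if not pairs:
        # Zero or one atom: no pair can violate the constraint.
        return 0.0
    violations = sum(1 for a, b in pairs if math.dist(a, b) < min_dist)
    return violations / len(pairs)


# Three atoms on a line; only the first two are closer than 1.5 units.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
rate = violation_rate(atoms, min_dist=1.5)
```

The check is quadratic in the number of atoms, which is fine for ligand-sized inputs; larger systems would typically use a spatial index instead.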
Citations: 0