Guy Blanc, Alexandre Hayderi, Caleb Koch, Li-Yang Tan
Smooth boosters generate distributions that do not place too much weight on any given example. Originally introduced for their noise-tolerant properties, such boosters have also found applications in differential privacy, reproducibility, and quantum learning theory. We study and settle the sample complexity of smooth boosting: we exhibit a class that can be weakly learned to $\gamma$-advantage over smooth distributions with $m$ samples, for which strong learning over the uniform distribution requires $\tilde{\Omega}(1/\gamma^2)\cdot m$ samples. This matches the overhead of existing smooth boosters and provides the first separation from the setting of distribution-independent boosting, for which the corresponding overhead is $O(1/\gamma)$. Our work also sheds new light on Impagliazzo's hardcore theorem from complexity theory, all known proofs of which can be cast in the framework of smooth boosting. For a function $f$ that is mildly hard against size-$s$ circuits, the hardcore theorem provides a set of inputs on which $f$ is extremely hard against size-$s'$ circuits. A downside of this important result is the loss in circuit size, i.e., that $s' \ll s$. Answering a question of Trevisan, we show that this size loss is necessary and, in fact, that the parameters achieved by known proofs are the best possible.
"The Sample Complexity of Smooth Boosting and the Tightness of the Hardcore Theorem" (arXiv:2409.11597, arXiv - STAT - Machine Learning, 2024-09-17)
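The smoothness property at the heart of the abstract above keeps the boosting distribution's density bounded relative to uniform. As a minimal illustration (not the paper's construction; the update rule, clipping scheme, and parameter names are assumptions), a kappa-smooth weight update can cap every example's probability at kappa/n:

```python
import math

# Illustrative sketch, not the paper's construction: a multiplicative-weights
# boosting update projected back onto kappa-smooth distributions, i.e. no
# example may carry more than kappa/n probability mass. The proportional
# redistribution below is a simplified stand-in for a Bregman projection.

def smooth_update(weights, margins, eta=0.5, kappa=2.0):
    n = len(weights)
    # Upweight examples the current weak hypothesis gets wrong (negative margin).
    w = [wi * math.exp(-eta * m) for wi, m in zip(weights, margins)]
    total = sum(w)
    p = [wi / total for wi in w]
    cap = kappa / n
    for _ in range(n):  # iterate: redistribution can push other entries past the cap
        excess = sum(max(pi - cap, 0.0) for pi in p)
        free = sum(pi for pi in p if pi < cap)
        if excess <= 1e-12 or free <= 0.0:
            break
        p = [cap if pi >= cap else pi * (1.0 + excess / free) for pi in p]
    return p
```

With kappa = 2 and four examples, even a badly misclassified example is capped at probability 1/2 rather than dominating the distribution.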
We study supervised classification for datasets with a very large number of input variables. The naïve Bayes classifier is attractive for its simplicity, scalability and effectiveness in many real data applications. When the strong naïve Bayes assumption of conditional independence of the input variables given the target variable is not valid, variable selection and model averaging are two common ways to improve the performance. In the case of the naïve Bayes classifier, the resulting weighting scheme on the models reduces to a weighting scheme on the variables. Here we focus on direct estimation of variable weights in such a weighted naïve Bayes classifier. We propose a sparse regularization of the model log-likelihood, which takes into account prior penalization costs related to each input variable. Compared to the averaging-based classifiers used up until now, our main goal is to obtain parsimonious robust models with fewer variables and equivalent performance. The direct estimation of the variable weights amounts to a non-convex optimization problem, for which we propose and compare several two-stage algorithms. First, the criterion obtained by convex relaxation is minimized using several variants of standard gradient methods. Then, the initial non-convex optimization problem is solved using local optimization methods initialized with the result of the first stage. The various proposed algorithms result in optimization-based weighted naïve Bayes classifiers, which are evaluated on benchmark datasets and positioned w.r.t. a reference averaging-based classifier.
"Fractional Naive Bayes (FNB): non-convex optimization for a parsimonious weighted selective naive Bayes classifier" by Carine Hue, Marc Boullé (arXiv:2409.11100, arXiv - STAT - Machine Learning, 2024-09-17)
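The variable-weighting idea in the abstract above can be sketched concretely. In this toy (the probability tables, class names, and the exact weighting form are illustrative assumptions, not the paper's estimator), each variable's log-likelihood contribution is scaled by a weight in [0, 1], and a zero weight removes the variable entirely, which is the parsimony being targeted:

```python
import math

# Hypothetical sketch of a weighted naive Bayes score: each variable's
# log-likelihood is scaled by a weight w_j in [0, 1]; w_j = 0 drops the
# variable. The conditional probability tables are toy values, not
# estimated from data.

def weighted_nb_log_posterior(x, weights, priors, cond_probs):
    """Return log P(c) + sum_j w_j * log P(x_j | c) for each class c."""
    scores = {}
    for c, prior in priors.items():
        s = math.log(prior)
        for j, (xj, wj) in enumerate(zip(x, weights)):
            s += wj * math.log(cond_probs[c][j][xj])
        scores[c] = s
    return scores

priors = {"pos": 0.5, "neg": 0.5}
cond_probs = {
    "pos": [{0: 0.2, 1: 0.8}, {0: 0.5, 1: 0.5}],  # variable 1 is uninformative
    "neg": [{0: 0.8, 1: 0.2}, {0: 0.5, 1: 0.5}],
}
# Weight 0.0 on the uninformative variable excludes it from the decision.
scores = weighted_nb_log_posterior([1, 0], [1.0, 0.0], priors, cond_probs)
best = max(scores, key=scores.get)
```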
The standard contextual bandit framework assumes fully observable and actionable contexts. In this work, we consider a new bandit setting with partially observable, correlated contexts and linear payoffs, motivated by applications in finance where decision making is based on market information that typically displays temporal correlation and is not fully observed. We make the following contributions, marrying ideas from statistical signal processing with bandits: (i) We propose an algorithmic pipeline named EMKF-Bandit, which integrates system identification, filtering, and classic contextual bandit algorithms into an iterative method alternating between latent parameter estimation and decision making. (ii) We analyze EMKF-Bandit when Thompson sampling is selected as the bandit algorithm and show that it incurs sub-linear regret under conditions on filtering. (iii) We conduct numerical simulations that demonstrate the benefits and practical applicability of the proposed pipeline.
"Partially Observable Contextual Bandits with Linear Payoffs" by Sihan Zeng, Sujay Bhatt, Alec Koppel, Sumitra Ganesh (arXiv:2409.11521, arXiv - STAT - Machine Learning, 2024-09-17)
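A heavily simplified sketch of the filter-then-act loop the abstract describes, with toy assumptions throughout: a scalar AR(1) latent context with known dynamics (EMKF-Bandit additionally re-estimates these via EM, which is omitted here), a Kalman filter producing the context estimate, and Thompson sampling over two linear arms. All names and constants are illustrative.

```python
import random

# Toy version of filtering + Thompson sampling with a partially observed,
# temporally correlated context. Dynamics are assumed known; the EM step of
# EMKF-Bandit is omitted.

random.seed(0)

a, q, r = 0.9, 0.1, 0.2           # state transition, process and observation noise
theta_true = [0.5, 1.5]           # unknown per-arm payoff slopes
x_true = 1.0                      # latent context
x_hat, p_var = 0.0, 1.0           # Kalman mean and variance
post = [(0.0, 1.0), (0.0, 1.0)]   # per-arm Gaussian posterior (mean, variance)
noise_r = 0.01                    # assumed reward noise variance

for t in range(200):
    x_true = a * x_true + random.gauss(0.0, q ** 0.5)
    y = x_true + random.gauss(0.0, r ** 0.5)        # partial observation
    # Kalman predict and update.
    x_pred, p_pred = a * x_hat, a * a * p_var + q
    k = p_pred / (p_pred + r)
    x_hat, p_var = x_pred + k * (y - x_pred), (1.0 - k) * p_pred
    # Thompson sampling on the filtered context estimate.
    samples = [random.gauss(m, v ** 0.5) for m, v in post]
    arm = max(range(len(post)), key=lambda i: samples[i] * x_hat)
    reward = theta_true[arm] * x_true + random.gauss(0.0, noise_r ** 0.5)
    # Conjugate Bayesian linear-regression update for the pulled arm.
    m, v = post[arm]
    v_new = 1.0 / (1.0 / v + x_hat * x_hat / noise_r)
    m_new = v_new * (m / v + x_hat * reward / noise_r)
    post[arm] = (m_new, v_new)
```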
Multilayer Perceptrons (MLPs) have long been a cornerstone in deep learning, known for their capacity to model complex relationships. Recently, Kolmogorov-Arnold Networks (KANs) have emerged as a compelling alternative, utilizing highly flexible learnable activation functions directly on network edges, a departure from the neuron-centric approach of MLPs. However, KANs significantly increase the number of learnable parameters, raising concerns about their effectiveness in data-scarce environments. This paper presents a comprehensive comparative study of MLPs and KANs from both algorithmic and experimental perspectives, with a focus on low-data regimes. We introduce an effective technique for designing MLPs with unique, parameterized activation functions for each neuron, enabling a more balanced comparison with KANs. Using empirical evaluations on simulated data and two real-world data sets from medicine and engineering, we explore the trade-offs between model complexity and accuracy, with particular attention to the role of network depth. Our findings show that MLPs with individualized activation functions achieve significantly higher predictive accuracy with only a modest increase in parameters, especially when the sample size is limited to around one hundred. For example, in a three-class classification problem within additive manufacturing, MLPs achieve a median accuracy of 0.91, significantly outperforming KANs, which only reach a median accuracy of 0.53 with default hyperparameters. These results offer valuable insights into the impact of activation function selection in neural networks.
"Kolmogorov-Arnold Networks in Low-Data Regimes: A Comparative Study with Multilayer Perceptrons" by Farhad Pourkamali-Anaraki (arXiv:2409.10463, arXiv - STAT - Machine Learning, 2024-09-16)
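The "unique, parameterized activation functions for each neuron" idea above can be sketched as follows. The swish-style form z * sigmoid(beta_i * z) with one trainable beta_i per output neuron is an assumption for illustration, not necessarily the paper's exact design:

```python
import numpy as np

# Sketch of one MLP layer where every neuron carries its own trainable
# activation parameter: an assumed swish-style activation z * sigmoid(beta_i * z)
# with one beta_i per output neuron.

rng = np.random.default_rng(0)

def layer_forward(x, W, b, beta):
    """Affine map followed by a per-neuron parameterized activation."""
    z = x @ W + b                          # shape (batch, n_out)
    return z / (1.0 + np.exp(-beta * z))   # beta broadcasts across the batch

x = rng.normal(size=(4, 3))                # batch of 4 inputs, 3 features
W = 0.5 * rng.normal(size=(3, 5))
b = np.zeros(5)
beta = rng.uniform(0.5, 2.0, size=5)       # one activation parameter per neuron
h = layer_forward(x, W, b, beta)
```

In training, beta would be updated by gradient descent alongside W and b, adding only one extra parameter per neuron, which matches the "modest increase in parameters" the abstract reports.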
Huanbiao Zhu, Krish Desai, Mikael Kuusela, Vinicius Mikuni, Benjamin Nachman, Larry Wasserman
In many experimental contexts, it is necessary to statistically remove the impact of instrumental effects in order to physically interpret measurements. This task has been extensively studied in particle physics, where the deconvolution task is called unfolding. A number of recent methods have shown how to perform high-dimensional, unbinned unfolding using machine learning. However, one of the assumptions in all of these methods is that the detector response is accurately modeled in the Monte Carlo simulation. In practice, the detector response depends on a number of nuisance parameters that can be constrained with data. We propose a new algorithm called Profile OmniFold (POF), which works in a similar iterative manner to the OmniFold (OF) algorithm while simultaneously profiling the nuisance parameters. We illustrate the method with a Gaussian example as a proof of concept, highlighting its promising capabilities.
"Multidimensional Deconvolution with Profiling" (arXiv:2409.10421, arXiv - STAT - Machine Learning, 2024-09-16)
In this paper, we study a class of deterministically constrained stochastic optimization problems. Existing methods typically aim to find an $\epsilon$-stochastic stationary point, where the expected violations of both the constraints and first-order stationarity are within a prescribed accuracy of $\epsilon$. However, in many practical applications, it is crucial that the constraints be nearly satisfied with certainty, making such an $\epsilon$-stochastic stationary point potentially undesirable due to the risk of significant constraint violations. To address this issue, we propose single-loop variance-reduced stochastic first-order methods, where the stochastic gradient of the stochastic component is computed using either a truncated recursive momentum scheme or a truncated Polyak momentum scheme for variance reduction, while the gradient of the deterministic component is computed exactly. Under the error bound condition with a parameter $\theta \geq 1$ and other suitable assumptions, we establish that the proposed methods achieve a sample complexity and first-order operation complexity of $\widetilde{O}(\epsilon^{-\max\{4, 2\theta\}})$ for finding a stronger $\epsilon$-stochastic stationary point, where the constraint violation is within $\epsilon$ with certainty, and the expected violation of first-order stationarity is within $\epsilon$. To the best of our knowledge, this is the first work to develop methods with provable complexity guarantees for finding an approximate stochastic stationary point of such problems that nearly satisfies all constraints with certainty.
"Variance-reduced first-order methods for deterministically constrained stochastic nonconvex optimization with strong convergence guarantees" by Zhaosong Lu, Sanyou Mei, Yifeng Xiao (arXiv:2409.09906, arXiv - STAT - Machine Learning, 2024-09-16)
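The truncated recursive momentum scheme mentioned above can be illustrated on a toy problem. This sketch drops the paper's deterministic constraints and uses assumed step sizes and truncation radius; it only shows the estimator's shape on the unconstrained stochastic objective f(x) = E[(x - z)^2 / 2] with z ~ N(1, 1):

```python
import random

# Toy 1-D sketch of a truncated recursive-momentum (STORM-style) gradient
# estimator. Constraints, step-size schedules, and the truncation radius
# from the paper are simplified away or assumed.

random.seed(1)

def stoch_grad(x, z):
    return x - z  # gradient of (x - z)^2 / 2 at sample z

x = 5.0
alpha, lr, radius = 0.2, 0.1, 10.0
d = stoch_grad(x, random.gauss(1.0, 1.0))  # initialize with a plain stochastic gradient
for t in range(500):
    x_prev, x = x, x - lr * d
    z = random.gauss(1.0, 1.0)
    # Recursive momentum: reuse the old estimate, corrected by a gradient
    # difference evaluated on the SAME fresh sample z.
    d = stoch_grad(x, z) + (1.0 - alpha) * (d - stoch_grad(x_prev, z))
    # Truncation: clip the estimator to control its variance.
    d = max(-radius, min(radius, d))
```

The iterate settles near the minimizer x = 1; the clipping only activates when the estimator spikes.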
In a reinforcement learning (RL) setting, the agent's optimal strategy heavily depends on her risk preferences and the underlying model dynamics of the training environment. These two aspects influence the agent's ability to make well-informed and time-consistent decisions when facing testing environments. In this work, we devise a framework to solve robust risk-aware RL problems where we simultaneously account for environmental uncertainty and risk with a class of dynamic robust distortion risk measures. Robustness is introduced by considering all models within a Wasserstein ball around a reference model. We estimate such dynamic robust risk measures using neural networks by making use of strictly consistent scoring functions, derive policy gradient formulae using the quantile representation of distortion risk measures, and construct an actor-critic algorithm to solve this class of robust risk-aware RL problems. We demonstrate the performance of our algorithm on a portfolio allocation example.
"Robust Reinforcement Learning with Dynamic Distortion Risk Measures" by Anthony Coache, Sebastian Jaimungal (arXiv:2409.10096, arXiv - STAT - Machine Learning, 2024-09-16)
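The quantile representation used for the policy gradients above evaluates a distortion risk measure as an integral of the quantile function against a distortion weight. A minimal sketch, where the grid discretization, the loss convention (larger = worse), and the CVaR-style weight are illustrative assumptions:

```python
# Sketch of evaluating a distortion risk measure via its quantile
# representation: rho(X) is the integral over u in (0, 1) of the empirical
# quantile F^{-1}(u) weighted by a distortion gamma(u).

def distortion_risk(losses, gamma, grid=1000):
    xs = sorted(losses)
    n = len(xs)
    total = 0.0
    for i in range(grid):
        u = (i + 0.5) / grid
        q = xs[min(int(u * n), n - 1)]  # empirical quantile F^{-1}(u)
        total += gamma(u) * q / grid
    return total

alpha = 0.9
# Distortion weight concentrated on the worst 10% of outcomes: recovers CVaR_0.9.
cvar_weight = lambda u: 1.0 / (1.0 - alpha) if u >= alpha else 0.0
losses = list(range(1, 101))       # equally likely losses 1..100
risk = distortion_risk(losses, cvar_weight)
```

For losses uniform on 1..100 this returns 95.5, the average of the ten largest losses. Other distortion weights (e.g. smooth concave ones) plug into the same representation, which is what makes it convenient for policy gradient formulae.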
Shengchao Liu, Divin Yan, Weitao Du, Weiyang Liu, Zhuoxinran Li, Hongyu Guo, Christian Borgs, Jennifer Chayes, Anima Anandkumar
Artificial intelligence models have shown great potential in structure-based drug design, generating ligands with high binding affinities. However, existing models have often overlooked a crucial physical constraint: atoms must maintain a minimum pairwise distance to avoid separation violation, a phenomenon governed by the balance of attractive and repulsive forces. To mitigate such separation violations, we propose NucleusDiff. It models the interactions between atomic nuclei and their surrounding electron clouds by enforcing the distance constraint between the nuclei and manifolds. We quantitatively evaluate NucleusDiff using the CrossDocked2020 dataset and a COVID-19 therapeutic target, demonstrating that NucleusDiff reduces violation rate by up to 100.00% and enhances binding affinity by up to 22.16%, surpassing state-of-the-art models for structure-based drug design. We also provide qualitative analysis through manifold sampling, visually confirming the effectiveness of NucleusDiff in reducing separation violations and improving binding affinities.
"Manifold-Constrained Nucleus-Level Denoising Diffusion Model for Structure-Based Drug Design" (arXiv:2409.10584, arXiv - STAT - Machine Learning, 2024-09-16)
Motivated by the sensitivity-based importance score of adaptive low-rank adaptation (AdaLoRA), we utilize more theoretically supported metrics, including the signal-to-noise ratio (SNR), along with the Improved Variational Online Newton (IVON) optimizer, for adaptive parameter budget allocation. The resulting Bayesian counterpart not only matches or surpasses the performance of the sensitivity-based importance metric but is also a faster alternative to AdaLoRA with Adam. Our theoretical analysis reveals a significant connection between the two metrics, providing a Bayesian perspective on the efficacy of sensitivity as an importance score. Furthermore, our findings suggest that the magnitude, rather than the variance, is the primary indicator of the importance of parameters.
"A Bayesian Interpretation of Adaptive Low-Rank Adaptation" by Haolin Chen, Philip N. Garner (arXiv:2409.10673, arXiv - STAT - Machine Learning, 2024-09-16)
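The SNR importance metric discussed above can be written down directly: for a Gaussian variational posterior over each adapter parameter, such as the one IVON maintains, score parameter i by mu_i^2 / sigma_i^2. The numbers and the pruning threshold below are illustrative assumptions:

```python
import numpy as np

# Direct computation of a signal-to-noise-ratio importance score for adapter
# parameters with Gaussian posteriors N(mu_i, sigma_i^2): SNR_i = mu_i^2 / sigma_i^2.
# High-SNR parameters keep their rank budget; low-SNR ones are pruned.

mu = np.array([2.0, 0.05, -1.5, 0.01])    # posterior means (toy values)
sigma2 = np.array([0.1, 0.1, 0.5, 0.2])   # posterior variances (toy values)
snr = mu ** 2 / sigma2                    # roughly [40.0, 0.025, 4.5, 0.0005]
keep = snr >= 1.0                         # assumed pruning threshold
```

Note the score is driven by the squared mean (magnitude), consistent with the abstract's finding that magnitude rather than variance is the primary indicator of importance.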
Modeling spatiotemporal interactions in multivariate time series is key to their effective processing, but challenging because of their irregular and often unknown structure. Statistical properties of the data provide useful biases to model interdependencies and are leveraged by correlation and covariance-based networks as well as by processing pipelines relying on principal component analysis (PCA). However, PCA and its temporal extensions suffer instabilities in the covariance eigenvectors when the corresponding eigenvalues are close to each other, making their application to dynamic and streaming data settings challenging. To address these issues, we exploit the analogy between PCA and graph convolutional filters to introduce the SpatioTemporal coVariance Neural Network (STVNN), a relational learning model that operates on the sample covariance matrix of the time series and leverages joint spatiotemporal convolutions to model the data. To account for the streaming and non-stationary setting, we consider an online update of the parameters and sample covariance matrix. We prove the STVNN is stable to the uncertainties introduced by these online estimations, thus improving over temporal PCA-based methods. Experimental results corroborate our theoretical findings and show that STVNN is competitive for multivariate time series processing, it adapts to changes in the data distribution, and it is orders of magnitude more stable than online temporal PCA.
"Spatiotemporal Covariance Neural Networks" by Andrea Cavallo, Mohammad Sabbaqi, Elvin Isufi (arXiv:2409.10068, arXiv - STAT - Machine Learning, 2024-09-16)
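The two ingredients the STVNN abstract combines, a streaming covariance estimate and a graph-filter-style polynomial in that covariance, can be sketched as follows. The filter taps, dimensions, and the Welford-style online update are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

# Sketch: (i) a streaming (Welford-style) sample-covariance update, and
# (ii) a covariance filter, i.e. a polynomial in the covariance estimate,
# applied to each incoming sample of the multivariate series.

rng = np.random.default_rng(0)
d = 4
mean = np.zeros(d)
C = np.zeros((d, d))
h = [0.5, 0.3, 0.2]  # filter taps: H(C) x = h0*x + h1*C x + h2*C^2 x

for t in range(1, 501):
    x = rng.normal(size=d)            # new sample
    # Online mean/covariance update (no need to store past samples).
    delta = x - mean
    mean += delta / t
    C += (np.outer(delta, x - mean) - C) / t
    # Covariance filter: polynomial in the current covariance estimate.
    y = h[0] * x + h[1] * (C @ x) + h[2] * (C @ (C @ x))
```

Because the filter depends on C only through matrix polynomials, small perturbations of C perturb the output smoothly, which is the intuition behind the stability advantage over methods that track individual covariance eigenvectors.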