首页 > 最新文献

Journal of Machine Learning Research最新文献

英文 中文
Flexible Bayesian Product Mixture Models for Vector Autoregressions. 灵活的贝叶斯向量自回归产品混合物模型
IF 4.3 3区 计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS Pub Date : 2024-04-01
Suprateek Kundu, Joshua Lukemire

Bayesian non-parametric methods based on Dirichlet process mixtures have seen tremendous success in various domains and are appealing in being able to borrow information by clustering samples that share identical parameters. However, such methods can face hurdles in heterogeneous settings where objects are expected to cluster only along a subset of axes or where clusters of samples share only a subset of identical parameters. We overcome such limitations by developing a novel class of product of Dirichlet process location-scale mixtures that enables independent clustering at multiple scales, which results in varying levels of information sharing across samples. First, we develop the approach for independent multivariate data. Subsequently we generalize it to multivariate time-series data under the framework of multi-subject Vector Autoregressive (VAR) models that is our primary focus, which go beyond parametric single-subject VAR models. We establish posterior consistency and develop efficient posterior computation for implementation. Extensive numerical studies involving VAR models show distinct advantages over competing methods in terms of estimation, clustering, and feature selection accuracy. Our resting state fMRI analysis from the Human Connectome Project reveals biologically interpretable connectivity differences between distinct intelligence groups, while another air pollution application illustrates the superior forecasting accuracy compared to alternate methods.

基于Dirichlet过程混合的贝叶斯非参数方法在各个领域都取得了巨大的成功,并且能够通过聚类共享相同参数的样本来获取信息。然而,这种方法在异质环境中可能面临障碍,在异质环境中,期望对象仅沿着轴的子集聚集,或者样本集群仅共享相同参数的子集。我们通过开发一种新的狄利克雷过程位置尺度混合物的产品来克服这些限制,该产品能够在多个尺度上独立聚类,从而导致不同水平的样本信息共享。首先,我们开发了独立多元数据的方法。随后,我们将其推广到多主体向量自回归(VAR)模型框架下的多变量时间序列数据,这是我们的重点,它超越了参数化的单主体VAR模型。我们建立了后验一致性,并开发了有效的后验计算实现。大量涉及VAR模型的数值研究表明,在估计、聚类和特征选择准确性方面,VAR模型比其他竞争方法有明显的优势。我们对人类连接组项目的静息状态fMRI分析揭示了不同智力群体之间生物学上可解释的连接差异,而另一个空气污染应用表明,与其他方法相比,预测准确性更高。
{"title":"Flexible Bayesian Product Mixture Models for Vector Autoregressions.","authors":"Suprateek Kundu, Joshua Lukemire","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Bayesian non-parametric methods based on Dirichlet process mixtures have seen tremendous success in various domains and are appealing in being able to borrow information by clustering samples that share identical parameters. However, such methods can face hurdles in heterogeneous settings where objects are expected to cluster only along a subset of axes or where clusters of samples share only a subset of identical parameters. We overcome such limitations by developing a novel class of product of Dirichlet process location-scale mixtures that enables independent clustering at multiple scales, which results in varying levels of information sharing across samples. First, we develop the approach for independent multivariate data. Subsequently we generalize it to multivariate time-series data under the framework of multi-subject Vector Autoregressive (VAR) models that is our primary focus, which go beyond parametric single-subject VAR models. We establish posterior consistency and develop efficient posterior computation for implementation. Extensive numerical studies involving VAR models show distinct advantages over competing methods in terms of estimation, clustering, and feature selection accuracy. Our resting state fMRI analysis from the Human Connectome Project reveals biologically interpretable connectivity differences between distinct intelligence groups, while another air pollution application illustrates the superior forecasting accuracy compared to alternate methods.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"25 ","pages":""},"PeriodicalIF":4.3,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11646655/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142830693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Effect-Invariant Mechanisms for Policy Generalization. 政策通用化的效应不变机制。
IF 4.3 3区 计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS Pub Date : 2024-01-01
Sorawit Saengkyongam, Niklas Pfister, Predrag Klasnja, Susan Murphy, Jonas Peters

Policy learning is an important component of many real-world learning systems. A major challenge in policy learning is how to adapt efficiently to unseen environments or tasks. Recently, it has been suggested to exploit invariant conditional distributions to learn models that generalize better to unseen environments. However, assuming invariance of entire conditional distributions (which we call full invariance) may be too strong of an assumption in practice. In this paper, we introduce a relaxation of full invariance called effect-invariance (e-invariance for short) and prove that it is sufficient, under suitable assumptions, for zero-shot policy generalization. We also discuss an extension that exploits e-invariance when we have a small sample from the test environment, enabling few-shot policy generalization. Our work does not assume an underlying causal graph or that the data are generated by a structural causal model; instead, we develop testing procedures to test e-invariance directly from data. We present empirical results using simulated data and a mobile health intervention dataset to demonstrate the effectiveness of our approach.

策略学习是现实世界中许多学习系统的重要组成部分。策略学习的一个主要挑战是如何有效地适应未知环境或任务。最近,有人建议利用不变条件分布来学习模型,以便更好地泛化到未知环境中。然而,假设整个条件分布不变(我们称之为完全不变)在实践中可能是一个太强的假设。在本文中,我们介绍了完全不变性的一种松弛,称为效应不变性(简称 e-不变性),并证明在合适的假设条件下,它足以实现零次策略泛化。我们还讨论了一种扩展方法,当我们从测试环境中获得少量样本时,可以利用 e-invariance 实现少次策略泛化。我们的工作没有假设底层因果图,也没有假设数据是由结构因果模型生成的;相反,我们开发了测试程序,直接从数据中测试电子不变量。我们使用模拟数据和移动健康干预数据集展示了实证结果,以证明我们方法的有效性。
{"title":"Effect-Invariant Mechanisms for Policy Generalization.","authors":"Sorawit Saengkyongam, Niklas Pfister, Predrag Klasnja, Susan Murphy, Jonas Peters","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Policy learning is an important component of many real-world learning systems. A major challenge in policy learning is how to adapt efficiently to unseen environments or tasks. Recently, it has been suggested to exploit invariant conditional distributions to learn models that generalize better to unseen environments. However, assuming invariance of entire conditional distributions (which we call full invariance) may be too strong of an assumption in practice. In this paper, we introduce a relaxation of full invariance called effect-invariance (e-invariance for short) and prove that it is sufficient, under suitable assumptions, for zero-shot policy generalization. We also discuss an extension that exploits e-invariance when we have a small sample from the test environment, enabling few-shot policy generalization. Our work does not assume an underlying causal graph or that the data are generated by a structural causal model; instead, we develop testing procedures to test e-invariance directly from data. We present empirical results using simulated data and a mobile health intervention dataset to demonstrate the effectiveness of our approach.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"25 ","pages":""},"PeriodicalIF":4.3,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11286230/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141857003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Nonparametric Regression for 3D Point Cloud Learning. 用于 3D 点云学习的非参数回归。
IF 4.3 3区 计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS Pub Date : 2024-01-01
Xinyi Li, Shan Yu, Yueying Wang, Guannan Wang, Li Wang, Ming-Jun Lai

In recent years, there has been an exponentially increased amount of point clouds collected with irregular shapes in various areas. Motivated by the importance of solid modeling for point clouds, we develop a novel and efficient smoothing tool based on multivariate splines over the triangulation to extract the underlying signal and build up a 3D solid model from the point cloud. The proposed method can denoise or deblur the point cloud effectively, provide a multi-resolution reconstruction of the actual signal, and handle sparse and irregularly distributed point clouds to recover the underlying trajectory. In addition, our method provides a natural way of numerosity data reduction. We establish the theoretical guarantees of the proposed method, including the convergence rate and asymptotic normality of the estimator, and show that the convergence rate achieves optimal nonparametric convergence. We also introduce a bootstrap method to quantify the uncertainty of the estimators. Through extensive simulation studies and a real data example, we demonstrate the superiority of the proposed method over traditional smoothing methods in terms of estimation accuracy and efficiency of data reduction.

近年来,在各个领域收集到的不规则形状的点云数量呈指数级增长。鉴于实体模型对点云的重要性,我们开发了一种基于三角剖分的多元样条的新型高效平滑工具,以提取底层信号并从点云中建立三维实体模型。所提出的方法能有效地对点云进行去噪或去模糊处理,提供实际信号的多分辨率重建,并能处理稀疏和不规则分布的点云,从而恢复底层轨迹。此外,我们的方法还提供了一种减少数值数据的自然方法。我们建立了所提方法的理论保证,包括估计器的收敛速率和渐近正态性,并证明收敛速率达到了最佳非参数收敛。我们还引入了一种自举方法来量化估计器的不确定性。通过大量的模拟研究和真实数据实例,我们证明了所提出的方法在估计精度和数据缩减效率方面优于传统的平滑方法。
{"title":"Nonparametric Regression for 3D Point Cloud Learning.","authors":"Xinyi Li, Shan Yu, Yueying Wang, Guannan Wang, Li Wang, Ming-Jun Lai","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>In recent years, there has been an exponentially increased amount of point clouds collected with irregular shapes in various areas. Motivated by the importance of solid modeling for point clouds, we develop a novel and efficient smoothing tool based on multivariate splines over the triangulation to extract the underlying signal and build up a 3D solid model from the point cloud. The proposed method can denoise or deblur the point cloud effectively, provide a multi-resolution reconstruction of the actual signal, and handle sparse and irregularly distributed point clouds to recover the underlying trajectory. In addition, our method provides a natural way of numerosity data reduction. We establish the theoretical guarantees of the proposed method, including the convergence rate and asymptotic normality of the estimator, and show that the convergence rate achieves optimal nonparametric convergence. We also introduce a bootstrap method to quantify the uncertainty of the estimators. Through extensive simulation studies and a real data example, we demonstrate the superiority of the proposed method over traditional smoothing methods in terms of estimation accuracy and efficiency of data reduction.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"25 ","pages":""},"PeriodicalIF":4.3,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11465206/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142401809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Causal Discovery with Generalized Linear Models through Peeling Algorithms. 基于剥离算法的广义线性模型因果发现。
IF 4.3 3区 计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS Pub Date : 2024-01-01
Minjie Wang, Xiaotong Shen, Wei Pan

This article presents a novel method for causal discovery with generalized structural equation models suited for analyzing diverse types of outcomes, including discrete, continuous, and mixed data. Causal discovery often faces challenges due to unmeasured confounders that hinder the identification of causal relationships. The proposed approach addresses this issue by developing two peeling algorithms (bottom-up and top-down) to ascertain causal relationships and valid instruments. This approach first reconstructs a super-graph to represent ancestral relationships between variables, using a peeling algorithm based on nodewise GLM regressions that exploit relationships between primary and instrumental variables. Then, it estimates parent-child effects from the ancestral relationships using another peeling algorithm while deconfounding a child's model with information borrowed from its parents' models. The article offers a theoretical analysis of the proposed approach, establishing conditions for model identifiability and providing statistical guarantees for accurately discovering parent-child relationships via the peeling algorithms. Furthermore, the article presents numerical experiments showcasing the effectiveness of our approach in comparison to state-of-the-art structure learning methods without confounders. Lastly, it demonstrates an application to Alzheimer's disease (AD), highlighting the method's utility in constructing gene-to-gene and gene-to-disease regulatory networks involving Single Nucleotide Polymorphisms (SNPs) for healthy and AD subjects.

本文提出了一种新的方法,适用于分析不同类型的结果,包括离散,连续和混合数据的广义结构方程模型的因果发现。由于无法测量的混杂因素阻碍了因果关系的识别,因果发现经常面临挑战。提出的方法通过开发两种剥离算法(自下而上和自上而下)来确定因果关系和有效工具来解决这一问题。该方法首先重建一个超级图来表示变量之间的祖先关系,使用基于节点的GLM回归的剥离算法,该算法利用主要变量和工具变量之间的关系。然后,它使用另一种剥离算法从祖先关系中估计亲子效应,同时用从父母模型中借来的信息解构孩子的模型。本文对本文提出的方法进行了理论分析,建立了模型可识别的条件,并为通过剥离算法准确发现亲子关系提供了统计保证。此外,本文还介绍了数值实验,与没有混杂因素的最先进的结构学习方法相比,展示了我们的方法的有效性。最后,它展示了在阿尔茨海默病(AD)中的应用,突出了该方法在构建涉及健康和阿尔茨海默病受试者的单核苷酸多态性(snp)的基因到基因和基因到疾病调控网络中的实用性。
{"title":"Causal Discovery with Generalized Linear Models through Peeling Algorithms.","authors":"Minjie Wang, Xiaotong Shen, Wei Pan","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>This article presents a novel method for causal discovery with generalized structural equation models suited for analyzing diverse types of outcomes, including discrete, continuous, and mixed data. Causal discovery often faces challenges due to unmeasured confounders that hinder the identification of causal relationships. The proposed approach addresses this issue by developing two peeling algorithms (bottom-up and top-down) to ascertain causal relationships and valid instruments. This approach first reconstructs a super-graph to represent ancestral relationships between variables, using a peeling algorithm based on nodewise GLM regressions that exploit relationships between primary and instrumental variables. Then, it estimates parent-child effects from the ancestral relationships using another peeling algorithm while deconfounding a child's model with information borrowed from its parents' models. The article offers a theoretical analysis of the proposed approach, establishing conditions for model identifiability and providing statistical guarantees for accurately discovering parent-child relationships via the peeling algorithms. Furthermore, the article presents numerical experiments showcasing the effectiveness of our approach in comparison to state-of-the-art structure learning methods without confounders. Lastly, it demonstrates an application to Alzheimer's disease (AD), highlighting the method's utility in constructing gene-to-gene and gene-to-disease regulatory networks involving Single Nucleotide Polymorphisms (SNPs) for healthy and AD subjects.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"25 ","pages":""},"PeriodicalIF":4.3,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11699566/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142933418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Graphical Dirichlet Process for Clustering Non-Exchangeable Grouped Data. 用于对不可交换分组数据进行聚类的图形 Dirichlet Process。
IF 4.3 3区 计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS Pub Date : 2024-01-01
Arhit Chakrabarti, Yang Ni, Ellen Ruth A Morris, Michael L Salinas, Robert S Chapkin, Bani K Mallick

We consider the problem of clustering grouped data with possibly non-exchangeable groups whose dependencies can be characterized by a known directed acyclic graph. To allow the sharing of clusters among the non-exchangeable groups, we propose a Bayesian nonparametric approach, termed graphical Dirichlet process, that jointly models the dependent group-specific random measures by assuming each random measure to be distributed as a Dirichlet process whose concentration parameter and base probability measure depend on those of its parent groups. The resulting joint stochastic process respects the Markov property of the directed acyclic graph that links the groups. We characterize the graphical Dirichlet process using a novel hypergraph representation as well as the stick-breaking representation, the restaurant-type representation, and the representation as a limit of a finite mixture model. We develop an efficient posterior inference algorithm and illustrate our model with simulations and a real grouped single-cell data set.

研究了一类可能不可交换的群的聚类问题,这些群的依赖关系可以用已知的有向无环图来表征。为了在不可交换的群之间共享簇,我们提出了一种称为图形狄利克雷过程的贝叶斯非参数方法,该方法通过假设每个随机测度作为狄利克雷过程分布,其浓度参数和基本概率测度依赖于其父群的浓度参数和基本概率测度,从而联合建模依赖于特定组的随机测度。所得到的联合随机过程尊重连接群的有向无环图的马尔可夫性质。我们使用一种新的超图表示、断棒表示、餐馆类型表示和有限混合模型的极限表示来表征图形狄利克雷过程。我们开发了一种有效的后验推理算法,并通过模拟和真实的分组单细胞数据集来说明我们的模型。
{"title":"Graphical Dirichlet Process for Clustering Non-Exchangeable Grouped Data.","authors":"Arhit Chakrabarti, Yang Ni, Ellen Ruth A Morris, Michael L Salinas, Robert S Chapkin, Bani K Mallick","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>We consider the problem of clustering grouped data with possibly non-exchangeable groups whose dependencies can be characterized by a known directed acyclic graph. To allow the sharing of clusters among the non-exchangeable groups, we propose a Bayesian nonparametric approach, termed graphical Dirichlet process, that jointly models the dependent group-specific random measures by assuming each random measure to be distributed as a Dirichlet process whose concentration parameter and base probability measure depend on those of its parent groups. The resulting joint stochastic process respects the Markov property of the directed acyclic graph that links the groups. We characterize the graphical Dirichlet process using a novel hypergraph representation as well as the stick-breaking representation, the restaurant-type representation, and the representation as a limit of a finite mixture model. We develop an efficient posterior inference algorithm and illustrate our model with simulations and a real grouped single-cell data set.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"25 ","pages":""},"PeriodicalIF":4.3,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11650374/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142848140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Convergence for nonconvex ADMM, with applications to CT imaging. 非凸 ADMM 的收敛性,并应用于 CT 成像。
IF 4.3 3区 计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS Pub Date : 2024-01-01
Rina Foygel Barber, Emil Y Sidky

The alternating direction method of multipliers (ADMM) algorithm is a powerful and flexible tool for complex optimization problems of the form m i n { f ( x ) + g ( y ) : A x + B y = c } . ADMM exhibits robust empirical performance across a range of challenging settings including nonsmoothness and nonconvexity of the objective functions f and g , and provides a simple and natural approach to the inverse problem of image reconstruction for computed tomography (CT) imaging. From the theoretical point of view, existing results for convergence in the nonconvex setting generally assume smoothness in at least one of the component functions in the objective. In this work, our new theoretical results provide convergence guarantees under a restricted strong convexity assumption without requiring smoothness or differentiability, while still allowing differentiable terms to be treated approximately if needed. We validate these theoretical results empirically, with a simulated example where both f and g are nondifferentiable-and thus outside the scope of existing theory-as well as a simulated CT image reconstruction problem.

交替乘数方向法(ADMM)算法是一种强大而灵活的工具,可用于解决形式为 m i n { f ( x ) + g ( y ) : A x + B y = c } 的复杂优化问题。.ADMM 在目标函数 f 和 g 的非光滑性和非凸性等一系列挑战性设置中表现出稳健的经验性能,为计算机断层扫描 (CT) 成像的图像重建逆问题提供了一种简单而自然的方法。从理论角度来看,现有的非凸环境下的收敛结果一般都假设目标函数中至少有一个分量函数是平滑的。在这项工作中,我们的新理论结果提供了在受限强凸假设下的收敛保证,而不要求平滑性或可微性,同时还允许在需要时近似处理可微项。我们通过一个 f 和 g 都不可微的模拟例子(因此超出了现有理论的范围)以及一个模拟 CT 图像重建问题,对这些理论结果进行了经验验证。
{"title":"Convergence for nonconvex ADMM, with applications to CT imaging.","authors":"Rina Foygel Barber, Emil Y Sidky","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>The alternating direction method of multipliers (ADMM) algorithm is a powerful and flexible tool for complex optimization problems of the form <math><mi>m</mi> <mi>i</mi> <mi>n</mi> <mo>{</mo> <mi>f</mi> <mo>(</mo> <mi>x</mi> <mo>)</mo> <mo>+</mo> <mi>g</mi> <mo>(</mo> <mi>y</mi> <mo>)</mo> <mspace></mspace> <mo>:</mo> <mspace></mspace> <mi>A</mi> <mi>x</mi> <mo>+</mo> <mi>B</mi> <mi>y</mi> <mo>=</mo> <mi>c</mi> <mo>}</mo></math> . ADMM exhibits robust empirical performance across a range of challenging settings including nonsmoothness and nonconvexity of the objective functions <math><mi>f</mi></math> and <math><mi>g</mi></math> , and provides a simple and natural approach to the inverse problem of image reconstruction for computed tomography (CT) imaging. From the theoretical point of view, existing results for convergence in the nonconvex setting generally assume smoothness in at least one of the component functions in the objective. In this work, our new theoretical results provide convergence guarantees under a restricted strong convexity assumption without requiring smoothness or differentiability, while still allowing differentiable terms to be treated approximately if needed. We validate these theoretical results empirically, with a simulated example where both <math><mi>f</mi></math> and <math><mi>g</mi></math> are nondifferentiable-and thus outside the scope of existing theory-as well as a simulated CT image reconstruction problem.</p>","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"25 ","pages":""},"PeriodicalIF":4.3,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11155492/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141297149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Batch Normalization Preconditioning for Stochastic Gradient Langevin Dynamics 随机梯度朗格万动力学的批归一化预处理
IF 6 3区 计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS Pub Date : 2023-06-01 DOI: 10.4208/jml.220726a
Susanne Lange, Wei Deng, Q. Ye, Guang Lin
{"title":"Batch Normalization Preconditioning for Stochastic Gradient Langevin Dynamics","authors":"Susanne Lange, Wei Deng, Q. Ye, Guang Lin","doi":"10.4208/jml.220726a","DOIUrl":"https://doi.org/10.4208/jml.220726a","url":null,"abstract":"","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"132 2 1","pages":""},"PeriodicalIF":6.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76596604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
A Local Convergence Theory for the Stochastic Gradient Descent Method in Non-Convex Optimization with NonIsolated Local Minima 具有非孤立局部极小值的非凸优化随机梯度下降法的局部收敛理论
3区 计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS Pub Date : 2023-06-01 DOI: 10.4208/jml.230106
Taehee Ko and Xiantao Li
Non-convex loss functions arise frequently in modern machine learning, and for the theoretical analysis of stochastic optimization methods, the presence of non-isolated minima presents a unique challenge that has remained under-explored. In this paper, we study the local convergence of the stochastic gradient descent method to non-isolated global minima. Under mild assumptions, we estimate the probability for the iterations to stay near the minima by adopting the notion of stochastic stability. After establishing such stability, we present the lower bound complexity in terms of various error criteria for a given error tolerance ǫ and a failure probability γ .
{"title":"A Local Convergence Theory for the Stochastic Gradient Descent Method in Non-Convex Optimization with NonIsolated Local Minima","authors":"Taehee Ko and Xiantao Li","doi":"10.4208/jml.230106","DOIUrl":"https://doi.org/10.4208/jml.230106","url":null,"abstract":"Non-convex loss functions arise frequently in modern machine learning, and for the theoretical analysis of stochastic optimization methods, the presence of non-isolated minima presents a unique challenge that has remained under-explored. In this paper, we study the local convergence of the stochastic gradient descent method to non-isolated global minima. Under mild assumptions, we estimate the probability for the iterations to stay near the minima by adopting the notion of stochastic stability. After establishing such stability, we present the lower bound complexity in terms of various error criteria for a given error tolerance ǫ and a failure probability γ .","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135674517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Efficient Anti-Symmetrization of a Neural Network Layer by Taming the Sign Problem 基于驯服符号问题的神经网络层的有效抗对称
3区 计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS Pub Date : 2023-06-01 DOI: 10.4208/jml.230703
Nilin Abrahamsen and Lin Lin
Explicit antisymmetrization of a neural network is a potential candidate for a universal function approximator for generic antisymmetric functions, which are ubiquitous in quantum physics. However, this procedure is a priori factorially costly to implement, making it impractical for large numbers of particles. The strategy also suffers from a sign problem. Namely, due to near-exact cancellation of positive and negative contributions, the magnitude of the antisymmetrized function may be significantly smaller than before anti-symmetrization. We show that the anti-symmetric projection of a two-layer neural network can be evaluated efficiently, opening the door to using a generic antisymmetric layer as a building block in anti-symmetric neural network Ansatzes. This approximation is effective when the sign problem is controlled, and we show that this property depends crucially the choice of activation function under standard Xavier/He initialization methods. As a consequence, using a smooth activation function requires re-scaling of the neural network weights compared to standard initializations.
{"title":"Efficient Anti-Symmetrization of a Neural Network Layer by Taming the Sign Problem","authors":"Nilin Abrahamsen and Lin Lin","doi":"10.4208/jml.230703","DOIUrl":"https://doi.org/10.4208/jml.230703","url":null,"abstract":"Explicit antisymmetrization of a neural network is a potential candidate for a universal function approximator for generic antisymmetric functions, which are ubiquitous in quantum physics. However, this procedure is a priori factorially costly to implement, making it impractical for large numbers of particles. The strategy also suffers from a sign problem. Namely, due to near-exact cancellation of positive and negative contributions, the magnitude of the antisymmetrized function may be significantly smaller than before anti-symmetrization. We show that the anti-symmetric projection of a two-layer neural network can be evaluated efficiently, opening the door to using a generic antisymmetric layer as a building block in anti-symmetric neural network Ansatzes. This approximation is effective when the sign problem is controlled, and we show that this property depends crucially the choice of activation function under standard Xavier/He initialization methods. As a consequence, using a smooth activation function requires re-scaling of the neural network weights compared to standard initializations.","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135144017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A Brief Survey on the Approximation Theory for Sequence Modelling 序列建模的近似理论综述
3区 计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS Pub Date : 2023-06-01 DOI: 10.4208/jml.221221
Haotian Jiang, Qianxiao Li, Zhong Li null, Shida Wang
{"title":"A Brief Survey on the Approximation Theory for Sequence Modelling","authors":"Haotian Jiang, Qianxiao Li, Zhong Li null, Shida Wang","doi":"10.4208/jml.221221","DOIUrl":"https://doi.org/10.4208/jml.221221","url":null,"abstract":"","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135381134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
期刊
Journal of Machine Learning Research
全部 Acc. Chem. Res. ACS Applied Bio Materials ACS Appl. Electron. Mater. ACS Appl. Energy Mater. ACS Appl. Mater. Interfaces ACS Appl. Nano Mater. ACS Appl. Polym. Mater. ACS BIOMATER-SCI ENG ACS Catal. ACS Cent. Sci. ACS Chem. Biol. ACS Chemical Health & Safety ACS Chem. Neurosci. ACS Comb. Sci. ACS Earth Space Chem. ACS Energy Lett. ACS Infect. Dis. ACS Macro Lett. ACS Mater. Lett. ACS Med. Chem. Lett. ACS Nano ACS Omega ACS Photonics ACS Sens. ACS Sustainable Chem. Eng. ACS Synth. Biol. Anal. Chem. BIOCHEMISTRY-US Bioconjugate Chem. BIOMACROMOLECULES Chem. Res. Toxicol. Chem. Rev. Chem. Mater. CRYST GROWTH DES ENERG FUEL Environ. Sci. Technol. Environ. Sci. Technol. Lett. Eur. J. Inorg. Chem. IND ENG CHEM RES Inorg. Chem. J. Agric. Food. Chem. J. Chem. Eng. Data J. Chem. Educ. J. Chem. Inf. Model. J. Chem. Theory Comput. J. Med. Chem. J. Nat. Prod. J PROTEOME RES J. Am. Chem. Soc. LANGMUIR MACROMOLECULES Mol. Pharmaceutics Nano Lett. Org. Lett. ORG PROCESS RES DEV ORGANOMETALLICS J. Org. Chem. J. Phys. Chem. J. Phys. Chem. A J. Phys. Chem. B J. Phys. Chem. C J. Phys. Chem. Lett. Analyst Anal. Methods Biomater. Sci. Catal. Sci. Technol. Chem. Commun. Chem. Soc. Rev. CHEM EDUC RES PRACT CRYSTENGCOMM Dalton Trans. Energy Environ. Sci. ENVIRON SCI-NANO ENVIRON SCI-PROC IMP ENVIRON SCI-WAT RES Faraday Discuss. Food Funct. Green Chem. Inorg. Chem. Front. Integr. Biol. J. Anal. At. Spectrom. J. Mater. Chem. A J. Mater. Chem. B J. Mater. Chem. C Lab Chip Mater. Chem. Front. Mater. Horiz. MEDCHEMCOMM Metallomics Mol. Biosyst. Mol. Syst. Des. Eng. Nanoscale Nanoscale Horiz. Nat. Prod. Rep. New J. Chem. Org. Biomol. Chem. Org. Chem. Front. PHOTOCH PHOTOBIO SCI PCCP Polym. Chem.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1