Latest Publications from J. Mach. Learn. Res.

Scalable Computation of Causal Bounds
Pub Date: 2023-08-04 | DOI: 10.48550/arXiv.2308.02709
Madhumitha Shridharan, G. Iyengar
We consider the problem of computing bounds for causal queries on causal graphs with unobserved confounders and discrete-valued observed variables, where identifiability does not hold. Existing non-parametric approaches for computing such bounds use linear programming (LP) formulations that quickly become intractable for existing solvers because the size of the LP grows exponentially in the number of edges in the causal graph. We show that this LP can be significantly pruned, allowing us to compute bounds for significantly larger causal inference problems compared to existing techniques. This pruning procedure allows us to compute bounds in closed form for a special class of problems, including a well-studied family of problems where multiple confounded treatments influence an outcome. We extend our pruning methodology to fractional LPs which compute bounds for causal queries that incorporate additional observations about the unit. We show that our methods provide significant runtime improvements compared to benchmarks in experiments and extend our results to the finite data setting. For causal inference without additional observations, we propose an efficient greedy heuristic that produces high quality bounds, and scales to problems that are several orders of magnitude larger than those for which the pruned LP can be solved.
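To make the baseline concrete, the following minimal sketch sets up the classic, unpruned LP for the smallest confounded instance: binary treatment X and outcome Y with an unobserved confounder, encoded through eight latent response types. The observed distribution is fabricated, and this is the exponential-size formulation the paper prunes, not the pruned LP itself.

```python
# Hedged illustration (not the paper's pruned LP): Balke-Pearl-style LP
# bounds on P(Y=1 | do(X=1)) with binary X, Y and an unobserved confounder.
import itertools
import numpy as np
from scipy.optimize import linprog

# Latent "response types" t = (x, y0, y1): the confounder fixes which
# treatment the unit takes and both of its potential outcomes.
types = list(itertools.product([0, 1], repeat=3))

# Hypothetical observed joint distribution P(X=x, Y=y).
p_obs = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

# Equality constraints: each observed cell is a sum of type probabilities.
A_eq, b_eq = [], []
for (x, y), p in p_obs.items():
    A_eq.append([1.0 if t[0] == x and t[1 + x] == y else 0.0 for t in types])
    b_eq.append(p)
A_eq.append([1.0] * len(types))  # probabilities sum to one
b_eq.append(1.0)

# Objective: P(Y=1 | do(X=1)) is the total mass of types with y1 = 1.
c = np.array([1.0 if t[2] == 1 else 0.0 for t in types])

lo = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1)).fun
hi = -linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1)).fun
print(f"P(Y=1 | do(X=1)) in [{lo:.3f}, {hi:.3f}]")  # here: [0.400, 0.900]
```

The number of response types, and hence LP variables, grows exponentially with the graph, which is the scaling problem the pruning procedure targets.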
Citations: 2
A Unified Framework for Factorizing Distributional Value Functions for Multi-Agent Reinforcement Learning
Pub Date: 2023-06-04 | DOI: 10.48550/arXiv.2306.02430
Wei-Fang Sun, Cheng-Kuang Lee, S. See, Chun-Yi Lee
In fully cooperative multi-agent reinforcement learning (MARL) settings, environments are highly stochastic due to the partial observability of each agent and the continuously changing policies of other agents. To address these issues, we propose a unified framework, called DFAC, for integrating distributional RL with value function factorization methods. This framework generalizes expected value function factorization methods to enable the factorization of return distributions. To validate DFAC, we first demonstrate its ability to factorize the value functions of a simple matrix game with stochastic rewards. Then, we perform experiments on all Super Hard maps of the StarCraft Multi-Agent Challenge and six self-designed Ultra Hard maps, showing that DFAC is able to outperform a number of baselines.
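As a sketch of what factorizing a return distribution can look like, the snippet below implements the simplest additive (VDN-style) variant of this idea: each agent outputs quantile estimates of its utility, and the joint return's quantile function is their sum. All sizes and architecture choices are illustrative stand-ins, not the paper's exact DFAC networks.

```python
# Minimal, illustrative sketch of additive distributional factorization.
import torch
import torch.nn as nn

N_AGENTS, N_ACTIONS, N_QUANTILES, OBS_DIM = 3, 5, 32, 16

class AgentQuantileNet(nn.Module):
    """Maps an agent's observation to per-action quantile estimates."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS * N_QUANTILES),
        )

    def forward(self, obs):  # obs: (batch, OBS_DIM)
        return self.net(obs).view(-1, N_ACTIONS, N_QUANTILES)

agents = [AgentQuantileNet() for _ in range(N_AGENTS)]
obs = torch.randn(8, N_AGENTS, OBS_DIM)            # hypothetical batch
actions = torch.randint(0, N_ACTIONS, (8, N_AGENTS))

# Each agent's chosen-action quantiles, then additive factorization:
per_agent = [
    agents[i](obs[:, i]).gather(
        1, actions[:, i].view(-1, 1, 1).expand(-1, 1, N_QUANTILES)
    ).squeeze(1)
    for i in range(N_AGENTS)
]
joint_quantiles = torch.stack(per_agent).sum(dim=0)  # (batch, N_QUANTILES)
# joint_quantiles would be trained against TD targets with a quantile
# (Huber) regression loss, as in distributional RL.
```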
Citations: 0
Adaptive False Discovery Rate Control with Privacy Guarantee
Pub Date: 2023-05-31 | DOI: 10.48550/arXiv.2305.19482
Xintao Xia, Zhanrui Cai
Differentially private multiple testing procedures can protect the information of individuals used in hypothesis tests while guaranteeing a small fraction of false discoveries. In this paper, we propose a differentially private adaptive FDR control method that can control the classic FDR metric exactly at a user-specified level $\alpha$ with a privacy guarantee, which is a non-trivial improvement over the differentially private Benjamini-Hochberg method proposed in Dwork et al. (2021). Our analysis is based on two key insights: 1) a novel p-value transformation that preserves both privacy and the mirror conservative property, and 2) a mirror peeling algorithm that allows the construction of the filtration and application of the optimal stopping technique. Numerical studies demonstrate that the proposed DP-AdaPT performs better than the existing differentially private FDR control methods. Compared to the non-private AdaPT, it incurs a small accuracy loss but significantly reduces the computation cost.
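For orientation, here is the classic non-private Benjamini-Hochberg step-up procedure that differentially private FDR methods, including DP-AdaPT and the DP-BH of Dwork et al. (2021), build upon; the private variants add a privatized p-value transformation and a different stopping construction on top.

```python
# The classic (non-private) Benjamini-Hochberg step-up procedure, shown
# only as the baseline these differentially private methods extend.
import numpy as np

def benjamini_hochberg(p_values, alpha=0.1):
    """Return indices of rejected hypotheses at FDR level alpha."""
    p = np.asarray(p_values)
    m = len(p)
    order = np.argsort(p)
    # Find the largest k with p_(k) <= alpha * k / m (1-indexed ranks).
    thresholds = alpha * np.arange(1, m + 1) / m
    below = np.nonzero(p[order] <= thresholds)[0]
    if below.size == 0:
        return np.array([], dtype=int)
    return order[: below[-1] + 1]

p_vals = np.array([0.001, 0.009, 0.04, 0.06, 0.2, 0.7])
print(benjamini_hochberg(p_vals, alpha=0.1))  # rejects the four smallest
```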
Citations: 0
Fairlearn: Assessing and Improving Fairness of AI Systems
Pub Date: 2023-03-29 | DOI: 10.48550/arXiv.2303.16626
Roman Lutz
Fairlearn is an open source project to help practitioners assess and improve fairness of artificial intelligence (AI) systems. The associated Python library, also named fairlearn, supports evaluation of a model's output across affected populations and includes several algorithms for mitigating fairness issues. Grounded in the understanding that fairness is a sociotechnical challenge, the project integrates learning resources that aid practitioners in considering a system's broader societal context.
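A brief usage sketch of the library's assessment API (fabricated data; exact signatures may vary across fairlearn releases): MetricFrame computes a metric overall and disaggregated across groups defined by a sensitive feature.

```python
# Illustrative fairlearn assessment sketch with made-up data.
import numpy as np
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])   # hypothetical labels
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])   # hypothetical predictions
group  = np.array(["a", "a", "a", "b", "b", "b", "b", "a"])

mf = MetricFrame(
    metrics=accuracy_score,
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=group,
)
print(mf.overall)       # accuracy over everyone
print(mf.by_group)      # accuracy disaggregated by group
print(mf.difference())  # largest between-group gap
```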
Citations: 6
Generalization Bounds for Adversarial Contrastive Learning
Pub Date: 2023-02-21 | DOI: 10.48550/arXiv.2302.10633
Xin Zou, Weiwei Liu
Deep networks are well-known to be fragile to adversarial attacks, and adversarial training is one of the most popular methods used to train a robust model. To take advantage of unlabeled data, recent works have applied adversarial training to contrastive learning (Adversarial Contrastive Learning; ACL for short) and obtained promising robust performance. However, the theory of ACL is not well understood. To fill this gap, we leverage the Rademacher complexity to analyze the generalization performance of ACL, with a particular focus on linear models and multi-layer neural networks under $\ell_p$ attack ($p \ge 1$). Our theory shows that the average adversarial risk of the downstream tasks can be upper bounded by the adversarial unsupervised risk of the upstream task. The experimental results validate our theory.
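To fix ideas, here is a toy sketch of the ACL training signal the bounds concern: a PGD-style perturbation, constrained to an $\ell_\infty$ ball, that ascends an NT-Xent-style contrastive loss. The encoder, loss, and attack hyperparameters are stand-ins, not the exact setup analyzed in the paper.

```python
# Toy adversarial contrastive learning step: maximize a contrastive loss
# over an l_inf-bounded perturbation (PGD), then evaluate the robust loss.
import torch
import torch.nn.functional as F

encoder = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(),
                              torch.nn.Linear(64, 16))

def contrastive_loss(z1, z2, tau=0.5):
    """NT-Xent-style loss for a batch of positive pairs (z1[i], z2[i])."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau            # (batch, batch) similarities
    labels = torch.arange(z1.size(0))     # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

x1, x2 = torch.randn(8, 32), torch.randn(8, 32)  # two hypothetical views
eps, step = 0.03, 0.01
delta = torch.zeros_like(x1, requires_grad=True)

for _ in range(5):  # PGD: ascend the loss, project back into the eps-ball
    loss = contrastive_loss(encoder(x1 + delta), encoder(x2))
    loss.backward()
    with torch.no_grad():
        delta += step * delta.grad.sign()
        delta.clamp_(-eps, eps)
    delta.grad.zero_()

adv_loss = contrastive_loss(encoder(x1 + delta), encoder(x2))
```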
Citations: 2
Intrinsic Gaussian Process on Unknown Manifolds with Probabilistic Metrics
Pub Date: 2023-01-16 | DOI: 10.48550/arXiv.2301.06533
Mu Niu, Zhenwen Dai, P. Cheung, Yizhu Wang
This article presents a novel approach to constructing Intrinsic Gaussian Processes for regression on unknown manifolds with probabilistic metrics (GPUM) in point clouds. In many real-world applications, one often encounters high-dimensional data (e.g. point cloud data) centred around some lower-dimensional unknown manifolds. The geometry of a manifold is in general different from the usual Euclidean geometry. Naively applying traditional smoothing methods such as Euclidean Gaussian Processes (GPs) to manifold-valued data, and thus ignoring the geometry of the space, can potentially lead to highly misleading predictions and inferences. A manifold embedded in a high-dimensional Euclidean space can be well described by a probabilistic mapping function and the corresponding latent space. We investigate the geometrical structure of the unknown manifolds using Bayesian Gaussian Process latent variable models (BGPLVM) and Riemannian geometry. The distribution of the metric tensor is learned using BGPLVM. The boundary of the resulting manifold is defined based on the uncertainty quantification of the mapping. We use the probabilistic metric tensor to simulate Brownian Motion paths on the unknown manifold. The heat kernel is estimated as the transition density of Brownian Motion and used as the covariance function of GPUM. The applications of GPUM are illustrated in simulation studies on the Swiss roll, high-dimensional real datasets of WiFi signals, and image data examples. Its performance is compared with the Graph Laplacian GP, Graph Matérn GP, and Euclidean GP.
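The heat-kernel/Brownian-motion link is easiest to see in the flat Euclidean special case, sketched below with made-up inputs: the covariance value is the transition density of Brownian motion run for time t. The paper's method instead simulates Brownian motion under a learned probabilistic metric on the unknown manifold, which this sketch does not attempt.

```python
# Flat-space sanity check: estimate the Brownian motion transition density
# (the heat kernel) by Monte Carlo and compare with the closed form.
import numpy as np

rng = np.random.default_rng(0)

def heat_kernel_mc(x, y, t, n_paths=200_000, bandwidth=0.05):
    """Monte Carlo estimate of the 2-D BM transition density p_t(x, y)."""
    end = x + np.sqrt(t) * rng.standard_normal((n_paths, 2))
    # Kernel-density estimate of how many endpoints land near y.
    d2 = np.sum((end - y) ** 2, axis=1)
    return np.mean(np.exp(-d2 / (2 * bandwidth**2))) / (2 * np.pi * bandwidth**2)

x, y, t = np.zeros(2), np.array([0.5, 0.2]), 0.3
mc = heat_kernel_mc(x, y, t)
exact = np.exp(-np.sum((x - y) ** 2) / (2 * t)) / (2 * np.pi * t)
print(mc, exact)  # the two values should roughly agree
```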
Citations: 0
Minimal Width for Universal Property of Deep RNN
Pub Date: 2022-11-25 | DOI: 10.48550/arXiv.2211.13866
Changhoon Song, Geonho Hwang, Jun ho Lee, Myung-joo Kang
A recurrent neural network (RNN) is a widely used deep-learning network for dealing with sequential data. Imitating a dynamical system, an infinite-width RNN can approximate any open dynamical system in a compact domain. In general, deep networks with bounded widths are more effective than wide networks in practice; however, the universal approximation theorem for deep narrow structures has yet to be extensively studied. In this study, we prove the universality of deep narrow RNNs and show that the upper bound of the minimum width for universality can be independent of the length of the data. Specifically, we show that a deep RNN with ReLU activation can approximate any continuous function or $L^p$ function with the widths $d_x+d_y+2$ and $\max\{d_x+1,d_y\}$, respectively, where the target function maps a finite sequence of vectors in $\mathbb{R}^{d_x}$ to a finite sequence of vectors in $\mathbb{R}^{d_y}$. We also compute the additional width required if the activation function is $\tanh$ or more. In addition, we prove the universality of other recurrent networks, such as bidirectional RNNs. Bridging a multi-layer perceptron and an RNN, our theory and proof technique can be an initial step toward further research on deep RNNs.
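The sufficient width in the statement is easy to instantiate. A minimal shape-only sketch (the approximating network in the proof is a different, hand-built construction):

```python
# Instantiate a deep ReLU RNN whose every hidden layer has the abstract's
# sufficient width d_x + d_y + 2 for the continuous-function case.
import torch
import torch.nn as nn

d_x, d_y = 4, 3           # input and output dimensions of the target map
width = d_x + d_y + 2     # = 9, the sufficient width from the statement
depth = 6                 # universality concerns some depth; the theorem
                          # bounds width, not depth

rnn = nn.RNN(input_size=d_x, hidden_size=width, num_layers=depth,
             nonlinearity="relu", batch_first=True)
readout = nn.Linear(width, d_y)   # project hidden states to R^{d_y}

x = torch.randn(2, 10, d_x)       # (batch, sequence length, d_x)
h, _ = rnn(x)                     # (batch, sequence length, width)
y = readout(h)                    # sequence of outputs in R^{d_y}
```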
Citations: 2
Adaptive Data Depth via Multi-Armed Bandits
Pub Date: 2022-11-08 | DOI: 10.48550/arXiv.2211.03985
Tavor Z. Baharav, T. Lai
Data depth, introduced by Tukey (1975), is an important tool in data science, robust statistics, and computational geometry. One chief barrier to its broader practical utility is that many common measures of depth are computationally intensive, requiring on the order of $n^d$ operations to exactly compute the depth of a single point within a data set of $n$ points in $d$-dimensional space. Often, however, we are not directly interested in the absolute depths of the points, but rather in their relative ordering. For example, we may want to find the most central point in a data set (a generalized median), or to identify and remove all outliers (points on the fringe of the data set with low depth). With this observation, we develop a novel and instance-adaptive algorithm for adaptive data depth computation by reducing the problem of exactly computing $n$ depths to an $n$-armed stochastic multi-armed bandit problem which we can efficiently solve. We focus our exposition on simplicial depth, developed by Liu (1990), which has emerged as a promising notion of depth due to its interpretability and asymptotic properties. We provide general instance-dependent theoretical guarantees for our proposed algorithms, which readily extend to many other common measures of data depth including majority depth, Oja depth, and likelihood depth. When specialized to the case where the gaps in the data follow a power law distribution with parameter $\alpha<2$, we show that we can reduce the complexity of identifying the deepest point in the data set (the simplicial median) from $O(n^d)$ to $\tilde{O}(n^{d-(d-1)\alpha/2})$, where $\tilde{O}$ suppresses logarithmic factors. We corroborate our theoretical results with numerical experiments on synthetic data, showing the practical utility of our proposed methods.
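For concreteness, this is the brute-force simplicial depth computation in $d=2$ that adaptive sampling avoids: enumerate every 3-point simplex and count those containing the query point. It is fine for the small fabricated data set below, but the combinatorial cost is exactly what motivates the bandit formulation.

```python
# Naive exact simplicial depth (Liu, 1990) in two dimensions.
import itertools
from math import comb

import numpy as np

def triangle_contains(tri, p):
    """True if the triangle with vertices tri (3x2 array) contains point p."""
    signs = []
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
        signs.append(cross >= 0)
    return all(signs) or not any(signs)   # p on the same side of all edges

def simplicial_depth(data, p):
    """Fraction of all 3-point simplices of `data` that contain p."""
    n = len(data)
    hits = sum(triangle_contains(data[list(idx)], p)
               for idx in itertools.combinations(range(n), 3))
    return hits / comb(n, 3)

rng = np.random.default_rng(1)
data = rng.standard_normal((40, 2))
print(simplicial_depth(data, np.zeros(2)))           # central: high depth
print(simplicial_depth(data, np.array([4.0, 4.0])))  # outlier: near zero
```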
Citations: 0
Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model
Pub Date: 2022-11-03 | DOI: 10.48550/arXiv.2211.02001
A. Luccioni, S. Viguier, Anne-Laure Ligozat
Progress in machine learning (ML) comes with a cost to the environment, given that training ML models requires significant computational resources, energy and materials. In the present article, we aim to quantify the carbon footprint of BLOOM, a 176-billion parameter language model, across its life cycle. We estimate that BLOOM's final training emitted approximately 24.7 tonnes of CO2eq if we consider only the dynamic power consumption, and 50.5 tonnes if we account for all processes ranging from equipment manufacturing to energy-based operational consumption. We also study the energy requirements and carbon emissions of its deployment for inference via an API endpoint receiving user queries in real-time. We conclude with a discussion regarding the difficulty of precisely estimating the carbon footprint of ML models and future research directions that can contribute towards improving carbon emissions reporting.
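The dynamic-consumption part of such an estimate is simple arithmetic: energy = GPU-hours × average power × PUE, and emissions = energy × grid carbon intensity. A sketch with stand-in inputs (all four numbers below are illustrative assumptions, not the paper's measured values):

```python
# Operational carbon-footprint arithmetic with illustrative placeholders.
gpu_hours = 1_000_000   # hypothetical total GPU-hours of training
avg_kw    = 0.4         # assumed average draw per GPU, in kW
pue       = 1.2         # data-center power usage effectiveness
grid_gco2 = 57          # gCO2eq per kWh (a low-carbon grid, e.g. France)

energy_kwh = gpu_hours * avg_kw * pue
tonnes_co2 = energy_kwh * grid_gco2 / 1e6   # grams -> tonnes
print(f"{energy_kwh:,.0f} kWh -> {tonnes_co2:.1f} tonnes CO2eq")
```

With these made-up inputs the estimate lands in the tens of tonnes, the same order of magnitude as the paper's dynamic-consumption figure; the life-cycle number is larger because it also counts equipment manufacturing and idle consumption.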
Citations: 49
Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities
Pub Date: 2022-10-13 | DOI: 10.48550/arXiv.2210.06640
Brian Bartoldson, B. Kailkhura, Davis W. Blalock
Although deep learning has made great progress in recent years, the exploding economic and environmental costs of training neural networks are becoming unsustainable. To address this problem, there has been a great deal of research on *algorithmically-efficient deep learning*, which seeks to reduce training costs not at the hardware or implementation level, but through changes in the semantics of the training program. In this paper, we present a structured and comprehensive overview of the research in this field. First, we formalize the *algorithmic speedup* problem, then we use fundamental building blocks of algorithmically efficient training to develop a taxonomy. Our taxonomy highlights commonalities of seemingly disparate methods and reveals current research gaps. Next, we present evaluation best practices to enable comprehensive, fair, and reliable comparisons of speedup techniques. To further aid research and applications, we discuss common bottlenecks in the training pipeline (illustrated via experiments) and offer taxonomic mitigation strategies for them. Finally, we highlight some unsolved research challenges and present promising future directions.
Citations: 18