Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning: latest articles, page 3

Can Neural Network Memorization Be Localized?

Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning

Pub Date : 2023-07-18 DOI: 10.48550/arXiv.2307.09542

Pratyush Maini, M. Mozer, Hanie Sedghi, Zachary Chase Lipton, J. Z. Kolter, Chiyuan Zhang

Recent efforts at explaining the interplay of memorization and generalization in deep overparametrized networks have posited that neural networks $\textit{memorize}$ "hard" examples in the final few layers of the model. Memorization refers to the ability to correctly predict on $\textit{atypical}$ examples of the training set. In this work, we show that rather than being confined to individual layers, memorization is a phenomenon confined to a small set of neurons in various layers of the model. First, via three experimental sources of converging evidence, we find that most layers are redundant for the memorization of examples and the layers that contribute to example memorization are, in general, not the final layers. The three sources are $\textit{gradient accounting}$ (measuring the contribution to the gradient norms from memorized and clean examples), $\textit{layer rewinding}$ (replacing specific model weights of a converged model with previous training checkpoints), and $\textit{retraining}$ (training rewound layers only on clean examples). Second, we ask a more generic question: can memorization be localized $\textit{anywhere}$ in a model? We discover that memorization is often confined to a small number of neurons or channels (around 5) of the model. Based on these insights we propose a new form of dropout, $\textit{example-tied dropout}$, that enables us to direct the memorization of examples to an a priori determined set of neurons. By dropping out these neurons, we are able to reduce the accuracy on memorized examples from $100\%$ to $3\%$, while also reducing the generalization gap.
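
The abstract does not spell out how example-tied dropout assigns neurons, so the following is only a minimal sketch of one plausible reading. It assumes a hidden layer split into always-active "generalization" units and a pool of "memorization" units, with each training example keeping the same small subset of the pool every time it is seen. The function names, the split sizes, and the use of a seeded random generator to fix the subset are all hypothetical.

```python
import random


def example_tied_mask(example_id, n_gen, n_mem, k, train=True):
    """Return a 0/1 mask over n_gen + n_mem hidden units (illustrative only).

    The first n_gen units are always kept. During training, each example
    keeps the same k memorization units on every pass, chosen by seeding a
    private RNG with the example id. At test time every memorization unit
    is dropped.
    """
    mask = [1] * n_gen + [0] * n_mem
    if train:
        rng = random.Random(example_id)  # fixed choice per example (assumption)
        for j in rng.sample(range(n_mem), k):
            mask[n_gen + j] = 1
    return mask


def apply_mask(activations, mask):
    """Zero out the dropped units of one activation vector."""
    return [a * m for a, m in zip(activations, mask)]
```

Calling `example_tied_mask(7, n_gen=4, n_mem=6, k=2)` twice gives the same mask, which is what ties the dropout pattern to the example; passing `train=False` keeps only the generalization units.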


Pages: 23536-23557

Citations: 6

Autoregressive Diffusion Model for Graph Generation

Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning

Pub Date : 2023-07-17 DOI: 10.48550/arXiv.2307.08849

Lingkai Kong, Jiaming Cui, Haotian Sun, Yuchen Zhuang, B. Prakash, Chao Zhang

Diffusion-based graph generative models have recently obtained promising results for graph generation. However, existing diffusion-based graph generative models are mostly one-shot generative models that apply Gaussian diffusion in the dequantized adjacency matrix space. Such a strategy can suffer from difficulty in model training, slow sampling speed, and an inability to incorporate constraints. We propose an \emph{autoregressive diffusion} model for graph generation. Unlike existing methods, we define a node-absorbing diffusion process that operates directly in the discrete graph space. For forward diffusion, we design a \emph{diffusion ordering network}, which learns a data-dependent node absorbing ordering from graph topology. For reverse generation, we design a \emph{denoising network} that uses the reverse node ordering to efficiently reconstruct the graph by predicting, one node at a time, the node type of the new node and its edges with previously denoised nodes. Based on the permutation invariance of graphs, we show that the two networks can be jointly trained by optimizing a simple lower bound of data likelihood. Our experiments on six diverse generic graph datasets and two molecule datasets show that our model achieves better or comparable generation performance relative to the previous state of the art, while also enjoying fast generation speed.
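
To make the forward and reverse directions concrete, here is a small sketch of a node-absorbing process on a toy graph. It assumes a fixed, hand-written ordering in place of the learned diffusion ordering network, and it simply removes absorbed nodes rather than modelling any masking state, so it illustrates the bookkeeping only. All function names are hypothetical.

```python
def absorb_forward(nodes, edges, order):
    """Absorb one node per step; return the remaining graph after each step."""
    remaining = set(nodes)
    states = []
    for v in order:
        remaining.discard(v)
        kept = {(a, b) for a, b in edges if a in remaining and b in remaining}
        states.append((frozenset(remaining), frozenset(kept)))
    return states


def generation_steps(edges, order):
    """Walk the ordering in reverse: each step adds one node together with
    its edges to the nodes that were generated before it."""
    generated = []
    steps = []
    for v in reversed(order):
        new_edges = sorted(
            (a, b)
            for a, b in edges
            if (a == v and b in generated) or (b == v and a in generated)
        )
        steps.append((v, new_edges))
        generated.append(v)
    return steps
```

For the path graph with edges `{(0, 1), (1, 2)}` and ordering `[2, 0, 1]`, the reverse pass generates node 1 first, then node 0 with edge `(0, 1)`, then node 2 with edge `(1, 2)`, so the union of the per-step edges recovers the original edge set.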


Pages: 17391-17408

Citations: 8

Learning Expressive Priors for Generalization and Uncertainty Estimation in Neural Networks

Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning

Pub Date : 2023-07-15 DOI: 10.48550/arXiv.2307.07753

Dominik Schnaus, Jongseok Lee, D. Cremers, Rudolph Triebel

In this work, we propose a novel prior learning method for advancing generalization and uncertainty estimation in deep neural networks. The key idea is to exploit scalable and structured posteriors of neural networks as informative priors with generalization guarantees. Our learned priors provide expressive probabilistic representations at large scale, like Bayesian counterparts of pre-trained models on ImageNet, and further produce non-vacuous generalization bounds. We also extend this idea to a continual learning framework, where the favorable properties of our priors are desirable. Major enablers are our technical contributions: (1) the sums-of-Kronecker-product computations, and (2) the derivations and optimizations of tractable objectives that lead to improved generalization bounds. Empirically, we exhaustively show the effectiveness of this method for uncertainty estimation and generalization.
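
The abstract names sums-of-Kronecker-product computations without giving details. As background only, the sketch below shows the standard identity that makes such structure cheap to apply: with row-major vectorization, $(A \otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(A X B^\top)$, so a sum of Kronecker products can be applied term by term without ever forming a large matrix. This is a generic linear-algebra illustration, not the paper's algorithm, and the helper names are hypothetical.

```python
def matmul(P, Q):
    """Plain nested-list matrix product."""
    return [
        [sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
        for i in range(len(P))
    ]


def transpose(P):
    return [list(r) for r in zip(*P)]


def kron(A, B):
    """Dense Kronecker product, used here only to check the identity."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def sum_kron_apply(terms, X):
    """Apply sum_k (A_k kron B_k) to row-major vec(X).

    Returns the result reshaped as a matrix, i.e. sum_k A_k X B_k^T,
    without building any Kronecker product explicitly.
    """
    out = None
    for A, B in terms:
        Y = matmul(matmul(A, X), transpose(B))
        if out is None:
            out = Y
        else:
            out = [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(out, Y)]
    return out
```

For factors of size $m \times m$ and $n \times n$ the dense product costs on the order of $m^2 n^2$ operations per term, while the factored form costs on the order of $mn(m + n)$.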


Citations: 0

Drug Discovery under Covariate Shift with Domain-Informed Prior Distributions over Functions

Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning

Pub Date : 2023-07-14 DOI: 10.48550/arXiv.2307.15073

Leo Klarner, Tim G. J. Rudner, M. Reutlinger, Torsten Schindler, G. Morris, C. Deane, Y. Teh

Accelerating the discovery of novel and more effective therapeutics is an important pharmaceutical problem in which deep learning is playing an increasingly significant role. However, real-world drug discovery tasks are often characterized by a scarcity of labeled data and significant covariate shift, a setting that poses a challenge to standard deep learning methods. In this paper, we present Q-SAVI, a probabilistic model able to address these challenges by encoding explicit prior knowledge of the data-generating process into a prior distribution over functions, presenting researchers with a transparent and probabilistically principled way to encode data-driven modeling preferences. Building on a novel, gold-standard bioactivity dataset that facilitates a meaningful comparison of models in an extrapolative regime, we explore different approaches to induce data shift and construct a challenging evaluation setup. We then demonstrate that using Q-SAVI to integrate contextualized prior knowledge of drug-like chemical space into the modeling process affords substantial gains in predictive accuracy and calibration, outperforming a broad range of state-of-the-art self-supervised pre-training and domain adaptation techniques.


Pages: 17176-17197

Citations: 1

Sequential Monte Carlo Learning for Time Series Structure Discovery

Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning

Pub Date : 2023-07-13 DOI: 10.48550/arXiv.2307.09607

Feras A. Saad, Brian Patton, Matt Hoffman, R. Saurous, Vikash K. Mansinghka

This paper presents a new approach to automatically discovering accurate models of complex time series data. Working within a Bayesian nonparametric prior over a symbolic space of Gaussian process time series models, we present a novel structure learning algorithm that integrates sequential Monte Carlo (SMC) and involutive MCMC for highly effective posterior inference. Our method can be used both in "online" settings, where new data is incorporated sequentially in time, and in "offline" settings, by using nested subsets of historical data to anneal the posterior. Empirical measurements on real-world time series show that our method can deliver 10x to 100x runtime speedups over previous MCMC and greedy-search structure learning algorithms targeting the same model family. We use our method to perform the first large-scale evaluation of Gaussian process time series structure learning on a prominent benchmark of 1,428 econometric datasets. The results show that our method discovers sensible models that deliver more accurate point forecasts and interval forecasts over multiple horizons as compared to widely used statistical and neural baselines that struggle on this challenging data.
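
The "online" setting described above follows the usual SMC pattern of reweighting particles as each observation arrives and resampling when the weights degenerate. The sketch below shows only that generic pattern, under several assumptions: particles are arbitrary hypotheses scored by a user-supplied `log_lik` function, resampling is multinomial and triggered by an effective-sample-size threshold, and the MCMC rejuvenation step that the paper's method relies on is omitted entirely. None of the names come from the paper.

```python
import math
import random


def normalize(log_w):
    """Turn log-weights into normalized weights (stable against underflow)."""
    m = max(log_w)
    w = [math.exp(l - m) for l in log_w]
    s = sum(w)
    return [v / s for v in w]


def effective_sample_size(w):
    """ESS of normalized weights; equals len(w) when weights are uniform."""
    return 1.0 / sum(v * v for v in w)


def smc_online(particles, data, log_lik, rng, ess_frac=0.5):
    """Reweight particles one observation at a time, resampling when the
    effective sample size drops below ess_frac * number of particles."""
    n = len(particles)
    log_w = [0.0] * n
    for y in data:
        log_w = [lw + log_lik(p, y) for lw, p in zip(log_w, particles)]
        w = normalize(log_w)
        if effective_sample_size(w) < ess_frac * n:
            particles = rng.choices(particles, weights=w, k=n)
            log_w = [0.0] * n  # weights are uniform again after resampling
    return particles, normalize(log_w)
```

Without a rejuvenation move the particle set can only lose diversity over time, which is one reason practical SMC structure learners pair resampling with MCMC moves.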


Pages: 29473-29489

Citations: 1

Trainability, Expressivity and Interpretability in Gated Neural ODEs

Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning

Pub Date : 2023-07-12 DOI: 10.48550/arXiv.2307.06398

T. Kim, T. Can, K. Krishnamurthy

Understanding how the dynamics in biological and artificial neural networks implement the computations required for a task is a salient open question in machine learning and neuroscience. In particular, computations requiring complex memory storage and retrieval pose a significant challenge for these networks to implement or learn. Recently, a family of models described by neural ordinary differential equations (nODEs) has emerged as powerful dynamical neural network models capable of capturing complex dynamics. Here, we extend nODEs by endowing them with adaptive timescales using gating interactions. We refer to these as gated neural ODEs (gnODEs). Using a task that requires memory of continuous quantities, we demonstrate the inductive bias of the gnODEs to learn (approximate) continuous attractors. We further show how reduced-dimensional gnODEs retain their modeling power while greatly improving interpretability, even allowing explicit visualization of the structure of learned attractors. We introduce a novel measure of expressivity which probes the capacity of a neural network to generate complex trajectories. Using this measure, we explore how the phase-space dimension of the nODEs and the complexity of the function modeling the flow field contribute to expressivity. We see that a more complex function for modeling the flow field allows a lower-dimensional nODE to capture a given target dynamics. Finally, we demonstrate the benefit of gating in nODEs on several real-world tasks.
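
The abstract does not state the gnODE equations. As a rough illustration of what "adaptive timescales using gating interactions" can mean, the sketch below assumes dynamics of the form $\dot{x} = \sigma(Ux) \odot (-x + \tanh(Wx))$, where the sigmoid gate rescales how fast each coordinate moves. This functional form, the matrices `W` and `U`, and the forward-Euler integrator are assumptions chosen for illustration, not the paper's model.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def gated_euler_step(x, W, U, dt):
    """One forward-Euler step of dx/dt = gate * (-x + tanh(W x)),
    with gate = sigmoid(U x) acting as a per-coordinate rate (assumed form)."""
    gate = [sigmoid(v) for v in matvec(U, x)]
    drive = [math.tanh(v) for v in matvec(W, x)]
    return [xi + dt * g * (-xi + d) for xi, g, d in zip(x, gate, drive)]
```

When a gate value is close to zero the corresponding coordinate barely moves, so the state is held in place; this is the intuition behind using gating to support memory of continuous quantities.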


Pages: 16393-16423

Citations: 1

Diversity-enhancing Generative Network for Few-shot Hypothesis Adaptation

Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning

Pub Date : 2023-07-12 DOI: 10.48550/arXiv.2307.05948

Ruijiang Dong, Feng Liu, Haoang Chi, Tongliang Liu, Mingming Gong, Gang Niu, Masashi Sugiyama, Bo Han

Generating unlabeled data has recently been shown to help address the few-shot hypothesis adaptation (FHA) problem, where we aim to train a classifier for the target domain with a few labeled target-domain data and a well-trained source-domain classifier (i.e., a source hypothesis), for the additional information of the highly-compatible unlabeled data. However, the data generated by existing methods are extremely similar or even identical. The strong dependency among the generated data will lead the learning to fail. In this paper, we propose a diversity-enhancing generative network (DEG-Net) for the FHA problem, which can generate diverse unlabeled data with the help of a kernel independence measure: the Hilbert-Schmidt independence criterion (HSIC). Specifically, DEG-Net generates data by minimizing the HSIC value (i.e., maximizing the independence) among the semantic features of the generated data. With DEG-Net, the generated unlabeled data are more diverse and more effective for addressing the FHA problem. Experimental results show that DEG-Net outperforms existing FHA baselines and further verify that generating diverse data plays a vital role in addressing the FHA problem.
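
For readers unfamiliar with HSIC, the following sketch computes the common biased empirical estimate $\mathrm{HSIC}(X, Y) = \operatorname{tr}(K H L H) / (n-1)^2$, where $K$ and $L$ are Gram matrices and $H = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top$ is the centering matrix. The Gaussian kernel and its bandwidth `sigma` are assumptions made here for illustration; the abstract does not say which kernel DEG-Net uses or how the estimate enters its loss.

```python
import math


def gaussian_gram(points, sigma=1.0):
    """Gram matrix K[i][j] = exp(-||p_i - p_j||^2 / (2 sigma^2))."""
    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [
        [math.exp(-sqdist(a, b) / (2.0 * sigma ** 2)) for b in points]
        for a in points
    ]


def center(K):
    """Double-center a Gram matrix, i.e. compute H K H."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)] for i in range(n)]


def hsic(xs, ys, sigma=1.0):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2.

    Uses tr(K H L H) = sum_ij (H K H)[i][j] * L[i][j] for symmetric L.
    """
    n = len(xs)
    Kc = center(gaussian_gram(xs, sigma))
    L = gaussian_gram(ys, sigma)
    return sum(Kc[i][j] * L[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2
```

A value near zero indicates weak dependence under the chosen kernels, so driving HSIC down among generated features pushes the generated samples toward being mutually independent, which matches the abstract's "minimizing the HSIC value (i.e., maximizing the independence)".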


Pages: 8260-8275

Citations: 2

Conformalization of Sparse Generalized Linear Models

Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning

Pub Date : 2023-07-11 DOI: 10.48550/arXiv.2307.05109

E. Guha, Eugène Ndiaye, X. Huo

Given a sequence of observable variables $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, the conformal prediction method estimates a confidence set for $y_{n+1}$ given $x_{n+1}$ that is valid for any finite sample size by merely assuming that the joint distribution of the data is permutation invariant. Although attractive, computing such a set is computationally infeasible in most regression problems. Indeed, in these cases, the unknown variable $y_{n+1}$ can take an infinite number of possible candidate values, and generating conformal sets requires retraining a predictive model for each candidate. In this paper, we focus on a sparse linear model with only a subset of variables for prediction and use numerical continuation techniques to approximate the solution path efficiently. The critical property we exploit is that the set of selected variables is invariant under a small perturbation of the input data. Therefore, it is sufficient to enumerate and refit the model only at the change points of the set of active features and smoothly interpolate the rest of the solution via a Predictor-Corrector mechanism. We show how our path-following algorithm accurately approximates conformal prediction sets and illustrate its performance using synthetic and real data examples.
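
The cost that motivates the paper is easiest to see in the brute-force version of full conformal prediction, sketched below: for every candidate value of $y_{n+1}$ the model is refit on the augmented data, and the candidate is kept when its residual is not unusually large. The sketch deliberately swaps the paper's sparse generalized linear model for ordinary one-dimensional least squares and uses a finite candidate grid, so it shows the baseline procedure rather than the path-following algorithm. Function names are hypothetical.

```python
def fit_line(xs, ys):
    """Closed-form least squares for y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    return slope, my - slope * mx


def full_conformal_set(xs, ys, x_new, candidates, alpha=0.1):
    """Keep each candidate z whose conformal p-value exceeds alpha.

    One refit per candidate: this is the cost that grows without bound as
    the candidate grid is refined.
    """
    kept = []
    for z in candidates:
        ax, ay = xs + [x_new], ys + [z]
        slope, intercept = fit_line(ax, ay)
        scores = [abs(y - (slope * x + intercept)) for x, y in zip(ax, ay)]
        # p-value: share of points whose residual is at least the candidate's
        p_value = sum(s >= scores[-1] for s in scores) / len(scores)
        if p_value > alpha:
            kept.append(z)
    return kept
```

With ten training points lying near $y = 2x$ and a query at $x = 4.5$, a candidate of 9.0 is retained while a candidate of 100.0 is rejected, because the refit line leaves the latter with by far the largest residual.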


Pages: 11871-11887

Citations: 1

Injecting Logical Constraints into Neural Networks via Straight-Through Estimators

Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning

Pub Date : 2023-07-10 DOI: 10.48550/arXiv.2307.04347

Zhun Yang, Joohyung Lee, Chi-youn Park

Injecting discrete logical constraints into neural network learning is one of the main challenges in neuro-symbolic AI. We find that the straight-through estimator, a method introduced to train binary neural networks, can be applied effectively to incorporate logical constraints into neural network learning. More specifically, we design a systematic way to represent discrete logical constraints as a loss function; minimizing this loss by gradient descent via a straight-through estimator updates the neural network's weights in the direction in which the binarized outputs satisfy the logical constraints. The experimental results show that, by leveraging GPUs and batch training, this method scales significantly better than existing neuro-symbolic methods that require heavy symbolic computation to compute gradients. We also demonstrate that our method applies to different types of neural networks, such as MLPs, CNNs, and GNNs, allowing them to learn with fewer or no labeled data by learning directly from known constraints.
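The mechanism described in this abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the constraint ("exactly one output is true"), the squared-error form of the loss, and the names `binarize`, `constraint_loss`, `ste_step`, and `train` are all assumptions chosen for illustration. The forward pass thresholds each probability to a hard bit; the backward pass treats that threshold as the identity, which is the straight-through step.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def binarize(p):
    # Forward pass: hard threshold, which has zero gradient almost everywhere.
    return 1.0 if p >= 0.5 else 0.0


def constraint_loss(bits):
    # Hypothetical constraint for illustration: exactly one output is true.
    return (sum(bits) - 1.0) ** 2


def ste_step(logits, lr=0.5):
    probs = [sigmoid(z) for z in logits]
    bits = [binarize(p) for p in probs]
    # Gradient of the loss with respect to each binarized output.
    dloss_dbit = 2.0 * (sum(bits) - 1.0)
    # Straight-through step: pretend d(bit)/d(prob) = 1, then apply the
    # usual sigmoid derivative p * (1 - p) to reach the logits.
    grads = [dloss_dbit * p * (1.0 - p) for p in probs]
    return [z - lr * g for z, g in zip(logits, grads)]


def train(logits, max_steps=200, lr=0.5):
    # Gradient descent on the logits until the binarized outputs satisfy
    # the constraint or the step budget runs out.
    for _ in range(max_steps):
        bits = [binarize(sigmoid(z)) for z in logits]
        if constraint_loss(bits) == 0.0:
            break
        logits = ste_step(logits, lr)
    return logits


final_logits = train([2.0, 1.0, 0.5])
final_bits = [binarize(sigmoid(z)) for z in final_logits]
```

Starting from logits whose binarized outputs are all true, the updates push every logit down until only the largest one remains above the threshold. In a real network the logits would come from the model and the same surrogate gradient would flow back into its weights.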


Citations: 7

TGRL: An Algorithm for Teacher Guided Reinforcement Learning

Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning

Pub Date : 2023-07-06 DOI: 10.48550/arXiv.2307.03186

Idan Shenfeld, Zhang-Wei Hong, Aviv Tamar, Pulkit Agrawal

Learning from rewards (i.e., reinforcement learning or RL) and learning to imitate a teacher (i.e., teacher-student learning) are two established approaches for solving sequential decision-making problems. To combine the benefits of these different forms of learning, it is common to train a policy to maximize a combination of reinforcement and teacher-student learning objectives. However, lacking a principled method to balance these objectives, prior work relied on heuristics and problem-specific hyperparameter searches. We present a $\textit{principled}$ approach, along with an approximate implementation, for $\textit{dynamically}$ and $\textit{automatically}$ balancing when to follow the teacher and when to use rewards. The main idea is to adjust the importance of teacher supervision by comparing the agent's performance to the counterfactual scenario of the agent learning without teacher supervision and only from rewards. If using teacher supervision improves performance, the importance of teacher supervision is increased; otherwise it is decreased. Our method, $\textit{Teacher Guided Reinforcement Learning}$ (TGRL), outperforms strong baselines across diverse domains without hyperparameter tuning.
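The balancing rule described in this abstract can be sketched in a few lines of Python. This is a hedged illustration, not the TGRL algorithm itself: the additive update, the `step_size` parameter, the clipping at zero, and the function names `combined_objective` and `update_teacher_weight` are assumptions. The abstract specifies only the direction of the adjustment.

```python
def combined_objective(reward_term, teacher_term, teacher_weight):
    # Weighted sum of the reward objective and the teacher-imitation objective.
    return reward_term + teacher_weight * teacher_term


def update_teacher_weight(teacher_weight, return_with_teacher,
                          return_reward_only, step_size=0.1):
    # Compare the agent trained with teacher supervision against the
    # counterfactual agent trained from rewards alone.
    gap = return_with_teacher - return_reward_only
    # Positive gap: the teacher helped, so its importance goes up.
    # Negative gap: the teacher hurt, so its importance goes down.
    # Clipping at zero is an assumption that keeps the weight non-negative.
    return max(0.0, teacher_weight + step_size * gap)


weight_when_teacher_helps = update_teacher_weight(1.0, 5.0, 3.0)
weight_when_teacher_hurts = update_teacher_weight(1.0, 3.0, 5.0)
```

In this sketch the two returns would come from evaluating two policies, one trained on the combined objective and one trained on rewards only; how those returns are estimated is left open here.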



Citations: 2