Low-rank Tensor Learning with Nonconvex Overlapped Nuclear Norm Regularization
Pub Date: 2022-05-06. DOI: 10.48550/arXiv.2205.03059
Quanming Yao, Yaqing Wang, Bo Han, J. Kwok
Nonconvex regularization is widely used in low-rank matrix learning, but extending it to low-rank tensor learning remains computationally expensive. To address this problem, we develop an efficient solver for a nonconvex extension of the overlapped nuclear norm regularizer. Based on the proximal average algorithm, the proposed solver avoids expensive tensor folding/unfolding operations. A special "sparse plus low-rank" structure is maintained throughout the iterations and allows fast computation of the individual proximal steps. Empirical convergence is further improved with the use of adaptive momentum. We provide convergence guarantees to critical points on smooth losses and also on objectives satisfying the Kurdyka-Łojasiewicz condition. Although the optimization problem is nonconvex and nonsmooth, we show that its critical points still have good statistical performance on the tensor completion problem. Experiments on various synthetic and real-world data sets show that the proposed algorithm is efficient in both time and space, and more accurate than the existing state-of-the-art.
{"title":"Low-rank Tensor Learning with Nonconvex Overlapped Nuclear Norm Regularization","authors":"Quanming Yao, Yaqing Wang, Bo Han, J. Kwok","doi":"10.48550/arXiv.2205.03059","DOIUrl":"https://doi.org/10.48550/arXiv.2205.03059","url":null,"abstract":"Nonconvex regularization has been popularly used in low-rank matrix learning. However, extending it for low-rank tensor learning is still computationally expensive. To address this problem, we develop an efficient solver for use with a nonconvex extension of the overlapped nuclear norm regularizer. Based on the proximal average algorithm, the proposed algorithm can avoid expensive tensor folding/unfolding operations. A special\"sparse plus low-rank\"structure is maintained throughout the iterations, and allows fast computation of the individual proximal steps. Empirical convergence is further improved with the use of adaptive momentum. We provide convergence guarantees to critical points on smooth losses and also on objectives satisfying the Kurdyka-{L}ojasiewicz condition. While the optimization problem is nonconvex and nonsmooth, we show that its critical points still have good statistical performance on the tensor completion problem. Experiments on various synthetic and real-world data sets show that the proposed algorithm is efficient in both time and space and more accurate than the existing state-of-the-art.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"23 1","pages":"136:1-136:60"},"PeriodicalIF":0.0,"publicationDate":"2022-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73533017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Polynomial-Time Algorithms for Counting and Sampling Markov Equivalent DAGs with Applications
Pub Date: 2022-05-05. DOI: 10.48550/arXiv.2205.02654
Marcel Wienöbst, Max Bannach, M. Liskiewicz
Counting and sampling directed acyclic graphs from a Markov equivalence class are fundamental tasks in graphical causal analysis. In this paper we show that these tasks can be performed in polynomial time, solving a long-standing open problem in this area. Our algorithms are effective and easy to implement. As our experiments show, they make practically applicable several strategies that were previously thought infeasible, in active learning of causal structures and in causal effect identification with respect to a Markov equivalence class.
{"title":"Polynomial-Time Algorithms for Counting and Sampling Markov Equivalent DAGs with Applications","authors":"Marcel Wienöbst, Max Bannach, M. Liskiewicz","doi":"10.48550/arXiv.2205.02654","DOIUrl":"https://doi.org/10.48550/arXiv.2205.02654","url":null,"abstract":"Counting and sampling directed acyclic graphs from a Markov equivalence class are fundamental tasks in graphical causal analysis. In this paper we show that these tasks can be performed in polynomial time, solving a long-standing open problem in this area. Our algorithms are effective and easily implementable. As we show in experiments, these breakthroughs make thought-to-be-infeasible strategies in active learning of causal structures and causal effect identification with regard to a Markov equivalence class practically applicable.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"8 1","pages":"213:1-213:45"},"PeriodicalIF":0.0,"publicationDate":"2022-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73462406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Simple Approach to Improve Single-Model Deep Uncertainty via Distance-Awareness
Pub Date: 2022-05-01. DOI: 10.48550/arXiv.2205.00403
J. Liu, Shreyas Padhy, Jie Jessie Ren, Zi Lin, Yeming Wen, Ghassen Jerfel, Zachary Nado, Jasper Snoek, Dustin Tran, Balaji Lakshminarayanan
Accurate uncertainty quantification is a major challenge in deep learning, as neural networks can make overconfident errors and assign high-confidence predictions to out-of-distribution (OOD) inputs. The most popular approaches to estimating predictive uncertainty in deep learning combine predictions from multiple neural networks, as in Bayesian neural networks (BNNs) and deep ensembles. However, their practicality in real-time, industrial-scale applications is limited by high memory and computational cost. Furthermore, ensembles and BNNs do not necessarily fix all the issues of the underlying member networks. In this work, we study principled approaches to improving the uncertainty properties of a single network, based on a single, deterministic representation. By formalizing uncertainty quantification as a minimax learning problem, we first identify distance awareness, i.e., the model's ability to quantify the distance of a test example from the training data, as a necessary condition for a DNN to achieve high-quality (i.e., minimax-optimal) uncertainty estimation. We then propose the Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance awareness of modern DNNs with two changes: (1) applying spectral normalization to the hidden weights to enforce bi-Lipschitz smoothness of the representations, and (2) replacing the last output layer with a Gaussian process layer. On a suite of vision and language understanding benchmarks, SNGP outperforms other single-model approaches in prediction, calibration, and out-of-domain detection. Furthermore, SNGP provides complementary benefits to popular techniques such as deep ensembles and data augmentation, making it a simple and scalable building block for probabilistic deep learning. Code is open-sourced at https://github.com/google/uncertainty-baselines
{"title":"A Simple Approach to Improve Single-Model Deep Uncertainty via Distance-Awareness","authors":"J. Liu, Shreyas Padhy, Jie Jessie Ren, Zi Lin, Yeming Wen, Ghassen Jerfel, Zachary Nado, Jasper Snoek, Dustin Tran, Balaji Lakshminarayanan","doi":"10.48550/arXiv.2205.00403","DOIUrl":"https://doi.org/10.48550/arXiv.2205.00403","url":null,"abstract":"Accurate uncertainty quantification is a major challenge in deep learning, as neural networks can make overconfident errors and assign high confidence predictions to out-of-distribution (OOD) inputs. The most popular approaches to estimate predictive uncertainty in deep learning are methods that combine predictions from multiple neural networks, such as Bayesian neural networks (BNNs) and deep ensembles. However their practicality in real-time, industrial-scale applications are limited due to the high memory and computational cost. Furthermore, ensembles and BNNs do not necessarily fix all the issues with the underlying member networks. In this work, we study principled approaches to improve uncertainty property of a single network, based on a single, deterministic representation. By formalizing the uncertainty quantification as a minimax learning problem, we first identify distance awareness, i.e., the model's ability to quantify the distance of a testing example from the training data, as a necessary condition for a DNN to achieve high-quality (i.e., minimax optimal) uncertainty estimation. We then propose Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs with two simple changes: (1) applying spectral normalization to hidden weights to enforce bi-Lipschitz smoothness in representations and (2) replacing the last output layer with a Gaussian process layer. On a suite of vision and language understanding benchmarks, SNGP outperforms other single-model approaches in prediction, calibration and out-of-domain detection. Furthermore, SNGP provides complementary benefits to popular techniques such as deep ensembles and data augmentation, making it a simple and scalable building block for probabilistic deep learning. Code is open-sourced at https://github.com/google/uncertainty-baselines","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"82 1","pages":"42:1-42:63"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83998193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models
Pub Date: 2022-04-18. DOI: 10.48550/arXiv.2204.08573
Ali Ghadirzadeh, Petra Poklukar, Karol Arndt, Chelsea Finn, V. Kyrki, D. Kragic, Marten Bjorkman
We present a data-efficient framework for solving sequential decision-making problems that combines reinforcement learning (RL) with latent variable generative models. The framework, called GenRL, trains deep policies by introducing an action latent variable, so that the feed-forward policy search can be divided into two parts: (i) training a sub-policy that outputs a distribution over the action latent variable given a state of the system, and (ii) unsupervised training of a generative model that outputs a sequence of motor actions conditioned on the latent action variable. GenRL enables safe exploration and alleviates data inefficiency, as it exploits prior knowledge about valid sequences of motor actions. Moreover, we provide a set of measures for evaluating generative models, which let us predict the performance of RL policy training before the actual training on a physical robot. We experimentally determine the characteristics of generative models that have the most influence on the performance of the final policy training, on two robotics tasks: shooting a hockey puck and throwing a basketball. Furthermore, we empirically demonstrate that, compared with two state-of-the-art RL methods, GenRL is the only method that can solve both robotics tasks safely and efficiently.
{"title":"Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models","authors":"Ali Ghadirzadeh, Petra Poklukar, Karol Arndt, Chelsea Finn, V. Kyrki, D. Kragic, Marten Bjorkman","doi":"10.48550/arXiv.2204.08573","DOIUrl":"https://doi.org/10.48550/arXiv.2204.08573","url":null,"abstract":"We present a data-efficient framework for solving sequential decision-making problems which exploits the combination of reinforcement learning (RL) and latent variable generative models. The framework, called GenRL, trains deep policies by introducing an action latent variable such that the feed-forward policy search can be divided into two parts: (i) training a sub-policy that outputs a distribution over the action latent variable given a state of the system, and (ii) unsupervised training of a generative model that outputs a sequence of motor actions conditioned on the latent action variable. GenRL enables safe exploration and alleviates the data-inefficiency problem as it exploits prior knowledge about valid sequences of motor actions. Moreover, we provide a set of measures for evaluation of generative models such that we are able to predict the performance of the RL policy training prior to the actual training on a physical robot. We experimentally determine the characteristics of generative models that have most influence on the performance of the final policy training on two robotics tasks: shooting a hockey puck and throwing a basketball. Furthermore, we empirically demonstrate that GenRL is the only method which can safely and efficiently solve the robotics tasks compared to two state-of-the-art RL methods.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"34 1","pages":"174:1-174:37"},"PeriodicalIF":0.0,"publicationDate":"2022-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73886843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generalization Error Bounds for Multiclass Sparse Linear Classifiers
Pub Date: 2022-04-13. DOI: 10.48550/arXiv.2204.06264
Tomer Levy, F. Abramovich
We consider high-dimensional multiclass classification by sparse multinomial logistic regression. Unlike in binary classification, in the multiclass setup there is an entire spectrum of possible notions of sparsity, associated with different structural assumptions on the matrix of regression coefficients. We propose a computationally feasible feature selection procedure based on penalized maximum likelihood, with convex penalties capturing the particular type of sparsity at hand. In particular, we consider global sparsity, double row-wise sparsity, and low-rank sparsity, and show that with properly chosen tuning parameters the derived plug-in classifiers attain the minimax generalization error bounds (in terms of misclassification excess risk) within the corresponding classes of multiclass sparse linear classifiers. The developed approach is general and can be adapted to other types of sparsity as well.
{"title":"Generalization Error Bounds for Multiclass Sparse Linear Classifiers","authors":"Tomer Levy, F. Abramovich","doi":"10.48550/arXiv.2204.06264","DOIUrl":"https://doi.org/10.48550/arXiv.2204.06264","url":null,"abstract":"We consider high-dimensional multiclass classification by sparse multinomial logistic regression. Unlike binary classification, in the multiclass setup one can think about an entire spectrum of possible notions of sparsity associated with different structural assumptions on the regression coefficients matrix. We propose a computationally feasible feature selection procedure based on penalized maximum likelihood with convex penalties capturing a specific type of sparsity at hand. In particular, we consider global sparsity, double row-wise sparsity, and low-rank sparsity, and show that with the properly chosen tuning parameters the derived plug-in classifiers attain the minimax generalization error bounds (in terms of misclassification excess risk) within the corresponding classes of multiclass sparse linear classifiers. The developed approach is general and can be adapted to other types of sparsity as well.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"10 1","pages":"151:1-151:35"},"PeriodicalIF":0.0,"publicationDate":"2022-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79815817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unbiased Multilevel Monte Carlo methods for intractable distributions: MLMC meets MCMC
Pub Date: 2022-04-11. DOI: 10.48550/arXiv.2204.04808
Guanyang Wang, T. Wang
Constructing unbiased estimators from Markov chain Monte Carlo (MCMC) outputs is a difficult problem that has recently received a lot of attention in the statistics and machine learning communities. However, the current unbiased MCMC framework only works when the quantity of interest is an expectation, which excludes many practical applications. In this paper, we propose a general method for constructing unbiased estimators of functions of expectations and extend it to construct unbiased estimators of nested expectations. Our approach combines and generalizes the unbiased MCMC and multilevel Monte Carlo (MLMC) methods. In contrast to traditional sequential methods, our estimator can be implemented on parallel processors. We show that our estimator has finite variance and finite computational complexity, and can achieve $\varepsilon$-accuracy within the optimal $O(1/\varepsilon^2)$ computational cost under mild conditions. Our numerical experiments confirm our theoretical findings and demonstrate the benefits of unbiased estimators in the massively parallel regime.
{"title":"Unbiased Multilevel Monte Carlo methods for intractable distributions: MLMC meets MCMC","authors":"Guanyang Wang, T. Wang","doi":"10.48550/arXiv.2204.04808","DOIUrl":"https://doi.org/10.48550/arXiv.2204.04808","url":null,"abstract":"Constructing unbiased estimators from Markov chain Monte Carlo (MCMC) outputs is a difficult problem that has recently received a lot of attention in the statistics and machine learning communities. However, the current unbiased MCMC framework only works when the quantity of interest is an expectation, which excludes many practical applications. In this paper, we propose a general method for constructing unbiased estimators for functions of expectations and extend it to construct unbiased estimators for nested expectations. Our approach combines and generalizes the unbiased MCMC and Multilevel Monte Carlo (MLMC) methods. In contrast to traditional sequential methods, our estimator can be implemented on parallel processors. We show that our estimator has a finite variance and computational complexity and can achieve $varepsilon$-accuracy within the optimal $O(1/varepsilon^2)$ computational cost under mild conditions. Our numerical experiments confirm our theoretical findings and demonstrate the benefits of unbiased estimators in the massively parallel regime.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"369 1","pages":"249:1-249:40"},"PeriodicalIF":0.0,"publicationDate":"2022-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82737072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ReservoirComputing.jl: An Efficient and Modular Library for Reservoir Computing Models
Pub Date: 2022-04-08. DOI: 10.48550/arXiv.2204.05117
Francesco Martinuzzi, Chris Rackauckas, Anas Abdelrehim, M. Mahecha, Karin Mora
We introduce ReservoirComputing.jl, an open-source Julia library for reservoir computing models. The software implements a large number of algorithms from the literature and makes it simple to extend them with both internal and external tools. The implementation is highly modular, fast, and comes with comprehensive documentation, including experiments reproduced from the literature. The code and documentation are hosted on GitHub under an MIT license: https://github.com/SciML/ReservoirComputing.jl.
{"title":"ReservoirComputing.jl: An Efficient and Modular Library for Reservoir Computing Models","authors":"Francesco Martinuzzi, Chris Rackauckas, Anas Abdelrehim, M. Mahecha, Karin Mora","doi":"10.48550/arXiv.2204.05117","DOIUrl":"https://doi.org/10.48550/arXiv.2204.05117","url":null,"abstract":"We introduce ReservoirComputing.jl, an open source Julia library for reservoir computing models. The software offers a great number of algorithms presented in the literature, and allows to expand on them with both internal and external tools in a simple way. The implementation is highly modular, fast and comes with a comprehensive documentation, which includes reproduced experiments from literature. The code and documentation are hosted on Github under an MIT license https://github.com/SciML/ReservoirComputing.jl.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"22 1","pages":"288:1-288:8"},"PeriodicalIF":0.0,"publicationDate":"2022-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87233960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
First-Order Algorithms for Nonlinear Generalized Nash Equilibrium Problems
Pub Date: 2022-04-07. DOI: 10.48550/arXiv.2204.03132
Michael I. Jordan, Tianyi Lin, M. Zampetakis
We consider the problem of computing an equilibrium in a class of nonlinear generalized Nash equilibrium problems (NGNEPs), in which the strategy set of each player is defined by equality and inequality constraints that may depend on the choices of rival players. While the asymptotic global convergence and local convergence rates of algorithms for this problem have been extensively investigated, the analysis of nonasymptotic iteration complexity is still in its infancy. This paper presents two first-order algorithms, based on the quadratic penalty method (QPM) and the augmented Lagrangian method (ALM) respectively, each with an accelerated mirror-prox algorithm as the inner-loop solver. We establish global convergence guarantees for solving monotone and strongly monotone NGNEPs and provide nonasymptotic complexity bounds expressed in terms of the number of gradient evaluations. Experimental results demonstrate the efficiency of our algorithms in practice.
{"title":"First-Order Algorithms for Nonlinear Generalized Nash Equilibrium Problems","authors":"Michael I. Jordan, Tianyi Lin, M. Zampetakis","doi":"10.48550/arXiv.2204.03132","DOIUrl":"https://doi.org/10.48550/arXiv.2204.03132","url":null,"abstract":"We consider the problem of computing an equilibrium in a class of textit{nonlinear generalized Nash equilibrium problems (NGNEPs)} in which the strategy sets for each player are defined by equality and inequality constraints that may depend on the choices of rival players. While the asymptotic global convergence and local convergence rates of algorithms to solve this problem have been extensively investigated, the analysis of nonasymptotic iteration complexity is still in its infancy. This paper presents two first-order algorithms -- based on the quadratic penalty method (QPM) and augmented Lagrangian method (ALM), respectively -- with an accelerated mirror-prox algorithm as the solver in each inner loop. We establish a global convergence guarantee for solving monotone and strongly monotone NGNEPs and provide nonasymptotic complexity bounds expressed in terms of the number of gradient evaluations. Experimental results demonstrate the efficiency of our algorithms in practice.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"20 1","pages":"38:1-38:46"},"PeriodicalIF":0.0,"publicationDate":"2022-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85567811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Smooth Robust Tensor Completion for Background/Foreground Separation with Missing Pixels: Novel Algorithm with Convergence Guarantee
Pub Date: 2022-03-29. DOI: 10.48550/arXiv.2203.16328
Bo Shen, Weijun Xie, Zhen Kong
The objective of this study is to address the problem of background/foreground separation with missing pixels by combining video acquisition, video recovery, and background/foreground separation in a single framework. To achieve this, a smooth robust tensor completion (SRTC) model is proposed to recover the data and decompose it into a static background and a smooth foreground. Specifically, the static background is modeled by a low-rank Tucker decomposition, and the smooth foreground (moving objects) is modeled by spatiotemporal continuity, enforced through total variation regularization. An efficient algorithm based on tensor proximal alternating minimization (tenPAM) is implemented to solve the proposed model, with a global convergence guarantee under very mild conditions. Extensive experiments on real data demonstrate that the proposed method significantly outperforms state-of-the-art approaches to background/foreground separation with missing pixels.
{"title":"Smooth Robust Tensor Completion for Background/Foreground Separation with Missing Pixels: Novel Algorithm with Convergence Guarantee","authors":"Bo Shen, Weijun Xie, Zhen Kong","doi":"10.48550/arXiv.2203.16328","DOIUrl":"https://doi.org/10.48550/arXiv.2203.16328","url":null,"abstract":"The objective of this study is to address the problem of background/foreground separation with missing pixels by combining the video acquisition, video recovery, background/foreground separation into a single framework. To achieve this, a smooth robust tensor completion (SRTC) model is proposed to recover the data and decompose it into the static background and smooth foreground, respectively. Specifically, the static background is modeled by the low-rank tucker decomposition and the smooth foreground (moving objects) is modeled by the spatiotemporal continuity, which is enforced by the total variation regularization. An efficient algorithm based on tensor proximal alternating minimization (tenPAM) is implemented to solve the proposed model with global convergence guarantee under very mild conditions. Extensive experiments on real data demonstrate that the proposed method significantly outperforms the state-of-the-art approaches for background/foreground separation with missing pixels.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"24 1","pages":"217:1-217:40"},"PeriodicalIF":0.0,"publicationDate":"2022-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78880960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Framework and Benchmark for Deep Batch Active Learning for Regression
Pub Date: 2022-03-17. DOI: 10.48550/arXiv.2203.09410
David Holzmüller, V. Zaverkin, Johannes Kastner, Ingo Steinwart
The acquisition of labels for supervised learning can be expensive. To improve the sample efficiency of neural network regression, we study active learning methods that adaptively select batches of unlabeled data for labeling. We present a framework for constructing such methods out of (network-dependent) base kernels, kernel transformations, and selection methods. Our framework encompasses many existing Bayesian methods based on Gaussian process approximations of neural networks as well as non-Bayesian methods. Additionally, we propose to replace the commonly used last-layer features with sketched finite-width neural tangent kernels and to combine them with a novel clustering method. To evaluate different methods, we introduce an open-source benchmark consisting of 15 large tabular regression data sets. Our proposed method outperforms the state-of-the-art on our benchmark, scales to large data sets, and works out-of-the-box without adjusting the network architecture or training code. We provide open-source code that includes efficient implementations of all kernels, kernel transformations, and selection methods, and can be used for reproducing our results.
{"title":"A Framework and Benchmark for Deep Batch Active Learning for Regression","authors":"David Holzmüller, V. Zaverkin, Johannes Kastner, Ingo Steinwart","doi":"10.48550/arXiv.2203.09410","DOIUrl":"https://doi.org/10.48550/arXiv.2203.09410","url":null,"abstract":"The acquisition of labels for supervised learning can be expensive. To improve the sample efficiency of neural network regression, we study active learning methods that adaptively select batches of unlabeled data for labeling. We present a framework for constructing such methods out of (network-dependent) base kernels, kernel transformations, and selection methods. Our framework encompasses many existing Bayesian methods based on Gaussian process approximations of neural networks as well as non-Bayesian methods. Additionally, we propose to replace the commonly used last-layer features with sketched finite-width neural tangent kernels and to combine them with a novel clustering method. To evaluate different methods, we introduce an open-source benchmark consisting of 15 large tabular regression data sets. Our proposed method outperforms the state-of-the-art on our benchmark, scales to large data sets, and works out-of-the-box without adjusting the network architecture or training code. We provide open-source code that includes efficient implementations of all kernels, kernel transformations, and selection methods, and can be used for reproducing our results.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"112 1","pages":"164:1-164:81"},"PeriodicalIF":0.0,"publicationDate":"2022-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79638527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}