A Simple Approach to Improve Single-Model Deep Uncertainty via Distance-Awareness

J. Mach. Learn. Res. Pub Date: 2022-05-01. DOI: 10.48550/arXiv.2205.00403

J. Liu, Shreyas Padhy, Jie Jessie Ren, Zi Lin, Yeming Wen, Ghassen Jerfel, Zachary Nado, Jasper Snoek, Dustin Tran, Balaji Lakshminarayanan


Citations: 19

Abstract

Accurate uncertainty quantification is a major challenge in deep learning, as neural networks can make overconfident errors and assign high-confidence predictions to out-of-distribution (OOD) inputs. The most popular approaches to estimating predictive uncertainty in deep learning are methods that combine predictions from multiple neural networks, such as Bayesian neural networks (BNNs) and deep ensembles. However, their practicality in real-time, industrial-scale applications is limited by their high memory and computational cost. Furthermore, ensembles and BNNs do not necessarily fix all the issues with the underlying member networks. In this work, we study principled approaches to improving the uncertainty properties of a single network, based on a single, deterministic representation. By formalizing uncertainty quantification as a minimax learning problem, we first identify distance awareness, i.e., the model's ability to quantify the distance of a testing example from the training data, as a necessary condition for a DNN to achieve high-quality (i.e., minimax optimal) uncertainty estimation. We then propose Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs with two simple changes: (1) applying spectral normalization to hidden weights to enforce bi-Lipschitz smoothness in representations and (2) replacing the last output layer with a Gaussian process layer. On a suite of vision and language understanding benchmarks, SNGP outperforms other single-model approaches in prediction, calibration and out-of-domain detection. Furthermore, SNGP provides complementary benefits to popular techniques such as deep ensembles and data augmentation, making it a simple and scalable building block for probabilistic deep learning. Code is open-sourced at https://github.com/google/uncertainty-baselines
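The two changes named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the power-iteration normalizer, the single tanh residual block, the random-Fourier-feature approximation of the Gaussian process layer, the Gaussian-likelihood posterior variance, and all names and hyperparameters (`norm_bound`, `ridge`, 256 features) are assumptions chosen for illustration. The aim is only to show a predictive variance that grows as a test point moves away from the training data.

```python
import numpy as np


def spectral_normalize(W, norm_bound=0.95, n_iter=200, seed=0):
    """Rescale W so its largest singular value is (approximately) <= norm_bound.

    The largest singular value is estimated with power iteration.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(W.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        u = W @ v
        u /= np.linalg.norm(u)
        v = W.T @ u
        v /= np.linalg.norm(v)
    sigma = float(u @ W @ v)  # estimated spectral norm
    return W if sigma <= norm_bound else W * (norm_bound / sigma)


def residual_block(x, W):
    """h(x) = x + tanh(x W^T).

    If the spectral norm of W is below 1, this map cannot collapse or
    blow up input distances arbitrarily (it is bi-Lipschitz).
    """
    return x + np.tanh(x @ W.T)


def rff_features(h, W_rff, b_rff):
    """Random Fourier features approximating an RBF kernel (assumed GP layer)."""
    n_features = W_rff.shape[0]
    return np.sqrt(2.0 / n_features) * np.cos(h @ W_rff.T + b_rff)


def predictive_variance(phi_train, phi_test, ridge=1e-3):
    """Posterior variance of Bayesian linear regression on the features.

    Assumes a unit Gaussian prior on the output weights and Gaussian noise
    with variance `ridge`, a simplification made for this sketch.
    """
    n_features = phi_train.shape[1]
    precision = phi_train.T @ phi_train + ridge * np.eye(n_features)
    solved = np.linalg.solve(precision, phi_test.T)
    return ridge * np.sum(phi_test.T * solved, axis=0)


rng = np.random.default_rng(0)
x_train = 0.3 * rng.standard_normal((100, 2))    # training cluster near the origin
x_test = np.array([[0.0, 0.0], [10.0, 10.0]])    # one near point, one far point

W_hidden = spectral_normalize(rng.standard_normal((2, 2)))
W_rff = rng.standard_normal((256, 2))            # unit length-scale frequencies
b_rff = rng.uniform(0.0, 2.0 * np.pi, 256)

phi_train = rff_features(residual_block(x_train, W_hidden), W_rff, b_rff)
phi_test = rff_features(residual_block(x_test, W_hidden), W_rff, b_rff)

var_near, var_far = predictive_variance(phi_train, phi_test)
print(f"variance near training data: {var_near:.4f}")
print(f"variance far from training data: {var_far:.4f}")
```

In this sketch the hidden block keeps the far point far in representation space, and the kernel-based output layer turns that distance into a larger predictive variance. A plain dense softmax head has no comparable mechanism.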


