Doubly Stochastic Normalization of the Gaussian Kernel Is Robust to Heteroskedastic Noise.

IF 1.9 Q1 MATHEMATICS, APPLIED SIAM journal on mathematics of data science Pub Date : 2021-01-01 Epub Date: 2021-03-23 DOI:10.1137/20M1342124
Boris Landa, Ronald R Coifman, Yuval Kluger
{"title":"Doubly Stochastic Normalization of the Gaussian Kernel Is Robust to Heteroskedastic Noise.","authors":"Boris Landa,&nbsp;Ronald R Coifman,&nbsp;Yuval Kluger","doi":"10.1137/20M1342124","DOIUrl":null,"url":null,"abstract":"<p><p>A fundamental step in many data-analysis techniques is the construction of an affinity matrix describing similarities between data points. When the data points reside in Euclidean space, a widespread approach is to from an affinity matrix by the Gaussian kernel with pairwise distances, and to follow with a certain normalization (e.g. the row-stochastic normalization or its symmetric variant). We demonstrate that the doubly-stochastic normalization of the Gaussian kernel with zero main diagonal (i.e., no self loops) is robust to heteroskedastic noise. That is, the doubly-stochastic normalization is advantageous in that it automatically accounts for observations with different noise variances. Specifically, we prove that in a suitable high-dimensional setting where heteroskedastic noise does not concentrate too much in any particular direction in space, the resulting (doubly-stochastic) noisy affinity matrix converges to its clean counterpart with rate <i>m</i> <sup>-1/2</sup>, where <i>m</i> is the ambient dimension. We demonstrate this result numerically, and show that in contrast, the popular row-stochastic and symmetric normalizations behave unfavorably under heteroskedastic noise. Furthermore, we provide examples of simulated and experimental single-cell RNA sequence data with intrinsic heteroskedasticity, where the advantage of the doubly-stochastic normalization for exploratory analysis is evident.</p>","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"3 1","pages":"388-413"},"PeriodicalIF":1.9000,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8194191/pdf/nihms-1702812.pdf","citationCount":"17","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"SIAM journal on mathematics of data science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1137/20M1342124","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2021/3/23 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"MATHEMATICS, APPLIED","Score":null,"Total":0}
引用次数: 17

Abstract

A fundamental step in many data-analysis techniques is the construction of an affinity matrix describing similarities between data points. When the data points reside in Euclidean space, a widespread approach is to from an affinity matrix by the Gaussian kernel with pairwise distances, and to follow with a certain normalization (e.g. the row-stochastic normalization or its symmetric variant). We demonstrate that the doubly-stochastic normalization of the Gaussian kernel with zero main diagonal (i.e., no self loops) is robust to heteroskedastic noise. That is, the doubly-stochastic normalization is advantageous in that it automatically accounts for observations with different noise variances. Specifically, we prove that in a suitable high-dimensional setting where heteroskedastic noise does not concentrate too much in any particular direction in space, the resulting (doubly-stochastic) noisy affinity matrix converges to its clean counterpart with rate m -1/2, where m is the ambient dimension. We demonstrate this result numerically, and show that in contrast, the popular row-stochastic and symmetric normalizations behave unfavorably under heteroskedastic noise. Furthermore, we provide examples of simulated and experimental single-cell RNA sequence data with intrinsic heteroskedasticity, where the advantage of the doubly-stochastic normalization for exploratory analysis is evident.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
高斯核的双随机归一化对异方差噪声具有鲁棒性。
许多数据分析技术中的一个基本步骤是构建描述数据点之间相似性的关联矩阵。当数据点位于欧几里得空间时,一种广泛的方法是通过具有成对距离的高斯核从亲和矩阵中提取,并遵循一定的归一化(例如行随机归一化或其对称变体)。我们证明了零主对角线高斯核的双随机归一化(即没有自环)对异方差噪声具有鲁棒性。也就是说,双随机归一化是有利的,因为它自动解释了具有不同噪声方差的观测值。具体来说,我们证明了在合适的高维环境中,异方差噪声不会过多地集中在空间的任何特定方向上,由此产生的(双随机)噪声亲和矩阵以m -1/2的速率收敛到其干净的对应矩阵,其中m是环境维数。我们在数值上证明了这一结果,并表明,与之相反,流行的行随机和对称归一化在异方差噪声下表现不利。此外,我们提供了具有内在异方差的模拟和实验单细胞RNA序列数据的示例,其中双随机归一化用于探索性分析的优势是显而易见的。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Entropic Optimal Transport on Random Graphs A Universal Trade-off Between the Model Size, Test Loss, and Training Loss of Linear Predictors Approximating Probability Distributions by Using Wasserstein Generative Adversarial Networks Adversarial Robustness of Sparse Local Lipschitz Predictors The GenCol Algorithm for High-Dimensional Optimal Transport: General Formulation and Application to Barycenters and Wasserstein Splines
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1