SIAM journal on mathematics of data science最新文献

英文中文

Test Data Reuse for the Evaluation of Continuously Evolving Classification Algorithms Using the Area under the Receiver Operating Characteristic Curve 基于接收者工作特征曲线下面积的连续进化分类算法评估测试数据重用

Q1 MATHEMATICS, APPLIED

SIAM journal on mathematics of data science

Pub Date : 2021-01-01 DOI: 10.1137/20M1333110

Alexej Gossmann, Aria Pezeshk, Yu-ping Wang, B. Sahiner

引用次数: 6

Error Bounds for Dynamical Spectral Estimation. 动态频谱估算的误差限。

Q1 MATHEMATICS, APPLIED

SIAM journal on mathematics of data science

Pub Date : 2021-01-01 Epub Date: 2021-02-11 DOI: 10.1137/20m1335984

Robert J Webber, Erik H Thiede, Douglas Dow, Aaron R Dinner, Jonathan Weare

Dynamical spectral estimation is a well-established numerical approach for estimating eigenvalues and eigenfunctions of the Markov transition operator from trajectory data. Although the approach has been widely applied in biomolecular simulations, its error properties remain poorly understood. Here we analyze the error of a dynamical spectral estimation method called "the variational approach to conformational dynamics" (VAC). We bound the approximation error and estimation error for VAC estimates. Our analysis establishes VAC's convergence properties and suggests new strategies for tuning VAC to improve accuracy.

动态谱估计是从轨迹数据中估算马尔可夫转换算子特征值和特征函数的一种成熟的数值方法。虽然这种方法已广泛应用于生物分子模拟，但人们对其误差特性仍然知之甚少。在此，我们分析了一种名为 "构象动力学变分法"（VAC）的动态谱估计方法的误差。我们对 VAC 估计的近似误差和估计误差进行了约束。我们的分析确定了 VAC 的收敛特性，并提出了调整 VAC 以提高准确性的新策略。

引用次数: 0

Global Minima of Overparameterized Neural Networks 过参数化神经网络的全局最小值

Q1 MATHEMATICS, APPLIED

SIAM journal on mathematics of data science

Pub Date : 2021-01-01 DOI: 10.1137/19M1308943

Y. Cooper

引用次数: 20

Approximation Properties of Ridge Functions and Extreme Learning Machines 岭函数的近似性质与极限学习机

Q1 MATHEMATICS, APPLIED

SIAM journal on mathematics of data science

Pub Date : 2021-01-01 DOI: 10.1137/20m1356348

P. Jorgensen, D. Stewart

For a compact set $Dsubsetmathbb{R}^{m}$ we consider the problem of approximating a function $f$ over $D$ by sums of ridge functions ${x}mapstovarphi({w}^{T}{x})$ with ${w}$ in a given set $ma...

对于一个紧集$D子集$ mathbb{R}^{m}$，我们考虑用脊函数${x}mapstovarphi({w}^{T}{x})$和${w}$在给定集合$ma…

引用次数: 2

Nonbacktracking Eigenvalues under Node Removal: X-Centrality and Targeted Immunization 节点去除下的非回溯特征值:x中心性和靶向免疫

Q1 MATHEMATICS, APPLIED

SIAM journal on mathematics of data science

Pub Date : 2021-01-01 DOI: 10.1137/20M1352132

Leonardo A. B. Tôrres, Kevin S. Chan, Hanghang Tong, T. Eliassi-Rad

. The non-backtracking matrix and its eigenvalues have many applications in network science and 5 graph mining, such as node and edge centrality, community detection, length spectrum theory, 6 graph distance, and epidemic and percolation thresholds. In network epidemiology, the reciprocal 7 of the largest eigenvalue of the non-backtracking matrix is a good approximation for the epidemic 8 threshold of certain network dynamics. In this work, we develop techniques that identify which 9 nodes have the largest impact on this leading eigenvalue. We do so by studying the behavior of 10 the spectrum of the non-backtracking matrix after a node is removed from the graph. From this 11 analysis we derive two new centrality measures: X -degree and X-non-backtracking centrality . We 12 perform extensive experimentation with targeted immunization strategies derived from these two 13 centrality measures. Our spectral analysis and centrality measures can be broadly applied, and will 14 be of interest to both theorists and practitioners alike. the perturbation of quadratic eigenvalue problems, with applications to the NB- eigenvalues of the stochastic block

．非回溯矩阵及其特征值在网络科学和5图挖掘中有许多应用，如节点和边缘中心性、社区检测、长度谱理论、6图距离、流行和渗透阈值。在网络流行病学中，非回溯矩阵的最大特征值的倒数7是对某些网络动力学的流行阈值的一个很好的逼近。在这项工作中，我们开发了识别哪9个节点对这个主要特征值影响最大的技术。我们通过研究从图中删除一个节点后非回溯矩阵谱的行为来做到这一点。从这个分析中，我们得到了两个新的中心性度量:X度中心性和X非回溯中心性。我们对基于这两种中心性措施的靶向免疫策略进行了广泛的实验。我们的光谱分析和中心性测量可以广泛应用，并将引起理论家和实践者的兴趣。二次特征值问题的摄动，并应用于随机块的NB-特征值

{"title":"Nonbacktracking Eigenvalues under Node Removal: X-Centrality and Targeted Immunization","authors":"Leonardo A. B. Tôrres, Kevin S. Chan, Hanghang Tong, T. Eliassi-Rad","doi":"10.1137/20M1352132","DOIUrl":"https://doi.org/10.1137/20M1352132","url":null,"abstract":". The non-backtracking matrix and its eigenvalues have many applications in network science and 5 graph mining, such as node and edge centrality, community detection, length spectrum theory, 6 graph distance, and epidemic and percolation thresholds. In network epidemiology, the reciprocal 7 of the largest eigenvalue of the non-backtracking matrix is a good approximation for the epidemic 8 threshold of certain network dynamics. In this work, we develop techniques that identify which 9 nodes have the largest impact on this leading eigenvalue. We do so by studying the behavior of 10 the spectrum of the non-backtracking matrix after a node is removed from the graph. From this 11 analysis we derive two new centrality measures: X -degree and X-non-backtracking centrality . We 12 perform extensive experimentation with targeted immunization strategies derived from these two 13 centrality measures. Our spectral analysis and centrality measures can be broadly applied, and will 14 be of interest to both theorists and practitioners alike. the perturbation of quadratic eigenvalue problems, with applications to the NB- eigenvalues of the stochastic block","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"31 1","pages":"656-675"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85132979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 23

Spectral neighbor joining for reconstruction of latent tree Models. 用于潜在树模型重建的光谱邻居连接。

Q1 MATHEMATICS, APPLIED

SIAM journal on mathematics of data science

Pub Date : 2021-01-01 Epub Date: 2021-02-01 DOI: 10.1137/20m1365715

Ariel Jaffe, Noah Amsel, Yariv Aizenbud, Boaz Nadler, Joseph T Chang, Yuval Kluger

A common assumption in multiple scientific applications is that the distribution of observed data can be modeled by a latent tree graphical model. An important example is phylogenetics, where the tree models the evolutionary lineages of a set of observed organisms. Given a set of independent realizations of the random variables at the leaves of the tree, a key challenge is to infer the underlying tree topology. In this work we develop Spectral Neighbor Joining (SNJ), a novel method to recover the structure of latent tree graphical models. Given a matrix that contains a measure of similarity between all pairs of observed variables, SNJ computes a spectral measure of cohesion between groups of observed variables. We prove that SNJ is consistent, and derive a sufficient condition for correct tree recovery from an estimated similarity matrix. Combining this condition with a concentration of measure result on the similarity matrix, we bound the number of samples required to recover the tree with high probability. We illustrate via extensive simulations that in comparison to several other reconstruction methods, SNJ requires fewer samples to accurately recover trees with a large number of leaves or long edges.

在许多科学应用中，一个常见的假设是观测数据的分布可以用潜在树图形模型来建模。一个重要的例子是系统发育学，其中树模拟了一组观察到的生物体的进化谱系。给定树的叶子处随机变量的一组独立实现，一个关键的挑战是推断底层树的拓扑结构。本文提出了一种用于恢复潜在树图模型结构的新方法——谱邻域连接(SNJ)。给定一个矩阵，其中包含所有观察变量对之间的相似性度量，SNJ计算观察变量组之间的内聚度的谱度量。我们证明了SNJ是一致的，并从估计的相似矩阵中得到了正确树恢复的充分条件。结合这一条件和测量结果在相似性矩阵上的集中，我们限定了高概率恢复树所需的样本数。我们通过大量的模拟证明，与其他几种重建方法相比，SNJ需要更少的样本才能准确地恢复具有大量叶子或长边的树木。

{"title":"Spectral neighbor joining for reconstruction of latent tree Models.","authors":"Ariel Jaffe, Noah Amsel, Yariv Aizenbud, Boaz Nadler, Joseph T Chang, Yuval Kluger","doi":"10.1137/20m1365715","DOIUrl":"https://doi.org/10.1137/20m1365715","url":null,"abstract":"A common assumption in multiple scientific applications is that the distribution of observed data can be modeled by a latent tree graphical model. An important example is phylogenetics, where the tree models the evolutionary lineages of a set of observed organisms. Given a set of independent realizations of the random variables at the leaves of the tree, a key challenge is to infer the underlying tree topology. In this work we develop Spectral Neighbor Joining (SNJ), a novel method to recover the structure of latent tree graphical models. Given a matrix that contains a measure of similarity between all pairs of observed variables, SNJ computes a spectral measure of cohesion between groups of observed variables. We prove that SNJ is consistent, and derive a sufficient condition for correct tree recovery from an estimated similarity matrix. Combining this condition with a concentration of measure result on the similarity matrix, we bound the number of samples required to recover the tree with high probability. We illustrate via extensive simulations that in comparison to several other reconstruction methods, SNJ requires fewer samples to accurately recover trees with a large number of leaves or long edges.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"3 1","pages":"113-141"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8194222/pdf/nihms-1702804.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39091867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3

Doubly Stochastic Normalization of the Gaussian Kernel Is Robust to Heteroskedastic Noise. 高斯核的双随机归一化对异方差噪声具有鲁棒性。

Q1 MATHEMATICS, APPLIED

SIAM journal on mathematics of data science

Pub Date : 2021-01-01 Epub Date: 2021-03-23 DOI: 10.1137/20M1342124

Boris Landa, Ronald R Coifman, Yuval Kluger

A fundamental step in many data-analysis techniques is the construction of an affinity matrix describing similarities between data points. When the data points reside in Euclidean space, a widespread approach is to from an affinity matrix by the Gaussian kernel with pairwise distances, and to follow with a certain normalization (e.g. the row-stochastic normalization or its symmetric variant). We demonstrate that the doubly-stochastic normalization of the Gaussian kernel with zero main diagonal (i.e., no self loops) is robust to heteroskedastic noise. That is, the doubly-stochastic normalization is advantageous in that it automatically accounts for observations with different noise variances. Specifically, we prove that in a suitable high-dimensional setting where heteroskedastic noise does not concentrate too much in any particular direction in space, the resulting (doubly-stochastic) noisy affinity matrix converges to its clean counterpart with rate m ^-1/2, where m is the ambient dimension. We demonstrate this result numerically, and show that in contrast, the popular row-stochastic and symmetric normalizations behave unfavorably under heteroskedastic noise. Furthermore, we provide examples of simulated and experimental single-cell RNA sequence data with intrinsic heteroskedasticity, where the advantage of the doubly-stochastic normalization for exploratory analysis is evident.

许多数据分析技术中的一个基本步骤是构建描述数据点之间相似性的关联矩阵。当数据点位于欧几里得空间时，一种广泛的方法是通过具有成对距离的高斯核从亲和矩阵中提取，并遵循一定的归一化(例如行随机归一化或其对称变体)。我们证明了零主对角线高斯核的双随机归一化(即没有自环)对异方差噪声具有鲁棒性。也就是说，双随机归一化是有利的，因为它自动解释了具有不同噪声方差的观测值。具体来说，我们证明了在合适的高维环境中，异方差噪声不会过多地集中在空间的任何特定方向上，由此产生的(双随机)噪声亲和矩阵以m -1/2的速率收敛到其干净的对应矩阵，其中m是环境维数。我们在数值上证明了这一结果，并表明，与之相反，流行的行随机和对称归一化在异方差噪声下表现不利。此外，我们提供了具有内在异方差的模拟和实验单细胞RNA序列数据的示例，其中双随机归一化用于探索性分析的优势是显而易见的。

{"title":"Doubly Stochastic Normalization of the Gaussian Kernel Is Robust to Heteroskedastic Noise.","authors":"Boris Landa, Ronald R Coifman, Yuval Kluger","doi":"10.1137/20M1342124","DOIUrl":"https://doi.org/10.1137/20M1342124","url":null,"abstract":"A fundamental step in many data-analysis techniques is the construction of an affinity matrix describing similarities between data points. When the data points reside in Euclidean space, a widespread approach is to from an affinity matrix by the Gaussian kernel with pairwise distances, and to follow with a certain normalization (e.g. the row-stochastic normalization or its symmetric variant). We demonstrate that the doubly-stochastic normalization of the Gaussian kernel with zero main diagonal (i.e., no self loops) is robust to heteroskedastic noise. That is, the doubly-stochastic normalization is advantageous in that it automatically accounts for observations with different noise variances. Specifically, we prove that in a suitable high-dimensional setting where heteroskedastic noise does not concentrate too much in any particular direction in space, the resulting (doubly-stochastic) noisy affinity matrix converges to its clean counterpart with rate m -1/2, where m is the ambient dimension. We demonstrate this result numerically, and show that in contrast, the popular row-stochastic and symmetric normalizations behave unfavorably under heteroskedastic noise. Furthermore, we provide examples of simulated and experimental single-cell RNA sequence data with intrinsic heteroskedasticity, where the advantage of the doubly-stochastic normalization for exploratory analysis is evident.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"3 1","pages":"388-413"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8194191/pdf/nihms-1702812.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39091868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 17

MR-GAN: Manifold Regularized Generative Adversarial Networks for Scientific Data gan:科学数据的流形正则化生成对抗网络

Q1 MATHEMATICS, APPLIED

SIAM journal on mathematics of data science

Pub Date : 2021-01-01 DOI: 10.1137/20m1344299

Qunwei Li, B. Kailkhura, R. Anirudh, Jize Zhang, Yi Zhou, Yingbin Liang, T. Y. Han, P. Varshney

引用次数: 1

An Optimal Algorithm for Strict Circular Seriation 严格循环序列的最优算法

Q1 MATHEMATICS, APPLIED

SIAM journal on mathematics of data science

Pub Date : 2021-01-01 DOI: 10.1137/21m139356x

Santiago Armstrong, Crist'obal Guzm'an, C. Sing-Long

We study the problem of circular seriation, where we are given a matrix of pairwise dissimilarities between $n$ objects, and the goal is to find a {em circular order} of the objects in a manner that is consistent with their dissimilarity. This problem is a generalization of the classical {em linear seriation} problem where the goal is to find a {em linear order}, and for which optimal ${cal O}(n^2)$ algorithms are known. Our contributions can be summarized as follows. First, we introduce {em circular Robinson matrices} as the natural class of dissimilarity matrices for the circular seriation problem. Second, for the case of {em strict circular Robinson dissimilarity matrices} we provide an optimal ${cal O}(n^2)$ algorithm for the circular seriation problem. Finally, we propose a statistical model to analyze the well-posedness of the circular seriation problem for large $n$. In particular, we establish ${cal O}(log(n)/n)$ rates on the distance between any circular ordering found by solving the circular seriation problem to the underlying order of the model, in the Kendall-tau metric.

我们研究了圆序列问题，在这个问题中，我们给出了一个$n$对象之间的配对不相似度矩阵，目标是以与它们的不相似度一致的方式找到对象的{em圆阶}。这个问题是经典的{em线性序列化}问题的推广，其目标是找到一个{em线性顺序}，并且已知最优的${cal O}(n^2)$算法。我们的贡献可以概括如下。首先，我们引入{em圆形罗宾逊矩阵}作为圆形序列问题的不相似矩阵的自然类别。其次，对于{em严格循环罗宾逊不相似矩阵}的情况，我们为循环序列化问题提供了一个最优的${cal O}(n^2)$算法。最后，我们提出了一个统计模型来分析大$n$的圆序列问题的适定性。特别地，我们在Kendall-tau度量中建立了${cal O}(log(n)/n)$速率，该速率是通过解决循环序列化问题找到的任何循环顺序与模型底层顺序之间的距离。

引用次数: 4

k-Variance: A Clustered Notion of Variance k-方差:方差的聚类概念

Q1 MATHEMATICS, APPLIED

SIAM journal on mathematics of data science

Pub Date : 2020-12-13 DOI: 10.1137/20m1385895

J. Solomon, Kristjan H. Greenewald, H. Nagaraja

We introduce $k$-variance, a generalization of variance built on the machinery of random bipartite matchings. $K$-variance measures the expected cost of matching two sets of $k$ samples from a distribution to each other, capturing local rather than global information about a measure as $k$ increases; it is easily approximated stochastically using sampling and linear programming. In addition to defining $k$-variance and proving its basic properties, we provide in-depth analysis of this quantity in several key cases, including one-dimensional measures, clustered measures, and measures concentrated on low-dimensional subsets of $mathbb R^n$. We conclude with experiments and open problems motivated by this new way to summarize distributional shape.

我们引入$k$-variance，一个建立在随机二部匹配机制上的方差的泛化。$K$-variance衡量从一个分布中匹配两组$K$样本的预期成本，随着$K$的增加，捕获有关度量的局部信息而不是全局信息;它很容易用抽样和线性规划进行随机逼近。除了定义$k$方差并证明其基本性质之外，我们还在几个关键情况下对这个量进行了深入分析，包括一维度量、聚类度量和集中在$mathbb R^n$的低维子集上的度量。最后，我们用实验和开放性问题来总结这种新的分布形状。

引用次数: 3

首页上一页

下一页尾页

类型

全部化学•材料生命科学医学物理工程技术环境•农林材料科学地球科学法学管理学化学环境科学与生态学计算机科学教育学经济学农林科学人文科学生物学数学物理与天体物理心理学综合性期刊其他工业工程理学历史学农学文学信息工程

数据库

全部 ACS Publications Elsevier ieeexplore Springer The Royal Society of Chemistry Wiley

期刊

SIAM journal on mathematics of data science

全部 Acc. Chem. Res. ACS Applied Bio Materials ACS Appl. Electron. Mater. ACS Appl. Energy Mater. ACS Appl. Mater. Interfaces ACS Appl. Nano Mater. ACS Appl. Polym. Mater. ACS BIOMATER-SCI ENG ACS Catal. ACS Cent. Sci. ACS Chem. Biol. ACS Chemical Health & Safety ACS Chem. Neurosci. ACS Comb. Sci. ACS Earth Space Chem. ACS Energy Lett. ACS Infect. Dis. ACS Macro Lett. ACS Mater. Lett. ACS Med. Chem. Lett. ACS Nano ACS Omega ACS Photonics ACS Sens. ACS Sustainable Chem. Eng. ACS Synth. Biol. Anal. Chem. BIOCHEMISTRY-US Bioconjugate Chem. BIOMACROMOLECULES Chem. Res. Toxicol. Chem. Rev. Chem. Mater. CRYST GROWTH DES ENERG FUEL Environ. Sci. Technol. Environ. Sci. Technol. Lett. Eur. J. Inorg. Chem. IND ENG CHEM RES Inorg. Chem. J. Agric. Food. Chem. J. Chem. Eng. Data J. Chem. Educ. J. Chem. Inf. Model. J. Chem. Theory Comput. J. Med. Chem. J. Nat. Prod. J PROTEOME RES J. Am. Chem. Soc. LANGMUIR MACROMOLECULES Mol. Pharmaceutics Nano Lett. Org. Lett. ORG PROCESS RES DEV ORGANOMETALLICS J. Org. Chem. J. Phys. Chem. J. Phys. Chem. A J. Phys. Chem. B J. Phys. Chem. C J. Phys. Chem. Lett. Analyst Anal. Methods Biomater. Sci. Catal. Sci. Technol. Chem. Commun. Chem. Soc. Rev. CHEM EDUC RES PRACT CRYSTENGCOMM Dalton Trans. Energy Environ. Sci. ENVIRON SCI-NANO ENVIRON SCI-PROC IMP ENVIRON SCI-WAT RES Faraday Discuss. Food Funct. Green Chem. Inorg. Chem. Front. Integr. Biol. J. Anal. At. Spectrom. J. Mater. Chem. A J. Mater. Chem. B J. Mater. Chem. C Lab Chip Mater. Chem. Front. Mater. Horiz. MEDCHEMCOMM Metallomics Mol. Biosyst. Mol. Syst. Des. Eng. Nanoscale Nanoscale Horiz. Nat. Prod. Rep. New J. Chem. Org. Biomol. Chem. Org. Chem. Front. PHOTOCH PHOTOBIO SCI PCCP Polym. Chem.

﹀