Internet Mathematics: Latest Articles, Page 9

Multiscale Matrix Sampling and Sublinear-Time PageRank Computation

Q3 Mathematics

Internet Mathematics

Pub Date : 2012-02-13 DOI: 10.1080/15427951.2013.802752

C. Borgs, Mickey Brautbar, J. Chayes, S. Teng

A fundamental problem arising in many applications in Web science and social network analysis is identifying all nodes in a network whose PageRank exceeds a given threshold Δ. In this paper, we study the probabilistic version of the problem: given an arbitrary approximation factor c > 1, we are asked to output a set S of nodes such that, with high probability, S contains all nodes of PageRank at least Δ and no node of PageRank smaller than Δ/c. We call this problem SignificantPageRanks. We develop a nearly optimal local algorithm for the problem with time complexity Õ(n/Δ) on networks with n nodes, where the tilde hides a polylogarithmic factor. We show that every algorithm for solving this problem must have running time Ω(n/Δ), rendering our algorithm optimal up to logarithmic factors. Our algorithm has sublinear time complexity for applications including Web crawling and Web search that require efficient identification of nodes whose PageRanks are above a threshold Δ = n^δ, for some constant 0 < δ < 1. Our algorithm comes with two main technical contributions. The first is a multiscale sampling scheme for a basic matrix problem that could be of interest on its own. For us, it appears as an abstraction of a subproblem we need to tackle in order to solve the SignificantPageRanks problem, but we hope that this abstraction will be useful in designing fast algorithms for identifying nodes that are significant beyond PageRank measurements. In the abstract matrix problem, it is assumed that one can access an unknown right-stochastic matrix by querying its rows, where the cost of a query and the accuracy of the answers depend on a precision parameter ε. At a cost proportional to 1/ε, the query will return a list of O(1/ε) entries and their indices that provide an ε-precision approximation of the row. Our task is to find a set that contains all columns whose sum is at least Δ and omits every column whose sum is less than Δ/c. Our multiscale sampling scheme solves this problem with cost Õ(n/Δ), while traditional sampling algorithms would take time Θ((n/Δ)^2). Our second main technical contribution is a new local algorithm for approximating personalized PageRank, which is more robust than the earlier ones developed in [Jeh and Widom 03, Andersen et al. 06] and is highly efficient, particularly for networks with large in-degrees or out-degrees. Together with our multiscale sampling scheme, we are able to solve the SignificantPageRanks problem optimally.
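To make the output contract of SignificantPageRanks concrete, the following is a minimal brute-force sketch in Python. It is not the sublinear algorithm described in the abstract: it reads the whole graph and runs plain power iteration. The function names, the teleport probability `alpha`, and the scaling of scores so that they sum to n (chosen here to be consistent with thresholds of the form Δ = n^δ) are assumptions made for illustration.

```python
def pagerank(out_links, alpha=0.15, iters=100):
    """Power iteration on a dict {node: [successors]}.

    Assumption: scores are scaled to sum to n, the number of nodes.
    Every successor must itself be a key of out_links.
    """
    nodes = sorted(out_links)
    rank = {v: 1.0 for v in nodes}
    for _ in range(iters):
        nxt = {v: alpha for v in nodes}  # teleport share per node
        for u in nodes:
            targets = out_links[u] or nodes  # dangling node: spread uniformly
            share = (1 - alpha) * rank[u] / len(targets)
            for v in targets:
                nxt[v] += share
        rank = nxt
    return rank


def significant_nodes(out_links, delta, **kwargs):
    """Exact answer: every node whose PageRank is at least delta."""
    rank = pagerank(out_links, **kwargs)
    return {v for v, r in rank.items() if r >= delta}


# Hypothetical toy graph: nodes 1, 2, 3 all point to 0, and 0 points to 1.
graph = {0: [1], 1: [0], 2: [0], 3: [0]}
print(significant_nodes(graph, delta=1.5))
```

The exact set returned here satisfies the requirement for any c > 1, since it contains every node with PageRank at least Δ and none below Δ. The point of the paper is to obtain an acceptable set without touching the entire graph.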

Vol. 10, pp. 20-48.

Citations: 31

On the Hyperbolicity of Small-World and Treelike Random Graphs

Pub Date : 2012-01-09 DOI: 10.1080/15427951.2013.828336

Wei Chen, Wenjie Fang, Guangda Hu, Michael W. Mahoney

Hyperbolicity is a property of a graph that may be viewed as a “soft” version of a tree, and recent empirical and theoretical work has suggested that many graphs arising in Internet and related data applications have hyperbolic properties. Here we consider Gromov's notion of δ-hyperbolicity and establish several positive and negative results for small-world and treelike random graph models. First, we study the hyperbolicity of the class of Kleinberg small-world random graphs parameterized by n, d, and γ, where n is the number of vertices in the graph, d is the dimension of the underlying base grid B, and γ is the small-world parameter such that each node u in the graph connects to another node v in the graph with probability proportional to 1/d_B(u, v)^γ, with d_B(u, v) the grid distance from u to v in the base grid B. We show that when γ = d, the parameter value allowing efficient decentralized routing in Kleinberg's small-world network, the hyperbolic δ is [...] with probability 1−o(1) for every ε > 0 independent of n. We see that hyperbolicity is not significantly improved in relation to graph diameter even when the long-range connections greatly improve decentralized navigation. We also show that for other values of γ, the hyperbolic δ is very close to the graph diameter, indicating poor hyperbolicity in these graphs as well. Next we study a class of treelike graphs called ringed trees that have constant hyperbolicity. We show that adding random links among the leaves in a manner similar to the small-world graph constructions may easily destroy the hyperbolicity of the graphs, except for a class of random edges added using an exponentially decaying probability function based on the ring distance among the leaves. Our study provides one of the first significant analytic results on the hyperbolicity of a rich class of random graphs, which sheds light on the relationship between hyperbolicity and navigability of random graphs, as well as on the sensitivity of hyperbolic δ to noise in random graphs.
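As an illustration of what the hyperbolic δ measures, here is a small Python sketch that computes it by brute force using the four-point condition, one common formulation of Gromov's definition. Other formulations differ by constant factors, and the abstract does not say which one the paper adopts, so the choice here is an assumption, as are the function names. The sketch assumes a connected, unweighted, undirected graph and takes O(n^4) time, so it is only suitable for very small inputs.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from source in an unweighted graph {node: [neighbors]}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def four_point_delta(adj):
    """Largest half-gap between the two biggest pairwise distance sums
    over all vertex quadruples (assumes the graph is connected)."""
    d = {u: bfs_distances(adj, u) for u in adj}
    delta = 0.0
    for x, y, z, w in combinations(adj, 4):
        sums = sorted((d[x][y] + d[z][w],
                       d[x][z] + d[y][w],
                       d[x][w] + d[y][z]))
        delta = max(delta, (sums[2] - sums[1]) / 2)
    return delta


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}          # a tree
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}   # a 4-cycle
print(four_point_delta(path), four_point_delta(cycle))
```

Trees give δ = 0 under this definition, which is the sense in which small δ relative to the diameter indicates a treelike graph.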

Vol. 9, pp. 434-491.

Citations: 82

Editorial Board EOV

Pub Date : 2011-11-28 DOI: 10.1080/15427951.2011.630923

Citations: 0

Extension and Robustness of Transitivity Clustering for Protein–Protein Interaction Network Analysis

Pub Date : 2011-11-28 DOI: 10.1080/15427951.2011.604559

T. Wittkop, S. Rahmann, Richard Röttger, Sebastian Böcker, J. Baumbach

Partitioning biological data objects into groups such that the objects within each group share common traits is a longstanding challenge in computational biology. Recently, we developed and established transitivity clustering, a partitioning approach based on weighted transitive graph projection that utilizes a single similarity threshold as density parameter. In previous publications, we concentrated on the graphical user interface and on concrete biomedical application protocols. Here, we contribute the following theoretical considerations: (1) We provide proofs that the average similarity between objects from the same cluster is above the user-given threshold and that the average similarity between objects from different clusters is below the threshold. (2) We extend transitivity clustering to an overlapping clustering tool by integrating two new approaches. (3) We demonstrate the power of transitivity clustering for protein-complex detection. We evaluate our approaches against others by utilizing gold-standard data previously used by Brohée et al. for reviewing existing bioinformatics clustering tools. The extended version of this article is available online at http://transclust.mpi-inf.mpg.de.
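The guarantee in item (1) can be checked empirically for any given clustering. The Python sketch below does only that: it averages pairwise similarities within and between clusters so they can be compared against the threshold. It is not the transitivity clustering algorithm itself. The function name, the data layout, and the choice to average over all pairs at once (rather than per cluster) are assumptions, and the similarity values are made up for illustration.

```python
from itertools import combinations


def average_similarities(sim, clusters):
    """Return (within, between): the mean similarity over same-cluster
    pairs and over cross-cluster pairs.

    sim maps frozenset({a, b}) to a similarity score; clusters is a list
    of disjoint sets of objects. A mean is None if there are no such pairs.
    """
    label = {obj: i for i, members in enumerate(clusters) for obj in members}
    within, between = [], []
    for a, b in combinations(sorted(label), 2):
        bucket = within if label[a] == label[b] else between
        bucket.append(sim[frozenset((a, b))])

    def mean(values):
        return sum(values) / len(values) if values else None

    return mean(within), mean(between)


# Hypothetical similarities among four objects.
sim = {
    frozenset("ab"): 0.9, frozenset("cd"): 0.8,
    frozenset("ac"): 0.2, frozenset("ad"): 0.1,
    frozenset("bc"): 0.3, frozenset("bd"): 0.2,
}
threshold = 0.5
within, between = average_similarities(sim, [{"a", "b"}, {"c", "d"}])
print(within > threshold > between)
```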

Vol. 7, pp. 255-273.

Citations: 11

Using Biological Networks in Protein Function Prediction and Gene Expression Analysis

Pub Date : 2011-11-28 DOI: 10.1080/15427951.2011.604561

L. Wong

While sequence homology search has been the main workhorse in protein function prediction, it is not applicable to a significant portion of novel proteins that do not have informative homologues in sequence databases. Similarly, while statistical tests and learning algorithms based purely on gene expression profiles have been popular for analyzing disease samples, critical issues remain in the understanding of diseases based on the differentially expressed genes suggested by these methods. In the past decade, a large number of databases providing information on various types of biological networks have become available. These databases make it possible to tackle these and other biological problems in novel ways. This paper presents a review of biological network databases and approaches to protein function prediction and gene expression profile analysis that are based on biological networks.

Citations: 3

KeyPathwayMiner: Detecting Case-Specific Biological Pathways Using Expression Data

Pub Date : 2011-11-28 DOI: 10.1080/15427951.2011.604548

N. Alcaraz, Hande Küçük, Jochen Weile, A. Wipat, J. Baumbach

Recent advances in systems biology have provided us with massive amounts of pathway data that describe the interplay of genes and their products. The resulting biological networks can be modeled as graphs. By means of “omics” technologies, such as microarrays, the activity of genes and proteins can be measured. Here, data from microarray experiments is integrated with the network data to gain deeper insights into gene expression. We introduce KeyPathwayMiner, a method that enables the extraction and visualization of interesting subpathways given the results of a series of gene expression studies. We aim to detect highly connected subnetworks in which most genes or proteins show similar patterns of expression. Specifically, given network and gene expression data, KeyPathwayMiner identifies those maximal subgraphs where all but k nodes of the subnetwork are expressed similarly in all but l cases in the gene expression data. Since identifying these subgraphs is computationally intensive, we developed a heuristic algorithm based on Ant Colony Optimization. We implemented KeyPathwayMiner as a plug-in for Cytoscape. Our computational model is related to a strategy presented by Ulitsky et al. in 2008. Consequently, we used the same data sets for evaluation. KeyPathwayMiner is available online at http://keypathwayminer.mpi-inf.mpg.de.
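The "all but k nodes in all but l cases" condition can be read as a simple counting test. The Python sketch below checks that condition for a candidate node set, under the assumption that expression has already been reduced to one 0/1 flag per gene and case. It does not check connectivity or maximality and is not the Ant Colony Optimization heuristic; the function name and data layout are hypothetical.

```python
def satisfies_kl(expressed, nodes, k, l):
    """Check the (k, l) exception condition for a candidate node set.

    expressed[node] is a list of 0/1 flags, one per case. A node counts
    as an exception if it is unflagged in more than l cases; the set
    passes if it has at most k exception nodes.
    """
    exceptions = 0
    for node in nodes:
        misses = sum(1 for flag in expressed[node] if not flag)
        if misses > l:
            exceptions += 1
    return exceptions <= k


# Hypothetical flags for three genes over four cases.
expressed = {
    "g1": [1, 1, 1, 0],
    "g2": [1, 1, 1, 1],
    "g3": [0, 0, 1, 0],
}
print(satisfies_kl(expressed, ["g1", "g2", "g3"], k=1, l=1))
```

With k = 1 and l = 1 the set passes because only g3 misses more than one case; tightening to k = 0 rejects it.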

Vol. 7, pp. 299-313.

Citations: 55

Googling the Brain: Discovering Hierarchical and Asymmetric Network Structures, with Applications in Neuroscience

Pub Date : 2011-11-28 DOI: 10.1080/15427951.2011.604284

J. J. Crofts, D. Higham

Hierarchical organization is a common feature of many directed networks arising in nature and technology. For example, a well-defined message-passing framework based on managerial status typically exists in a business organization. However, in many real-world networks, such patterns of hierarchy are unlikely to be quite so transparent. Because of the way in which empirical data are collated, the nodes will often be ordered so as to obscure any underlying structure. In addition, the possibility of even a small number of links violating any overall “chain of command” makes the determination of such structures extremely challenging. Here we address the issue of how to reorder a directed network to reveal this type of hierarchy. In doing so, we also look at the task of quantifying the level of hierarchy, given a particular node ordering. We look at a variety of approaches. Using ideas from the graph Laplacian literature, we show that a relevant discrete optimization problem leads to a natural hierarchical node ranking. We also show that this ranking arises via a maximum likelihood problem associated with a new range-dependent hierarchical random-graph model. This random-graph insight allows us to compute a likelihood ratio that quantifies the overall tendency for a given network to be hierarchical. We also develop a generalization of this node-ordering algorithm based on the combinatorics of directed walks. In passing, we note that Google's PageRank algorithm tackles a closely related problem, and may also be motivated from a combinatoric, walk-counting viewpoint. We illustrate the performance of the resulting algorithms on synthetic network data, and on a real-world network from neuroscience where results may be validated biologically.

Vol. 7, pp. 233-254.

Citations: 27

On the Approximability of Reachability-Preserving Network Orientations

Pub Date : 2011-11-28 DOI: 10.1080/15427951.2011.604554

Michael Elberfeld, V. Bafna, Iftah Gamzu, Alexander Medvedovsky, D. Segev, Dana Silverbush, Uri Zwick, R. Sharan

We introduce a graph-orientation problem arising in the study of biological networks. Given an undirected graph and a list of ordered source–target vertex pairs, the goal is to orient the graph such that a maximum number of pairs admit a directed source-to-target path. We study the complexity and approximability of this problem. We show that the problem is NP-hard even on star graphs and hard to approximate to within some constant factor. On the positive side, we provide an Ω(log log n/log n)-factor approximation algorithm for the problem on n-vertex graphs. We further show that for any instance of the problem there exists an orientation of the input graph that satisfies a logarithmic fraction of all pairs and that this bound is tight up to a constant factor. Our techniques also lead to constant-factor approximation algorithms for some restricted variants of the problem.
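The problem statement can be made concrete with an exhaustive search. The Python sketch below tries every orientation of a small undirected graph and counts how many source-target pairs are connected by a directed path. It takes time exponential in the number of edges and is meant only to illustrate the objective, not the approximation algorithm from the paper; the function names and the toy instance are assumptions.

```python
from itertools import product


def reachable(arcs, source, target):
    """Depth-first search over directed arcs given as (tail, head) tuples."""
    succ = {}
    for u, v in arcs:
        succ.setdefault(u, []).append(v)
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        if u == target:
            return True
        for v in succ.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False


def best_orientation(edges, pairs):
    """Try all 2^m orientations; return (satisfied pair count, arcs)."""
    best_count, best_arcs = -1, None
    for flips in product((False, True), repeat=len(edges)):
        arcs = [(v, u) if flip else (u, v)
                for (u, v), flip in zip(edges, flips)]
        count = sum(reachable(arcs, s, t) for s, t in pairs)
        if count > best_count:
            best_count, best_arcs = count, arcs
    return best_count, best_arcs


# Hypothetical star graph with center "c" and leaves "a", "b", "d".
edges = [("c", "a"), ("c", "b"), ("c", "d")]
pairs = [("a", "b"), ("b", "a"), ("a", "d")]
count, arcs = best_orientation(edges, pairs)
print(count, arcs)
```

In this instance the pairs (a, b) and (b, a) need the edge between c and a oriented in opposite directions, so at most two of the three pairs can be satisfied.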

Citations: 6

Introduction to the Special Issue on Biological Networks

Pub Date : 2011-11-28 DOI: 10.1080/15427951.2011.621769

Natasa Przulj

In this special issue on biological networks, we aim to interest the readership of Internet Mathematics in network theory applied to bioinformatics. Network biology is a new and emerging research area that is fast-growing, spurred by the collection of biological data representing connections or interactions of molecules in the cell. As such, it has the potential to have at least as profound an impact on our understanding of the cell as sequence data has had. However, the datasets are large, noisy and many graph theoretic problems are formally intractable (impossible to solve exactly in any time less than the age of the universe), and so heuristic approximations must be developed in an attempt to find approximate solutions. Furthermore, the tools developed to solve these problems must be made accessible to biological practitioners. In this direction, this issue contains papers on the many databases available, theoretical and algorithmic advances in analyzing these data, as well as papers on some specific biomedical applications, and two papers introducing software tools. This issue presents six papers from some of the leading research groups in the area. Three papers present significant theoretical advances in techniques. Two of them (Elberfeld et al.; Crofts and Higham) look at directed graphs. First, Elberfeld et al. attack the “maximum graph orientation problem”, in which, given a list of source-sink pairs of nodes, we attempt to add direction to an undirected graph in such a way as to maximize the number of pairs for which directed paths exist from the source to the sink. This has applications in the problem of learning biological pathways, but Elberfeld et al. show that the problem is NP-hard

Vol. 7, pp. 207-208.

Citations: 2

NAViGaTOR: Large Scalable and Interactive Navigation and Analysis of Large Graphs

Pub Date : 2011-11-28 DOI: 10.1080/15427951.2011.604289

A. Djebbari, Muhammad Ali, D. Otasek, M. Kotlyar, Kristen Fortney, Serene W. H. Wong, A. Hrvojic, I. Jurisica

Network visualization tools offer features enabling a variety of analyses to satisfy diverse requirements. Given the complexity and diversity of data and tasks, there is no single best layout, no single best file format or visualization tool: one size does not fit all. One way to cope with these dynamics is to support multiple scenarios and workflows. NAViGaTOR (Network Analysis, Visualization & Graphing TORonto) offers a complete system to manage diverse workflows from one application. It allows users to manipulate large graphs interactively using an innovative graphical user interface (GUI) and through fast layout algorithms with a small memory footprint. NAViGaTOR facilitates integrative network analysis by supporting not only visualization but also visual data mining.

Citations: 14