Pub Date: 2012-02-13; DOI: 10.1080/15427951.2013.802752
C. Borgs, Mickey Brautbar, J. Chayes, S. Teng
A fundamental problem arising in many applications in Web science and social network analysis is that of identifying all nodes in a network whose PageRank exceeds a given threshold Δ. In this paper, we study the probabilistic version of the problem: given an arbitrary approximation factor c > 1, we are asked to output a set S of nodes such that, with high probability, S contains all nodes of PageRank at least Δ and no node of PageRank smaller than Δ/c. We call this problem SignificantPageRanks. We develop a nearly optimal local algorithm for the problem with time complexity Õ(n/Δ) on networks with n nodes, where the tilde hides a polylogarithmic factor. We show that every algorithm for solving this problem must have running time Ω(n/Δ), rendering our algorithm optimal up to logarithmic factors. Our algorithm has sublinear time complexity for applications including Web crawling and Web search that require efficient identification of nodes whose PageRanks are above a threshold Δ = n^δ, for some constant 0 < δ < 1. Our algorithm comes with two main technical contributions. The first is a multiscale sampling scheme for a basic matrix problem that could be of interest on its own. For us, it appears as an abstraction of a subproblem we need to tackle in order to solve the SignificantPageRanks problem, but we hope that this abstraction will be useful in designing fast algorithms for identifying nodes that are significant beyond PageRank measurements. In the abstract matrix problem, it is assumed that one can access an unknown right-stochastic matrix by querying its rows, where the cost of a query and the accuracy of the answers depend on a precision parameter ε. At a cost proportional to 1/ε, the query will return a list of O(1/ε) entries and their indices that provide an ε-precision approximation of the row. Our task is to find a set that contains all columns whose sum is at least Δ and omits every column whose sum is less than Δ/c. Our multiscale sampling scheme solves this problem with cost Õ(n/Δ), while traditional sampling algorithms would take time Θ((n/Δ)^2). Our second main technical contribution is a new local algorithm for approximating personalized PageRank, which is more robust than the earlier ones developed in [Jeh and Widom 03, Andersen et al. 06] and is highly efficient, particularly for networks with large in-degrees or out-degrees. Together with our multiscale sampling scheme, we are able to solve the SignificantPageRanks problem optimally.
Title: Multiscale Matrix Sampling and Sublinear-Time PageRank Computation. Internet Mathematics, vol. 10, pp. 20-48.
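The output contract of SignificantPageRanks can be made concrete with a small sketch. The code below is not the sublinear local algorithm from the abstract: it uses plain power iteration, which reads the whole graph, purely to obtain reference scores. It assumes PageRank is scaled so that the scores sum to n (which is what a threshold of the form Δ = n^δ suggests), and the teleport probability `alpha = 0.15`, the function names, and the toy graph are all illustrative assumptions.

```python
def pagerank(out_links, alpha=0.15, iters=200):
    """Power-iteration PageRank with scores scaled to sum to n (assumed scaling).

    out_links maps each node to the list of nodes it links to; alpha is the
    teleport probability (an assumed value, not taken from the abstract).
    """
    nodes = sorted(out_links)
    pr = {v: 1.0 for v in nodes}
    for _ in range(iters):
        nxt = {v: alpha for v in nodes}
        for u in nodes:
            # A dangling node spreads its score uniformly over all nodes.
            targets = out_links[u] or nodes
            share = (1.0 - alpha) * pr[u] / len(targets)
            for v in targets:
                nxt[v] += share
        pr = nxt
    return pr


def is_valid_output(S, pr, delta, c):
    """Check the contract: every node with score >= delta is in S,
    and no node with score < delta / c is in S."""
    S = set(S)
    must_include = {v for v, score in pr.items() if score >= delta}
    must_exclude = {v for v, score in pr.items() if score < delta / c}
    return must_include <= S and not (must_exclude & S)


# Toy graph: nodes 1..4 all link to node 0, and node 0 links back to node 1.
graph = {0: [1], 1: [0], 2: [0], 3: [0], 4: [0]}
scores = pagerank(graph)
```

With Δ = 2.3 and c = 2 on this toy graph, node 0 must be reported, nodes 2 to 4 must not be, and node 1 falls in the gap between Δ/c and Δ, so an answer may include or omit it.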
Pub Date: 2012-01-09; DOI: 10.1080/15427951.2013.828336
Wei Chen, Wenjie Fang, Guangda Hu, Michael W. Mahoney
Hyperbolicity is a property of a graph that may be viewed as a “soft” version of a tree, and recent empirical and theoretical work has suggested that many graphs arising in Internet and related data applications have hyperbolic properties. Here we consider Gromov's notion of δ-hyperbolicity and establish several positive and negative results for small-world and treelike random graph models. First, we study the hyperbolicity of the class of Kleinberg small-world random graphs with parameters n, d, and γ, where n is the number of vertices in the graph, d is the dimension of the underlying base grid B, and γ is the small-world parameter such that each node u in the graph connects to another node v in the graph with probability proportional to 1/d_B(u, v)^γ, with d_B(u, v) the grid distance from u to v in the base grid B. We show that when γ = d, the parameter value allowing efficient decentralized routing in Kleinberg's small-world network, the hyperbolic δ is [bound missing in the source] with probability 1−o(1) for every ϵ > 0 independent of n. We see that hyperbolicity is not significantly improved in relation to graph diameter even when the long-range connections greatly improve decentralized navigation. We also show that for other values of γ, the hyperbolic δ is very close to the graph diameter, indicating poor hyperbolicity in these graphs as well. Next we study a class of treelike graphs called ringed trees that have constant hyperbolicity. We show that adding random links among the leaves in a manner similar to the small-world graph constructions may easily destroy the hyperbolicity of the graphs, except for a class of random edges added using an exponentially decaying probability function based on the ring distance among the leaves. Our study provides one of the first significant analytic results on the hyperbolicity of a rich class of random graphs, which sheds light on the relationship between hyperbolicity and navigability of random graphs, as well as on the sensitivity of hyperbolic δ to noise in random graphs.
Title: On the Hyperbolicity of Small-World and Treelike Random Graphs. Internet Mathematics, vol. 9, pp. 434-491.
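For readers unfamiliar with Gromov's δ, a brute-force sketch may help. It assumes the common four-point formulation: for every four vertices, form the three pairwise distance sums, and take δ as half the largest gap between the two biggest sums. The abstract does not say which equivalent definition the authors use, so the constant here is an assumption, and the O(n^4) loop is only meant for tiny graphs.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from source in an unweighted, connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def four_point_delta(adj):
    """Half the largest gap between the two biggest distance sums
    over all vertex quadruples (four-point condition, assumed form)."""
    d = {u: bfs_distances(adj, u) for u in adj}
    worst = 0
    for x, y, z, w in combinations(adj, 4):
        sums = sorted([d[x][y] + d[z][w],
                       d[x][z] + d[y][w],
                       d[x][w] + d[y][z]])
        worst = max(worst, sums[2] - sums[1])
    return worst / 2


# A star (a tree) versus a 4-cycle, both as adjacency lists.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
```

Under this definition a tree gives δ = 0, while the 4-cycle gives a strictly positive value, which is the sense in which δ measures distance from being treelike.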
Pub Date: 2011-11-28; DOI: 10.1080/15427951.2011.604559
T. Wittkop, S. Rahmann, Richard Röttger, Sebastian Böcker, J. Baumbach
Partitioning biological data objects into groups such that the objects within the groups share common traits is a longstanding challenge in computational biology. Recently, we developed and established transitivity clustering, a partitioning approach based on weighted transitive graph projection that utilizes a single similarity threshold as density parameter. In previous publications, we concentrated on the graphical user interface and on concrete biomedical application protocols. Here, we contribute the following theoretical considerations: (1) We provide proofs that the average similarity between objects from the same cluster is above the user-given threshold and that the average similarity between objects from different clusters is below the threshold. (2) We extend transitivity clustering to an overlapping clustering tool by integrating two new approaches. (3) We demonstrate the power of transitivity clustering for protein-complex detection. We evaluate our approaches against others by utilizing gold-standard data that was previously used by Brohée et al. for reviewing existing bioinformatics clustering tools. The extended version of this article is available online at http://transclust.mpi-inf.mpg.de.
Title: Extension and Robustness of Transitivity Clustering for Protein–Protein Interaction Network Analysis. Internet Mathematics, vol. 7, pp. 255-273.
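The idea of projecting a thresholded similarity graph onto a transitive graph (a disjoint union of cliques) can be illustrated by exhaustive search on a handful of objects. This is a sketch under stated assumptions, not the authors' solver: the cost function (the distance |s − t| to the threshold for every pair whose status has to change) and all names are assumptions, and enumerating every partition is only feasible for very small inputs.

```python
def partitions(items):
    """Yield every partition of a list into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part


def edit_cost(part, sim, t):
    """Assumed cost: pay |s - t| for each pair joined despite s < t
    or separated despite s > t."""
    label = {v: i for i, block in enumerate(part) for v in block}
    total = 0.0
    for (u, v), s in sim.items():
        same = label[u] == label[v]
        if same and s < t:
            total += t - s
        elif not same and s > t:
            total += s - t
    return total


def best_partition(items, sim, t):
    """Cheapest partition by exhaustive search, in a canonical sorted form."""
    best = min(partitions(items), key=lambda p: edit_cost(p, sim, t))
    return sorted(sorted(block) for block in best)


def average_similarity(part, sim, within):
    """Mean similarity over pairs inside clusters (within=True)
    or across clusters (within=False)."""
    label = {v: i for i, block in enumerate(part) for v in block}
    values = [s for (u, v), s in sim.items()
              if (label[u] == label[v]) == within]
    return sum(values) / len(values)


# Toy similarities: a-b and a-c are above the threshold, b-c is just below it.
sim = {("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "c"): 0.4,
       ("a", "d"): 0.1, ("b", "d"): 0.2, ("c", "d"): 0.3}
clusters = best_partition(["a", "b", "c", "d"], sim, 0.5)
```

On this toy input the cheapest repair is to treat b and c as similar, and the resulting clusters show the property stated in point (1) of the abstract: the average similarity within clusters is above the threshold and the average between clusters is below it.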
Pub Date: 2011-11-28; DOI: 10.1080/15427951.2011.604561
L. Wong
While sequence homology search has been the main workhorse in protein function prediction, it is not applicable to a significant portion of novel proteins that do not have informative homologues in sequence databases. Similarly, while statistical tests and learning algorithms based purely on gene expression profiles have been popular for analyzing disease samples, critical issues remain in the understanding of diseases based on the differentially expressed genes suggested by these methods. In the past decade, a large number of databases providing information on various types of biological networks have become available. These databases make it possible to tackle these and other biological problems in novel ways. This paper presents a review of biological network databases and approaches to protein function prediction and gene expression profile analysis that are based on biological networks.
Title: Using Biological Networks in Protein Function Prediction and Gene Expression Analysis. Internet Mathematics, vol. 7, pp. 274-298.
Pub Date: 2011-11-28; DOI: 10.1080/15427951.2011.604548
N. Alcaraz, Hande Küçük, Jochen Weile, A. Wipat, J. Baumbach
Recent advances in systems biology have provided us with massive amounts of pathway data that describe the interplay of genes and their products. The resulting biological networks can be modeled as graphs. By means of “omics” technologies, such as microarrays, the activity of genes and proteins can be measured. Here, data from microarray experiments is integrated with the network data to gain deeper insights into gene expression. We introduce KeyPathwayMiner, a method that enables the extraction and visualization of interesting subpathways given the results of a series of gene expression studies. We aim to detect highly connected subnetworks in which most genes or proteins show similar patterns of expression. Specifically, given network and gene expression data, KeyPathwayMiner identifies those maximal subgraphs where all but k nodes of the subnetwork are expressed similarly in all but l cases in the gene expression data. Since identifying these subgraphs is computationally intensive, we developed a heuristic algorithm based on Ant Colony Optimization. We implemented KeyPathwayMiner as a plug-in for Cytoscape. Our computational model is related to a strategy presented by Ulitsky et al. in 2008. Consequently, we used the same data sets for evaluation. KeyPathwayMiner is available online at http://keypathwayminer.mpi-inf.mpg.de.
Title: KeyPathwayMiner: Detecting Case-Specific Biological Pathways Using Expression Data. Internet Mathematics, vol. 7, pp. 299-313.
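The "all but k nodes, all but l cases" condition can be written as a short feasibility check. The sketch below only tests a candidate node set; it does not search for maximal subgraphs and contains no Ant Colony Optimization. It assumes expression data has already been reduced to a 0/1 indicator per gene and case (1 meaning the gene shows the pattern of interest), which is one possible reading of "expressed similarly", and all names are hypothetical.

```python
def is_connected(adj, nodes):
    """True if the subgraph induced by `nodes` is connected."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def satisfies_k_l(adj, active, nodes, k, l):
    """Check a candidate subnetwork: it must be connected, and at most k of
    its nodes may be 'exception' nodes, i.e. active in fewer than
    (number of cases - l) cases."""
    nodes = set(nodes)
    if not nodes or not is_connected(adj, nodes):
        return False
    n_cases = len(next(iter(active.values())))
    exceptions = sum(1 for v in nodes if sum(active[v]) < n_cases - l)
    return exceptions <= k


# Toy network: a path A - B - C - D with four expression cases per gene.
network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
active = {"A": [1, 1, 1, 1], "B": [1, 1, 1, 0],
          "C": [0, 0, 0, 0], "D": [1, 1, 1, 1]}
```

Here the full path qualifies for k = 1, l = 1 because gene C is the only exception node, while A and D alone never qualify because they are not connected without B and C.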
Pub Date: 2011-11-28; DOI: 10.1080/15427951.2011.604284
J. J. Crofts, D. Higham
Hierarchical organization is a common feature of many directed networks arising in nature and technology. For example, a well-defined message-passing framework based on managerial status typically exists in a business organization. However, in many real-world networks, such patterns of hierarchy are unlikely to be quite so transparent. Because of the way in which empirical data are collated, the nodes will often be ordered so as to obscure any underlying structure. In addition, the possibility of even a small number of links violating any overall “chain of command” makes the determination of such structures extremely challenging. Here we address the issue of how to reorder a directed network to reveal this type of hierarchy. In doing so, we also look at the task of quantifying the level of hierarchy, given a particular node ordering. We look at a variety of approaches. Using ideas from the graph Laplacian literature, we show that a relevant discrete optimization problem leads to a natural hierarchical node ranking. We also show that this ranking arises via a maximum likelihood problem associated with a new range-dependent hierarchical random-graph model. This random-graph insight allows us to compute a likelihood ratio that quantifies the overall tendency for a given network to be hierarchical. We also develop a generalization of this node-ordering algorithm based on the combinatorics of directed walks. In passing, we note that Google's PageRank algorithm tackles a closely related problem, and may also be motivated from a combinatoric, walk-counting viewpoint. We illustrate the performance of the resulting algorithms on synthetic network data, and on a real-world network from neuroscience where results may be validated biologically.
Title: Googling the Brain: Discovering Hierarchical and Asymmetric Network Structures, with Applications in Neuroscience. Internet Mathematics, vol. 7, pp. 233-254.
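The two tasks named in the abstract, scoring a given node ordering and reordering to reveal hierarchy, can be illustrated on a toy "chain of command". The measure used below (the fraction of links that point from an earlier node to a later one) and the exhaustive search over orderings are stand-ins chosen for simplicity. They are not the Laplacian-based ranking or the likelihood ratio from the paper, and the node names are made up.

```python
from itertools import permutations


def forward_fraction(edges, order):
    """Fraction of directed links (u, v) in which u precedes v in `order`."""
    position = {node: i for i, node in enumerate(order)}
    forward = sum(1 for u, v in edges if position[u] < position[v])
    return forward / len(edges)


def best_order_bruteforce(edges, nodes):
    """Ordering that maximizes forward_fraction; only for tiny networks."""
    return max(permutations(nodes), key=lambda o: forward_fraction(edges, o))


# A small chain of command with one link (w1 -> boss) that violates it.
edges = [("boss", "m1"), ("boss", "m2"),
         ("m1", "w1"), ("m2", "w2"),
         ("w1", "boss")]
scrambled = ["w1", "w2", "m1", "m2", "boss"]
recovered = best_order_bruteforce(edges, ["boss", "m1", "m2", "w1", "w2"])
```

The scrambled ordering hides the structure, while the recovered ordering makes every link but one point forward. Because boss, m1, and w1 form a directed cycle, no ordering can do better than that.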
Pub Date: 2011-11-28; DOI: 10.1080/15427951.2011.604554
Michael Elberfeld, V. Bafna, Iftah Gamzu, Alexander Medvedovsky, D. Segev, Dana Silverbush, Uri Zwick, R. Sharan
We introduce a graph-orientation problem arising in the study of biological networks. Given an undirected graph and a list of ordered source–target vertex pairs, the goal is to orient the graph such that a maximum number of pairs admit a directed source-to-target path. We study the complexity and approximability of this problem. We show that the problem is NP-hard even on star graphs and hard to approximate to within some constant factor. On the positive side, we provide an Ω(log log n/log n) factor approximation algorithm for the problem on n-vertex graphs. We further show that for any instance of the problem there exists an orientation of the input graph that satisfies a logarithmic fraction of all pairs and that this bound is tight up to a constant factor. Our techniques also lead to constant-factor approximation algorithms for some restricted variants of the problem.
Title: On the Approximability of Reachability-Preserving Network Orientations. Internet Mathematics, vol. 7, pp. 209-232.
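The orientation problem is easy to state in code, even though it is hard to solve at scale. The sketch below tries all 2^m orientations of a tiny graph and counts the source-target pairs that end up connected by a directed path. It is an exhaustive baseline for illustration only, not the approximation algorithm from the paper, and the function names and the example instance are assumptions.

```python
from itertools import product


def reachable(arcs, source, target):
    """True if a directed path leads from source to target along `arcs`."""
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        if u == target:
            return True
        for x, y in arcs:
            if x == u and y not in seen:
                seen.add(y)
                stack.append(y)
    return False


def best_orientation(edges, pairs):
    """Try every orientation of the undirected edges and keep the one
    that satisfies the most (source, target) pairs."""
    best_count, best_arcs = -1, None
    for flips in product([False, True], repeat=len(edges)):
        arcs = [(v, u) if flip else (u, v)
                for (u, v), flip in zip(edges, flips)]
        count = sum(reachable(arcs, s, t) for s, t in pairs)
        if count > best_count:
            best_count, best_arcs = count, arcs
    return best_count, best_arcs


# A star with center "c": the pairs (a, b) and (b, a) need opposite
# directions on the same two edges, so they cannot both be satisfied.
star_edges = [("c", "a"), ("c", "b"), ("c", "d")]
pairs = [("a", "b"), ("b", "a"), ("a", "d")]
count, arcs = best_orientation(star_edges, pairs)
```

Even on this three-leaf star the pairs compete for edge directions, which gives some intuition for why the abstract singles out star graphs as already hard.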
Pub Date: 2011-11-28; DOI: 10.1080/15427951.2011.621769
Natasa Przulj
In this special issue on biological networks, we aim to interest the readership of Internet Mathematics in network theory applied to bioinformatics. Network biology is a new, fast-growing research area, spurred by the collection of biological data representing connections or interactions of molecules in the cell. As such, it has the potential to have at least as profound an impact on our understanding of the cell as sequence data has had. However, the datasets are large and noisy, and many graph-theoretic problems are formally intractable (impossible to solve exactly in any time less than the age of the universe), so heuristics must be developed to find approximate solutions. Furthermore, the tools developed to solve these problems must be made accessible to biological practitioners. In this direction, this issue contains papers on the many databases available, theoretical and algorithmic advances in analyzing these data, as well as papers on some specific biomedical applications, and two papers introducing software tools. This issue presents six papers from some of the leading research groups in the area. Three papers present significant theoretical advances in techniques. Two of them (Elberfeld et al.; Crofts and Higham) look at directed graphs. First, Elberfeld et al. attack the “maximum graph orientation problem”, in which, given a list of source-sink pairs of nodes, we attempt to add direction to an undirected graph in such a way as to maximize the number of pairs for which directed paths exist from the source to the sink. This has applications in the problem of learning biological pathways, but Elberfeld et al. show that the problem is NP-hard
Introduction to the Special Issue on Biological Networks. Natasa Przulj. Internet Mathematics, vol. 7, pp. 207-208. Pub Date: 2011-11-28. DOI: 10.1080/15427951.2011.621769
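As a rough illustration of the maximum graph orientation problem described in the introduction above, the following sketch tries every orientation of a small undirected graph and counts how many source-sink pairs end up connected by a directed path. This is plain exhaustive search, which takes time exponential in the number of edges; it is not the method of Elberfeld et al. The function names (`reachable`, `best_orientation`) and the toy graph are assumptions made for illustration only.

```python
from itertools import product


def reachable(adj, source, sink):
    """Depth-first search: is there a directed path from source to sink?"""
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == sink:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def best_orientation(edges, pairs):
    """Try all 2**len(edges) orientations of the undirected edges.

    Returns (number of satisfied pairs, first orientation achieving it).
    """
    best_count, best_edges = -1, None
    for flips in product((False, True), repeat=len(edges)):
        # Each flag decides whether an edge (u, v) is directed u->v or v->u.
        directed = [(v, u) if flip else (u, v)
                    for (u, v), flip in zip(edges, flips)]
        adj = {}
        for u, v in directed:
            adj.setdefault(u, []).append(v)
        count = sum(reachable(adj, s, t) for s, t in pairs)
        if count > best_count:
            best_count, best_edges = count, directed
    return best_count, best_edges


# Hypothetical toy instance: the path a - b - c with three requested pairs.
edges = [("a", "b"), ("b", "c")]
pairs = [("a", "c"), ("c", "a"), ("a", "b")]
count, orientation = best_orientation(edges, pairs)
print(count, orientation)  # 2 [('a', 'b'), ('b', 'c')]
```

On this toy path, the pairs (a, c) and (c, a) cannot both be satisfied, so at most two of the three pairs can be connected. The exponential enumeration is only workable for a handful of edges, which is consistent with the need for heuristics noted in the introduction.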
Pub Date: 2011-11-28 DOI: 10.1080/15427951.2011.604289
A. Djebbari, Muhammad Ali, D. Otasek, M. Kotlyar, Kristen Fortney, Serene W. H. Wong, A. Hrvojic, I. Jurisica
Abstract Network visualization tools offer features enabling a variety of analyses to satisfy diverse requirements. Considering complexity and diversity of data and tasks, there is no single best layout, no single best file format or visualization tool: one size does not fit all. One way to cope with these dynamics is to support multiple scenarios and workflows. NAViGaTOR (Network Analysis, Visualization & Graphing TORonto) offers a complete system to manage diverse workflows from one application. It allows users to manipulate large graphs interactively using an innovative graphical user interface (GUI) and through fast layout algorithms with a small memory footprint. NAViGaTOR facilitates integrative network analysis by supporting not only visualization but also visual data mining.
NAViGaTOR: Large Scalable and Interactive Navigation and Analysis of Large Graphs. Internet Mathematics, vol. 7, pp. 314-347.