
Computational systems bioinformatics. Computational Systems Bioinformatics Conference: latest articles

A max-flow based approach to the identification of protein complexes using protein interaction and microarray data.
Jianxing Feng, Rui Jiang, Tao Jiang

The emergence of high-throughput technologies leads to abundant protein-protein interaction (PPI) data and microarray gene expression profiles, and provides a great opportunity for the identification of novel protein complexes using computational methods. Although it has been demonstrated in the literature that methods using protein-protein interaction data alone can successfully predict a large number of protein complexes, the incorporation of gene expression profiles could help refine the putative complexes and hence improve the accuracy of the computational methods. By combining protein-protein interaction data and microarray gene expression profiles, we propose a novel Graph Fragmentation Algorithm (GFA) for protein complex identification. Adapted from a classical max-flow algorithm for finding the (weighted) densest subgraphs, GFA first finds large (weighted) dense subgraphs in a protein-protein interaction network and then breaks each such subgraph into fragments iteratively by weighting its nodes appropriately in terms of their corresponding log fold changes in the microarray data, until the fragment subgraphs are sufficiently small. Our extensive tests on three widely used protein-protein interaction datasets and comparisons with the latest methods for protein complex identification demonstrate the superior performance of our method in terms of accuracy, efficiency, and capability in predicting novel protein complexes. Given the high specificity (or precision) that our method has achieved, we conjecture that our prediction results imply more than 200 novel protein complexes.
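The abstract builds on finding dense subgraphs of an interaction network. As a rough illustration of that first step only, the sketch below uses greedy peeling, a simpler well-known approximation that is swapped in here for the max-flow procedure the authors adapt; it is not GFA, it ignores node weights and expression data, and the function name `densest_subgraph_peeling` is hypothetical.

```python
from collections import defaultdict


def densest_subgraph_peeling(edges):
    """Greedy peeling (an approximation, NOT the max-flow method of the paper).

    Repeatedly removes a minimum-degree node and remembers the intermediate
    node set with the highest density |E| / |V|.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    nodes = set(adj)
    n_edges = sum(len(nb) for nb in adj.values()) // 2
    best_density, best_nodes = 0.0, set(nodes)

    while nodes:
        density = n_edges / len(nodes)
        if density > best_density:
            best_density, best_nodes = density, set(nodes)
        # Pick a minimum-degree node; ties are broken by name for determinism.
        u = min(nodes, key=lambda x: (len(adj[x]), str(x)))
        for v in adj[u]:
            adj[v].discard(u)
        n_edges -= len(adj[u])
        nodes.remove(u)
        del adj[u]

    return best_nodes, best_density


# Toy network: a 4-clique {a, b, c, d} with a pendant path d-e-f.
toy_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"),
             ("b", "d"), ("c", "d"), ("d", "e"), ("e", "f")]
print(densest_subgraph_peeling(toy_edges))
```

On the toy network the peeling order strips the pendant path first, so the clique is reported as the densest node set. An exact or weighted variant would need the flow-based construction the abstract refers to.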

Computational Systems Bioinformatics Conference, vol. 7, pp. 51-62. Published 2008-01-01.
Citations: 0
Detecting pathways transcriptionally correlated with clinical parameters.
Igor Ulitsky, Ron Shamir

The recent explosion in the number of clinical studies involving microarray data calls for novel computational methods for their dissection. Human protein interaction networks are rapidly growing and can assist in the extraction of functional modules from microarray data. We describe a novel methodology for extraction of connected network modules with coherent gene expression patterns that are correlated with a specific clinical parameter. Our approach suits both numerical (e.g., age or tumor size) and logical parameters (e.g., gender or mutation status). We demonstrate the method on a large breast cancer dataset, where we identify biologically-relevant modules related to nine clinical parameters including patient age, tumor size, and metastasis-free survival. Our method is capable of detecting disease-relevant pathways that could not be found using other methods. Our results support some previous hypotheses regarding the molecular pathways underlying diversity of breast tumors and suggest novel ones.

Computational Systems Bioinformatics Conference, vol. 7, pp. 249-58. Published 2008-01-01.
Citations: 0
The effect of massive gene loss following whole genome duplication on the algorithmic reconstruction of the ancestral Populus diploid.
Chunfang Zheng, P Kerr Wall, Jim Leebens-Mack, Victor A Albert, Claude dePamphilis, David Sankoff

We improve on guided genome halving algorithms so that several thousand gene sets, each containing two paralogs in the descendant T of the doubling event and their single ortholog from an undoubled reference genome R, can be analyzed to reconstruct the ancestor A of T at the time of doubling. At the same time, large numbers of defective gene sets, either missing one paralog from T or missing their ortholog in R, may be incorporated into the analysis in a consistent way. We apply this genomic rearrangement distance-based approach to the recently sequenced poplar (Populus trichocarpa) and grapevine (Vitis vinifera) genomes, as T and R respectively.

Computational Systems Bioinformatics Conference, vol. 7, pp. 261-71. Published 2008-01-01.
Citations: 0
Extensive exploration of conformational space improves Rosetta results for short protein domains.
Yaohang Li, A. Bordner, Yuan Tian, Xiuping Tao, A. Gorin
With some simplifications, computational protein folding can be understood as an optimization problem of a potential energy function on a variable space consisting of all conformations of a given protein molecule. It is well known that realistic energy potentials are very "rough" functions when expressed in the standard variables, and folding trajectories can easily be trapped in multiple local minima. We have integrated our variation of Parallel Tempering optimization into the protein folding program Rosetta in order to improve its capability to overcome energy barriers and to estimate how such improvement influences the quality of the folded protein domains. Here we report that (1) Parallel Tempering Rosetta (PTR) is significantly better in the exploration of protein structures than previous implementations of the program; (2) systematic improvements are observed across a large benchmark set in the parameters that are normally followed to estimate robustness of the folding; (3) these improvements are most dramatic in the subset of the shortest domains, where high-quality structures have been obtained for >75% of all tested sequences. Further analysis of the results will improve our understanding of protein conformational space and lead to new improvements in the protein folding methodology, while the current PTR implementation should be very efficient for short (up to approximately 80 a.a.) protein domains and therefore may find practical application in systems biology studies.
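To make the idea of Parallel Tempering on a "rough" energy function concrete, here is a minimal generic sketch. It is not the authors' variation and has nothing to do with Rosetta's actual energy terms: the one-dimensional function `rough_energy`, the temperature ladder and all parameter values are assumptions chosen only for illustration.

```python
import math
import random


def rough_energy(x):
    """Hypothetical 1-D 'rough' energy: a parabola with many local minima."""
    return 0.1 * x * x + math.sin(5.0 * x) + 1.0


def parallel_tempering(energy, temperatures, n_sweeps=2000, step=0.5, seed=0):
    """Generic replica-exchange Monte Carlo; returns the best (x, energy) seen."""
    rng = random.Random(seed)
    xs = [rng.uniform(-10.0, 10.0) for _ in temperatures]
    es = [energy(x) for x in xs]
    best_x, best_e = min(zip(xs, es), key=lambda p: p[1])

    for _ in range(n_sweeps):
        # Metropolis move inside every replica at its own temperature.
        for i, t in enumerate(temperatures):
            cand = xs[i] + rng.gauss(0.0, step)
            e_cand = energy(cand)
            if e_cand <= es[i] or rng.random() < math.exp(-(e_cand - es[i]) / t):
                xs[i], es[i] = cand, e_cand
                if e_cand < best_e:
                    best_x, best_e = cand, e_cand
        # Swap attempt between neighbouring temperatures, accepted with
        # probability min(1, exp((1/T_i - 1/T_j) * (E_i - E_j))).
        for i in range(len(temperatures) - 1):
            delta = (1.0 / temperatures[i] - 1.0 / temperatures[i + 1]) * (es[i] - es[i + 1])
            if delta >= 0 or rng.random() < math.exp(delta):
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
                es[i], es[i + 1] = es[i + 1], es[i]

    return best_x, best_e


ladder = [0.1, 0.5, 2.0, 8.0]  # assumed temperature ladder, coldest first
print(parallel_tempering(rough_energy, ladder))
```

The hot replicas cross barriers that would trap a single low-temperature trajectory, and the swap step lets good conformations migrate toward the cold end of the ladder.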
Computational Systems Bioinformatics Conference, vol. 7, pp. 203-9. Published 2008-01-01. DOI: https://doi.org/10.1142/9781848162648_0018
Citations: 7
Efficient haplotype inference from pedigrees with missing data using linear systems with disjoint-set data structures.
Xin Li, Jing Li
We study the haplotype inference problem from pedigree data under the zero recombination assumption, which is well supported by real data for tightly linked markers (i.e., single nucleotide polymorphisms (SNPs)) over a relatively large chromosome segment. We solve the problem in a rigorous mathematical manner by formulating genotype constraints as a linear system of inheritance variables. We then utilize disjoint-set structures to encode connectivity information among individuals, to detect constraints from genotypes, and to check consistency of constraints. On a tree pedigree without missing data, our algorithm can output a general solution as well as the total number of specific solutions in nearly linear time O(mn·α(n)), where m is the number of loci, n is the number of individuals and α is the inverse Ackermann function, which is a further improvement over existing algorithms. We also extend the idea to looped pedigrees and pedigrees with missing data by considering existing (partial) constraints on inheritance variables. The algorithm has been implemented in C++ and will be incorporated into our PedPhase package. Experimental results show that it can correctly identify all 0-recombinant solutions with great efficiency. Comparisons with two other popular algorithms show that the proposed algorithm achieves 10- to 10^5-fold improvements over a variety of parameter settings. The experimental study also provides empirical evidence for the complexity bounds suggested by theoretical analysis.
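The combination of binary linear constraints and disjoint-set structures can be illustrated with a parity-tracking union-find. The sketch below is a generic textbook construction, not the PedPhase implementation: it assumes constraints of the form x_a XOR x_b = c on binary variables, and the class and method names are hypothetical. Union by rank with path compression is what gives the near-constant inverse-Ackermann cost per operation.

```python
class ParityDisjointSet:
    """Disjoint-set forest that stores each element's parity relative to its root.

    Handles constraints x_a XOR x_b = c over binary variables (illustrative only).
    """

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.parity = [0] * n  # parity of an element relative to its parent

    def find(self, a):
        """Return the root of a; afterwards parity[a] is relative to that root."""
        path = []
        while self.parent[a] != a:
            path.append(a)
            a = self.parent[a]
        root = a
        acc = 0
        # Compress the path, starting with the node closest to the root.
        for node in reversed(path):
            acc ^= self.parity[node]
            self.parity[node] = acc
            self.parent[node] = root
        return root

    def add_constraint(self, a, b, c):
        """Impose x_a XOR x_b = c; return False if it contradicts earlier ones."""
        ra, rb = self.find(a), self.find(b)
        pa, pb = self.parity[a], self.parity[b]
        if ra == rb:
            return (pa ^ pb) == c  # consistency check inside one component
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.parity[rb] = pa ^ pb ^ c
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True

    def num_components(self):
        """Each component keeps one free bit, so there are 2**k solutions."""
        return sum(1 for i in range(len(self.parent)) if self.find(i) == i)


ds = ParityDisjointSet(4)
print(ds.add_constraint(0, 1, 1), ds.add_constraint(1, 2, 0))
print(ds.add_constraint(0, 2, 0))  # contradicts the two constraints above
print(2 ** ds.num_components())    # number of consistent assignments
```

A consistent system leaves one free binary choice per connected component, which is how a general solution and a solution count can be read off without enumerating assignments.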
Computational Systems Bioinformatics Conference, vol. 7, pp. 297-308. Published 2008-01-01. DOI: https://doi.org/10.1142/9781848162648_0026
Citations: 9
Graph wavelet alignment kernels for drug virtual screening.
Aaron Smalter, Jun Huan, Gerald Lushington

In this paper we introduce a novel graph classification algorithm and demonstrate its efficacy in drug design. In our method, we use graphs to model chemical structures and apply a wavelet analysis of graphs to create features capturing local graph topology. We design a novel graph kernel function that uses the created features to build predictive models for chemicals. We call the new graph kernel a graph wavelet-alignment kernel. We have evaluated the efficacy of the wavelet-alignment kernel using a set of chemical structure-activity prediction benchmarks. Our results indicate that the kernel function yields performance profiles comparable to, and sometimes exceeding, those of existing state-of-the-art chemical classification approaches. In addition, our results show that the use of wavelet functions significantly decreases the computational cost of graph kernel computation, with a more than 10-fold speedup.

Computational Systems Bioinformatics Conference, vol. 7, pp. 327-38. Published 2008-01-01.
Citations: 0
A max-flow based approach to the identification of protein complexes using protein interaction and microarray data.
Jianxing Feng, Rui Jiang, Tao Jiang
The emergence of high-throughput technologies leads to abundant protein-protein interaction (PPI) data and microarray gene expression profiles, and provides a great opportunity for the identification of novel protein complexes using computational methods. Although it has been demonstrated in the literature that methods using protein-protein interaction data alone can successfully predict a large number of protein complexes, the incorporation of gene expression profiles could help refine the putative complexes and hence improve the accuracy of the computational methods. By combining protein-protein interaction data and microarray gene expression profiles, we propose a novel Graph Fragmentation Algorithm (GFA) for protein complex identification. Adapted from a classical max-flow algorithm for finding the (weighted) densest subgraphs, GFA first finds large (weighted) dense subgraphs in a protein-protein interaction network and then breaks each such subgraph into fragments iteratively by weighting its nodes appropriately in terms of their corresponding log fold changes in the microarray data, until the fragment subgraphs are sufficiently small. Our extensive tests on three widely used protein-protein interaction datasets and comparisons with the latest methods for protein complex identification demonstrate the superior performance of our method in terms of accuracy, efficiency, and capability in predicting novel protein complexes. Given the high specificity (or precision) that our method has achieved, we conjecture that our prediction results imply more than 200 novel protein complexes.
Computational Systems Bioinformatics Conference, vol. 7, pp. 51-62. Published 2008-01-01. DOI: https://doi.org/10.1142/9781848162648_0005
Citations: 62
Improving homology models for protein-ligand binding sites.
Chris Kauffman, H. Rangwala, G. Karypis
In order to improve the prediction of protein-ligand binding sites through homology modeling, we incorporate knowledge of the binding residues into the modeling framework. Residues are identified as binding or nonbinding based on their true labels as well as labels predicted from structure and sequence. The sequence predictions were made using a support vector machine framework which employs a sophisticated window-based kernel. Binding labels are used with a very sensitive sequence alignment method to align the target and template. Relevant parameters governing the alignment process are searched for optimal values. Based on our results, homology models of the binding site can be improved if a priori knowledge of the binding residues is available. For target-template pairs with low sequence identity and high structural diversity our sequence-based prediction method provided sufficient information to realize this improvement.
Computational Systems Bioinformatics Conference, vol. 7, pp. 211-22. Published 2008-01-01. DOI: https://doi.org/10.1142/9781848162648_0019
Citations: 12
Graph wavelet alignment kernels for drug virtual screening.
Aaron M. Smalter, Jun Huan, G. Lushington
In this paper we introduce a novel graph classification algorithm and demonstrate its efficacy in drug design. In our method, we use graphs to model chemical structures and apply a wavelet analysis of graphs to create features capturing local graph topology. We design a novel graph kernel function that uses the created features to build predictive models for chemicals. We call the new graph kernel a graph wavelet-alignment kernel. We have evaluated the efficacy of the wavelet-alignment kernel using a set of chemical structure-activity prediction benchmarks. Our results indicate that the kernel function yields performance profiles comparable to, and sometimes exceeding, those of existing state-of-the-art chemical classification approaches. In addition, our results show that the use of wavelet functions significantly decreases the computational cost of graph kernel computation, with a more than 10-fold speedup.
Computational systems bioinformatics. Computational Systems Bioinformatics Conference, vol. 7, pp. 327-38. Published 2008-01-01. DOI: https://doi.org/10.1142/9781848162648_0029
Citations: 5
Predicting flexible length linear B-cell epitopes.
Yasser El-Manzalawy, Drena Dobbs, Vasant Honavar

Identifying B-cell epitopes plays an important role in vaccine design, immunodiagnostic tests, and antibody production. Therefore, computational tools for reliably predicting B-cell epitopes are highly desirable. We explore two machine learning approaches for predicting flexible length linear B-cell epitopes. The first approach utilizes four sequence kernels for determining a similarity score between an arbitrary pair of variable length sequences. The second approach utilizes four different methods of mapping a variable length sequence into a fixed length feature vector. Based on our empirical comparisons, we propose FBCPred, a novel method for predicting flexible length linear B-cell epitopes using the subsequence kernel. Our results demonstrate that FBCPred significantly outperforms all other classifiers evaluated in this study. An implementation of FBCPred and the datasets used in this study are publicly available through our linear B-cell epitope prediction server, BCPREDS, at: http://ailab.cs.iastate.edu/bcpreds/.
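
The two families of approaches can be illustrated with a small sketch. The abstract does not name the four kernels or the four mappings, so both functions here are assumed stand-ins. `composition_vector` is one common fixed-length mapping (amino acid frequencies). `spectrum_kernel` is a simple k-mer kernel, swapped in as an easier-to-read example of a sequence kernel; it is not the subsequence kernel that FBCPred uses.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition_vector(peptide):
    """Map a peptide of any length to a vector of 20 amino acid frequencies."""
    if not peptide:
        raise ValueError("peptide must be non-empty")
    counts = Counter(peptide)
    return [counts[aa] / len(peptide) for aa in AMINO_ACIDS]


def spectrum_kernel(a, b, k=2):
    """Similarity of two sequences: dot product of their k-mer count vectors."""
    kmers_a = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    kmers_b = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    return sum(n * kmers_b[kmer] for kmer, n in kmers_a.items())


# Peptides of different lengths yield vectors of the same length ...
print(len(composition_vector("ACDA")), len(composition_vector("ACDEFGHIK")))
# ... and the kernel scores a pair of sequences of unequal length directly.
print(spectrum_kernel("ACAC", "AC", k=2))
```

Either route lets a standard classifier such as a support vector machine handle epitopes of varying length: the first by fixing the input dimension, the second by supplying pairwise similarities without any explicit fixed-length representation.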

Computational systems bioinformatics. Computational Systems Bioinformatics Conference, vol. 7, pp. 121-32. Published 2008-01-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3400678/pdf/nihms147917.pdf
Citations: 0