
Proceedings of the ... Asia-Pacific bioinformatics conference: Latest Articles

Flow Model of the Protein-protein Interaction Network for Finding Credible Interactions
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0034
Kinya Okada, K. Asai, Masanori Arita
Large-scale protein-protein interactions (PPIs) detected by yeast two-hybrid (Y2H) systems are known to contain many false positives, so separating credible interactions from background noise remains an unavoidable task. In the present study, we propose a relative reliability score for PPIs as an intrinsic characteristic of the global topology of PPI networks. Our score is calculated as the dominant eigenvector of an adjacency matrix and represents the steady state of the network flow. Using this reliability score as a cut-off threshold on noisy Y2H PPI data, we extracted credible interactions with performance better than or comparable to that of previously proposed methods that are also based on network topology. The result suggests that applying the network-flow model to PPI data is useful for extracting credible interactions from noisy experimental data.
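A minimal sketch of the core computation, assuming the matrix is the symmetric protein-by-protein adjacency matrix of the Y2H network (the abstract does not say whether proteins or interactions index the matrix). Power iteration is run on A + I, which has the same eigenvectors as A but avoids oscillation on bipartite networks. The per-interaction score in `edge_scores` (product of the two endpoints' components) is a hypothetical choice for illustration, not the authors' reliability score.

```python
import math

def dominant_eigenvector(adj, max_iter=1000, tol=1e-12):
    """Power iteration for the leading eigenvector of a symmetric,
    non-negative adjacency matrix given as a list of lists."""
    n = len(adj)
    x = [1.0 / math.sqrt(n)] * n  # positive start vector
    for _ in range(max_iter):
        # Multiply by (A + I): same eigenvectors as A, but the shift makes the
        # largest eigenvalue strictly dominant even for connected bipartite graphs.
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x

def edge_scores(adj, vec):
    """Hypothetical interaction score: product of the endpoints' components."""
    n = len(adj)
    return {
        (i, j): vec[i] * vec[j]
        for i in range(n)
        for j in range(i + 1, n)
        if adj[i][j]
    }

# Toy 4-protein star network: protein 0 interacts with proteins 1, 2 and 3.
star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
scores = edge_scores(star, dominant_eigenvector(star))
```

Interactions could then be ranked by score and cut at a threshold; the abstract does not give a threshold, so it would have to be tuned against a reference set.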
Citations: 2
Fast Structural Similarity Search Based on Topology String Matching
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0036
Sung-Hee Park, D. Gilbert, K. Ryu
We describe an abstract data model of protein structures that represents protein geometry using spatial data types, and present a framework for fast structural similarity search based on matching topology strings with bipartite graph matching. The system has been implemented on top of the Oracle 9i spatial database management system. Performance was evaluated on 36 proteins from the Chew and Kedem data set and on a subset of PDB40. Our method performs well in terms of matching quality whilst executing quickly and computing the similarity search in polynomial time. This work thus shows that pre-computing string representations of the topological properties between secondary structure elements, using the spatial relationships of a spatial database management system, is practical for fast structural similarity search.
Citations: 1
Inferring Gene Regulatory Networks by Machine Learning Methods
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0027
J. Supper, H. Fröhlich, C. Spieth, Andreas Dräger, A. Zell
The ability to measure the transcriptional response to a stimulus has drawn much attention to the underlying gene regulatory networks. Several machine-learning methods, such as Bayesian networks and decision trees, have been proposed to deal with this difficult problem, but different algorithms have rarely been compared systematically. In this work, we critically evaluate the application of multiple linear regression, SVMs, decision trees and Bayesian networks to reconstructing the budding yeast cell cycle network. The performance of these methods is assessed by comparing the topology of the reconstructed models to a validation network. This validation network is defined a priori, and each of its interactions is supported by at least one publication. We also investigate how the quality of the network reconstruction changes when varying amounts of gene regulatory dependencies are provided a priori.
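The comparison of a reconstructed topology against a literature-curated validation network can be sketched as edge-set precision, recall and F1. This assumes directed, unweighted edges given as (regulator, target) pairs; the abstract does not say which scores the authors report, and the gene names below are placeholders.

```python
def edge_metrics(predicted, reference):
    """Precision, recall and F1 of predicted directed edges against a reference network."""
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Placeholder gene names; edges are (regulator, target) pairs.
validation = {("G1", "G2"), ("G3", "G1")}
reconstructed = {("G1", "G2"), ("G2", "G3")}
print(edge_metrics(reconstructed, validation))  # (0.5, 0.5, 0.5)
```

Treating edges as directed means a predicted G1 -> G3 would not count as recovering a validated G3 -> G1; an undirected comparison would normalize each pair first.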
Citations: 6
All Hits All The Time: Parameter Free Calculation of Seed Sensitivity
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0035
Denise Y. F. Mak, Gary Benson
Standard search techniques for DNA repeats start by identifying seeds, that is, small matching words that may lie within larger repeats. Recent innovations in seed structure have led to the development of spaced seeds [8] and indel seeds [9], which are more sensitive than contiguous seeds (also known as k-mers, k-tuples, l-words, etc.). Evaluating seed sensitivity requires 1) specifying a homology model that describes the types of alignments that can occur between two copies of a repeat, and 2) assigning probabilities to those alignments. Optimal seed selection is resource intensive because essentially all alternative seeds must be tested [7]. Current methods require the model and probability parameters to be specified in advance; when the parameters change, the entire calculation has to be rerun. In this paper, we show how to eliminate the need for prior parameter specification. The ideas presented follow from a simple observation: given a homology model, the alignments hit by a particular seed remain the same regardless of the probability parameters. Only the weights assigned to those alignments change. Therefore, if we know all the hits, we can easily (and quickly) find optimal seeds. We describe a highly efficient preprocessing step, which is computed just once for each seed. In this calculation, the strings that represent possible alignments are not weighted by any probability parameters. We then show several increasingly efficient methods for finding the optimal seed when specific probability parameters are given. Indeed, we show how to determine exactly which seeds can never be optimal under any set of probability parameters. This leads to the startling observation that, out of thousands of seeds, only a handful have any chance of being optimal. We then show how to find optimal seeds and the boundaries within probability space where they are optimal. We expect this method to greatly facilitate the study of seed space sensitivity, the construction of multiple seed sets, and the use of alternative definitions of optimality.
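A minimal sketch of the "all hits first, weights later" idea under a deliberately simplified homology model: gapless alignments written as 0/1 strings (1 = match), with each position matching independently with probability p. In that model an alignment's probability depends only on its number of matches, so the parameter-free preprocessing reduces to counting hit alignments by match count, and sensitivity for any p becomes a cheap polynomial evaluation. The paper's homology models may be richer than this Bernoulli model, exhaustive enumeration of all 2^L strings is exponential, and the function names are hypothetical.

```python
from itertools import product

def seed_hits(seed, alignment):
    """True if, at some offset, every '1' position of the seed lies on a match."""
    care = [i for i, c in enumerate(seed) if c == "1"]
    return any(
        all(alignment[offset + i] == 1 for i in care)
        for offset in range(len(alignment) - len(seed) + 1)
    )

def hit_profile(seed, length):
    """Parameter-free step, run once per seed: counts[k] is the number of
    length-`length` alignments with exactly k matches that the seed hits."""
    counts = [0] * (length + 1)
    for alignment in product((0, 1), repeat=length):
        if seed_hits(seed, alignment):
            counts[sum(alignment)] += 1
    return counts

def sensitivity(counts, p):
    """Weight the stored hits by the Bernoulli match probability p."""
    length = len(counts) - 1
    return sum(c * p**k * (1 - p) ** (length - k) for k, c in enumerate(counts))

profile = hit_profile("101", 8)
for p in (0.6, 0.7, 0.8):  # re-weighting needs no re-enumeration
    print(p, sensitivity(profile, p))
```

For example, `hit_profile("11011", 16)` and `hit_profile("1111", 16)` describe a spaced and a contiguous seed of equal weight; both profiles are computed once and can then be compared at any number of match probabilities.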
Citations: 15
Semi-supervised Pattern Learning for Extracting Relations from Bioscience Texts
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0033
Shilin Ding, Minlie Huang, Xiaoyan Zhu
A variety of pattern-based methods have been used to extract biological relations from the literature. Many of them require significant domain-specific knowledge to build patterns by hand, or a large amount of labeled data to learn patterns automatically. In this paper, we present a semi-supervised model that combines unlabeled and labeled data in the pattern-learning procedure. First, a large amount of unlabeled data is used to generate a raw pattern set. This set is then refined in an evaluation phase by incorporating the domain knowledge provided by a relatively small labeled dataset. Comparative results show that labeled data, when used in conjunction with inexpensive unlabeled data, can considerably improve learning accuracy.
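A toy sketch of the two-phase procedure described above: raw patterns are harvested from unlabeled sentences, then filtered by their precision on a small labeled set. The pattern representation (the tokens between two entity mentions), the frequency and precision thresholds, and all function names are assumptions for illustration; the abstract does not describe the authors' pattern format or scoring.

```python
from collections import Counter

def between_pattern(sentence, e1, e2):
    """Hypothetical pattern: the tokens strictly between entity e1 and a later e2."""
    tokens = sentence.split()
    if e1 not in tokens or e2 not in tokens:
        return None
    i, j = tokens.index(e1), tokens.index(e2)
    return tuple(tokens[i + 1:j]) if i < j else None

def generate_patterns(unlabeled, min_count=2):
    """Phase 1: keep patterns seen at least min_count times in unlabeled triples."""
    counts = Counter()
    for sentence, e1, e2 in unlabeled:
        pattern = between_pattern(sentence, e1, e2)
        if pattern:  # skip missing or empty patterns
            counts[pattern] += 1
    return {p for p, c in counts.items() if c >= min_count}

def refine_patterns(patterns, labeled, min_precision=0.5):
    """Phase 2: keep patterns whose matches in the small labeled set are
    mostly true relations; patterns never matched there are dropped."""
    hits, correct = Counter(), Counter()
    for sentence, e1, e2, is_relation in labeled:
        pattern = between_pattern(sentence, e1, e2)
        if pattern in patterns:
            hits[pattern] += 1
            correct[pattern] += int(is_relation)
    return {p for p in patterns if hits[p] and correct[p] / hits[p] >= min_precision}
```

Dropping patterns that never match the labeled set is a conservative design choice; a variant could keep them with a frequency-based prior instead.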
Citations: 5
Exploring Genomes of Distantly Related Mammals
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0001
J. Graves
There are three groups of extant mammals, two of which abound in Australia. Marsupials (kangaroos and their relatives) and monotremes (echidna and the fabulous platypus) have been evolving independently for most of mammalian history. The genomes of marsupial and monotreme mammals are particularly valuable because these alternative mammals fill a phylogenetic gap among the vertebrate species lined up for exhaustive genomic study. Humans and mice (∼70 MY) are too close to distinguish signal, whereas mammal/bird comparisons (∼310 MY) are too distant to allow alignment. Kangaroos (180 MY) and platypus (210 MY) are just right. Their sequences have diverged sufficiently for stringent detection of homologies that can reveal coding regions and regulatory signals. Importantly, marsupials and monotremes share with humans many mammal-specific developmental pathways and regulatory systems, such as sex determination, lactation and X chromosome inactivation. The ARC Centre for Kangaroo Genomics is characterizing the genome of the model Australian kangaroo Macropus eugenii (the tammar wallaby), which is being sequenced by AGRF in Australia and by Baylor (funded by the NIH) in the US. We are developing detailed physical and linkage maps of the genome to complement sequencing, and will prepare and array cDNAs for functional studies, especially of reproduction and development. Complete sequencing of the distantly related Brazilian short-tailed opossum Monodelphis domestica by the NIH allows us to compare distantly related marsupials. Sequencing of the genome of the platypus, Ornithorhynchus anatinus, by Washington University (funded by the NIH) is complete, and our lab is anchoring contigs to the physical map. We have isolated and completely characterized many BACs and cDNAs containing kangaroo and platypus genes of interest, and we demonstrate the value of such comparisons for revealing conserved genome organization and function and for gaining new insights into the evolution of the mammalian genome, particularly the sex chromosomes.
Citations: 0
Deriving Protein Structure Topology from the Helix Skeleton in Low Resolution Density Map using Rosetta
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0017
Y. Lu, Jing He, C. Strauss
Electron cryo-microscopy (cryo-EM) is an experimental technique for determining the three-dimensional structure of large protein complexes. Currently this technique can generate protein density maps at 6 to 9 Å resolution. Although secondary structures such as α-helices and β-sheets can be visualized in these maps, there is no mature approach for deducing their tertiary topology, that is, the linear order of the secondary structures along the sequence. The problem is challenging because, given $N$ secondary structure elements, the number of possible orders is $2^N \cdot N!$. We have developed a method that predicts the topology of the secondary structures using ab initio structure prediction. The Rosetta structure prediction algorithm was used to make purely sequence-based structure predictions for the protein. We produced 1000 of these ab initio models and then screened the models produced by Rosetta for agreement with the helix skeleton derived from the density map. The method was benchmarked on 60 mainly α-helical proteins. For about 3/4 of the proteins, the majority of the helices in the skeleton were correctly assigned by one of the top 10 topologies suggested by the method, while for about 1/3 of the proteins the best error-free topology assignment was ranked first. This approach also provides an estimate of the sequence alignment of the skeleton. For most of the true-positive assignments, the alignment was accurate to within ±2 amino acids in the sequence.
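The size of the search space can be checked by brute force for small N. This assumes, as our reading of the count quoted in the abstract, that a topology assigns each of the N skeleton helices to one sequence helix in some order and picks one of two directions for each, giving N! orders times 2^N direction choices. The function names are illustrative.

```python
from itertools import permutations, product
from math import factorial

def candidate_topologies(n):
    """Enumerate (order, directions) for n skeleton helices: which sequence
    helix each stick is assigned to, and whether it runs N->C or C->N."""
    for order in permutations(range(n)):
        for directions in product(("N->C", "C->N"), repeat=n):
            yield order, directions

def topology_count(n):
    """Closed form: n! orderings times 2**n direction choices."""
    return factorial(n) * 2**n

assert sum(1 for _ in candidate_topologies(3)) == topology_count(3) == 48
```

The count grows very quickly: `topology_count(10)` is 3,715,891,200, which illustrates why exhaustive enumeration of topologies becomes impractical even for modest numbers of helices.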
Citations: 3
Combining N-grams and Alignment in G-protein Coupling Specificity Prediction
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0038
B. Cheng, J. Carbonell
G-protein coupled receptors (GPCRs) interact with G-proteins to regulate much of the cell's response to external stimuli, and abnormalities in this signaling cause numerous diseases. We developed a new method to predict, from a receptor's residue sequence, the families of G-proteins with which it interacts. We combine alignment and n-gram features. The former capture long-range interactions but assume that the linear ordering of conserved segments is preserved; the latter make no such assumption but cannot capture long-range interactions. By combining alignment and n-gram features, and by using the entire GPCR sequence (instead of intracellular regions alone, as others have done), our method outperformed the current state of the art in precision, recall and F1, attaining an F1 of 0.753 and an accuracy of 0.796 on the PTbase 2004 dataset. Moreover, analysis of our results shows that most of the coupling specificity information lies at the beginning of the 2nd intracellular loop and along the length of the 3rd.
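A minimal sketch of the n-gram half of the feature set: counts of every contiguous residue n-gram up to a chosen length, taken over the full receptor sequence as the abstract describes. The maximum n, the use of raw counts, and the function names are assumptions; the alignment features would come from a separate aligner and are not shown.

```python
from collections import Counter

def ngram_features(sequence, max_n=3):
    """Count every contiguous residue n-gram, n = 1..max_n, over the whole sequence."""
    features = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(sequence) - n + 1):
            features[sequence[i:i + n]] += 1
    return features

def to_vector(features, vocabulary):
    """Fixed-order vector over a shared n-gram vocabulary, for a downstream classifier."""
    return [features.get(gram, 0) for gram in vocabulary]
```

Because n-gram counts ignore where a motif occurs, they make no assumption about the order of conserved segments, which matches the trade-off described above; the vocabulary would normally be built from the training sequences.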
Citations: 2
Computing the Quartet Distance Between Evolutionary Trees of Bounded Degree
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0013
M. Stissing, Christian N. S. Pedersen, T. Mailund, G. Brodal, Rolf Fagerberg
We present an algorithm for calculating the quartet distance between two evolutionary trees of bounded degree on a common set of $n$ species. For trees in which no node has degree greater than $d$, the previous best algorithm has running time $O(d^2 n^2)$. The algorithm developed herein has running time $O(d^9 n \log n)$, which makes it the first algorithm for computing the quartet distance between non-binary trees with a sub-quadratic worst-case running time.
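For reference, the quantity being computed can be written directly from its definition: the number of 4-leaf subsets whose induced topology (one of three butterflies, or an unresolved star when the tree has high-degree nodes) differs between the two trees. The sketch below is the naive baseline that examines all O(n^4) quartets. It is not the sub-quadratic algorithm of the paper. Each quartet's topology is identified with the four-point condition on unit-length path distances, and trees are given as undirected edge lists with hypothetical node names.

```python
from collections import deque
from itertools import combinations

def tree_from_edges(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj

def bfs_distances(adj, source):
    """Unit-length path distances from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def quartet_topology(dist, a, b, c, d):
    """Butterfly split as a frozenset of two pairs, or None for a star quartet."""
    splits = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    cost = [dist[p][q] + dist[r][s] for (p, q), (r, s) in splits]
    ordered = sorted(cost)
    if ordered[0] == ordered[1]:
        return None  # in a tree the two largest sums tie, so all three tie: star
    best = cost.index(ordered[0])
    return frozenset(frozenset(pair) for pair in splits[best])

def quartet_distance(tree1, tree2, leaves):
    """Number of 4-leaf subsets whose topology differs between the trees."""
    d1 = {x: bfs_distances(tree1, x) for x in leaves}
    d2 = {x: bfs_distances(tree2, x) for x in leaves}
    return sum(
        quartet_topology(d1, *q) != quartet_topology(d2, *q)
        for q in combinations(sorted(leaves), 4)
    )
```

A butterfly in one tree and a star in the other counts as a difference, which is the usual convention for non-binary trees.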
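To make the quantity concrete, here is a brute-force sketch based on the standard definition. The quartet distance counts the 4-leaf subsets whose induced topology (one of the three resolved splits, or an unresolved star in a non-binary tree) differs between the two trees. This is not the sub-quadratic algorithm of the paper. It examines all $\binom{n}{4}$ quartets and classifies each with the four-point condition on unit edge lengths. Trees are assumed to be unrooted and given as adjacency sets, and the helper names are hypothetical.

```python
from collections import deque
from itertools import combinations


def tree_from_edges(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def all_pairs_hops(adj):
    """BFS from every node: path lengths in the tree, counting each edge as 1."""
    dist = {}
    for src in adj:
        d = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist[src] = d
    return dist


def quartet_topology(dist, a, b, c, d):
    """Return the resolved split as a set of two pairs, or None for a star quartet.

    Four-point condition: the two largest pair-sums are equal; the split whose
    sum is strictly smallest is the induced topology. All three equal => star.
    """
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    sums = [dist[p][q] + dist[r][s] for (p, q), (r, s) in pairings]
    smallest = min(sums)
    if sums.count(smallest) > 1:
        return None
    (p, q), (r, s) = pairings[sums.index(smallest)]
    return frozenset({frozenset({p, q}), frozenset({r, s})})


def quartet_distance(adj1, adj2, leaves):
    """Number of 4-leaf subsets whose induced topology differs between the trees."""
    d1, d2 = all_pairs_hops(adj1), all_pairs_hops(adj2)
    return sum(
        quartet_topology(d1, *q) != quartet_topology(d2, *q)
        for q in combinations(sorted(leaves), 4)
    )


# Binary tree ((1,2),3,(4,5)) versus the same tree with edge y-z collapsed.
t1 = tree_from_edges([("1", "x"), ("2", "x"), ("x", "y"), ("3", "y"),
                      ("y", "z"), ("4", "z"), ("5", "z")])
t2 = tree_from_edges([("1", "x"), ("2", "x"), ("x", "y"), ("3", "y"),
                      ("4", "y"), ("5", "y")])
print(quartet_distance(t1, t2, ["1", "2", "3", "4", "5"]))  # 2
```

In the example, collapsing the edge that separates {4, 5} from 3 turns the quartets {1,3,4,5} and {2,3,4,5} into stars, so the distance is 2. A slow reference like this can be handy for checking a faster implementation on small random trees.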
Citations: 17
Protein Structure-Structure Alignment with Discrete Fréchet Distance
Pub Date : 2007-01-01 DOI: 10.1142/9781860947995_0016
Minghui Jiang, Ying Xu, B. Zhu
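No abstract is listed for this paper. For background on the distance named in its title, here is a sketch of the standard $O(nm)$ dynamic program for the discrete Fréchet distance between two point sequences, which could be applied, for example, to the Cα coordinates of two backbones. It assumes the structures are already superimposed: it does not search over rotations or translations, and it is not the alignment method of this paper.

```python
import math


def discrete_frechet(P, Q):
    """Discrete Fréchet distance between two non-empty point sequences.

    ca[i][j] is the smallest achievable "longest leash" when the coupling of
    P[0..i] and Q[0..j] ends at the pair (P[i], Q[j]); each step advances one
    or both sequences.
    """
    n, m = len(P), len(Q)
    if n == 0 or m == 0:
        raise ValueError("both chains must be non-empty")
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(P[i], Q[j])  # Euclidean distance (Python 3.8+)
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]


# Two parallel three-point chains one unit apart.
print(discrete_frechet([(0, 0, 0), (1, 0, 0), (2, 0, 0)],
                       [(0, 1, 0), (1, 1, 0), (2, 1, 0)]))  # 1.0
```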
Citations: 15
Journal: Proceedings of the ... Asia-Pacific bioinformatics conference