
Computational systems bioinformatics. Computational Systems Bioinformatics Conference: latest articles

Novel Gene Discovery in the Human Malaria Parasite using Nucleosome Positioning Data.
N Pokhriyal, N Ponts, E Y Harris, K G Le Roch, S Lonardi

Recent genome-wide studies on nucleosome positioning in model organisms have shown strong evidence that nucleosome landscapes in the proximity of protein-coding genes exhibit regular characteristic patterns. Here, we propose a computational framework to discover novel genes in the genome of the human malaria parasite P. falciparum using nucleosome positioning inferred from MAINE-seq data. We train a classifier on the nucleosome landscape profiles of experimentally verified genes and then use it to discover new genes, without considering the primary DNA sequence. Cross-validation experiments show that our classifier is very accurate. About two-thirds of the locations reported by the classifier match experimentally determined expressed sequence tags in GenBank for which no gene has been annotated in the human malaria parasite.

Source: Computational systems bioinformatics. Computational Systems Bioinformatics Conference, volume 9, pages 124-135 (published 2010-08-01). Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4112967/pdf/nihms504365.pdf
Citations: 0
Estimating support for protein-protein interaction data with applications to function prediction.
Erliang Zeng, C. Ding, G. Narasimhan, S. Holbrook
Almost every cellular process requires the interactions of pairs or larger complexes of proteins. High-throughput protein-protein interaction (PPI) data have been generated using techniques such as yeast two-hybrid systems, mass spectrometry, and many more. Such data provide a new perspective for predicting protein functions and generating protein-protein interaction networks, and many recent algorithms have been developed for this purpose. However, PPI data generated using high-throughput techniques contain a large number of false positives. In this paper, we propose a novel method to evaluate the support for PPI data based on gene ontology information. When the semantic similarity between genes is computed from gene ontology information using Resnik's formula, our results show that the PPI data can be modeled as a mixture model, predicated on the assumption that true protein-protein interactions have higher support than the false positives in the data. Semantic similarity between genes thus serves as a metric of support for PPI data. Taking this one step further, we also propose new function prediction approaches that use this support metric. These approaches outperform their conventional counterparts. New evaluation methods are also proposed.
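To make the support metric in the abstract above concrete, here is a minimal sketch of Resnik's similarity, where the similarity of two ontology terms is the information content, -log p(t), of their most informative common ancestor. The toy ontology, the term names, the probabilities, and the choice of taking the maximum over term pairs to score a gene pair are all assumptions made for illustration; the abstract does not specify them.

```python
import math

# Toy ontology (hypothetical term names): child -> set of parents.
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
}

# Hypothetical annotation probabilities p(t): fraction of genes annotated
# with t or any of its descendants. The root always has p = 1.
PROB = {
    "root": 1.0,
    "binding": 0.5,
    "catalysis": 0.5,
    "dna_binding": 0.25,
    "rna_binding": 0.125,
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """Information content of a term: -log p(t)."""
    return -math.log(PROB[term])


def resnik(t1, t2):
    """Resnik similarity: highest information content among common ancestors."""
    common = ancestors(t1) & ancestors(t2)
    return max(information_content(t) for t in common)


def gene_similarity(terms_a, terms_b):
    """Assumed gene-level score: best Resnik similarity over all term pairs."""
    return max(resnik(a, b) for a in terms_a for b in terms_b)
```

Under the mixture-model view described in the abstract, interacting pairs with a high score of this kind would be treated as better supported than pairs whose only shared ancestor is the root.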
Source: Computational systems bioinformatics. Computational Systems Bioinformatics Conference, volume 7 1, pages 73-84 (published 2008-08-01). Journal Article. DOI: https://doi.org/10.1142/9781848162648_0007
Citations: 13
A max-flow based approach to the identification of protein complexes using protein interaction and microarray data.
Jianxing Feng, Rui Jiang, Tao Jiang

The emergence of high-throughput technologies leads to abundant protein-protein interaction (PPI) data and microarray gene expression profiles, and provides a great opportunity for the identification of novel protein complexes using computational methods. Although it has been demonstrated in the literature that methods using protein-protein interaction data alone can successfully predict a large number of protein complexes, the incorporation of gene expression profiles could help refine the putative complexes and hence improve the accuracy of the computational methods. By combining protein-protein interaction data and microarray gene expression profiles, we propose a novel Graph Fragmentation Algorithm (GFA) for protein complex identification. Adapted from a classical max-flow algorithm for finding the (weighted) densest subgraphs, GFA first finds large (weighted) dense subgraphs in a protein-protein interaction network and then breaks each such subgraph into fragments iteratively by weighting its nodes appropriately in terms of their corresponding log fold changes in the microarray data, until the fragment subgraphs are sufficiently small. Our extensive tests on three widely used protein-protein interaction datasets and comparisons with the latest methods for protein complex identification demonstrate the superior performance of our method in terms of accuracy, efficiency, and capability in predicting novel protein complexes. Given the high specificity (or precision) that our method has achieved, we conjecture that our prediction results imply more than 200 novel protein complexes.
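The abstract above builds on an exact max-flow algorithm for densest subgraphs. As a rough illustration of the underlying objective only, the sketch below uses a different and simpler technique, greedy peeling (repeatedly removing the minimum-degree node), which is a known 2-approximation for the unweighted densest subgraph. It is not GFA and not the max-flow method; the function name and the toy graph are assumptions.

```python
def densest_subgraph_greedy(edges):
    """Greedy peeling for the densest subgraph (density = |E| / |V|).

    Repeatedly drops a minimum-degree node and remembers the densest
    node set seen along the way. Assumes an undirected graph without
    self-loops; parallel edges are collapsed.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n_edges = sum(len(neighbors) for neighbors in adj.values()) // 2

    best_density, best_nodes = 0.0, set()
    while adj:
        density = n_edges / len(adj)
        if density > best_density:
            best_density, best_nodes = density, set(adj)
        # Remove a node of minimum degree (ties broken by node name).
        node = min(adj, key=lambda x: (len(adj[x]), x))
        for neighbor in adj.pop(node):
            adj[neighbor].discard(node)
            n_edges -= 1
    return best_nodes, best_density


# Toy network: a 4-clique (a, b, c, d) with a short tail d-e-f.
toy_edges = [
    ("a", "b"), ("a", "c"), ("a", "d"),
    ("b", "c"), ("b", "d"), ("c", "d"),
    ("d", "e"), ("e", "f"),
]
nodes, density = densest_subgraph_greedy(toy_edges)
```

A fragmentation step in the spirit of the abstract would then re-weight the nodes of each dense subgraph by expression data and repeat; that part is not sketched here.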

Source: Computational systems bioinformatics. Computational Systems Bioinformatics Conference, volume 7, pages 51-62 (published 2008-01-01). Journal Article.
Citations: 0
MSDash: mass spectrometry database and search.
Zhan Wu, Gilles Lajoie, Bin Ma

Along with the wide application of mass spectrometry in proteomics, more and more mass spectrometry data are becoming publicly available. Several public mass spectrometry data repositories have been built on the Internet. However, most of these repositories lack effective search methods. In this paper we describe a new mass spectrometry data library and a novel method to efficiently index and search the library for spectra that are similar to a query spectrum. A public online server has been set up and demonstrates the outstanding speed and scalability of our methods. Together with the mass spectrometry library, our searching method can improve protein identification confidence by comparing a spectrum with the ones that are already characterized in the database. The searching method can also be used alone to cluster similar spectra in a mass spectrometry dataset, in order to improve the speed and accuracy of protein identification or quantification.
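The abstract above does not say how spectra are compared or indexed, so the following is only a baseline sketch of what "spectra similar to a query spectrum" can mean: peaks are summed into fixed-width m/z bins and spectra are ranked by cosine similarity with a plain linear scan. The bin width, the function names, and the toy library are assumptions, and the linear scan is precisely the cost that an index such as the one the paper describes is meant to avoid.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)


def cosine_similarity(a, b):
    """Cosine similarity between two binned spectra (dicts bin -> intensity)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(library, query_peaks, top_k=1):
    """Linear scan: return the top_k (score, name) pairs, best first."""
    query = bin_spectrum(query_peaks)
    scored = [
        (cosine_similarity(query, bin_spectrum(peaks)), name)
        for name, peaks in library.items()
    ]
    scored.sort(reverse=True)
    return scored[:top_k]


# Hypothetical library of two spectra, each a list of (m/z, intensity) peaks.
toy_library = {
    "pepA": [(100.2, 10.0), (200.7, 5.0)],
    "pepB": [(150.1, 8.0), (300.3, 2.0)],
}
hits = search(toy_library, [(100.4, 9.0), (200.9, 6.0)])
```

The same pairwise score could also drive the clustering use case mentioned in the abstract, for example by grouping spectra whose similarity exceeds a chosen threshold.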

Source: Computational systems bioinformatics. Computational Systems Bioinformatics Conference, volume 7, pages 63-71 (published 2008-01-01). Journal Article.
Citations: 0
Error tolerant sibship reconstruction in wild populations.
Saad I Sheikh, Tanya Y Berger-Wolf, Mary V Ashley, Isabel C Caballero, Wanpracha Chaovalitwongse, Bhaskar DasGupta

Kinship analysis using genetic data is important for many biological applications, including many in conservation biology. Wide availability of microsatellites has boosted studies in wild populations that rely on the knowledge of kinship, particularly sibling relationships (sibship). While there exist many methods for reconstructing sibling relationships, almost none account for errors and mutations in microsatellite data, which are prevalent and affect the quality of reconstruction. We present an error-tolerant method for reconstructing sibling relationships based on the concept of consensus methods. We test our approach on both real and simulated data, with both pre-existing and introduced errors. Our method is highly accurate on almost all simulations, giving over 90% accuracy in most cases. Ours is the first method designed to tolerate errors while making no assumptions about the population or the sampling.
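The abstract above names consensus methods without spelling out which consensus is used, so the sketch below shows just one simple possibility: a strict consensus, in which two individuals stay in the same sibling group only if every candidate reconstruction puts them together. How the candidate reconstructions are produced (for example from different subsets of loci) is outside this sketch, and the function name and input layout are assumptions.

```python
from collections import defaultdict


def strict_consensus(partitions):
    """Strict consensus of several sibling-group reconstructions.

    Each partition is a list of groups (sets of individuals). Individuals
    end up together only if they share a group in every partition.
    Assumes every individual appears exactly once in each partition.
    """
    signature = defaultdict(list)
    for partition in partitions:
        for group_index, group in enumerate(partition):
            for individual in group:
                # Record which group this individual belongs to in this solution.
                signature[individual].append(group_index)

    groups = defaultdict(set)
    for individual, sig in signature.items():
        groups[tuple(sig)].add(individual)
    return sorted(sorted(group) for group in groups.values())


# Three hypothetical reconstructions of the same five individuals.
candidate_solutions = [
    [{1, 2, 3}, {4, 5}],
    [{1, 2}, {3}, {4, 5}],
    [{1, 2, 3}, {4}, {5}],
]
consensus = strict_consensus(candidate_solutions)
```

A strict consensus is conservative: a single genotyping error that splits a group in one candidate solution splits it in the result, which is why practical error-tolerant methods typically follow the consensus with a step that merges groups back together.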

Source: Computational systems bioinformatics. Computational Systems Bioinformatics Conference, volume 7, pages 273-84 (published 2008-01-01). Journal Article.
Citations: 0
Iterative non-sequential protein structural alignment.
Saeed Salem, Mohammed J Zaki

Structural similarity between proteins gives us insight into the evolutionary relationships between proteins that have low sequence similarity. In this paper, we present a novel approach called STSA for non-sequential pair-wise structural alignment. Starting from an initial alignment, our approach iterates over a two-step process, a superposition step and an alignment step, until convergence. Given two superposed structures, we propose a novel greedy algorithm to construct both sequential and non-sequential alignments. The quality of STSA alignments is evident in their high agreement with the reference alignments in the challenging-to-align RPIC set. Moreover, on a dataset of 4410 protein pairs selected from the CATH database, STSA achieves high sensitivity and specificity, is competitive with state-of-the-art alignment methods, and gives longer alignments with lower RMSD. The STSA software along with the data sets will be made available online at http://www.cs.rpi.edu/-zaki/software/STSA.
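As an illustration of the alignment step described in the abstract above, the sketch below pairs residues of two structures that are assumed to be already superposed, taking the closest pairs first. Because pairs are chosen by distance rather than by sequence order, the result can be non-sequential. This is a generic greedy matching written for illustration, not the STSA algorithm itself; the distance cutoff, the function names, and the toy coordinates are assumptions, and the superposition step is omitted.

```python
import math


def greedy_alignment(coords_a, coords_b, max_dist=3.0):
    """Greedily pair residues of two already-superposed structures.

    Candidate pairs within max_dist are taken closest-first, and each
    residue is used at most once. Returns a sorted list of (i, j) index pairs.
    """
    candidates = []
    for i, a in enumerate(coords_a):
        for j, b in enumerate(coords_b):
            d = math.dist(a, b)
            if d <= max_dist:
                candidates.append((d, i, j))
    candidates.sort()

    used_a, used_b, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((i, j))
    return sorted(pairs)


def rmsd(coords_a, coords_b, pairs):
    """Root-mean-square deviation over the aligned pairs."""
    if not pairs:
        raise ValueError("rmsd is undefined for an empty alignment")
    total = sum(math.dist(coords_a[i], coords_b[j]) ** 2 for i, j in pairs)
    return math.sqrt(total / len(pairs))


# Toy coordinates: residue 0 of A sits near residue 1 of B, and residue 2
# of A sits near residue 0 of B, so the best pairing is out of sequence order.
structure_a = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
structure_b = [(8.5, 0.0, 0.0), (0.5, 0.0, 0.0), (20.0, 0.0, 0.0)]
pairs = greedy_alignment(structure_a, structure_b)
```

In an iterative scheme of the kind the abstract describes, the pairs returned here would feed the next superposition, and the loop would stop once the alignment no longer changes.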

Source: Computational systems bioinformatics. Computational Systems Bioinformatics Conference, volume 7, pages 183-94 (published 2008-01-01). Journal Article.
Citations: 0
Optimizing Bayes error for protein structure model selection by stability mutagenesis.
Xiaoduan Ye, A. Friedman, C. Bailey-Kellogg
Site-directed mutagenesis affects protein stability in a manner dependent on the local structural environment of the mutated residue; e.g., a hydrophobic to polar substitution would behave differently in the core vs. on the surface of the protein. Thus site-directed mutagenesis followed by stability measurement enables evaluation of and selection among predicted structure models, based on consistency between predicted and experimental stability changes (ΔΔG° values). This paper develops a method for planning a set of individual site-directed mutations for protein structure model selection, so as to minimize the Bayes error, i.e., the probability of choosing the wrong model. While in general it is hard to calculate exactly the multi-dimensional Bayes error defined by a set of mutations, we leverage the structure of "ΔΔG° space" to develop tight upper and lower bounds. We further develop a lower bound on the Bayes error of any plan that uses a fixed number of mutations from a set of candidates. We use this bound in a branch-and-bound planning algorithm to find optimal and near-optimal plans. We demonstrate the significance and effectiveness of this approach in planning mutations for elucidating the structure of the pTfa chaperone protein from bacteriophage lambda.
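The general multi-model Bayes error in the abstract above is hard to compute, but the two-model case has a closed form that shows what the planning objective rewards. Assuming equal priors and independent Gaussian measurement noise with the same standard deviation sigma for every mutation (both assumptions of this sketch, not statements from the paper), the Bayes error is Phi(-d / (2 sigma)), where d is the Euclidean distance between the two models' predicted stability-change vectors and Phi is the standard normal CDF. The exhaustive search below stands in for the paper's branch-and-bound algorithm and is only practical for tiny candidate sets.

```python
import math
from itertools import combinations


def two_model_bayes_error(pred_a, pred_b, sigma=1.0):
    """Bayes error for choosing between two models with equal priors.

    pred_a and pred_b are the predicted stability changes of each model
    for the planned mutations; noise is assumed Gaussian with standard
    deviation sigma, independent across mutations.
    """
    distance = math.dist(pred_a, pred_b)
    # Phi(-x) = 0.5 * erfc(x / sqrt(2)) with x = distance / (2 * sigma).
    return 0.5 * math.erfc(distance / (2.0 * sigma * math.sqrt(2.0)))


def best_plan(pred_a, pred_b, k, sigma=1.0):
    """Exhaustively pick the k mutations that minimize the Bayes error."""

    def error(plan):
        return two_model_bayes_error(
            [pred_a[i] for i in plan], [pred_b[i] for i in plan], sigma
        )

    return min(combinations(range(len(pred_a)), k), key=error)


# Hypothetical predictions of two structure models for four candidate mutations.
model_a = [0.0, 1.0, -0.5, 2.0]
model_b = [0.1, -1.0, -0.4, 0.5]
plan = best_plan(model_a, model_b, k=2)
```

The sketch makes the intuition visible: mutations on which the models disagree strongly push the predictions apart and lower the error, while mutations on which they agree contribute almost nothing.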
Source: Computational systems bioinformatics. Computational Systems Bioinformatics Conference, volume 7 1, pages 99-108 (published 2008-01-01). Journal Article. DOI: https://doi.org/10.1142/9781848162648_0009
Citations: 3
Fast and accurate multi-class protein fold recognition with spatial sample kernels.
P. Kuksa, Pai-Hsi Huang, V. Pavlovic
Establishing structural or functional relationships between sequences, for instance to infer the structural class of an unannotated protein, is a key task in biological sequence analysis. Recent computational methods such as profile and neighborhood mismatch kernels have shown very promising results for protein sequence classification, at the cost of high computational complexity. In this study we address multi-class sequence classification problems using a class of string-based kernels, the sparse spatial sample kernels (SSSK), that are both biologically motivated and efficient to compute. The proposed methods can work with very large databases of protein sequences and show substantial improvements in computing time over existing methods. Application of the SSSK to multi-class protein prediction problems (fold recognition and remote homology detection) yields significantly better performance than existing state-of-the-art algorithms.
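The abstract above does not define the kernel's features, so the sketch below shows one simple reading of a spatial sample kernel: each feature is a pair of residues together with the distance separating them, and the kernel value is the dot product of the two sequences' feature counts. The feature definition, the `max_gap` parameter, and the function names are assumptions made for illustration and may differ from the SSSK family as published.

```python
from collections import Counter


def spatial_features(seq, max_gap=3):
    """Count (residue, gap, residue) features for gaps of 1..max_gap."""
    feats = Counter()
    for i, a in enumerate(seq):
        for gap in range(1, max_gap + 1):
            if i + gap < len(seq):
                feats[(a, gap, seq[i + gap])] += 1
    return feats


def spatial_kernel(x, y, max_gap=3):
    """Kernel value: dot product of the two spatial feature count vectors."""
    fx = spatial_features(x, max_gap)
    fy = spatial_features(y, max_gap)
    # Counter returns 0 for features that y does not contain.
    return sum(count * fy[feature] for feature, count in fx.items())
```

Because each feature samples only a few positions, building the counts takes time proportional to the sequence length times `max_gap`, which is the kind of efficiency the abstract contrasts with mismatch-style kernels. The resulting kernel values can be passed to any kernel classifier, such as a support vector machine with a precomputed kernel.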
Source: Computational systems bioinformatics. Computational Systems Bioinformatics Conference, volume 59 1, pages 133-43 (published 2008-01-01). Journal Article. DOI: https://doi.org/10.1142/9781848162648_0012
Citations: 4
Consistent alignment of metabolic pathways without abstraction.
F. Ay, Tamer Kahveci, V. de Crécy-Lagard
Pathways show how different biochemical entities interact with each other to perform vital functions for the survival of organisms. Similarities between pathways indicate functional similarities that are difficult to identify by comparing the individual entities that make up those pathways. When the interacting entities are of a single type, the problem of identifying similarities reduces to the graph isomorphism problem. However, for pathways with varying types of entities, such as metabolic pathways, the alignment problem is more challenging. Existing methods often address the metabolic pathway alignment problem by ignoring all the entities except for one type. This kind of abstraction significantly reduces the relevance of the alignment because it loses information. In this paper, we develop a method to solve the pairwise alignment problem for metabolic pathways. One distinguishing feature of our method is that it aligns reactions, compounds and enzymes without abstraction of pathways. We pursue the intuition that both the pairwise similarities of entities (homology) and their organization (topology) are crucial for metabolic pathway alignment. In our algorithm, we account for both by creating an eigenvalue problem for each entity type. We enforce consistency by considering the reachability sets of the aligned entities. Our experiments show that our method finds biologically and statistically significant alignments on the order of seconds for pathways with approximately 100 entities.
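One common way to combine homology and topology in an eigenvalue problem, sketched here as an assumption rather than as the paper's exact formulation, is to iterate s = alpha * T s + (1 - alpha) * h over candidate entity pairs. Here h holds normalized homology scores, T is a column-stochastic matrix that passes support between pairs whose neighbors are also paired, and alpha weights topology against homology. When s sums to one, its fixed point is the principal eigenvector of alpha * T + (1 - alpha) * h 1^T. The matrix values and names below are hypothetical.

```python
def align_scores(topology, homology, alpha=0.7, iterations=100):
    """Power iteration for s = alpha * T s + (1 - alpha) * h.

    topology: column-stochastic n x n matrix (list of rows) over candidate pairs.
    homology: n non-negative homology scores, normalized here to sum to 1.
    """
    n = len(homology)
    total = sum(homology)
    h = [x / total for x in homology]
    s = h[:]
    for _ in range(iterations):
        s = [
            alpha * sum(topology[i][j] * s[j] for j in range(n))
            + (1 - alpha) * h[i]
            for i in range(n)
        ]
    return s


# Two candidate pairs: homology favors the first, topology treats both alike.
uniform_topology = [[0.5, 0.5], [0.5, 0.5]]
scores = align_scores(uniform_topology, [1.0, 0.0])
```

In this toy case the topology term lifts the second pair from a score of zero to 0.35, which illustrates how a pair with weak homology can still be aligned when its neighborhood supports it. A full method would build one such system per entity type and then extract a consistent mapping from the scores.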
Source: Computational systems bioinformatics. Computational Systems Bioinformatics Conference, volume 7 1, pages 237-48 (published 2008-01-01). Journal Article. DOI: https://doi.org/10.1142/9781848162648_0021
Citations: 16
Error tolerant sibship reconstruction in wild populations.
S. Sheikh, T. Berger-Wolf, M. Ashley, I. Caballero, W. Chaovalitwongse, B. Dasgupta
Kinship analysis using genetic data is important for many biological applications, including many in conservation biology. Wide availability of microsatellites has boosted studies in wild populations that rely on the knowledge of kinship, particularly sibling relationships (sibship). While there exist many methods for reconstructing sibling relationships, almost none account for errors and mutations in microsatellite data, which are prevalent and affect the quality of reconstruction. We present an error-tolerant method for reconstructing sibling relationships based on the concept of consensus methods. We test our approach on both real and simulated data, with both pre-existing and introduced errors. Our method is highly accurate on almost all simulations, giving over 90% accuracy in most cases. Ours is the first method designed to tolerate errors while making no assumptions about the population or the sampling.
Journal: Computational systems bioinformatics. Computational Systems Bioinformatics Conference; Volume: 7 1; Pages: 273-84; Published: 2008-01-01; Type: Journal Article; DOI: https://doi.org/10.1142/9781848162648_0024
Citations: 19