Computational systems bioinformatics. Computational Systems Bioinformatics Conference: latest articles

Combining sequence and structural profiles for protein solvent accessibility prediction.
R. Bondugula, Dong Xu
Solvent accessibility is an important structural feature for a protein. We propose a new method for solvent accessibility prediction that uses known structure and sequence information more efficiently. We first estimate the relative solvent accessibility of the query protein using a fuzzy mean operator applied to the solvent accessibilities of known structure fragments that have similar sequences to the query protein. We then integrate the estimated solvent accessibility and the position-specific scoring matrix of the query protein using a neural network. We tested our method on a large data set consisting of 3386 non-redundant proteins. The comparison with other methods shows slightly improved prediction accuracies with our method. The resulting system does not need to be re-trained when new data is available. We incorporated our method into the MUPRED system, which is available as a web server at http://digbio.missouri.edu/mupred.
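The first step described above can be illustrated with a small sketch. The abstract does not define the fuzzy mean operator, so the weighting used here (a Keller-style fuzzy k-nearest-neighbor weight) is an assumption, and the function name `fuzzy_mean_rsa`, the fuzzifier `m`, and the toy distances are all hypothetical.

```python
def fuzzy_mean_rsa(neighbors, m=2.0, eps=1e-9):
    """Distance-weighted mean of relative solvent accessibility (RSA).

    neighbors: list of (distance, rsa) pairs, one per matched fragment
               residue; a smaller distance means a more similar fragment.
    m:         fuzzifier (> 1); ASSUMED Keller-style weighting, not
               taken from the abstract.
    """
    if not neighbors:
        raise ValueError("at least one neighbor is required")
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    exponent = 2.0 / (m - 1.0)
    # Closer fragments get larger weights; eps guards against division by zero.
    weights = [1.0 / (max(dist, eps) ** exponent) for dist, _ in neighbors]
    weighted_sum = sum(w * rsa for w, (_, rsa) in zip(weights, neighbors))
    return weighted_sum / sum(weights)
```

With two fragments at distances 1.0 and 2.0 and RSA values 0.2 and 0.7, the weights are 1 and 0.25, so the estimate is 0.3. In the described method such an estimate would then be fed, together with the position-specific scoring matrix, into a neural network, which is not sketched here.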
Vol. 7, pp. 195-202. Published 2008-01-01. DOI: https://doi.org/10.1142/9781848162648_0017
Citations: 8
Optimizing Bayes error for protein structure model selection by stability mutagenesis.
Xiaoduan Ye, Alan M Friedman, Chris Bailey-Kellogg

Site-directed mutagenesis affects protein stability in a manner dependent on the local structural environment of the mutated residue; e.g., a hydrophobic-to-polar substitution would behave differently in the core vs. on the surface of the protein. Thus site-directed mutagenesis followed by stability measurement enables evaluation of and selection among predicted structure models, based on consistency between predicted and experimental stability changes (ΔΔG° values). This paper develops a method for planning a set of individual site-directed mutations for protein structure model selection, so as to minimize the Bayes error, i.e., the probability of choosing the wrong model. While in general it is hard to calculate exactly the multi-dimensional Bayes error defined by a set of mutations, we leverage the structure of "ΔΔG° space" to develop tight upper and lower bounds. We further develop a lower bound on the Bayes error of any plan that uses a fixed number of mutations from a set of candidates. We use this bound in a branch-and-bound planning algorithm to find optimal and near-optimal plans. We demonstrate the significance and effectiveness of this approach in planning mutations for elucidating the structure of the pTfa chaperone protein from bacteriophage lambda.
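To make the notion of Bayes error concrete, here is a minimal sketch for the simplest special case, which has a closed form: two candidate models with equal prior probability, each predicting one stability change per planned mutation, and independent Gaussian measurement noise with the same standard deviation for every mutation. These assumptions, and the names `two_model_bayes_error`, `pred_a`, `pred_b`, and `sigma`, are illustrative and are not taken from the paper, which treats the general multi-model case through bounds.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_model_bayes_error(pred_a, pred_b, sigma):
    """Probability of picking the wrong one of two equally likely models.

    pred_a, pred_b: predicted stability changes, one value per mutation.
    sigma:          ASSUMED common standard deviation of measurement noise.
    Under these assumptions the optimal rule picks the model whose
    prediction vector is closer to the measurements, and the error
    depends only on the distance between the two prediction vectors.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if len(pred_a) != len(pred_b):
        raise ValueError("both models must predict the same mutations")
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(pred_a, pred_b)))
    return normal_cdf(-distance / (2.0 * sigma))
```

A mutation on which the models agree contributes nothing to the distance and so cannot reduce the error, while each mutation on which they disagree lowers it; this is the intuition behind choosing mutations to minimize the Bayes error.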

Vol. 7, pp. 99-108. Published 2008-01-01.
Citations: 0
Feedback algorithm and web-server for protein structure alignment.
Zhiyu Zhao, Bin Fu, Francisco J Alanis, Christopher M Summa

We have developed a feedback algorithm for protein structure alignment between two protein backbones. A web portal implementing this method has been constructed and is freely available for use at http://fpsa.cs.uno.edu/ with a mirror site at http://fpsa.cs.panam.edu/FPSA/. We compare our algorithm with three other commonly used methods: CE, DaliLite and SSM. The results show that in most cases our algorithm outputs a larger number of aligned positions when the Cα RMSD is comparable. Also, in many cases where the number of aligned positions is larger or comparable, our learning method is able to achieve a smaller Cα RMSD than the other methods tested. This trend of a larger number of aligned positions and a smaller Cα RMSD is observed more frequently in cases where the similarity between protein structures is weak.
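The two quantities compared above are the number of aligned positions and the Cα RMSD over those positions. The sketch below shows only how the RMSD is computed for coordinates that have already been superposed; the superposition and the alignment search, which are the substance of the feedback algorithm, are not shown, and the function name `ca_rmsd` is hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation over paired C-alpha coordinates.

    coords_a, coords_b: equal-length lists of (x, y, z) tuples, where
    position i in one list is aligned with position i in the other.
    The structures are ASSUMED to be superposed already.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("aligned coordinate lists must have equal length")
    if not coords_a:
        raise ValueError("at least one aligned position is required")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```

Because RMSD tends to grow as more positions are included, two alignments are only meaningfully compared on one quantity when the other is comparable, which is how the abstract frames its results.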

Vol. 7, pp. 109-20. Published 2008-01-01.
Citations: 0
Fast and accurate multi-class protein fold recognition with spatial sample kernels.
Pavel Kuksa, Pai-Hsi Huang, Vladimir Pavlovic

Establishing structural or functional relationships between sequences, for instance to infer the structural class of an unannotated protein, is a key task in biological sequence analysis. Recent computational methods such as profile and neighborhood mismatch kernels have shown very promising results for protein sequence classification, at the cost of high computational complexity. In this study we address multi-class sequence classification problems using a class of string-based kernels, the sparse spatial sample kernels (SSSK), that are both biologically motivated and efficient to compute. The proposed methods can work with very large databases of protein sequences and show substantial improvements in computing time over the existing methods. Application of the SSSK to the multi-class protein prediction problems (fold recognition and remote homology detection) yields significantly better performance than existing state-of-the-art algorithms.
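As a rough illustration of a string kernel built from spatially separated samples, the sketch below counts pairs of residues together with the distance between them and takes the inner product of the resulting count vectors. The abstract does not define the SSSK feature set, so this particular feature (a residue pair plus its separation, up to `max_dist`) and the function names are assumptions made for illustration only.

```python
from collections import Counter


def double_spatial_features(seq, max_dist):
    """Count features (first residue, separation, second residue)."""
    feats = Counter()
    for i, first in enumerate(seq):
        for dist in range(1, max_dist + 1):
            j = i + dist
            if j >= len(seq):
                break  # no residue this far to the right
            feats[(first, dist, seq[j])] += 1
    return feats


def spatial_kernel(x, y, max_dist=3):
    """Inner product of the two sequences' feature-count vectors."""
    fx = double_spatial_features(x, max_dist)
    fy = double_spatial_features(y, max_dist)
    # A Counter returns 0 for features absent from the second sequence.
    return sum(count * fy[feat] for feat, count in fx.items())
```

For example, with `max_dist=2` the sequence "ABA" has the three features (A,1,B), (B,1,A), and (A,2,A), so its kernel value with itself is 3 and with "AB" is 1. The cost is linear in sequence length for a fixed `max_dist`, which reflects the kind of efficiency the abstract emphasizes.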

Vol. 7, pp. 133-43. Published 2008-01-01.
Citations: 0
Combining sequence and structural profiles for protein solvent accessibility prediction.
Rajkumar Bondugula, Dong Xu

Solvent accessibility is an important structural feature for a protein. We propose a new method for solvent accessibility prediction that uses known structure and sequence information more efficiently. We first estimate the relative solvent accessibility of the query protein using a fuzzy mean operator applied to the solvent accessibilities of known structure fragments that have similar sequences to the query protein. We then integrate the estimated solvent accessibility and the position-specific scoring matrix of the query protein using a neural network. We tested our method on a large data set consisting of 3386 non-redundant proteins. The comparison with other methods shows slightly improved prediction accuracies with our method. The resulting system does not need to be re-trained when new data is available. We incorporated our method into the MUPRED system, which is available as a web server at http://digbio.missouri.edu/mupred.

Vol. 7, pp. 195-202. Published 2008-01-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2791713/pdf/nihms115358.pdf
Citations: 0
Extensive exploration of conformational space improves Rosetta results for short protein domains.
Yaohang Li, Andrew J Bordner, Yuan Tian, Xiuping Tao, Andrey A Gorin

With some simplifications, computational protein folding can be understood as an optimization problem of a potential energy function on a variable space consisting of all conformations for a given protein molecule. It is well known that realistic energy potentials are very "rough" functions when expressed in the standard variables, and the folding trajectories can be easily trapped in multiple local minima. We have integrated our variation of Parallel Tempering optimization into the protein folding program Rosetta in order to improve its capability to overcome energy barriers and to estimate how such improvement will influence the quality of the folded protein domains. Here we report that (1) Parallel Tempering Rosetta (PTR) is significantly better in the exploration of protein structures than previous implementations of the program; (2) systematic improvements are observed across a large benchmark set in the parameters that are normally followed to estimate robustness of the folding; (3) these improvements are most dramatic in the subset of the shortest domains, where high-quality structures have been obtained for more than 75% of all tested sequences. Further analysis of the results will improve our understanding of protein conformational space and lead to new improvements in the protein folding methodology, while the current PTR implementation should be very efficient for short (up to approximately 80 amino acids) protein domains and therefore may find practical application in systems biology studies.
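The general idea of parallel tempering can be sketched on a toy problem. The code below runs several Metropolis walkers at different temperatures on a one-dimensional "rough" function and occasionally exchanges neighboring replicas using the standard swap criterion. It is a generic textbook sketch, not the authors' variation and not Rosetta code; the energy function, temperatures, and all names are hypothetical.

```python
import math
import random


def swap_probability(energy_i, energy_j, temp_i, temp_j):
    """Acceptance probability for exchanging the states of two replicas."""
    delta = (1.0 / temp_i - 1.0 / temp_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def rough_energy(x):
    """Toy landscape: a shallow bowl with many local minima."""
    return 0.1 * x * x + math.sin(5.0 * x)


def parallel_tempering(energy, temps, steps=2000, step_size=0.5, seed=0):
    """Return the lowest-energy point seen by any replica (needs >= 2 temps)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-5.0, 5.0) for _ in temps]
    best = min(xs, key=energy)
    for _ in range(steps):
        # Ordinary Metropolis move inside every replica.
        for k, temp in enumerate(temps):
            candidate = xs[k] + rng.gauss(0.0, step_size)
            diff = energy(candidate) - energy(xs[k])
            if diff <= 0 or rng.random() < math.exp(-diff / temp):
                xs[k] = candidate
                if energy(candidate) < energy(best):
                    best = candidate
        # Try to exchange one randomly chosen pair of neighboring replicas.
        k = rng.randrange(len(temps) - 1)
        p = swap_probability(energy(xs[k]), energy(xs[k + 1]), temps[k], temps[k + 1])
        if rng.random() < p:
            xs[k], xs[k + 1] = xs[k + 1], xs[k]
    return best
```

Hot replicas cross barriers easily and hand promising states down to cold replicas, which refine them; this exchange is what lets the search escape local minima that would trap a single low-temperature trajectory.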

Vol. 7, pp. 203-9. Published 2008-01-01.
Citations: 0
Scalable computation of kinship and identity coefficients on large pedigrees.
E. Cheng, Brendan Elliott, Z. Ozsoyoglu
With the rapidly expanding field of medical genetics and genetic counseling, genealogy information is becoming increasingly abundant. An important computation on pedigree data is the calculation of identity coefficients, which provide a complete description of the degree of relatedness of a pair of individuals. The areas of application of identity coefficients are numerous and diverse, from genetic counseling to disease tracking, and thus, the computation of identity coefficients merits special attention. However, the computation of identity coefficients is not done directly, but rather as the final step after computing a set of generalized kinship coefficients. In this paper, we first propose a novel Path-Counting Formula for calculating generalized kinship coefficients, which is motivated by Wright's path-counting method for computing the inbreeding coefficient for an individual. We then present an efficient and scalable scheme for calculating generalized kinship coefficients on large pedigrees using NodeCodes, a special encoding scheme for expediting the evaluation of queries on pedigree graph structures. We also perform experiments for evaluating the efficiency of our method, and compare it with the performance of the traditional recursive algorithm for three individuals. Experimental results demonstrate that the resulting scheme is more scalable and efficient than the traditional recursive methods for computing generalized kinship coefficients.
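For orientation, the sketch below shows the classical recursive computation of the ordinary pairwise kinship coefficient on a small pedigree. It is only the two-individual baseline: the generalized kinship coefficients, the Path-Counting Formula, and the NodeCodes encoding from the paper are not shown. The pedigree format (a dictionary from each individual to a `(father, mother)` pair, with `(None, None)` for founders) and the name `make_kinship` are assumptions, as is the convention that founders are unrelated and non-inbred.

```python
from functools import lru_cache


def make_kinship(parents):
    """Build a memoized kinship function for one pedigree.

    parents: dict mapping individual -> (father, mother); founders map
    to (None, None). Both parents are ASSUMED known or both unknown.
    """

    @lru_cache(maxsize=None)
    def depth(person):
        father, mother = parents[person]
        if father is None:
            return 0
        return 1 + max(depth(father), depth(mother))

    @lru_cache(maxsize=None)
    def kinship(a, b):
        if a == b:
            father, mother = parents[a]
            if father is None:
                return 0.5  # non-inbred founder
            return 0.5 * (1.0 + kinship(father, mother))
        # Recurse on the deeper individual, which cannot be an
        # ancestor of the other one.
        if depth(a) < depth(b):
            a, b = b, a
        father, mother = parents[a]
        if father is None:
            return 0.0  # two distinct founders are assumed unrelated
        return 0.5 * (kinship(father, b) + kinship(mother, b))

    return kinship
```

On a pedigree with founders F and M, full siblings C1 and C2, and a child G of C1 and C2, this gives 0.25 for the siblings, 0.25 for parent and child, and 0.625 for G with itself, reflecting G's inbreeding coefficient of 0.25. Memoization keeps the toy version fast, but the number of distinct pairs still grows quadratically with pedigree size, which is the scalability concern the paper addresses.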
Vol. 7, pp. 27-36. Published 2008-01-01. DOI: https://doi.org/10.1142/9781848162648_0003
Citations: 5
The effect of massive gene loss following whole genome duplication on the algorithmic reconstruction of the ancestral Populus diploid.
Chunfang Zheng
We improve on guided genome halving algorithms so that several thousand gene sets, each containing two paralogs in the descendant T of the doubling event and their single ortholog from an undoubled reference genome R, can be analyzed to reconstruct the ancestor A of T at the time of doubling. At the same time, large numbers of defective gene sets, either missing one paralog from T or missing their ortholog in R, may be incorporated into the analysis in a consistent way. We apply this genomic rearrangement distance-based approach to the recently sequenced poplar (Populus trichocarpa) and grapevine (Vitis vinifera) genomes, as T and R respectively.
Vol. 7, pp. 261-71. Published 2008-01-01. DOI: https://doi.org/10.1142/9781848162648_0023
Citations: 7
Method for effective virtual screening and scaffold-hopping in chemical compounds.
Nikil Wale, G. Karypis, Ian A. Watson
Methods that can screen large databases to retrieve a structurally diverse set of compounds with desirable bioactivity properties are critical in the drug discovery and development process. This paper presents a set of such methods, which are designed to find compounds that are structurally different from a given query compound while retaining its bioactivity properties (scaffold hops). These methods utilize various indirect ways of measuring the similarity between the query and a compound that take into account additional information beyond their structure-based similarities. Two sets of techniques are presented that capture these indirect similarities using approaches based on automatic relevance feedback and on analyzing the similarity network formed by the query and the database compounds. Experimental evaluation shows that many of these methods substantially outperform previously developed approaches in their ability to identify both structurally diverse active compounds and active compounds in general.
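The difference between direct and indirect similarity can be illustrated with a toy example. The sketch below ranks compounds by Tanimoto similarity of their feature sets, then re-ranks them with a simple pseudo-relevance-feedback step that treats the top direct hits as if they were active. This is a generic stand-in chosen for illustration, not one of the paper's methods; the fingerprint representation (sets of integer feature ids) and all function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_direct(query, library):
    """Rank compound names by direct similarity to the query."""
    return sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)


def rank_with_feedback(query, library, top_k=1):
    """Re-rank using the top_k direct hits as extra reference compounds.

    A compound scores highly if it resembles the query OR any assumed
    active, so it can surface even when it shares little with the query.
    """
    seeds = rank_direct(query, library)[:top_k]
    references = [set(query)] + [set(library[name]) for name in seeds]

    def score(name):
        return max(tanimoto(ref, library[name]) for ref in references)

    return sorted(library, key=score, reverse=True)
```

With query features {1, 2, 3, 4} and a library of "near" = {1, 2, 3, 5}, "hop" = {2, 5, 6, 7}, and "far" = {4, 8, 9}, the direct ranking is near, far, hop, while the feedback ranking is near, hop, far: "hop" shares little with the query but resembles the top hit, which is the kind of indirect relationship a scaffold-hopping search tries to exploit.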
Vol. 6, pp. 403-14. Published 2007-04-04. DOI: https://doi.org/10.1142/9781860948732_0041
Citations: 9
Bayesian integration of biological prior knowledge into the reconstruction of gene regulatory networks with Bayesian networks.
Dirk Husmeier, Adriano V Werhli

There have been various attempts to improve the reconstruction of gene regulatory networks from microarray data by the systematic integration of biological prior knowledge. Our approach is based on pioneering work by Imoto et al., where the prior knowledge is expressed in terms of energy functions, from which a prior distribution over network structures is obtained in the form of a Gibbs distribution. The hyperparameters of this distribution represent the weights associated with the prior knowledge relative to the data. To complement the work of Imoto et al., we have derived and tested an MCMC scheme for sampling networks and hyperparameters simultaneously from the posterior distribution. We have assessed the viability of this approach by reconstructing the RAF pathway from cytometry protein concentrations and prior knowledge from KEGG.
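The prior described above can be written out in general form. The symbols are chosen here for illustration and are not taken from the abstract: $G$ is a network structure, $E(G)$ is an energy that grows as $G$ disagrees with the prior knowledge, $\beta \ge 0$ is the hyperparameter weighting that knowledge, and $D$ is the data.

```latex
P(G \mid \beta) = \frac{e^{-\beta E(G)}}{Z(\beta)},
\qquad
Z(\beta) = \sum_{G'} e^{-\beta E(G')}

P(G, \beta \mid D) \;\propto\; P(D \mid G)\, P(G \mid \beta)\, P(\beta)
```

Setting $\beta = 0$ makes the prior uniform over structures, so the prior knowledge is ignored, while large $\beta$ concentrates the prior on structures that agree with it. Sampling $G$ and $\beta$ jointly from the second expression lets the data determine how much weight the prior knowledge receives.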

Computational systems bioinformatics. Computational Systems Bioinformatics Conference, pp. 85-95, published 2007-01-01.
Citations: 0