
Research in computational molecular biology : ... Annual International Conference, RECOMB ... : proceedings. RECOMB (Conference : 2005- ): latest articles

Traversing the k-mer Landscape of NGS Read Datasets for Quality Score Sparsification.
Y William Yu, Deniz Yorukoglu, Bonnie Berger

It is becoming increasingly impractical to store raw sequencing data indefinitely in an uncompressed state for later processing. In this paper, we describe a scalable compressive framework, Read-Quality-Sparsifier (RQS), which substantially outperforms other de novo quality score compression methods in both compression ratio and speed while maintaining SNP-calling accuracy. Surprisingly, RQS also improves SNP-calling accuracy on a gold-standard, real-life sequencing dataset (NA12878) when using a k-mer density profile constructed from 77 other individuals from the 1000 Genomes Project. This improvement in downstream accuracy stems from the observation that quality score values within NGS datasets are inherently encoded in the k-mer landscape of the genomic sequences. To our knowledge, RQS is the first scalable sequence-based quality compression method that can efficiently compress quality scores of terabyte-sized and larger sequencing datasets.

Availability: An implementation of our method, RQS, is available for download at: http://rqs.csail.mit.edu/.

DOI: 10.1007/978-3-319-05269-4_31; Vol. 8394, pp. 385-399; published 2014-04-01
Citations: 26
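
The RQS abstract rests on the idea that positions covered only by common k-mers rarely need their original quality values. Below is a minimal sketch of that idea, with its assumptions stated up front. The k-mer dictionary is a plain Python set built from a few hypothetical reference strings; RQS builds its dictionary from 1000 Genomes reads, and its exact test for suspicious positions may differ. The replacement symbol `'I'` is an arbitrary choice. Every position whose covering k-mers are all in the dictionary gets the constant symbol, while positions touched by any unseen k-mer keep their original quality.

```python
def kmer_set(sequences, k):
    """Collect every k-mer that occurs in `sequences` (the hypothetical dictionary)."""
    kmers = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmers.add(seq[i:i + k])
    return kmers


def sparsify_qualities(read, quals, dictionary, k, high="I"):
    """Replace qualities at positions covered only by dictionary k-mers.

    A position keeps its original quality if no k-mer covers it at all,
    or if at least one k-mer overlapping it is missing from the dictionary
    (a hint of a sequencing error or a true variant).
    """
    if len(read) != len(quals):
        raise ValueError("read and quality strings must have equal length")
    n = len(read)
    keep = [True] * n
    # Every position covered by at least one k-mer becomes a candidate for replacement.
    for i in range(n - k + 1):
        for j in range(i, i + k):
            keep[j] = False
    # Positions overlapped by any unseen k-mer are restored.
    for i in range(n - k + 1):
        if read[i:i + k] not in dictionary:
            for j in range(i, i + k):
                keep[j] = True
    return "".join(q if keep[j] else high for j, q in enumerate(quals))
```

For example, take `kmer_set(["ACGTACGTTGCA"], 4)` as the dictionary. The read `ACGTACGT` then has all of its qualities replaced. The read `ACGAACGT` keeps the qualities of positions 0-6, which are covered by unseen k-mers. Long runs of a single symbol are what make sparsified quality strings cheap to compress.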
AptaCluster - A Method to Cluster HT-SELEX Aptamer Pools and Lessons from its Application.
Jan Hoinka, Alexey Berezhnoy, Zuben E Sauna, Eli Gilboa, Teresa M Przytycka

Systematic Evolution of Ligands by EXponential Enrichment (SELEX) is a well-established experimental procedure for identifying aptamers: synthetic single-stranded (ribo)nucleic acid molecules that bind to a given molecular target. Recently, new sequencing technologies have revolutionized the SELEX protocol by allowing deep sequencing of the selection pools after each cycle. The emergence of High Throughput SELEX (HT-SELEX) has opened the field to new computational opportunities and challenges that are yet to be addressed. To aid the analysis of HT-SELEX results and to advance the understanding of the selection process itself, we developed AptaCluster. This algorithm efficiently clusters entire HT-SELEX aptamer pools, a task that could not be accomplished with traditional clustering algorithms because of the enormous size of such datasets. We performed HT-SELEX with Interleukin 10 receptor alpha chain (IL-10RA) as the target molecule and used AptaCluster to analyze the resulting sequences. AptaCluster enabled the first survey of the relationships between sequences in different selection rounds and revealed previously unappreciated properties of the SELEX protocol. As the first tool of its kind, AptaCluster enables novel ways to analyze and optimize the HT-SELEX procedure. A very fast multiprocessor implementation of the AptaCluster algorithm is available upon request.

DOI: 10.1007/978-3-319-05269-4_9; Vol. 8394, pp. 115-128; published 2014-01-01
Citations: 66
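
The AptaCluster abstract notes that whole HT-SELEX pools are too large for traditional clustering, which typically compares all pairs of sequences. The abstract does not describe AptaCluster's own mechanism. The sketch below therefore illustrates one standard way around the all-pairs cost, as an assumption: bit-sampling locality-sensitive hashing (LSH) for Hamming distance. Sequences that agree on a random subset of positions share a bucket, and only bucket-mates are compared. The function names, parameters and union-find merging are hypothetical choices, not the published algorithm.

```python
import random


def hamming(a, b):
    """Mismatches over the shared prefix plus the length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def lsh_cluster(seqs, max_dist, n_positions, n_rounds, seed=0):
    """Cluster sequences without comparing all pairs (bit-sampling LSH sketch).

    In each round, sequences that agree on a random subset of positions share a
    bucket; only bucket-mates are compared, and pairs within `max_dist` are
    merged with union-find. Assumes n_positions <= the shortest sequence length.
    """
    rng = random.Random(seed)
    length = min(len(s) for s in seqs)
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for _ in range(n_rounds):
        positions = rng.sample(range(length), n_positions)
        buckets = {}
        for idx, s in enumerate(seqs):
            key = "".join(s[p] for p in positions)
            buckets.setdefault(key, []).append(idx)
        for members in buckets.values():
            for a in range(len(members)):
                for b in range(a + 1, len(members)):
                    i, j = members[a], members[b]
                    if hamming(seqs[i], seqs[j]) <= max_dist:
                        parent[find(i)] = find(j)

    clusters = {}
    for idx, s in enumerate(seqs):
        clusters.setdefault(find(idx), []).append(s)
    return list(clusters.values())
```

Consider `lsh_cluster(["AAAAAAAA", "AAAAAAAT", "CCCCCCCC", "CCCCCCCG"], 1, 2, 30)`. The two A-rich sequences and the two C-rich sequences are expected to end up in two separate clusters. Sampling fewer positions per round, or running more rounds, raises the chance that near-identical sequences meet in a bucket, at the cost of larger buckets.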
Learning Sequence Determinants of Protein:protein Interaction Specificity with Sparse Graphical Models.
Hetunandan Kamisetty, Bornika Ghosh, Christopher James Langmead, Chris Bailey-Kellogg

In studying the strength and specificity of interaction between members of two protein families, key questions center on which pairs of possible partners actually interact, how well they interact, and why they interact while others do not. The advent of large-scale experimental studies of interactions between members of a target family and a diverse set of possible interaction partners offers the opportunity to address these questions. Here we develop a method, DgSpi (Data-driven Graphical models of Specificity in Protein:protein Interactions), for learning and using graphical models that explicitly represent the amino acid basis for interaction specificity (why) and that extend earlier classification-oriented approaches (which) to predict the ΔG of binding (how well). Using data from MacBeath and colleagues, we demonstrate the effectiveness of our approach in analyzing and predicting interactions between a set of 82 PDZ recognition modules and a panel of 217 possible peptide partners. Our predicted ΔG values correlate strongly with the experimentally measured ones, reaching correlation coefficients of 0.69 in 10-fold cross-validation and 0.63 in leave-one-PDZ-out cross-validation. Furthermore, the model serves as a compact representation of the amino acid constraints underlying the interactions, so that protein-level ΔG predictions can be naturally understood in terms of residue-level constraints. Finally, as a generative model, DgSpi readily enables the design of new interacting partners, and we demonstrate that the designed ligands are novel and diverse.

DOI: 10.1007/978-3-319-05269-4_10; Vol. 8394, pp. 129-143; published 2014-01-01
Citations: 3
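
The DgSpi abstract describes a sparse graphical model whose single-residue and residue-pair terms explain binding. The sketch below shows how such a model can turn a PDZ sequence and a peptide into a ΔG prediction, assuming an additive energy. The energy has four parts: a bias, per-position terms for the PDZ domain, per-position terms for the peptide, and a sparse set of (PDZ position, peptide position) couplings. All parameter values and the dictionary layout are hypothetical and hand-set; DgSpi learns its structure and parameters from the MacBeath data. A correlation helper is included because the reported 0.69 and 0.63 figures are correlation coefficients. Pearson correlation is assumed here.

```python
import math


def predict_dg(pdz, peptide, params):
    """Additive ΔG: bias + single-site terms + sparse PDZ/peptide couplings."""
    dg = params["bias"]
    for i, aa in enumerate(pdz):
        dg += params["pdz_site"].get((i, aa), 0.0)
    for j, aa in enumerate(peptide):
        dg += params["pep_site"].get((j, aa), 0.0)
    # Only the listed (PDZ position, peptide position) edges contribute: the sparsity.
    for (i, j), table in params["edges"].items():
        dg += table.get((pdz[i], peptide[j]), 0.0)
    return dg


def pearson(xs, ys):
    """Pearson correlation between predicted and measured values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical parameters: one peptide site term and one coupling edge.
PARAMS = {
    "bias": -5.0,
    "pdz_site": {},
    "pep_site": {(0, "V"): -1.0},
    "edges": {(1, 0): {("H", "V"): -2.0}},
}
```

With these parameters, `predict_dg("GHL", "V", PARAMS)` gives -8.0: the coupling between PDZ residue H and peptide residue V supplies -2.0 of that total. Each nonzero coupling entry is a residue-level explanation of part of the prediction, which is the sense in which such a model answers "why".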
Decoding coalescent hidden Markov models in linear time.
Kelley Harris, Sara Sheehan, John A Kamm, Yun S Song

In many areas of computational biology, hidden Markov models (HMMs) have been used to model local genomic features. In particular, coalescent HMMs have been used to infer ancient population sizes, migration rates, divergence times, and other parameters such as mutation and recombination rates. As more loci, sequences, and hidden states are added to the model, however, the runtime of coalescent HMMs can quickly become prohibitive. Here we present a new algorithm that reduces the runtime of coalescent HMMs from quadratic in the number of hidden time states to linear, without making any additional approximations. Our algorithm can be incorporated into various coalescent HMMs, including the popular method PSMC for inferring variable effective population sizes. We implement this algorithm to speed up our demographic inference method diCal, which is equivalent to PSMC when applied to a sample of two haplotypes. We demonstrate that, given similar computational resources, the linear-time method can reconstruct a population size change history more accurately than the quadratic-time method. We also apply the method to data from the 1000 Genomes Project, inferring a high-resolution history of size changes in the European population.

DOI: 10.1007/978-3-319-05269-4_8; Vol. 8394, pp. 100-114; published 2014-01-01
Citations: 10
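
The abstract does not spell out the recursion, so the sketch below illustrates a general trick rather than the coalescent derivation itself. Suppose each transition probability factors into a source-state term and a target-state term, separately for "move to an earlier time state", "stay" and "move to a later time state". The inner sum of one forward-algorithm step can then be computed with one prefix sum and one suffix sum instead of a double loop. The factor vectors `a`, `b`, `c`, `e`, `f` are hypothetical. That the coalescent transition probabilities in the paper admit a comparable decomposition is an assumption here, not something shown.

```python
def forward_step_quadratic(alpha, T, emit):
    """Reference O(d^2) forward update: new[j] = emit[j] * sum_i alpha[i] * T[i][j]."""
    d = len(alpha)
    return [emit[j] * sum(alpha[i] * T[i][j] for i in range(d)) for j in range(d)]


def forward_step_linear(alpha, a, b, c, e, f, emit):
    """O(d) forward update for the structured transition matrix
    T[i][j] = a[i]*b[j] if j < i, c[i] if j == i, e[i]*f[j] if j > i."""
    d = len(alpha)
    below = [0.0] * d  # below[j] = sum over i < j of alpha[i] * e[i]
    running = 0.0
    for j in range(d):
        below[j] = running
        running += alpha[j] * e[j]
    above = [0.0] * d  # above[j] = sum over i > j of alpha[i] * a[i]
    running = 0.0
    for j in range(d - 1, -1, -1):
        above[j] = running
        running += alpha[j] * a[j]
    return [emit[j] * (alpha[j] * c[j] + b[j] * above[j] + f[j] * below[j])
            for j in range(d)]


def build_matrix(a, b, c, e, f):
    """Materialize T explicitly (only needed for the quadratic reference)."""
    d = len(c)
    return [[a[i] * b[j] if j < i else c[i] if j == i else e[i] * f[j]
             for j in range(d)] for i in range(d)]
```

Both functions return the same vector up to floating-point rounding. Applied at every one of L loci, the linear version turns an O(L d²) forward pass into O(L d), which is the kind of saving the abstract describes.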
Fast and Accurate Calculation of Protein Depth by Euclidean Distance Transform.
Dong Xu, Hua Li, Yang Zhang

The depth of each atom or residue in a protein structure is a key attribute that has been widely used in protein structure modeling and function annotation. However, calculating depth accurately is time-consuming. Here, we propose using the Euclidean distance transform (EDT) to calculate depth. The EDT converts the protein structure into a 3D grayscale image in which each pixel is labeled with its minimum distance to the surface of the molecule (i.e., its depth). We tested the proposed EDT method on a set of 261 non-redundant protein structures. The data show that the EDT method is 2.6 times faster than the widely used method of Chakravarty and Varadarajan. The depth values computed by the EDT method are also highly accurate, almost identical to those calculated by exhaustive search (Pearson's correlation coefficient ≈ 1). We believe the EDT-based depth calculation program can serve as an efficient tool to assist studies of protein fold recognition and structure-based function annotation.

DOI: 10.1007/978-3-642-37195-0_30; Vol. 7821, pp. 304-316; published 2013-01-01; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4098708/pdf/nihms592637.pdf
Citations: 4
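
The depth computation described above reduces to a distance transform on a voxel grid. Below is a minimal sketch under stated assumptions. The protein is already voxelized into a hypothetical boolean occupancy grid; building that grid from atomic coordinates and radii is omitted. Depth is approximated as the distance from a voxel to the nearest solvent voxel. The exact transform uses the separable lower-envelope-of-parabolas algorithm of Felzenszwalb and Huttenlocher, swapped in as a standard technique because the abstract does not describe the paper's own EDT implementation.

```python
import math

INF = float("inf")


def squared_edt_1d(f):
    """Exact 1D squared distance transform (lower envelope of parabolas).

    out[x] = min over y of (x - y)**2 + f[y]; entries equal to INF are ignored.
    """
    n = len(f)
    sites = [y for y in range(n) if f[y] != INF]
    if not sites:
        return [INF] * n
    v, z = [sites[0]], [-INF]  # parabola vertices and their left boundaries
    for q in sites[1:]:
        while True:
            p = v[-1]
            # x-coordinate where the parabolas rooted at p and q intersect
            s = ((f[q] + q * q) - (f[p] + p * p)) / (2 * (q - p))
            if s > z[-1]:
                break
            v.pop()
            z.pop()
        v.append(q)
        z.append(s)
    out, k = [], 0
    for x in range(n):
        while k + 1 < len(v) and z[k + 1] <= x:
            k += 1
        out.append((x - v[k]) ** 2 + f[v[k]])
    return out


def distance_transform_3d(mask):
    """Euclidean distance from each voxel to the nearest solvent voxel.

    mask[x][y][z] is True inside the protein and False in the solvent.
    """
    nx, ny, nz = len(mask), len(mask[0]), len(mask[0][0])
    g = [[[INF if mask[x][y][z] else 0.0 for z in range(nz)]
          for y in range(ny)] for x in range(nx)]
    for x in range(nx):            # pass along z
        for y in range(ny):
            g[x][y] = squared_edt_1d(g[x][y])
    for x in range(nx):            # pass along y
        for z in range(nz):
            col = squared_edt_1d([g[x][y][z] for y in range(ny)])
            for y in range(ny):
                g[x][y][z] = col[y]
    for y in range(ny):            # pass along x
        for z in range(nz):
            col = squared_edt_1d([g[x][y][z] for x in range(nx)])
            for x in range(nx):
                g[x][y][z] = col[x]
    return [[[math.sqrt(d) for d in row] for row in plane] for plane in g]
```

Each 1D pass is linear in its line length, so the whole transform is linear in the number of voxels. An atom's depth could then be read from the voxel containing its center. For a 5×5×5 grid whose inner 3×3×3 block is protein, the central voxel gets depth 2.0.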
Combinatorics of Genome Rearrangements
G. Fertin, A. Labarre, I. Rusu, Éric Tannier, Stéphane Vialette
From one cell to another, from one individual to another, and from one species to another, the content of DNA molecules is often similar. The organization of these molecules, however, differs dramatically, and the mutations that affect this organization are known as genome rearrangements. Combinatorial methods are used to reconstruct putative rearrangement scenarios in order to explain the evolutionary history of a set of species, often formalizing the evolutionary events that can explain the multiple combinations of observed genomes as combinatorial optimization problems. This book offers the first comprehensive survey of this rapidly expanding application of combinatorial optimization. It can be used as a reference for experienced researchers or as an introductory text for a broader audience. Genome rearrangement problems have proved so interesting from a combinatorial point of view that the field now belongs as much to mathematics as to biology. This book takes a mathematically oriented approach, but provides biological background when necessary. It presents a series of models, beginning with the simplest (which is progressively extended by dropping restrictions), each constructing a genome rearrangement problem. The book also discusses an important generalization of the basic problem known as the median problem, surveys attempts to reconstruct the relationships between genomes with phylogenetic trees, and offers a collection of summaries and appendixes with useful additional information. Part of the Computational Molecular Biology series.
DOI: 10.7551/mitpress/9780262062824.001.0001; pp. I-XI, 1-288; published 2009-06-05
Citations: 326
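
The breakpoint count is the simplest distance between two gene orders and gives a concrete taste of the combinatorial models the book surveys. Below is a minimal sketch that assumes a genome is a signed permutation of 1..n, compared against the identity and framed by 0 and n + 1. A pair of consecutive entries (x, y) is an adjacency when y = x + 1 and a breakpoint otherwise. A reversal flips both the order and the signs of a segment, so the adjacencies inside the segment survive and only the two junctions at its ends can change. A single reversal therefore removes at most two breakpoints, which gives a simple lower bound on the reversal distance.

```python
def breakpoints(perm):
    """Count breakpoints of a signed permutation of 1..n against the identity."""
    n = len(perm)
    framed = [0] + list(perm) + [n + 1]
    return sum(1 for x, y in zip(framed, framed[1:]) if y - x != 1)


def reversal_lower_bound(perm):
    """Each reversal removes at most two breakpoints: distance >= ceil(b / 2)."""
    return (breakpoints(perm) + 1) // 2
```

For example, `[-3, -2, -1]` has 2 breakpoints (after framing: 0|-3 and -1|4), so at least one reversal is needed; reversing the whole permutation indeed sorts it in one step.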
Boosting Protein Threading Accuracy.
Jian Peng, Jinbo Xu

Protein threading is one of the most successful protein structure prediction methods. Most protein threading methods measure the quality of a sequence-template alignment with a scoring function that linearly combines sequence and structure features, so that a dynamic programming algorithm can be used to optimize it. However, a linear scoring function cannot fully exploit interdependencies among features and thus limits alignment accuracy. This paper presents a nonlinear scoring function for protein threading that not only can model interactions among different protein features but also can be efficiently optimized using a dynamic programming algorithm. We achieve this by modeling the threading problem with a probabilistic graphical model, Conditional Random Fields (CRFs), and training the model using the gradient tree boosting algorithm. The resulting model is a nonlinear scoring function consisting of a collection of regression trees, each of which models a type of nonlinear relationship among sequence and structure features. Experimental results indicate that this new threading model can effectively leverage weak biological signals and greatly improve both alignment accuracy and fold recognition rate.

DOI: 10.1007/978-3-642-02008-7_3; Vol. 5541, pp. 31-45; published 2009-01-01
Citations: 69
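
The threading abstract's key point is that a sum of regression trees can be nonlinear and still be optimized by dynamic programming. This works because the score decomposes over aligned residue pairs, however nonlinear each pair's score is. Below is a minimal sketch under several assumptions:

- Hand-written trees stand in for trees trained by gradient boosting.
- Each residue is described by a hypothetical two-number feature tuple.
- The feature vector for a pair is the concatenation of the sequence tuple and the template tuple.
- The alignment is a plain global DP with a linear gap penalty, not the paper's CRF alignment model.

The single depth-2 tree rewards a pair only when a sequence feature and a template feature are both high. This interaction cannot be written as a linear function of the raw features.

```python
def tree(feature, threshold, left, right):
    """A regression-tree node: go `left` if x[feature] <= threshold, else `right`.
    Leaves are numbers; internal children are other nodes."""
    def predict(x):
        branch = left if x[feature] <= threshold else right
        return branch(x) if callable(branch) else branch
    return predict


def boosted_score(trees, x):
    """Boosted ensemble output: the sum of all tree predictions."""
    return sum(t(x) for t in trees)


def align_score(seq_feats, tmpl_feats, trees, gap=-1.0):
    """Best global alignment score; aligned pair (i, j) scores boosted_score
    on the concatenated feature tuple seq_feats[i] + tmpl_feats[j]."""
    n, m = len(seq_feats), len(tmpl_feats)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = seq_feats[i - 1] + tmpl_feats[j - 1]
            dp[i][j] = max(dp[i - 1][j - 1] + boosted_score(trees, pair),
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    return dp[n][m]


# Hypothetical ensemble: reward the pair only when sequence feature 0 AND
# template feature 0 (index 2 after concatenation) are both above 0.5.
TREES = [tree(0, 0.5, -0.5, tree(2, 0.5, -0.5, 2.0))]
```

Aligning `[(1.0, 0.0), (0.0, 0.0)]` against an identical template gives 1.5. The first pair earns 2.0 from the interaction leaf, and the second pair falls in the -0.5 leaf.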
A Probabilistic Graphical Model for Ab Initio Folding.
Feng Zhao, Jian Peng, Joe Debartolo, Karl F Freed, Tobin R Sosnick, Jinbo Xu

Despite significant progress in recent years, ab initio folding is still one of the most challenging problems in structural biology. This paper presents a probabilistic graphical model for ab initio folding, which employs Conditional Random Fields (CRFs) and directional statistics to model the relationship between the primary sequence of a protein and its three-dimensional structure. Unlike the widely used fragment assembly method and the lattice model for protein folding, our graphical model can explore protein conformations in a continuous space according to their probability. The probability of a protein conformation reflects its stability and is estimated from the PSI-BLAST sequence profile and predicted secondary structure. Experimental results indicate that this new method compares favorably with the fragment assembly method and the lattice model.

DOI: 10.1007/978-3-642-02008-7_5; Vol. 5541, pp. 59-73; published 2009-01-01
Citations: 9
Structural Alignment of Pseudoknotted RNA
Banu Dost, B. Han, Shaojie Zhang, V. Bafna
pp. 143-158 · Published 2006-04-02 · DOI: 10.1007/11732990_13
Citations: 48
Immunological bioinformatics
O. Lund, M. Nielsen, C. Lundegaard, C. Keşmir, S. Brunak
pp. I-XII, 1-296 · Published 2005-01-01 · DOI: 10.7551/mitpress/3679.001.0001
Citations: 61