Current Protocols in Bioinformatics: Latest Articles

Network-Based Approaches for Pathway Level Analysis
Q1 Biochemistry, Genetics and Molecular Biology Pub Date : 2018-04-09 DOI: 10.1002/cpbi.42
Tin Nguyen, Cristina Mitrea, Sorin Draghici

Identification of impacted pathways is an important problem because it allows us to gain insights into the underlying biology beyond the detection of differentially expressed genes. In the past decade, a plethora of methods have been developed for this purpose. The last generation of pathway analysis methods are designed to take into account various aspects of pathway topology in order to increase the accuracy of the findings. Here, we cover 34 such topology-based pathway analysis methods published in the past 13 years. We compare these methods on categories related to implementation, availability, input format, graph models, and statistical approaches used to compute pathway level statistics and statistical significance. We also discuss a number of critical challenges that need to be addressed, arising both in methodology and pathway representation, including inconsistent terminology, data format, lack of meaningful benchmarks, and, more importantly, a systematic bias that is present in most existing methods. © 2018 by John Wiley & Sons, Inc.

Citations: 28
Searching ECOD for Homologous Domains by Sequence and Structure
Q1 Biochemistry, Genetics and Molecular Biology Pub Date : 2018-04-09 DOI: 10.1002/cpbi.45
R. Dustin Schaeffer, Yuxing Liao, Nick V. Grishin

ECOD is a database of evolutionary domains from structures deposited in the PDB. Domains in ECOD are classified by a mixed manual/automatic method wherein the bulk of newly deposited structures are classified automatically by protein-protein BLAST. Those structures that cannot be classified automatically are referred to manual curators who use a combination of alignment results, functional analysis, and close reading of the literature to generate novel assignments. ECOD differs from other structural domain resources in that it is continually updated, classifying thousands of proteins per week. ECOD recognizes homology as its key organizing concept, rather than structural or sequence similarity alone. Such a classification scheme provides functional information about proteins of interest by placing them in the correct evolutionary context among all proteins of known structure. This unit demonstrates how to access ECOD via the Web and how to search the database by sequence or structure. It also details the distributable data files available for large-scale bioinformatics users. © 2018 by John Wiley & Sons, Inc.

Citations: 6
Leveraging Experimental Details for an Improved Understanding of Host-Pathogen Interactome
Q1 Biochemistry, Genetics and Molecular Biology Pub Date : 2018-04-09 DOI: 10.1002/cpbi.44
Mais Ammari, Fiona McCarthy, Bindu Nanduri

An increasing proportion of curated host-pathogen interaction (HPI) information is becoming available in interaction databases. These data represent detailed, experimentally verified molecular interaction data, which may be used to better understand infectious diseases. By their very nature, HPIs are context dependent: whether two proteins are reported as interacting depends on the precise biological conditions studied and the approaches used to identify the interaction. The associated biology and the technical details of the experiments identifying interacting protein molecules are increasingly being curated using defined curation standards but are overlooked in current HPI network modeling. Given the increase in data size and complexity, awareness of the process and variables included in HPI identification and curation, and of their effect on data analysis and interpretation, is crucial in understanding pathogenesis. We describe the use of HPI data for network modeling, aspects of curation that can help researchers to more accurately model specific infection conditions, and provide examples to illustrate these principles. © 2018 by John Wiley & Sons, Inc.

Citations: 2
Using RegulonDB, the Escherichia coli K-12 Gene Regulatory Transcriptional Network Database
Q1 Biochemistry, Genetics and Molecular Biology Pub Date : 2018-04-09 DOI: 10.1002/cpbi.43
Heladia Salgado, Irma Martínez-Flores, Víctor H. Bustamante, Kevin Alquicira-Hernández, Jair S. García-Sotelo, Delfino García-Alonso, Julio Collado-Vides

In RegulonDB, for over 25 years, we have been gathering knowledge by manual curation from original scientific literature on the regulation of transcription initiation and genome organization in transcription units of the Escherichia coli K-12 genome. This unit describes six basic protocols that can serve as a guiding introduction to the main content of the current version (v9.4) of this electronic resource. These protocols include general navigation as well as searching for specific objects such as genes, gene products, transcription units, promoters, transcription factors, coexpression, and genetic sensory response units or GENSOR Units. In these protocols, the user will find an initial introduction to the concepts pertinent to the protocol, the content obtained when performing the given navigation, and the necessary resources for carrying out the protocol. This easy-to-follow presentation should help anyone interested in quickly seeing all that is currently offered in RegulonDB, including position weight matrices of transcription factors, coexpression values based on published microarrays, and the GENSOR Units unique to RegulonDB that offer regulatory mechanisms in the context of their signals and metabolic consequences. © 2018 by John Wiley & Sons, Inc.

Citations: 11
Exploring Biological Networks in 3D, Stereoscopic 3D, and Immersive 3D with iCAVE
Q1 Biochemistry, Genetics and Molecular Biology Pub Date : 2018-04-09 DOI: 10.1002/cpbi.47
Selim Kalayci, Zeynep H. Gümüş

Biological networks are becoming increasingly large and complex, pushing the limits of existing 2D tools. iCAVE is an open-source software tool for interactive visual explorations of large and complex networks in 3D, stereoscopic 3D, or immersive 3D. It introduces new 3D network layout algorithms and 3D extensions of popular 2D network layout, clustering, and edge bundling algorithms to assist researchers in understanding the underlying patterns in large, multi-layered, clustered, or complex networks. This protocol aims to guide new users on the basic functions of iCAVE for loading data, laying out networks (single or multi-layered), bundling edges, clustering networks, visualizing clusters, visualizing data attributes, and saving output images or videos. It also provides examples on visualizing networks constrained in physical 3D space (e.g., proteins; neurons; brain). It is accompanied by a new version of iCAVE with an enhanced user interface and highlights new features useful for existing users. © 2018 by John Wiley & Sons, Inc.

Citations: 5
From FastQ Data to High-Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline
Q1 Biochemistry, Genetics and Molecular Biology Pub Date : 2018-03-15 DOI: 10.1002/0471250953.bi1110s43
Geraldine A. Van der Auwera, Mauricio O. Carneiro, Christopher Hartl, Ryan Poplin, Guillermo del Angel, Ami Levy-Moonshine, Tadeusz Jordan, Khalid Shakir, David Roazen, Joel Thibault, Eric Banks, Kiran V. Garimella, David Altshuler, Stacey Gabriel, Mark A. DePristo

This unit describes how to use BWA and the Genome Analysis Toolkit (GATK) to map genome sequencing data to a reference and produce high-quality variant calls that can be used in downstream analyses. The complete workflow includes the core NGS data-processing steps that are necessary to make the raw data suitable for analysis by the GATK, as well as the key methods involved in variant discovery using the GATK. Curr. Protoc. Bioinform. 43:11.10.1-11.10.33. © 2013 by John Wiley & Sons, Inc.

Citations: 5158
Selecting the Right Similarity-Scoring Matrix
Q1 Biochemistry, Genetics and Molecular Biology Pub Date : 2018-02-16 DOI: 10.1002/0471250953.bi0305s43
William R. Pearson

Protein sequence similarity searching programs like BLASTP, SSEARCH, and FASTA use scoring matrices that are designed to identify distant evolutionary relationships (BLOSUM62 for BLAST, BLOSUM50 for SSEARCH and FASTA). Different similarity scoring matrices are most effective at different evolutionary distances. “Deep” scoring matrices like BLOSUM62 and BLOSUM50 target alignments with 20% to 30% identity, while “shallow” scoring matrices (e.g., VTML10 to VTML80) target alignments that share 90% to 50% identity, reflecting much less evolutionary change. While “deep” matrices provide very sensitive similarity searches, they also require longer sequence alignments and can sometimes produce alignment overextension into nonhomologous regions. Shallower scoring matrices are more effective when searching for short protein domains, or when the goal is to limit the scope of the search to sequences that are likely to be orthologous between recently diverged organisms. Likewise, in DNA searches, the match and mismatch parameters set evolutionary look-back times and domain boundaries. In this unit, we will discuss the theoretical foundations that drive practical choices of protein and DNA similarity scoring matrices and gap penalties. Deep scoring matrices (BLOSUM62 and BLOSUM50) should be used for sensitive searches with full-length protein sequences, but short domains or restricted evolutionary look-back require shallower scoring matrices. Curr. Protoc. Bioinform. 43:3.5.1-3.5.9. © 2013 by John Wiley & Sons, Inc.
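The link between DNA match/mismatch parameters and evolutionary look-back can be illustrated with a small calculation. The sketch below is not taken from the unit: it computes the "break-even" identity at which an ungapped alignment scores exactly zero, which is only a crude proxy for the identity range a parameter pair targets. The function names and the example parameter pairs (+1/-3, +5/-4, +1/-1) are illustrative assumptions.

```python
from fractions import Fraction


def break_even_identity(match: int, mismatch: int) -> Fraction:
    """Fraction of identical positions at which an ungapped alignment scores zero.

    Solves p * match + (1 - p) * mismatch = 0 for p.
    Hypothetical helper for illustration; not part of any alignment package.
    """
    if match <= 0 or mismatch >= 0:
        raise ValueError("expected a positive match score and a negative mismatch score")
    return Fraction(-mismatch, match - mismatch)


def ungapped_score(a: str, b: str, match: int, mismatch: int) -> int:
    """Score two equal-length sequences position by position, without gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(match if x == y else mismatch for x, y in zip(a, b))


if __name__ == "__main__":
    # Illustrative parameter pairs: a harsher mismatch penalty raises the
    # identity needed before an alignment scores above zero.
    for match, mismatch in [(1, -3), (5, -4), (1, -1)]:
        p = break_even_identity(match, mismatch)
        print(f"+{match}/{mismatch}: break-even identity = {float(p):.1%}")
```

Under this simplification, a pair with a harsh mismatch penalty only rewards alignments between closely related sequences (a "shallow" setting), while a milder penalty lets more diverged alignments score positively (a "deeper" setting).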

Citations: 113
Cloud Computing with iPlant Atmosphere
Q1 Biochemistry, Genetics and Molecular Biology Pub Date : 2018-02-16 DOI: 10.1002/0471250953.bi0915s43
Sheldon J. McKay, Edwin J. Skidmore, Christopher J. LaRose, Andre W. Mercer, Christos Noutsos

Cloud Computing refers to distributed computing platforms that use virtualization software to provide easy access to physical computing infrastructure and data storage, typically administered through a Web interface. Cloud-based computing provides access to powerful servers, with specific software and virtual hardware configurations, while eliminating the initial capital cost of expensive computers and reducing the ongoing operating costs of system administration, maintenance contracts, power consumption, and cooling. This eliminates a significant barrier to entry into bioinformatics and high-performance computing for many researchers. This is especially true of free or modestly priced cloud computing services. The iPlant Collaborative offers a free cloud computing service, Atmosphere, which allows users to easily create and use instances on virtual servers preconfigured for their analytical needs. Atmosphere is a self-service, on-demand platform for scientific computing. This unit demonstrates how to set up, access and use cloud computing in Atmosphere. Curr. Protoc. Bioinform. 43:9.15.1-9.15.20. © 2013 by John Wiley & Sons, Inc.

Citations: 2
An Overview of RNA Sequence Analyses: Structure Prediction, ncRNA Gene Identification, and RNAi Design
Q1 Biochemistry, Genetics and Molecular Biology Pub Date : 2018-02-16 DOI: 10.1002/0471250953.bi1201s43
Gary D. Stormo
This unit briefly describes the two fundamentally different methods for predicting RNA structures. The first is to find that structure with the minimum free energy of folding, as predicted by various thermodynamic parameters related to base‐pair stacking, loop lengths, and other features. If one has only a single sequence, this thermodynamic approach is the best available method. The second fundamental approach to RNA structure prediction is to use multiple, homologous sequences for which one can infer a common structure, and then try and predict a structure common to all of the sequences. Such an approach is referred to as a comparative method or phylogenetic method of RNA structure prediction. Curr. Protoc. Bioinform. 43:12.1.1‐12.1.3. © 2013 by John Wiley & Sons, Inc.
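The single-sequence approach can be sketched with a deliberately simplified stand-in. The code below is not the thermodynamic method the unit describes: it uses Nussinov-style base-pair maximization, which counts pairs instead of summing stacking and loop free energies, but it shares the same dynamic-programming structure over subsequences. The function name, the allowed pair set (Watson-Crick plus G-U wobble), and the minimum hairpin loop length of 3 are assumptions made for illustration.

```python
def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in an RNA sequence (Nussinov-style DP).

    Simplified illustration only: real minimum-free-energy folding scores
    stacking, loops, and other features with thermodynamic parameters.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the maximum pair count for the subsequence seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: position j stays unpaired.
            score = best[i][j - 1]
            # Case 2: position j pairs with some k, leaving at least
            # min_loop unpaired bases between k and j.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in allowed:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1]
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    # A small hairpin: three G-C pairs closing an AAA loop.
    print(max_base_pairs("GGGAAACCC"))
```

The comparative approach would instead start from an alignment of homologous sequences and look for covarying positions, which this sketch does not attempt.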
Citations: 24
LipidXplorer: Software for Quantitative Shotgun Lipidomics Compatible with Multiple Mass Spectrometry Platforms
Q1 Biochemistry, Genetics and Molecular Biology Pub Date : 2018-02-16 DOI: 10.1002/0471250953.bi1412s43
Ronny Herzog, Dominik Schwudke, Andrej Shevchenko

LipidXplorer is an open-source software kit that supports the identification and quantification of molecular species of any lipid class detected by shotgun experiments performed on any mass spectrometry platform. LipidXplorer does not rely on a database of reference spectra; instead, lipid identification routines are user-defined in the declarative molecular fragmentation query language (MFQL). The software supports batch processing of multiple shotgun acquisitions by high-resolution mass mapping, precursor and neutral-loss scanning, and data-dependent MS/MS, lending itself to a variety of lipidomics applications in cell biology and molecular medicine. Curr. Protoc. Bioinform. 43:14.12.1-14.12.30. © 2013 by John Wiley & Sons, Inc.

Citations: 49