CombAlign: a code for generating a one-to-many sequence alignment from a set of pairwise structure-based sequence alignments.

Source Code for Biology and Medicine, vol. 10, article 9. Published 2015-08-05; eCollection date 2015-01-01. DOI: 10.1186/s13029-015-0039-1

Carol L Ecale Zhou

Abstract

Background: To better define regions of similarity among related protein structures, it is useful to identify the residue-residue correspondences among proteins. Few codes exist for constructing a one-to-many multiple sequence alignment from a set of structure or sequence alignments, and there was a clear need for a tool that combines pairwise structure alignments while allowing gaps to be inserted in the reference structure.

Results: This report describes a new Python code, CombAlign, which takes as input a set of pairwise sequence alignments (which may be structure based) and generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA). The use and utility of CombAlign were demonstrated by generating gapped MSSAs from sets of pairwise structure-based sequence alignments between structure models of the matrix protein (VP40) and the pre-small/secreted glycoprotein (sGP) of Reston Ebolavirus and the corresponding proteins of several other filoviruses. The gapped MSSAs revealed structure-based residue-residue correspondences, which made it possible to identify regions of the Reston proteins that are structurally similar to, or differ from, each of the other corresponding proteins.
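The abstract does not spell out how the pairwise alignments are merged. The following is a minimal Python 3 sketch of one common way to do it, not CombAlign's source (which targets Python 2.6). It assumes each input is a pair of equal-length gapped strings (reference row, target row) using `-` as the gap character, and that every pair aligns the same ungapped reference sequence; the names `decompose` and `combine_pairwise` are hypothetical. Residues that a target inserts relative to the reference are collected per reference position, and each such insertion slot is widened to the longest insertion seen there, which opens gaps in the reference row.

```python
from typing import List, Tuple

GAP = "-"  # assumed gap character


def decompose(ref_aln: str, tgt_aln: str) -> Tuple[List[str], List[str]]:
    """Split one reference-vs-target alignment into matches and insertions.

    matches[i] is the target character (residue or gap) aligned to reference
    residue i. inserts[k] holds target residues found in reference-gap columns
    just before reference residue k; inserts[-1] holds any trailing insertion.
    """
    if len(ref_aln) != len(tgt_aln):
        raise ValueError("aligned strings must have equal length")
    matches: List[str] = []
    inserts: List[str] = [""]
    for r, t in zip(ref_aln, tgt_aln):
        if r == GAP:
            if t != GAP:  # columns gapped in both rows carry no information
                inserts[-1] += t
        else:
            matches.append(t)
            inserts.append("")  # open the slot that follows this residue
    return matches, inserts


def combine_pairwise(pairs: List[Tuple[str, str]]) -> List[str]:
    """Merge reference-vs-target alignments into one gapped one-to-many MSA.

    Row 0 of the result is the reference; row j + 1 is the target of pairs[j].
    """
    if not pairs:
        raise ValueError("need at least one pairwise alignment")
    ref_seq = pairs[0][0].replace(GAP, "")
    decomposed = []
    for ref_aln, tgt_aln in pairs:
        if ref_aln.replace(GAP, "") != ref_seq:
            raise ValueError("every pair must use the same reference sequence")
        decomposed.append(decompose(ref_aln, tgt_aln))

    n_slots = len(ref_seq) + 1
    # Each insertion slot is as wide as the longest insertion any target has
    # there, so gaps open in the reference and in targets lacking that insertion.
    widths = [max(len(ins[k]) for _, ins in decomposed) for k in range(n_slots)]

    def build(residues: List[str], inserts: List[str]) -> str:
        parts = []
        for k in range(n_slots):
            parts.append(inserts[k].ljust(widths[k], GAP))  # left-justify insertions
            if k < len(residues):
                parts.append(residues[k])
        return "".join(parts)

    rows = [build(list(ref_seq), [""] * n_slots)]  # reference: gaps in every slot
    rows.extend(build(matches, inserts) for matches, inserts in decomposed)
    return rows


if __name__ == "__main__":
    pairwise = [("AC-DE", "ACGDE"), ("ACDE", "A-DE"), ("ACD--E", "ACDKLE")]
    for row in combine_pairwise(pairwise):
        print(row)
```

With the three toy alignments in the demo, the script prints `AC-D--E` (reference), `ACGD--E`, `A--D--E` and `AC-DKLE`. Left-justifying insertions is an arbitrary choice: residues from different targets that share an insertion slot are not thereby aligned to each other, because the pairwise inputs carry no target-to-target correspondences.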

Conclusions: CombAlign is a new Python code that generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA) from a set of pairwise sequence alignments (which may be structure based). CombAlign helps the user distinguish structurally conserved from divergent regions of a reference protein structure relative to other closely related proteins. CombAlign was developed in Python 2.6, and the source code is available for download from the GitHub code repository.
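As an illustration of how a gapped MSSA might be read, the Python 3 sketch below flags each reference residue by whether the chosen target rows have a residue aligned to it, then reports runs of flagged residues. Treating "every target has an aligned residue" as structurally conserved and "some target has a gap" as divergent is an assumed criterion for illustration, not one given in the abstract; `aligned_mask` and `segments` are hypothetical names, row 0 is assumed to be the reference, and the toy MSSA is hard-coded.

```python
from typing import List, Optional, Tuple

GAP = "-"  # assumed gap character


def aligned_mask(msa: List[str], targets: List[int]) -> List[bool]:
    """One flag per reference residue: True if every listed target row has a
    residue (not a gap) in that column. msa[0] is the reference row."""
    ref = msa[0]
    if any(len(row) != len(ref) for row in msa):
        raise ValueError("all rows must have the same length")
    mask = []
    for col, res in enumerate(ref):
        if res == GAP:
            continue  # reference gap: column holds only target insertions
        mask.append(all(msa[t][col] != GAP for t in targets))
    return mask


def segments(mask: List[bool], value: bool) -> List[Tuple[int, int]]:
    """1-based inclusive reference-residue ranges where mask == value."""
    out: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, flag in enumerate(mask, start=1):
        if flag == value and start is None:
            start = i  # a run begins
        elif flag != value and start is not None:
            out.append((start, i - 1))  # the run ended at the previous residue
            start = None
    if start is not None:
        out.append((start, len(mask)))  # a run reaches the last residue
    return out


if __name__ == "__main__":
    msa = ["AC-D--E", "ACGD--E", "A--D--E", "AC-DKLE"]
    everyone = list(range(1, len(msa)))
    mask = aligned_mask(msa, everyone)
    print("aligned in all targets:", segments(mask, True))
    print("gapped in some target:", segments(mask, False))
    for t in everyone:
        print("target", t, "gapped at", segments(aligned_mask(msa, [t]), False))
```

For the toy MSSA, reference residues 1 and 3 to 4 are aligned in all targets, while residue 2 (C) is reported as gapped, and the per-target loop shows that only target 2 lacks a residue there. Working per target mirrors the abstract's comparison of the reference against each corresponding protein in turn.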


Source journal: Source Code for Biology and Medicine (Decision Sciences: Information Systems and Management)

About the journal: Source Code for Biology and Medicine is a peer-reviewed, open-access online journal that publishes articles on source code used across a wide range of applications in biology and medicine. The journal's aim is to publish source code for distribution and use in the public domain in order to advance biological and medical research. Through this dissemination, it may be possible to shorten the time required to solve computational problems for which source code or resources are scarce.
