EMMA: a new method for computing multiple sequence alignments given a constraint subset alignment

Algorithms for Molecular Biology | IF 1.5 | Zone 4 (Biology) | JCR Q4 (Biochemical Research Methods) | Pub Date: 2023-12-07 | DOI: 10.1186/s13015-023-00247-x

Chengze Shen, Baqiao Liu, Kelly P. Williams, Tandy Warnow


Citations: 1

Abstract

Adding sequences into an existing (possibly user-provided) alignment has multiple applications, including updating a large alignment with new data, adding sequences into a constraint alignment constructed using biological knowledge, or computing alignments in the presence of sequence length heterogeneity. Although this is a natural problem, only a few tools have been developed to use this information with high fidelity. We present EMMA (Extending Multiple alignments using MAFFT--add) for the problem of adding a set of unaligned sequences into a multiple sequence alignment (i.e., a constraint alignment). EMMA builds on MAFFT--add, which is also designed to add sequences into a given constraint alignment. EMMA improves on MAFFT--add methods by using a divide-and-conquer framework to scale its most accurate version, MAFFT-linsi--add, to constraint alignments with many sequences. We show that EMMA has an accuracy advantage over other techniques for adding sequences into alignments under many realistic conditions and can scale to large datasets with high accuracy (hundreds of thousands of sequences). EMMA is available at https://github.com/c5shen/EMMA. EMMA is a new tool that provides high accuracy and scalability for adding sequences into an existing alignment.
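To make the problem statement concrete, here is a minimal Python sketch of what it means for an extended alignment to respect a constraint alignment: restricting the output to the original sequences and dropping columns that become all-gap should give back the constraint alignment unchanged. This is an illustration of the problem definition only, not EMMA's or MAFFT's code. The function names, the dict-of-strings representation, and the assumption that the constraint alignment has no all-gap columns are all hypothetical choices made for this example.

```python
def restrict_alignment(alignment, names):
    """Keep only the named rows, then drop columns that are all gaps in those rows."""
    names = list(names)
    if not names:
        return {}
    rows = [alignment[name] for name in names]
    width = len(rows[0])
    # A column survives if at least one of the kept rows has a residue there.
    kept = [i for i in range(width) if any(row[i] != "-" for row in rows)]
    return {name: "".join(alignment[name][i] for i in kept) for name in names}


def respects_constraint(constraint, extended):
    """Return True if `extended` contains `constraint` as an induced sub-alignment.

    Both arguments map sequence names to gapped strings ('-' is the gap symbol).
    Assumes (for this sketch) that `constraint` has no all-gap columns.
    """
    # Every constraint sequence must still be present.
    if not set(constraint) <= set(extended):
        return False
    # All rows of a valid alignment have the same length.
    if len({len(row) for row in extended.values()}) > 1:
        return False
    return restrict_alignment(extended, constraint.keys()) == constraint


# Hypothetical toy data: two constraint sequences and one added query, q1.
constraint = {"s1": "AC-GT", "s2": "ACAGT"}
extended_ok = {"s1": "AC-G-T", "s2": "ACAG-T", "q1": "AC-GCT"}
extended_bad = {"s1": "ACG-T", "s2": "ACAGT", "q1": "ACGCT"}

print(respects_constraint(constraint, extended_ok))
print(respects_constraint(constraint, extended_bad))
```

In the first toy output, the query introduces one new column that is all-gap in s1 and s2, so the original homologies are preserved. In the second, the gap in s1 has moved, which changes the constraint alignment and fails the check.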

Source journal

Algorithms for Molecular Biology (Biology: Biochemical Research Methods)

CiteScore: 2.40

Self-citation rate: 10.00%

Articles published:

Review time: >12 weeks

About the journal: Algorithms for Molecular Biology publishes articles on novel algorithms for biological sequence and structure analysis, phylogeny reconstruction, and combinatorial algorithms and machine learning. Areas of interest include but are not limited to: algorithms for RNA and protein structure analysis, gene prediction and genome analysis, comparative sequence analysis and alignment, phylogeny, gene expression, machine learning, and combinatorial algorithms. Where appropriate, manuscripts should describe applications to real-world data. However, pure algorithm papers are also welcome if future applications to biological data are to be expected, or if they address complexity or approximation issues of novel computational problems in molecular biology. Articles about novel software tools will be considered for publication if they contain some algorithmically interesting aspects.