Large-scale Structure-Informed multiple sequence alignment of proteins with SIMSApiper.

Bioinformatics (IF 4.4; Zone 3 Biology; Q1, Biochemical Research Methods) Pub Date: 2024-04-22 DOI: 10.1093/bioinformatics/btae276

Charlotte Crauwels, Sophie-Luise Heidig, Adrián Díaz, Wim F Vranken

Abstract

Summary: SIMSApiper is a Nextflow pipeline that creates reliable, structure-informed multiple sequence alignments (MSAs) of thousands of protein sequences faster than standard structure-based alignment methods. Structural information can be provided by the user or collected by the pipeline from online resources. Parallelization over subsets of sequences grouped by sequence identity can be activated to significantly speed up the alignment process. Finally, the number of gaps in the final alignment can be reduced by leveraging the positions of conserved secondary structure elements.

Availability and implementation: The pipeline is implemented using Nextflow, Python3 and Bash. It is publicly available at github.com/Bio2Byte/simsapiper.

Supplementary information: All data are available on GitHub.
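To make the idea of sequence identity based subsets concrete, here is a minimal Python sketch of greedy grouping: each sequence joins the first subset whose representative it resembles closely enough, and otherwise starts a new subset. This is an illustration only, not the pipeline's implementation. The function names, the 0.7 threshold, the example sequences, and the use of `difflib.SequenceMatcher` as a crude stand-in for a real identity calculation are all assumptions made for this example; the abstract does not say which tool or cutoff SIMSApiper uses.

```python
from difflib import SequenceMatcher


def approx_identity(a: str, b: str) -> float:
    """Crude similarity proxy (an assumption, not a true alignment identity):
    matched characters divided by the length of the longer sequence."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(a), len(b))


def greedy_subsets(seqs: dict[str, str], threshold: float = 0.7) -> list[list[str]]:
    """Place each sequence in the first subset whose representative
    (its first member) is at least `threshold` similar; else open a new subset."""
    subsets: list[list[str]] = []
    # Visit longest sequences first so each representative is its subset's longest member.
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for subset in subsets:
            if approx_identity(seqs[subset[0]], seqs[name]) >= threshold:
                subset.append(name)
                break
        else:
            subsets.append([name])
    return subsets


# Hypothetical toy input: seqA and seqB differ in one residue, seqC is unrelated.
example = {
    "seqA": "MKTAYIAKQRQISFVKSHFSRQ",
    "seqB": "MKTAYIAKQRQISFVKSHFSRA",
    "seqC": "GSHMLEDPVDAFWWWPPPLLNN",
}
print(greedy_subsets(example))
```

With this toy input the two near-identical sequences end up in one subset and the unrelated one in its own. Each subset could then be aligned independently and in parallel before the sub-alignments are merged, which is where the speed-up described in the summary would come from.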

Source journal: Bioinformatics (Biology; Biochemical Research Methods)

CiteScore: 11.20

Self-citation rate: 5.20%

Articles published: 753

Review time: 2.1 months

Journal description: The leading journal in its field, Bioinformatics publishes the highest quality scientific papers and review articles of interest to academic and industrial researchers. Its main focus is on new developments in genome bioinformatics and computational biology. Two distinct sections within the journal, Discovery Notes and Application Notes, focus on shorter papers; the former reports biologically interesting discoveries made using computational methods, the latter explores the applications used for experiments.