Large-scale Structure-Informed multiple sequence alignment of proteins with SIMSApiper.

IF 4.4 | Zone 3 (Biology) | JCR Q1, Biochemical Research Methods | Bioinformatics | Pub Date: 2024-04-22 | DOI: 10.1093/bioinformatics/btae276
Charlotte Crauwels, Sophie-Luise Heidig, Adrián Díaz, Wim F Vranken
Citations: 0

Abstract

SUMMARY
SIMSApiper is a Nextflow pipeline that creates reliable, structure-informed MSAs of thousands of protein sequences in time-frames faster than standard structure-based alignment methods. Structural information can be provided by the user or collected by the pipeline from online resources. Parallelization with sequence identity based subsets can be activated to significantly speed up the alignment process. Finally, the number of gaps in the final alignment can be reduced by leveraging the position of conserved secondary structure elements.

AVAILABILITY AND IMPLEMENTATION
The pipeline is implemented using Nextflow, Python3 and Bash. It is publicly available on github.com/Bio2Byte/simsapiper.

SUPPLEMENTARY INFORMATION
All data is available on GitHub.
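
To illustrate the idea of splitting sequences into identity-based subsets that can then be aligned in parallel, here is a minimal Python sketch. It is not SIMSApiper's implementation: the function names `approx_identity` and `greedy_subsets` are hypothetical, the greedy representative-based grouping is one common way to build such subsets, and the similarity score from `difflib.SequenceMatcher` is only a crude stand-in for a true alignment-based sequence identity.

```python
from difflib import SequenceMatcher


def approx_identity(a: str, b: str) -> float:
    """Crude similarity proxy in [0, 1] (assumption: NOT a real alignment-based identity)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_subsets(seqs: dict[str, str], threshold: float) -> list[list[str]]:
    """Group sequence IDs so each member is similar to its subset's representative.

    Hypothetical helper for illustration only; each returned subset could be
    aligned independently (and therefore in parallel) before merging.
    """
    subsets: list[list[str]] = []     # sequence IDs per subset
    representatives: list[str] = []   # first sequence placed in each subset
    # Visit longer sequences first so they become the representatives.
    for seq_id in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        for idx, rep in enumerate(representatives):
            if approx_identity(seqs[seq_id], rep) >= threshold:
                subsets[idx].append(seq_id)
                break
        else:
            # No existing representative is similar enough: open a new subset.
            representatives.append(seqs[seq_id])
            subsets.append([seq_id])
    return subsets
```

Calling `greedy_subsets(sequences, 0.8)` on a dictionary that maps sequence IDs to amino-acid strings returns a list of ID lists. A higher threshold yields more, smaller subsets, which trades extra merging work for more parallelism.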
Source journal: Bioinformatics (Biology: Biochemical Research Methods)
CiteScore: 11.20
Self-citation rate: 5.20%
Articles published: 753
Review time: 2.1 months
Journal description: The leading journal in its field, Bioinformatics publishes the highest quality scientific papers and review articles of interest to academic and industrial researchers. Its main focus is on new developments in genome bioinformatics and computational biology. Two distinct sections within the journal, Discovery Notes and Application Notes, focus on shorter papers; the former reports biologically interesting discoveries made using computational methods, the latter explores the applications used for experiments.