Large-scale Structure-Informed multiple sequence alignment of proteins with SIMSApiper
Charlotte Crauwels, Sophie-Luise Heidig, Adrián Díaz, Wim F Vranken
DOI: 10.1093/bioinformatics/btae276
Published: 2024-04-22
Abstract
SUMMARY
SIMSApiper is a Nextflow pipeline that creates reliable, structure-informed multiple sequence alignments (MSAs) of thousands of protein sequences faster than standard structure-based alignment methods. Structural information can be provided by the user or collected by the pipeline from online resources. Parallelization over sequence-identity-based subsets can be activated to significantly speed up the alignment process. Finally, the number of gaps in the final alignment can be reduced by leveraging the positions of conserved secondary structure elements.
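The idea of splitting the input into sequence-identity-based subsets can be illustrated with a minimal Python sketch. This is not SIMSApiper's implementation: the function names, the greedy grouping strategy, the 0.8 threshold, and the use of `difflib.SequenceMatcher` as a crude similarity proxy (instead of a real alignment-based identity) are all assumptions made for illustration. The point is only that similar sequences end up in the same subset, so that each subset could in principle be aligned independently and in parallel before the results are combined.

```python
from difflib import SequenceMatcher


def crude_identity(a: str, b: str) -> float:
    """Rough similarity proxy in [0, 1]; NOT a true alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_subsets(seqs: dict[str, str], threshold: float) -> list[list[str]]:
    """Hypothetical helper: put each sequence into the first subset whose
    representative is at least `threshold` similar, else start a new subset."""
    representatives: list[str] = []  # one representative ID per subset
    subsets: list[list[str]] = []
    # Process longer sequences first so they become the representatives.
    for seq_id in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        for idx, rep in enumerate(representatives):
            if crude_identity(seqs[seq_id], seqs[rep]) >= threshold:
                subsets[idx].append(seq_id)
                break
        else:
            # No existing representative is similar enough.
            representatives.append(seq_id)
            subsets.append([seq_id])
    return subsets


# Toy input (made-up sequences): two near-identical pairs.
toy = {
    "a1": "MKTAYIAKQRQISFVKSHFSRQ",
    "a2": "MKTAYIAKQRQISFVKSHFSRK",
    "b1": "GWWPLLLNNDDEEGGCCPPLLNN",
    "b2": "GWWPLLLNNDDEEGGCCPPLLND",
}

if __name__ == "__main__":
    for members in greedy_subsets(toy, threshold=0.8):
        print(members)
```

With the toy input, the two `a` sequences and the two `b` sequences fall into separate subsets. A real workflow would replace `crude_identity` with a proper identity measure or a dedicated clustering tool.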
AVAILABILITY AND IMPLEMENTATION
The pipeline is implemented using Nextflow, Python3 and Bash. It is publicly available at github.com/Bio2Byte/simsapiper.
SUPPLEMENTARY INFORMATION
All data is available on GitHub.