Charlotte Crauwels, Sophie-Luise Heidig, Adrián Díaz, Wim F Vranken
Bioinformatics, published 2024-04-22. DOI: 10.1093/bioinformatics/btae276
Large-scale Structure-Informed multiple sequence alignment of proteins with SIMSApiper.
SUMMARY
SIMSApiper is a Nextflow pipeline that creates reliable, structure-informed multiple sequence alignments (MSAs) of thousands of protein sequences faster than standard structure-based alignment methods. Structural information can be provided by the user or collected by the pipeline from online resources. Parallelization over sequence-identity-based subsets can be activated to significantly speed up the alignment process. Finally, the number of gaps in the final alignment can be reduced by leveraging the positions of conserved secondary structure elements.
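To make the idea of sequence-identity-based subsets concrete, the following is a minimal Python sketch of greedy grouping by similarity to a representative sequence. It is an illustration only and not SIMSApiper's implementation: the function names `approx_identity` and `greedy_subsets`, the 0.7 threshold, and the use of `difflib.SequenceMatcher` as a crude stand-in for alignment-based sequence identity are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def approx_identity(a: str, b: str) -> float:
    """Crude similarity proxy in [0, 1] (assumption: stands in for real sequence identity)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_subsets(seqs: dict[str, str], threshold: float = 0.7) -> list[list[str]]:
    """Group sequence IDs so each member is similar to its subset's representative.

    Hypothetical helper: the first sequence placed in a subset acts as its
    representative, and longer sequences are considered first.
    """
    subsets: list[tuple[str, list[str]]] = []  # (representative sequence, member IDs)
    for name, seq in sorted(seqs.items(), key=lambda kv: -len(kv[1])):
        for rep, members in subsets:
            if approx_identity(rep, seq) >= threshold:
                members.append(name)
                break
        else:
            # No existing representative is similar enough: start a new subset.
            subsets.append((seq, [name]))
    return [members for _, members in subsets]


if __name__ == "__main__":
    toy = {
        "a": "MKTAYIAKQRQISFVKSHFSRQ",
        "b": "MKTAYIAKQRQISFVKSHFSRA",
        "c": "GGGGGGGGPPPPPPPPWWWWWW",
    }
    print(greedy_subsets(toy))
```

Each resulting subset could then be aligned independently and in parallel before the partial alignments are merged, which is the general motivation for subsetting described above.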
AVAILABILITY AND IMPLEMENTATION
The pipeline is implemented using Nextflow, Python3 and Bash. It is publicly available on github.com/Bio2Byte/simsapiper.
SUPPLEMENTARY INFORMATION
All data is available on GitHub.
About the journal:
The leading journal in its field, Bioinformatics publishes the highest quality scientific papers and review articles of interest to academic and industrial researchers. Its main focus is on new developments in genome bioinformatics and computational biology. Two distinct sections within the journal, Discovery Notes and Application Notes, focus on shorter papers: the former reports biologically interesting discoveries made using computational methods, and the latter explores the applications used for experiments.