J.M. Trinidad-Barnech, J.R. José Sotelo-Silveira, D. Fernandez Do Porto, P. Smircich
Expanding kinetoplastid genome annotation through protein structure comparison
bioRxiv, 2024-08-09. https://doi.org/10.1101/2024.08.07.607044
Abstract
Kinetoplastids belong to the supergroup Discobids, an early divergent eukaryotic clade. Although the amount of genomic information on these parasites has grown substantially, assigning gene functions through traditional sequence-based homology methods remains challenging. Recently, significant advancements have been made in in silico protein structure prediction and algorithms for rapid and precise large-scale protein structure comparisons. In this work, we developed a protein structure-based homology search pipeline (ASC, Annotation by Structural Comparisons) and applied it to annotate all kinetoplastid proteins available in TriTrypDB. Our pipeline assigned functional annotation to 23,000 hypothetical proteins across all 35 kinetoplastid species in the database. Among these, we identified ubiquitous eukaryotic proteins that had not been previously detected in kinetoplastid genomes. The resulting annotations (KASC, Kinetoplastid Annotation by Structural Comparison) are openly available to the community (kasc.fcien.edu.uy).
Author Summary
Kinetoplastids are a group of parasites that cause severe diseases in the poorest regions of the world. Despite the increasing amount of genomic information available on these parasites, predicting the function of many of their genes using traditional methods has been difficult. Recently, there have been significant advancements in predicting protein structures and comparing them on a large scale. In this study, we created a new method called ASC (Annotation by Structural Comparisons) to find functions for all the kinetoplastid genes listed in the TriTrypDB database. Our strategy successfully assigned functions to 23,000 proteins in kinetoplastids. Among these, we discovered important proteins found in all eukaryotes that had not been previously identified in kinetoplastids. This information (KASC, Kinetoplastid Annotation by Structural Comparison) is freely available at kasc.fcien.edu.uy.