Expanding kinetoplastid genome annotation through protein structure comparison

bioRxiv Pub Date : 2024-08-09 DOI:10.1101/2024.08.07.607044
J.M. Trinidad-Barnech, J.R. José Sotelo-Silveira, D. Fernandez Do Porto, P. Smircich

Abstract

Kinetoplastids belong to the supergroup Discobids, an early divergent eukaryotic clade. Although the amount of genomic information on these parasites has grown substantially, assigning gene functions through traditional sequence-based homology methods remains challenging. Recently, significant advancements have been made in in silico protein structure prediction and algorithms for rapid and precise large-scale protein structure comparisons. In this work, we developed a protein structure-based homology search pipeline (ASC, Annotation by Structural Comparisons) and applied it to annotate all kinetoplastid proteins available in TriTrypDB. Our pipeline assigned functional annotation to 23,000 hypothetical proteins across all 35 kinetoplastid species in the database. Among these, we identified ubiquitous eukaryotic proteins that had not been previously detected in kinetoplastid genomes. The resulting annotations (KASC, Kinetoplastid Annotation by Structural Comparison) are openly available to the community (kasc.fcien.edu.uy).

Author Summary

Kinetoplastids are a group of parasites that cause severe diseases in the poorest regions of the world. Despite the increasing amount of genomic information available on these parasites, predicting the function of many of their genes using traditional methods has been difficult. Recently, there have been significant advancements in predicting protein structures and comparing them on a large scale. In this study, we created a new method called ASC (Annotation by Structural Comparisons) to find functions for all the kinetoplastid genes listed in the TriTrypDB database. Our strategy successfully assigned functions to 23,000 proteins in kinetoplastids. Among these, we discovered important proteins found in all eukaryotes that had not been previously identified in kinetoplastids. This information (KASC, Kinetoplastid Annotation by Structural Comparison) is freely available at kasc.fcien.edu.uy.