Fungal genomes: suffering with functional annotation errors.

IF 5.2 | Zone 1 (Biology) | Q1 MYCOLOGY | IMA Fungus | Pub Date: 2021-11-01 | DOI: 10.1186/s43008-021-00083-x
Tapan Kumar Mohanta, Ahmed Al-Harrasi
Citations: 3

Abstract


Fungal genomes: suffering with functional annotation errors.

Background: As of October 2021, genome sequence data for more than 65,985 species were publicly available in the National Center for Biotechnology Information (NCBI) database alone; additional genome sequences are available in other databases, and the total continues to grow rapidly. Research communities can make full and efficient use of these data, however, only if the functional annotation of these genomes is free of errors.

Results: Proteome sequence data from 689 fungal species (7.15 million protein sequences) were analyzed to identify functional annotation errors. The analysis targeted proteins associated with calcium signaling events, including calcium-dependent protein kinases (CDPKs), calmodulins (CaMs), calmodulin-like (CML) proteins, WRKY transcription factors, selenoproteins, and proteins associated with the terpene biosynthesis pathway. Genes encoding CDPKs and selenoproteins are known to be absent from fungal genomes. Our analysis nevertheless found proteins that were functionally annotated as CDPKs, yet InterProScan analysis indicated that none of the protein sequences annotated as "calcium dependent protein kinase" encoded calcium-binding EF-hands in the regulatory domain. Similarly, none of the protein sequences annotated as "selenocysteine" contained a Sec (U) amino acid. Proteins annotated as CaMs and CMLs also showed significant discrepancies. CaM proteins should contain four calcium-binding EF-hands; the fungal proteins annotated as CaM, however, contained between two and four. Similarly, CMLs should possess four calcium-binding EF-hands, but some of the CML-annotated fungal proteins possessed either three or four. WRKY transcription factors are characterized by the presence of a WRKY domain and are confined to the plant kingdom. Several fungal proteins, however, were annotated as WRKY transcription factors even though they did not contain a WRKY domain.

Conclusion: The presence of functional annotation errors in fungal genome and proteome databases is of considerable concern and needs to be addressed in a timely manner.
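
The consistency checks described in the Results can be illustrated with a small sketch. The code below is not the authors' pipeline; it assumes that each protein has already been scanned for domains (for example with InterProScan) and that the hits have been reduced to a plain list of domain names. The `Protein` record, the helper names, and the domain labels "EF-hand" and "WRKY" are hypothetical choices made for illustration, and the free-text label matching is deliberately naive.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Protein:
    """One annotated protein record (hypothetical structure for this sketch)."""
    accession: str
    annotation: str   # free-text functional annotation from the database
    sequence: str     # amino acid sequence in one-letter codes
    domains: List[str] = field(default_factory=list)  # domain names from a prior scan


def count_domain(protein: Protein, name: str) -> int:
    """Count the reported domain hits that carry the given name (case-insensitive)."""
    return sum(1 for d in protein.domains if d.lower() == name.lower())


def check_annotation(protein: Protein) -> List[str]:
    """Return a list of inconsistencies between the label and the sequence evidence."""
    # Normalize hyphens so "calcium-dependent" and "calcium dependent" match alike.
    label = protein.annotation.lower().replace("-", " ")
    ef_hands = count_domain(protein, "EF-hand")
    problems: List[str] = []

    # A CDPK is expected to carry calcium-binding EF-hands.
    if "calcium dependent protein kinase" in label and ef_hands == 0:
        problems.append("annotated as CDPK but no EF-hand found")

    # A selenocysteine label is expected to come with a Sec (U) residue.
    if "selenocysteine" in label and "U" not in protein.sequence.upper():
        problems.append("annotated as selenocysteine but no Sec (U) residue found")

    # CaM and CML proteins are expected to carry four EF-hands.
    # Crude guard: skip labels such as "calmodulin dependent" or "calmodulin binding".
    is_cam_family = (
        "calmodulin" in label
        and "dependent" not in label
        and "binding" not in label
    )
    if is_cam_family and ef_hands != 4:
        kind = "CML" if "calmodulin like" in label else "CaM"
        problems.append(f"annotated as {kind} but has {ef_hands} EF-hands (expected 4)")

    # A WRKY transcription factor is expected to contain a WRKY domain.
    if "wrky" in label and count_domain(protein, "WRKY") == 0:
        problems.append("annotated as WRKY transcription factor but no WRKY domain found")

    return problems
```

Calling `check_annotation` on a record labeled "Calmodulin-like protein" with three "EF-hand" hits returns a single message noting the mismatch, while a record labeled "calmodulin" with four hits returns an empty list. A real audit would need curated label vocabularies rather than substring matching.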

Source journal
IMA Fungus
Subject area: Agricultural and Biological Sciences, Agricultural and Biological Sciences (miscellaneous)
CiteScore: 11.00
Self-citation rate: 3.70%
Articles published: 18
Review time: 20 weeks
Journal description: The flagship journal of the International Mycological Association. IMA Fungus is an international, peer-reviewed, open-access, full colour, fast-track journal. Papers on any aspect of mycology are considered, and published on-line with final pagination after proofs have been corrected; they are then effectively published under the International Code of Nomenclature for algae, fungi, and plants. The journal strongly supports good practice policies, and requires voucher specimens or cultures to be deposited in a public collection with an online database, DNA sequences in GenBank, alignments in TreeBASE, and validating information on new scientific names, including typifications, to be lodged in MycoBank. News, meeting reports, personalia, research news, correspondence, book news, and information on forthcoming international meetings are included in each issue.