New automatic and effective tools for genome annotation

M. Borodovsky

IEEE International Conference on Computational Advances in Bio and Medical Sciences (ICCABS), p. 1. Published 2017-10-01.
DOI: 10.1109/ICCABS.2017.8114287 (https://doi.org/10.1109/ICCABS.2017.8114287)

Abstract

Gene prediction and annotation play a central role in genomics. However, despite much attention, open problems remain and stimulate the search for new algorithmic solutions in all categories of gene finding. Prokaryotic genes can be identified with higher average accuracy than eukaryotic ones. Nevertheless, the error rate is not negligible and is largely species-specific. Our prokaryotic gene finder GeneMarkS, a self-training tool that works in iterations, has been used in many genome projects [1]. In the new version, GeneMarkS-2, we introduced a series of heuristic models for training initialization, a classification of genomes with respect to gene start organization, and an adaptive process of model structure modification. We used multiple tests to assess the accuracy of the new tool as well as that of several other current gene finders. GeneMark-ES, a self-training tool for gene annotation in eukaryotic genomes, has been constantly updated and has been used since 2007 in a number of genome projects conducted by the DOE Joint Genome Institute and the Broad Institute. This tool was recently extended to the fully automated GeneMark-ET [2], which integrates information from RNA-Seq reads mapped to the genome. Another extension, GeneMark-EP, uses genomic footprints of homologous proteins. Both algorithms take similar approaches to filtering out errors made by the algorithms that process external evidence. The metagenomic gene finder MetaGeneMark [3] has been employed in IMG/M at the DOE Joint Genome Institute for metagenome annotation. This tool was further developed to call genes in fungal metagenomes. Finally, BRAKER1, a pipeline for unsupervised RNA-Seq-based genome annotation, combines the advantages of GeneMark-ET and AUGUSTUS [4]. All the tools described above can be applied to the analysis of newly assembled NGS genomes without any additional preparation steps.