Estimation of the number of authentic orphan genes in bacterial genomes.

S. Fukuchi, K. Nishikawa
Journal: DNA Research: An International Journal for Rapid Publication of Reports on Genes and Genomes
Published: 2004, pp. 219-31
DOI: 10.1093/DNARES/11.4.219 (https://doi.org/10.1093/DNARES/11.4.219)
Citations: 27

Abstract

Genome annotation produces a considerable number of putative proteins lacking sequence similarity to known proteins. These are referred to as "orphans." The proportion of orphan genes varies among genomes, and is independent of genome size. In the present study, we show that the proportion of orphan genes roughly correlates with the isolation index of organisms (IIO), an indicator introduced in the present study, which represents the degree of isolation of a given genome as measured by sequence similarity. However, there are outlier genomes with respect to the linear correlation, consisting of those genomes that may contain excess amounts of orphan genes. Comparisons of genome sequences among closely related strains revealed that some of the annotated genes are not conserved, suggesting that they are ORFs occurring by chance. Exclusion of these non-conserved ORFs within closely related genomes improved the correlation between the proportion of orphan genes and the IIO values. Assuming that the correlation holds in general, this relationship was used to estimate the number of "authentic" orphan genes in a genome. Using this definition of authentic orphan genes, the anomalies arising from over-assignments, e.g., the percentages of structural annotations, were corrected for 16 genomes, including those of five archaea.
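The estimation step described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' procedure: the abstract does not define how the IIO is computed, so IIO values are taken as given inputs here. The function names, the ordinary least-squares fit, the clamping of the predicted fraction to [0, 1], and the cap at the annotated orphan count are all assumptions, and the numbers in the demo are made-up toy values rather than data from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_authentic_orphans(iio, n_genes, n_annotated_orphans,
                               slope, intercept):
    """Estimate the authentic orphan count for one genome.

    The fitted line predicts the expected orphan fraction from the IIO.
    Assumption: the estimate is capped at the annotated orphan count,
    since authentic orphans are a subset of annotated orphans.
    """
    expected_fraction = slope * iio + intercept
    # Keep the predicted fraction inside the valid range [0, 1].
    expected_fraction = min(max(expected_fraction, 0.0), 1.0)
    expected_count = round(expected_fraction * n_genes)
    return min(expected_count, n_annotated_orphans)


if __name__ == "__main__":
    # Toy reference genomes (hypothetical values): IIO and orphan
    # fraction after excluding ORFs not conserved among close strains.
    ref_iio = [0.1, 0.2, 0.3, 0.4]
    ref_orphan_fraction = [0.10, 0.15, 0.20, 0.25]
    slope, intercept = fit_line(ref_iio, ref_orphan_fraction)

    # A hypothetical outlier genome with an excess of annotated orphans.
    estimate = estimate_authentic_orphans(
        iio=0.2, n_genes=2000, n_annotated_orphans=500,
        slope=slope, intercept=intercept,
    )
    print("estimated authentic orphans:", estimate)
```

Under this reading, the gap between a genome's annotated orphan count and the estimate is the portion attributed to over-assignment, that is, ORFs occurring by chance.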