The Ribosomal Operon Database: A Full-Length rDNA Operon Database Derived From Genome Assemblies.

IF 5.5 1区 生物学 Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Molecular Ecology Resources Pub Date : 2024-10-21 DOI:10.1111/1755-0998.14031
Anders K Krabberød, Embla Stokke, Ella Thoen, Inger Skrede, Håvard Kauserud
{"title":"The Ribosomal Operon Database: A Full-Length rDNA Operon Database Derived From Genome Assemblies.","authors":"Anders K Krabberød, Embla Stokke, Ella Thoen, Inger Skrede, Håvard Kauserud","doi":"10.1111/1755-0998.14031","DOIUrl":null,"url":null,"abstract":"<p><p>Current rDNA reference sequence databases are tailored towards shorter DNA markers, such as parts of the 16/18S marker or the internally transcribed spacer (ITS) region. However, due to advances in long-read DNA sequencing technologies, longer stretches of the rDNA operon are increasingly used in environmental sequencing studies to increase the phylogenetic resolution. There is, therefore, a growing need for longer rDNA reference sequences. Here, we present the ribosomal operon database (ROD), which includes eukaryotic full-length rDNA operons fished from publicly available genome assemblies. Full-length operons were detected in 34.1% of the 34,701 examined eukaryotic genome assemblies from NCBI. In most cases (53.1%), more than one operon variant was detected, which can be due to intragenomic operon copy variability, allelic variation in non-haploid genomes, or technical errors from the sequencing and assembly process. The highest copy number found was 5947 in Zea mays. In total, 453,697 unique operons were detected, with 69,480 operon variant clusters remaining after intragenomic clustering at 99% sequence identity. The operon length varied extensively across eukaryotes, ranging from 4136 to 16,463 bp, which will lead to considerable polymerase chain reaction (PCR) bias during amplification of the entire operon. Clustering the full-length operons revealed that the different parts (i.e., 18S, 28S, and the hypervariable regions V4 and V9 of 18S) provide divergent taxonomic resolution, with 18S, the V4 and V9 regions being the most conserved. The ROD will be updated regularly to provide an increasing number of full-length rDNA operons to the scientific community.</p>","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":" ","pages":"e14031"},"PeriodicalIF":5.5000,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Molecular Ecology Resources","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1111/1755-0998.14031","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Current rDNA reference sequence databases are tailored towards shorter DNA markers, such as parts of the 16/18S marker or the internally transcribed spacer (ITS) region. However, due to advances in long-read DNA sequencing technologies, longer stretches of the rDNA operon are increasingly used in environmental sequencing studies to increase the phylogenetic resolution. There is, therefore, a growing need for longer rDNA reference sequences. Here, we present the ribosomal operon database (ROD), which includes eukaryotic full-length rDNA operons fished from publicly available genome assemblies. Full-length operons were detected in 34.1% of the 34,701 examined eukaryotic genome assemblies from NCBI. In most cases (53.1%), more than one operon variant was detected, which can be due to intragenomic operon copy variability, allelic variation in non-haploid genomes, or technical errors from the sequencing and assembly process. The highest copy number found was 5947 in Zea mays. In total, 453,697 unique operons were detected, with 69,480 operon variant clusters remaining after intragenomic clustering at 99% sequence identity. The operon length varied extensively across eukaryotes, ranging from 4136 to 16,463 bp, which will lead to considerable polymerase chain reaction (PCR) bias during amplification of the entire operon. Clustering the full-length operons revealed that the different parts (i.e., 18S, 28S, and the hypervariable regions V4 and V9 of 18S) provide divergent taxonomic resolution, with 18S, the V4 and V9 regions being the most conserved. The ROD will be updated regularly to provide an increasing number of full-length rDNA operons to the scientific community.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
核糖体操作子数据库:从基因组组装中提取的全长 rDNA 操作子数据库。
目前的 rDNA 参考序列数据库主要针对较短的 DNA 标记,如 16/18S 标记或内部转录间隔区(ITS)的一部分。然而,由于长读程 DNA 测序技术的进步,环境测序研究中越来越多地使用较长的 rDNA 操作子,以提高系统发育的分辨率。因此,越来越需要更长的 rDNA 参考序列。在这里,我们介绍了核糖体操作子数据库(ROD),其中包括从公开的基因组汇编中获取的真核生物全长 rDNA 操作子。在NCBI提供的34,701个真核生物基因组汇编中,有34.1%检测到了全长操作子。在大多数情况下(53.1%),检测到一个以上的操作子变体,这可能是由于基因组内操作子拷贝变异、非单倍体基因组中的等位基因变异或测序和组装过程中的技术错误造成的。在玉米中发现的最高拷贝数为 5947。总共检测到 453,697 个独特的操作子,经过基因组内聚类后,剩下 69,480 个操作子变异群,序列同一性为 99%。真核生物的操作子长度差异很大,从 4136 到 16,463 bp 不等,这将导致在扩增整个操作子时聚合酶链反应(PCR)产生相当大的偏差。对全长操作子进行聚类发现,不同部分(即 18S、28S 以及 18S 的 V4 和 V9 超变区)提供了不同的分类分辨率,其中 18S、V4 和 V9 区最为保守。ROD 将定期更新,为科学界提供越来越多的全长 rDNA 操作子。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Molecular Ecology Resources
Molecular Ecology Resources 生物-进化生物学
CiteScore
15.60
自引率
5.20%
发文量
170
审稿时长
3 months
期刊介绍: Molecular Ecology Resources promotes the creation of comprehensive resources for the scientific community, encompassing computer programs, statistical and molecular advancements, and a diverse array of molecular tools. Serving as a conduit for disseminating these resources, the journal targets a broad audience of researchers in the fields of evolution, ecology, and conservation. Articles in Molecular Ecology Resources are crafted to support investigations tackling significant questions within these disciplines. In addition to original resource articles, Molecular Ecology Resources features Reviews, Opinions, and Comments relevant to the field. The journal also periodically releases Special Issues focusing on resource development within specific areas.
期刊最新文献
Chromosomal-Level Genome Suggests Adaptive Constraints Leading to the Historical Population Decline in an Extremely Endangered Plant. Development of SNP Panels from Low-Coverage Whole Genome Sequencing (lcWGS) to Support Indigenous Fisheries for Three Salmonid Species in Northern Canada. Probe Capture Enrichment Sequencing of amoA Genes Improves the Detection of Diverse Ammonia-Oxidising Archaeal and Bacterial Populations. HMicroDB: A Comprehensive Database of Herpetofaunal Microbiota With a Focus on Host Phylogeny, Physiological Traits, and Environment Factors. OGU: A Toolbox for Better Utilising Organelle Genomic Data.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1