Improving the gold standard in NCBI GenBank and related databases: DNA sequences from type specimens and type strains.

Systematic Biology | IF 6.1 | Zone 1 (Biology) | JCR Q1 (Evolutionary Biology) | Pub Date: 2024-07-27 | DOI: 10.1093/sysbio/syad068
Susanne S Renner, Mark D Scherz, Conrad L Schoch, Marc Gottschling, Miguel Vences
Journal: Systematic Biology, pp. 486-494. Publication type: Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11502950/pdf/

Abstract

Scientific names permit humans and search engines to access knowledge about the biodiversity that surrounds us, and names linked to DNA sequences are playing an ever-greater role in search-and-match identification procedures. Here, we analyze how users and curators of the National Center for Biotechnology Information (NCBI) are flagging and curating sequences derived from nomenclatural type material, which is the only way to improve the quality of DNA-based identification in the long run. For prokaryotes, 18,281 genome assemblies from type strains have been curated by NCBI staff and improve the quality of prokaryote naming. For Fungi, type-derived sequences representing over 21,000 species are now essential for fungus naming and identification. For the remaining eukaryotes, however, the numbers of sequences identifiable as type-derived are minuscule, representing only 1000 species of arthropods, 8441 vertebrates, and 430 embryophytes. An increase in the production and curation of such sequences will come from (i) sequencing of types or topotypic specimens in museum collections, (ii) the March 2023 rule changes at the International Nucleotide Sequence Database Collaboration requiring more metadata for specimens, and (iii) efforts by data submitters to facilitate curation, including informing NCBI curators about a specimen's type status. We illustrate different type-data submission journeys and provide best-practice examples from a range of organisms. Expanding the number of type-derived sequences in DNA databases, especially of eukaryotes, is crucial for capturing, documenting, and protecting biodiversity.
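As a rough illustration of how a reader might look up type-derived sequences for a taxon, the sketch below builds an NCBI E-utilities ESearch URL without sending any request. It is not taken from the paper. The helper names `type_sequence_query` and `esearch_url` are hypothetical, and the search term `sequence from type[filter]` is an assumption about the Entrez filter for type-flagged records that should be checked against current NCBI documentation before use.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def type_sequence_query(taxon: str) -> str:
    """Return an Entrez search term for type-derived sequences of a taxon.

    Assumption: 'sequence from type[filter]' selects records that NCBI
    has flagged as derived from type material.
    """
    return f'"{taxon}"[Organism] AND sequence from type[filter]'


def esearch_url(taxon: str, db: str = "nuccore") -> str:
    """Build (but do not send) an ESearch URL that would return a hit count."""
    params = {
        "db": db,                           # Entrez database to search
        "term": type_sequence_query(taxon),
        "retmax": 0,                        # ask for the count only, no ID list
        "retmode": "json",
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Print one URL per group named in the abstract.
    for group in ("Arthropoda", "Vertebrata", "Embryophyta"):
        print(esearch_url(group))
```

Opening one of the printed URLs in a browser, or fetching it with any HTTP client, would return a JSON document whose count field gives the number of matching records. Counts obtained this way are records, not species, so they are not directly comparable to the species numbers quoted in the abstract.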

Source journal: Systematic Biology
Category: Biology, Evolutionary Biology
CiteScore: 13.00
Self-citation rate: 7.70%
Publication volume: 70
Review time: 6-12 weeks
About the journal: Systematic Biology is the bimonthly journal of the Society of Systematic Biologists. Papers for the journal are original contributions to the theory, principles, and methods of systematics as well as phylogeny, evolution, morphology, biogeography, paleontology, genetics, and the classification of all living things. A Points of View section offers a forum for discussion, while book reviews and announcements of general interest are also featured.