Pitfalls of genotyping microbial communities with rapidly growing genome collections.

Cell Systems, vol. 14, no. 2, pp. 160-176.e3. Pub Date: 2023-02-15. Epub Date: 2023-01-18. DOI: 10.1016/j.cels.2022.12.007
Impact factor: 9.0. Category: Zone 1, Biology. JCR: Q1, Biochemistry & Molecular Biology.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9957970/pdf/
Chunyu Zhao, Zhou Jason Shi, Katherine S Pollard

Abstract

Detecting genetic variants in metagenomic data is a priority for understanding the evolution, ecology, and functional characteristics of microbial communities. Many tools that perform this metagenotyping rely on aligning reads of unknown origin to a database of sequences from many species before calling variants. In this synthesis, we investigate how databases of increasingly diverse and closely related species have pushed the limits of current alignment algorithms, thereby degrading the performance of metagenotyping tools. We identify multi-mapping reads as a prevalent source of errors and illustrate a trade-off between retaining correct alignments and limiting incorrect alignments, many of which map reads to the wrong species. We then evaluate several actionable mitigation strategies and review emerging methods that show promise for further improving metagenotyping in response to the rapid growth in genome collections. Our results have implications beyond metagenotyping for the many tools in microbial genomics that depend upon accurate read mapping.
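
The trade-off described in the abstract can be made concrete with a toy calculation. The sketch below is not the authors' pipeline. It assumes simulated reads whose true species is known, and it uses a mapping-quality (MAPQ) cutoff as one common post-alignment filter; whether that matches the filters evaluated in the paper is an assumption. The Alignment class, the summarize function, and the six toy records are all hypothetical names and values introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    read_id: str
    true_species: str    # known only because the reads are simulated (assumption)
    mapped_species: str  # species of the reference the aligner chose
    mapq: int            # mapping quality; low values suggest ambiguous placement


def summarize(alignments, min_mapq):
    """Count correct and incorrect alignments that survive a MAPQ cutoff."""
    kept = [a for a in alignments if a.mapq >= min_mapq]
    correct = sum(a.true_species == a.mapped_species for a in kept)
    return {
        "min_mapq": min_mapq,
        "correct_kept": correct,
        "incorrect_kept": len(kept) - correct,
    }


# Hypothetical toy data: species A and B are close relatives, so some reads
# align equally well to both and receive low MAPQ values.
TOY = [
    Alignment("r1", "A", "A", 42),
    Alignment("r2", "A", "A", 1),   # ambiguous, but placed on the right species
    Alignment("r3", "A", "B", 0),   # ambiguous, placed on the wrong species
    Alignment("r4", "B", "B", 30),
    Alignment("r5", "B", "A", 3),   # ambiguous, placed on the wrong species
    Alignment("r6", "B", "B", 2),   # ambiguous, but placed on the right species
]

if __name__ == "__main__":
    for cutoff in (0, 2, 10):
        print(summarize(TOY, cutoff))
```

On this toy data, raising the cutoff from 0 to 10 removes both cross-species alignments but also discards two of the four correct ones, which is the kind of tension the abstract points to. Real data would need ground truth from simulation or isolate genomes to compute these counts.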
