Annotation-free prediction of microbial dioxygen utilization.

IF 5.0 | Zone 2, Biology | JCR Q1, Microbiology | mSystems | Pub Date: 2024-10-22 | Epub Date: 2024-09-04 | DOI: 10.1128/msystems.00763-24
Avi I Flamholz, Joshua E Goldford, Philippa A Richter, Elin M Larsson, Adrian Jinich, Woodward W Fischer, Dianne K Newman
Citations: 0

Abstract

Aerobes require dioxygen (O2) to grow; anaerobes do not. However, nearly all microbes (aerobes, anaerobes, and facultative organisms alike) express enzymes whose substrates include O2, if only for detoxification. This presents a challenge when trying to assess which organisms are aerobic from genomic data alone. This challenge can be overcome by noting that O2 utilization has wide-ranging effects on microbes: aerobes typically have larger genomes encoding distinctive O2-utilizing enzymes, for example. These effects permit high-quality prediction of O2 utilization from annotated genome sequences, with several models displaying ≈80% accuracy on a ternary classification task for which blind guessing is only 33% accurate. Since genome annotation is compute-intensive and relies on many assumptions, we asked if annotation-free methods also perform well. We discovered that simple and efficient models based entirely on genomic sequence content (e.g., triplets of amino acids) perform as well as intensive annotation-based classifiers, enabling rapid processing of genomes. We further show that amino acid trimers are useful because they encode information about protein composition and phylogeny. To showcase the utility of rapid prediction, we estimated the prevalence of aerobes and anaerobes in diverse natural environments cataloged in the Earth Microbiome Project. Focusing on a well-studied O2 gradient in the Black Sea, we found quantitative correspondence between local chemistry (O2:sulfide concentration ratio) and the composition of microbial communities. We, therefore, suggest that statistical methods like ours might be used to estimate, or "sense," pivotal features of the chemical environment using DNA sequencing data.

Importance

We now have access to sequence data from a wide variety of natural environments. These data document a bewildering diversity of microbes, many known only from their genomes. Physiology, an organism's capacity to engage metabolically with its environment, may provide a more useful lens than taxonomy for understanding microbial communities. As an example of this broader principle, we developed algorithms that accurately predict microbial dioxygen utilization directly from genome sequences without annotating genes, e.g., by considering only the amino acids in protein sequences. Annotation-free algorithms enable rapid characterization of natural samples, highlighting quantitative correspondence between sequences and local O2 levels in a data set from the Black Sea. This example suggests that DNA sequencing might be repurposed as a multi-pronged chemical sensor, estimating concentrations of O2 and other key facets of complex natural settings.
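
The abstract mentions "triplets of amino acids" as annotation-free features but does not describe the authors' actual pipeline. As a rough illustration only, the sketch below shows one plausible way to turn a set of protein sequences into a normalized amino acid trimer frequency vector that a classifier could consume. The names `trimer_frequencies`, `TRIMERS`, and `TRIMER_INDEX`, the choice to skip windows containing non-standard residues, and the toy sequences are all assumptions made for this example, not details taken from the paper.

```python
from collections import Counter
from itertools import product

# Assumption: restrict features to the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20^3 = 8000 possible trimers, in a fixed order.
TRIMERS = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
TRIMER_INDEX = {t: i for i, t in enumerate(TRIMERS)}


def trimer_frequencies(proteins):
    """Return a length-8000 list of normalized trimer frequencies.

    `proteins` is an iterable of amino acid strings (hypothetical input;
    in practice these might come from translated open reading frames).
    """
    counts = Counter()
    for seq in proteins:
        seq = seq.upper()
        # Slide a window of width 3 over each sequence.
        for i in range(len(seq) - 2):
            kmer = seq[i:i + 3]
            # Skip windows with non-standard residues such as X or *.
            if kmer in TRIMER_INDEX:
                counts[kmer] += 1

    total = sum(counts.values())
    vec = [0.0] * len(TRIMERS)
    if total == 0:
        return vec  # no usable trimers; return an all-zero vector
    for kmer, c in counts.items():
        vec[TRIMER_INDEX[kmer]] = c / total
    return vec


# Toy input, invented for illustration.
toy_proteome = ["MKVLAAGIV", "MKVLXAG"]
features = trimer_frequencies(toy_proteome)
```

A vector like `features`, computed per genome, could then be passed to any standard three-class classifier (aerobe, anaerobe, facultative); which model the authors used is not stated in the abstract.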

Source journal
mSystems (Biochemistry, Genetics and Molecular Biology: Biochemistry)
CiteScore: 10.50
Self-citation rate: 3.10%
Articles published: 308
Review time: 13 weeks
Journal description: mSystems™ will publish preeminent work that stems from applying technologies for high-throughput analyses to achieve insights into the metabolic and regulatory systems at the scale of both the single cell and microbial communities. The scope of mSystems™ encompasses all important biological and biochemical findings drawn from analyses of large data sets, as well as new computational approaches for deriving these insights. mSystems™ will welcome submissions from researchers who focus on the microbiome, genomics, metagenomics, transcriptomics, metabolomics, proteomics, glycomics, bioinformatics, and computational microbiology. mSystems™ will provide streamlined decisions, while carrying on ASM's tradition of rigorous peer review.