Decomposition of the pangenome matrix reveals a structure in gene distribution in the Escherichia coli species.

IF 3.7; Zone 2 (Biology); JCR Q2 (Microbiology); mSphere; Pub Date: 2025-01-28; Epub Date: 2024-12-31; DOI: 10.1128/msphere.00532-24
Siddharth M Chauhan, Omid Ardalani, Jason C Hyun, Jonathan M Monk, Patrick V Phaneuf, Bernhard O Palsson
mSphere, article e0053224. Publication type: Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11774025/pdf/

Abstract

Thousands of complete genome sequences are now available for strains of a single species, enabling pangenome analytics to advance to a new level of sophistication. We collected 2,377 publicly available complete genomes of Escherichia coli for detailed pangenome analysis. The core and accessory genomes consisted of 2,398 and 5,182 genes, respectively. We developed a machine learning approach to define the accessory genes characterizing the major phylogroups of E. coli plus Shigella: A, B1, B2, C, D, E, F, G, and Shigella. The analysis yielded a detailed structure of the genetic basis of the phylogroups' differential traits. This pangenome structure was largely consistent with a housekeeping-gene-based MLST distribution, sequence-based Mash distance, and the Clermont quadruplex classification. The rare genome (genes found in <6.8% of all strains) comprised 163,619 genes, about 79% of which represented variations of 315 underlying transposon elements. This analysis generated a mathematical definition of the genetic basis for a species.
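
The core, accessory, and rare partition described in the abstract can be illustrated with a minimal sketch over a toy gene presence/absence matrix. The 6.8% rare cutoff comes from the abstract; the core cutoff (`core_at_least`), the function name, and the toy data are hypothetical choices made only for this illustration and are not taken from the paper.

```python
from typing import Dict, List


def partition_by_frequency(
    presence: Dict[str, List[int]],
    rare_below: float = 0.068,    # abstract: rare genes occur in <6.8% of strains
    core_at_least: float = 0.99,  # hypothetical cutoff; the abstract states none
) -> Dict[str, List[str]]:
    """Split genes into core, accessory, and rare sets by strain frequency.

    `presence` maps a gene name to a 0/1 list with one entry per strain.
    """
    groups: Dict[str, List[str]] = {"core": [], "accessory": [], "rare": []}
    for gene, row in presence.items():
        freq = sum(row) / len(row)  # fraction of strains carrying the gene
        if freq < rare_below:
            groups["rare"].append(gene)
        elif freq >= core_at_least:
            groups["core"].append(gene)
        else:
            groups["accessory"].append(gene)
    return groups


# Toy matrix with 20 strains (illustrative data only).
toy = {
    "geneA": [1] * 20,            # present in every strain
    "geneB": [1] * 8 + [0] * 12,  # present in 40% of strains
    "geneC": [1] + [0] * 19,      # present in 5% of strains
}
result = partition_by_frequency(toy)
```

With these cutoffs, `geneA` falls in the core set, `geneB` in the accessory set, and `geneC` in the rare set. Changing `core_at_least` moves genes between the core and accessory sets, which is why that cutoff needs to be stated explicitly in any real analysis.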

Importance: The comprehensive analysis of the pangenome of Escherichia coli presented in this study marks a significant advancement in understanding bacterial genetic diversity. By employing machine learning techniques to analyze 2,377 complete E. coli genomes, the study provides a detailed mapping of core, accessory, and rare genes. This approach reveals the genetic basis for differential traits across phylogroups, offering insights into pathogenicity, antibiotic resistance, and evolutionary adaptations. The findings enhance the potential for genome-based diagnostics and pave the way for future studies aimed at achieving a global genetic definition of bacterial phylogeny.
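
To make the idea of decomposing a pangenome matrix concrete, the sketch below factorizes a toy genes-by-strains presence/absence matrix P into two non-negative factors, P ≈ W·H. The abstract does not name the factorization used, so non-negative matrix factorization with multiplicative updates is used here purely as an illustrative stand-in. The function names, the rank, and the toy data are all assumptions.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def frob_error(p: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared Frobenius norm of P - W*H."""
    wh = matmul(w, h)
    return sum(
        (p[i][j] - wh[i][j]) ** 2 for i in range(len(p)) for j in range(len(p[0]))
    )


def nmf(
    p: Matrix, rank: int, iters: int = 200, seed: int = 0, eps: float = 1e-9
) -> Tuple[Matrix, Matrix]:
    """Factor P (genes x strains) into W (genes x rank) and H (rank x strains)."""
    rng = random.Random(seed)
    n, m = len(p), len(p[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # Multiplicative update for H: H <- H * (W^T P) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, p), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # Multiplicative update for W: W <- W * (P H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(p, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Toy matrix: 4 genes x 6 strains with two strain groups (illustrative only).
P = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
W, H = nmf(P, rank=2)
```

In this kind of factorization, each row of W indicates how strongly a gene loads on each component, and each column of H indicates how strongly a strain carries it. Whether such components line up with phylogroups in real data depends on the method and parameters actually used in the study.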

Source journal: mSphere (Immunology and Microbiology: Microbiology)
CiteScore: 8.50
Self-citation rate: 2.10%
Articles published: 192
Review time: 11 weeks
Journal description: mSphere™ is a multi-disciplinary open-access journal that will focus on rapid publication of fundamental contributions to our understanding of microbiology. Its scope will reflect the immense range of fields within the microbial sciences, creating new opportunities for researchers to share findings that are transforming our understanding of human health and disease, ecosystems, neuroscience, agriculture, energy production, climate change, evolution, biogeochemical cycling, and food and drug production. Submissions will be encouraged of all high-quality work that makes fundamental contributions to our understanding of microbiology. mSphere™ will provide streamlined decisions, while carrying on ASM's tradition of rigorous peer review.