Differentially used codons among essential genes in bacteria identified by machine learning-based analysis.

IF 2.3; Zone 3, Biology; Q3 BIOCHEMISTRY & MOLECULAR BIOLOGY; Molecular Genetics and Genomics; Pub Date: 2024-07-27; DOI: 10.1007/s00438-024-02163-0
Annushree Kurmi, Piyali Sen, Madhusmita Dash, Suvendra Kumar Ray, Siddhartha Sankar Satapathy
Citations: 0

Abstract

Codon usage bias (CUB), the uneven usage of synonymous codons encoding the same amino acid, differs among genes within and across bacterial genomes. CUB is known to be influenced by gene expression; accordingly, it differs between high-expression and low-expression genes in several bacteria. In this article, we extend the study of codon usage by considering gene essentiality as a feature. Using machine learning (ML) based approaches, we analysed Relative Synonymous Codon Usage (RSCU) values of essential and non-essential genes in Escherichia coli and thirty-four other bacterial genomes whose gene essentiality features were available in public databases. We observed significant differences in codon usage patterns between essential and non-essential genes for the majority of the bacterial genomes; accordingly, ML-based classifiers achieved high area under curve (AUC) scores, with a minimum score of 70.0 across twenty-eight organisms. Further, the importance of individual codons for classifying genes was found to differ among the codons in each genome. Arg codon CGT and Gly codon GGT were observed to be the most preferred codons among essential genes in Escherichia coli. Interestingly, some codons, such as CGT, ATA, GGT and GGG, were observed to contribute consistently towards classifying essential genes across the thirty-five bacterial genomes studied. On the other hand, codons TGY and CAY, encoding the amino acids Cys and His respectively, were among the least contributing codons towards classification in all these bacteria. This study demonstrates gene-essentiality-based differences in synonymous codon usage in bacterial genomes and presents a common codon usage pattern across bacteria.
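The RSCU values mentioned in the abstract can be illustrated with a small sketch. This is not the authors' pipeline; it assumes the conventional definition of RSCU (the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid) and the standard amino-acid assignments of the genetic code. The function name `rscu` and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G; third position varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds: str) -> dict[str, float]:
    """Return RSCU values for the codons of one coding sequence.

    Assumption: RSCU = count(codon) * (family size) / (total count of the
    amino acid's synonymous family). Stop codons and single-codon amino
    acids (Met, Trp) are skipped, as are amino acids absent from the gene.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(
        seq[i:i + 3] for i in range(0, usable, 3) if seq[i:i + 3] in CODON_TABLE
    )

    # Group codons into synonymous families keyed by amino acid.
    families: dict[str, list[str]] = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    values: dict[str, float] = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if len(family) < 2 or total == 0:
            continue
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


# Toy gene fragment: three GGT codons and one GGG codon (all glycine).
toy = rscu("GGTGGTGGTGGG")
print(toy["GGT"], toy["GGG"], toy["GGC"])  # 3.0 1.0 0.0
```

An RSCU above 1 means the codon is used more often than expected under uniform synonymous usage, and a value below 1 means it is used less often. In a study of this kind, one such vector per gene could serve as the feature input to a classifier; which classifiers and features the authors actually used is not stated in the abstract.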

Source journal: Molecular Genetics and Genomics (Biology: Biochemistry & Molecular Biology)
CiteScore: 5.10
Self-citation rate: 3.20%
Articles published: 134
Review time: 1 month
About the journal: Molecular Genetics and Genomics (MGG) publishes peer-reviewed articles covering all areas of genetics and genomics. Any approach to the study of genes and genomes is considered, be it experimental, theoretical or synthetic. MGG publishes research on all organisms that is of broad interest to those working in the fields of genetics, genomics, biology, medicine and biotechnology. The journal investigates a broad range of topics, including these from recent issues: mechanisms for extending longevity in a variety of organisms; screening of yeast metal homeostasis genes involved in mitochondrial functions; molecular mapping of cultivar-specific avirulence genes in the rice blast fungus; and more.