Taxonomic Classification of Bacteria Using Common Substrings

M. Can, Osman Gursoy
Southeast Europe Journal of Soft Computing. Journal article, published 2019-04-27.
DOI: 10.21533/SCJOURNAL.V8I1.167 (https://doi.org/10.21533/SCJOURNAL.V8I1.167)
Citations: 3

Abstract

16S ribosomal RNA (rRNA) gene sequences are widely used in environmental microbiology as reliable markers for the taxonomic classification of microbes. Massive sequencing of 16S rRNA gene amplicons that span the full length of the gene is not easy because of the limitations of current sequencing techniques; nevertheless, millions of rRNA gene sequences have been uploaded to the Greengenes, RDP, and SILVA databases. In this research, a new similarity measure for full-length genes, LCSS, is first defined. Sequences reported for the same bacterial species are then found to show around 53% average sequence similarity in the Greengenes and SILVA databases, while the average similarity among genes reported for different bacterial species is only around 15%. At genus level the corresponding figures are 63% and 20%, respectively, for the three databases Greengenes, RDP, and SILVA. Hence, species- and genus-specific sequences constitute useful targets for diagnostic assays and other scientific investigations. In the present research, the built-in function LongestCommonSubsequence of the computer algebra package MATHEMATICA is used repeatedly to create an in silico pipeline for the taxonomic classification of newly uploaded full-length sequences. Conclusions: Our results suggest that LongestCommonSubsequence similarity can be used for the taxonomic classification of unknown bacteria through their full 16S ribosomal RNA (rRNA) gene sequences.
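The idea of classifying a sequence by common-substring similarity can be illustrated with a minimal Python sketch. It is not the authors' MATHEMATICA pipeline, and the abstract does not give the exact definition of LCSS. The sketch assumes that LongestCommonSubsequence returns the longest contiguous run shared by two strings (which is how the Wolfram documentation describes it, and which matches the "common substrings" of the title), and it assumes a normalization by the length of the shorter sequence. The names `longest_common_substring`, `lcss_similarity`, and `classify`, as well as the toy sequences and labels, are hypothetical.

```python
from __future__ import annotations


def longest_common_substring(a: str, b: str) -> str:
    """Return one longest contiguous substring shared by a and b.

    Classic dynamic programming: curr[j] is the length of the common
    suffix of a[:i] and b[:j]. Time is O(len(a) * len(b)).
    """
    best_len = 0
    best_end = 0  # end index (exclusive) of the best match in a
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


def lcss_similarity(a: str, b: str) -> float:
    """ASSUMED normalization: common-substring length over the shorter length."""
    if not a or not b:
        return 0.0
    return len(longest_common_substring(a, b)) / min(len(a), len(b))


def classify(query: str, references: dict[str, str]) -> tuple[str, float]:
    """Return the reference label with the highest similarity to the query."""
    scored = ((label, lcss_similarity(query, seq)) for label, seq in references.items())
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy sequences, far shorter than real 16S rRNA genes (hypothetical data).
    references = {"genus_A": "ACGTACGTAC", "genus_B": "GGGGCCCCTT"}
    print(classify("TTACGTACGG", references))
```

For full-length 16S genes of roughly 1,500 bases, the quadratic dynamic program is still cheap per pair, but comparing a query against millions of database entries would call for a suffix-automaton or suffix-array approach instead.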