sRNAdeep: a novel tool for bacterial sRNA prediction based on DistilBERT encoding mode and deep learning algorithms.

IF 3.5 2区 生物学 Q2 BIOTECHNOLOGY & APPLIED MICROBIOLOGY BMC Genomics Pub Date : 2024-10-31 DOI:10.1186/s12864-024-10951-6
Weiye Qian, Jiawei Sun, Tianyi Liu, Zhiyuan Yang, Stephen Kwok-Wing Tsui
{"title":"sRNAdeep: a novel tool for bacterial sRNA prediction based on DistilBERT encoding mode and deep learning algorithms.","authors":"Weiye Qian, Jiawei Sun, Tianyi Liu, Zhiyuan Yang, Stephen Kwok-Wing Tsui","doi":"10.1186/s12864-024-10951-6","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Bacterial small regulatory RNA (sRNA) plays a crucial role in cell metabolism and could be used as a new potential drug target in the treatment of pathogen-induced disease. However, experimental methods for identifying sRNAs still require a large investment of human and material resources.</p><p><strong>Methods: </strong>In this study, we propose a novel sRNA prediction model called sRNAdeep based on the DistilBERT feature extraction and TextCNN methods. The sRNA and non-sRNA sequences of bacteria were considered as sentences and then fed into a composite model consisting of deep learning models to evaluate classification performance.</p><p><strong>Results: </strong>By filtering sRNAs from BSRD database, we obtained a validation dataset comprised of 2438 positive and 4730 negative samples. The benchmark experiments showed that sRNAdeep displayed better performance in the various indexes compared to previous sRNA prediction tools. By applying our tool to Mycobacterium tuberculosis (MTB) genome, we have identified 21 sRNAs within the intergenic and intron regions. A set of 272 targeted genes regulated by these sRNAs were also captured in MTB. The coding proteins of two genes (lysX and icd1) are implicated in drug response, with significant active sites related to drug resistance mechanisms of MTB.</p><p><strong>Conclusion: </strong>In conclusion, our newly developed sRNAdeep can help researchers identify bacterial sRNAs more precisely and can be freely available from https://github.com/pyajagod/sRNAdeep.git .</p>","PeriodicalId":9030,"journal":{"name":"BMC Genomics","volume":null,"pages":null},"PeriodicalIF":3.5000,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11526673/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Genomics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s12864-024-10951-6","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOTECHNOLOGY & APPLIED MICROBIOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Bacterial small regulatory RNA (sRNA) plays a crucial role in cell metabolism and could be used as a new potential drug target in the treatment of pathogen-induced disease. However, experimental methods for identifying sRNAs still require a large investment of human and material resources.

Methods: In this study, we propose a novel sRNA prediction model called sRNAdeep based on the DistilBERT feature extraction and TextCNN methods. The sRNA and non-sRNA sequences of bacteria were considered as sentences and then fed into a composite model consisting of deep learning models to evaluate classification performance.

Results: By filtering sRNAs from BSRD database, we obtained a validation dataset comprised of 2438 positive and 4730 negative samples. The benchmark experiments showed that sRNAdeep displayed better performance in the various indexes compared to previous sRNA prediction tools. By applying our tool to Mycobacterium tuberculosis (MTB) genome, we have identified 21 sRNAs within the intergenic and intron regions. A set of 272 targeted genes regulated by these sRNAs were also captured in MTB. The coding proteins of two genes (lysX and icd1) are implicated in drug response, with significant active sites related to drug resistance mechanisms of MTB.

Conclusion: In conclusion, our newly developed sRNAdeep can help researchers identify bacterial sRNAs more precisely and can be freely available from https://github.com/pyajagod/sRNAdeep.git .

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
sRNAdeep:基于 DistilBERT 编码模式和深度学习算法的细菌 sRNA 预测新工具。
背景:细菌小调控 RNA(sRNA)在细胞代谢中起着至关重要的作用,可作为治疗病原体引起的疾病的潜在新药靶点。然而,鉴定 sRNA 的实验方法仍然需要投入大量的人力和物力:本研究基于 DistilBERT 特征提取和 TextCNN 方法,提出了一种新型 sRNA 预测模型 sRNAdeep。将细菌的sRNA和非sRNA序列视为句子,然后将其输入由深度学习模型组成的复合模型,以评估分类性能:通过过滤 BSRD 数据库中的 sRNA,我们得到了由 2438 个阳性样本和 4730 个阴性样本组成的验证数据集。基准实验结果表明,与之前的sRNA预测工具相比,sRNAdeep在各项指标上都有更好的表现。通过将我们的工具应用于结核分枝杆菌(MTB)基因组,我们在基因间区和内含子区发现了 21 个 sRNA。我们还在 MTB 中捕获了一组受这些 sRNA 调控的 272 个目标基因。两个基因(lysX 和 icd1)的编码蛋白与药物反应有关,其重要活性位点与 MTB 的耐药机制相关:总之,我们新开发的 sRNAdeep 可以帮助研究人员更精确地识别细菌 sRNA,并可从 https://github.com/pyajagod/sRNAdeep.git 免费获取。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
BMC Genomics
BMC Genomics 生物-生物工程与应用微生物
CiteScore
7.40
自引率
4.50%
发文量
769
审稿时长
6.4 months
期刊介绍: BMC Genomics is an open access, peer-reviewed journal that considers articles on all aspects of genome-scale analysis, functional genomics, and proteomics. BMC Genomics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.
期刊最新文献
Transcriptome analysis identifies the NR4A subfamily involved in the alleviating effect of folic acid on mastitis induced by high concentration of Staphylococcus aureus lipoteichoic acid. A comprehensive analysis of the defense responses of Odontotermes formosanus (Shiraki) provides insights into the changes during Serratia marcescens infection. An INDEL genomic approach to explore population diversity of phytoplankton. Epigenetic dynamics of partially methylated domains in human placenta and trophoblast stem cells. Genetic diversity assessment of cucumber landraces using molecular signatures.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1