DeepMEns: an ensemble model for predicting sgRNA on-target activity based on multiple features.

IF 2.5 | CAS Zone 3 (Biology) | JCR Q3, BIOTECHNOLOGY & APPLIED MICROBIOLOGY | Briefings in Functional Genomics | Pub Date: 2024-11-11 | DOI: 10.1093/bfgp/elae043
Shumei Ding, Jia Zheng, Cangzhi Jia
Citation count: 0

Abstract

The CRISPR/Cas9 system developed from Streptococcus pyogenes (SpCas9) has high potential in gene editing. However, its successful application is hindered by the considerable variability in target efficiencies across different single guide RNAs (sgRNAs). Although several deep learning models have been created to predict sgRNA on-target activity, the intrinsic mechanisms of these models are difficult to explain, and there is still scope for improvement in prediction performance. To overcome these issues, we propose an interpretable ensemble model based on deep learning, termed DeepMEns, to predict sgRNA on-target activity. Using five different training and validation datasets, we constructed five sub-regressors, each comprising three parts. The first part uses one-hot encoding, wherein a 0-1 representation of the secondary structure is used as the input to a convolutional neural network (CNN) with a Transformer encoder. The second part uses the DNA shape feature matrix as the input to a CNN with a Transformer encoder. The third part uses positional encoding feature matrices as the input to a long short-term memory network with an attention mechanism. The outputs of these three parts are concatenated through a flatten layer, and the final prediction is the average of the five sub-regressors. Extensive benchmarking experiments indicated that DeepMEns achieved the highest Spearman correlation coefficient on 6 of 10 independent test datasets compared with previous predictors, confirming that DeepMEns can achieve state-of-the-art performance. Moreover, the ablation analysis indicated that the ensemble strategy may improve the performance of the prediction model.

Source journal: Briefings in Functional Genomics (BIOTECHNOLOGY & APPLIED MICROBIOLOGY; GENETICS & HEREDITY)
CiteScore: 6.30
Self-citation rate: 2.50%
Articles published: 37
Review time: 6-12 weeks
About the journal: Briefings in Functional Genomics publishes high-quality, peer-reviewed articles that focus on the use, development or exploitation of genomic approaches, and their application to all areas of biological research. As well as exploring thematic areas where these techniques and protocols are being used, articles review the impact that these approaches have had, or are likely to have, on their field. Subjects covered by the journal include, but are not restricted to: the identification and functional characterisation of coding and non-coding features in genomes, microarray technologies, gene expression profiling, next-generation sequencing, pharmacogenomics, phenomics, SNP technologies, transgenic systems, mutation screens and genotyping. Articles range in scope and depth from the introductory level to specific details of protocols and analyses, encompassing bacterial, fungal, plant, animal and human data. The editorial board welcomes the submission of review articles for publication. Essential criteria for publication are that papers do not contain primary data, and that they are high-quality, clearly written review articles which provide a balanced, highly informative and up-to-date perspective to researchers in the field of functional genomics.