CRISPR-OTE: Prediction of CRISPR On-Target Efficiency Based on Multi-Dimensional Feature Fusion
J. Xie, M. Liu, L. Zhou
IRBM, volume 44, issue 1, Article 100732. Published 2023-02-01. DOI: 10.1016/j.irbm.2022.07.003
https://www.sciencedirect.com/science/article/pii/S195903182200080X
Impact factor: 5.6 (JCR Q1, Engineering, Biomedical)
Citations: 2
Abstract
Objective
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) is a powerful genome editing technology. Guide RNA (gRNA) plays an essential guiding role in the CRISPR system through complementary base pairing with the target DNA. Because the CRISPR targeting mechanism is not yet fully understood, predicting gRNA on-target efficiency remains a challenge. Current gRNA design tools often extract information inefficiently and cannot thoroughly learn the patterns that determine on-target efficiency.
Material and methods
This study proposes CRISPR-OTE, a simple but effective framework that combines multi-dimensional sequence information with complementary prior knowledge. CRISPR-OTE consists of two branches: a local-contextual information branch and a prior knowledge branch. The local-contextual information branch extracts multi-dimensional sequence features from the DNA primary sequence with a parallel framework of Convolutional Neural Networks (CNN) and bidirectional Long Short-Term Memory networks (biLSTM). The prior knowledge branch selects the optimal subset of physicochemical features to give the neural network complementary knowledge, such as complex secondary structures. A simple feature fusion strategy then combines the multi-modal data from the two branches.
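The data flow described above can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the function names (`one_hot`, `conv1d_max`, `fuse`) are hypothetical, the CNN is reduced to a single hand-written filter with global max pooling, the biLSTM is left out entirely, and fusion is assumed to be plain concatenation, which the abstract does not confirm.

```python
from typing import List, Sequence

BASES = "ACGT"  # column order of the one-hot encoding (an assumption)


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA sequence as a len(seq) x 4 one-hot matrix."""
    seq = seq.upper()
    for base in seq:
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
    return [[1 if base == b else 0 for b in BASES] for base in seq]


def conv1d_max(matrix: Sequence[Sequence[float]],
               kernel: Sequence[Sequence[float]]) -> float:
    """Slide one filter along the sequence (no padding) and max-pool.

    Stands in for a single CNN channel: each window score is the
    element-wise product of the window and the filter, summed.
    """
    width = len(kernel)
    if len(matrix) < width:
        raise ValueError("sequence is shorter than the filter")
    scores = []
    for start in range(len(matrix) - width + 1):
        score = sum(
            matrix[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(len(BASES))
        )
        scores.append(score)
    return max(scores)


def fuse(sequence_features: Sequence[float],
         prior_features: Sequence[float]) -> List[float]:
    """Fuse the two branches by concatenation (assumed strategy)."""
    return list(sequence_features) + list(prior_features)


# Hypothetical filter that responds to the dinucleotide "GG".
gg_filter = [[0, 0, 1, 0], [0, 0, 1, 0]]
seq_feature = conv1d_max(one_hot("ACGGT"), gg_filter)

# Placeholder physicochemical values, purely illustrative.
fused = fuse([seq_feature], [0.5, 60.0])
```

In a real model, the fused vector would feed a regression head that predicts on-target efficiency, and the filter weights would be learned rather than written by hand.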
Results
The experimental results show that the optimal subset of physicochemical features (RNA secondary structure and the melting temperature of the 34-nt target) effectively improves prediction performance. Additionally, combining multi-dimensional sequence features with multi-modal features extracts information more comprehensively. Through transfer learning, CRISPR-OTE trained on the CRISPR-Cpf1 system can also be successfully applied to the CRISPR-Cas9 system.
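As an illustration of what a physicochemical feature such as melting temperature might look like in practice, the sketch below estimates the melting temperature of a 34-nt target. The abstract does not say how the authors compute it. This example assumes the common basic approximation for oligonucleotides longer than 13 nt, Tm = 64.9 + 41 × (G + C − 16.4) / N, and the function names are hypothetical. RNA secondary structure is not covered because it requires a dedicated folding tool.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def basic_melting_temperature(seq: str) -> float:
    """Rough Tm estimate in degrees Celsius.

    Uses the basic GC-count approximation for sequences longer than
    13 nt; it ignores salt concentration and nearest-neighbour effects.
    """
    seq = seq.upper()
    n = len(seq)
    if n <= 13:
        raise ValueError("approximation is intended for sequences > 13 nt")
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / n


# Made-up 34-nt target, not taken from the paper.
target = "AC" * 17
tm = basic_melting_temperature(target)
gc = gc_content(target)
```

A nearest-neighbour thermodynamic model would be more accurate. The basic formula is used here only to keep the example self-contained.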
Conclusion
The performance of CRISPR-OTE is superior to other methods in different CRISPR systems and species. Therefore, CRISPR-OTE is a simple on-target efficiency prediction framework with better accuracy and generalization performance.
About the journal:
IRBM is the journal of the AGBM (Alliance for engineering in Biology and Medicine / Alliance pour le génie biologique et médical), the SFGBM (BioMedical Engineering French Society / Société française de génie biologique médical), and the AFIB (French Association of Biomedical Engineers / Association française des ingénieurs biomédicaux).
As a vehicle of information and knowledge in the field of biomedical technologies, IRBM is devoted to fundamental as well as clinical research. Biomedical engineering and the use of new technologies are the cornerstones of IRBM, providing authors and users with the latest information. Its six issues per year offer reviews (state of the art and current knowledge), original articles on fundamental research, and articles focusing on biomedical engineering. All articles are submitted to peer reviewers, who act as guarantors of IRBM's scientific and medical content. The field covered by IRBM includes all disciplines of biomedical engineering. Accordingly, the types of papers published include those that cover technological and methodological development in:
- Physiological and Biological Signal processing (EEG, MEG, ECG…)
- Medical Image processing
- Biomechanics
- Biomaterials
- Medical Physics
- Biophysics
- Physiological and Biological Sensors
- Information technologies in healthcare
- Disability research
- Computational physiology
- …