{"title":"A multi-perspective deep learning framework for enhancer characterization and identification","authors":"Liwei Liu , Zhebin Tan , Yuxiao Wei , Qianhui Sun","doi":"10.1016/j.compbiolchem.2024.108284","DOIUrl":null,"url":null,"abstract":"<div><div>Enhancers are vital elements in the genome that boost the transcriptional activity of neighboring genes and are essential in regulating cell-specific gene expression. Therefore, accurately identifying and characterizing enhancers is essential for comprehending gene regulatory networks and the development of related diseases. This study introduces MPDL-Enhancer, a novel multi-perspective deep learning framework aimed at enhancer characterization and identification. In this study, enhancer sequences are encoded using the dna2vec model along with features derived from the structural properties of DNA sequences. Subsequently, these representations are processed through a novel dual-scale deep neural network designed to discern subtle correlations and extended interactions embedded within the semantic content of DNA. The predictive phase of our methodology employs a Support Vector Machine classifier to render the final classification. To rigorously assess the efficacy of our approach, a comprehensive evaluation was executed utilizing an independent test dataset, thereby substantiating the robustness and accuracy of our model. Our methodology demonstrated superior performance over existing computational techniques, with an accuracy (ACC) of 81.00 %, a sensitivity (SN) of 79.00 %, and specificity (SP) of 83.00 %. The innovative dual-scale deep neural network and the unique feature representation strategy contributed to this performance improvement. MPDL-Enhancer has effectively characterized enhancer sequences and achieved excellent predictive performance. Building upon this foundation, we conducted an interpretability analysis of the model, which can assist researchers in identifying key features and patterns that affect the functionality of enhancers, thereby promoting a deeper understanding of gene regulatory networks.</div></div>","PeriodicalId":10616,"journal":{"name":"Computational Biology and Chemistry","volume":"114 ","pages":"Article 108284"},"PeriodicalIF":2.6000,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational Biology and Chemistry","FirstCategoryId":"99","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S147692712400272X","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Enhancers are vital elements in the genome that boost the transcriptional activity of neighboring genes and are essential in regulating cell-specific gene expression. Therefore, accurately identifying and characterizing enhancers is essential for comprehending gene regulatory networks and the development of related diseases. This study introduces MPDL-Enhancer, a novel multi-perspective deep learning framework aimed at enhancer characterization and identification. In this study, enhancer sequences are encoded using the dna2vec model along with features derived from the structural properties of DNA sequences. Subsequently, these representations are processed through a novel dual-scale deep neural network designed to discern subtle correlations and extended interactions embedded within the semantic content of DNA. The predictive phase of our methodology employs a Support Vector Machine classifier to render the final classification. To rigorously assess the efficacy of our approach, a comprehensive evaluation was executed utilizing an independent test dataset, thereby substantiating the robustness and accuracy of our model. Our methodology demonstrated superior performance over existing computational techniques, with an accuracy (ACC) of 81.00 %, a sensitivity (SN) of 79.00 %, and specificity (SP) of 83.00 %. The innovative dual-scale deep neural network and the unique feature representation strategy contributed to this performance improvement. MPDL-Enhancer has effectively characterized enhancer sequences and achieved excellent predictive performance. Building upon this foundation, we conducted an interpretability analysis of the model, which can assist researchers in identifying key features and patterns that affect the functionality of enhancers, thereby promoting a deeper understanding of gene regulatory networks.
期刊介绍:
Computational Biology and Chemistry publishes original research papers and review articles in all areas of computational life sciences. High quality research contributions with a major computational component in the areas of nucleic acid and protein sequence research, molecular evolution, molecular genetics (functional genomics and proteomics), theory and practice of either biology-specific or chemical-biology-specific modeling, and structural biology of nucleic acids and proteins are particularly welcome. Exceptionally high quality research work in bioinformatics, systems biology, ecology, computational pharmacology, metabolism, biomedical engineering, epidemiology, and statistical genetics will also be considered.
Given their inherent uncertainty, protein modeling and molecular docking studies should be thoroughly validated. In the absence of experimental results for validation, the use of molecular dynamics simulations along with detailed free energy calculations, for example, should be used as complementary techniques to support the major conclusions. Submissions of premature modeling exercises without additional biological insights will not be considered.
Review articles will generally be commissioned by the editors and should not be submitted to the journal without explicit invitation. However prospective authors are welcome to send a brief (one to three pages) synopsis, which will be evaluated by the editors.