BCDB: A Dual-Branch Network Based on Transformer for Predicting Transcription Factor Binding Sites
Authors: Jia He, Yupeng Zhang, Yuhang Liu, Zhigan Zhou, Tianhao Li, Yongqing Zhang, Boqia Xie
Journal: Methods (JCR Q1, Biochemical Research Methods; impact factor 4.2)
Published: 2024-12-17
DOI: https://doi.org/10.1016/j.ymeth.2024.12.006
Citations: 0
Abstract
Transcription factor binding sites (TFBSs) are critical in regulating gene expression. Precisely locating TFBSs can reveal how different transcription factors act in gene transcription. Various deep learning methods have been proposed to predict TFBSs; however, these models often struggle to perform well when training data are limited. Furthermore, these models typically have complex structures, which makes their decision-making processes difficult to interpret. To address these issues, we have developed a framework named BCDB. This framework integrates multi-scale DNA information and employs a dual-branch output strategy. Integrating DNABERT, convolutional neural networks (CNNs), and multi-head attention mechanisms enhances feature extraction and significantly improves prediction accuracy. The method aims to balance the extraction of global and local information, improving predictive performance while using attention mechanisms to provide an intuitive explanation of the model's predictions, thus strengthening the model's overall interpretability. Prediction results on 165 ChIP-seq datasets show that BCDB significantly outperforms existing deep learning methods. Additionally, because BCDB uses transfer learning, it can transfer knowledge learned from large amounts of unlabeled data to prediction tasks for specific cell lines, allowing our model to achieve cross-cell-line TFBS prediction. The source code for BCDB is available at https://github.com/ZhangLab312/BCDB.
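The abstract does not say how DNA sequences are encoded for each component, so the following is only an illustrative sketch under stated assumptions. DNABERT, as originally published, operates on overlapping k-mer tokens, and one-hot encoding is a common input format for sequence CNNs; whether BCDB feeds its branches exactly this way is an assumption. The function names `kmer_tokens` and `one_hot` are hypothetical and are not taken from the BCDB repository.

```python
from typing import List

BASES = "ACGT"


def kmer_tokens(seq: str, k: int = 6) -> List[str]:
    """Split a DNA sequence into overlapping k-mers with stride 1.

    This mirrors the k-mer tokenization style used by DNABERT;
    the choice k=6 here is an assumption for illustration.
    """
    seq = seq.upper()
    if len(seq) < k:
        return []
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def one_hot(seq: str) -> List[List[int]]:
    """Encode each base as a 4-element vector in A, C, G, T order.

    Bases outside ACGT (for example N) become all-zero vectors.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


if __name__ == "__main__":
    seq = "ACGTAGCTAGGCTA"           # toy sequence, not real ChIP-seq data
    tokens = kmer_tokens(seq, k=6)   # token view (language-model style input)
    matrix = one_hot(seq)            # per-base view (CNN style input)
    print(tokens[:3])
    print(len(tokens), len(matrix))
```

A sequence of length n yields n - k + 1 overlapping k-mers, so the token view and the per-base view differ in length and would need to be aligned or pooled before any fusion step.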
About the journal:
Methods focuses on rapidly developing techniques in the experimental biological and medical sciences.
Each topical issue, organized by a guest editor who is an expert in the area covered, consists solely of invited quality articles by specialist authors, many of them reviews. Issues are devoted to specific technical approaches with emphasis on clear detailed descriptions of protocols that allow them to be reproduced easily. The background information provided enables researchers to understand the principles underlying the methods; other helpful sections include comparisons of alternative methods giving the advantages and disadvantages of particular methods, guidance on avoiding potential pitfalls, and suggestions for troubleshooting.