Prediction of Transcription Factor Binding Sites Using Deep Learning Combined with DNA Sequences and Shape Feature Data
Yangyang Li, Jie Liu, Hao Liu
ACM Multimedia Asia, published 2021-12-01
DOI: 10.1145/3469877.3497696
Abstract
Knowing the locations of transcription factor binding sites (TFBSs) is essential for modeling the underlying binding mechanisms and cellular functions. Studies have shown that, in addition to the DNA sequence, the three-dimensional shape of DNA is an important factor affecting TF binding activity. Here, we developed a convolutional neural network (CNN) model that integrates 3D DNA shape information, derived using a high-throughput method, to predict TFBSs. We identified the best-performing architectures by varying the CNN window size and the numbers of kernels, hidden nodes, and hidden layers. The performance of each data type alone and of their combination was evaluated on data from 69 different ChIP-seq experiments [1]. Our results showed that the model integrating shape and sequence information compared favorably with the sequence-only model. This work combines knowledge from structural biology and genomics, and shows that DNA shape features improve the description of TF binding specificity.
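The abstract describes a CNN that takes sequence and DNA shape features together as input. The NumPy sketch below shows one common way such inputs can be combined and scored. It is a hedged illustration and not the authors' architecture. The following are assumptions: the sequence is one-hot encoded into 4 channels, and the shape features arrive as a per-position array with one row per base (for example, minor groove width or helix twist values that have already been aligned to the sequence). All function names are hypothetical, and the weights are random and untrained. The window size, number of kernels, and number of hidden nodes are the kinds of hyperparameters the abstract says were varied. A sequence-only baseline corresponds to passing `one_hot(seq)` alone, which is equivalent to dropping the shape channels.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as an (L, 4) one-hot matrix; non-ACGT bases (e.g. N) become all-zero rows."""
    seq = seq.upper()
    x = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq):
        j = BASES.find(base)
        if j >= 0:
            x[i, j] = 1.0
    return x


def combine_features(seq, shape):
    """Hypothetical helper: stack one-hot sequence channels with per-position shape channels -> (L, 4 + k)."""
    shape = np.asarray(shape, dtype=np.float32)
    if shape.shape[0] != len(seq):
        raise ValueError("shape features must have one row per sequence position")
    return np.concatenate([one_hot(seq), shape], axis=1)


def conv1d_relu(x, kernels, bias):
    """Valid 1D convolution + ReLU: x is (L, C), kernels is (n_kernels, w, C) -> (L - w + 1, n_kernels)."""
    n, w, _ = kernels.shape
    L = x.shape[0]
    out = np.empty((L - w + 1, n), dtype=np.float32)
    for i in range(L - w + 1):
        window = x[i:i + w]  # (w, C) slice of the input
        # Contract each kernel with the window over the (width, channel) axes.
        out[i] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)


def predict_binding(x, kernels, conv_bias, w_hidden, b_hidden, w_out, b_out):
    """Conv -> ReLU -> global max pool -> one hidden dense layer -> sigmoid binding probability."""
    pooled = conv1d_relu(x, kernels, conv_bias).max(axis=0)  # (n_kernels,)
    hidden = np.maximum(pooled @ w_hidden + b_hidden, 0.0)   # (n_hidden,)
    logit = float(hidden @ w_out + b_out)
    return 1.0 / (1.0 + np.exp(-logit))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq = "ACGTTGCAACGTAGCTAGGC"
    shape = rng.normal(size=(len(seq), 4))  # placeholder values, NOT real DNA shape predictions
    x = combine_features(seq, shape)        # (20, 8): 4 sequence + 4 shape channels
    n_kernels, window, n_hidden = 8, 6, 16  # illustrative hyperparameters
    kernels = rng.normal(scale=0.1, size=(n_kernels, window, x.shape[1]))
    p = predict_binding(
        x, kernels, np.zeros(n_kernels),
        rng.normal(scale=0.1, size=(n_kernels, n_hidden)), np.zeros(n_hidden),
        rng.normal(scale=0.1, size=n_hidden), 0.0,
    )
    print(f"untrained binding probability: {p:.3f}")
```

In practice, the weights would be learned from labeled ChIP-seq peaks and background sequences, typically with a deep learning framework rather than hand-written NumPy.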