Prediction of DNA-binding Sites in Transcriptions Factor in Fur-like Proteins Using Machine Learning and Molecular Descriptors
Mauricio Arenas-Salinas, Jessica Lara Muñoz, José Antonio Reyes, Felipe Besoain
Current Bioinformatics (JCR Q3, Biochemical Research Methods; impact factor 2.4). Published 2023-10-27. DOI: https://doi.org/10.2174/0115748936264122231016094702
Abstract
Introduction: Transcription factors are of great interest in biotechnology due to their key role in the regulation of gene expression. One of the most important transcription factors in Gram-negative bacteria is Fur, a global regulator studied as a therapeutic target for the design of antibacterial agents. Its DNA-binding domain, which contains a helix-turn-helix motif, is one of its most relevant features.
Methods: In this study, we evaluated several machine learning algorithms, including Support Vector Machines (SVM), Random Forest (RF), Decision Trees (DT), and Naive Bayes (NB), for predicting DNA-binding sites, using proteins from the Fur superfamily and other helix-turn-helix transcription factors. We also tested the efficacy of several molecular descriptors derived from the amino acid sequence and from the structure of the protein fragments that bind the DNA. A feature selection procedure was employed to reduce the number of descriptors in each case while maintaining good classification performance.
Results: The best results were obtained with the SVM model using twelve sequence-derived attributes and the DT model using nine structure-derived features, achieving 82% and 76% accuracy, respectively.
Conclusion: The performance obtained indicates that the descriptors we used are relevant for predicting DNA-binding sites, since they can discriminate between binding and non-binding regions of a protein.
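To make the idea of sequence-derived descriptors concrete, the following is a minimal sketch of how a few such values might be computed for a peptide fragment. The abstract does not list the descriptors actually used, so the ones chosen here (mean Kyte-Doolittle hydropathy, fractions of charged residues, a crude net charge) are assumptions for illustration only, and the function name `sequence_descriptors` and the example fragment are hypothetical.

```python
from typing import Dict

# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_descriptors(fragment: str) -> Dict[str, float]:
    """Compute a few simple sequence-derived descriptors for a fragment.

    These descriptors are illustrative assumptions, not the feature set
    reported in the paper.
    """
    seq = fragment.upper()
    if not seq or any(aa not in KYTE_DOOLITTLE for aa in seq):
        raise ValueError("fragment must contain only the 20 standard amino acids")
    n = len(seq)
    positive = sum(seq.count(aa) for aa in "KR")  # Lys, Arg
    negative = sum(seq.count(aa) for aa in "DE")  # Asp, Glu
    return {
        "length": float(n),
        "mean_hydropathy": sum(KYTE_DOOLITTLE[aa] for aa in seq) / n,
        "frac_positive": positive / n,
        "frac_negative": negative / n,
        # Crude integer charge estimate; ignores His and pH effects.
        "net_charge": float(positive - negative),
    }


if __name__ == "__main__":
    # Hypothetical fragment, not taken from any Fur protein.
    print(sequence_descriptors("KRAGLK"))
```

Vectors of this kind, computed for binding and non-binding fragments, are the sort of input a classifier such as an SVM or decision tree would be trained on; a feature selection step would then keep only the descriptors that contribute to classification performance.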
About the journal:
Current Bioinformatics aims to publish all the latest and outstanding developments in bioinformatics. Each issue contains a series of timely, in-depth/mini-reviews, research papers and guest edited thematic issues written by leaders in the field, covering a wide range of the integration of biology with computer and information science.
The journal focuses on advances in computational molecular/structural biology, encompassing areas such as computing in biomedicine and genomics, computational proteomics and systems biology, and metabolic pathway engineering. Developments in these fields have direct implications for key issues related to health care, medicine, genetic disorders, development of agricultural products, renewable energy, environmental protection, etc.