Pub Date: 2023-04-01 | DOI: 10.1142/S0219720023500063
Standa Na, Dhammika Leshan Wannigama, Thammakorn Saethang
Antimicrobial resistance is a major public health concern. Antimicrobial peptides (AMPs) are part of the host defense system and act efficiently against multidrug-resistant microbes. Because screening AMPs from a large number of peptides is still costly and time-consuming, a precise and rapid computer-aided tool is essential for preliminary AMP selection ahead of laboratory experiments. In this study, we propose AMP recognition models using a new peptide encoding method called amino acid index weight (AAIW). Four AMP recognition models (antimicrobial, antibacterial, antiviral, and antifungal) were trained on datasets combined from DRAMP and other published databases. When evaluated on two independent test sets, these models achieved higher performance than previous AMP recognition models. All four models achieved over 93% accuracy and a Matthews correlation coefficient (MCC) over 0.87. An online AMP recognition server is accessible at https://amppred-aaiw.com.
Title: Antimicrobial peptides recognition using weighted physicochemical property encoding. Journal of Bioinformatics and Computational Biology.
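The accuracy and Matthews correlation coefficient (MCC) reported above are standard confusion-matrix metrics. The sketch below shows how they are typically computed for a binary AMP/non-AMP classifier; the label convention (1 = AMP, 0 = non-AMP) and the function names are illustrative assumptions, not taken from the AAIW work.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, where 1 = AMP and 0 = non-AMP."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of correctly classified peptides."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when a marginal count is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

Unlike accuracy, MCC uses all four confusion-matrix cells, so it stays informative when the AMP and non-AMP classes are unbalanced.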
Pub Date: 2023-04-01 | DOI: 10.1142/S0219720023300010
Anna Dotsenko, Jury Denisenko, Dmitrii Osipov, Aleksandra Rozhkova, Ivan Zorov, Arkady Sinitsyn
The thermostability of cellulases can be increased through amino acid substitutions, with protein engineering guided by predictors of protein thermostability. We carried out a systematic analysis of the performance of 18 predictors for the engineering of cellulases: PoPMuSiC, HoTMuSiC, I-Mutant 2.0, I-Mutant Suite, PremPS, Hotspot, Maestroweb, DynaMut, ENCoM ([Formula: see text] and [Formula: see text]), mCSM, SDM, DUET, RosettaDesign, Cupsat (thermal and denaturant approaches), ConSurf, and Voronoia. DynaMut, SDM, RosettaDesign, and PremPS gave the highest accuracy, F-measure, and MCC. Combining the predictors improved performance: relative to the maximal values achieved by single predictors, F-measure and MCC improved by 14% and 28%, respectively, and accuracy and sensitivity by 9% and 20%, respectively. These performance values for individual and combined predictors may aid the engineering of thermostable cellulases and the further development of thermostability predictors.
Title: Testing and improving the performance of protein thermostability predictors for the engineering of cellulases. Journal of Bioinformatics and Computational Biology.
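The abstract does not state how the 18 predictors were combined. One simple scheme, assumed here purely for illustration, is a majority vote over binary per-mutation calls (1 = stabilizing, 0 = not stabilizing), followed by the sensitivity and F-measure used in the study. The function names and the tie-breaking rule are hypothetical.

```python
from typing import List, Sequence, Tuple


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-mutation binary calls (1 = stabilizing) from several predictors.

    predictions[i][j] is predictor i's call for mutation j. A mutation is called
    stabilizing only if more than half of the predictors agree; ties count as 0.
    """
    n_predictors = len(predictions)
    return [1 if 2 * sum(calls) > n_predictors else 0 for calls in zip(*predictions)]


def sensitivity_and_f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity (recall) and F-measure for the positive (stabilizing) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    if precision + sensitivity == 0:
        return sensitivity, 0.0
    return sensitivity, 2 * precision * sensitivity / (precision + sensitivity)
```

Resolving ties as "not stabilizing" is a conservative choice; weighted voting or a trained meta-classifier are common alternatives.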
Pub Date: 2023-02-01 | DOI: 10.1142/S021972002350004X
Zi-Yi He, Jie-Yu Yang, Yong Li
When machine learning is used to classify and predict pharmacokinetic indicators, the limited number of training samples leaves the training set unrepresentative and the prediction accuracy poor. To address this, this paper proposes a 1DCNN-Attention concentration prediction model optimized by the sparrow search algorithm (SSA). First, the SMOTE method is used to expand the small-sample experimental data, making the data more diverse and representative. Then a one-dimensional convolutional neural network (1DCNN) model is built, and an attention mechanism is introduced to compute a weight for each input variable, ranking the importance of each pharmacokinetic indicator for the output drug concentration. The SSA is then used to optimize the model parameters and improve prediction accuracy after data expansion. Using the pharmacokinetic model of phenobarbital (PHB) combined with Cynanchum otophyllum saponins for epilepsy treatment as an example, the concentration changes of PHB were predicted and the effectiveness of the method was verified. The results show that the proposed model predicts better than the other methods compared.
Title: A pharmacokinetic model based on the SSA-1DCNN-Attention method. Journal of Bioinformatics and Computational Biology.
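SMOTE expands a small data set by interpolating between a sample and one of its nearest neighbors. The following is a minimal, dependency-free sketch of that interpolation step, assuming numeric feature vectors of equal length; the neighbor count `k`, the seed, and the function name are illustrative choices, not the settings used in the paper. In practice a maintained implementation such as imbalanced-learn's `SMOTE` class would normally be preferred.

```python
import random
from typing import List


def smote(samples: List[List[float]], n_new: int, k: int = 3, seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic samples by SMOTE-style interpolation.

    For each synthetic point: pick a random base sample, find its k nearest
    neighbors by Euclidean distance, pick one neighbor at random, and place the
    new point at a random fraction of the way from the base to that neighbor.
    """
    if len(samples) < 2:
        raise ValueError("SMOTE needs at least two samples")
    rng = random.Random(seed)
    k = min(k, len(samples) - 1)
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(samples))
        base = samples[idx]
        # All other samples, ordered by squared Euclidean distance to the base
        others = [s for j, s in enumerate(samples) if j != idx]
        others.sort(key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic
```

Because every synthetic point lies on a segment between two real samples, the expanded data stays within the range of the original measurements.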
This work proposes a machine learning-based phylogenetic tree generation model based on agglomerative clustering (PTGAC) that compares protein sequences while taking all known chemical properties of amino acids into account. The proposed model can serve as a suitable alternative to the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), which is inherently time-consuming. First, principal component analysis (PCA) is used to reduce seven known chemical characteristics of the 20 amino acids to a single TP (Total Points) value per amino acid, yielding 20 TP values. Cumulative summation of these TP values along each sequence then gives a non-degenerate numeric representation of the sequence. A special three-component vector, consisting of a new type of non-central moment of orders one, two, and three, is proposed as a descriptor. Next, the model computes Euclidean distances between the descriptors to build a distance matrix. Finally, a phylogenetic tree is constructed from the distance matrix using hierarchical agglomerative clustering. The results are compared with UPGMA and other existing methods in terms of tree quality and construction time. Qualitative and quantitative analyses serve as the key assessment criteria: the qualitative analysis of the phylogenetic tree considers rationalized perception, while the quantitative analysis is based on symmetric distance (SD). On both criteria, the proposed model gives more satisfactory results than those previously obtained by other methods on the same species. Notably, the method is efficient in both time and space and can handle protein sequences of varying lengths.
Title: PTGAC Model: A machine learning approach for constructing phylogenetic tree to compare protein sequences. Authors: Jayanta Pal, Sourav Saha, Bansibadan Maji, Dilip Kumar Bhattacharya. Pub Date: 2023-02-01 | DOI: 10.1142/S0219720022500287. Journal of Bioinformatics and Computational Biology.
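The PTGAC pipeline (numeric encoding, cumulative sums, moment descriptor, Euclidean distance matrix, agglomerative clustering) can be sketched as follows. This is only an illustration under loud assumptions: `PLACEHOLDER_TP` stands in for the paper's PCA-derived TP values, which are not given here; the length normalization of the moments is a guess at the unspecified "new type of non-central moment"; single linkage is used because the abstract does not name the linkage criterion; and sequence names are assumed unique.

```python
import math
from itertools import combinations

# HYPOTHETICAL placeholder values: the paper's PCA-derived TP values are not
# reproduced here, so each amino acid simply gets a distinct number.
PLACEHOLDER_TP = {aa: float(i + 1) for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")}


def descriptor(seq, tp=PLACEHOLDER_TP):
    """Three-component vector of moments (orders 1, 2, 3) of the cumulative TP sums.

    Dividing by the sequence length n is an assumed normalization so that
    sequences of different lengths can be compared.
    """
    cumulative, total = [], 0.0
    for aa in seq:
        total += tp[aa]
        cumulative.append(total)
    n = len(cumulative)
    return [sum((c / n) ** k for c in cumulative) / n for k in (1, 2, 3)]


def agglomerative_tree(names, seqs):
    """Single-linkage agglomerative clustering on Euclidean descriptor distances.

    Returns the tree as nested tuples of sequence names.
    """
    desc = {name: descriptor(s) for name, s in zip(names, seqs)}
    # Distance matrix, computed once up front
    dist = {(a, b): math.dist(desc[a], desc[b]) for a in names for b in names}
    clusters = [(name, [name]) for name in names]  # (subtree, member names)

    def linkage(c1, c2):
        return min(dist[(a, b)] for a in c1[1] for b in c2[1])

    while len(clusters) > 1:
        # Merge the closest pair of clusters
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]
```

The descriptor is always three numbers regardless of sequence length, which is what makes sequences of different lengths directly comparable in the distance matrix.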
Pub Date: 2023-02-01 | DOI: 10.1142/S0219720023500038
Yonglin Zhang, Mei Hu, Qi Mo, Wenli Gan, Jiesi Luo
N4-methylcytosine (4mC) is an essential epigenetic modification of deoxyribonucleic acid (DNA) that plays a key role in many biological processes, such as gene expression, DNA replication, and transcriptional regulation. Genome-wide identification and analysis of 4mC sites can better reveal the epigenetic mechanisms that regulate these processes. Although some high-throughput experimental methods can identify 4mC sites on a genome-wide scale, they are still too expensive and laborious for routine use. Computational methods can compensate for these disadvantages, but their performance leaves much room for improvement. In this study, we develop a deep learning approach that does not rely on neural networks to accurately predict 4mC sites from genomic DNA sequence. We generate various informative features representing the sequence fragments around 4mC sites and feed them into a deep forest (DF) model. After training with 10-fold cross-validation, the model achieved overall accuracies of 85.0%, 90.0%, and 87.8% for three representative model organisms, A. thaliana, C. elegans, and D. melanogaster, respectively. Extensive experiments further show that our approach outperforms existing state-of-the-art 4mC predictors. It is the first DF-based algorithm for predicting 4mC sites and offers a new direction in this field.
Title: A novel method for predicting DNA N4-methylcytosine sites based on deep forest algorithm. Journal of Bioinformatics and Computational Biology.
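The abstract does not list the "various informative features" used. As an illustration only, two common encodings of a DNA fragment centered on a candidate cytosine are sketched below: a per-position one-hot encoding and normalized k-mer frequencies. The resulting vectors could then be passed to a deep forest (cascade forest) classifier; the window length, the choice of k, and the function names are assumptions.

```python
from itertools import product
from typing import List

NUCLEOTIDES = "ACGT"


def one_hot(fragment: str) -> List[int]:
    """Flattened one-hot encoding with 4 features per position (A, C, G, T).

    Ambiguous bases such as N become all-zero positions.
    """
    features = []
    for base in fragment.upper():
        features.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return features


def kmer_frequencies(fragment: str, k: int = 2) -> List[float]:
    """Normalized counts of all 4**k k-mers, in a fixed lexicographic order."""
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    fragment = fragment.upper()
    windows = len(fragment) - k + 1
    for i in range(max(windows, 0)):
        kmer = fragment[i:i + k]
        if kmer in counts:  # skip k-mers containing ambiguous bases
            counts[kmer] += 1
    if windows <= 0:
        return [0.0] * len(kmers)
    return [counts[m] / windows for m in kmers]


def encode(fragment: str, k: int = 2) -> list:
    """Concatenate positional one-hot features with k-mer composition."""
    return one_hot(fragment) + kmer_frequencies(fragment, k)
```

One-hot features preserve where each base sits relative to the candidate site, while k-mer frequencies summarize local composition; combining both is a common design in site-prediction work.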
Pub Date: 2023-02-01 | DOI: 10.1142/S0219720023500014
Mukhtar Ahmad Sofi, M Arif Wani
Protein secondary structure prediction (PSSP) is an important and challenging task in protein bioinformatics. Protein secondary structures (SSs) are divided into regular and irregular classes. Regular SSs, which cover nearly 50% of amino acids, consist of helices and sheets, whereas the remaining amino acids form irregular SSs. β-turns and γ-turns are the most abundant irregular SSs in proteins. Existing methods are well developed for predicting regular and irregular SSs separately. However, more comprehensive PSSP requires a unified model that predicts all types of SSs simultaneously. In this work, using a novel dataset that combines SSs assigned by the Dictionary of Secondary Structure of Proteins (DSSP) with β-turns and γ-turns assigned by PROMOTIF, we propose a unified deep learning model consisting of convolutional neural networks (CNNs) and long short-term memory networks (LSTMs) for the simultaneous prediction of regular and irregular SSs. To the best of our knowledge, this is the first PSSP study covering both regular and irregular structures. The protein sequences in our datasets, RiR6069 and RiR513, are taken from the benchmark CB6133 and CB513 datasets, respectively. The results indicate improved PSSP accuracy.
Title: RiRPSSP: A unified deep learning method for prediction of regular and irregular protein secondary structures. Journal of Bioinformatics and Computational Biology.
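One way to build per-residue labels that cover both regular and irregular structures is to reduce the DSSP 8-state codes to three states and then overlay turn annotations on coil residues. The sketch below assumes this scheme, the common H/G/I to H, E/B to E, other to C reduction, and the hypothetical labels `b` (β-turn) and `g` (γ-turn); the actual RiR6069/RiR513 labeling rules may differ. A per-residue accuracy function is included for evaluation.

```python
# Common 8-to-3 reduction of DSSP codes: H/G/I -> H, E/B -> E, everything else -> C.
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def unified_labels(dssp: str, turns) -> str:
    """Per-residue labels combining reduced DSSP states with turn annotations.

    `turns` is a list of (start, end, label) tuples with inclusive 0-based
    positions; the hypothetical labels "b" (beta-turn) and "g" (gamma-turn)
    overwrite coil residues only, so helices and strands are kept.
    """
    labels = [DSSP_TO_3.get(code, "C") for code in dssp]
    for start, end, label in turns:
        for i in range(start, end + 1):
            if labels[i] == "C":
                labels[i] = label
    return "".join(labels)


def per_residue_accuracy(true_labels: str, predicted: str) -> float:
    """Fraction of residues whose predicted label matches the reference."""
    if len(true_labels) != len(predicted):
        raise ValueError("label strings must have equal length")
    return sum(t == p for t, p in zip(true_labels, predicted)) / len(true_labels)
```

Letting turns overwrite only coil positions keeps the regular-structure labels intact, so a single sequence-labeling model can be trained on one label string per protein.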
Pub Date: 2023-02-01 | DOI: 10.1142/S0219720023500026
Ching-Nung Lin, Christine H Chung, Aik Choon Tan
Nucleus segmentation is the initial step of histopathological image analysis pipelines, and it remains a challenge for many quantitative analysis methods in terms of accuracy and speed. Recently, deep learning nucleus segmentation methods have been shown to outperform earlier intensity- or pattern-based methods. However, the heavy computation of deep learning gives the impression of a lagging real-time response and has hampered the adoption of these models in routine research. We developed and implemented NuKit, a deep learning platform that accelerates nucleus segmentation and returns prompt results to users. The NuKit platform consists of two deep learning models coupled with an interactive graphical user interface (GUI) to provide fast, automatic nucleus segmentation "on the fly". The two models perform complementary tasks: the whole image segmentation model segments nuclei across the whole image, whereas the click segmentation model supplements it with user-driven input to edit the segmented nuclei. We trained the NuKit whole image segmentation model on a large public training data set and tested its performance on seven independent public image data sets. The whole image segmentation model achieves average [Formula: see text] and [Formula: see text]. The outputs can be exported to different file formats, and NuKit integrates seamlessly with other image analysis tools such as QuPath. NuKit runs on Windows, Mac, and Linux personal computers.
Title: NuKit: A deep learning platform for fast nucleus segmentation of histopathological images. Journal of Bioinformatics and Computational Biology. Open access PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/68/f9/nihms-1915365.PMC10362904.pdf
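The two averaged metrics reported for the whole image segmentation model were lost in extraction and are not reconstructed here. For orientation, the sketch below computes two metrics commonly used for pixel-level segmentation, the Dice coefficient and intersection over union (IoU), on binary masks given as nested lists; treating an empty-versus-empty comparison as a perfect score of 1.0 is a convention assumed here.

```python
from typing import List, Tuple


def dice_and_iou(pred: List[List[int]], truth: List[List[int]]) -> Tuple[float, float]:
    """Pixel-level Dice coefficient and IoU for two same-shape binary masks.

    Any non-zero pixel counts as nucleus. If both masks are empty, both
    scores are defined as 1.0 (an assumed convention).
    """
    inter = pred_sum = truth_sum = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            p, t = bool(p), bool(t)
            inter += p and t
            pred_sum += p
            truth_sum += t
    union = pred_sum + truth_sum - inter
    dice = 2 * inter / (pred_sum + truth_sum) if pred_sum + truth_sum else 1.0
    iou = inter / union if union else 1.0
    return dice, iou
```

Object-level metrics that first match individual nuclei are also common in nucleus segmentation; this pixel-level version is the simplest starting point.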
Pub Date: 2023-02-01 | DOI: 10.1142/S0219720022500299
Andrea Mae Añonuevo, Marineil Gomez, Lemmuel L Tayo
The World Health Organization (WHO) has declared breast cancer (BC) the most prevalent cancer in the world. Given its prevalence and severity, there have been several breakthroughs in developing treatments for the disease. Targeted therapies limit the damage done to healthy tissues and are especially effective against luminal and HER-2 positive breast cancer. However, triple negative breast cancer (TNBC) lacks defining biomarkers, which makes it hard to treat with targeted therapy. Protein-protein interactions (PPIs) have been studied as possible drug targets, but small molecule drugs cannot cover the entire PPI binding interface. Peptides have been found to be better suited to large or flat PPI surfaces and also have better pharmacokinetic properties. In this study, computational methods were used to assess whether peptide inhibitors are good drug candidates against the ubiquitin-conjugating enzyme UBE2C, through molecular docking, molecular dynamics (MD), and MM-PBSA analyses. The results show that although the lead peptide, T20-M, has good potential as a peptide drug, its binding affinity for UBE2C is not strong enough to overcome the natural UBE2C-ANAPC2 interaction. Further studies on modifying T20-M and analyzing other peptide leads are recommended.
Title: In silico de novo drug design of a therapeutic peptide inhibitor against UBE2C in breast cancer. Journal of Bioinformatics and Computational Biology.
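The conclusion that T20-M cannot overcome the UBE2C-ANAPC2 interaction relies on comparing binding affinities, which MM-PBSA estimates per frame as ΔG_bind = G_complex - G_receptor - G_ligand. A minimal sketch of this single-trajectory bookkeeping is shown below, assuming per-frame free energies (for example in kcal/mol) have already been computed by an MM-PBSA tool; entropy terms are omitted, and the function names and the simple "more negative binds more strongly" comparison are illustrative assumptions, not the authors' analysis.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def binding_free_energy(complex_g: Sequence[float],
                        receptor_g: Sequence[float],
                        ligand_g: Sequence[float]) -> Tuple[float, float]:
    """Single-trajectory MM-PBSA-style estimate from per-frame free energies.

    Per frame: dG = G_complex - G_receptor - G_ligand. Returns the mean and
    sample standard deviation over frames (0.0 if only one frame).
    Entropy corrections are not included.
    """
    per_frame = [c - r - l for c, r, l in zip(complex_g, receptor_g, ligand_g)]
    spread = stdev(per_frame) if len(per_frame) > 1 else 0.0
    return mean(per_frame), spread


def binds_more_strongly(candidate_dg: float, native_dg: float) -> bool:
    """True if the candidate's estimated binding free energy is more negative."""
    return candidate_dg < native_dg
```

In practice the standard deviations of both estimates should be considered before concluding that one binder outcompetes another.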
Pub Date: 2023-02-01 | DOI: 10.1142/S0219720023500087
Hui-Ling Huang, Chong-Heng Weng, Torbjörn E M Nordling, Yi-Fan Liou
Motivation: Synthesizing proteins with novel desired properties is challenging but sought after by industry and academia. The dominant approach relies on trial-and-error introduction of point mutations, assisted by structural information or by predictive models built from paired data that are difficult to collect. This study proposes a sequence-based unpaired-sample of novel protein inventor (SUNI) to build ThermalProGAN, which generates thermally stable proteins from sequence information.
Results: ThermalProGAN can heavily mutate the input sequence, changing a median of 32 residues. A known normal protein, 1RG0, was used to generate a thermally stable form by mutating 51 residues. Superimposing the two structures shows high similarity, suggesting that the basic function is conserved. Eighty-four molecular dynamics simulations of 1RG0 and the COVID-19 vaccine candidates, with a total simulation time of 840 ns, indicate that thermal stability increased.
Conclusion: This proof of concept demonstrates that transferring a desired protein property from one set of proteins to another is feasible. Availability and implementation: The source code of ThermalProGAN is freely available at https://github.com/markliou/ThermalProGAN/ under an MIT license. The website is https://thermalprogan.markliou.tw:433. Supplementary information: Supplementary data are available on GitHub.
Title: ThermalProGAN: A sequence-based thermally stable protein generator trained using unpaired data. Journal of Bioinformatics and Computational Biology.
Pub Date: 2023-02-01; DOI: 10.1142/S0219720023500051
Wilson Wen Bin Goh, Weijia Kong, Limsoon Wong
Some prediction methods rank their predictions by probability, whereas others do not rank them at all and instead support each prediction with a p-value. This disparity makes direct comparison between the two kinds of methods difficult. In particular, p-value conversion approaches such as the Bayes Factor upper Bound (BFB) may rest on assumptions that do not hold for this kind of cross-comparison. Here, using a well-established case study on renal cancer proteomics in the context of missing protein prediction, we demonstrate two strategies for comparing these two kinds of prediction methods. The first strategy is based on false discovery rate (FDR) estimation, which does not make the same naïve assumptions as BFB conversion. The second is a powerful approach that we colloquially call "home ground testing". Both strategies perform better than BFB conversion. We therefore recommend comparing prediction methods by standardizing them to a common performance benchmark, such as a global FDR, and, where this is not possible, by reciprocal "home ground testing".
{"title":"Evaluating network-based missing protein prediction using <i>p</i>-values, Bayes Factors, and probabilities.","authors":"Wilson Wen Bin Goh, Weijia Kong, Limsoon Wong","doi":"10.1142/S0219720023500051","DOIUrl":"https://doi.org/10.1142/S0219720023500051","url":null,"abstract":"<p><p>Some prediction methods use probability to rank their predictions, while some other prediction methods do not rank their predictions and instead use <i>p</i>-values to support their predictions. This disparity renders direct cross-comparison of these two kinds of methods difficult. In particular, approaches such as the Bayes Factor upper Bound (BFB) for <i>p</i>-value conversion may not make correct assumptions for this kind of cross-comparisons. Here, using a well-established case study on renal cancer proteomics and in the context of missing protein prediction, we demonstrate how to compare these two kinds of prediction methods using two different strategies. The first strategy is based on false discovery rate (FDR) estimation, which does not make the same naïve assumptions as BFB conversions. The second strategy is a powerful approach which we colloquially call \"home ground testing\". Both strategies perform better than BFB conversions. Thus, we recommend comparing prediction methods by standardization to a common performance benchmark such as a global FDR. And where this is not possible, we recommend reciprocal \"home ground testing\".</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9474482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
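The abstract above contrasts converting p-values into probability-like scores via the BFB with standardizing both kinds of method to a common global FDR. The Python sketch below illustrates both ideas under stated assumptions; the paper's exact procedures may differ. It assumes the BFB is the widely used Sellke-Bayarri-Berger bound 1/(-e·p·ln p) (valid for p < 1/e), that FDR for p-value-based methods is controlled with Benjamini-Hochberg, and that the FDR of a probability-ranked method can be estimated as the mean of (1 - probability) over accepted calls, which holds only if the probabilities are calibrated. All function names are hypothetical, and the inputs are toy values, not data from the study.

```python
import math
from typing import List, Sequence


def bayes_factor_upper_bound(p: float) -> float:
    """Sellke-Bayarri-Berger upper bound on the Bayes factor favouring H1.

    BFB = 1 / (-e * p * ln p) for 0 < p < 1/e; for larger p the bound is 1.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if p >= 1.0 / math.e:
        return 1.0
    return 1.0 / (-math.e * p * math.log(p))


def bfb_posterior(p: float, prior_odds: float = 1.0) -> float:
    """Upper bound on P(H1 | data) from the BFB and assumed prior odds."""
    posterior_odds = bayes_factor_upper_bound(p) * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg q-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def calls_at_fdr_pvalues(pvalues: Sequence[float], alpha: float) -> int:
    """Number of p-value-supported predictions accepted at global FDR alpha."""
    return sum(1 for q in bh_qvalues(pvalues) if q <= alpha)


def calls_at_fdr_probabilities(probs: Sequence[float], alpha: float) -> int:
    """Largest k such that the top-k predictions have estimated FDR <= alpha.

    Assumes calibrated probabilities: each accepted call is false with
    probability (1 - prob), so the estimated FDR is the mean of (1 - prob).
    """
    ranked = sorted(probs, reverse=True)
    best_k, false_sum = 0, 0.0
    for k, prob in enumerate(ranked, start=1):
        false_sum += 1.0 - prob
        if false_sum / k <= alpha:
            best_k = k
    return best_k


if __name__ == "__main__":
    # Toy inputs for illustration only; these are not data from the study.
    print(round(bfb_posterior(0.05), 3))
    print(calls_at_fdr_pvalues([0.01, 0.04, 0.03, 0.20], alpha=0.05))
    print(calls_at_fdr_probabilities([0.99, 0.95, 0.90, 0.60, 0.30], alpha=0.05))
```

Comparing the two call counts at the same alpha places both kinds of method on one benchmark. By contrast, `bfb_posterior` only yields an upper bound whose value depends on an assumed prior, which illustrates the kind of assumption the abstract cautions against when cross-comparing methods.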