Pub Date: 2024-07-26 DOI: 10.1016/j.compbiolchem.2024.108163
The growing demand for eco-friendly technologies in biotechnology calls for effective and sustainable catalysts. Acidophilic proteins, which function optimally in highly acidic environments, hold promise for applications including food production, biofuels, and bioremediation. However, limited knowledge about these proteins hinders their exploration. This study addresses the gap with in silico methods based on computational tools and machine learning. We propose an approach that predicts acidophilic proteins using protein language models (PLMs), accelerating discovery without extensive lab work. Our investigation highlights the potential of PLMs for understanding and harnessing acidophilic proteins in scientific and industrial settings. We introduce the ACE model, which combines a simple logistic regression classifier with embeddings derived from protein sequences by the ProtT5 PLM. On an independent test set, the model achieves an accuracy of 0.91, an F1-score of 0.93, and a Matthews correlation coefficient of 0.76. To our knowledge, this is the first application of pre-trained PLM embeddings to acidophilic protein classification. The ACE model offers a practical tool for exploring protein acidophilicity and may support future work in protein design and engineering.
Title: Leveraging protein language model embeddings and logistic regression for efficient and accurate in-silico acidophilic proteins classification. Journal: Computational Biology and Chemistry.
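The three metrics reported in the abstract above (accuracy, F1-score, Matthews correlation coefficient) can all be derived from a binary confusion matrix. The sketch below shows one way to compute them; the function name `binary_metrics` and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, F1-score, MCC) for a binary confusion matrix.

    tp/fp/fn/tn are hypothetical counts of true positives, false positives,
    false negatives and true negatives.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0

    # MCC uses all four cells, so it stays informative on imbalanced classes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, f1, mcc


# Illustrative counts only (not the paper's test set).
acc, f1, mcc = binary_metrics(tp=40, fp=10, fn=10, tn=40)
print(f"accuracy={acc:.2f} F1={f1:.2f} MCC={mcc:.2f}")
```

Because the MCC accounts for true negatives while the F1-score does not, the two can differ noticeably on an imbalanced test set, which is consistent with an F1-score well above the MCC.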
Pub Date: 2024-07-26 DOI: 10.1016/j.compbiolchem.2024.108165
A comprehensive analysis of the complete mitochondrial genomes of the subfamily Schizothoracinae (family Cyprinidae) is presented for the first time. The species analyzed are Schizothorax niger, Schizothorax esocinus, Schizothorax labiatus and Schizothorax plagiostomus. The total mitochondrial DNA (mtDNA) lengths were 16585 bp, 16583 bp, 16582 bp and 16576 bp, respectively, each comprising 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and 2 non-coding regions. The combined mean base composition of the four species was A: 29.91 %, T: 25.47 %, G: 17.65 % and C: 27.01 %, and the GC content ranged from 44 % to 45 %. All protein-coding genes (PCGs) began with the typical ATG codon, except for the cytochrome c oxidase subunit 1 (COX1) gene, which began with GTG. The analysis of vital amino acid biosynthesis genes (COX1, ATPase 6, ATPase 8) in the four species revealed no significant differences. All 13 PCGs had Ka/Ks ratios less than one, indicating purifying selection on these genes. The tRNA genes were predicted to fold into the typical cloverleaf secondary structure with normal base pairing and ranged in size from 66 to 75 nucleotides. Additionally, the phylogenetic tree analysis revealed that S. esocinus was most closely related to S. labiatus. This study provides critical data for phylogenetic analysis of the Schizothoracinae subfamily, which will help to resolve taxonomic difficulties and identify evolutionary links. Detailed mtDNA data are an invaluable resource for studying genetic diversity, population structure, and gene flow. Understanding genetic makeup can help inform conservation plans, identify unique populations, and track genetic variation to ensure effective preservation.
Title: Comparative mitochondrial genomics analysis of selected species of Schizothoracinae sub family to explore the differences at mitochondrial DNA level. Journal: Computational Biology and Chemistry.
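The base-composition figures in the abstract above are simple per-nucleotide percentages, and the GC content is the sum of the G and C shares (with the reported means, 17.65 % + 27.01 % = 44.66 %). A minimal sketch of that arithmetic follows; the function names and the toy sequence are assumptions for illustration, not the authors' pipeline.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, C, G and T in a DNA sequence.

    Ambiguous symbols (e.g. N) are ignored rather than counted.
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def gc_content(seq):
    """Return the GC content of a DNA sequence as a percentage."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]


# Toy sequence, not real mtDNA.
example = "ATGGCCATTAACGTT"
print(base_composition(example))
print(f"GC content: {gc_content(example):.2f} %")
```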
Pub Date: 2024-07-26 DOI: 10.1016/j.compbiolchem.2024.108164
Breast carcinoma is the leading cause of cancer-related death in women. Owing to its numerous inherent molecular subtypes, breast cancer is an extremely heterogeneous disease. Among these subtypes, the human epidermal growth factor receptor 2 (HER2)-positive subtype stands out as especially prone to cancer development and disease recurrence. SALL4 (Spalt-like transcription factor 4) regulates the pluripotency and self-renewal of embryonic stem cells. Numerous molecular pathways operating at the transcriptional, post-transcriptional, and epigenomic levels regulate the expression of SALL4. Many transcription factors control the expression of SALL4, with STAT3 being the primary regulator in hepatocellular carcinoma (HCC) and breast carcinoma. Moreover, this oncogene has been connected to a number of cellular functions, including invasion, apoptosis, proliferation, and resistance to therapy. Higher levels of SALL4 have been linked to reduced patient survival and a worse prognosis. To target the undruggable SALL4 that is overexpressed in breast carcinoma, we investigated the prognostic levels of SALL4 in breast carcinoma and its interaction with various related proteins. Using TIMER 2.0, the expression pattern of SALL4 was investigated across all TCGA datasets, which revealed that SALL4 expression was elevated in various cancers. The UALCAN findings demonstrated that SALL4 was overexpressed in all tumor samples, including breast cancer and especially triple-negative breast cancer (TNBC). Gene ontology analysis with the web-based ENRICHR program revealed that SALL4 was actively involved in the development of the nervous system, positive regulation of stem cell proliferation, regulation of stem cell proliferation, regulation of the activin receptor signaling pathway, regulation of transcription using DNA templates, miRNA metabolic processes, and regulation of transcription by RNA Polymerase I. Using the STRING database, we analyzed the interaction and involvement of SALL4 with other abruptly activated proteins and used Cytoscape 3.8.0 for visualization. Additionally, using bc-GenExMiner, we studied the impact of SALL4 on pathways abruptly activated in different breast cancer subtypes, which revealed that SALL4 was highly correlated with WNT2B, NOTCH4, AKT3, and PIK3CA. Furthermore, to target SALL4, we evaluated and analyzed the impact of CLP and its analogues, revealing promising outcomes.
Title: Exploring SALL4 as a significant prognostic marker in breast cancer and its association with progression pathways involved in cancer genesis. Journal: Computational Biology and Chemistry.
Pub Date: 2024-07-25 DOI: 10.1016/j.compbiolchem.2024.108162
The aim of this investigation is to design a novel stochastic radial basis neural network structure that provides numerical solutions of the Zika virus spreading model (ZVSM). The mathematical ZVSM is divided into human and vector populations, each described by susceptible S(q), exposed E(q), infected I(q) and recovered R(q) classes, i.e., SEIR. The stochastic solver uses a radial basis activation function in a feed-forward neural network with twenty-two neurons, optimized by Bayesian regularization, to solve the ZVSM. A reference dataset is generated with the explicit Runge-Kutta scheme and is used during training to reduce the mean square error (MSE) for the nonlinear ZVSM. The data are divided into 78 % for training and 11 % each for validation and testing. Three different cases of the nonlinear ZVSM are considered, and the correctness of the scheme is verified by the agreement of the results. Furthermore, the reliability of the scheme is assessed through regression, MSE, error histogram and state transition measures.
Title: A novel radial basis neural network for the Zika virus spreading model. Journal: Computational Biology and Chemistry.
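The abstract above mentions generating a reference dataset with an explicit Runge-Kutta scheme. As a rough illustration of that step, the sketch below integrates a generic single-population SEIR system with the classical fourth-order Runge-Kutta method. Everything specific here is an assumption: the ZVSM equations (which couple humans and vectors) are not given in the abstract, and the rates `beta`, `sigma`, `gamma`, the step size, and the initial state are placeholder values.

```python
def seir_rhs(state, beta, sigma, gamma):
    """Right-hand side of a generic SEIR model (not the paper's ZVSM)."""
    s, e, i, r = state
    n = s + e + i + r
    new_infections = beta * s * i / n
    return (
        -new_infections,             # dS/dq
        new_infections - sigma * e,  # dE/dq
        sigma * e - gamma * i,       # dI/dq
        gamma * i,                   # dR/dq
    )


def rk4_step(f, state, h, *params):
    """Advance `state` by one classical fourth-order Runge-Kutta step."""
    k1 = f(state, *params)
    k2 = f(tuple(x + 0.5 * h * k for x, k in zip(state, k1)), *params)
    k3 = f(tuple(x + 0.5 * h * k for x, k in zip(state, k2)), *params)
    k4 = f(tuple(x + h * k for x, k in zip(state, k3)), *params)
    return tuple(
        x + h * (a + 2 * b + 2 * c + d) / 6
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(state, h, steps, beta=0.5, sigma=0.2, gamma=0.1):
    """Return the list of states at q = 0, h, 2h, ..., steps * h."""
    trajectory = [state]
    for _ in range(steps):
        state = rk4_step(seir_rhs, state, h, beta, sigma, gamma)
        trajectory.append(state)
    return trajectory


# Placeholder initial fractions: 99 % susceptible, 1 % infected.
trajectory = simulate((0.99, 0.0, 0.01, 0.0), h=0.1, steps=500)
print("final (S, E, I, R):", tuple(round(x, 4) for x in trajectory[-1]))
```

A trajectory like this could then be split into training, validation and testing portions (78 % / 11 % / 11 % in the abstract) before fitting a network to it.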
Pub Date: 2024-07-25 DOI: 10.1016/j.compbiolchem.2024.108137
Motivation
Compound-protein interaction (CPI) prediction plays a crucial role in drug discovery and drug repositioning. Early researchers relied on time-consuming and labor-intensive wet laboratory experiments, but the advent of deep learning has significantly accelerated this process. Most existing deep learning methods use deep neural networks to extract compound features from sequences and graphs, either separately or in combination. Our team's previous research has demonstrated that compound images contain valuable information that can be leveraged for the CPI task. However, few multimodal methods effectively combine sequence and image representations of compounds for CPI. Contrastive language-image pre-training on text-image pairs is currently a popular approach in the multimodal field. Further research is needed to explore how integrating sequence and image representations can improve the accuracy of the CPI task.
Results
This paper presents a novel method called MMCL-CPI, which has two key highlights: 1) We propose extracting compound features from two modalities: one-dimensional SMILES and two-dimensional images. This approach captures both sequence and spatial features, enhancing the prediction accuracy for CPI. Based on this, we design a novel multimodal model. 2) We introduce a multimodal pre-training strategy that uses contrastive learning on a large-scale unlabeled dataset to establish the correspondence between a compound's SMILES string and its image. This pre-training approach significantly improves compound feature representations for the downstream CPI task. Our method has shown competitive results on multiple datasets.
Title: MMCL-CPI: A multi-modal compound-protein interaction prediction model incorporating contrastive learning pre-training. Journal: Computational Biology and Chemistry.
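The pre-training idea described above (aligning a compound's SMILES representation with its image) is commonly implemented with a symmetric contrastive loss of the kind used in contrastive language-image pre-training. The sketch below is a plain-Python illustration of that general loss under stated assumptions: the toy two-dimensional embeddings, the temperature of 0.1, and the function names are hypothetical, and the abstract does not confirm that MMCL-CPI uses exactly this formulation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def symmetric_contrastive_loss(smiles_emb, image_emb, temperature=0.1):
    """Symmetric InfoNCE-style loss over a batch of paired embeddings.

    Row i of `smiles_emb` and row i of `image_emb` are assumed to describe
    the same compound; every other pairing in the batch is a negative.
    """
    n = len(smiles_emb)
    logits = [
        [cosine(s, im) / temperature for im in image_emb] for s in smiles_emb
    ]

    def mean_cross_entropy(rows):
        # Cross-entropy with the matching pair (the diagonal) as the target.
        loss = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            loss += log_z - row[i]
        return loss / n

    columns = [list(col) for col in zip(*logits)]
    # Average the SMILES-to-image and image-to-SMILES directions.
    return 0.5 * (mean_cross_entropy(logits) + mean_cross_entropy(columns))


# Toy embeddings: aligned pairs should give a much lower loss than swapped ones.
smiles = [[1.0, 0.0], [0.0, 1.0]]
aligned = symmetric_contrastive_loss(smiles, [[1.0, 0.0], [0.0, 1.0]])
swapped = symmetric_contrastive_loss(smiles, [[0.0, 1.0], [1.0, 0.0]])
print(f"aligned={aligned:.5f} swapped={swapped:.5f}")
```

Minimizing such a loss pulls each SMILES embedding toward the embedding of its own image and pushes it away from the other images in the batch, which is one way to establish the correspondence the abstract refers to.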
Pub Date: 2024-07-25 DOI: 10.1016/j.compbiolchem.2024.108161
Deinococcus species, noted for their exceptional resistance to DNA-damaging environmental stresses, have piqued scientists' interest for decades. This study examines the complex mechanisms underpinning radiation resistance in the Deinococcus genus. We examined the genomes of 82 Deinococcus species and manually classified radiation-resistance proteins into five curated categories: DNA repair, oxidative stress defense, Ddr and Ppr proteins, regulatory proteins, and miscellaneous resistance components. This classification reveals important information about the various molecular mechanisms used by these extremophiles, which have been little explored so far. We also investigated the presence or absence of these proteins in the context of phylogenetic relationships and of core and pan-genomes, which shed light on the evolutionary dynamics of radiation resistance. This comprehensive study provides a deeper understanding of the genetic underpinnings of radiation resistance in the Deinococcus genus, with potential implications for understanding similar mechanisms in other organisms using an interactomics approach. Finally, this study reveals the complexity of radiation resistance mechanisms, providing a comprehensive picture of the genetic components that allow Deinococcus species to flourish in harsh environments. The findings add to our understanding of the broader spectrum of stress adaptation strategies in bacteria and may have applications in sectors ranging from biotechnology to environmental research.
Title: Insights into the radiation and oxidative stress mechanisms in genus Deinococcus. Journal: Computational Biology and Chemistry.
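The presence/absence analysis against core and pan-genomes mentioned above reduces to set operations: the core genome is the intersection of all per-species gene sets and the pan-genome is their union. The sketch below illustrates this on toy data; the species labels and the assignment of genes to species are hypothetical and are not results from the study.

```python
def core_and_pan(genomes):
    """Return (core, pan) gene sets from a {species: genes} mapping.

    core = genes present in every species; pan = genes present in any.
    """
    gene_sets = [set(genes) for genes in genomes.values()]
    if not gene_sets:
        return set(), set()
    return set.intersection(*gene_sets), set.union(*gene_sets)


def presence_matrix(genomes):
    """Return (sorted gene list, {species: 0/1 row}) for the pan-genome."""
    _, pan = core_and_pan(genomes)
    genes = sorted(pan)
    rows = {
        species: [int(gene in set(present)) for gene in genes]
        for species, present in genomes.items()
    }
    return genes, rows


# Hypothetical toy data: which resistance-related genes each species carries.
toy_genomes = {
    "species_1": ["recA", "ddrA", "pprA"],
    "species_2": ["recA", "pprA"],
    "species_3": ["recA", "ddrA"],
}
core, pan = core_and_pan(toy_genomes)
genes, rows = presence_matrix(toy_genomes)
print("core:", sorted(core))
print("pan:", sorted(pan))
for species, row in rows.items():
    print(species, dict(zip(genes, row)))
```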
Pub Date: 2024-07-23 DOI: 10.1016/j.compbiolchem.2024.108158
Studying the relationship between sequences and their corresponding three-dimensional structures helps structural biologists address the protein-folding problem. Despite several experimental and in-silico approaches, decoding the three-dimensional structure from the sequence remains an unsolved problem, and the accuracy of structure prediction therefore plays an indispensable role. To address this issue, an updated web server (CSSP-2.0) has been created that improves on the accuracy of our previous version of CSSP by deploying existing algorithms. It takes probabilities as input and predicts a consensus secondary structure as a highly accurate three-state Q3 assignment (helix, strand, and coil). The prediction uses six recent top-performing methods: MUFOLD-SS, RaptorX, PSSpred v4, PSIPRED, JPred v4, and Porter 5.0. CSSP-2.0 was validated on datasets covering various protein classes from the PDB, CullPDB, and AlphaFold databases. Our results indicate a significant improvement in the accuracy of the consensus Q3 prediction. Using CSSP-2.0, crystallographers can separate the stable regular secondary structures from the entire complex structure, which would aid in inferring the functional annotation of hypothetical proteins. The web server is freely available at https://bioserver3.physics.iisc.ac.in/cgi-bin/cssp-2/
Title: CSSP-2.0: A refined consensus method for accurate protein secondary structure prediction. Journal: Computational Biology and Chemistry.
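A consensus over per-residue probabilities from several predictors, as described above, can be sketched as follows. The combination rule used here (averaging the helix/strand/coil probabilities across methods and taking the most probable state) is an assumption chosen for illustration; the abstract does not state how CSSP-2.0 actually combines the six methods, and the probabilities below are made up.

```python
STATES = "HEC"  # helix, strand, coil


def consensus_q3(predictions):
    """Return a three-state consensus string from several predictors.

    `predictions` holds one entry per method; each entry is a list of
    (p_helix, p_strand, p_coil) tuples, one tuple per residue.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    n_residues = len(predictions[0])
    if any(len(p) != n_residues for p in predictions):
        raise ValueError("all predictions must cover the same residues")

    consensus = []
    for pos in range(n_residues):
        # Mean probability of each state across methods at this residue.
        mean = [
            sum(p[pos][k] for p in predictions) / len(predictions)
            for k in range(3)
        ]
        consensus.append(STATES[mean.index(max(mean))])
    return "".join(consensus)


# Made-up probabilities from two hypothetical methods for three residues.
method_a = [(0.8, 0.1, 0.1), (0.2, 0.5, 0.3), (0.1, 0.2, 0.7)]
method_b = [(0.6, 0.2, 0.2), (0.1, 0.7, 0.2), (0.3, 0.3, 0.4)]
print(consensus_q3([method_a, method_b]))
```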
Pub Date: 2024-07-23 DOI: 10.1016/j.compbiolchem.2024.108160
Ganoderma lucidum is a fungus used in Chinese medicine for various therapies because it exhibits a wide range of pharmacological activity. This study evaluates the possible drug-like qualities of the metabolites of G. lucidum and the impact of these metabolites on the pathways involved in atherosclerosis. A total of 17 compounds were chosen based on their drug-like properties and were then used in the subsequent network and docking simulations. According to the findings, the compound ganodone has the maximum binding energy, −7.243 kcal/mol, whereas the compound cianidanol has the lowest value. Based on the findings of the molecular docking investigations, it was determined that TNF, AKT1, SRC, and STAT3 exhibited a higher affinity for the complex. To examine this further, a molecular dynamics simulation of about 100 nanoseconds was performed. GO functional analysis showed that the target genes were involved in protein binding, ATP binding, enzyme binding, and protein tyrosine kinase activity. Overall, the study results provide a view of possible metabolites that may have an impact on disease progression.
Title: Computational exploration of Ganoderma lucidum metabolites as potential anti-atherosclerotic agents: Insights from molecular docking and dynamics simulations. Journal: Computational Biology and Chemistry.
Pub Date: 2024-07-23 DOI: 10.1016/j.compbiolchem.2024.108159
In the present work, we describe the synthesis of new 1,3,4-thiadiazole derivatives from natural (R)-carvone in three steps: dichloro-cyclopropanation, condensation with thiosemicarbazide, and a 1,3-dipolar cycloaddition reaction with various nitrilimines. The targeted compounds were structurally identified by ¹H and ¹³C NMR and HRMS analyses. The cytotoxic assay demonstrated that some of the novel compounds were potent against certain cancer cell lines. Molecular modeling studies were undertaken to rationalize the wet lab results. Molecular docking was performed to unveil the binding potential of the most active derivatives, 3a and 6c, to caspase-3 and COX-2. The stabilities of the protein-compound complexes obtained from the docking were evaluated using molecular dynamics (MD) simulation. In addition, the frontier molecular orbitals (FMO) and related parameters of the active compounds and their stereoisomers were examined through density functional theory (DFT) studies. The docking study showed that compound 6c had a higher binding potential to caspase-3. However, the binding strength of 6c was less than that of the standard drug, doxorubicin, as it formed fewer conventional hydrogen bonds. Compound 3a, on the other hand, had a higher binding potential to COX-2, although its binding potential was much lower than that of the standard COX-2 inhibitor, celecoxib. The MD simulation demonstrated that the caspase-3-6c complex was less stable than the caspase-3-doxorubicin complex. In contrast, the COX-2-3a complex was stable, and 3a was anticipated to remain inside the protein's binding pocket. The DFT study showed that 3a had higher chemical stability than 6c. The electron exchange capacity, chemical stability, and molecular orbital distributions of the stereoisomers of the active compounds were also found to be alike.
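The abstract does not list which FMO-related parameters were computed. As an illustration only, the sketch below derives descriptors commonly reported in such DFT studies (HOMO-LUMO gap, chemical hardness, chemical potential, electrophilicity index) from frontier orbital energies under the usual Koopmans-type approximation. The function name and the input energies are hypothetical and are not values from the study.

```python
def fmo_descriptors(e_homo_ev, e_lumo_ev):
    """Conceptual-DFT descriptors from frontier orbital energies (eV)."""
    gap = e_lumo_ev - e_homo_ev                    # HOMO-LUMO gap
    hardness = gap / 2.0                           # chemical hardness
    chem_potential = (e_homo_ev + e_lumo_ev) / 2.0  # chemical potential
    electrophilicity = chem_potential ** 2 / (2.0 * hardness)
    return {
        "gap": gap,
        "hardness": hardness,
        "chemical_potential": chem_potential,
        "electrophilicity": electrophilicity,
    }


# Hypothetical orbital energies, chosen only to show the arithmetic.
example = fmo_descriptors(e_homo_ev=-6.0, e_lumo_ev=-2.0)
print(example)
```

In this picture, a larger gap (and hence larger hardness) is usually read as higher chemical stability, which is the kind of comparison the abstract draws between 3a and 6c.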
{"title":"Multitargeted molecular docking and dynamics simulation studies of 1,3,4-thiadiazoles synthesised from (R)-carvone against specific tumour protein markers: An In-silico study of two diastereoisomers","doi":"10.1016/j.compbiolchem.2024.108159","journal":{"name":"Computational Biology and Chemistry"},"publicationDate":"2024-07-23","publicationTypes":"Journal Article"}
Pub Date: 2024-07-19 DOI: 10.1016/j.compbiolchem.2024.108156
Background
Cycas revoluta Thunb., known for its ornamental, economic, and medicinal value, has leaves often discarded as waste. However, in ethnic regions of China, the leaves (CRL) are used in folk medicine for anti-tumor properties, particularly for regulating pathways related to cancer. Recent studies on ion channels and transporters (ICTs) highlight their therapeutic potential against cancer, making it vital to identify CRL’s active constituents targeting ICTs in lung cancer.
Purpose
This study aims to uncover bioactive substances in CRL and their mechanisms in regulating ICTs for lung cancer treatment using network pharmacology, bioinformatics, molecular docking, molecular dynamics (MD) simulations, in vitro cell assays and HPLC.
Methods
We analyzed 62 CRL compounds, predicted targets using PubChem and SwissTargetPrediction, identified lung cancer and ICT targets via GeneCards, and visualized overlaps with R software. Interaction networks were constructed using Cytoscape and STRING. Gene expression, GO, and KEGG analyses were performed using R software. TCGA data provided insights into differential, correlation, survival, and immune analyses. Key interactions were validated through molecular docking and MD simulations. Main biflavonoids were quantified using HPLC, and in vitro cell viability assays were conducted for key biflavonoids.
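The overlap step described in the Methods (compound targets versus lung cancer targets versus ICT targets) amounts to a set intersection. The study performed it in R; the sketch below shows the same idea in Python. The gene lists are hypothetical placeholders rather than the study's data, and `intersect_targets` is an illustrative helper name.

```python
# Hypothetical gene lists for illustration only; not the study's data.
compound_targets = {"EGFR", "GABRG2", "SRC", "KCNH2"}
lung_cancer_targets = {"EGFR", "GABRG2", "TP53", "KRAS"}
ict_targets = {"GABRG2", "EGFR", "KCNH2", "SCN5A"}


def intersect_targets(*target_sets):
    """Return the sorted gene symbols present in every input set."""
    common = set.intersection(*(set(s) for s in target_sets))
    return sorted(common)


shared = intersect_targets(compound_targets, lung_cancer_targets, ict_targets)
print(shared)  # ['EGFR', 'GABRG2']
```

The resulting list corresponds to the central region of a three-set Venn diagram and would be the input to the network and enrichment analyses.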
Results
Venn diagram analysis identified 52 intersecting targets and ten active CRL compounds. The PPI network highlighted seven key targets. GO and KEGG analyses showed that CRL-targeted ICTs are involved in synaptic transmission, GABAergic synapse, and proteoglycans in cancer. Differential expression and correlation analyses revealed significant differences in five core targets in lung cancer tissues. Survival analysis linked EGFR and GABRG2 with overall survival, and immune infiltration analysis associated the core targets with most immune cell types. Molecular docking indicated strong binding of CRL ingredients to core targets. HPLC revealed amentoflavone as the most abundant biflavonoid, followed by hinokiflavone, sciadopitysin, and podocarpusflavone A. MD simulations showed that podocarpusflavone A and amentoflavone had better binding stability with GABRG2, and the cell viability assay also proved that they had better anti-lung cancer potential.
Conclusions
This study identified potential active components, targets, and pathways of CRL-targeted ICTs for lung cancer treatment, suggesting CRL’s utility in drug development and its potential beyond industrial waste.
{"title":"Investigating the anti-lung cancer properties of Zhuang medicine Cycas revoluta Thunb. leaves targeting ion channels and transporters through a comprehensive strategy","doi":"10.1016/j.compbiolchem.2024.108156","journal":{"name":"Computational Biology and Chemistry"},"publicationDate":"2024-07-19","publicationTypes":"Journal Article"}