The oral route is the preferred route for drug delivery, and oral drugs accordingly represent the largest share of the pharmaceutical market. Human intestinal absorption (HIA) is closely related to oral bioavailability, making it an important factor in predicting drug absorption. In this study, we focus on predicting drug permeability in terms of HIA as a marker for oral bioavailability. A set of 2648 compounds was collected from early as well as recent works and curated to build a robust dataset. Five machine learning (ML) algorithms were trained on a set of molecular descriptors of these compounds, selected after rigorous feature engineering. Additionally, two deep learning models, one based on a graph convolutional neural network (GCNN) and one on a graph attention network (GAT), were developed using the same set of compounds to assess predictive performance with automatically extracted features. The numerical analyses show that, of the five ML models, Random Forest and LightGBM achieved accuracies of 87.71 % and 86.04 % on the test set and 81.43 % and 77.30 % on the external validation set, respectively. The GCNN- and GAT-based models achieved final accuracies of 77.69 % and 78.58 % on the test set and 79.29 % and 79.42 % on the external validation set, respectively. We believe that deploying these models for screening oral drugs can provide promising results, and we have therefore deposited the dataset and models on GitHub (https://github.com/hridoy69/HIA).
Integrating (deep) machine learning and cheminformatics for predicting human intestinal absorption of small molecules. Orchid Baruah, Upashya Parasar, Anirban Borphukan, Bikram Phukan, Pankaj Bharali, Selvaraman Nagamani, Hridoy Jyoti Mahanta. Computational Biology and Chemistry, vol. 113, Article 108270. Pub Date: 2024-10-28. DOI: 10.1016/j.compbiolchem.2024.108270
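The abstract above says the molecular descriptors were selected after feature engineering but does not say how. As an illustration only, the sketch below shows one common pre-filtering step: dropping near-constant descriptors and descriptors that are highly correlated with one already kept. The thresholds, the descriptor names, and the values are all hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def filter_descriptors(table, var_min=1e-8, corr_max=0.95):
    """Keep descriptors with variance above var_min that are not
    correlated above corr_max (in absolute value) with an earlier kept one.

    table maps a descriptor name to its list of values, one per compound.
    Both thresholds are assumed values, not ones reported by the authors.
    """
    kept = []
    for name, values in table.items():
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        if variance <= var_min:
            continue  # near-constant descriptor carries no signal
        if any(abs(pearson(values, table[k])) > corr_max for k in kept):
            continue  # redundant with a descriptor that was already kept
        kept.append(name)
    return kept


# Hypothetical toy descriptor table for four compounds.
descriptors = {
    "mol_weight": [180.2, 151.2, 206.3, 334.4],
    "mol_weight_x2": [360.4, 302.4, 412.6, 668.8],  # redundant copy of mol_weight
    "n_rings": [1, 1, 1, 1],                        # constant
    "logp": [1.2, 0.5, 3.5, 1.8],
}
selected = filter_descriptors(descriptors)
print(selected)
```

In this toy table the constant column and the rescaled copy are removed, leaving the two informative descriptors for a downstream classifier.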
Pancreatic cancer, with a 5-year survival rate below 10 %, is one of the deadliest malignancies. The TGF-β pathway plays a crucial role in this disease, making it a key target for therapeutic intervention. Clinical trials targeting TGF-β have faced challenges of toxicity and limited efficacy, highlighting the need for more potent small-molecule inhibitors. We selected TGFβR1 as the drug target to inhibit TGF-β signaling in pancreatic cancer. A multi-faceted approach was employed, commencing with AI-driven screening techniques to rapidly identify potential TGFβR1 inhibitors from vast compound libraries, including the ZINC and ChEMBL databases. AI-screened compounds were further validated through structure-based high-throughput virtual screening (HTVS) to evaluate their binding affinity to TGFβR1. In addition, dedicated libraries of anticancer compounds (65,000 compounds) and protein kinase inhibitors (36,324 compounds) were also used for HTVS. Subsequently, pharmacokinetic profiling narrowed the selection to 40 hit compounds. Five hit compounds were chosen for molecular dynamics (MD) simulations based on binding affinity, non-bonded interactions, stereochemistry, and pharmacokinetic profiles. Trajectory analysis showed that residues HIS283, ASP351, LYS232, SER280, ILE211, and LYS213 within the active site of TGFβR1 are crucial for ligand binding through hydrogen bonds and hydrophobic interactions. Principal component analysis (PCA) and dynamic cross-correlation matrix (DCCM) analysis were used to evaluate the receptor's dynamic response to the hit compounds. The simulation data revealed that compounds 1, 2, 3, 4, and 5 formed stable complexes with TGFβR1. Notably, post-MD simulation MM-GBSA analysis showed that compounds 4 and 5 exhibited exceptionally strong binding energies of −81.0 kcal/mol and −85.5 kcal/mol, respectively. The comprehensive computational analysis identifies compounds 4 and 5 as promising TGFβR1 hits with potential therapeutic applications in the development of new treatments for pancreatic cancer.
AI screening and molecular dynamic simulation-driven identification of novel inhibitors of TGFβR1 for pancreatic cancer therapy. Samvedna Singh, Kiran Bharat Lokhande, Aman Chandra Kaushik, Ashutosh Singh, Shakti Sahi. Computational Biology and Chemistry, vol. 113, Article 108262. Pub Date: 2024-10-28. DOI: 10.1016/j.compbiolchem.2024.108262
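For readers unfamiliar with the MM-GBSA binding energies quoted in the abstract above, the quantity is conventionally defined as the free energy of the complex minus that of the separated receptor and ligand. The decomposition below is the standard textbook form, not a formula taken from the paper; whether the authors included the entropy term is not stated, and many MM-GBSA workflows omit it.

```latex
\Delta G_{\text{bind}} = G_{\text{complex}} - \left( G_{\text{receptor}} + G_{\text{ligand}} \right),
\qquad
G \approx E_{\text{MM}} + G_{\text{GB}} + G_{\text{SA}} - T S
```

Here $E_{\text{MM}}$ is the molecular-mechanics energy, $G_{\text{GB}}$ the generalized Born polar solvation term, $G_{\text{SA}}$ the surface-area nonpolar term, and $TS$ the optional entropy contribution. A more negative $\Delta G_{\text{bind}}$ indicates more favorable predicted binding.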
Pub Date: 2024-10-28; DOI: 10.1016/j.compbiolchem.2024.108264
Shan Du, Xin-Xin Zhang, Xiang Gao, Yan-Bin He
Leukocyte antigen related protein (LAR), a member of the PTP family, has become a potential target for exploring therapeutic interventions for various complex diseases, including neurodegenerative diseases. Repurposing FDA-approved drugs offers a promising approach for rapidly identifying potential LAR inhibitors. In this study, we conducted a structure-based virtual screening of FDA-approved drugs from the ZINC database and selected candidate compounds based on their binding affinity and interactions with LAR. Our research revealed that the candidate compound ZINC6716957 exhibited excellent binding affinity to the binding pocket of LAR, formed interactions with key residues at the active site, and demonstrated low toxicity. To further understand the binding dynamics and interaction mechanisms, 100-ns molecular dynamics simulations were performed. Post-dynamics analyses (RMSD, RMSF, SASA, hydrogen bonds, binding free energy and free energy landscape) indicated that ZINC6716957 stabilized the structure of LAR and that the residues Tyr1355, Arg1431, Lys1433, Arg1528, Tyr1563 and Thr1567 played a vital role in stabilizing the conformational changes of the protein. In conclusion, the identified compound ZINC6716957 showed robust inhibitory activity against LAR and merits extensive research, given its potential therapeutic value in the treatment of complex diseases, particularly neurodegenerative disorders.
Structure-based screening of FDA-approved drugs and molecular dynamics simulation to identify potential leukocyte antigen related protein (PTP-LAR) inhibitors. Computational Biology and Chemistry, vol. 113, Article 108264.
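Of the post-dynamics analyses named in the LAR abstract above, RMSD is the simplest to state: the root of the mean squared distance between matching atoms in a reference structure and a trajectory frame. The sketch below is a minimal pure-Python version that assumes the frame has already been superimposed on the reference; real trajectory tools perform that fitting first. The coordinates are made-up values for illustration.

```python
from math import sqrt


def rmsd(reference, frame):
    """Root-mean-square deviation between two lists of (x, y, z) tuples.

    Assumes both lists hold the same atoms in the same order and that the
    frame is already aligned to the reference (no fitting is done here).
    """
    if len(reference) != len(frame):
        raise ValueError("coordinate sets must contain the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for ref_atom, frame_atom in zip(reference, frame)
        for a, b in zip(ref_atom, frame_atom)
    )
    return sqrt(squared / len(reference))


# Hypothetical three-atom structure and a copy displaced by 1 unit along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(rmsd(reference, shifted))
```

Because every atom moves by exactly one unit, the unfitted RMSD equals that displacement; a fitting step would remove this rigid translation, which is why alignment matters when interpreting RMSD plots.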
Pub Date: 2024-10-28; DOI: 10.1016/j.compbiolchem.2024.108265
Aanchal Rathi, Saba Noor, Shama Khan, Faizya Khan, Farah Anjum, Anam Ashraf, Aaliya Taiyab, Asimul Islam, Md. Imtaiyaz Hassan, Mohammad Mahfuzul Haque
PIM-1 is a Ser/Thr kinase that has been extensively studied as a potential target for cancer therapy due to its significant roles in various cancers, including prostate and breast cancers. Given its importance in cancer, researchers are investigating the structure of PIM-1 for pharmacological inhibition in order to discover therapeutic interventions. This study examines structural and conformational changes in PIM-1 across different pH values using various spectroscopic and computational techniques. Spectroscopic results indicate that PIM-1 maintains its secondary and tertiary structure within the pH range of 7.0–9.0. However, protein aggregation occurs in the acidic pH range of 5.0–6.0. Additionally, kinase assays suggested that PIM-1 activity is optimal within the pH range of 7.0–9.0. Subsequently, we performed a 100 ns all-atom molecular dynamics (MD) simulation to examine the effect of pH on PIM-1 structural stability at the molecular level. MD simulation analysis revealed that PIM-1 retains its native conformation in alkaline conditions, with some residual fluctuations in acidic conditions as well. A strong correlation was observed between our MD simulation, spectroscopic, and enzymatic activity studies. Understanding the pH-dependent structural changes of PIM-1 can provide insights into its role in disease conditions and cellular homeostasis, particularly regarding protein function under varying pH conditions.
Investigating pH-induced conformational switch in PIM-1: An integrated multi spectroscopic and MD simulation study. Computational Biology and Chemistry, vol. 113, Article 108265.
Pub Date: 2024-10-24; DOI: 10.1016/j.compbiolchem.2024.108255
Shalini Majumder, Ekarsi Lodh, Tapan Chowdhury
Breast cancer has been one of the leading causes of cancer-related deaths among women worldwide. To compound the problem, cancer cells often develop resistance, whether innate or acquired, against the available chemotherapy or monotargeted treatments. This resistance is concomitant with increased activation of the MAPK (mitogen-activated protein kinase) signaling pathway. This study simultaneously targets three essential intermediates in this pathway using molecular docking and real-time simulation. Docking was performed via the integrated AutoDock Vina 1.1.2 and 1.2.5 of the PyRx software, while Discovery Studio (BIOVIA) v24.1.0.23298 was used to conduct the simulation. The aim is to investigate the therapeutic prospects of known potential inhibitors of the targeted intermediates and of repurposable drugs, in order to assess the effectiveness of targeting these trinodes simultaneously. The target points were PDPK1 (3-phosphoinositide-dependent protein kinase 1), ERK1/2 (extracellular signal-regulated protein kinases 1/2), and mTOR (mammalian target of rapamycin). Our study reveals that, of the candidate inhibitors chosen for each node, MP7 exhibited the strongest binding affinities for all three: −10.918 kcal/mol for PDPK1, −10.224 kcal/mol for ERK1, −10.134 kcal/mol for ERK2, and −9.2 kcal/mol for mTOR (via AutoDock Vina 1.2.5). MP7 often scored better than the available single-targeted drugs for different nodes in the MAPK pathway. Additionally, among a total of 1867 repurposed analgesic, antibiotic, and antiparasitic drugs, several, including Zavegepant (−13.399 kcal/mol for PDPK1), Adozelesin (−11.74 kcal/mol for mTOR) and Modoflaner (−11.29 kcal/mol for PDPK1), showed more promising binding energetics against our triad points than the other compounds used. This approach points toward mitigating not only breast cancer but other elusive diseases as well, using state-of-the-art multitargeted therapies coupled with bioinformatic strategies.
Implications of trinodal inhibitions and drug repurposing in MAPK pathway: A putative remedy for breast cancer. Computational Biology and Chemistry, vol. 113, Article 108255.
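The docking scores quoted in the MAPK abstract above can be compared per target by picking the most negative value, since a lower docking score indicates stronger predicted binding. The sketch below tabulates only the scores reported in the abstract; the grouping function is a hypothetical helper, and it assumes the scores are comparable, which may not hold strictly if they came from different AutoDock Vina versions.

```python
# (compound, target, score in kcal/mol) as quoted in the abstract.
reported_scores = [
    ("MP7", "PDPK1", -10.918),
    ("MP7", "ERK1", -10.224),
    ("MP7", "ERK2", -10.134),
    ("MP7", "mTOR", -9.2),
    ("Zavegepant", "PDPK1", -13.399),
    ("Adozelesin", "mTOR", -11.74),
    ("Modoflaner", "PDPK1", -11.29),
]


def best_per_target(scores):
    """Return {target: (compound, score)} keeping the most negative score per target."""
    best = {}
    for compound, target, score in scores:
        if target not in best or score < best[target][1]:
            best[target] = (compound, score)
    return best


best = best_per_target(reported_scores)
for target, (compound, score) in sorted(best.items()):
    print(f"{target}: {compound} ({score} kcal/mol)")
```

On these numbers the repurposed drugs outscore MP7 for PDPK1 and mTOR, while MP7 remains the only listed compound with scores for ERK1 and ERK2, which is consistent with the abstract's emphasis on MP7 as the candidate covering all three nodes.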
Pub Date: 2024-10-24; DOI: 10.1016/j.compbiolchem.2024.108268
Sinan Eyuboglu, Semih Alpsoy, Vladimir N. Uversky, Orkid Coskuner-Weber
Pancreatic ductal adenocarcinoma (PDAC) is recognized for its aggressive nature, dismal prognosis, and a notably low five-year survival rate, underscoring the critical need for early detection methods and more effective therapeutic approaches. This research investigates the molecular mechanisms underlying PDAC, with a focus on the identification of pivotal genes and pathways that may hold therapeutic relevance and prognostic value. Through the construction of a protein-protein interaction (PPI) network and the examination of differentially expressed genes (DEGs), the study uncovers key hub genes such as CDK1, KIF11, and BUB1, demonstrating their substantial role in the pathogenesis of PDAC. Notably, the dysregulation of these genes is consistent across a spectrum of cancers, positioning them as potential targets for wide-ranging cancer therapeutics. This study also highlights significant genes encoding intrinsically disordered proteins, in particular GPRC5A and KRT7, suggesting promising new pathways for therapeutic intervention. Machine learning techniques were used to classify PDAC patients with high accuracy, using the key genetic markers as features. The Support Vector Machine (SVM) model used the hub genes to achieve a sensitivity of 91 % and a specificity of 85 %, while the Random Forest model achieved a sensitivity of 91 % and a specificity of 92.5 %. Crucially, when the identified genes were cross-referenced with TCGA-PAAD clinical datasets, a correlation with patient survival rates was found, reinforcing the potential of these genes as prognostic biomarkers and their viability as targets for therapeutic intervention. These findings demonstrate the value of molecular analysis in enhancing the understanding of PDAC and in advancing the pursuit of more effective diagnostic and treatment strategies.
Key genes and pathways in the molecular landscape of pancreatic ductal adenocarcinoma: A bioinformatics and machine learning study. Computational Biology and Chemistry, vol. 113, Article 108268.
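The sensitivity and specificity figures in the PDAC abstract above follow from a confusion matrix: sensitivity is the fraction of true tumor samples that are detected, and specificity is the fraction of true normal samples that are correctly rejected. The sketch below uses hypothetical counts chosen only so that the arithmetic reproduces the SVM percentages; the study's actual sample counts are not given in the abstract.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp/fn: positive samples classified correctly/incorrectly.
    tn/fp: negative samples classified correctly/incorrectly.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts: 100 tumor and 100 normal samples.
sens, spec = sensitivity_specificity(tp=91, fn=9, tn=85, fp=15)
print(f"sensitivity = {sens:.2%}, specificity = {spec:.2%}")
```

Reporting both values matters here because the two models in the abstract share the same sensitivity and differ only in specificity, a difference that overall accuracy alone would blur.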
Pub Date: 2024-10-24; DOI: 10.1016/j.compbiolchem.2024.108266
Ahmad Hasan, Muhammad Ibrahim, Wadi B. Alonazi, Jian Shen
Bloodstream infections caused by resistant bacteria such as Variovorax durovernensis, a recently reported Gram-negative bacterium, pose a significant public health challenge and worsen the burden on healthcare systems. The design of a vaccine using chimeric peptides derived from a representative V. durovernensis strain holds significant promise for preventing disease onset. The current study employed reverse vaccinology (RV) approaches: retrieval of V. durovernensis proteomics data, removal of redundant proteins by CD-HIT, filtering for essential proteins that are non-homologous to human proteins, and identification of outer membrane (OM) proteins by CELLO and PSORTb. Following these steps, immunoinformatic approaches were applied, such as epitope prediction by IEDB, vaccine design using linkers and an adjuvant, and analysis of antigenicity, allergenicity, safety and stability. Among the 4208 nonredundant proteins, an OmpA family protein (A0A940EKP4) was designated a potential candidate for the development of a multiepitope vaccine construct. Upon analysis of the OM protein, six immunodominant B-cell epitopes were identified as the basis of the chimeric construct, following the prediction of cytotoxic T lymphocyte (CTL) and helper T lymphocyte (HTL) epitopes. The global population coverage rates were 58.18 % for CTL epitopes and 46.56 % for HTL epitopes, and 77.23 % overall. Using EAAAK, GPGPG, and AAY linkers, a cholera toxin B subunit adjuvant and appropriate epitopes were incorporated into a chimeric vaccine that effectively triggered both adaptive and innate immune responses. For example, the administered antigen showed a peak in counts on the fifth day post injection and then gradually declined until the fifteenth day. Elevated levels of several antibodies (IgG + IgM > 700,000; IgM > 600,000; IgG1 + IgG2; IgG1 > 500,000) were observed as the antigen concentration decreased. Molecular dynamics simulations carried out via iMODS revealed strong correlations between residue pairs, highlighting the stability of the docked complex. The designed vaccine has promising potential to elicit specific immunogenic responses, thereby facilitating future research on vaccine development against V. durovernensis.
{"title":"Application of immunoinformatics to develop a novel and effective multiepitope chimeric vaccine against Variovorax durovernensis","authors":"Ahmad Hasan , Muhammad Ibrahim , Wadi B. Alonazi , Jian Shen","doi":"10.1016/j.compbiolchem.2024.108266","DOIUrl":"10.1016/j.compbiolchem.2024.108266","url":null,"abstract":"<div><div>Bloodstream infections pose a significant public health challenge caused by resistant bacteria such as <em>Variovorax durovernensis, a</em> recently reported Gram-negative bacterium, worsening the burden on healthcare systems. The design of a vaccine using chimeric peptides derived from a representative <em>V. durovernensis</em> strain holds significant promise for preventing disease onset. The current study aimed to employ reverse vaccinology (RV) approaches such as the retrieval of <em>V. durovernensis</em> proteomics data, removal of redundant proteins by CD-HIT, filtering of non-homologous proteins to humans and essential proteins, identification of outer membrane (OM) proteins by CELLO and PSORTb. Following these steps immunoinformatic approaches were applied, such as epitope prediction by IEDB, vaccine design using linkers and adjuvant and analysis of antigenicity, allergenicity, safety and stability. Among the 4208 nonredundant proteins, an OmpA family protein (A0A940EKP4) was designated a potential candidate for the development of a multiepitope vaccine construct. Upon analysis of OM protein, six immunodominant (B cell) epitopes were found on the basis of the chimeric construct following the prediction of CTL stands cytotoxic T lymphocyte and HTL stands helper T lymphocyte epitopes. To ensure comprehensive population coverage globally, the CTL and HTL coverage rates were 58.18 % and 46.56 %, respectively, and 77.23 % overall. 
By utilizing EAAAK, GPGPG, and AAY linkers, the cholera toxin B subunit adjuvant and appropriate epitopes were smoothly incorporated into a chimeric vaccine, effectively triggering both adaptive and innate immune responses. For example, the administered antigen showed a peak in counts on the fifth day post injection and then gradually declined until the fifteenth day. Elevated levels of several antibodies (IgG + IgM > 700,000; IgM > 600,000; IgG1 + IgG2; IgG1 > 500,000) were observed as the antigen concentration decreased. Molecular dynamics simulations carried out via iMODS revealed strong correlations between residue pairs, highlighting the stability of the docked complex. The designed vaccine has promising potential in eliciting specific immunogenic responses, thereby facilitating future research for vaccine development against <em>V. durovernensis</em>.</div></div>","PeriodicalId":10616,"journal":{"name":"Computational Biology and Chemistry","volume":"113 ","pages":"Article 108266"},"PeriodicalIF":2.6,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142586857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-23 DOI: 10.1016/j.compbiolchem.2024.108260
Panchami V.U. , Manish T.I. , Manesh K.K.
Integrating and analyzing pancancer data collected from different experiments is crucial for gaining insight into the common molecular-level mechanisms underlying the development and progression of cancers. Epigenetic study of pancancer data can provide promising results in biomarker discovery. Genes that are epigenetically dysregulated across different cancers are powerful biomarkers for drug-related studies. This paper identifies genes whose expression is altered by aberrant methylation patterns, using differential analysis of TCGA pancancer data from 12 different cancers. We identified a comprehensive set of 115 epigenetic biomarker genes, of which 106 have pancancer properties. Correlation analysis, gene set enrichment, protein–protein interaction analysis, pancancer characteristics analysis, and diagnostic modeling were performed on these biomarkers to illustrate the power of this signature, and the genes were found to be important in different molecular operations related to cancer. An accuracy of 97.56% was obtained on the TCGA pancancer gene expression dataset for the binary classification of tumor versus normal samples. The source code and dataset of this work are available at https://github.com/panchamisuneeth/EpiPanCan.git.
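The idea of selecting genes whose expression changes alongside aberrant methylation can be illustrated with a small sketch. It is not the authors' pipeline: the rule used here (tumor-versus-normal mean differences that exceed thresholds and point in opposite directions) is a simplifying assumption, the cut-offs `meth_cut` and `expr_cut` are hypothetical, and the gene names and values are toy data. A real analysis would use proper statistical tests with multiple-testing correction.

```python
from statistics import mean


def epigenetically_dysregulated(meth, expr, labels, meth_cut=0.2, expr_cut=1.0):
    """Flag genes whose methylation and expression shift in opposite directions.

    meth, expr: dict mapping gene -> per-sample values
                (methylation beta values, log2 expression)
    labels:     list of "tumor" / "normal", one entry per sample
    """
    tumor = [i for i, lab in enumerate(labels) if lab == "tumor"]
    normal = [i for i, lab in enumerate(labels) if lab == "normal"]
    hits = {}
    for gene in meth.keys() & expr.keys():
        d_meth = mean(meth[gene][i] for i in tumor) - mean(meth[gene][i] for i in normal)
        d_expr = mean(expr[gene][i] for i in tumor) - mean(expr[gene][i] for i in normal)
        # Keep genes with a large change in both layers and opposite signs.
        if abs(d_meth) >= meth_cut and abs(d_expr) >= expr_cut and d_meth * d_expr < 0:
            hits[gene] = (round(d_meth, 3), round(d_expr, 3))
    return hits


# Toy data: two tumor and two normal samples, hypothetical gene names.
labels = ["tumor", "tumor", "normal", "normal"]
meth = {"GENE_A": [0.8, 0.9, 0.2, 0.3], "GENE_B": [0.5, 0.5, 0.5, 0.4]}
expr = {"GENE_A": [2.0, 2.5, 5.0, 5.5], "GENE_B": [4.0, 4.1, 4.0, 4.2]}

hits = epigenetically_dysregulated(meth, expr, labels)
print(hits)
```

In this toy input only `GENE_A` is hypermethylated and under-expressed in tumors, so it is the only gene reported.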
{"title":"An integrative analysis to identify pancancer epigenetic biomarkers","authors":"Panchami V.U. , Manish T.I. , Manesh K.K.","doi":"10.1016/j.compbiolchem.2024.108260","DOIUrl":"10.1016/j.compbiolchem.2024.108260","url":null,"abstract":"<div><div>Integrating and analyzing pancancer data collected from different experiments is crucial for gaining insight into the common molecular-level mechanisms underlying the development and progression of cancers. Epigenetic study of pancancer data can provide promising results in biomarker discovery. Genes that are epigenetically dysregulated across different cancers are powerful biomarkers for drug-related studies. This paper identifies genes whose expression is altered by aberrant methylation patterns, using differential analysis of TCGA pancancer data from 12 different cancers. We identified a comprehensive set of 115 epigenetic biomarker genes, of which 106 have pancancer properties. Correlation analysis, gene set enrichment, protein–protein interaction analysis, pancancer characteristics analysis, and diagnostic modeling were performed on these biomarkers to illustrate the power of this signature, and the genes were found to be important in different molecular operations related to cancer. An accuracy of 97.56% was obtained on the TCGA pancancer gene expression dataset for the binary classification of tumor versus normal samples. 
The source code and dataset of this work are available at <span><span>https://github.com/panchamisuneeth/EpiPanCan.git</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":10616,"journal":{"name":"Computational Biology and Chemistry","volume":"113 ","pages":"Article 108260"},"PeriodicalIF":2.6,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142523871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-23 DOI: 10.1016/j.compbiolchem.2024.108257
Negar Safinianaini , Camila P.E. De Souza , Andrew Roth , Hazal Koptagel , Hosein Toosi , Jens Lagergren
Investigating tumor heterogeneity using single-cell sequencing technologies is imperative for understanding how tumors evolve, since each cell subpopulation harbors a unique set of genomic features that yields a unique phenotype, which is bound to have clinical relevance. Clustering cells based on copy number data obtained from single-cell DNA sequencing provides an opportunity to identify different tumor cell subpopulations. Accordingly, computational methods have emerged for single-cell copy number profiling and clustering; however, these two tasks have been handled sequentially by applying various ad-hoc pre- and post-processing steps, a procedure that is vulnerable to introducing clustering artifacts. Our method, CopyMix, a variational inference approach for a novel mixture model, avoids these clustering artifacts by jointly inferring cell clusters and their underlying copy number profiles. Our probabilistic graphical model is an improved version of the mixture of hidden Markov models, designed specifically for single-cell copy number profiling and clustering. For the evaluation, we used the likelihood-ratio test, CH index, Silhouette, V-measure, and total variation scores. CopyMix performs well on both biological and simulated data. Our favorable results indicate considerable potential for clinical impact from using CopyMix in studies of cancer tumor heterogeneity.
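Using single-cell sequencing to study tumor heterogeneity is essential for understanding how tumors evolve, because each cell subpopulation carries its own set of genomic features and therefore its own phenotype, which is bound to be clinically relevant. Clustering cells on copy number data obtained from single-cell DNA sequencing offers a way to identify distinct tumor cell subpopulations. Computational methods for single-cell copy number profiling and clustering have emerged accordingly; however, the two tasks have been handled sequentially, with various ad-hoc pre- and post-processing steps, and such a procedure readily introduces clustering artifacts. Our method, CopyMix, a variational inference approach for a novel mixture model, avoids clustering artifacts by jointly inferring cell clusters and their underlying copy number profiles. Our probabilistic graphical model is an improved version of the mixture of hidden Markov models, designed specifically for single-cell copy number profiling and clustering. For evaluation, we used the likelihood-ratio test, CH index, Silhouette, V-measure, and total variation scores. CopyMix performs well on both biological and simulated data. Our favorable results indicate considerable potential for clinical impact when CopyMix is used in studies of tumor heterogeneity.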
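To make the coupling between clustering and copy number profiles concrete, the sketch below computes how strongly one cell's read counts support each candidate cluster profile. It is a deliberately reduced illustration, not CopyMix itself: the profiles and mixture weights are held fixed rather than inferred, the Markov dependence between neighbouring bins is ignored, and the Poisson emission with a per-copy rate (`rate_per_copy`) is an assumption made here for simplicity. All numbers are toy values.

```python
import math


def poisson_logpmf(k, rate):
    """Log probability of observing count k under a Poisson with the given rate."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def cluster_responsibilities(counts, profiles, weights, rate_per_copy):
    """Posterior probability of each cluster for one cell.

    counts:        read counts per genomic bin for the cell
    profiles:      one copy number sequence per cluster (all entries > 0)
    weights:       prior mixture weight of each cluster
    rate_per_copy: expected reads per bin for a single copy (assumed known)
    """
    log_post = []
    for profile, w in zip(profiles, weights):
        log_lik = sum(
            poisson_logpmf(k, c * rate_per_copy) for k, c in zip(counts, profile)
        )
        log_post.append(math.log(w) + log_lik)
    # Normalise in a numerically stable way (softmax over log posteriors).
    top = max(log_post)
    unnorm = [math.exp(v - top) for v in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Toy example: two candidate profiles over four bins.
profiles = [[2, 2, 4, 4], [2, 2, 2, 2]]
cell = [20, 19, 41, 38]
resp = cluster_responsibilities(cell, profiles, [0.5, 0.5], rate_per_copy=10.0)
print(resp)
```

In a joint method the profiles would be updated from these responsibilities and vice versa, instead of being fixed up front as they are here.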
{"title":"CopyMix: Mixture model based single-cell clustering and copy number profiling using variational inference","authors":"Negar Safinianaini , Camila P.E. De Souza , Andrew Roth , Hazal Koptagel , Hosein Toosi , Jens Lagergren","doi":"10.1016/j.compbiolchem.2024.108257","DOIUrl":"10.1016/j.compbiolchem.2024.108257","url":null,"abstract":"<div><div>Investigating tumor heterogeneity using single-cell sequencing technologies is imperative to understand how tumors evolve since each cell subpopulation harbors a unique set of genomic features that yields a unique phenotype, which is bound to have clinical relevance. Clustering of cells based on copy number data obtained from single-cell DNA sequencing provides an opportunity to identify different tumor cell subpopulations. Accordingly, computational methods have emerged for single-cell copy number profiling and clustering; however, these two tasks have been handled sequentially by applying various ad-hoc pre- and post-processing steps; hence, a procedure vulnerable to introducing clustering artifacts. We avoid the clustering artifact issues in our method, CopyMix, a Variational Inference for a novel mixture model, by jointly inferring cell clusters and their underlying copy number profile. Our probabilistic graphical model is an improved version of the mixture of hidden Markov models, which is designed uniquely to infer single-cell copy number profiling and clustering. For the evaluation, we used likelihood-ratio test, CH index, Silhouette, V-measure, total variation scores. CopyMix performs well on both biological and simulated data. 
Our favorable results indicate a considerable potential to obtain clinical impact by using CopyMix in studies of cancer tumor heterogeneity.</div></div>","PeriodicalId":10616,"journal":{"name":"Computational Biology and Chemistry","volume":"113 ","pages":"Article 108257"},"PeriodicalIF":2.6,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142578223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-19 DOI: 10.1016/j.compbiolchem.2024.108243
Vinnakota Sai Durga Tejaswi, Venubabu Rachapudi
Liver cancer is a leading cause of cancer-related deaths, often diagnosed at advanced stages due to reliance on traditional imaging methods. Existing computer-aided diagnosis systems struggle with noise, anatomical complexity, and ineffective feature integration, leading to inaccuracies in lesion segmentation and classification. By effectively addressing these challenges, the proposed model aims to enhance early detection and assist clinicians in making informed decisions. Ultimately, this research seeks to contribute to more efficient and accurate liver cancer diagnosis. This paper presents a novel model for liver cancer classification, called SegNet-based Liver Cancer Classification via SqueezeNet (SgN-LCC-SqN). The model performs liver cancer segmentation and classification through four key steps: preprocessing, segmentation, feature extraction, and classification. During preprocessing, Quadratic Mean Estimated Wiener Filtering (QMEWF) is used to minimize image noise. Segmentation divides the image into segments using Enhanced Feature Pyramid SegNet (EFP-SgN), which is essential for precise diagnosis. Feature extraction encompasses color features, Local Directional Pattern Variance, and Correlation Filtering-Local Gradient Increasing Pattern (CF-LGIP) features. The extracted features are then processed through an ensemble model, Deep Convolutional, Recurrent, Long Short Term Memory with SqueezeNet (DCR-LSTM-SqN), which includes Deep Convolutional Neural Network (DCNN), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Modified Loss Function in SqueezeNet (MLF-SqN) classifiers, sequentially analyzing the feature sets through DCNN, RNN, and LSTM before classification by MLF-SqN. The performance of the proposed DCR-LSTM-SqN model is evaluated against conventional methods on positive, negative, and other metrics. 
The DCR-LSTM-SqN model consistently demonstrates superior accuracy, ranging from 0.947 to 0.984, across all training data percentages. Thus, the proposed model effectively segments liver lesions and classifies cancerous areas, demonstrating its potential as a valuable resource for clinicians to enhance the efficiency and accuracy of liver cancer diagnosis.
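The abstract reports accuracy alongside "positive" and "negative" metrics without listing them. The sketch below shows how such measures are commonly derived from a binary confusion matrix. Reading "positive" as higher-is-better scores and "negative" as error rates is an assumption made here, and the counts are hypothetical rather than taken from the paper.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Derive common classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        # Higher-is-better ("positive") measures
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        # Error-rate ("negative") measures
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
    }


# Hypothetical counts for a cancerous-versus-non-cancerous classifier.
m = confusion_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Reporting error rates next to accuracy matters in diagnosis, because a missed lesion (false negative) and a false alarm carry very different costs.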
{"title":"Computer-aided diagnosis of liver cancer with improved SegNet and deep stacking ensemble model","authors":"Vinnakota Sai Durga Tejaswi, Venubabu Rachapudi","doi":"10.1016/j.compbiolchem.2024.108243","DOIUrl":"10.1016/j.compbiolchem.2024.108243","url":null,"abstract":"<div><div>Liver cancer is a leading cause of cancer-related deaths, often diagnosed at advanced stages due to reliance on traditional imaging methods. Existing computer-aided diagnosis systems struggle with noise, anatomical complexity, and ineffective feature integration, leading to inaccuracies in lesion segmentation and classification. By effectively addressing these challenges, the model aims to enhance early detection and assist clinicians in making informed decisions. Ultimately, this research seeks to contribute to more efficient and accurate liver cancer diagnosis. This paper presents a novel model for liver cancer classification, called SegNet-based Liver Cancer Classification via SqueezeNet (SgN-LCC-SqN). The model effectively executes liver cancer segmentation and classification through four key steps: preprocessing, segmentation, feature extraction, and classification. During preprocessing, Quadratic Mean Estimated Wiener Filtering (QMEWF) is utilized to minimize image noise. Segmentation divides the image into segments using Enhanced Feature Pyramid SegNet (EFP-SgN), which is essential for precise diagnosis. Feature extraction encompasses color features, Local Directional Pattern Variance, and Correlation Filtering-Local Gradient Increasing Pattern (CF-LGIP) features. 
The extracted features are then processed through an ensemble model, Deep Convolutional, Recurrent, Long Short Term Memory with SqueezeNet (DCR-LSTM-SqN), which includes Deep Convolutional Neural Network (DCNN), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Modified Loss Function in SqueezeNet (MLF-SqN) classifiers, sequentially analyzing the feature sets through DCNN, RNN, and LSTM before classification by MLF-SqN. The performance of the suggested DCR-LSTM-SqN model is evaluated over conventional methods for positive, negative and other metrics. The DCR-LSTM-SqN model consistently demonstrates superior accuracy, ranging from 0.947 to 0.984, across all training data percentages. Thus, the proposed model effectively segments liver lesions and classifies cancerous areas, demonstrating its potential as a valuable resource for clinicians to enhance the efficiency and accuracy of liver cancer diagnosis.</div></div>","PeriodicalId":10616,"journal":{"name":"Computational Biology and Chemistry","volume":"113 ","pages":"Article 108243"},"PeriodicalIF":2.6,"publicationDate":"2024-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142514745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}