Pub Date: 2025-01-02; DOI: 10.1016/j.compbiolchem.2024.108332
Shah Tania Akter Sujana, Md Shahjaman, Atul Chandra Singha
Advances in single-cell RNA sequencing (scRNA-seq) technology have significantly transformed genomics research, making it possible to profile thousands of cells in each experiment. As of now, 32,068 research studies have been cataloged in the PubMed database. The primary aims of scRNA-seq investigations are to identify cell types, understand the antitumor immune response, and discover new and rare cell types. Traditional techniques for identifying cell types rely on microscopy, histology, and pathological characteristics. However, the complexity of the instruments and the need for precise experimental design make it difficult to fully capture the overall heterogeneity. Unsupervised clustering and supervised classification methods have been used to solve this task. Supervised cell type classification methods have gained popularity because, given large-scale, high-quality, well-annotated data, they yield more robust results than clustering methods. A recent study showed that the support vector machine (SVM) delivers high-quality classification performance in different scenarios. In this article, we compare and evaluate the performance of four SVM kernels (sigmoid, linear, radial, and polynomial). Experiments on three standard scRNA-seq datasets indicate that SVMs with linear and sigmoid kernels classify cells more accurately than the other kernels (approx. 99 %), and the linear-kernel SVM is also remarkably fast to compute. We further evaluate the results using metrics suited to single-cell classification: F1 score, Matthews correlation coefficient (MCC), and area under the curve (AUC). Additionally, the study sheds light on how SVM kernels can be used to extract underlying information from single-cell RNA-seq data more effectively.
Title: Application of bioinformatic tools in cell type classification for single-cell RNA-seq data. Journal: Computational Biology and Chemistry, 115, 108332.
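The four kernels compared in this study have standard closed forms, and F1 score and MCC are computed from confusion-matrix counts. The following is a minimal, dependency-free sketch of those textbook definitions, not the authors' code. The `gamma`, `coef0` and `degree` defaults are illustrative assumptions, and for multi-class cell typing the binary metrics would be computed per cell type (one-vs-rest) and then averaged. In practice a library SVM would be used, for example scikit-learn's `SVC`, whose `kernel` parameter accepts `"linear"`, `"poly"`, `"rbf"` and `"sigmoid"`.

```python
import math
from typing import Sequence


def _dot(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, y))


# Textbook forms of the four kernels compared in the study.
# The gamma, coef0 and degree defaults are illustrative assumptions.
def linear_kernel(x, y):
    return _dot(x, y)


def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * _dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    # "radial" kernel: exp(-gamma * ||x - y||^2)
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * _dot(x, y) + coef0)


# Binary (one-vs-rest) metrics from confusion-matrix counts.
def f1_score(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def matthews_corrcoef(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

For instance, `f1_score(8, 2, 2)` evaluates to 0.8 and `matthews_corrcoef(8, 8, 2, 2)` to 0.6. MCC uses all four confusion-matrix cells, which is why it is often reported next to F1 when cell-type proportions are imbalanced.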
Pub Date: 2025-01-01; DOI: 10.1016/j.compbiolchem.2024.108336
Lina Zhang, Sizan Gao, Qinghao Yuan, Yao Fu, Runtao Yang
Long non-coding RNAs (lncRNAs) are strongly associated with cellular physiological mechanisms and are implicated in numerous diseases. Exploring the subcellular localizations of lncRNAs not only provides crucial insights into the molecular mechanisms of lncRNA-related biological processes but also contributes to the diagnosis, prevention, and treatment of various human diseases. However, conventional experimental techniques tend to be laborious and time-intensive, so computational methods are increasingly in demand. This paper develops an ensemble method that incorporates hybrid features to accurately predict the subcellular localizations of lncRNAs. Because features from a single source do not fully reflect the inherent correlation with the prediction target, heterogeneous multi-source features are used, combining information on sequence composition, physicochemical properties, and structure. To address class imbalance in the benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is employed. Finally, the resulting predictor, termed lncSLPre, is built by integrating the outputs of the individual classifiers. Experimental findings suggest that the complementarity of multi-source heterogeneous features improves prediction performance. The results also show that SMOTE effectively mitigates the dataset imbalance, while the feature selection approach is critical for eliminating irrelevant and redundant features. Compared with existing state-of-the-art methods, lncSLPre improves overall accuracy by 13.13%, 2.15%, and 3.23%, respectively, indicating that lncSLPre can effectively predict lncRNA subcellular localizations.
Title: An ensemble learning method combined with multiple feature representation strategies to predict lncRNA subcellular localizations. Journal: Computational Biology and Chemistry, 115, 108336.
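SMOTE balances classes by interpolating between existing minority samples rather than duplicating them. Below is a simplified, standard-library sketch of that idea, assuming numeric feature vectors (for example, the multi-source features described above, already concatenated). The function name, the random choice of base sample, and the defaults `k=5` and `seed=0` are assumptions, and this is not lncSLPre's implementation; production code would more likely use the `SMOTE` class from the imbalanced-learn package.

```python
import math
import random
from typing import List


def smote(minority: List[List[float]], n_synthetic: int,
          k: int = 5, seed: int = 0) -> List[List[float]]:
    """SMOTE-style oversampling of a minority class (simplified sketch).

    Each synthetic sample is placed at a random point on the segment between
    a randomly chosen minority sample and one of its k nearest minority
    neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` within the minority class (Euclidean)
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

Synthetic samples should be generated from the training split only, after the train/test split, so that interpolated copies of test samples do not leak into training.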
Pub Date: 2024-12-30; DOI: 10.1016/j.compbiolchem.2024.108335
Sakthi Ulaganathan, Pon Harshavardhanan, N V Ganapathi Raju, G Parthasarathy
Autism spectrum disorder (ASD) is a neurodevelopmental disorder caused by various changes in the brain. It affects daily life through impaired social interaction and communication. Most previous studies applied various techniques for the early detection of ASD, but these suffered from several drawbacks, such as high time cost and low accessibility of diagnosis. This paper develops the JSTO-DenseNet model for the detection of ASD. An input autism brain image is first passed to the image pre-processing phase, in which noise is removed using Gaussian filtering and the Region of Interest (ROI) is extracted. Pivotal regions are then extracted based on functional connectivity using the proposed Jaya Sewing Training Optimization (JSTO), which is newly introduced by combining the Jaya algorithm and Sewing Training-Based Optimization (STBO). This yields output-1. In the feature extraction phase, grey level co-occurrence matrix (GLCM) features (entropy, correlation, energy, homogeneity, inverse difference moment, and angular second moment) and texture features, namely local ternary patterns (LTP), Local Optimal Oriented Pattern (LOOP), and Histogram of Oriented Gradients (HOG), are extracted from the Magnetic Resonance Imaging (MRI) data. This yields output-2.
From output-1 and output-2, ASD classification is performed using DenseNet, which is trained with the same proposed JSTO. Compared with other traditional methods such as Explainable Artificial Intelligence (XAI), Hybrid deep lightweight feature generator, CLAttention, Two stream end-to-end deep learning (DL), Auto-Encoder feature representation, and Fuzzy Inference Gait System-Deep Extreme Adaptive Fuzzy (FIGS-DEAF) on the ABIDE I dataset, the proposed JSTO-DenseNet model achieves the highest accuracy of 94.8 %, a True Positive Rate (TPR) of 90 %, a True Negative Rate (TNR) of 90.5 %, an un-weighted average recall (UAR) of 89.8 %, and the lowest False Negative Rate (FNR) of 86.7 % and False Positive Rate (FPR) of 82.6 %.
Title: Hybrid optimization enabled DenseNet for autism spectrum disorders using MRI image. Journal: Computational Biology and Chemistry, 115, 108335.
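GLCM features summarise how often pairs of grey levels co-occur at a fixed pixel offset. The sketch below computes a normalised, symmetric GLCM for one offset and derives the six GLCM features listed in the abstract. It is a hypothetical illustration, not the JSTO-DenseNet pipeline: it assumes a 2D image already quantised to integers in `range(levels)`, a single horizontal offset, and one common set of formulas (definitions of homogeneity and energy differ between libraries).

```python
import math
from typing import Dict, List


def glcm(image: List[List[int]], levels: int, dx: int = 1, dy: int = 0,
         symmetric: bool = True) -> List[List[float]]:
    """Normalised grey-level co-occurrence matrix for one offset (dx, dy)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:  # count the pair in both directions
                    counts[j][i] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def glcm_features(p: List[List[float]]) -> Dict[str, float]:
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    asm = sum(v * v for _, _, v in cells)  # angular second moment
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return {
        "asm": asm,
        "energy": math.sqrt(asm),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "idm": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        # Correlation is undefined for a constant image; 1.0 is an assumed convention.
        "correlation": cov / (sd_i * sd_j) if sd_i and sd_j else 1.0,
    }
```

For the 2x2 image `[[0, 1], [0, 1]]`, every horizontal pair is (0, 1), so the symmetric GLCM puts 0.5 in each off-diagonal cell, giving an entropy of 1 bit and a correlation of -1.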
Pub Date: 2024-12-30; DOI: 10.1016/j.compbiolchem.2024.108338
Kamelia Zaman Moon, Md Habibur Rahman, Md Jahangir Alam, Md Arju Hossain, Sungho Hwang, Sojin Kang, Seungjoon Moon, Moon Nyeo Park, Chi-Hoon Ahn, Bonglee Kim
Clinical observations indicate a pronounced exacerbation of Cardiovascular Diseases (CVDs) in individuals with Alcohol Use Disorder (AUD), suggesting an intricate interplay between these conditions. Pinpointing shared risk factors for both conditions has proven elusive. To address this, we developed a bioinformatics framework and network-based strategy to identify genes with aberrant expression patterns in both AUD and CVDs. In heart tissue samples from patients with both AUD and CVDs, our study identified 76 Differentially Expressed Genes (DEGs), which were further used to retrieve important Gene Ontology (GO) terms and metabolic pathways, highlighting mechanisms such as proinflammatory cascades, T-cell cytotoxicity, and antigen processing and presentation. Using Protein-Protein Interaction (PPI) analysis, we identified key hub proteins that have a significant impact on the pathophysiology of these illnesses. The hub proteins identified include PTGS2, VCAM1, CCL2, CXCL8, IL7R, and CDH1; among these, only CDH1 was identified by 10 algorithms of the cytoHubba plugin. Furthermore, we pinpointed several Transcription Factors (TFs), including SOD2, CXCL8, THBS2, GREM1, CCL2, and PTGS2, alongside potential microRNAs (miRNAs) such as hsa-mir-203a-3p, hsa-mir-23a-3p, hsa-mir-98-5p, and hsa-mir-7-5p, which exert critical regulatory control over gene expression… An in vitro study of the effect of alcohol on E-cadherin (CDH1) expression in HepG2 and Hep3B cells showed a significant decrease in expression following ethanol treatment. These findings suggest that alcohol exposure may disrupt cell adhesion, potentially contributing to cellular changes associated with cardiovascular diseases. Our approach has unveiled distinctive biomarkers delineating the dynamic interplay between AUD and various cardiovascular conditions for future therapeutic exploration.
Title: Unraveling the interplay between cardiovascular diseases and alcohol use disorder: A bioinformatics and network-based exploration of shared molecular pathways and key biomarkers validation via western blot analysis. Journal: Computational Biology and Chemistry, 115, 108338.
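cytoHubba ranks PPI nodes with several topology-based methods, degree being the simplest. A minimal sketch of degree ranking plus a consensus count (how many ranking methods place a gene in their top k) is shown below. The edge list and the second ranking in the usage are toy placeholders, not data from the study, and the consensus rule is an assumption about how agreement across multiple cytoHubba algorithms might be tallied.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def degree_ranking(edges: Iterable[Tuple[str, str]]) -> List[str]:
    """Rank nodes of an undirected PPI network by degree (distinct partners)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbours[a].add(b)
            neighbours[b].add(a)
    # highest degree first; ties broken alphabetically for reproducibility
    return sorted(neighbours, key=lambda n: (-len(neighbours[n]), n))


def consensus_hubs(rankings: Dict[str, List[str]], top_k: int) -> Counter:
    """Count, for each gene, how many ranking methods place it in their top_k."""
    counts = Counter()
    for ranking in rankings.values():
        counts.update(ranking[:top_k])
    return counts
```

With the toy edges `[("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]`, node A has the highest degree (3). A gene that appears in the top k of every method is the analogue of a hub supported by all algorithms.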
Pub Date: 2024-12-30; DOI: 10.1016/j.compbiolchem.2024.108337
Li Zhang, Outi Lampela, Lari Lehtiö, André H Juffer
Single-stranded breaks (SSBs) are the most frequent DNA lesions threatening genomic integrity. Understanding how DNA sensor proteins recognize certain SSB types is crucial for studies of the DNA repair pathways. During repair of damaged DNA, the final SSB that is to be ligated contains a 5'-phosphorylated end. The present work employed molecular dynamics (MD) simulations of DNA with a phosphorylated break in solution to address several questions regarding the dynamics of the break site. How does the 5'-phosphate group behave before it interacts with other biomolecules? What is the conformation of the SSB site when it is likely to be recognized by DNA repair factors once the DNA repair response is triggered? And how are the structure and dynamics of DNA affected by the presence of a break? For this purpose, a series of MD simulations of 20-base-pair DNAs, each with either a pyrimidine-based or a purine-based break, was completed, totaling over 20,000 ns of simulation time, and compared with intact DNA of the same sequence. An analysis of the DNA forms, translational and orientational helical parameters, local break-site stiffness, bending angles, 5'-phosphate group orientation dynamics, and the effects of the protonation state of the break-site phosphate group provides insights into the mechanism of break-site recognition.
Title: Insights into the behaviour of phosphorylated DNA breaks from molecular dynamic simulations. Journal: Computational Biology and Chemistry, 115, 108337.
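A bending angle can be expressed as the angle between the local helical-axis directions on either side of a site such as the break. The sketch below assumes the helical-axis points have already been extracted from each MD frame by a helical-analysis program (one 3D point per base-pair step) and measures the angle between the two axis segments meeting at index `split`. The function names and this two-segment definition are illustrative assumptions, not the study's protocol.

```python
import math
from typing import Sequence

Point = Sequence[float]


def angle_between(u: Point, v: Point) -> float:
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("zero-length axis segment")
    cos_theta = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos_theta))


def bending_angle(axis_points: Sequence[Point], split: int) -> float:
    """Bend between the helical-axis segments before and after index `split`."""
    first = [b - a for a, b in zip(axis_points[0], axis_points[split])]
    second = [b - a for a, b in zip(axis_points[split], axis_points[-1])]
    return angle_between(first, second)
```

Averaging this value over trajectory frames, separately for nicked and intact DNA, is one simple way to compare how much a break increases local bending.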
Pub Date: 2024-12-29; DOI: 10.1016/j.compbiolchem.2024.108326
Zhanwei Hou, Zhenhan Xu, Chaokun Yan, Huimin Luo, Junwei Luo
Background: Compound-protein interaction (CPI) is essential to drug discovery and design, where traditional methods are often costly and have low success rates. Recently, the integration of machine learning and deep learning in CPI research has shown potential to reduce costs and enhance discovery efficiency by improving protein target identification accuracy. Additionally, with an urgent need for novel therapies against complex diseases, CPI investigation could lead to the identification of effective new drugs. Since drug-target interactions involve complex biological processes, refined models are necessary for precise feature extraction and analysis. Nevertheless, current CPI prediction methods still face significant limitations: predictions lack sufficient accuracy, models require improved generalization ability, and further validation across diverse datasets remains essential.
Results: To address some of these limitations, this paper proposes CPI-GGS, a combined deep learning method based on graphs and sequences, for predicting and analyzing compound-protein interactions. The source code is available on GitHub at https://github.com/xingjie321/CPI-GGS.
Conclusions: The experimental results demonstrate improved accuracy in predicting compound-protein interactions. The method also improves understanding of how compounds and proteins interact, providing a valuable new tool for drug discovery and development.
Title: CPI-GGS: A deep learning model for predicting compound-protein interaction based on graphs and sequences. Journal: Computational Biology and Chemistry, 115, 108326.
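The title of CPI-GGS indicates that compounds are represented as graphs and proteins as sequences. The code below does not reproduce CPI-GGS (its GitHub repository is the reference for that); it is a generic sketch of the two input representations such models commonly use. It assumes integer indices for the 20 standard amino acids (0 reserved for padding, 21 for unknown residues) and a single mean-aggregation message-passing step over a molecular graph given as an adjacency list. All names are hypothetical.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed vocabulary: 0 = padding, 1..20 = standard residues, 21 = unknown.
AA_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
UNKNOWN = len(AMINO_ACIDS) + 1


def encode_protein(seq: str, max_len: int) -> List[int]:
    """Map a protein sequence to integer tokens, truncated/padded to max_len."""
    ids = [AA_INDEX.get(aa, UNKNOWN) for aa in seq.upper()[:max_len]]
    return ids + [0] * (max_len - len(ids))


def mean_aggregate(features: List[List[float]],
                   adjacency: List[List[int]]) -> List[List[float]]:
    """One message-passing step on a compound graph.

    Each atom's new feature vector is the mean of its own vector and those of
    its bonded neighbours (adjacency[i] lists the neighbours of atom i).
    """
    out = []
    for node, feat in enumerate(features):
        group = [feat] + [features[n] for n in adjacency[node]]
        out.append([sum(vals) / len(group) for vals in zip(*group)])
    return out
```

Real graph neural networks add learned weight matrices and nonlinearities to each aggregation step; the plain mean here only shows how information flows along bonds.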
Pub Date: 2024-12-28; DOI: 10.1016/j.compbiolchem.2024.108334
Mingchao Zhang, Zhenming Lin, Wenbin Liu
In the present study, we uncovered and validated potential biomarkers for gout, a disease characterized by the deposition of sodium urate crystals in different joint and non-joint structures. The dataset GSE160170 was obtained from the GEO database. We conducted differential gene expression analysis, GO enrichment analysis, and KEGG pathway analysis to understand the underlying processes. The overlap of 66 methodologies was visualized using UpSetR (v1.3.3). We used Cytoscape's cytoHubba to detect pivotal genes and mapped protein-protein interaction (PPI) networks. The overlap among upregulated, downregulated, and key genes was depicted using a Venn diagram. CIBERSORT was employed to estimate the composition of 22 immune cell types in tissue samples. Subsequently, CCL18 levels in serum samples were quantified using enzyme-linked immunosorbent assay (ELISA) and served as a biomarker evaluation metric. The DEG analysis revealed 1000 differentially expressed genes (500 upregulated and 500 downregulated) between gout patients and healthy controls. GO enrichment showed a predominant association with small molecule degradation, positive regulation of catabolic processes, organelle division, signal transduction, and axon formation. KEGG analysis associated the DEGs predominantly with conditions such as systemic lupus erythematosus, with pathways such as tumor necrosis factor (TNF) signaling, and with alcohol dependence and necroptosis. Intersections visualized with UpSetR identified 20 hub genes. The Venn diagram highlighted five upregulated and three downregulated genes. CIBERSORT analysis revealed a noticeable increase in gamma delta T cells and regulatory T cells. PPI network analysis identified C-C motif chemokine ligand 18 (CCL18) as a critical gene. Samples from gout patients exhibited higher CCL18 expression than those from healthy controls (P < 0.01).
Altogether, CCL18 is a promising biomarker for patients with gout and is suitable for predicting gout.
Title: The immune microenvironment related biomarker CCL18 for patients with gout by comprehensive analysis. Journal: Computational Biology and Chemistry, vol. 115, article 108334.
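The gout study on CCL18 intersects its upregulated and downregulated DEGs with the cytoHubba hub genes and draws the overlaps as a Venn diagram. Under the hood that step is ordinary set arithmetic. The sketch below is illustrative only. Apart from CCL18, the gene symbols are hypothetical placeholders, not the study's results. The helper name `venn_regions` is our own.

```python
from typing import Dict, Set


def venn_regions(up: Set[str], down: Set[str], hub: Set[str]) -> Dict[str, Set[str]]:
    """Split hub genes by the direction of differential expression.

    Up- and downregulated sets are assumed to be disjoint, as they are
    when both come from the same two-group DEG comparison.
    """
    return {
        "up_and_hub": up & hub,        # upregulated genes that are also hubs
        "down_and_hub": down & hub,    # downregulated genes that are also hubs
        "hub_only": hub - up - down,   # hubs in neither DEG list
    }


# Hypothetical placeholder genes (GENE_*), not results from the study.
up_genes = {"CCL18", "GENE_A", "GENE_B"}
down_genes = {"GENE_C", "GENE_D"}
hub_genes = {"CCL18", "GENE_A", "GENE_C", "GENE_E"}

regions = venn_regions(up_genes, down_genes, hub_genes)
for name, genes in regions.items():
    print(name, sorted(genes))
```

Plotting tools such as a Venn or UpSet diagram only display the cardinalities of these regions. The gene lists themselves come from intersections like the ones above.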
Pub Date: 2024-12-28; DOI: 10.1016/j.compbiolchem.2024.108331
Mohammad Rasouli, Fatemeh Safari, Raheleh Roudi, Navid Sobhani
The mesenchymal stem cell (MSC) secretome plays a pivotal role in shaping the tumor microenvironment, influencing both cancer progression and potential therapeutic outcomes. In this research, using the publicly available dataset GSE196312, we investigated the effect of the MSC secretome on breast cancer cell gene expression. Our results revealed differentially expressed genes, including upregulation of Phosphatidylinositol-3,4,5-Trisphosphate Dependent Rac Exchange Factor 1 (PREX1) and C-C Motif Chemokine Ligand 28 (CCL28), and downregulation of Collagen Type I Alpha 1 Chain (COL1A1), Collagen Type I Alpha 3 Chain (COL1A3), and Collagen Type III Alpha 1 Chain (COL3A1). These changes contribute to extracellular matrix (ECM) weakening and promote cell migration. Functional enrichment analyses also highlighted suppression of ECM remodeling pathways and activation of calcium ion binding and the Rap1 signaling pathway. We propose that Ca2+-mediated activation of Ras-related protein 1 (Rap1), acting through its downstream pathways such as matrix metalloprotease (MMP), PI3K/Akt, and MEK/ERK signaling, contributes to the promotion of cell migration. However, the co-culture model also reduced Fibronectin 1 (FN1) and Secreted Protein Acidic and Cysteine Rich (SPARC) gene expression in cancer cells, pointing to therapeutic aspects of the MSC secretome. These findings underscore the double-edged-sword nature of the MSC secretome in shaping cancer cell behavior. While our major results point to cancer progression through ECM remodeling, its therapeutic aspects should not be overlooked.
Title: Investigation of mesenchymal stem cell secretome on breast cancer gene expression: A bioinformatic approach to identify differentially expressed genes, functional networks, and potential therapeutic targets. Journal: Computational Biology and Chemistry, vol. 115, article 108331.
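Both this study and the gout study report functional (GO/KEGG) enrichment results. Neither abstract names the enrichment tool. The sketch below therefore assumes the most common formulation, a one-sided hypergeometric over-representation test followed by Benjamini-Hochberg correction across pathways. Suppose there are N background genes and K of them are annotated to a pathway. A query list of n genes contains k of those annotated genes. The p-value is then P(X >= k). The counts in the usage example are invented.

```python
from math import comb
from typing import List


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided over-representation p-value P(X >= k).

    N: background genes, K: genes annotated to the pathway,
    n: genes in the query (e.g. DEG) list, k: annotated genes in the list.
    Assumes a hypergeometric test; the studies' actual tools are not stated.
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(K, n)):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # Exact integer sum of the probability mass from k to the largest possible overlap.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Return BH-adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented counts: 20 background genes, 5 in the pathway, 4 DEGs, 3 of them annotated.
p = hypergeom_enrichment_p(20, 5, 4, 3)
print(round(p, 4))
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

Exact integer binomials avoid overflow for realistic gene counts. Real pipelines also depend heavily on the choice of background set N, which the abstracts do not report.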
Pub Date: 2024-12-28; DOI: 10.1016/j.compbiolchem.2024.108320
Pengli Lu, Jiajie Gao, Wenzhi Liu
The metabolic level within an organism typically reflects its health status. Studying the relationship between human diseases and metabolites helps medical professionals improve early disease diagnosis and risk prediction. However, traditional biological experimental methods often require substantial resources and manpower, and the performance of existing predictive models still leaves room for improvement. To tackle these issues, we propose DMNAG, a novel method based on the Neighborhood Aggregation Graph Transformer (NAGphormer) that predicts potential associations between diseases and metabolites, aiming to guide biological experiments and improve experimental efficiency. First, we calculated the Gaussian kernel similarity of diseases and the physicochemical similarity of metabolites, and combined them with known associations to construct a bipartite heterogeneous network. We then calculated the semantic similarity of diseases and the Mol2vec similarity of metabolites, using them as the similarity feature vectors for disease nodes and metabolite nodes, respectively. We also calculated positional features of the nodes and combined them with the similarity features to form the initial node features. Next, we fed the bipartite heterogeneous network and the initial node features into the Hop2Token module to capture multi-hop neighborhood information between nodes. Finally, we fed the multi-hop node features into a Transformer model for training and obtained edge prediction probabilities from the decoder. In five-fold cross-validation, our model achieved an AUC of 0.9801 and an AUPR of 0.9818. In case studies, most associations predicted by DMNAG have been validated, demonstrating the model's reliability and superiority.
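The DMNAG abstract does not define its disease "Gaussian kernel similarity." A common choice in association-prediction work is the Gaussian interaction profile (GIP) kernel. It is computed from the binary disease-metabolite association matrix. The sketch below assumes that formulation and uses NumPy. It is not necessarily the exact DMNAG code.

For disease profiles IP(d_i) (rows of the matrix), the kernel is K(d_i, d_j) = exp(-gamma * ||IP(d_i) - IP(d_j)||^2). The bandwidth is gamma = gamma' / mean_i ||IP(d_i)||^2.

```python
import numpy as np


def gip_kernel(assoc: np.ndarray, gamma_prime: float = 1.0) -> np.ndarray:
    """Gaussian interaction profile similarity between the rows of `assoc`.

    assoc: binary matrix, rows = diseases, columns = metabolites.
    Returns a symmetric (n_rows x n_rows) matrix with ones on the diagonal.
    Assumes the GIP formulation; DMNAG's exact variant may differ.
    """
    assoc = np.asarray(assoc, dtype=float)
    sq_norms = np.sum(assoc ** 2, axis=1)            # ||IP(d_i)||^2 for every row
    mean_sq_norm = sq_norms.mean()
    # Bandwidth normalised by the average profile norm; guard the all-zero case.
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else 0.0
    # Pairwise squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * assoc @ assoc.T
    sq_dists = np.maximum(sq_dists, 0.0)             # clip tiny negative round-off
    return np.exp(-gamma * sq_dists)


# Toy association matrix: 3 diseases x 3 metabolites (invented data).
A = np.array([[1, 0, 1], [1, 0, 0], [0, 1, 0]])
K = gip_kernel(A)
print(np.round(K, 3))
```

The vectorised distance formula avoids an explicit double loop over diseases. In practice such a kernel is usually computed only from training-fold associations, so that held-out links do not leak into the features.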
Title: DMNAG: Prediction of disease-metabolite associations based on Neighborhood Aggregation Graph Transformer. Journal: Computational Biology and Chemistry, vol. 115, article 108320.
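The Hop2Token step in DMNAG follows the NAGphormer idea. Each node becomes a short token sequence, where token k is that node's row of A_norm^k X. Here A_norm is a normalized adjacency matrix and X holds the initial node features. The sketch below illustrates that idea with dense NumPy arrays. Several details are our own assumptions: symmetric normalization with added self-loops, and the helper `bipartite_adjacency`, which builds the block matrix [[0, R], [R^T, 0]] from a disease-metabolite association matrix R. It is not the DMNAG implementation.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """D^-1/2 (A + I) D^-1/2; self-loops are an assumption of this sketch."""
    a_hat = adj + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)  # deg >= 1 thanks to the self-loops
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]


def hop2token(adj: np.ndarray, feats: np.ndarray, num_hops: int) -> np.ndarray:
    """Return an array of shape (n_nodes, num_hops + 1, n_features).

    Token k of node v is row v of A_norm^k X, i.e. features aggregated
    from the k-hop neighbourhood; token 0 is the node's own feature vector.
    """
    a_norm = normalized_adjacency(adj)
    tokens = [feats]
    current = feats
    for _ in range(num_hops):
        current = a_norm @ current  # propagate one more hop
        tokens.append(current)
    return np.stack(tokens, axis=1)


def bipartite_adjacency(assoc: np.ndarray) -> np.ndarray:
    """Hypothetical helper: block adjacency for diseases (rows) and metabolites (cols)."""
    n_d, n_m = assoc.shape
    top = np.hstack([np.zeros((n_d, n_d)), assoc])
    bottom = np.hstack([assoc.T, np.zeros((n_m, n_m))])
    return np.vstack([top, bottom])


# Toy network: 2 diseases, 3 metabolites, random 8-d node features (invented data).
R = np.array([[1, 0, 1], [0, 1, 0]], dtype=float)
A = bipartite_adjacency(R)
X = np.random.default_rng(0).normal(size=(5, 8))
seq = hop2token(A, X, num_hops=3)
print(seq.shape)  # (5, 4, 8)
```

Precomputing the hop tokens once means the Transformer then processes fixed-length sequences per node. This decoupling of propagation from training is what lets NAGphormer-style models scale to large graphs.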
Pub Date: 2024-12-27; DOI: 10.1016/j.compbiolchem.2024.108328
A T Vivek, Namrata Sahu, Garima Kalakoti, Shailesh Kumar
Eukaryotic transcriptomes are remarkably complex, encompassing not only protein-coding RNAs but also an expanding repertoire of noncoding RNAs (ncRNAs). In plants, ncRNA-ncRNA interactions (NNIs) have emerged as pivotal regulators of gene expression, orchestrating development and adaptive responses to stress. Despite their critical roles, the functional significance of NNIs remains poorly understood, largely owing to a lack of comprehensive resources. Here, we present ANNInter, a platform that integrates computational predictions with experimental datasets to systematically identify and analyze NNIs. The current version catalogs over 90,000 interactions spanning eight categories of interactions between small RNAs (sRNAs) and longer ncRNAs. Each interaction is annotated with its interaction type, identification method, and functional description. The integrated schema and visualization framework of ANNInter let users explore intricate interaction networks, providing system-wide insights into ncRNA-mediated regulation. These interaction data create opportunities to uncover the regulatory roles of NNIs in key biological processes such as growth regulation, stress adaptation, and cellular signaling. As an extensive, curated repository of computational and degradome-based interaction data, ANNInter provides a platform for studying ncRNA biology, elucidating the complex mechanisms of NNIs, and supporting the concept of competing endogenous RNAs (ceRNAs) in gene regulation. The platform is freely accessible at https://www.nipgr.ac.in/ANNInter/.
Title: ANNInter: A platform to explore ncRNA-ncRNA interactome of Arabidopsis thaliana. Journal: Computational Biology and Chemistry, vol. 115, article 108328.
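The ANNInter abstract does not describe the platform's schema in detail. The sketch below therefore uses a hypothetical record type, `Interaction`. Its fields hold the sRNA, the longer ncRNA partner, an interaction category label, and the evidence type. It illustrates two queries that an interaction catalog of this kind supports. The first lists an ncRNA's interaction partners. The second flags candidate ceRNA pairs, meaning longer ncRNAs that share at least a given number of sRNA regulators (a simple shared-regulator heuristic). All identifiers and category labels are invented. The code does not use or reflect the ANNInter website or any API.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Interaction:
    """Hypothetical schema, not ANNInter's actual data model."""
    srna: str      # small RNA identifier
    target: str    # longer ncRNA partner
    category: str  # interaction category label (invented here)
    evidence: str  # e.g. "predicted" or "degradome"


def build_network(records: List[Interaction]) -> Dict[str, Set[str]]:
    """Undirected adjacency: each RNA maps to the RNAs it interacts with."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for r in records:
        graph[r.srna].add(r.target)
        graph[r.target].add(r.srna)
    return graph


def cerna_candidates(records: List[Interaction],
                     min_shared: int = 2) -> List[Tuple[str, str, List[str]]]:
    """Target pairs sharing >= min_shared sRNA regulators (ceRNA-style heuristic)."""
    regulators: Dict[str, Set[str]] = defaultdict(set)
    for r in records:
        regulators[r.target].add(r.srna)
    pairs = []
    for t1, t2 in combinations(sorted(regulators), 2):
        shared = regulators[t1] & regulators[t2]
        if len(shared) >= min_shared:
            pairs.append((t1, t2, sorted(shared)))
    return pairs


# Invented example records.
records = [
    Interaction("sR_x", "nc_1", "sRNA-lncRNA", "predicted"),
    Interaction("sR_y", "nc_1", "sRNA-lncRNA", "degradome"),
    Interaction("sR_x", "nc_2", "sRNA-lncRNA", "predicted"),
    Interaction("sR_y", "nc_2", "sRNA-lncRNA", "predicted"),
    Interaction("sR_z", "nc_3", "sRNA-lncRNA", "degradome"),
]
net = build_network(records)
print(sorted(net["nc_1"]))        # ['sR_x', 'sR_y']
print(cerna_candidates(records))  # [('nc_1', 'nc_2', ['sR_x', 'sR_y'])]
```

Filtering `records` by `evidence` before calling either function is one way to restrict such queries to degradome-supported interactions.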