Pub Date : 2024-11-11DOI: 10.1109/TCBB.2024.3489614
Mir Tanveerul Hassan, Hilal Tayara, Kil To Chong
Due to their safety, high activity, and plentiful sources, antioxidant peptides, particularly those produced from food, are thought to be prospective competitors to synthetic antioxidants in the fight against free radical-mediated illnesses. The lengthy and laborious trial-and-error method for identifying antioxidative peptides (AOP) has raised interest in creating computational-based methods. There exist two state-of-the-art AOP predictors; however, the restriction on peptide sequence length makes them inviable. By overcoming the aforementioned problem, a novel predictor might be useful in the context of AOP prediction. The method has been trained, tested, and evaluated on two datasets: a balanced one and an unbalanced one. We used seven different descriptors and five machine-learning (ML) classifiers to construct 35 baseline models. Five ML classifiers were further trained to create five meta-models using the combined output of 35 baseline models. Finally, these five meta-models were aggregated together through ensemble learning to create a robust predictive model named iAnOxPep. On both datasets, our proposed model demonstrated good prediction performance when compared to baseline models and meta-models, demonstrating the superiority of our approach in the identification of AOPs. For the purpose of screening and identifying possible AOPs, we anticipate that the iAnOxPep method will be an invaluable tool.
{"title":"iAnOxPep: a machine learning model for the identification of anti-oxidative peptides using ensemble learning.","authors":"Mir Tanveerul Hassan, Hilal Tayara, Kil To Chong","doi":"10.1109/TCBB.2024.3489614","DOIUrl":"https://doi.org/10.1109/TCBB.2024.3489614","url":null,"abstract":"<p><p>Due to their safety, high activity, and plentiful sources, antioxidant peptides, particularly those produced from food, are thought to be prospective competitors to synthetic antioxidants in the fight against free radical-mediated illnesses. The lengthy and laborious trial-and-error method for identifying antioxidative peptides (AOP) has raised interest in creating computational-based methods. There exist two state-of-the-art AOP predictors; however, the restriction on peptide sequence length makes them inviable. By overcoming the aforementioned problem, a novel predictor might be useful in the context of AOP prediction. The method has been trained, tested, and evaluated on two datasets: a balanced one and an unbalanced one. We used seven different descriptors and five machine-learning (ML) classifiers to construct 35 baseline models. Five ML classifiers were further trained to create five meta-models using the combined output of 35 baseline models. Finally, these five meta-models were aggregated together through ensemble learning to create a robust predictive model named iAnOxPep. On both datasets, our proposed model demonstrated good prediction performance when compared to baseline models and meta-models, demonstrating the superiority of our approach in the identification of AOPs. For the purpose of screening and identifying possible AOPs, we anticipate that the iAnOxPep method will be an invaluable tool.</p>","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"PP ","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142619332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-11-07DOI: 10.1109/TCBB.2024.3493203
C Vishnuppriya, G Tamilpavai
Huntington Disease (HD) is a type of neurodegenerative disorder which causes problems like psychiatric disturbances, movement problem, weight loss and problem in sleep. It needs to be addressed in earlier stage of human life. Nowadays Deep Learning (DL) based system could help physicians provide second opinion in treating patient's disease. In this work, human Deoxyribo Nucleic Acid (DNA) sequence is analyzed using Deep Neural Network (DNN) algorithm to predict the HD disease. The main objective of this work is to identify whether the human DNA is affected by HD or not. Human DNA sequences are collected from National Center for Biotechnology Information (NCBI) and synthetic human DNA data are also constructed for process. Then numerical conversion of human DNA sequence data is done by Chaos Game Representation (CGR) method. After that, numerical values of DNA data are used for feature extraction. Mean, median, standard deviation, entropy, contrast, correlation, energy and homogeneity are extracted. Additionally, the following features such as counts of adenine, thymine, guanine and cytosine are extracted from the DNA sequence data itself. The extracted features are used as input to the DNN classifier and other machine learning based classifiers such as NN (Neural Network), Support Vector Machine (SVM), Random Forest (RF) and Classification Tree with Forward Pruning (CTWFP). Six performance measures are used such as Accuracy, Sensitivity, Specificity, Precision, F1 score and Mathew Correlation Co-efficient (MCC). The study concludes DNN, NN, SVM, RF achieve 100% accuracy and CTWFP achieves accuracy of 87%.
亨廷顿舞蹈症(Huntington Disease,HD)是一种神经退行性疾病,会导致精神障碍、运动障碍、体重减轻和睡眠障碍等问题。这种疾病需要在人类生命的早期阶段加以解决。如今,基于深度学习(DL)的系统可以帮助医生在治疗患者疾病时提供第二意见。在这项工作中,使用深度神经网络(DNN)算法对人类脱氧核糖核酸(DNA)序列进行分析,以预测人类乳腺疾病。这项工作的主要目的是确定人类 DNA 是否受 HD 影响。从美国国家生物技术信息中心(NCBI)收集了人类 DNA 序列,并构建了合成人类 DNA 数据。然后通过混沌博弈表示法(CGR)对人类 DNA 序列数据进行数值转换。之后,DNA 数据的数值被用于特征提取。提取出平均值、中位数、标准偏差、熵、对比度、相关性、能量和同质性。此外,还从 DNA 序列数据中提取了腺嘌呤、胸腺嘧啶、鸟嘌呤和胞嘧啶的计数等特征。提取的特征被用作 DNN 分类器和其他基于机器学习的分类器的输入,如 NN(神经网络)、支持向量机(SVM)、随机森林(RF)和前向剪枝分类树(CTWFP)。使用了六种性能指标,如准确度、灵敏度、特异度、精确度、F1 分数和马修相关系数 (MCC)。研究得出结论,DNN、NN、SVM、RF 的准确率达到 100%,CTWFP 的准确率达到 87%。
{"title":"Performance Comparison between Deep Neural Network and Machine Learning based Classifiers for Huntington Disease Prediction from Human DNA Sequence.","authors":"C Vishnuppriya, G Tamilpavai","doi":"10.1109/TCBB.2024.3493203","DOIUrl":"https://doi.org/10.1109/TCBB.2024.3493203","url":null,"abstract":"<p><p>Huntington Disease (HD) is a type of neurodegenerative disorder which causes problems like psychiatric disturbances, movement problem, weight loss and problem in sleep. It needs to be addressed in earlier stage of human life. Nowadays Deep Learning (DL) based system could help physicians provide second opinion in treating patient's disease. In this work, human Deoxyribo Nucleic Acid (DNA) sequence is analyzed using Deep Neural Network (DNN) algorithm to predict the HD disease. The main objective of this work is to identify whether the human DNA is affected by HD or not. Human DNA sequences are collected from National Center for Biotechnology Information (NCBI) and synthetic human DNA data are also constructed for process. Then numerical conversion of human DNA sequence data is done by Chaos Game Representation (CGR) method. After that, numerical values of DNA data are used for feature extraction. Mean, median, standard deviation, entropy, contrast, correlation, energy and homogeneity are extracted. Additionally, the following features such as counts of adenine, thymine, guanine and cytosine are extracted from the DNA sequence data itself. The extracted features are used as input to the DNN classifier and other machine learning based classifiers such as NN (Neural Network), Support Vector Machine (SVM), Random Forest (RF) and Classification Tree with Forward Pruning (CTWFP). Six performance measures are used such as Accuracy, Sensitivity, Specificity, Precision, F1 score and Mathew Correlation Co-efficient (MCC). The study concludes DNN, NN, SVM, RF achieve 100% accuracy and CTWFP achieves accuracy of 87%.</p>","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"PP ","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142604214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-11-07DOI: 10.1109/TCBB.2024.3493820
Vural Orhun, Jololian Leon, Pan Lurong
The analysis of protein-ligand binding sites plays a crucial role in the initial stages of drug discovery. Accurately predicting the ligand types that are likely to bind to protein-ligand binding sites enables more informed decision making in drug design. Our study, DeepLigType, determines protein-ligand binding sites using Fpocket and then predicts the ligand type of these pockets with the deep learning model, Convolutional Block Attention Module (CBAM) with ResNet. CBAM-ResNet has been trained to accurately predict five distinct ligand types. We classified protein-ligand binding sites into five different categories according to the type of response ligands cause when they bind to their target proteins, which are antagonist, agonist, activator, inhibitor, and others. We created a novel dataset, referred to as LigType5, from the widely recognized PDBbind and scPDB dataset for training and testing our model. While the literature mostly focuses on the specificity and characteristic analysis of protein binding sites by experimental (laboratory-based) methods, we propose a computational method with the DeepLigType architecture. DeepLigType demonstrated an accuracy of 74.30% and an AUC of 0.83 in ligand type prediction on a novel test dataset using the CBAM-ResNet deep learning model. For access to the code implementation of this research, please visit our GitHub repository at https://github.com/drorhunvural/DeepLigType.
{"title":"DeepLigType: Predicting Ligand Types of Protein-Ligand Binding Sites Using a Deep Learning Model.","authors":"Vural Orhun, Jololian Leon, Pan Lurong","doi":"10.1109/TCBB.2024.3493820","DOIUrl":"https://doi.org/10.1109/TCBB.2024.3493820","url":null,"abstract":"<p><p>The analysis of protein-ligand binding sites plays a crucial role in the initial stages of drug discovery. Accurately predicting the ligand types that are likely to bind to protein-ligand binding sites enables more informed decision making in drug design. Our study, DeepLigType, determines protein-ligand binding sites using Fpocket and then predicts the ligand type of these pockets with the deep learning model, Convolutional Block Attention Module (CBAM) with ResNet. CBAM-ResNet has been trained to accurately predict five distinct ligand types. We classified protein-ligand binding sites into five different categories according to the type of response ligands cause when they bind to their target proteins, which are antagonist, agonist, activator, inhibitor, and others. We created a novel dataset, referred to as LigType5, from the widely recognized PDBbind and scPDB dataset for training and testing our model. While the literature mostly focuses on the specificity and characteristic analysis of protein binding sites by experimental (laboratory-based) methods, we propose a computational method with the DeepLigType architecture. DeepLigType demonstrated an accuracy of 74.30% and an AUC of 0.83 in ligand type prediction on a novel test dataset using the CBAM-ResNet deep learning model. For access to the code implementation of this research, please visit our GitHub repository at https://github.com/drorhunvural/DeepLigType.</p>","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"PP ","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142604213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-11-06DOI: 10.1109/TCBB.2024.3492708
Flora Rajaei, Cristian Minoccheri, Emily Wittrup, Richard C Wilson, Brian D Athey, Gilbert S Omenn, Kayvan Najarian
Over the past few years, artificial intelligence (AI) has emerged as a transformative force in drug discovery and development (DDD), revolutionizing many aspects of the process. This survey provides a comprehensive review of recent advancements in AI applications within early drug discovery and post-market drug assessment. It addresses the identification and prioritization of new therapeutic targets, prediction of drug-target interaction (DTI), design of novel drug-like molecules, and assessment of the clinical efficacy of new medications. By integrating AI technologies, pharmaceutical companies can accelerate the discovery of new treatments, enhance the precision of drug development, and bring more effective therapies to market. This shift represents a significant move towards more efficient and cost-effective methodologies in the DDD landscape.
{"title":"AI-based Computational Methods in Early Drug Discovery and Post Market Drug Assessment: A Survey.","authors":"Flora Rajaei, Cristian Minoccheri, Emily Wittrup, Richard C Wilson, Brian D Athey, Gilbert S Omenn, Kayvan Najarian","doi":"10.1109/TCBB.2024.3492708","DOIUrl":"https://doi.org/10.1109/TCBB.2024.3492708","url":null,"abstract":"<p><p>Over the past few years, artificial intelligence (AI) has emerged as a transformative force in drug discovery and development (DDD), revolutionizing many aspects of the process. This survey provides a comprehensive review of recent advancements in AI applications within early drug discovery and post-market drug assessment. It addresses the identification and prioritization of new therapeutic targets, prediction of drug-target interaction (DTI), design of novel drug-like molecules, and assessment of the clinical efficacy of new medications. By integrating AI technologies, pharmaceutical companies can accelerate the discovery of new treatments, enhance the precision of drug development, and bring more effective therapies to market. This shift represents a significant move towards more efficient and cost-effective methodologies in the DDD landscape.</p>","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"PP ","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142590779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Single cell RNA sequencing (scRNA-seq) is a powerful tool to capture gene expression snapshots in individual cells. However, a low amount of RNA in the individual cells results in dropout events, which introduce huge zero counts in the single cell expression matrix. We have developed VAImpute, a variational graph autoencoder based imputation technique that learns the inherent distribution of a large network/graph constructed from the scRNA-seq data leveraging copula correlation ( Ccor) among cells/genes. The trained model is utilized to predict the dropouts events by computing the probability of all non-edges (cell-gene) in the network. We devise an algorithm to impute the missing expression values of the detected dropouts. The performance of the proposed model is assessed on both simulated and real scRNA-seq datasets, comparing it to established single-cell imputation methods. VAImpute yields significant improvements to detect dropouts, thereby achieving superior performance in cell clustering, detecting rare cells, and differential expression. All codes and datasets are given in the github link: https://github.com/sumantaray/VAImputeAvailability.
{"title":"Enhancing Single-Cell RNA-seq Data Completeness with a Graph Learning Framework.","authors":"Snehalika Lall, Sumanta Ray, Sanghamitra Bandyopadhyay","doi":"10.1109/TCBB.2024.3492384","DOIUrl":"https://doi.org/10.1109/TCBB.2024.3492384","url":null,"abstract":"<p><p>Single cell RNA sequencing (scRNA-seq) is a powerful tool to capture gene expression snapshots in individual cells. However, a low amount of RNA in the individual cells results in dropout events, which introduce huge zero counts in the single cell expression matrix. We have developed VAImpute, a variational graph autoencoder based imputation technique that learns the inherent distribution of a large network/graph constructed from the scRNA-seq data leveraging copula correlation ( Ccor) among cells/genes. The trained model is utilized to predict the dropouts events by computing the probability of all non-edges (cell-gene) in the network. We devise an algorithm to impute the missing expression values of the detected dropouts. The performance of the proposed model is assessed on both simulated and real scRNA-seq datasets, comparing it to established single-cell imputation methods. VAImpute yields significant improvements to detect dropouts, thereby achieving superior performance in cell clustering, detecting rare cells, and differential expression. All codes and datasets are given in the github link: https://github.com/sumantaray/VAImputeAvailability.</p>","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"PP ","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142590782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-10-30DOI: 10.1109/TCBB.2024.3485788
Qiao Ning, Yaomiao Zhao, Jun Gao, Chen Chen, Minghao Yin
MicroRNAs (miRNAs) play a significant role in cell differentiation, biological development as well as the occurrence and growth of diseases. Although many computational methods contribute to predicting the association between miRNAs and diseases, they do not fully explore the attribute information contained in associated edges between miRNAs and diseases. In this study, we propose a new method, Hierarchical Hypergraph learning in Association-Weighted heterogeneous network for MiRNA-Disease association identification (HHAWMD). HHAWMD first adaptively fuses multi-view similarities based on channel attention and distinguishes the relevance of different associated relationships according to changes in expression levels of disease-related miRNAs, miRNA similarity information, and disease similarity information. Then, HHAWMD assigns edge weights and attribute features according to the association level to construct an association-weighted heterogeneous graph. Next, HHAWMD extracts the subgraph of the miRNA-disease node pair from the heterogeneous graph and builds the hyperedge (a kind of virtual edge) between the node pair to generate the hypergraph. Finally, HHAWMD proposes a hierarchical hypergraph learning approach, including node-aware attention and hyperedge-aware attention, which aggregates the abundant semantic information contained in deep and shallow neighborhoods to the hyperedge in the hypergraph. Our experiment results suggest that HHAWMD has better performance and can be used as a powerful tool for miRNA-disease association identification. The source code and data of HHAWMD are available at https://github.com/ningq669/HHAWMD/.
{"title":"Hierarchical hypergraph learning in association-weighted heterogeneous network for miRNA-disease association identification.","authors":"Qiao Ning, Yaomiao Zhao, Jun Gao, Chen Chen, Minghao Yin","doi":"10.1109/TCBB.2024.3485788","DOIUrl":"https://doi.org/10.1109/TCBB.2024.3485788","url":null,"abstract":"<p><p>MicroRNAs (miRNAs) play a significant role in cell differentiation, biological development as well as the occurrence and growth of diseases. Although many computational methods contribute to predicting the association between miRNAs and diseases, they do not fully explore the attribute information contained in associated edges between miRNAs and diseases. In this study, we propose a new method, Hierarchical Hypergraph learning in Association-Weighted heterogeneous network for MiRNA-Disease association identification (HHAWMD). HHAWMD first adaptively fuses multi-view similarities based on channel attention and distinguishes the relevance of different associated relationships according to changes in expression levels of disease-related miRNAs, miRNA similarity information, and disease similarity information. Then, HHAWMD assigns edge weights and attribute features according to the association level to construct an association-weighted heterogeneous graph. Next, HHAWMD extracts the subgraph of the miRNA-disease node pair from the heterogeneous graph and builds the hyperedge (a kind of virtual edge) between the node pair to generate the hypergraph. Finally, HHAWMD proposes a hierarchical hypergraph learning approach, including node-aware attention and hyperedge-aware attention, which aggregates the abundant semantic information contained in deep and shallow neighborhoods to the hyperedge in the hypergraph. Our experiment results suggest that HHAWMD has better performance and can be used as a powerful tool for miRNA-disease association identification. The source code and data of HHAWMD are available at https://github.com/ningq669/HHAWMD/.</p>","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"PP ","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142545203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-10-30DOI: 10.1109/TCBB.2024.3488059
Jia Wang, Zhenjing Yu, Jianqiang Li
The escalation of antibiotic resistance underscores the need for innovative approaches to combat bacterial infections. Phage therapy has emerged as a promising solution, wherein host determination plays an important role. Phage lysins, characterized by their specificity in targeting and cleaving corresponding host bacteria, serve as key players in this paradigm. In this study, we present a novel approach by leveraging genes of phage-encoded lytic enzymes for host prediction, culminating in the development of LHPre. Initially, gene fragments of phage-encoded lytic enzymes and their respective hosts were collected from the database. Secondly, DNA sequences were encoded using the Frequency Chaos Game Representation (FCGR) method, and pseudo samples were generated employing the Variational Autoencoder (VAE) model to address class imbalance. Finally, a prediction model was constructed using the Vision Transformer(Vit) model. Five-fold cross-validation results demonstrated that LHPre surpassed other state-of-the-art phage host prediction methods, achieving accuracies of 85.04%, 90.01%, and 93.39% at the species, genus, and family levels, respectively.
{"title":"LHPre: Phage Host Prediction with VAE-based Class Imbalance Correction and Lyase Sequence Embedding.","authors":"Jia Wang, Zhenjing Yu, Jianqiang Li","doi":"10.1109/TCBB.2024.3488059","DOIUrl":"10.1109/TCBB.2024.3488059","url":null,"abstract":"<p><p>The escalation of antibiotic resistance underscores the need for innovative approaches to combat bacterial infections. Phage therapy has emerged as a promising solution, wherein host determination plays an important role. Phage lysins, characterized by their specificity in targeting and cleaving corresponding host bacteria, serve as key players in this paradigm. In this study, we present a novel approach by leveraging genes of phage-encoded lytic enzymes for host prediction, culminating in the development of LHPre. Initially, gene fragments of phage-encoded lytic enzymes and their respective hosts were collected from the database. Secondly, DNA sequences were encoded using the Frequency Chaos Game Representation (FCGR) method, and pseudo samples were generated employing the Variational Autoencoder (VAE) model to address class imbalance. Finally, a prediction model was constructed using the Vision Transformer(Vit) model. Five-fold cross-validation results demonstrated that LHPre surpassed other state-of-the-art phage host prediction methods, achieving accuracies of 85.04%, 90.01%, and 93.39% at the species, genus, and family levels, respectively.</p>","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"PP ","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142545204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-10-30DOI: 10.1109/TCBB.2024.3488281
Keliang Cen, Zheming Xing, Xuan Wang, Yadong Wang, Junyi Li
Investigating the associations between circRNA and diseases is vital for comprehending the underlying mechanisms of diseases and formulating effective therapies. Computational prediction methods often rely solely on known circRNA-disease data, indirectly incorporating other biomolecules' effects by computing circRNA and disease similarities based on these molecules. However, this approach is limited, as other biomolecules also play significant roles in circRNA-disease interactions. To address this, we construct a comprehensive heterogeneous network incorporating data on human circRNAs, diseases, and other biomolecule interactions to develop a novel computational model, circ2DGNN, which is built upon a heterogeneous graph neural network. circ2DGNN directly takes heterogeneous networks as inputs and obtains the embedded representation of each node for downstream link prediction through graph representation learning. circ2DGNN employs a Transformer-like architecture, which can compute heterogeneous attention score for each edge, and perform message propagation and aggregation, using a residual connection to enhance the representation vector. It uniquely applies the same parameter matrix only to identical meta-relationships, reflecting diverse parameter spaces for different relationship types. After fine-tuning hyperparameters via five-fold cross-validation, evaluation conducted on a test dataset shows circ2DGNN outperforms existing state-of-the-art(SOTA) methods.
{"title":"circ2DGNN: circRNA-disease Association Prediction via Transformer-based Graph Neural Network.","authors":"Keliang Cen, Zheming Xing, Xuan Wang, Yadong Wang, Junyi Li","doi":"10.1109/TCBB.2024.3488281","DOIUrl":"https://doi.org/10.1109/TCBB.2024.3488281","url":null,"abstract":"<p><p>Investigating the associations between circRNA and diseases is vital for comprehending the underlying mechanisms of diseases and formulating effective therapies. Computational prediction methods often rely solely on known circRNA-disease data, indirectly incorporating other biomolecules' effects by computing circRNA and disease similarities based on these molecules. However, this approach is limited, as other biomolecules also play significant roles in circRNA-disease interactions. To address this, we construct a comprehensive heterogeneous network incorporating data on human circRNAs, diseases, and other biomolecule interactions to develop a novel computational model, circ2DGNN, which is built upon a heterogeneous graph neural network. circ2DGNN directly takes heterogeneous networks as inputs and obtains the embedded representation of each node for downstream link prediction through graph representation learning. circ2DGNN employs a Transformer-like architecture, which can compute heterogeneous attention score for each edge, and perform message propagation and aggregation, using a residual connection to enhance the representation vector. It uniquely applies the same parameter matrix only to identical meta-relationships, reflecting diverse parameter spaces for different relationship types. After fine-tuning hyperparameters via five-fold cross-validation, evaluation conducted on a test dataset shows circ2DGNN outperforms existing state-of-the-art(SOTA) methods.</p>","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"PP ","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142545200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-10-29DOI: 10.1109/TCBB.2024.3487434
Haosheng Zhou, Wei Lin, Sergio R Labra, Stuart A Lipton, Jeremy A Elman, Nicholas J Schork, Aaditya V Rangan
Many traditional methods for analyzing gene-gene relationships focus on positive and negative correlations, both of which are a kind of 'symmetric' relationship. Biclustering is one such technique that typically searches for subsets of genes exhibiting correlated expression among a subset of samples. However, genes can also exhibit 'asymmetric' relationships, such as 'if-then' relationships used in boolean circuits. In this paper we develop a very general method that can be used to detect biclusters within gene-expression data that involve subsets of genes which are enriched for these 'boolean-asymmetric' relationships (BARs). These BAR-biclusters can correspond to heterogeneity that is driven by asymmetric gene-gene interactions, e.g., reflecting regulatory effects of one gene on another, rather than more standard symmetric interactions. Unlike typical approaches that search for BARs across the entire population, BAR-biclusters can detect asymmetric interactions that only occur among a subset of samples. We apply our method to a single-cell RNA-sequencing data-set, demonstrating that the statistically-significant BARbiclusters indeed contain additional information not present within the more traditional 'boolean-symmetric'-biclusters. For example, the BAR-biclusters involve different subsets of cells, and highlight different gene-pathways within the data-set. Moreover, by combining the boolean-asymmetric- and boolean-symmetricsignals, one can build linear classifiers which outperform those built using only traditional boolean-symmetric signals.
许多分析基因-基因关系的传统方法都侧重于正相关和负相关,这两种关系都是一种 "对称 "关系。双聚类就是这样一种技术,它通常在样本子集中搜索表现出相关表达的基因子集。然而,基因也可以表现出 "非对称 "关系,例如布尔电路中使用的 "如果-那么 "关系。在本文中,我们开发了一种非常通用的方法,可用于检测基因表达数据中的双簇,这些数据涉及富集了这些 "布尔-非对称 "关系(BAR)的基因子集。这些 "布尔-非对称 "关系双集群可能对应于由非对称基因-基因相互作用驱动的异质性,例如,反映一个基因对另一个基因的调控作用,而不是更标准的对称相互作用。与在整个群体中搜索 BAR 的典型方法不同,BAR-双簇可以检测到只发生在部分样本中的非对称相互作用。我们将这一方法应用于单细胞 RNA 序列数据集,结果表明,在统计意义上显著的 BAR 双簇确实包含了更传统的 "布尔-对称 "双簇所不具备的额外信息。例如,BAR 双簇涉及不同的细胞子集,并突出了数据集中不同的基因通路。此外,通过结合布尔-非对称信号和布尔-对称信号,我们可以建立线性分类器,其效果优于仅使用传统布尔-对称信号建立的分类器。
{"title":"Detecting Boolean Asymmetric Relationships with a Loop Counting Technique and its Implications for Analyzing Heterogeneity within Gene Expression Datasets.","authors":"Haosheng Zhou, Wei Lin, Sergio R Labra, Stuart A Lipton, Jeremy A Elman, Nicholas J Schork, Aaditya V Rangan","doi":"10.1109/TCBB.2024.3487434","DOIUrl":"https://doi.org/10.1109/TCBB.2024.3487434","url":null,"abstract":"<p><p>Many traditional methods for analyzing gene-gene relationships focus on positive and negative correlations, both of which are a kind of 'symmetric' relationship. Biclustering is one such technique that typically searches for subsets of genes exhibiting correlated expression among a subset of samples. However, genes can also exhibit 'asymmetric' relationships, such as 'if-then' relationships used in boolean circuits. In this paper we develop a very general method that can be used to detect biclusters within gene-expression data that involve subsets of genes which are enriched for these 'boolean-asymmetric' relationships (BARs). These BAR-biclusters can correspond to heterogeneity that is driven by asymmetric gene-gene interactions, e.g., reflecting regulatory effects of one gene on another, rather than more standard symmetric interactions. Unlike typical approaches that search for BARs across the entire population, BAR-biclusters can detect asymmetric interactions that only occur among a subset of samples. We apply our method to a single-cell RNA-sequencing data-set, demonstrating that the statistically-significant BARbiclusters indeed contain additional information not present within the more traditional 'boolean-symmetric'-biclusters. For example, the BAR-biclusters involve different subsets of cells, and highlight different gene-pathways within the data-set. Moreover, by combining the boolean-asymmetric- and boolean-symmetricsignals, one can build linear classifiers which outperform those built using only traditional boolean-symmetric signals.</p>","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"PP ","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142545201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine learning techniques have become increasingly important in analyzing single-cell RNA and identifying cell types, providing valuable insights into cellular development and disease mechanisms. However, the presence of batch effects poses major challenges in scRNA-seq analysis due to data distribution variation across batches. Although several batch effect mitigation algorithms have been proposed, most of them focus only on the correlation of local structure embeddings, ignoring global distribution matching and discriminative feature representation in batch correction. In this paper, we proposed the discriminative domain adaption network (D2AN) for joint batch effects correction and type annotation with single-cell RNA-seq. Specifically, we first captured the global low-dimensional embeddings of samples from the source and target domains by adversarial domain adaption strategy. Second, a contrastive loss is developed to preliminarily align the source domain samples. Moreover, the semantic alignment of class centroids in the source and target domains is achieved for further local alignment. Finally, a self-paced learning mechanism based on inter-domain loss is adopted to gradually select samples with high similarity to the target domain for training, which is used to improve the robustness of the model. Experimental results demonstrated that the proposed method on multiple real datasets outperforms several state-of-the-art methods.
{"title":"Discriminative Domain Adaption Network for Simultaneously Removing Batch Effects and Annotating Cell Types in Single-Cell RNA-Seq.","authors":"Qi Zhu, Aizhen Li, Zheng Zhang, Chuhang Zheng, Junyong Zhao, Jin-Xing Liu, Daoqiang Zhang, Wei Shao","doi":"10.1109/TCBB.2024.3487574","DOIUrl":"https://doi.org/10.1109/TCBB.2024.3487574","url":null,"abstract":"<p><p>Machine learning techniques have become increasingly important in analyzing single-cell RNA and identifying cell types, providing valuable insights into cellular development and disease mechanisms. However, the presence of batch effects poses major challenges in scRNA-seq analysis due to data distribution variation across batches. Although several batch effect mitigation algorithms have been proposed, most of them focus only on the correlation of local structure embeddings, ignoring global distribution matching and discriminative feature representation in batch correction. In this paper, we proposed the discriminative domain adaption network (D2AN) for joint batch effects correction and type annotation with single-cell RNA-seq. Specifically, we first captured the global low-dimensional embeddings of samples from the source and target domains by adversarial domain adaption strategy. Second, a contrastive loss is developed to preliminarily align the source domain samples. Moreover, the semantic alignment of class centroids in the source and target domains is achieved for further local alignment. Finally, a self-paced learning mechanism based on inter-domain loss is adopted to gradually select samples with high similarity to the target domain for training, which is used to improve the robustness of the model. Experimental results demonstrated that the proposed method on multiple real datasets outperforms several state-of-the-art methods.</p>","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"PP ","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142545202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}