Pub Date: 2025-01-01 DOI: 10.1109/TCBB.2024.3489614
Mir Tanveerul Hassan, Hilal Tayara, Kil To Chong
Owing to their safety, high activity, and plentiful sources, antioxidant peptides, particularly those derived from food, are regarded as potential competitors to synthetic antioxidants in the fight against free radical-mediated illnesses. The lengthy and laborious trial-and-error process of identifying antioxidative peptides (AOPs) has raised interest in computational methods. Two state-of-the-art AOP predictors exist; however, their restriction on peptide sequence length limits their practical use. A novel predictor that overcomes this restriction could therefore be useful for AOP prediction. The method was trained, tested, and evaluated on two datasets: a balanced one and an unbalanced one. We used seven different descriptors and five machine-learning (ML) classifiers to construct 35 baseline models. Five ML classifiers were then trained on the combined output of the 35 baseline models to create five meta-models. Finally, these five meta-models were aggregated through ensemble learning to create a robust predictive model named iAnOxPep. On both datasets, the proposed model demonstrated good prediction performance compared with the baseline models and meta-models, demonstrating the advantage of our approach in identifying AOPs. We anticipate that iAnOxPep will be a valuable tool for screening and identifying possible AOPs.
iAnOxPep: A Machine Learning Model for the Identification of Anti-Oxidative Peptides Using Ensemble Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. PP, pp. 85-96.
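The three-level layout described for iAnOxPep (7 descriptors x 5 classifiers = 35 baseline models, then five meta-models, then one ensemble) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the descriptor and classifier names are hypothetical placeholders because the abstract does not list them, and averaging the meta-model probabilities (soft voting) with a 0.5 threshold is an assumption about how the final aggregation might work.

```python
from itertools import product
from statistics import mean

# Hypothetical placeholder names: the abstract gives only the counts (7 and 5).
DESCRIPTORS = [f"descriptor_{i}" for i in range(1, 8)]
CLASSIFIERS = [f"classifier_{j}" for j in range(1, 6)]


def baseline_grid():
    """Enumerate every (descriptor, classifier) pair, one per baseline model."""
    return list(product(DESCRIPTORS, CLASSIFIERS))


def ensemble_probability(meta_probs):
    """Average the positive-class probabilities of the meta-models.

    Soft voting is an assumption; the abstract only says "ensemble learning".
    """
    return mean(meta_probs)


def predict_label(meta_probs, threshold=0.5):
    """Return 1 (predicted AOP) if the averaged probability reaches the threshold."""
    return int(ensemble_probability(meta_probs) >= threshold)


if __name__ == "__main__":
    print(len(baseline_grid()))                      # number of baseline models
    print(predict_label([0.9, 0.8, 0.4, 0.7, 0.6]))  # five meta-model outputs
```

In a full pipeline, the 35 baseline outputs for each peptide would form the feature vector on which each meta-model is trained; the sketch covers only the bookkeeping and the final aggregation step.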
Pub Date: 2025-01-01 DOI: 10.1109/TCBB.2024.3493820
Orhun Vural, Leon Jololian, Lurong Pan
The analysis of protein-ligand binding sites plays a crucial role in the initial stages of drug discovery. Accurately predicting the ligand types that are likely to bind to these sites enables more informed decision-making in drug design. Our method, DeepLigType, identifies protein-ligand binding sites using Fpocket and then predicts the ligand type of each pocket with a deep learning model that combines the Convolutional Block Attention Module (CBAM) with ResNet. CBAM-ResNet was trained to predict five distinct ligand types. We classified protein-ligand binding sites into five categories according to the type of response a ligand causes when it binds to its target protein: antagonist, agonist, activator, inhibitor, and others. We created a novel dataset, referred to as LigType5, from the widely recognized PDBbind and scPDB datasets for training and testing our model. While the literature mostly focuses on the specificity and characteristic analysis of protein binding sites by experimental (laboratory-based) methods, we propose a computational method with the DeepLigType architecture. DeepLigType achieved an accuracy of 74.30% and an AUC of 0.83 in ligand type prediction on a novel test dataset using the CBAM-ResNet deep learning model.
DeepLigType: Predicting Ligand Types of Protein-Ligand Binding Sites Using a Deep Learning Model. IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. PP, pp. 116-123.
Pub Date: 2025-01-01 DOI: 10.1109/TCBB.2024.3488059
Jia Wang, Zhenjing Yu, Jianqiang Li
The escalation of antibiotic resistance underscores the need for innovative approaches to combat bacterial infections. Phage therapy has emerged as a promising solution, in which host determination plays an important role. Phage lysins, characterized by their specificity in targeting and cleaving corresponding host bacteria, serve as key players in this paradigm. In this study, we present a novel approach that leverages genes of phage-encoded lytic enzymes for host prediction, culminating in the development of LHPre. First, gene fragments of phage-encoded lytic enzymes and their respective hosts were collected from the database. Second, DNA sequences were encoded using the Frequency Chaos Game Representation (FCGR) method, and pseudo samples were generated with a Variational Autoencoder (VAE) model to address class imbalance. Finally, a prediction model was constructed using the Vision Transformer (ViT) model. Five-fold cross-validation results demonstrated that LHPre surpassed other state-of-the-art phage host prediction methods, achieving accuracies of 85.04%, 90.01%, and 93.39% at the species, genus, and family levels, respectively.
LHPre: Phage Host Prediction With VAE-Based Class Imbalance Correction and Lyase Sequence Embedding. IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. PP, pp. 73-84.
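The FCGR encoding mentioned in the LHPre abstract turns a DNA sequence into a 2^k x 2^k matrix of k-mer counts that an image model such as a Vision Transformer can consume. The sketch below is a generic illustration under stated assumptions, not the authors' code: the corner assignment (A, C, G, T) is one common convention, the function name `fcgr` is hypothetical, and k-mers containing ambiguous bases are simply skipped.

```python
# One common corner convention for the chaos game square (an assumption here).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(sequence, k):
    """Return a 2^k x 2^k grid of k-mer counts for a DNA sequence.

    Each k-mer maps to exactly one cell. The last base of the k-mer picks the
    coarsest quadrant, mirroring how the chaos game moves toward the corner of
    the most recent base.
    """
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    sequence = sequence.upper()
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if any(base not in CORNERS for base in kmer):
            continue  # skip k-mers with ambiguous bases such as N
        x = y = 0
        for base in reversed(kmer):
            bit_x, bit_y = CORNERS[base]
            x = (x << 1) | bit_x
            y = (y << 1) | bit_y
        grid[y][x] += 1
    return grid


if __name__ == "__main__":
    for row in fcgr("ACGTNAC", 2):
        print(row)
```

The counts are often divided by the total number of k-mers so that sequences of different lengths give comparable matrices; whether LHPre normalizes in this way is not stated in the abstract.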
Pub Date: 2025-01-01 DOI: 10.1109/TCBB.2024.3477313
Aditya Malusare, Vaneet Aggarwal
Recent advancements in generative models have established state-of-the-art benchmarks in the generation of molecules and novel drug candidates. Despite these successes, a significant gap persists between generative models and the utilization of extensive biomedical knowledge, often systematized within knowledge graphs, whose potential to inform and enhance generative processes has not been realized. In this paper, we present a novel approach that bridges this divide by developing a framework for knowledge-enhanced generative models called KARL. We develop a scalable methodology to extend the functionality of knowledge graphs while preserving semantic integrity, and incorporate this contextual information into a generative framework to guide a diffusion-based model. The integration of knowledge graph embeddings with our generative model furnishes a robust mechanism for producing novel drug candidates possessing specific characteristics while ensuring validity and synthesizability. KARL outperforms state-of-the-art generative models on both unconditional and targeted generation tasks.
Improving Molecule Generation and Drug Discovery With a Knowledge-Enhanced Generative Model. IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. PP, pp. 375-381.
Pub Date: 2025-01-01 DOI: 10.1109/TCBB.2024.3492384
Snehalika Lall, Sumanta Ray, Sanghamitra Bandyopadhyay
Single-cell RNA sequencing (scRNA-seq) is a powerful tool for capturing gene expression snapshots in individual cells. However, the low amount of RNA in individual cells results in dropout events, which introduce a large number of zero counts into the single-cell expression matrix. We have developed VAImpute, a variational graph autoencoder-based imputation technique that learns the inherent distribution of a large network/graph constructed from the scRNA-seq data by leveraging copula correlation ($Ccor$) among cells/genes. The trained model is used to predict dropout events by computing the probability of all non-edges (cell-gene) in the network. We devise an algorithm to impute the missing expression values of the detected dropouts. The performance of the proposed model is assessed on both simulated and real scRNA-seq datasets and compared with established single-cell imputation methods. VAImpute yields significant improvements in detecting dropouts, thereby achieving superior performance in cell clustering, rare cell detection, and differential expression analysis.
Enhancing Single-Cell RNA-Seq Data Completeness With a Graph Learning Framework. IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. PP, pp. 64-72.
Pub Date: 2024-12-11 DOI: 10.1109/TCBB.2024.3475108
Su Datt Lam;Wai Keat Yam;Yi-Ping Phoebe Chen
The four papers in this special section were presented at the 20th Asia Pacific Bioinformatics Conference (APBC), which was held in Malaysia on 26-28 April 2022.
Guest Editorial for the 20th Asia Pacific Bioinformatics Conference. IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 21, no. 6, pp. 1601-1603. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10790560
Pub Date: 2024-10-30 DOI: 10.1109/TCBB.2024.3485788
Qiao Ning;Yaomiao Zhao;Jun Gao;Chen Chen;Minghao Yin
MicroRNAs (miRNAs) play a significant role in cell differentiation, biological development, and the occurrence and growth of diseases. Although many computational methods contribute to predicting associations between miRNAs and diseases, they do not fully explore the attribute information contained in the edges that link miRNAs and diseases. In this study, we propose a new method, Hierarchical Hypergraph learning in Association-Weighted heterogeneous network for MiRNA-Disease association identification (HHAWMD). HHAWMD first adaptively fuses multi-view similarities based on channel attention and distinguishes the relevance of different association relationships according to changes in the expression levels of disease-related miRNAs, miRNA similarity information, and disease similarity information. Then, HHAWMD assigns edge weights and attribute features according to the association level to construct an association-weighted heterogeneous graph. Next, HHAWMD extracts the subgraph of each miRNA-disease node pair from the heterogeneous graph and builds a hyperedge (a kind of virtual edge) between the node pair to generate the hypergraph. Finally, HHAWMD applies a hierarchical hypergraph learning approach, including node-aware attention and hyperedge-aware attention, which aggregates the abundant semantic information contained in deep and shallow neighborhoods into the hyperedge in the hypergraph. Our experimental results suggest that HHAWMD has better performance and can be used as a powerful tool for miRNA-disease association identification.
Hierarchical Hypergraph Learning in Association-Weighted Heterogeneous Network for miRNA-Disease Association Identification. IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 21, no. 6, pp. 2531-2542.
Pub Date: 2024-10-30 DOI: 10.1109/TCBB.2024.3488281
Keliang Cen;Zheming Xing;Xuan Wang;Yadong Wang;Junyi Li
Investigating the associations between circRNAs and diseases is vital for comprehending the underlying mechanisms of diseases and formulating effective therapies. Computational prediction methods often rely solely on known circRNA-disease data and incorporate the effects of other biomolecules only indirectly, by computing circRNA and disease similarities based on these molecules. However, this approach is limited, as other biomolecules also play significant roles in circRNA-disease interactions. To address this, we construct a comprehensive heterogeneous network incorporating data on human circRNAs, diseases, and other biomolecule interactions to develop a novel computational model, circ2DGNN, which is built upon a heterogeneous graph neural network. circ2DGNN directly takes heterogeneous networks as inputs and obtains the embedded representation of each node for downstream link prediction through graph representation learning. circ2DGNN employs a Transformer-like architecture, which computes a heterogeneous attention score for each edge and performs message propagation and aggregation, using a residual connection to enhance the representation vector. It applies the same parameter matrix only to identical meta-relationships, reflecting diverse parameter spaces for different relationship types. After fine-tuning hyperparameters via five-fold cross-validation, evaluation on a test dataset shows that circ2DGNN outperforms existing state-of-the-art (SOTA) methods.
circ2DGNN: circRNA-Disease Association Prediction via Transformer-Based Graph Neural Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 21, no. 6, pp. 2556-2567.
Pub Date: 2024-10-29 DOI: 10.1109/TCBB.2024.3487574
Qi Zhu;Aizhen Li;Zheng Zhang;Chuhang Zheng;Junyong Zhao;Jin-Xing Liu;Daoqiang Zhang;Wei Shao
Machine learning techniques have become increasingly important in analyzing single-cell RNA and identifying cell types, providing valuable insights into cellular development and disease mechanisms. However, batch effects pose major challenges in scRNA-seq analysis because the data distribution varies across batches. Although several batch effect mitigation algorithms have been proposed, most of them focus only on the correlation of local structure embeddings, ignoring global distribution matching and discriminative feature representation in batch correction. In this paper, we propose the discriminative domain adaption network (D2AN) for joint batch effect correction and cell type annotation with single-cell RNA-seq. Specifically, we first capture the global low-dimensional embeddings of samples from the source and target domains with an adversarial domain adaption strategy. Second, a contrastive loss is developed to preliminarily align the source domain samples. Moreover, the class centroids in the source and target domains are semantically aligned for further local alignment. Finally, a self-paced learning mechanism based on inter-domain loss is adopted to gradually select samples with high similarity to the target domain for training, which improves the robustness of the model. Experimental results on multiple real datasets demonstrate that the proposed method outperforms several state-of-the-art methods.
Discriminative Domain Adaption Network for Simultaneously Removing Batch Effects and Annotating Cell Types in Single-Cell RNA-Seq. IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 21, no. 6, pp. 2543-2555.
Pub Date: 2024-10-25 DOI: 10.1109/TCBB.2024.3486742
Liye Jia;Liancheng Jiang;Junhong Yue;Fang Hao;Yongfei Wu;Xilin Liu
The stage prediction of kidney renal clear cell carcinoma (KIRC) is important for the diagnosis, personalized treatment, and prognosis of patients. Many prediction methods have been proposed, but most of them are based on unimodal gene data, and their accuracy is difficult to improve further. Therefore, we propose a novel multi-weighted dynamic cascade forest based on bilinear feature extraction (MLW-BFECF) for stage prediction of KIRC using multimodal gene data (RNA-seq, CNA, and methylation). The proposed model uses a dynamic cascade framework with shuffle layers to prevent early degradation of the model. In each cascade layer, a voting technique based on three gene selection algorithms is first employed to retain the gene features more relevant to KIRC and eliminate redundant information in the gene features. Then, two new bilinear models based on the gated attention mechanism are proposed to better extract new intra-modal and inter-modal gene features. Finally, based on the idea of bagging, a multi-weighted ensemble forest classifier module is proposed to extract and fuse probabilistic features of the three-modal gene data. A series of experiments demonstrates that the MLW-BFECF model based on the three-modal KIRC dataset achieves the highest prediction performance, with an accuracy of 88.9%.
MLW-BFECF: A Multi-Weighted Dynamic Cascade Forest Based on Bilinear Feature Extraction for Predicting the Stage of Kidney Renal Clear Cell Carcinoma on Multi-Modal Gene Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 21, no. 6, pp. 2568-2579.
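The voting step over three gene selection algorithms in MLW-BFECF can be illustrated with a simple majority vote. The abstract does not name the three selectors or the voting rule, so everything here is an assumption made for illustration: the function name `vote_select`, the gene labels, and the rule of keeping a gene when at least two of the three selectors choose it.

```python
from collections import Counter


def vote_select(selections, min_votes=2):
    """Keep the features chosen by at least `min_votes` of the selectors.

    `selections` holds one iterable of feature names per selection algorithm.
    The threshold of 2 out of 3 is an assumed majority rule.
    """
    votes = Counter()
    for selected in selections:
        votes.update(set(selected))  # each selector votes at most once per gene
    return sorted(gene for gene, count in votes.items() if count >= min_votes)


if __name__ == "__main__":
    # Hypothetical outputs of three selection algorithms on placeholder genes.
    selector_a = ["g1", "g2", "g3"]
    selector_b = ["g1", "g3", "g4"]
    selector_c = ["g1", "g5"]
    print(vote_select([selector_a, selector_b, selector_c]))
```

Raising `min_votes` to 3 keeps only the genes all selectors agree on, which trades coverage for stricter removal of redundant features.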