
IEEE/ACM Transactions on Computational Biology and Bioinformatics: Latest Articles

iAnOxPep: A Machine Learning Model for the Identification of Anti-Oxidative Peptides Using Ensemble Learning.
IF 3.4 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-01-01 DOI: 10.1109/TCBB.2024.3489614
Mir Tanveerul Hassan, Hilal Tayara, Kil To Chong

Due to their safety, high activity, and plentiful sources, antioxidant peptides, particularly those produced from food, are thought to be prospective competitors to synthetic antioxidants in the fight against free radical-mediated illnesses. The lengthy and laborious trial-and-error method for identifying antioxidative peptides (AOP) has raised interest in creating computational-based methods. There exist two state-of-the-art AOP predictors; however, the restriction on peptide sequence length makes them inviable. By overcoming the aforementioned problem, a novel predictor might be useful in the context of AOP prediction. The method has been trained, tested, and evaluated on two datasets: a balanced one and an unbalanced one. We used seven different descriptors and five machine-learning (ML) classifiers to construct 35 baseline models. Five ML classifiers were further trained to create five meta-models using the combined output of 35 baseline models. Finally, these five meta-models were aggregated together through ensemble learning to create a robust predictive model named iAnOxPep. On both datasets, our proposed model demonstrated good prediction performance when compared to baseline models and meta-models, demonstrating the superiority of our approach in the identification of AOPs. For the purpose of screening and identifying possible AOPs, we anticipate that the iAnOxPep method will be an invaluable tool.
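
The three-level layout described in this abstract (35 baseline models, five meta-models, one final ensemble) can be sketched roughly as follows. This is only an illustrative outline: the descriptor and classifier names are placeholders, the toy scoring functions stand in for trained models, and averaging the meta-model probabilities is an assumed aggregation rule. The abstract does not specify any of these details.

```python
from itertools import product
from statistics import mean

# Placeholder names: the abstract gives only the counts (7 descriptors, 5 classifiers).
DESCRIPTORS = [f"descriptor_{i}" for i in range(1, 8)]
CLASSIFIERS = [f"classifier_{j}" for j in range(1, 6)]


def baseline_models():
    """Each (descriptor, classifier) pair is one baseline model: 7 x 5 = 35."""
    return list(product(DESCRIPTORS, CLASSIFIERS))


def toy_probability(model, peptide):
    """Stand-in for a trained baseline model: a deterministic value in [0, 1]."""
    seed = sum(ord(ch) for ch in "".join(model) + peptide)
    return (seed % 101) / 100


def meta_features(peptide):
    """The combined output of the 35 baseline models feeds every meta-model."""
    return [toy_probability(m, peptide) for m in baseline_models()]


def meta_model_probability(classifier, features):
    """Stand-in meta-model: a classifier-specific blend of the 35 features."""
    weight = (CLASSIFIERS.index(classifier) + 1) / len(CLASSIFIERS)
    return weight * mean(features) + (1 - weight) * min(features)


def ensemble_probability(peptide):
    """Assumed aggregation: average the five meta-model probabilities."""
    features = meta_features(peptide)
    return mean(meta_model_probability(c, features) for c in CLASSIFIERS)
```

Calling `ensemble_probability("PEPTIDE")` returns a single score in [0, 1]. In a real pipeline each stand-in would be replaced by a fitted classifier, and the meta-features would be produced by cross-validation so that meta-models never see predictions made on their own training data.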

Volume PP, pages 85-96.
Citations: 0
DeepLigType: Predicting Ligand Types of Protein-Ligand Binding Sites Using a Deep Learning Model.
IF 3.4 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-01-01 DOI: 10.1109/TCBB.2024.3493820
Orhun Vural, Leon Jololian, Lurong Pan

The analysis of protein-ligand binding sites plays a crucial role in the initial stages of drug discovery. Accurately predicting the ligand types that are likely to bind to protein-ligand binding sites enables more informed decision making in drug design. Our study, DeepLigType, determines protein-ligand binding sites using Fpocket and then predicts the ligand type of these pockets with a deep learning model, the Convolutional Block Attention Module (CBAM) with ResNet. CBAM-ResNet has been trained to accurately predict five distinct ligand types. We classified protein-ligand binding sites into five categories according to the type of response ligands cause when they bind to their target proteins: antagonist, agonist, activator, inhibitor, and others. We created a novel dataset, referred to as LigType5, from the widely recognized PDBbind and scPDB datasets for training and testing our model. While the literature mostly focuses on the specificity and characteristic analysis of protein binding sites by experimental (laboratory-based) methods, we propose a computational method with the DeepLigType architecture. DeepLigType demonstrated an accuracy of 74.30% and an AUC of 0.83 in ligand type prediction on a novel test dataset using the CBAM-ResNet deep learning model.

The code implementation for this study is available in the GitHub repository https://github.com/drorhunvural/DeepLigType.
Volume PP, pages 116-123.
Citations: 0
LHPre: Phage Host Prediction With VAE-Based Class Imbalance Correction and Lyase Sequence Embedding.
IF 3.4 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-01-01 DOI: 10.1109/TCBB.2024.3488059
Jia Wang, Zhenjing Yu, Jianqiang Li

The escalation of antibiotic resistance underscores the need for innovative approaches to combat bacterial infections. Phage therapy has emerged as a promising solution, wherein host determination plays an important role. Phage lysins, characterized by their specificity in targeting and cleaving corresponding host bacteria, serve as key players in this paradigm. In this study, we present a novel approach by leveraging genes of phage-encoded lytic enzymes for host prediction, culminating in the development of LHPre. First, gene fragments of phage-encoded lytic enzymes and their respective hosts were collected from the database. Second, DNA sequences were encoded using the Frequency Chaos Game Representation (FCGR) method, and pseudo samples were generated with a Variational Autoencoder (VAE) model to address class imbalance. Finally, a prediction model was constructed using the Vision Transformer (ViT) model. Five-fold cross-validation results demonstrated that LHPre surpassed other state-of-the-art phage host prediction methods, achieving accuracies of 85.04%, 90.01%, and 93.39% at the species, genus, and family levels, respectively.
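
For readers unfamiliar with the FCGR encoding mentioned above, a minimal sketch is given here. It counts every k-mer of a DNA sequence and places the count in a 2^k x 2^k grid, which is the image-like input a vision model can consume. The corner assignment of the four bases, the choice of `k`, and the decision to skip k-mers containing ambiguous bases are assumptions made for illustration; the abstract does not state which conventions LHPre uses.

```python
# Assumed corner convention for the chaos game square (conventions vary).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(sequence, k):
    """Return a 2^k x 2^k grid of k-mer counts for a DNA sequence."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    sequence = sequence.upper()
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        # Skip k-mers with ambiguous symbols such as N (an assumption).
        if any(base not in CORNERS for base in kmer):
            continue
        x = y = 0
        # In the chaos game the most recent base selects the largest quadrant,
        # so the last base of the k-mer supplies the most significant bit.
        for base in reversed(kmer):
            bx, by = CORNERS[base]
            x = (x << 1) | bx
            y = (y << 1) | by
        grid[y][x] += 1
    return grid
```

With `k = 1` the grid simply holds the four base counts; larger `k` gives a finer image. Dividing each cell by the total count would turn the grid into frequencies, which is a common normalisation before feeding it to a model.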

Volume PP, pages 73-84.
Citations: 0
Improving Molecule Generation and Drug Discovery With a Knowledge-Enhanced Generative Model.
IF 3.4 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-01-01 DOI: 10.1109/TCBB.2024.3477313
Aditya Malusare, Vaneet Aggarwal

Recent advancements in generative models have established state-of-the-art benchmarks in the generation of molecules and novel drug candidates. Despite these successes, a significant gap persists between generative models and the utilization of extensive biomedical knowledge, often systematized within knowledge graphs, whose potential to inform and enhance generative processes has not been realized. In this paper, we present a novel approach that bridges this divide by developing a framework for knowledge-enhanced generative models called KARL. We develop a scalable methodology to extend the functionality of knowledge graphs while preserving semantic integrity, and incorporate this contextual information into a generative framework to guide a diffusion-based model. The integration of knowledge graph embeddings with our generative model furnishes a robust mechanism for producing novel drug candidates possessing specific characteristics while ensuring validity and synthesizability. KARL outperforms state-of-the-art generative models on both unconditional and targeted generation tasks.

Volume PP, pages 375-381.
Citations: 0
Enhancing Single-Cell RNA-Seq Data Completeness With a Graph Learning Framework.
IF 3.4 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-01-01 DOI: 10.1109/TCBB.2024.3492384
Snehalika Lall, Sumanta Ray, Sanghamitra Bandyopadhyay

Single cell RNA sequencing (scRNA-seq) is a powerful tool to capture gene expression snapshots in individual cells. However, a low amount of RNA in the individual cells results in dropout events, which introduce a large number of zero counts in the single cell expression matrix. We have developed VAImpute, a variational graph autoencoder based imputation technique that learns the inherent distribution of a large network/graph constructed from the scRNA-seq data leveraging copula correlation ($Ccor$) among cells/genes. The trained model is utilized to predict the dropout events by computing the probability of all non-edges (cell-gene) in the network. We devise an algorithm to impute the missing expression values of the detected dropouts. The performance of the proposed model is assessed on both simulated and real scRNA-seq datasets, comparing it to established single-cell imputation methods. VAImpute yields significant improvements in detecting dropouts, thereby achieving superior performance in cell clustering, detecting rare cells, and differential expression.
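
The idea of scoring cell-gene non-edges can be illustrated with a small sketch. It treats the expression matrix as a bipartite graph in which a non-zero count is an edge and a zero count is a non-edge, then flags non-edges whose predicted link probability is high as candidate dropouts. The `edge_probability` callable stands in for the trained graph autoencoder, and the threshold rule is an assumption; neither the function names nor the cutoff come from the paper.

```python
def split_edges(counts):
    """Split a cells x genes count matrix into edges (non-zero) and non-edges (zero)."""
    edges, non_edges = [], []
    for cell, row in enumerate(counts):
        for gene, value in enumerate(row):
            (edges if value > 0 else non_edges).append((cell, gene))
    return edges, non_edges


def flag_dropouts(non_edges, edge_probability, threshold=0.5):
    """Keep non-edges whose predicted link probability exceeds the threshold.

    edge_probability is a hypothetical callable mapping a (cell, gene) pair to a
    probability; in practice it would be backed by the trained model.
    """
    return [pair for pair in non_edges if edge_probability(pair) > threshold]
```

A zero that the model considers a likely edge is treated as a technical dropout to be imputed, while a zero with low link probability is left as a true absence of expression.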

All code and datasets are available on GitHub: https://github.com/sumantaray/VAImputeAvailability
Volume PP, pages 64-72.
Citations: 0
Guest Editorial for the 20th Asia Pacific Bioinformatics Conference
IF 3.6 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-12-11 DOI: 10.1109/TCBB.2024.3475108
Su Datt Lam;Wai Keat Yam;Yi-Ping Phoebe Chen
The four papers in this special section were presented at the 20th Asia Pacific Bioinformatics Conference (APBC), which was held in Malaysia 26-28 April 2022.
Volume 21, Issue 6, pages 1601-1603. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10790560
Citations: 0
Hierarchical Hypergraph Learning in Association-Weighted Heterogeneous Network for miRNA-Disease Association Identification
IF 3.6 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-10-30 DOI: 10.1109/TCBB.2024.3485788
Qiao Ning;Yaomiao Zhao;Jun Gao;Chen Chen;Minghao Yin
MicroRNAs (miRNAs) play a significant role in cell differentiation, biological development as well as the occurrence and growth of diseases. Although many computational methods contribute to predicting the association between miRNAs and diseases, they do not fully explore the attribute information contained in associated edges between miRNAs and diseases. In this study, we propose a new method, Hierarchical Hypergraph learning in Association-Weighted heterogeneous network for MiRNA-Disease association identification (HHAWMD). HHAWMD first adaptively fuses multi-view similarities based on channel attention and distinguishes the relevance of different associated relationships according to changes in expression levels of disease-related miRNAs, miRNA similarity information, and disease similarity information. Then, HHAWMD assigns edge weights and attribute features according to the association level to construct an association-weighted heterogeneous graph. Next, HHAWMD extracts the subgraph of the miRNA-disease node pair from the heterogeneous graph and builds the hyperedge (a kind of virtual edge) between the node pair to generate the hypergraph. Finally, HHAWMD proposes a hierarchical hypergraph learning approach, including node-aware attention and hyperedge-aware attention, which aggregates the abundant semantic information contained in deep and shallow neighborhoods to the hyperedge in the hypergraph. Our experiment results suggest that HHAWMD has better performance and can be used as a powerful tool for miRNA-disease association identification.
The source code and data of HHAWMD are available at https://github.com/ningq669/HHAWMD/.
Volume 21, Issue 6, pages 2531-2542.
Citations: 0
circ2DGNN: circRNA-Disease Association Prediction via Transformer-Based Graph Neural Network
IF 3.6 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-10-30 DOI: 10.1109/TCBB.2024.3488281
Keliang Cen;Zheming Xing;Xuan Wang;Yadong Wang;Junyi Li
Investigating the associations between circRNA and diseases is vital for comprehending the underlying mechanisms of diseases and formulating effective therapies. Computational prediction methods often rely solely on known circRNA-disease data, indirectly incorporating other biomolecules' effects by computing circRNA and disease similarities based on these molecules. However, this approach is limited, as other biomolecules also play significant roles in circRNA-disease interactions. To address this, we construct a comprehensive heterogeneous network incorporating data on human circRNAs, diseases, and other biomolecule interactions to develop a novel computational model, circ2DGNN, which is built upon a heterogeneous graph neural network. circ2DGNN directly takes heterogeneous networks as inputs and obtains the embedded representation of each node for downstream link prediction through graph representation learning. circ2DGNN employs a Transformer-like architecture, which can compute a heterogeneous attention score for each edge and perform message propagation and aggregation, using a residual connection to enhance the representation vector. It uniquely applies the same parameter matrix only to identical meta-relationships, reflecting diverse parameter spaces for different relationship types. After fine-tuning hyperparameters via five-fold cross-validation, evaluation conducted on a test dataset shows circ2DGNN outperforms existing state-of-the-art (SOTA) methods.
Volume 21, Issue 6, pages 2556-2567.
Citations: 0
Discriminative Domain Adaption Network for Simultaneously Removing Batch Effects and Annotating Cell Types in Single-Cell RNA-Seq
IF 3.6 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-10-29 DOI: 10.1109/TCBB.2024.3487574
Qi Zhu;Aizhen Li;Zheng Zhang;Chuhang Zheng;Junyong Zhao;Jin-Xing Liu;Daoqiang Zhang;Wei Shao
Machine learning techniques have become increasingly important in analyzing single-cell RNA and identifying cell types, providing valuable insights into cellular development and disease mechanisms. However, the presence of batch effects poses major challenges in scRNA-seq analysis due to data distribution variation across batches. Although several batch effect mitigation algorithms have been proposed, most of them focus only on the correlation of local structure embeddings, ignoring global distribution matching and discriminative feature representation in batch correction. In this paper, we propose the discriminative domain adaption network (D2AN) for joint batch effect correction and cell type annotation with single-cell RNA-seq. Specifically, we first capture the global low-dimensional embeddings of samples from the source and target domains using an adversarial domain adaption strategy. Second, a contrastive loss is developed to preliminarily align the source domain samples. Moreover, the semantic alignment of class centroids in the source and target domains is achieved for further local alignment. Finally, a self-paced learning mechanism based on inter-domain loss is adopted to gradually select samples with high similarity to the target domain for training, which improves the robustness of the model. Experimental results demonstrate that the proposed method outperforms several state-of-the-art methods on multiple real datasets.
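
The class-centroid alignment step mentioned above can be pictured with a small sketch. It computes one mean embedding per cell type in each domain and sums the squared distances between matching centroids. Using squared Euclidean distance, relying on pseudo-labels for the unlabeled target domain, and restricting the sum to classes present in both domains are all assumptions for illustration; the abstract does not give the exact loss.

```python
from collections import defaultdict


def class_centroids(embeddings, labels):
    """Return the mean embedding vector of each class."""
    sums = {}
    counts = defaultdict(int)
    for vector, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vector)
        sums[label] = [s + v for s, v in zip(sums[label], vector)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def centroid_alignment_loss(src_emb, src_labels, tgt_emb, tgt_pseudo_labels):
    """Sum of squared distances between same-class centroids of the two domains.

    tgt_pseudo_labels are assumed to come from the current classifier, since the
    target domain has no ground-truth annotations.
    """
    src = class_centroids(src_emb, src_labels)
    tgt = class_centroids(tgt_emb, tgt_pseudo_labels)
    shared = src.keys() & tgt.keys()
    return sum(
        sum((a - b) ** 2 for a, b in zip(src[c], tgt[c]))
        for c in shared
    )
```

Minimising such a term pulls cells of the same type from different batches toward a common location in the embedding space, which complements the global, label-agnostic alignment provided by the adversarial component.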
IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 21, no. 6, pp. 2543-2555.
Citations: 0
MLW-BFECF: A Multi-Weighted Dynamic Cascade Forest Based on Bilinear Feature Extraction for Predicting the Stage of Kidney Renal Clear Cell Carcinoma on Multi-Modal Gene Data
IF 3.6 Zone 3 Biology Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-10-25 DOI: 10.1109/TCBB.2024.3486742
Liye Jia, Liancheng Jiang, Junhong Yue, Fang Hao, Yongfei Wu, Xilin Liu
The stage prediction of kidney renal clear cell carcinoma (KIRC) is important for the diagnosis, personalized treatment, and prognosis of patients. Many prediction methods have been proposed, but most of them are based on unimodal gene data, and their accuracy is difficult to improve further. Therefore, we propose a novel multi-weighted dynamic cascade forest based on bilinear feature extraction (MLW-BFECF) for stage prediction of KIRC using multimodal gene data (RNA-seq, CNA, and methylation). The proposed model uses a dynamic cascade framework with shuffle layers to prevent early degradation of the model. In each cascade layer, a voting technique based on three gene selection algorithms is first employed to retain the gene features more relevant to KIRC and eliminate redundant information. Then, two new bilinear models based on the gated attention mechanism are proposed to better extract new intra-modal and inter-modal gene features. Finally, based on the idea of bagging, a multi-weighted ensemble forest classifier module is proposed to extract and fuse probabilistic features of the three-modal gene data. A series of experiments demonstrates that the MLW-BFECF model achieves the highest prediction performance on the three-modal KIRC dataset, with an accuracy of 88.9%.
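The voting step over three gene selection algorithms can be sketched as follows. The abstract does not specify the voting rule or the selectors, so the majority threshold (`min_votes=2`), the function name `vote_select`, and the placeholder gene identifiers are assumptions for illustration, not the authors' method.

```python
from collections import Counter


def vote_select(selections, min_votes=2):
    """Keep features chosen by at least `min_votes` of the selectors.

    `selections` is a list of iterables, one per feature selection
    algorithm, each holding the feature names that algorithm retained.
    """
    # set(sel) ensures each selector casts at most one vote per feature.
    votes = Counter(g for sel in selections for g in set(sel))
    return sorted(g for g, n in votes.items() if n >= min_votes)


# Placeholder gene identifiers, not real selection results.
selected = vote_select([
    ["g1", "g2", "g3"],
    ["g1", "g4", "g3"],
    ["g1", "g2", "g5"],
])
print(selected)  # ['g1', 'g2', 'g3']
```

Raising `min_votes` to 3 would keep only features on which all selectors agree, trading recall for a smaller, less redundant feature set.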
IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 21, no. 6, pp. 2568-2579.
Citations: 0