IEEE/ACM Transactions on Computational Biology and Bioinformatics: Latest Articles

Guest Editorial for the 20th Asia Pacific Bioinformatics Conference

IF 3.6 | Zone 3 (Biology) | Q2 BIOCHEMICAL RESEARCH METHODS

IEEE/ACM Transactions on Computational Biology and Bioinformatics

Pub Date : 2024-12-11 DOI: 10.1109/TCBB.2024.3475108

Su Datt Lam;Wai Keat Yam;Yi-Ping Phoebe Chen

The four papers in this special section were presented at the 20th Asia Pacific Bioinformatics Conference (APBC), held in Malaysia on 26-28 April 2022.

iAnOxPep: a machine learning model for the identification of anti-oxidative peptides using ensemble learning.

Pub Date : 2024-11-11 DOI: 10.1109/TCBB.2024.3489614

Mir Tanveerul Hassan, Hilal Tayara, Kil To Chong

Owing to their safety, high activity, and plentiful sources, antioxidant peptides, particularly those derived from food, are regarded as promising alternatives to synthetic antioxidants in the fight against free radical-mediated illnesses. The lengthy and laborious trial-and-error process of identifying antioxidative peptides (AOPs) has raised interest in computational methods. Two state-of-the-art AOP predictors exist; however, their restriction on peptide sequence length limits their practical use. A novel predictor that overcomes this restriction could therefore be useful for AOP prediction. The method was trained, tested, and evaluated on two datasets, one balanced and one unbalanced. We used seven different descriptors and five machine-learning (ML) classifiers to construct 35 baseline models. Five ML classifiers were then trained on the combined output of the 35 baseline models to create five meta-models. Finally, these five meta-models were aggregated through ensemble learning to create a robust predictive model named iAnOxPep. On both datasets, the proposed model showed good prediction performance compared with the baseline models and meta-models, demonstrating the advantage of our approach for identifying AOPs. We anticipate that iAnOxPep will be a valuable tool for screening and identifying potential AOPs.
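The three-level layout described in the abstract (35 baseline models, five meta-models, one final ensemble) can be illustrated with a minimal Python sketch. The descriptor and classifier names are placeholders, since the abstract gives only their counts, and averaging the meta-model probabilities is an assumed aggregation rule, not necessarily the one used by iAnOxPep.

```python
from itertools import product
from statistics import mean

# Placeholder names: the abstract states only the counts (7 and 5).
DESCRIPTORS = [f"descriptor_{i}" for i in range(1, 8)]
CLASSIFIERS = [f"classifier_{j}" for j in range(1, 6)]


def baseline_grid():
    """Every (descriptor, classifier) pair: 7 x 5 = 35 baseline models."""
    return list(product(DESCRIPTORS, CLASSIFIERS))


def stack_features(baseline_probs):
    """Order the 35 baseline outputs into one meta-feature vector."""
    return [baseline_probs[key] for key in baseline_grid()]


def ensemble_average(meta_probs, threshold=0.5):
    """Assumed aggregation rule: mean of the meta-model probabilities."""
    score = mean(meta_probs)
    return score, score >= threshold
```

In this sketch each meta-model would receive the vector returned by `stack_features`, and `ensemble_average` would combine the five meta-model outputs into a single score and label.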

Performance Comparison between Deep Neural Network and Machine Learning based Classifiers for Huntington Disease Prediction from Human DNA Sequence.

Pub Date : 2024-11-07 DOI: 10.1109/TCBB.2024.3493203

C Vishnuppriya, G Tamilpavai

Huntington Disease (HD) is a neurodegenerative disorder that causes psychiatric disturbances, movement problems, weight loss, and sleep problems. It needs to be addressed at an early stage of life. Deep Learning (DL) based systems can help physicians by providing a second opinion when treating a patient's disease. In this work, human deoxyribonucleic acid (DNA) sequences are analyzed using a Deep Neural Network (DNN) algorithm to predict HD. The main objective is to identify whether a human DNA sequence is affected by HD. Human DNA sequences are collected from the National Center for Biotechnology Information (NCBI), and synthetic human DNA data are also constructed for processing. The DNA sequence data are converted to numerical form with the Chaos Game Representation (CGR) method, and the resulting numerical values are used for feature extraction: mean, median, standard deviation, entropy, contrast, correlation, energy, and homogeneity. Additionally, the counts of adenine, thymine, guanine, and cytosine are extracted from the DNA sequence data itself. The extracted features are used as input to the DNN classifier and to other machine learning based classifiers: Neural Network (NN), Support Vector Machine (SVM), Random Forest (RF), and Classification Tree with Forward Pruning (CTWFP). Six performance measures are used: Accuracy, Sensitivity, Specificity, Precision, F1 score, and Matthews Correlation Coefficient (MCC). The study concludes that DNN, NN, SVM, and RF achieve 100% accuracy and CTWFP achieves an accuracy of 87%.
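The numerical conversion step can be illustrated with a minimal Python sketch of the textbook Chaos Game Representation together with the four base counts. The corner assignment (A, C, G, T on the unit square) and the choice to skip ambiguous symbols are assumptions of this sketch; the abstract does not state which conventions the authors used.

```python
from collections import Counter

# Assumed corner layout on the unit square (one common CGR convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Map a DNA string to CGR coordinates, one point per valid base."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the base's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def base_counts(seq):
    """Counts of adenine, thymine, guanine and cytosine in the sequence."""
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "ATGC"}
```

Summary statistics such as the mean or standard deviation could then be computed over the coordinates returned by `cgr_points`; how the authors derive the texture-style features (contrast, homogeneity, and so on) is not specified in the abstract.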

DeepLigType: Predicting Ligand Types of Protein-Ligand Binding Sites Using a Deep Learning Model.

Pub Date : 2024-11-07 DOI: 10.1109/TCBB.2024.3493820

Orhun Vural, Leon Jololian, Lurong Pan

The analysis of protein-ligand binding sites plays a crucial role in the initial stages of drug discovery. Accurately predicting the ligand types that are likely to bind to protein-ligand binding sites enables more informed decision making in drug design. Our study, DeepLigType, determines protein-ligand binding sites using Fpocket and then predicts the ligand type of these pockets with a deep learning model, the Convolutional Block Attention Module (CBAM) with ResNet. CBAM-ResNet has been trained to predict five distinct ligand types. We classified protein-ligand binding sites into five categories according to the type of response that ligands cause when they bind to their target proteins: antagonist, agonist, activator, inhibitor, and others. We created a novel dataset, referred to as LigType5, from the widely recognized PDBbind and scPDB datasets for training and testing our model. While the literature mostly focuses on the specificity and characteristic analysis of protein binding sites by experimental (laboratory-based) methods, we propose a computational method with the DeepLigType architecture. DeepLigType demonstrated an accuracy of 74.30% and an AUC of 0.83 in ligand type prediction on a novel test dataset using the CBAM-ResNet deep learning model. The code implementation of this research is available in our GitHub repository at https://github.com/drorhunvural/DeepLigType.

AI-based Computational Methods in Early Drug Discovery and Post Market Drug Assessment: A Survey.

Pub Date : 2024-11-06 DOI: 10.1109/TCBB.2024.3492708

Flora Rajaei, Cristian Minoccheri, Emily Wittrup, Richard C Wilson, Brian D Athey, Gilbert S Omenn, Kayvan Najarian

Over the past few years, artificial intelligence (AI) has emerged as a transformative force in drug discovery and development (DDD), revolutionizing many aspects of the process. This survey provides a comprehensive review of recent advancements in AI applications within early drug discovery and post-market drug assessment. It addresses the identification and prioritization of new therapeutic targets, prediction of drug-target interaction (DTI), design of novel drug-like molecules, and assessment of the clinical efficacy of new medications. By integrating AI technologies, pharmaceutical companies can accelerate the discovery of new treatments, enhance the precision of drug development, and bring more effective therapies to market. This shift represents a significant move towards more efficient and cost-effective methodologies in the DDD landscape.

Enhancing Single-Cell RNA-seq Data Completeness with a Graph Learning Framework.

Pub Date : 2024-11-06 DOI: 10.1109/TCBB.2024.3492384

Snehalika Lall, Sumanta Ray, Sanghamitra Bandyopadhyay

Single cell RNA sequencing (scRNA-seq) is a powerful tool to capture gene expression snapshots in individual cells. However, the low amount of RNA in individual cells results in dropout events, which introduce a large number of zero counts in the single cell expression matrix. We have developed VAImpute, a variational graph autoencoder based imputation technique that learns the inherent distribution of a large network/graph constructed from the scRNA-seq data, leveraging copula correlation (Ccor) among cells/genes. The trained model is used to predict dropout events by computing the probability of all non-edges (cell-gene) in the network. We devise an algorithm to impute the missing expression values of the detected dropouts. The performance of the proposed model is assessed on both simulated and real scRNA-seq datasets and compared with established single-cell imputation methods. VAImpute yields significant improvements in detecting dropouts, thereby achieving superior performance in cell clustering, rare cell detection, and differential expression. All code and datasets are available on GitHub: https://github.com/sumantaray/VAImputeAvailability.
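The idea of scoring cell-gene non-edges can be illustrated with a minimal Python sketch. It treats every zero entry of the expression matrix as a non-edge and keeps those whose predicted edge probability passes a cutoff. The `edge_prob` dictionary stands in for the output of a trained model, and the 0.5 threshold is an arbitrary assumption; neither is taken from the VAImpute code.

```python
def non_edges(expr):
    """Return (cell, gene) index pairs whose count is zero.

    expr is a list of rows, one row of gene counts per cell.
    """
    return [
        (cell, gene)
        for cell, row in enumerate(expr)
        for gene, count in enumerate(row)
        if count == 0
    ]


def call_dropouts(expr, edge_prob, threshold=0.5):
    """Keep non-edges that the (hypothetical) model says should be edges.

    edge_prob maps (cell, gene) -> predicted probability of an edge.
    """
    return [
        pair for pair in non_edges(expr)
        if edge_prob.get(pair, 0.0) >= threshold
    ]
```

Zeros with a low predicted probability are left alone in this sketch, on the reading that they reflect genuinely absent expression and not a dropout.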

Hierarchical Hypergraph Learning in Association-Weighted Heterogeneous Network for miRNA-Disease Association Identification

Pub Date : 2024-10-30 DOI: 10.1109/TCBB.2024.3485788

Qiao Ning;Yaomiao Zhao;Jun Gao;Chen Chen;Minghao Yin

MicroRNAs (miRNAs) play a significant role in cell differentiation, biological development as well as the occurrence and growth of diseases. Although many computational methods contribute to predicting the association between miRNAs and diseases, they do not fully explore the attribute information contained in associated edges between miRNAs and diseases. In this study, we propose a new method, Hierarchical Hypergraph learning in Association-Weighted heterogeneous network for MiRNA-Disease association identification (HHAWMD). HHAWMD first adaptively fuses multi-view similarities based on channel attention and distinguishes the relevance of different associated relationships according to changes in expression levels of disease-related miRNAs, miRNA similarity information, and disease similarity information. Then, HHAWMD assigns edge weights and attribute features according to the association level to construct an association-weighted heterogeneous graph. Next, HHAWMD extracts the subgraph of the miRNA-disease node pair from the heterogeneous graph and builds the hyperedge (a kind of virtual edge) between the node pair to generate the hypergraph. Finally, HHAWMD proposes a hierarchical hypergraph learning approach, including node-aware attention and hyperedge-aware attention, which aggregates the abundant semantic information contained in deep and shallow neighborhoods to the hyperedge in the hypergraph. Our experiment results suggest that HHAWMD has better performance and can be used as a powerful tool for miRNA-disease association identification.

The source code and data of HHAWMD are available at https://github.com/ningq669/HHAWMD/.

Volume 21, Issue 6, pp. 2531-2542

LHPre: Phage Host Prediction with VAE-based Class Imbalance Correction and Lyase Sequence Embedding.

Pub Date : 2024-10-30 DOI: 10.1109/TCBB.2024.3488059

Jia Wang, Zhenjing Yu, Jianqiang Li

The escalation of antibiotic resistance underscores the need for innovative approaches to combat bacterial infections. Phage therapy has emerged as a promising solution, in which host determination plays an important role. Phage lysins, characterized by their specificity in targeting and cleaving the corresponding host bacteria, are key players in this paradigm. In this study, we present a novel approach that leverages genes of phage-encoded lytic enzymes for host prediction, culminating in the development of LHPre. First, gene fragments of phage-encoded lytic enzymes and their respective hosts were collected from the database. Second, DNA sequences were encoded using the Frequency Chaos Game Representation (FCGR) method, and pseudo samples were generated with a Variational Autoencoder (VAE) model to address class imbalance. Finally, a prediction model was constructed using the Vision Transformer (ViT) model. Five-fold cross-validation results demonstrated that LHPre surpassed other state-of-the-art phage host prediction methods, achieving accuracies of 85.04%, 90.01%, and 93.39% at the species, genus, and family levels, respectively.
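The FCGR encoding step can be illustrated with a minimal Python sketch that counts k-mers into a 2^k by 2^k grid, which is the image-like input a vision model would consume. The corner-to-bit assignment, the grid orientation, and the default k are assumptions of this sketch and may differ from the conventions used in LHPre.

```python
# Assumed (column bit, row bit) per base; one common CGR corner layout.
BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq, k=2):
    """Count every k-mer of seq into the CGR grid cell it falls in."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if any(base not in BITS for base in kmer):
            continue  # skip k-mers containing ambiguous symbols
        # The last base of the k-mer selects the coarsest quadrant,
        # so it carries the most significant bit of each index.
        col = sum(BITS[base][0] << i for i, base in enumerate(kmer))
        row = sum(BITS[base][1] << i for i, base in enumerate(kmer))
        grid[row][col] += 1
    return grid
```

Dividing each cell by the total number of counted k-mers would turn the counts into frequencies; the raw counts are kept here so the grid is easy to inspect.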

circ2DGNN: circRNA-Disease Association Prediction via Transformer-Based Graph Neural Network

Pub Date : 2024-10-30 DOI: 10.1109/TCBB.2024.3488281

Keliang Cen;Zheming Xing;Xuan Wang;Yadong Wang;Junyi Li

Investigating the associations between circRNAs and diseases is vital for comprehending the underlying mechanisms of diseases and formulating effective therapies. Computational prediction methods often rely solely on known circRNA-disease data and incorporate the effects of other biomolecules only indirectly, by computing circRNA and disease similarities based on these molecules. This approach is limited, because other biomolecules also play significant roles in circRNA-disease interactions. To address this, we construct a comprehensive heterogeneous network incorporating data on human circRNAs, diseases, and other biomolecule interactions, and develop a novel computational model, circ2DGNN, built upon a heterogeneous graph neural network. circ2DGNN takes heterogeneous networks directly as input and obtains the embedded representation of each node for downstream link prediction through graph representation learning. circ2DGNN employs a Transformer-like architecture that computes a heterogeneous attention score for each edge and performs message propagation and aggregation, using a residual connection to enhance the representation vector. It applies the same parameter matrix only to identical meta-relationships, so that different relationship types have different parameter spaces. After hyperparameters were tuned via five-fold cross-validation, evaluation on a test dataset shows that circ2DGNN outperforms existing state-of-the-art (SOTA) methods.
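The combination of relation-specific parameters, edge attention, and a residual connection can be illustrated with a minimal pure-Python sketch of a single attention step. The function names, the scaled dot-product scoring, and the string keys for meta-relationships are illustrative assumptions; the sketch is not the circ2DGNN implementation.

```python
import math


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(target, neighbors, rel_weights):
    """One attention step over a node's neighbours.

    neighbors is a list of (meta_relation, source_vector) pairs and
    rel_weights maps each meta_relation to its own parameter matrix,
    so edges of the same type share parameters and other types do not.
    """
    scale = math.sqrt(len(target))
    scores = [
        sum(t * k for t, k in zip(target, matvec(rel_weights[rel], vec))) / scale
        for rel, vec in neighbors
    ]
    alphas = softmax(scores)
    message = [
        sum(a * vec[i] for a, (_, vec) in zip(alphas, neighbors))
        for i in range(len(target))
    ]
    # Residual connection: add the aggregated message to the input vector.
    return [t + m for t, m in zip(target, message)], alphas
```

A full model would also learn separate query, key, and value projections and stack several such layers; those parts are omitted to keep the relation-specific lookup visible.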

研究 circRNA 与疾病之间的关联对于理解疾病的内在机制和制定有效疗法至关重要。计算预测方法通常仅依赖于已知的 circRNA-疾病数据，通过计算基于这些分子的 circRNA 和疾病相似性，间接纳入其他生物分子的影响。然而，这种方法存在局限性，因为其他生物大分子在 circRNA 与疾病的相互作用中也发挥着重要作用。为了解决这个问题，我们构建了一个综合的异构网络，其中包含人类 circRNA、疾病和其他生物分子相互作用的数据，从而开发出一种新型计算模型 circ2DGNN，它建立在异构图神经网络的基础上。circ2DGNN直接将异构网络作为输入，通过图表示学习获得每个节点的嵌入表示，用于下游链接预测。circ2DGNN采用了类似变形器的架构，可以计算每条边的异构关注度得分，并进行信息传播和聚合，利用残差连接增强表示向量。它唯一适用于相同元关系的相同参数矩阵，反映了不同关系类型的不同参数空间。通过五倍交叉验证对超参数进行微调后，在测试数据集上进行的评估显示，circ2DGNN优于现有的最先进（SOTA）方法。

Keliang Cen; Zheming Xing; Xuan Wang; Yadong Wang; Junyi Li. circ2DGNN: circRNA-Disease Association Prediction via Transformer-Based Graph Neural Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 21, no. 6, pp. 2556-2567. Pub Date : 2024-10-30 DOI: 10.1109/TCBB.2024.3488281

Citations: 0

Detecting Boolean Asymmetric Relationships with a Loop Counting Technique and its Implications for Analyzing Heterogeneity within Gene Expression Datasets.

IF 3.6 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS

IEEE/ACM Transactions on Computational Biology and Bioinformatics

Pub Date : 2024-10-29 DOI: 10.1109/TCBB.2024.3487434

Haosheng Zhou, Wei Lin, Sergio R Labra, Stuart A Lipton, Jeremy A Elman, Nicholas J Schork, Aaditya V Rangan

Many traditional methods for analyzing gene-gene relationships focus on positive and negative correlations, both of which are a kind of 'symmetric' relationship. Biclustering is one such technique that typically searches for subsets of genes exhibiting correlated expression among a subset of samples. However, genes can also exhibit 'asymmetric' relationships, such as 'if-then' relationships used in boolean circuits. In this paper we develop a very general method that can be used to detect biclusters within gene-expression data that involve subsets of genes which are enriched for these 'boolean-asymmetric' relationships (BARs). These BAR-biclusters can correspond to heterogeneity that is driven by asymmetric gene-gene interactions, e.g., reflecting regulatory effects of one gene on another, rather than more standard symmetric interactions. Unlike typical approaches that search for BARs across the entire population, BAR-biclusters can detect asymmetric interactions that only occur among a subset of samples. We apply our method to a single-cell RNA-sequencing data-set, demonstrating that the statistically-significant BAR-biclusters indeed contain additional information not present within the more traditional 'boolean-symmetric'-biclusters. For example, the BAR-biclusters involve different subsets of cells, and highlight different gene-pathways within the data-set. Moreover, by combining the boolean-asymmetric and boolean-symmetric signals, one can build linear classifiers which outperform those built using only traditional boolean-symmetric signals.

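Many traditional methods for analyzing gene-gene relationships focus on positive and negative correlations, both of which are 'symmetric' relationships. Biclustering is one such technique: it typically searches for subsets of genes that show correlated expression within a subset of samples. Genes can also exhibit 'asymmetric' relationships, however, such as the 'if-then' relationships used in boolean circuits. In this paper we develop a very general method for detecting biclusters in gene-expression data that involve subsets of genes enriched for these 'boolean-asymmetric' relationships (BARs). Such BAR-biclusters can correspond to heterogeneity driven by asymmetric gene-gene interactions, for example the regulatory effect of one gene on another, rather than by the more standard symmetric interactions. Unlike typical approaches that search for BARs across the entire population, BAR-biclusters can detect asymmetric interactions that occur only among a subset of samples. We apply the method to a single-cell RNA-sequencing data-set and show that the statistically significant BAR-biclusters do contain additional information that is absent from the more traditional 'boolean-symmetric' biclusters. For example, the BAR-biclusters involve different subsets of cells and highlight different gene pathways within the data-set. Moreover, by combining the boolean-asymmetric and boolean-symmetric signals, one can build linear classifiers that outperform those built using only traditional boolean-symmetric signals.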
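To make the notion of an 'if-then' relationship concrete, here is a minimal sketch. It is not the loop-counting technique of the paper and it does not search for biclusters. It only binarizes expression values with an assumed per-gene cutoff and counts, for every ordered gene pair (a, b), the samples that violate the rule 'if a is on then b is on'. The function name `implication_violations`, the cutoff, and the toy data are hypothetical.

```python
import numpy as np

def implication_violations(expr, threshold):
    """Count samples where gene a is 'on' and gene b is 'off', for all ordered pairs.

    expr: (n_samples, n_genes) array of expression values.
    threshold: per-gene cutoff used to binarize (an assumption of this sketch).
    A count of zero at [a, b] is consistent with the rule 'if a then b'.
    """
    expr = np.asarray(expr, dtype=float)
    on = (expr > threshold).astype(int)  # 1 where the gene is 'on'
    off = 1 - on                         # 1 where the gene is 'off'
    # Entry [a, b] sums, over samples, on[:, a] * off[:, b]
    return on.T @ off

# Toy data: gene 0 is on only in samples where gene 1 is also on,
# but gene 1 is sometimes on while gene 0 is off.
expr = np.array([
    [5.0, 9.0],
    [0.0, 9.0],
    [0.0, 9.0],
    [0.0, 0.0],
    [5.0, 9.0],
    [0.0, 0.0],
])
violations = implication_violations(expr, threshold=np.array([1.0, 1.0]))
print(violations)
```

The resulting matrix is not symmetric: 'if gene 0 then gene 1' has no violations, while the reverse rule is violated in two samples. A correlation coefficient, being symmetric in its two arguments, cannot express that direction.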


Citations: 0