IEEE/ACM Transactions on Computational Biology and Bioinformatics: Latest Articles (Page 4)

LEC-Codec: Learning-Based Genome Data Compression

IF 3.6 · Zone 3 (Biology) · Q2 Biochemical Research Methods

IEEE/ACM Transactions on Computational Biology and Bioinformatics

Pub Date : 2024-10-03 DOI: 10.1109/TCBB.2024.3473899

Zhenhao Sun;Meng Wang;Shiqi Wang;Sam Kwong

In this paper, we propose a Learning-based gEnome Codec (LEC), which is designed for high efficiency and enhanced flexibility. The LEC integrates several advanced technologies, including Group of Bases (GoB) compression, multi-stride coding and bidirectional prediction, all of which are aimed at optimizing the balance between coding complexity and performance in lossless compression. The model applied in our proposed codec is data-driven, based on deep neural networks to infer probabilities for each symbol, enabling fully parallel encoding and decoding with configured complexity for diverse applications. Based upon a set of configurations on compression ratios and inference speed, experimental results show that the proposed method is very efficient in terms of compression performance and provides improved flexibility in real-world applications.
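
The link between per-symbol probabilities and lossless compression can be illustrated with a minimal sketch. An ideal entropy coder spends about -log2(p) bits on a symbol that the model predicted with probability p, so better predictions mean fewer bits. The adaptive order-k count model below is only a stand-in for the paper's neural network (an assumption for illustration); it does not implement GoB compression, multi-stride coding, or bidirectional prediction, and the function names are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def ideal_code_length_bits(seq: str, order: int = 2) -> float:
    """Bits an ideal entropy coder would spend on `seq` under an
    adaptive order-`order` count model (a stand-in for a learned predictor)."""
    # One Laplace-smoothed count table per context (the preceding `order` bases).
    counts = defaultdict(lambda: {base: 1 for base in ALPHABET})
    total_bits = 0.0
    for i, base in enumerate(seq):
        context = seq[max(0, i - order):i]
        table = counts[context]
        p = table[base] / sum(table.values())  # predicted probability of the true base
        total_bits += -math.log2(p)            # ideal code length for this symbol
        table[base] += 1                       # update after coding, as a decoder could
    return total_bits


def bits_per_base(seq: str, order: int = 2) -> float:
    """Average ideal code length; 2.0 is the cost of a plain 2-bit encoding."""
    return ideal_code_length_bits(seq, order) / len(seq)


if __name__ == "__main__":
    repetitive = "ACGT" * 50
    print(f"{bits_per_base(repetitive):.3f} bits/base (uncompressed baseline: 2.0)")
```

Because the model is updated only after each symbol is coded, a decoder can maintain the same counts and reproduce the same probabilities, which is what makes the scheme lossless.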

Citations: 0

Enhancing Spatial Domain Identification in Spatially Resolved Transcriptomics Using Graph Convolutional Networks With Adaptively Feature-Spatial Balance and Contrastive Learning

IF 3.6 · Zone 3 (Biology) · Q2 Biochemical Research Methods

IEEE/ACM Transactions on Computational Biology and Bioinformatics

Pub Date : 2024-09-27 DOI: 10.1109/TCBB.2024.3469164

Xuena Liang;Junliang Shang;Jin-Xing Liu;Chun-Hou Zheng;Juan Wang

Recent advances in spatial transcriptomics (ST) technologies have enabled the comprehensive measurement of gene expression profiles while preserving the spatial information of cells. Combining gene expression profiles and spatial information has been the most commonly used method to identify spatial functional domains and genes. However, most existing spatial domain deciphering methods focus on spatially neighboring structures and fail to balance the self-characteristics of spots against their spatial structure dependency. Therefore, we propose a novel model called SpaGCAC, which recognizes spatial domains with the help of an adaptive feature-spatial balanced graph convolutional network named AFSBGCN. The AFSBGCN can dynamically learn the relationship between spatial local topology structures and the self-characteristics of spots by adaptively increasing or decreasing the weight on the self-characteristics during message aggregation. Moreover, to better capture the local structures of spots, SpaGCAC exploits a local topology structure contrastive learning strategy. Meanwhile, SpaGCAC utilizes a probability distribution contrastive learning strategy to increase the similarity of probability distributions for points belonging to the same category. We validate the performance of SpaGCAC for spatial domain identification on four spatial transcriptomic datasets. In comparison with seven spatial domain recognition methods, SpaGCAC achieved the highest NMI median of 0.683 and the second highest ARI median of 0.559 on the multi-slice DLPFC dataset. SpaGCAC achieved the best results on all three other single-slice datasets. These results show that SpaGCAC outperforms most existing methods, providing enhanced insights into tissue heterogeneity.
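
For readers unfamiliar with the ARI figure quoted above, the following is a minimal sketch of the Adjusted Rand Index using the standard pair-counting formula. It is not the authors' evaluation code, and the toy label lists are invented for illustration. ARI is 1.0 for identical partitions (regardless of how clusters are named) and close to 0 for random agreement.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two flat clusterings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two items")

    # Pairs of items grouped together in both, in the reference, and in the prediction.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2        # upper bound on agreement
    if max_index == expected:                    # degenerate case, e.g. one cluster each
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    annotated = [0, 0, 1, 1, 2, 2]   # hypothetical manual domain labels
    predicted = [1, 1, 0, 0, 2, 2]   # same partition with different cluster names
    print(adjusted_rand_index(annotated, predicted))
```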

Vol. 21, No. 6, pp. 2406-2417

Citations: 0

A Protein-Context Enhanced Master Slave Framework for Zero-Shot Drug Target Interaction Prediction

IF 3.6 · Zone 3 (Biology) · Q2 Biochemical Research Methods

IEEE/ACM Transactions on Computational Biology and Bioinformatics

Pub Date : 2024-09-27 DOI: 10.1109/TCBB.2024.3468434

Yuyang Xu;Jingbo Zhou;Haochao Ying;Jintai Chen;Wei Chen;Danny Z. Chen;Jian Wu

Drug Target Interaction (DTI) prediction plays a crucial role in in-silico drug discovery, especially for deep learning (DL) models. Along this line, existing methods usually first extract features from drugs and target proteins, and then use drug-target pairs to train DL models. However, these DL-based methods essentially rely on similar structures and patterns defined by homologous proteins in a large amount of data. When few drug-target interactions are known for a newly discovered protein and its homologous proteins, prediction performance can drop notably. In this paper, we propose a novel Protein-Context enhanced Master/Slave Framework (PCMS) for zero-shot DTI prediction. This framework facilitates the efficient discovery of ligands for newly discovered target proteins, addressing the challenge of predicting interactions without prior data. Specifically, the PCMS framework consists of two main components: a Master Learner and a Slave Learner. The Master Learner first learns the target protein context information, and then adaptively generates the corresponding parameters for the Slave Learner. The Slave Learner then performs zero-shot DTI prediction in different protein contexts. Extensive experiments on two public datasets verify the effectiveness of PCMS against state-of-the-art methods across various metrics.

The code and processed data will be made public once the paper is accepted.

Vol. 21, No. 6, pp. 2359-2370

Citations: 0

Game-Theoretic Flux Balance Analysis Model for Predicting Stable Community Composition

IF 3.6 · Zone 3 (Biology) · Q2 Biochemical Research Methods

IEEE/ACM Transactions on Computational Biology and Bioinformatics

Pub Date : 2024-09-27 DOI: 10.1109/TCBB.2024.3470592

Garud Iyengar;Mitch Perry

Models for microbial interactions attempt to understand and predict the steady state network of inter-species relationships in a community, e.g. competition for shared metabolites, and cooperation through cross-feeding. Flux balance analysis (FBA) is an approach that was introduced to model the interaction of a particular microbial species with its environment. This approach has been extended to analyzing interactions in a community of microbes; however, these approaches have two important drawbacks: first, one has to numerically solve a differential equation to identify the steady state, and second, there are no methods available to analyze the stability of the steady state. We propose a game theory based community FBA model wherein species compete to maximize their individual growth rate, and the state of the community is given by the resulting Nash equilibrium. We develop a computationally efficient method for directly computing the steady state biomasses and fluxes without solving a differential equation. We also develop a method to determine the stability of a steady state to perturbations in the biomasses and to invasion by new species. We report the results of applying our proposed framework to a small community of four E. coli mutants that compete for externally supplied glucose, as well as cooperate since the mutants are auxotrophic for metabolites exported by other mutants, and a more realistic model for a gut microbiome consisting of nine species.

Vol. 21, No. 6, pp. 2394-2405

Citations: 0

Incremental RPN: Hierarchical Region Proposal Network for Apple Leaf Disease Detection in Natural Environments

IF 3.6 · Zone 3 (Biology) · Q2 Biochemical Research Methods

IEEE/ACM Transactions on Computational Biology and Bioinformatics

Pub Date : 2024-09-26 DOI: 10.1109/TCBB.2024.3469178

Haixi Zhang;Jiahui Yang;Chenyan Lv;Xing Wei;Haibin Han;Bin Liu

Apple leaf diseases can seriously affect apple production and quality, and accurately detecting them can improve the efficiency of disease monitoring. Owing to the complex natural growth environment, apple leaf lesions may be easily confused with background noise, leading to poor detection performance. In this study, a cascaded Incremental Region Proposal Network (Inc-RPN) is proposed to accurately detect apple leaf diseases in natural environments. The proposed Inc-RPN has a two-layer RPN architecture, where the precursor RPN is leveraged to generate diseased leaf proposals, and the successor RPN focuses on extracting target disease spots based on diseased leaf proposals. In the successor RPN, a low-level feature aggregation module is designed to fully utilize the bridged features and preserve the semantic information of the target disease spots. An incremental module is also leveraged to extract aggregated diseased leaf features and target disease spot features. Finally, a novel position anchor generator is designed to generate anchors based on diseased leaf proposals. The experimental results show that the proposed Inc-RPN performs very well on the FALD_CED and Apple Leaf Disease datasets, indicating that it can accurately perform apple leaf disease detection tasks.

Vol. 21, No. 6, pp. 2418-2431

Citations: 0

Vina-GPU 2.1: Towards Further Optimizing Docking Speed and Precision of AutoDock Vina and Its Derivatives

IF 3.6 · Zone 3 (Biology) · Q2 Biochemical Research Methods

IEEE/ACM Transactions on Computational Biology and Bioinformatics

Pub Date : 2024-09-25 DOI: 10.1109/TCBB.2024.3467127

Shidi Tang;Ji Ding;Xiangyu Zhu;Zheng Wang;Haitao Zhao;Jiansheng Wu

AutoDock Vina and its derivatives have established themselves as a prevailing pipeline for virtual screening in contemporary drug discovery. Our Vina-GPU method leverages the parallel computing power of GPUs to accelerate AutoDock Vina, and Vina-GPU 2.0 further enhances the speed of AutoDock Vina and its derivatives. Given the prevalence of large virtual screens in modern drug discovery, the improvement of speed and accuracy in virtual screening has become a longstanding challenge. In this study, we propose Vina-GPU 2.1, aimed at enhancing the docking speed and precision of AutoDock Vina and its derivatives through the integration of novel algorithms to facilitate improved docking and virtual screening outcomes. Building upon the foundations laid by Vina-GPU 2.0, we introduce a novel algorithm, namely Reduced Iteration and Low Complexity BFGS (RILC-BFGS), designed to expedite the most time-consuming operation. Additionally, we implement grid cache optimization to further enhance the docking speed. Furthermore, we employ optimal strategies to individually optimize the structures of ligands, receptors, and binding pockets, thereby enhancing the docking precision. To assess the performance of Vina-GPU 2.1, we conduct extensive virtual screening experiments on three prominent targets, utilizing two fundamental compound libraries and seven docking tools. Our results demonstrate that Vina-GPU 2.1 achieves an average 4.97-fold acceleration in docking speed and an average 342% improvement in EF1% compared to Vina-GPU 2.0.
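
The EF1% metric reported above is the enrichment factor at the top 1% of a ranked library: the hit rate among the best-scored 1% of compounds divided by the hit rate of the whole library. The sketch below shows one common way to compute it and is not taken from the Vina-GPU code base. The function name, the rounding of the cutoff, and the toy library are assumptions; it also assumes lower (more negative) docking scores rank first, as with Vina-style binding energies.

```python
import math


def enrichment_factor(scores, is_active, fraction=0.01):
    """Enrichment factor at the top `fraction` of a library ranked by docking score."""
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")

    # Size of the top slice; at least one compound (rounding choice is an assumption).
    n_top = max(1, math.ceil(round(n_total * fraction, 9)))

    # Lower score = better predicted binding, so sort ascending.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0])
    hits_top = sum(active for _, active in ranked[:n_top])

    return (hits_top / n_top) / (n_actives / n_total)


if __name__ == "__main__":
    # Toy library: 200 compounds, 10 actives, two of which receive the best scores.
    scores = [-12.0, -11.5] + [-5.0 + 0.01 * i for i in range(198)]
    labels = [True, True] + [True] * 8 + [False] * 190
    print(enrichment_factor(scores, labels, fraction=0.01))
```

An EF1% of 1.0 means the top slice is no richer in actives than random selection, and the maximum achievable value is bounded by the inverse of the library's overall active rate.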

The source code and tools of Vina-GPU 2.1 are freely available, with comprehensive instructions and examples.

Vol. 21, No. 6, pp. 2382-2393

Citations: 0

MetalPrognosis: A Biological Language Model-Based Approach for Disease-Associated Mutations in Metal-Binding Site Prediction

IF 3.6 · Zone 3 (Biology) · Q2 Biochemical Research Methods

IEEE/ACM Transactions on Computational Biology and Bioinformatics

Pub Date : 2024-09-25 DOI: 10.1109/TCBB.2024.3467093

Runchang Jia;Zhijie He;Cong Wang;Xudong Guo;Fuyi Li

Protein-metal ion interactions play a central role in the onset of numerous diseases. When amino acid changes lead to missense mutations in metal-binding sites, the disrupted interaction with metal ions can compromise protein function, potentially causing severe human ailments. Identifying these disease-associated mutation sites within metal-binding regions is paramount for understanding protein function and fostering innovative drug development. While some computational methods aim to tackle this challenge, they often fall short in accuracy, commonly due to manual feature extraction and the absence of structural data. We introduce MetalPrognosis, an innovative, alignment-free solution that predicts disease-associated mutations within metal-binding sites of metalloproteins with heightened precision. Rather than relying on manual feature extraction, MetalPrognosis employs sliding window sequences as input, extracting deep semantic insights from pre-trained protein language models. These insights are then incorporated into a convolutional neural network, facilitating the derivation of intricate features. Comparative evaluations show MetalPrognosis outperforms leading methodologies like MCCNN and M-Ionic across various metalloprotein test sets. Furthermore, an ablation study reiterates the effectiveness of our model architecture.
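
The sliding-window input mentioned above can be pictured with a small sketch: a fixed-width stretch of residues centred on the site of interest, padded where the window runs past either terminus. The window half-width, the padding character "X", and the function name are assumptions for illustration; the abstract does not specify them. The embedding by a pre-trained protein language model and the convolutional network are not shown.

```python
def sliding_window(sequence: str, site_index: int, half_width: int = 7,
                   pad_char: str = "X") -> str:
    """Return the window of 2 * half_width + 1 residues centred on `site_index`
    (0-based), padding with `pad_char` beyond the sequence ends."""
    if not 0 <= site_index < len(sequence):
        raise IndexError("site_index is outside the sequence")
    if half_width < 0:
        raise ValueError("half_width must be non-negative")

    # Pad both ends so every site yields a window of the same length.
    padded = pad_char * half_width + sequence + pad_char * half_width
    center = site_index + half_width
    return padded[center - half_width:center + half_width + 1]


if __name__ == "__main__":
    protein = "MKTAYIAKQR"  # hypothetical sequence fragment
    for site in (0, 4, 9):
        print(site, sliding_window(protein, site, half_width=2))
```

Each window would then be passed to the language model as a short sequence, so every candidate site yields an input of identical length.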

For public use, the datasets, source code, and trained models of MetalPrognosis are available at http://metalprognosis.unimelb-biotools.cloud.edu.au/.

Vol. 21, No. 6, pp. 2340-2348

Citations: 0

MISSH: Fast Hashing of Multiple Spaced Seeds

IF 3.6 · Zone 3 (Biology) · Q2 Biochemical Research Methods

IEEE/ACM Transactions on Computational Biology and Bioinformatics

Pub Date : 2024-09-25 DOI: 10.1109/TCBB.2024.3467368

Eleonora Mian;Enrico Petrucci;Cinzia Pizzi;Matteo Comin

Alignment-free analysis of sequences has revolutionized the high-throughput processing of sequencing data within numerous bioinformatics pipelines. Hashing $k$-mers represents a common function across various alignment-free applications, serving as a crucial tool for indexing, querying, and rapid similarity searching. More recently, spaced seeds, a specialized pattern that accommodates errors or mutations, have become a standard choice over traditional $k$-mers. Spaced seeds offer enhanced sensitivity in many applications when compared to $k$-mers. However, it's important to note that hashing spaced seeds significantly increases computational time. Furthermore, if multiple spaced seeds are employed, accuracy can be further improved, albeit at the expense of longer processing times. This paper addresses the challenge of efficiently hashing multiple spaced seeds. The proposed algorithms leverage the similarity of adjacent spaced seed hash values within an input sequence, allowing for the swift computation of subsequent hashes. Our experimental results, conducted across various tests, demonstrate a remarkable performance improvement over previously suggested algorithms, with potential speedups of up to 20 times. Additionally, we apply these efficient spaced seed hashing algorithms to a metagenomic application, specifically the classification of reads using Clark-S (Ounit and Lonardi, 2016). Our findings reveal a substantial speedup, effectively mitigating the slowdown caused by the utilization of multiple spaced seeds.
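
To make the notion of a spaced-seed hash concrete, here is a minimal sketch of the naive baseline that the abstract describes as slow. It is not the MISSH algorithm. A seed is written as a 0/1 mask in which 1 marks a position that contributes to the hash and 0 a position that is ignored; each contributing base is packed into two bits. The mask notation, the 2-bit encoding, and the function names are assumptions for illustration. The reuse of adjacent hash values that the paper proposes is not shown.

```python
ENCODING = {"A": 0, "C": 1, "G": 2, "T": 3}  # 2 bits per base


def spaced_seed_hashes(sequence: str, seed: str) -> list:
    """Naively hash every window of `sequence` under one spaced seed.

    `seed` is a mask such as "1101": '1' positions are hashed, '0' positions are skipped.
    """
    if not seed or set(seed) - {"0", "1"}:
        raise ValueError("seed must be a non-empty string of '0' and '1'")
    care_positions = [i for i, flag in enumerate(seed) if flag == "1"]

    hashes = []
    for start in range(len(sequence) - len(seed) + 1):
        value = 0
        for offset in care_positions:
            # Shift in two bits for each position the seed cares about.
            value = (value << 2) | ENCODING[sequence[start + offset]]
        hashes.append(value)
    return hashes


def multi_seed_hashes(sequence: str, seeds: list) -> dict:
    """Hash the sequence under several seeds independently (no reuse between seeds)."""
    return {seed: spaced_seed_hashes(sequence, seed) for seed in seeds}


if __name__ == "__main__":
    print(spaced_seed_hashes("ACGTAC", "1101"))
    # A mismatch at an ignored position leaves the hash unchanged.
    print(spaced_seed_hashes("ACGT", "1101") == spaced_seed_hashes("ACTT", "1101"))
```

This baseline recomputes every window from scratch, so its cost grows with the number of contributing positions times the number of windows times the number of seeds. A mask of all 1s reduces to ordinary $k$-mer hashing.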

Volume 21, Issue 6, Pages 2330-2339

Citations: 0

Reinforced Metapath Optimization in Heterogeneous Information Networks for Drug-Target Interaction Prediction

IF 3.6 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS

IEEE/ACM Transactions on Computational Biology and Bioinformatics

Pub Date : 2024-09-24 DOI: 10.1109/TCBB.2024.3467135

Ben Xu;Jianping Chen;Yunzhe Wang;Qiming Fu;You Lu

Graph neural networks offer an effective avenue for predicting drug-target interactions. In this domain, researchers have found that constructing heterogeneous information networks based on metapaths using diverse biological datasets enhances prediction performance. However, the performance of such methods is closely tied to the selection of metapaths and the compatibility between metapath subgraphs and graph neural networks. Most existing approaches still rely on fixed strategies for selecting metapaths and often fail to fully exploit node information along the metapaths, limiting the improvement in model performance. This paper introduces a novel method for predicting drug-target interactions by optimizing metapaths in heterogeneous information networks. On the one hand, the method formulates the metapath optimization problem as a Markov decision process, using the enhancement of downstream network performance as a reward signal. Through iterative training of a reinforcement learning agent, a high-quality set of metapaths is learned. On the other hand, to fully leverage node information along the metapaths, the paper constructs subgraphs based on nodes along the metapaths. Subgraphs of different depths are processed using different graph convolutional neural networks. The proposed method is validated using standard heterogeneous biological benchmark datasets. Experimental results on standard datasets show significant advantages over traditional methods.
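The Markov decision process framing in the abstract can be illustrated with a toy environment: the state is the set of metapaths chosen so far, an action adds one metapath, and the reward is the change in a downstream score. Everything below is a hypothetical sketch. The metapath names, the `UTILITY` values, the `toy_score` function, and the greedy rollout (a simple stand-in for the paper's reinforcement learning agent) are assumptions, not the authors' implementation.

```python
# Hypothetical usefulness of each candidate metapath (made-up integers).
UTILITY = {"D-T-D": 5, "D-D-T": 3, "D-S-D": 1}


def toy_score(metapaths):
    """Stand-in for downstream model performance: utility minus a size penalty."""
    return sum(UTILITY[m] for m in metapaths) - 2 * len(metapaths)


class MetapathEnv:
    """Toy MDP: the state is the set of chosen metapaths; an action adds one."""

    def __init__(self, candidates, score_fn):
        self.candidates = list(candidates)
        self.score_fn = score_fn
        self.state = frozenset()

    def actions(self):
        """Metapaths that have not been selected yet."""
        return [m for m in self.candidates if m not in self.state]

    def step(self, metapath):
        """Add a metapath; the reward is the improvement in the score."""
        before = self.score_fn(self.state)
        self.state = self.state | {metapath}
        return self.state, self.score_fn(self.state) - before


def greedy_rollout(env):
    """Greedy policy (not RL): keep adding the best metapath while it helps."""
    total = 0
    while env.actions():
        best = max(env.actions(), key=lambda m: env.score_fn(env.state | {m}))
        if env.score_fn(env.state | {best}) <= env.score_fn(env.state):
            break  # no remaining action yields a positive reward
        _, reward = env.step(best)
        total += reward
    return env.state, total


selected, total = greedy_rollout(MetapathEnv(UTILITY, toy_score))
```

In a real system the score would come from training and evaluating the downstream graph neural network, which is far more expensive than this toy function and is one reason a learned policy may be preferable to exhaustive search.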

Volume 21, Issue 6, Pages 2315-2329

Citations: 0

Identification of Cancer Driver Genes based on Dynamic Incentive Model

IF 3.6 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS

IEEE/ACM Transactions on Computational Biology and Bioinformatics

Pub Date : 2024-09-24 DOI: 10.1109/TCBB.2024.3467119

Zhipeng Hu;Gaoshi Li;Xinlong Luo;Wei Peng;Jiafei Liu;Xiaoshu Zhu;Jingli Wu

Cancer is a complex genomic mutation disease, and identifying cancer driver genes promotes the development of targeted drugs and personalized therapies. Current computational methods give little consideration to the relationships among features and the effect of noise in protein-protein interaction (PPI) data, resulting in a low recognition rate. In this paper, we propose a cancer driver gene identification method based on a dynamic incentive model (DIM). The method first constructs a hypergraph to reduce the impact of false-positive data in the PPI network. Then, the importance of genes in each hyperedge of the hypergraph is considered from three perspectives, and a network and functional score (NFS) is proposed. By analyzing the relationships among features, the dynamic incentive model is proposed to fuse the NFS, the differential expression score of mRNA, and the differential expression score of miRNA. DIM is compared with some classical methods on breast cancer, lung cancer, prostate cancer, and pan-cancer datasets. The results show that DIM has the best performance on statistical evaluation indicators, functional consistency, and the partial area under the ROC curve, and has good cross-cancer capability.
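The abstract does not say how DIM builds its hyperedges or computes the NFS, so the following is only a generic sketch of the underlying data structure: a hypergraph stored as named sets of genes, with each gene's hyperdegree (the number of hyperedges containing it) as one simple importance signal. The hyperedge contents and the function names `incidence` and `hyperdegree` are illustrative assumptions and make no biological claim.

```python
from collections import defaultdict

# Hypothetical hyperedges: each one groups several genes at once,
# unlike an ordinary PPI edge, which links exactly two proteins.
hyperedges = {
    "e1": {"TP53", "MDM2", "ATM"},
    "e2": {"TP53", "BRCA1"},
    "e3": {"BRCA1", "ATM", "TP53"},
}


def incidence(hyperedges):
    """Map each gene to the set of hyperedges that contain it."""
    gene_to_edges = defaultdict(set)
    for edge_id, genes in hyperedges.items():
        for gene in genes:
            gene_to_edges[gene].add(edge_id)
    return dict(gene_to_edges)


def hyperdegree(hyperedges):
    """Count, for each gene, how many hyperedges it belongs to."""
    return {gene: len(edges) for gene, edges in incidence(hyperedges).items()}


degrees = hyperdegree(hyperedges)
```

Grouping genes into hyperedges means a single spurious pairwise interaction carries less weight than it would in an ordinary graph, which is consistent with the abstract's stated goal of reducing the impact of false-positive PPI data.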

Volume 21, Issue 6, Pages 2371-2381

Citations: 0