
IEEE/ACM Transactions on Computational Biology and Bioinformatics: Latest Articles

Vina-GPU 2.1: Towards Further Optimizing Docking Speed and Precision of AutoDock Vina and Its Derivatives.
IF 3.6, Zone 3 (Biology), Q2 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2024-09-25, DOI: 10.1109/TCBB.2024.3467127
Shidi Tang, Ji Ding, Xiangyu Zhu, Zheng Wang, Haitao Zhao, Jiansheng Wu

AutoDock Vina and its derivatives have become a prevailing pipeline for virtual screening in contemporary drug discovery. Our Vina-GPU method leverages the parallel computing power of GPUs to accelerate AutoDock Vina, and Vina-GPU 2.0 further enhances the speed of AutoDock Vina and its derivatives. Given the prevalence of large virtual screens in modern drug discovery, improving the speed and accuracy of virtual screening remains a longstanding challenge. In this study, we propose Vina-GPU 2.1, which integrates novel algorithms to enhance the docking speed and precision of AutoDock Vina and its derivatives and thereby improve docking and virtual screening outcomes. Building on Vina-GPU 2.0, we introduce a novel algorithm, Reduced Iteration and Low Complexity BFGS (RILC-BFGS), designed to expedite the most time-consuming operation. Additionally, we implement grid cache optimization to further enhance the docking speed. Furthermore, we employ optimal strategies to individually optimize the structures of ligands, receptors, and binding pockets, thereby enhancing the docking precision. To assess the performance of Vina-GPU 2.1, we conduct extensive virtual screening experiments on three prominent targets, using two fundamental compound libraries and seven docking tools. Our results demonstrate that Vina-GPU 2.1 achieves an average 4.97-fold acceleration in docking speed and an average 342% improvement in EF1% compared to Vina-GPU 2.0. The source code and tools for Vina-GPU 2.1 are freely available, accompanied by comprehensive instructions and illustrative examples.
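
The EF1% metric cited in this abstract is an enrichment factor: the hit rate among the top-ranked 1% of a screened library divided by the hit rate of the whole library. The following is a minimal sketch of that standard definition, not the authors' evaluation code. The function name, the toy library, and the convention that a lower docking score ranks better are assumptions made for illustration.

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """Enrichment factor at the given top fraction of a ranked library.

    Assumption: a lower docking score means a better-ranked compound.
    """
    if len(scores) != len(is_active) or not scores:
        raise ValueError("scores and is_active must be non-empty and equal in length")
    n_total = len(scores)
    n_actives = sum(bool(a) for a in is_active)
    if n_actives == 0:
        raise ValueError("the library contains no known actives")
    # Number of compounds in the top fraction (at least one).
    n_top = max(1, int(round(n_total * fraction)))
    # Rank compound indices from best (lowest) to worst score.
    ranked = sorted(range(n_total), key=lambda i: scores[i])
    hits_top = sum(bool(is_active[i]) for i in ranked[:n_top])
    # Hit rate in the top slice relative to the hit rate overall.
    return (hits_top / n_top) / (n_actives / n_total)


# Hypothetical library: 200 compounds, 4 actives, 2 of them ranked first.
toy_scores = [float(i) for i in range(200)]
toy_labels = [True, True] + [False] * 196 + [True, True]
toy_ef1 = enrichment_factor(toy_scores, toy_labels, fraction=0.01)
```

In this toy library the top 1% holds two compounds, both active, against an overall hit rate of 4/200, so `toy_ef1` evaluates to 50.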

Citations: 0
MetalPrognosis: A Biological Language Model-Based Approach for Disease-Associated Mutations in Metal-Binding Site Prediction.
IF 3.6, Zone 3 (Biology), Q2 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2024-09-25, DOI: 10.1109/TCBB.2024.3467093
Runchang Jia, Zhijie He, Cong Wang, Xudong Guo, Fuyi Li

Protein-metal ion interactions play a central role in the onset of numerous diseases. When amino acid changes lead to missense mutations in metal-binding sites, the disrupted interaction with metal ions can compromise protein function, potentially causing severe human ailments. Identifying these disease-associated mutation sites within metal-binding regions is paramount for understanding protein function and fostering innovative drug development. While some computational methods aim to tackle this challenge, they often fall short in accuracy, commonly due to manual feature extraction and the absence of structural data. We introduce MetalPrognosis, an alignment-free method that predicts disease-associated mutations within metal-binding sites of metalloproteins with heightened precision. Rather than relying on manual feature extraction, MetalPrognosis takes sliding-window sequences as input and extracts deep semantic features from pre-trained protein language models. These features are then fed into a convolutional neural network, which derives more intricate features. Comparative evaluations show that MetalPrognosis outperforms leading methods such as MCCNN and M-Ionic across various metalloprotein test sets. Furthermore, an ablation study confirms the effectiveness of our model architecture. To facilitate public use, we have made the datasets, source code, and trained models for MetalPrognosis available online at http://metalprognosis.unimelb-biotools.cloud.edu.au/.
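
The sliding-window input mentioned in this abstract can be illustrated with a short sketch. It is not the MetalPrognosis implementation: the function name, the default window half-width, and the padding character `X` are all hypothetical choices, since the abstract does not state them.

```python
def sliding_window(sequence, position, half_width=7, pad="X"):
    """Return the window of 2 * half_width + 1 residues centred on `position`.

    `position` is a 0-based index into `sequence`. Windows that run past
    either end of the sequence are padded with `pad` (an assumed convention).
    """
    if not 0 <= position < len(sequence):
        raise IndexError("position is outside the sequence")
    left = position - half_width
    right = position + half_width + 1
    prefix = pad * max(0, -left)
    suffix = pad * max(0, right - len(sequence))
    return prefix + sequence[max(0, left):min(len(sequence), right)] + suffix


# Hypothetical 8-residue fragment, window of 5 residues around index 4.
window = sliding_window("MKTAYIAK", 4, half_width=2)
```

Each window has a fixed length, so every candidate site yields an input of the same shape for a downstream embedding model.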

Citations: 0
MISSH: Fast Hashing of Multiple Spaced Seeds.
IF 3.6, Zone 3 (Biology), Q2 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2024-09-25, DOI: 10.1109/TCBB.2024.3467368
Eleonora Mian, Enrico Petrucci, Cinzia Pizzi, Matteo Comin

Alignment-free analysis of sequences has revolutionized the high-throughput processing of sequencing data within numerous bioinformatics pipelines. Hashing k-mers represents a common function across various alignment-free applications, serving as a crucial tool for indexing, querying, and rapid similarity searching. More recently, spaced seeds, a specialized pattern that accommodates errors or mutations, have become a standard choice over traditional k-mers. Spaced seeds offer enhanced sensitivity in many applications when compared to k-mers. However, it's important to note that hashing spaced seeds significantly increases computational time. Furthermore, if multiple spaced seeds are employed, accuracy can be further improved, albeit at the expense of longer processing times. This paper addresses the challenge of efficiently hashing multiple spaced seeds. The proposed algorithms leverage the similarity of adjacent spaced seed hash values within an input sequence, allowing for the swift computation of subsequent hashes. Our experimental results, conducted across various tests, demonstrate a remarkable performance improvement over previously suggested algorithms, with potential speedups of up to 20 times. Additionally, we apply these efficient spaced seed hashing algorithms to a metagenomic application, specifically the classification of reads using Clark-S [Ounit and Lonardi, 2016]. Our findings reveal a substantial speedup, effectively mitigating the slowdown caused by the utilization of multiple spaced seeds.
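
To make the notion of hashing a spaced seed concrete, here is a minimal sketch of the naive approach, in which every window is hashed from scratch. This is the slow baseline the abstract refers to, not the MISSH algorithm, which reuses previously computed hashes. The 2-bit encoding, the seed notation (`1` for a position that counts, `0` for a position that is ignored), and the function name are assumptions for illustration.

```python
# 2-bit encoding of nucleotides (an assumed, commonly used convention).
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def spaced_seed_hashes(sequence, seed):
    """Hash every window of `sequence` under the spaced seed `seed`.

    `seed` is a string of '1' (position contributes to the hash) and
    '0' (position is skipped). Each window is recomputed from scratch.
    """
    care = [i for i, c in enumerate(seed) if c == "1"]
    hashes = []
    for start in range(len(sequence) - len(seed) + 1):
        h = 0
        for offset in care:
            # Append two bits per counted position.
            h = (h << 2) | ENCODE[sequence[start + offset]]
        hashes.append(h)
    return hashes


example_hashes = spaced_seed_hashes("ACGTAC", "101")
```

For the sequence `ACGTAC` and seed `101` this gives `[2, 7, 8, 13]`. Because the middle position is ignored, `ACG` and `ATG` hash to the same value, which is how a spaced seed tolerates a mismatch.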

Citations: 0
Reinforced Metapath Optimization in Heterogeneous Information Networks for Drug-Target Interaction Prediction.
IF 3.6, Zone 3 (Biology), Q2 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2024-09-24, DOI: 10.1109/TCBB.2024.3467135
Ben Xu, Jianping Chen, Yunzhe Wang, Qiming Fu, You Lu

Graph neural networks offer an effective avenue for predicting drug-target interactions. In this domain, researchers have found that constructing heterogeneous information networks based on metapaths using diverse biological datasets enhances prediction performance. However, the performance of such methods is closely tied to the selection of metapaths and the compatibility between metapath subgraphs and graph neural networks. Most existing approaches still rely on fixed strategies for selecting metapaths and often fail to fully exploit node information along the metapaths, limiting the improvement in model performance. This paper introduces a novel method for predicting drug-target interactions by optimizing metapaths in heterogeneous information networks. On the one hand, the method formulates the metapath optimization problem as a Markov decision process, using the improvement in downstream network performance as the reward signal. Through iterative training of a reinforcement learning agent, a high-quality set of metapaths is learned. On the other hand, to fully leverage node information along the metapaths, the paper constructs subgraphs based on nodes along the metapaths. Subgraphs of different depths are processed using different graph convolutional neural networks. The proposed method is validated on standard heterogeneous biological benchmark datasets, where experimental results show significant advantages over traditional methods.

Citations: 0
Identification of cancer driver genes based on dynamic incentive model.
IF 3.6, Zone 3 (Biology), Q2 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2024-09-24, DOI: 10.1109/TCBB.2024.3467119
Zhipeng Hu, Gaoshi Li, Xinlong Luo, Wei Peng, Jiafei Liu, Xiaoshu Zhu, Jingli Wu

Cancer is a complex disease of genomic mutation, and identifying cancer driver genes promotes the development of targeted drugs and personalized therapies. Current computational methods give little consideration to the relationships among features and to the effect of noise in protein-protein interaction (PPI) data, resulting in a low recognition rate. In this paper, we propose DIM, a cancer driver gene identification method based on a dynamic incentive model. The method first constructs a hypergraph to reduce the impact of false-positive data in the PPI network. Then, the importance of the genes in each hyperedge of the hypergraph is considered from three perspectives, and a network and functional score (NFS) is proposed. By analyzing the relationships among features, the dynamic incentive model is proposed to fuse the NFS, the differential expression score of mRNA, and the differential expression score of miRNA. DIM is compared with several classical methods on breast cancer, lung cancer, prostate cancer, and pan-cancer datasets. The results show that DIM performs best on statistical evaluation indicators, functional consistency, and the partial area under the ROC curve, and has good cross-cancer capability.

Citations: 0
Partition Based Algorithms for Rearrangement Distances with Flexible Intergenic Regions.
IF 3.6, Zone 3 (Biology), Q2 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2024-09-24, DOI: 10.1109/TCBB.2024.3467033
Gabriel Siqueira, Alexsandro Oliveira Alexandrino, Andre Rodrigues Oliveira, Geraldine Jean, Guillaume Fertin, Zanoni Dias

Genome rearrangement distance problems are used in computational biology to estimate the evolutionary distance between genomes. These problems consist of minimizing the number of rearrangement events necessary to transform one genome into another. Two commonly used rearrangement events are reversal and transposition. The first problems studied ignored nucleotides outside genes (called intergenic regions) or assumed that genomes have a single copy of each gene. Recent work has advanced to more general problems that consider the number of nucleotides in intergenic regions as well as replicated genes. Nevertheless, genomes tend to have widely different numbers of nucleotides in their intergenic regions, which poses a problem when these regions must be matched exactly. To overcome this limitation, our work allows some flexibility when matching intergenic regions that do not have the same number of nucleotides. We propose new problems seeking the minimum number of reversals, or reversals and transpositions, necessary to transform one genome into another, while considering flexible intergenic region information. We show approximations for these problems by exploring their relationship with the Signed Minimum Common Flexible Intergenic String Partition problem. We also present different heuristics for the partition problem, and conduct experimental tests on simulated genomes to assess the performance of our algorithms.
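
The two rearrangement events named in this abstract can be sketched on a genome modelled as a signed list of genes. This is the classical textbook formulation only: it ignores intergenic region sizes and their flexible matching, which are the paper's actual subject. The function names and index conventions are assumptions for illustration.

```python
def reversal(genome, i, j):
    """Reverse the segment genome[i..j] (inclusive) and flip each gene's sign."""
    return genome[:i] + [-g for g in reversed(genome[i:j + 1])] + genome[j + 1:]


def transposition(genome, i, j, k):
    """Exchange the adjacent blocks genome[i:j] and genome[j:k], with i < j < k."""
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]


identity = [1, 2, 3, 4, 5]
after_reversal = reversal(identity, 1, 3)
after_transposition = transposition(identity, 1, 2, 4)
```

Here `after_reversal` is `[1, -4, -3, -2, 5]` and `after_transposition` is `[1, 3, 4, 2, 5]`. A rearrangement distance is the length of a shortest sequence of such events that turns one genome into the other.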

Citations: 0
Improving Antifreeze Proteins Prediction with Protein Language Models and Hybrid Feature Extraction Networks.
IF 3.6, Zone 3 (Biology), Q2 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2024-09-24, DOI: 10.1109/TCBB.2024.3467261
Jiashun Wu, Yan Liu, Yiheng Zhu, Dong-Jun Yu

Accurate identification of antifreeze proteins (AFPs) is crucial in developing biomimetic synthetic anti-icing materials and low-temperature organ preservation materials. Although numerous machine learning-based methods have been proposed for AFP prediction, the complex and diverse nature of AFPs limits the prediction performance of existing methods. In this study, we propose AFP-Deep, a new deep learning method that predicts antifreeze proteins by integrating embeddings of protein sequences from pre-trained protein language models and evolutionary contexts through hybrid feature extraction networks. The experimental results demonstrate that the main advantage of AFP-Deep is its use of pre-trained protein language models, which can extract discriminative global contextual features from protein sequences. Additionally, the hybrid deep neural networks designed for feature extraction from protein language models and evolutionary contexts enhance the correlation between embeddings and antifreeze patterns. The performance evaluation results show that AFP-Deep achieves superior performance compared to state-of-the-art models on benchmark datasets, achieving an AUPRC of 0.724 and 0.924, respectively.

Citations: 0
GenoM7GNet: An Efficient N7-Methylguanosine Site Prediction Approach Based on a Nucleotide Language Model.
IF 3.6, Zone 3 (Biology), Q2 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2024-09-20, DOI: 10.1109/TCBB.2024.3459870
Chuang Li, Heshi Wang, Yanhua Wen, Rui Yin, Xiangxiang Zeng, Keqin Li

N7-methylguanosine (m7G), one of the mainstream post-transcriptional RNA modifications, occupies an exceedingly significant place in medical treatments. However, classic approaches for identifying m7G sites are costly in both time and equipment. Meanwhile, existing machine learning methods extract limited hidden information from RNA sequences, which makes it difficult to improve accuracy. Therefore, we put forward a deep learning network, called "GenoM7GNet," for m7G site identification. This model utilizes Bidirectional Encoder Representations from Transformers (BERT) and is pretrained on nucleotide sequence data to capture hidden patterns in RNA sequences for m7G site prediction. Moreover, through detailed comparative experiments with various deep learning models, we discovered that the one-dimensional convolutional neural network (CNN) exhibits outstanding performance in sequence feature learning and classification. In performance evaluation, the proposed GenoM7GNet model achieved 0.953 in accuracy, 0.932 in sensitivity, 0.976 in specificity, 0.907 in Matthews correlation coefficient, and 0.984 in area under the receiver operating characteristic curve. Extensive experimental results further show that our GenoM7GNet model markedly surpasses other state-of-the-art models in predicting m7G sites, exhibiting high computing performance.

Citations: 0
Topological-Similarity Based Canonical Representations for Biological Link Prediction
IF 4.5 Zone 3 Biology Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-17 DOI: 10.1109/tcbb.2024.3462730
Mengzhen Li, Mustafa Coşkun, Mehmet Koyutürk
Citations: 0
Accurate Flow Decomposition via Robust Integer Linear Programming
IF 4.5 Zone 3 Biology Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-13 DOI: 10.1109/tcbb.2024.3433523
Fernando H. C. Dias, Alexandru I. Tomescu
Citations: 0