
IEEE/ACM Transactions on Computational Biology and Bioinformatics: Latest Articles

A comprehensive evaluation framework for benchmarking multi-objective feature selection in omics-based biomarker discovery.
IF 3.6, Zone 3 (Biology), Q2 Biochemical Research Methods, Pub Date: 2024-10-14, DOI: 10.1109/TCBB.2024.3480150
Luca Cattelani, Arindam Ghosh, Teemu Rintala, Vittorio Fortino

Machine learning algorithms have been extensively used for accurate classification of cancer subtypes driven by gene expression-based biomarkers. However, biomarker models combining multiple gene expression signatures are often not reproducible in external validation datasets, and their feature set size is often not optimized, jeopardizing their translatability into cost-effective clinical tools. We investigate how to solve the multi-objective problem of finding the best trade-offs between classification performance and set size by applying seven algorithms for machine learning-driven feature subset selection, and we analyse how they perform in a benchmark with eight large-scale cancer transcriptome datasets, covering both training and external validation sets. The benchmark includes evaluation metrics that assess the individual biomarkers and the solution sets according to their accuracy, diversity, and the stability of their constituent genes. Moreover, a new evaluation metric for cross-validation studies is proposed that generalizes the hypervolume, which is commonly used to assess the performance of multi-objective optimization algorithms. Biomarkers with a balanced accuracy of 0.8 on the external dataset were obtained for breast, kidney, and ovarian cancer using 4, 2, and 7 features, respectively. Genetic algorithms often performed better than the other algorithms considered, and the recently proposed NSGA2-CH and NSGA2-CHS were the best-performing methods in most cases.
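As a rough illustration of the hypervolume indicator mentioned in this abstract, the sketch below computes the standard two-objective hypervolume for a set of candidate biomarkers, where both objectives are minimized (classification error and number of features). It is not the cross-validation generalization proposed in the paper, which the abstract does not specify; the function names, the reference point, and the example values are all assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (classification error, number of features)


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by increasing error.

    Both objectives are minimized, so a point survives only if its
    feature count is strictly smaller than that of every point with
    lower or equal error.
    """
    front: List[Point] = []
    for p in sorted(set(points)):  # sort by error, then by size
        if not front or p[1] < front[-1][1]:
            front.append(p)
    return front


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by the points and bounded by the reference point."""
    inside = [p for p in points if p[0] < ref[0] and p[1] < ref[1]]
    hv = 0.0
    prev_size = ref[1]
    # Along the front, error increases while size decreases, so each
    # point contributes one horizontal slab of the dominated area.
    for err, size in pareto_front(inside):
        hv += (ref[0] - err) * (prev_size - size)
        prev_size = size
    return hv


# Illustrative values only (error = 1 - balanced accuracy, feature count).
candidates = [(0.20, 4), (0.30, 2), (0.25, 6)]
reference = (1.0, 10)  # worst acceptable error and set size (assumed)
print(pareto_front(candidates))
print(hypervolume_2d(candidates, reference))
```

A larger hypervolume means the solution set covers better trade-offs between error and set size; the third candidate above is dominated and does not contribute.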

Citations: 0
Generative Biomedical Event Extraction with Constrained Decoding Strategy.
IF 3.6, Zone 3 (Biology), Q2 Biochemical Research Methods, Pub Date: 2024-10-14, DOI: 10.1109/TCBB.2024.3480088
Fangfang Su, Chong Teng, Fei Li, Bobo Li, Jun Zhou, Donghong Ji

Biomedical event extraction has received considerable attention in various fields, including natural language processing, bioinformatics, and computational biomedicine. This has led to numerous machine learning and deep learning models being proposed and applied to this complex task. However, existing models typically adopt an extraction-based approach, which requires breaking down the extraction of biomedical events into multiple subtasks for sequential processing, making it prone to cascading errors. This paper presents a novel approach by constructing a biomedical event generation model based on the pre-trained language model T5. We employ a sequence-to-sequence generation paradigm to obtain events; the model uses a constrained decoding algorithm to guide sequence generation and a curriculum learning algorithm for efficient model learning. To demonstrate the effectiveness of our model, we evaluate it on two public benchmark datasets, Genia 2011 and Genia 2013. Our model achieves superior performance, illustrating the effectiveness of generative modeling of biomedical events.
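To make the idea of constrained decoding concrete, here is a minimal sketch in which a trie of valid output sequences restricts which token may follow a given prefix, and a greedy decoder picks the best-scoring allowed token at each step. This is only a toy: the paper builds on T5, whereas here the model is replaced by a hypothetical `score_fn`, and the trie contents and token names are illustrative assumptions, not the authors' output format.

```python
from typing import Callable, Dict, List

Trie = Dict[str, "Trie"]


def build_trie(sequences: List[List[str]]) -> Trie:
    """Store every valid token sequence in a nested-dict prefix tree."""
    trie: Trie = {}
    for seq in sequences:
        node = trie
        for tok in seq:
            node = node.setdefault(tok, {})
    return trie


def constrained_greedy_decode(
    score_fn: Callable[[List[str], str], float], trie: Trie
) -> List[str]:
    """Greedy decoding where only trie-permitted tokens are considered.

    score_fn(prefix, token) stands in for the model's score of `token`
    given the tokens generated so far (an assumption of this sketch).
    """
    output: List[str] = []
    node = trie
    while node:  # an empty dict marks the end of a valid sequence
        allowed = list(node)
        best = max(allowed, key=lambda t: score_fn(output, t))
        output.append(best)
        node = node[best]
    return output


# Illustrative schema: an event is "(" + event type + ")".
valid = [["(", "Gene_expression", ")"], ["(", "Binding", ")"]]
trie = build_trie(valid)


def toy_score(prefix: List[str], token: str) -> float:
    """Hypothetical scorer that happens to prefer the Binding event type."""
    return 1.0 if token == "Binding" else 0.0


print(constrained_greedy_decode(toy_score, trie))
```

Because the decoder can only follow paths in the trie, every output is structurally valid regardless of what the scorer prefers.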

Citations: 0
GrapHiC: An integrative graph based approach for imputing missing Hi-C reads.
IF 3.6, Zone 3 (Biology), Q2 Biochemical Research Methods, Pub Date: 2024-10-11, DOI: 10.1109/TCBB.2024.3477909
Ghulam Murtaza, Justin Wagner, Justin M Zook, Ritambhara Singh

Hi-C experiments allow researchers to study and understand the 3D genome organization and its regulatory function. Unfortunately, sequencing costs and technical constraints severely restrict access to high-quality Hi-C data for many cell types. Existing frameworks rely on a sparse Hi-C dataset or cheaper-to-acquire ChIP-seq data to predict Hi-C contact maps with high read coverage. However, these methods fail to generalize to sparse or cross-cell-type inputs because they do not account for the contributions of epigenomic features or the impact of the structural neighborhood in predicting Hi-C reads. We propose GrapHiC, which combines Hi-C and ChIP-seq in a graph representation, allowing more accurate embedding of structural and epigenomic features. Each node represents a binned genomic region, and we assign edge weights using the observed Hi-C reads. Additionally, we embed ChIP-seq and relative positional information as node attributes, allowing our representation to capture structural neighborhoods and the contributions of proteins and their modifications for predicting Hi-C reads. We show that GrapHiC generalizes better than the current state-of-the-art on cross-cell-type settings and sparse Hi-C inputs. Moreover, we can utilize our framework to impute Hi-C reads even when no Hi-C contact map is available, thus making high-quality Hi-C data accessible for many cell types. Availability: https://github.com/rsinghlab/GrapHiC.
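The graph representation described in this abstract can be sketched as follows: each genomic bin becomes a node, observed Hi-C read counts become edge weights, and ChIP-seq signals plus a relative position are attached as node attributes. This is a minimal sketch assuming NumPy arrays as input; the function name, the array layout, and the simple normalized position are assumptions, and the actual encoding used by GrapHiC is in the linked repository rather than the abstract.

```python
import numpy as np


def build_graph(contacts: np.ndarray, chipseq: np.ndarray):
    """Turn a Hi-C contact matrix and ChIP-seq tracks into graph arrays.

    contacts: (n_bins, n_bins) symmetric matrix of observed read counts.
    chipseq:  (n_bins, n_tracks) signal per bin for each ChIP-seq track.
    """
    n_bins = contacts.shape[0]

    # One edge per non-zero entry of the upper triangle (the matrix is
    # symmetric, so the lower triangle would only duplicate edges).
    rows, cols = np.nonzero(np.triu(contacts))
    edge_index = np.stack([rows, cols])      # shape (2, n_edges)
    edge_weight = contacts[rows, cols]       # observed read counts

    # Relative position of each bin in [0, 1] (assumed encoding).
    rel_pos = (np.arange(n_bins) / max(n_bins - 1, 1)).reshape(-1, 1)
    node_features = np.hstack([chipseq, rel_pos])

    return edge_index, edge_weight, node_features


# Illustrative 3-bin example with two hypothetical ChIP-seq tracks.
contacts = np.array([[5, 2, 0],
                     [2, 4, 1],
                     [0, 1, 3]])
chipseq = np.array([[0.1, 0.9],
                    [0.4, 0.2],
                    [0.7, 0.3]])
edge_index, edge_weight, node_features = build_graph(contacts, chipseq)
print(edge_index.shape, edge_weight, node_features.shape)
```

Arrays in this layout can then be handed to a graph neural network library of choice; which one the authors use is not stated in the abstract.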

Citations: 0
Guest Editors' Introduction to the Special Section on Bioinformatics Research and Applications
IF 3.6, Zone 3 (Biology), Q2 Biochemical Research Methods, Pub Date: 2024-10-09, DOI: 10.1109/TCBB.2024.3390374
Zhipeng Cai, Alexander Zelikovsky
Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10712175
Citations: 0
De Novo Drug Design by Multi-Objective Path Consistency Learning with Beam A∗ Search.
IF 3.6, Zone 3 (Biology), Q2 Biochemical Research Methods, Pub Date: 2024-10-09, DOI: 10.1109/TCBB.2024.3477592
Dengwei Zhao, Jingyuan Zhou, Shikui Tu, Lei Xu

Generating high-quality, drug-like molecules from scratch within the expansive chemical space presents a significant challenge in the field of drug discovery. In prior research, value-based reinforcement learning algorithms have been employed to iteratively generate molecules with multiple desired properties. The immediate reward was defined as the evaluation of intermediate-state molecules at each step, and the learning objective was to maximize the expected cumulative evaluation scores for all molecules along the generative path. However, this definition of the reward was misleading, because the optimization target should be the evaluation score of the final generated molecule only. Furthermore, in previous works, randomness was introduced into the decision-making process, which enabled the generation of diverse molecules but no longer pursued the maximum future rewards. In this paper, the immediate reward is defined as the improvement achieved by modifying the molecule, so that only the evaluation score of the final generated molecule is maximized. Originating from A∗ search, path consistency (PC), i.e., the requirement that f values on one optimal path should be identical, is employed as the objective function when updating the f value estimator to train a multi-objective de novo drug designer. By incorporating the f value into the decision-making process of beam search, the DrugBA∗ algorithm is proposed to enable the large-scale generation of molecules that exhibit both high quality and diversity. Experimental results demonstrate a substantial improvement over the state-of-the-art algorithm QADD in multiple molecular properties of the generated molecules.
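The two ingredients named in this abstract, path consistency and f-guided beam search, can be illustrated with a very small sketch. Path consistency says the f estimates along one optimal path should agree, so one simple way to penalize disagreement is the mean squared deviation of the estimates from their path average; a beam step then keeps the candidates with the highest f. Both functions are assumptions chosen for illustration: the abstract does not give the exact loss or the beam-selection rule used by DrugBA∗.

```python
from typing import Callable, List, TypeVar

State = TypeVar("State")


def path_consistency_loss(f_values: List[float]) -> float:
    """Mean squared deviation of f estimates from their path average.

    The loss is zero exactly when all estimates on the path are equal,
    which is the path consistency condition.
    """
    mean_f = sum(f_values) / len(f_values)
    return sum((f - mean_f) ** 2 for f in f_values) / len(f_values)


def beam_step(
    candidates: List[State],
    f_estimate: Callable[[State], float],
    beam_width: int,
) -> List[State]:
    """Keep the `beam_width` candidates with the highest estimated f."""
    return sorted(candidates, key=f_estimate, reverse=True)[:beam_width]


# Illustrative numbers only.
print(path_consistency_loss([0.7, 0.7, 0.7]))  # consistent path
print(path_consistency_loss([0.5, 0.9]))       # inconsistent path

# Toy candidates scored by string length as a stand-in for f.
print(beam_step(["C", "CCO", "CC"], f_estimate=len, beam_width=2))
```

Selecting deterministically by f, rather than sampling, mirrors the abstract's point that randomness in decision-making gives up on maximizing future reward; diversity then comes from keeping several beams.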

Citations: 0
Guest Editorial Selected Papers From BIOKDD 2022
IF 3.6, Zone 3 (Biology), Q2 Biochemical Research Methods, Pub Date: 2024-10-09, DOI: 10.1109/TCBB.2024.3429784
Da Yan, Catia Pesquita, Carsten Görg, Jake Y. Chen
Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10712183
Citations: 0
Orientation Determination of Cryo-EM Projection Images Using Reliable Common Lines and Spherical Embeddings.
IF 3.6, Zone 3 (Biology), Q2 Biochemical Research Methods, Pub Date: 2024-10-09, DOI: 10.1109/TCBB.2024.3476619
Xiangwen Wang, Qiaoying Jin, Li Zou, Xianghong Lin, Yonggang Lu

Three-dimensional (3D) reconstruction in single-particle cryo-electron microscopy (cryo-EM) is a critical technique for recovering and studying the fine 3D structure of proteins and other biological macromolecules, where the primary issue is to determine the orientations of projection images with high levels of noise. This paper proposes a method to determine the orientations of cryo-EM projection images using reliable common lines and spherical embeddings. First, the reliability of common lines between projection images is evaluated using a weighted voting algorithm based on an iterative improvement technique and binarized weighting. Then, the reliable common lines are used to calculate the normal vectors and local X-axis vectors of projection images after two spherical embeddings. Finally, the orientations of projection images are determined by aligning the results of the two spherical embeddings using an orthogonal constraint. Experimental results on both synthetic and real cryo-EM projection image datasets demonstrate that the proposed method can achieve higher accuracy in estimating the orientations of projection images and higher resolution in reconstructing preliminary 3D structures than some common line-based methods, indicating that the proposed method is effective in single-particle cryo-EM 3D reconstruction.

Citations: 0
A knowledge graph-based method for drug-drug interaction prediction with contrastive learning.
IF 3.6, Zone 3 (Biology), Q2 Biochemical Research Methods, Pub Date: 2024-10-09, DOI: 10.1109/TCBB.2024.3477410
Jian Zhong, Haochen Zhao, Qichang Zhao, Jianxin Wang

Precisely predicting Drug-Drug Interactions (DDIs) has the potential to improve the quality and safety of drug therapies, protect the well-being of patients, and provide essential guidance and decision support at every stage of the drug development process. In recent years, leveraging large-scale biomedical knowledge graphs has improved DDI prediction performance. However, the feature extraction procedures in these methods are still coarse, and more refined features may further improve the quality of predictions. To overcome these limitations, we develop a knowledge graph-based method for multi-typed DDI prediction with contrastive learning (KG-CLDDI). In KG-CLDDI, we combine drug knowledge aggregation features from the knowledge graph with drug topological aggregation features from the DDI graph. Additionally, we build a contrastive learning module that uses horizontal reversal and dropout operations to produce high-quality embeddings for drug-drug pairs. The comparison results indicate that KG-CLDDI is superior to state-of-the-art models in both the transductive and inductive settings. Notably, in the inductive setting, KG-CLDDI outperforms the previous best method by 17.49% and 24.97% in terms of AUC and AUPR, respectively. Furthermore, we conduct an ablation analysis and a case study to show the effectiveness of KG-CLDDI. These findings illustrate the potential significance of KG-CLDDI in advancing DDI research and its clinical applications. The code of KG-CLDDI is available at https://github.com/jianzhong123/KG-CLDDI.

Citations: 0
Improving Molecule Generation and Drug Discovery with a Knowledge-enhanced Generative Model.
IF 3.6, Zone 3 (Biology), Q2 Biochemical Research Methods, Pub Date: 2024-10-09, DOI: 10.1109/TCBB.2024.3477313
Aditya Malusare, Vaneet Aggarwal

Recent advancements in generative models have established state-of-the-art benchmarks in the generation of molecules and novel drug candidates. Despite these successes, a significant gap persists between generative models and the utilization of extensive biomedical knowledge, often systematized within knowledge graphs, whose potential to inform and enhance generative processes has not been realized. In this paper, we present a novel approach that bridges this divide by developing a framework for knowledge-enhanced generative models called KARL. We develop a scalable methodology to extend the functionality of knowledge graphs while preserving semantic integrity, and incorporate this contextual information into a generative framework to guide a diffusion-based model. The integration of knowledge graph embeddings with our generative model furnishes a robust mechanism for producing novel drug candidates possessing specific characteristics while ensuring validity and synthesizability. KARL outperforms state-of-the-art generative models on both unconditional and targeted generation tasks.

Citations: 0
RFLP-inator: interactive web platform for in silico simulation and complementary tools of the PCR-RFLP technique.
IF 3.6 Zone 3 Biology Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-10-08 DOI: 10.1109/TCBB.2024.3476453
Kiefer Andre Bedoya Benites, Wilser Andres Garcia-Quispes

Polymerase chain reaction - Restriction Fragment Length Polymorphism (PCR-RFLP) is an established molecular biology technique leveraging DNA sequence variability for organism identification, genetic disease detection, biodiversity analysis, etc. Traditional PCR-RFLP requires wet-laboratory procedures that can result in technical errors, procedural challenges, and financial costs. With the aim of providing an accessible and efficient PCR-RFLP technique complement, we introduce RFLP-inator. This is a comprehensive web-based platform developed in R using the package Shiny, which simulates the PCR-RFLP technique, integrates analysis capabilities, and offers complementary tools for both pre- and post-evaluation of in vitro results. We developed the RFLP-inator's algorithm independently and our platform offers seven dynamic tools: RFLP simulator, Pattern identifier, Enzyme selector, RFLP analyzer, Multiplex PCR, Restriction map maker, and Gel plotter. Moreover, the software includes a restriction pattern database of more than 250,000 sequences of the bacterial 16S rRNA gene. We successfully validated the core tools against published research findings. This new platform is open access and user-friendly, offering a valuable resource for researchers, educators, and students specializing in molecular genetics. RFLP-inator not only streamlines RFLP technique application but also supports pedagogical efforts in genetics, illustrating its utility and reliability. The software is available for free at https://kodebio.shinyapps.io/RFLP-inator/.
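
To make the idea of simulating PCR-RFLP in silico concrete, the following is a minimal Python sketch of a restriction digest. It is not the RFLP-inator algorithm (the abstract states only that the platform was developed in R with Shiny and does not describe its internals). The function names `digest` and `rflp_pattern`, the toy amplicon sequences, and the simplifications (linear molecule, top strand only, exact recognition sequence with no degenerate bases) are all assumptions made for illustration. The enzyme shown is EcoRI, which recognizes GAATTC and cuts after the first G.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Cut a linear sequence at every occurrence of `site`.

    `cut_offset` is the number of bases from the start of the
    recognition site to the cut position on the top strand
    (1 for EcoRI, G^AATTC). Returns fragment lengths in 5'->3' order.
    """
    seq = sequence.upper()
    site = site.upper()

    # Collect every cut position; overlapping sites are allowed.
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)

    # Fragment lengths are the gaps between consecutive boundaries.
    bounds = [0] + cuts + [len(seq)]
    lengths = [b - a for a, b in zip(bounds, bounds[1:])]
    return [n for n in lengths if n > 0]  # drop empty end fragments


def rflp_pattern(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths sorted largest first, as bands would appear on a gel."""
    return sorted(digest(sequence, site, cut_offset), reverse=True)


# Hypothetical 20 bp amplicons differing by one base inside the EcoRI site.
amplicon_a = "ATGCGT" + "GAATTC" + "TTAGGCAT"  # site present
amplicon_b = "ATGCGT" + "GAGTTC" + "TTAGGCAT"  # site destroyed by a substitution

print(rflp_pattern(amplicon_a, "GAATTC", 1))  # [13, 7]
print(rflp_pattern(amplicon_b, "GAATTC", 1))  # [20]
```

The two amplicons give different band patterns (two fragments versus one uncut fragment), which is the signal PCR-RFLP uses to tell sequence variants apart. A realistic simulator would also need to handle degenerate recognition sequences, multiple enzymes, and primer-defined amplicon boundaries.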

Citations: 0