
Journal of Computational Biology: Latest Articles

Asymmetric Cluster-Based Measures for Comparative Phylogenetics.
IF 1.7 · Zone 4 (Biology) · Q2 Mathematics · Pub Date: 2024-04-01 · Epub Date: 2024-04-17 · DOI: 10.1089/cmb.2023.0338
Sanket Wagle, Alexey Markin, Paweł Górecki, Tavis K Anderson, Oliver Eulenstein

Phylogenetic inference and reconstruction methods generate hypotheses about evolutionary history. Competing inference methods are frequently used, and the resulting hypotheses are evaluated using tree comparison costs. The Robinson-Foulds (RF) distance is a widely used cost for comparing the topologies of two trees, but it is sensitive to tree error and can overestimate tree differences. To overcome this limitation, a refined version of the RF distance, the Cluster Affinity (CA) distance, was introduced. However, CA distances are symmetric and cannot compare different types of trees. Such asymmetric comparisons are needed when gene trees are compared with species trees, when disparate datasets are integrated into a supertree, or when tree comparison measures are used to infer a phylogenetic network. In this study, we introduce a relaxation of the original CA distance, called the asymmetric CA cost, to compare heterogeneous trees. We also develop a biologically interpretable cost, the Cluster Support cost, which normalizes by cluster size across gene trees. Both costs have characteristics similar to those of the symmetric CA cost. We describe efficient algorithms, derive the exact diameters, and use these diameters to normalize the costs for practical use. These costs provide objective, fine-scale, and biologically interpretable values for assessing differences and similarities between phylogenetic trees.
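To make the idea of a directional cluster-based cost concrete, here is a minimal sketch. It assumes rooted trees written as nested tuples of leaf names, and it scores each cluster of the source tree by its smallest symmetric difference to any cluster of the target tree, after restricting the target's clusters to the source's leaf set. The function names and the restriction step are illustrative assumptions, not the authors' exact definition of the asymmetric CA or Cluster Support cost.

```python
def clusters(tree):
    """Return the cluster (set of descendant leaves) of every node of a
    nested-tuple tree, in post-order, so the root cluster comes last."""
    out = []

    def walk(node):
        if isinstance(node, str):          # a leaf
            leaves = frozenset([node])
        else:                              # an internal node: union of its children
            leaves = frozenset().union(*(walk(child) for child in node))
        out.append(leaves)
        return leaves

    walk(tree)
    return out


def asymmetric_affinity(source, target):
    """Directional CA-style cost (hypothetical helper): for every cluster of
    `source`, add the smallest symmetric difference to a cluster of `target`,
    with target clusters first restricted to the leaves of `source`."""
    source_clusters = clusters(source)
    source_leaves = source_clusters[-1]    # root cluster = all source leaves
    restricted = {b & source_leaves for b in clusters(target)} - {frozenset()}
    return sum(min(len(a ^ b) for b in restricted) for a in source_clusters)


gene_tree = (("a", "c"), "b")
species_tree = ((("a", "b"), "c"), "d")
print(asymmetric_affinity(gene_tree, species_tree))  # scores only the gene tree's clusters
```

Because only the source tree's clusters are scored, swapping the arguments can give a different value, which is what allows a gene tree on a subset of taxa to be compared against a larger species tree.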

Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11057527/pdf/
Citations: 0
RECOMB Satellite Conference on Comparative Genomics (RECOMB-CG 2023).
IF 1.7 · Zone 4 (Biology) · Q2 Mathematics · Pub Date: 2024-04-01 · Epub Date: 2024-04-09 · DOI: 10.1089/cmb.2024.29113.tv
Tomas Vinar
Citations: 0
A Fixed-Parameter Tractable Algorithm for Finding Agreement Cherry-Reduced Subnetworks in Level-1 Orchard Networks.
IF 1.7 · Zone 4 (Biology) · Q2 Mathematics · Pub Date: 2024-04-01 · Epub Date: 2023-12-20 · DOI: 10.1089/cmb.2023.0317
Kaari Landry, Olivier Tremblay-Savard, Manuel Lafond

Phylogenetic networks are increasingly considered better suited to representing the complexity of evolutionary relationships between species. One class of phylogenetic networks that has received much attention recently is the class of orchard networks: networks that can be reduced to a single leaf by cherry reductions. Cherry reductions, also called cherry-picking operations, remove either a leaf of a simple cherry (sibling leaves sharing a parent) or a reticulate edge of a reticulate cherry (two leaves whose parents are connected by a reticulate edge). In this article, we present a fixed-parameter tractable algorithm for finding a maximum agreement cherry-reduced subnetwork (MACRS) between two rooted binary level-1 networks. This is the first exact algorithm proposed for the MACRS problem. As proven in earlier work, finding an MACRS is directly related to calculating a distance based on cherry operations. As a result, the proposed algorithm also provides a distance that can be used to compare level-1 networks.
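The two cherry types described above can be detected mechanically. The sketch below assumes a binary network stored as a dictionary mapping each non-root node to the list of its parents (the representation and the function name are assumptions for illustration). A simple cherry is two leaves with the same single parent; a reticulate cherry (x, y) is a leaf x below a reticulation node, one of whose parents is the parent of leaf y. Picking a reticulate cherry would delete that reticulate edge. This only finds cherries; it is not the authors' MACRS algorithm.

```python
def find_cherries(parents):
    """Return (simple, reticulate) cherries of a binary phylogenetic network.

    `parents` maps every non-root node to the list of its parents.
    A reticulate cherry (x, y) means: x's parent is a reticulation node
    (two parents), and one of those parents is y's parent.
    """
    internal = {p for ps in parents.values() for p in ps}
    leaves = [v for v in parents if v not in internal]
    simple, reticulate = [], []
    for x in leaves:
        for y in leaves:
            if x == y or len(parents[x]) != 1 or len(parents[y]) != 1:
                continue
            px, py = parents[x][0], parents[y][0]
            if px == py:
                if x < y:                       # record each simple cherry once
                    simple.append((x, y))
            elif len(parents.get(px, [])) == 2 and py in parents[px]:
                # edge py -> px is a reticulate edge joining the two parents
                reticulate.append((x, y))
    return simple, reticulate


# root r -> u, v; u -> a, h; v -> h, b; reticulation h -> c
network = {"u": ["r"], "v": ["r"], "a": ["u"], "h": ["u", "v"], "b": ["v"], "c": ["h"]}
print(find_cherries(network))
```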

Citations: 0
The k-Robinson-Foulds Dissimilarity Measures for Comparison of Labeled Trees.
IF 1.7 · Zone 4 (Biology) · Q2 Mathematics · Pub Date: 2024-04-01 · Epub Date: 2024-01-25 · DOI: 10.1089/cmb.2023.0312
Elahe Khayatian, Gabriel Valiente, Louxin Zhang

Understanding the mutational history of tumor cells is critical to unraveling the mechanisms that drive the onset and progression of cancer. Modeling tumor cell evolution with labeled trees has motivated researchers to develop different measures for comparing labeled trees. Although the Robinson-Foulds (RF) distance is widely used to compare species trees, it has certain limitations when applied to labeled trees. This study introduces the k-RF dissimilarity measures, tailored to the challenges of labeled tree comparison. In the space of labeled trees with n nodes, the RF distance is exactly n-RF. Like the RF distance, k-RF is a pseudometric for multiset-labeled trees and becomes a metric in the space of 1-labeled trees. By setting k to a small value, the k-RF dissimilarity can capture analogous local regions in two labeled trees of different sizes or with different labels.
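As a reference point for the baseline that k-RF generalizes, the sketch below computes an RF distance for 1-labeled trees (every node carries one distinct label) by comparing the label bipartitions induced by deleting each edge. It assumes trees given as edge lists over node labels and treats edges as undirected; it illustrates plain RF only, not the k-RF measure, and the function names are hypothetical.

```python
def bipartitions(edges):
    """Label bipartitions obtained by deleting each edge of a tree whose
    nodes carry distinct labels. Returns a set of frozenset({side, rest})."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nodes = frozenset(adj)
    splits = set()
    for u, v in edges:
        # collect every node reachable from u without crossing edge (u, v)
        side, stack = {u}, [u]
        while stack:
            w = stack.pop()
            for z in adj[w]:
                if z not in side and {w, z} != {u, v}:
                    side.add(z)
                    stack.append(z)
        side = frozenset(side)
        splits.add(frozenset({side, nodes - side}))
    return splits


def rf_distance(edges1, edges2):
    """Number of bipartitions present in exactly one of the two trees."""
    return len(bipartitions(edges1) ^ bipartitions(edges2))


print(rf_distance([("a", "b"), ("b", "c")], [("a", "c"), ("c", "b")]))
```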

Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11057537/pdf/
Citations: 0
Orthology and Paralogy Relationships at Transcript Level.
IF 1.7 · Zone 4 (Biology) · Q2 Mathematics · Pub Date: 2024-04-01 · Epub Date: 2024-04-16 · DOI: 10.1089/cmb.2023.0400
Wend Yam D D Ouedraogo, Aida Ouangraoua

Eukaryotic genes undergo alternative processing, a mechanism that produces multiple distinct transcripts from a single gene and thereby diversifies the transcriptome. More than half of human genes are affected, and the resulting transcripts are highly conserved among orthologous genes of distinct species. In this work, we define orthology and paralogy between transcripts of homologous genes and present an algorithm that computes clusters of conserved orthologous and paralogous transcripts. Gene-level homology relationships are used to define several types of homology relationships between transcripts originating from the same ancestral transcript. A Reciprocal Best Hits approach is used to infer clusters of isoorthologous and recent paralogous transcripts. We applied this method to transcripts from simulated gene families as well as real gene families from the Ensembl-Compara database. The results are consistent with those of previous studies that compared transcripts of orthologous genes. Furthermore, our findings provide evidence that searching for conserved transcripts between homologous genes, beyond orthologous genes alone, is likely to yield valuable information.
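The Reciprocal Best Hits step mentioned above can be sketched generically: a transcript pair (s, t) is kept when t is the top-scoring hit of s and s is the top-scoring hit of t. The sketch assumes precomputed similarity scores between transcripts of two genes keyed by (s, t) pairs; how the authors score transcripts and break ties is not reproduced here, and the function name is hypothetical.

```python
def reciprocal_best_hits(scores):
    """scores[(s, t)] = similarity of transcript s (gene A) to transcript t (gene B).
    Returns the sorted list of pairs that are each other's best hit.
    Ties keep the first pair seen (strict '>' comparison)."""
    best_a, best_b = {}, {}
    for (s, t), value in scores.items():
        if s not in best_a or value > scores[(s, best_a[s])]:
            best_a[s] = t
        if t not in best_b or value > scores[(best_b[t], t)]:
            best_b[t] = s
    return sorted((s, t) for s, t in best_a.items() if best_b.get(t) == s)


scores = {("a1", "b1"): 0.9, ("a1", "b2"): 0.4, ("a2", "b1"): 0.95,
          ("a2", "b2"): 0.3, ("a3", "b2"): 0.8}
print(reciprocal_best_hits(scores))
```

Here a1's best hit is b1, but b1 prefers a2, so (a1, b1) is not reciprocal and is dropped.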

Citations: 0
Toward Robust Self-Training Paradigm for Molecular Prediction Tasks.
IF 1.7 · Zone 4 (Biology) · Q2 Mathematics · Pub Date: 2024-03-01 · DOI: 10.1089/cmb.2023.0187
Hehuan Ma, Feng Jiang, Yu Rong, Yuzhi Guo, Junzhou Huang

Molecular prediction tasks normally require a series of expert experiments to label the target molecule, so labeled data are scarce. Self-training, one of the semisupervised learning paradigms, uses both labeled and unlabeled data. Specifically, a teacher model is trained on labeled data and produces pseudo labels for unlabeled data. The labeled and pseudo-labeled data are then used together to train a student model. However, the pseudo labels generated by the teacher model are generally not sufficiently accurate. We therefore propose a robust self-training strategy that uses robust loss functions to handle such noisy labels in two paradigms: generic and adaptive. We conducted experiments on three molecular biology prediction tasks with four backbone models to evaluate the proposed strategy step by step. The results show that the proposed method improves prediction performance on all tasks, most notably on molecular regression tasks, with an average improvement of 41.5%. Visualization analysis further supports the advantage of our method. The proposed robust self-training is a simple yet effective strategy for improving molecular biology prediction performance. It addresses the shortage of labeled data in molecular biology by exploiting both labeled and unlabeled data. Moreover, it can be easily embedded in any prediction task, making it a universal approach for the bioinformatics community.
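To show why a robust loss helps when pseudo labels are noisy, here is a toy sketch of the teacher-student loop for a one-dimensional linear regressor trained by gradient descent. The Huber loss stands in for "a robust loss"; the paper's actual generic and adaptive losses, models, and hyperparameters are not reproduced, and every name below is a hypothetical illustration.

```python
def loss_grad(residual, loss, delta):
    """Derivative of the per-sample loss with respect to the prediction."""
    if loss == "squared":
        return residual                        # d/dr of r**2 / 2
    if abs(residual) <= delta:                 # Huber: quadratic near zero ...
        return residual
    return delta if residual > 0 else -delta   # ... bounded gradient in the tails


def fit_linear(xs, ys, loss="squared", delta=1.0, lr=0.05, epochs=2000):
    """Fit y ~ w*x + b with full-batch gradient descent."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            g = loss_grad(w * x + b - y, loss, delta)
            gw += g * x
            gb += g
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def self_train(labeled, unlabeled, teacher, loss="huber", delta=1.0):
    """Pseudo-label `unlabeled` with `teacher`, then train a student on both sets."""
    xs = [x for x, _ in labeled] + list(unlabeled)
    ys = [y for _, y in labeled] + [teacher(x) for x in unlabeled]
    return fit_linear(xs, ys, loss=loss, delta=delta)


labeled = [(x, 2 * x + 1) for x in range(5)]   # clean data on y = 2x + 1
bad_teacher = lambda x: 100.0                  # one grossly wrong pseudo label
print(self_train(labeled, [5.0], bad_teacher, loss="squared"))
print(self_train(labeled, [5.0], bad_teacher, loss="huber"))
```

With squared loss the single bad pseudo label drags the slope far from 2; with the Huber loss its gradient is capped at `delta`, so the student stays close to the clean fit.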

Citations: 0
Unveiling Gene Regulatory Networks That Characterize Difference of Molecular Interplays Between Gastric Cancer Drug Sensitive and Resistance Cell Lines.
IF 1.7 · Zone 4 (Biology) · Q2 Mathematics · Pub Date: 2024-03-01 · Epub Date: 2024-02-23 · DOI: 10.1089/cmb.2023.0215
Heewon Park

Gastric cancer is a leading cause of cancer-related deaths globally, and chemotherapy is widely accepted as its standard treatment. However, drug resistance in cancer cells is a significant obstacle to successful chemotherapy and limits its effectiveness against gastric cancer. Although many studies have sought to unravel the mechanisms of acquired drug resistance, existing studies have relied on abnormalities of single genes, that is, differential gene expression (DGE) analysis. Single-gene analysis alone is insufficient to understand the mechanisms of drug resistance in cancer cells comprehensively, because the underlying processes involve perturbations of molecular interactions. To uncover the mechanism of acquired drug resistance in gastric cancer, we identify differentially regulated gene networks between drug-sensitive and drug-resistant cell lines. We develop a computational strategy for identifying phenotype-specific gene networks by extending an existing method, CIdrgn, which quantifies the dissimilarity of gene networks using comprehensive information about network structure: regulatory effects between genes, edge structure, and gene expression levels. To identify differentially regulated gene networks more efficiently and to improve the biological relevance of our findings, we integrate additional information and incorporate knowledge from network biology, such as the hubness of genes and weighted adjacency matrices. The performance of the developed strategy is validated through Monte Carlo simulations. Using this strategy, we uncover gene regulatory networks that specifically capture the molecular interplays distinguishing drug-sensitive and drug-resistant profiles in gastric cancer. The reliability and significance of the identified drug-sensitive and resistance-specific gene networks, and of their related markers, are supported by the literature. Our analysis can characterize drug-sensitive and resistance-specific molecular interplays related to mechanisms of acquired drug resistance that cannot be revealed by analyses based solely on single-gene abnormalities, such as DGE analysis. Based on our analysis and a comprehensive review of the relevant literature, we suggest that targeting the suppressors of the identified drug-resistance markers, such as the Melanoma Antigen (MAGE) family, Trefoil Factor (TFF) family, and Ras-Associated Binding 25 (RAB25), while enhancing the expression of inducers of the drug-sensitivity markers [e.g., the Serum Amyloid A (SAA) family], could potentially reduce drug resistance and improve the effectiveness of chemotherapy for gastric cancer. We expect that the developed strategy will serve as a useful tool for uncovering cancer-related, phenotype-specific gene regulatory networks, which can provide crucial clues not only to the mechanisms of drug resistance but also to the complex biological systems of cancer.

Citations: 0
GS-TCGA: Gene Set-Based Analysis of The Cancer Genome Atlas.
IF 1.7 · Zone 4 (Biology) · Q2 Mathematics · Pub Date: 2024-03-01 · Epub Date: 2024-03-04 · DOI: 10.1089/cmb.2023.0278
Tarrion Baird, Rahul Roychoudhuri

Most tools for analyzing large gene expression datasets, including The Cancer Genome Atlas (TCGA), focus on the expression of individual genes or on inferring the abundance of specific cell types from whole-transcriptome information. While these methods provide useful insights, they can overlook crucial process-based information that may enhance our understanding of cancer biology. In this study, we describe three novel tools incorporated into an online resource: gene set-based analysis of The Cancer Genome Atlas (GS-TCGA). GS-TCGA is designed for user-friendly exploration of TCGA data through gene set-based analysis, using gene sets from the Molecular Signatures Database. GS-TCGA includes three tools. GS-Surv determines the association between the expression of gene sets and survival in human cancers. Co-correlative gene set enrichment analysis (CC-GSEA) uses interpatient heterogeneity in cancer gene expression to infer the functions of specific genes from GSEA of coregulated genes in TCGA. GS-Corr uses interpatient heterogeneity in cancer gene expression profiles to identify genes coregulated with the expression of specific gene sets in TCGA. Users can also upload custom gene sets for analysis with each tool. These tools enable researchers to perform survival analysis linked to gene set expression, explore the functional implications of gene coexpression, and identify potential gene regulatory mechanisms.
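A GS-Corr-style query can be sketched in a few lines: summarize a gene set as one score per patient, then rank the remaining genes by their correlation with that score across patients. The sketch assumes the gene set score is the mean of per-gene z-scores and uses Pearson correlation; the web tool's actual scoring and statistics may differ, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s > 0 else 0.0 for v in values]


def gene_set_score(expr, gene_set):
    """Per-patient mean z-score of the genes in `gene_set`.
    `expr` maps gene -> expression values, one per patient (same order)."""
    members = [zscore(expr[g]) for g in gene_set if g in expr]
    return [mean(column) for column in zip(*members)]


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def genes_correlated_with_set(expr, gene_set):
    """Genes outside the set, sorted by correlation with the set score."""
    score = gene_set_score(expr, gene_set)
    others = [g for g in expr if g not in gene_set]
    return sorted(((g, pearson(expr[g], score)) for g in others),
                  key=lambda kv: -kv[1])


expr = {"S1": [1, 2, 3, 4], "S2": [2, 4, 6, 8],
        "X": [10, 20, 30, 40], "Y": [4, 3, 2, 1], "Z": [1, 3, 2, 4]}
print(genes_correlated_with_set(expr, ["S1", "S2"]))
```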

Citations: 0
Finding Highly Similar Regions of Genomic Sequences Through Homomorphic Encryption.
IF 1.7, Zone 4 (Biology), Q2 Mathematics. Pub Date: 2024-03-01. DOI: 10.1089/cmb.2023.0050
Magsarjav Bataa, Siwoo Song, Kunsoo Park, Miran Kim, Jung Hee Cheon, Sun Kim

Finding highly similar regions of genomic sequences is a basic computation in genomic analysis. Genomic analyses of large amounts of data can be processed efficiently in cloud environments, but outsourcing them to a cloud raises privacy and security concerns. Homomorphic encryption (HE) is a powerful cryptographic primitive that preserves the privacy of genomic data in various analyses processed in an untrusted cloud environment. We introduce an efficient algorithm for finding highly similar regions of two homomorphically encrypted sequences and describe how to implement it using both bit-wise and word-wise HE schemes. In our experiments, the algorithm is up to two orders of magnitude faster than an existing algorithm in elapsed time. Overall, it finds highly similar regions of sequences in real datasets within a feasible time.
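The abstract does not define "highly similar." The sketch below is a plaintext illustration only and rests on these assumptions:
- The two sequences are ungapped and of equal length.
- A region is the union of all length-`w` windows that contain at most `max_mismatch` mismatching positions.

It is not the authors' algorithm and performs no encryption. In an HE setting, the per-position comparisons and the sliding-window sums would instead have to be evaluated on ciphertexts. The function name `similar_regions` is hypothetical.

```python
def similar_regions(a: str, b: str, w: int, max_mismatch: int) -> list[tuple[int, int]]:
    """Union of length-w windows (half-open [start, end) intervals) in which
    the ungapped sequences a and b differ at no more than max_mismatch positions."""
    if len(a) != len(b):
        raise ValueError("this sketch compares equal-length, ungapped sequences")
    if w <= 0 or w > len(a):
        return []
    # 1 where the bases differ, 0 where they match
    mism = [int(x != y) for x, y in zip(a, b)]
    count = sum(mism[:w])  # mismatches in the first window [0, w)
    regions: list[tuple[int, int]] = []
    for start in range(len(a) - w + 1):
        if start > 0:
            # slide the window right by one: add the new base, drop the old one
            count += mism[start + w - 1] - mism[start - 1]
        if count <= max_mismatch:
            end = start + w
            if regions and start <= regions[-1][1]:
                # window overlaps or touches the previous region: extend it
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions
```

For example, `similar_regions("ACGTACGTAC", "ACGTTCGTAC", 4, 0)` isolates the single mismatch at position 4. It reports the exact-match stretches on either side of it.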

Citations: 0
Prediction of MicroRNA-Disease Potential Association Based on Sparse Learning and Multilayer Random Walks.
IF 1.7, Zone 4 (Biology), Q2 Mathematics. Pub Date: 2024-03-01. Epub Date: 2024-02-19. DOI: 10.1089/cmb.2023.0266
Hai-Bin Yao, Zhen-Jie Hou, Wen-Guang Zhang, Han Li, Yan Chen

A growing number of studies have shown that microRNAs (miRNAs) play an indispensable role in complex human diseases. Traditional biological experiments for detecting miRNA-disease associations are expensive and time-consuming, so efficient and meaningful computational models for predicting these associations are needed. In this study, we propose a miRNA-disease association prediction model based on sparse learning and multilayer random walks (SLMRWMDA). The miRNA-disease association matrix is decomposed and reconstructed by sparse learning to obtain richer association information. This step also yields the initial probability matrix for the random walk with restart algorithm. A heterogeneous network is constructed from the disease similarity network, the miRNA similarity network, and the miRNA-disease association network. A multilayer random walk over this network, based on the topological structure features of diseases and miRNAs, then yields stable probabilities that are used to predict potential miRNA-disease associations. Experimental results show that the prediction accuracy of this model is significantly improved compared with previous related models. We evaluated the model using global leave-one-out cross-validation (global LOOCV) and fivefold cross-validation (5-fold CV). The area under the curve (AUC) for global LOOCV is 0.9368. The mean AUC for 5-fold CV is 0.9335, with a variance of 0.0004. In the case study, the results show that SLMRWMDA is effective in inferring potential miRNA-disease associations.
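Random walk with restart (RWR) is the building block named in the abstract. It repeatedly applies `p_next = (1 - r) * W @ p + r * p0` until the probability vector stabilizes. Here `W` is a column-stochastic transition matrix, `p0` is the restart distribution and `r` is the restart probability.

The sketch below is a simplified, single-matrix version under hypothetical assumptions:
- miRNA similarity, disease similarity and the association matrix are stacked into one column-normalized block matrix.
- `r = 0.7` is an illustrative default.
- The seed is a plain indicator vector rather than the sparse-learning initial probabilities of SLMRWMDA.

It does not reproduce the paper's multilayer walk. All function names are illustrative.

```python
import numpy as np


def column_normalize(m: np.ndarray) -> np.ndarray:
    """Scale each column to sum to 1; all-zero columns are left as zeros."""
    s = m.sum(axis=0, keepdims=True)
    s[s == 0] = 1.0  # avoid division by zero for isolated nodes
    return m / s


def heterogeneous_transition(mirna_sim, disease_sim, assoc):
    """Column-stochastic transition matrix over miRNA nodes followed by disease nodes.

    mirna_sim: (n_m, n_m), disease_sim: (n_d, n_d), assoc: (n_m, n_d).
    Isolated nodes (all-zero columns) would leak probability mass in this sketch.
    """
    top = np.hstack([mirna_sim, assoc])
    bottom = np.hstack([assoc.T, disease_sim])
    return column_normalize(np.vstack([top, bottom]).astype(float))


def rwr(W, p0, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W @ p + restart * p0 until the L1 change is below tol."""
    p0 = np.asarray(p0, dtype=float)
    p0 = p0 / p0.sum()
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p
```

To rank candidate miRNAs for disease `j`:
1. Seed `p0` with a 1 at index `n_m + j`, where `n_m` is the number of miRNAs.
2. Read the first `n_m` entries of the returned vector.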

Citations: 0
Journal
Journal of Computational Biology