Briefings in Bioinformatics: latest articles

Identifying cell types by lasso-constraint regularized Gaussian graphical model based on weighted distance penalty.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae572
Wei Zhang, Yaxin Xu, Xiaoying Zheng, Juan Shen, Yuanyuan Li

Single-cell RNA sequencing (scRNA-seq) technology is one of the most cost-effective and efficacious methods for revealing cellular heterogeneity and diversity. Precise identification of cell types is essential for establishing a robust foundation for downstream analyses and is a prerequisite for understanding heterogeneous mechanisms. However, the accuracy of existing methods warrants improvement, and highly accurate methods often impose stringent equipment requirements. Moreover, most unsupervised learning-based approaches require the number of cell types to be supplied a priori, which limits their widespread application. In this paper, we propose a novel algorithm framework named WLGG. First, to capture the underlying nonlinear information, we introduce a weighted distance penalty term based on the Gaussian kernel function, which maps data from a low-dimensional nonlinear space to a high-dimensional linear space. We then impose a Lasso constraint on the regularized Gaussian graphical model to enhance its ability to capture linear data characteristics. Additionally, we use the eigengap strategy to predict the number of cell types and obtain predicted labels via spectral clustering. Experimental results on 14 test datasets demonstrate the superior clustering accuracy of the WLGG algorithm over 16 alternative methods. Furthermore, downstream analyses based on the similarity matrix and predicted labels from WLGG, including marker gene identification, pseudotime inference, and functional enrichment analysis, substantiate the reliability of WLGG and offer valuable insights into dynamic biological processes and regulatory mechanisms.
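
The eigengap step mentioned in the abstract can be illustrated with a minimal sketch. This is not the WLGG implementation: the similarity matrix here is a plain Gaussian-kernel affinity on toy 2-D points (WLGG learns its similarity matrix from a regularized Gaussian graphical model), and the function names, `sigma`, and `max_k` are assumptions made for illustration. The number of clusters is taken as the position of the largest gap among the smallest eigenvalues of the normalized graph Laplacian.

```python
import numpy as np


def gaussian_affinity(X, sigma=1.0):
    """Pairwise Gaussian-kernel similarity exp(-||xi - xj||^2 / (2 sigma^2))."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def eigengap_num_clusters(W, max_k=10):
    """Estimate the cluster count from the largest gap between consecutive
    small eigenvalues of the symmetric normalized Laplacian I - D^-1/2 W D^-1/2."""
    degrees = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(degrees)
    laplacian = np.eye(len(W)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]
    eigvals = np.sort(np.linalg.eigvalsh(laplacian))
    gaps = np.diff(eigvals[: max_k + 1])  # gaps[i] = eigvals[i + 1] - eigvals[i]
    return int(np.argmax(gaps)) + 1


# Toy data (hypothetical): three well-separated groups of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])
k_est = eigengap_num_clusters(gaussian_affinity(X))
print("estimated number of clusters:", k_est)
```

The estimated count could then be passed to any spectral clustering routine to obtain labels; on real scRNA-seq data the kernel bandwidth and the quality of the similarity matrix matter far more than in this toy case.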

Citations: 0
DGCL: dual-graph neural networks contrastive learning for molecular property prediction.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae474
Xiuyu Jiang, Liqin Tan, Qingsong Zou

In this paper, we propose DGCL, a contrastive learning (CL) method based on dual graph neural networks (GNNs) and integrated with mixed molecular fingerprints (MFPs) for molecular property prediction. The DGCL-MFP method contains two stages. In the first, pretraining stage, we use two different GNNs as encoders to construct the CL task, rather than generating augmented graphs as in earlier work. Specifically, DGCL aggregates and enhances features of the same molecule with the Graph Isomorphism Network and the Graph Attention Network; representations extracted from the same molecule serve as positive samples, and all others are treated as negative samples. In the downstream training stage, features extracted from the two pretrained graph networks and the carefully selected MFPs are concatenated to predict molecular properties. Our experiments show that DGCL enhances the performance of existing GNNs, matching or surpassing state-of-the-art self-supervised learning models on multiple benchmark datasets. Specifically, DGCL increases the average performance on classification tasks by 3.73% and improves the performance on the regression task Lipo by 0.126. Through ablation studies, we validate the impact of network fusion strategies and MFPs on model performance. In addition, DGCL's predictive performance is further enhanced by weighting different molecular features based on the Extended Connectivity Fingerprint. The code and datasets of DGCL will be made publicly available.
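
The positive/negative pairing described above can be sketched with a generic InfoNCE-style loss. The abstract does not state the exact loss DGCL uses, so the formula, the `temperature` parameter, and the function names below are assumptions; `view_a` and `view_b` stand in for the embeddings that two different encoders (the Graph Isomorphism Network and the Graph Attention Network in the paper) would produce for the same batch of molecules.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss: row i of view_a and row i of view_b form the positive
    pair (same molecule); every other row of view_b is a negative."""
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / n


# Toy embeddings (hypothetical): two molecules, two dimensions.
encoder_1 = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]     # second encoder agrees on each molecule
mismatched = [[0.0, 1.0], [1.0, 0.0]]  # second encoder confuses the molecules
print(info_nce(encoder_1, aligned, temperature=1.0))
print(info_nce(encoder_1, mismatched, temperature=1.0))
```

The loss is lower when the two encoders agree on each molecule, which is the signal that pretraining would exploit.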

Citations: 0
Recover then aggregate: unified cross-modal deep clustering with global structural information for single-cell data.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae485
Ziyi Wang, Peng Luo, Mingming Xiao, Boyang Wang, Tianyu Liu, Xiangyu Sun

Single-cell cross-modal joint clustering has been extensively utilized to investigate the tumor microenvironment. Although numerous approaches have been suggested, accurate clustering remains the main challenge. First, the gene expression matrix frequently contains numerous missing values due to measurement limitations. Most existing clustering methods treat it as a typical multi-modal dataset without further processing. Few methods perform recovery before clustering, and those that do have not studied the recovery step in sufficient depth, leading to suboptimal outcomes. Second, existing cross-modal information fusion strategies do not ensure that representations are consistent across modalities, so conflicting information may be integrated and degrade performance. To address these challenges, we propose the 'Recover then Aggregate' strategy and introduce the Unified Cross-Modal Deep Clustering model. Specifically, we have developed a data augmentation technique based on neighborhood similarity that iteratively imposes rank constraints on the Laplacian matrix, thereby updating the similarity matrix and recovering dropout events. Concurrently, we integrate cross-modal features and employ contrastive learning to align modality-specific representations with consistent ones, enhancing the effective integration of diverse modal information. Comprehensive experiments on five real-world multi-modal datasets demonstrate this method's superior effectiveness in single-cell clustering tasks.
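
The rank constraint on the Laplacian rests on a standard fact from spectral graph theory, stated here as background. The abstract does not give the paper's exact formulation, so the symbols below are assumptions: $S$ is a nonnegative $n \times n$ similarity matrix and $c$ is the desired number of clusters.

```latex
L_S = D_S - \frac{S + S^{\top}}{2},
\qquad
(D_S)_{ii} = \sum_{j=1}^{n} \frac{S_{ij} + S_{ji}}{2}

\operatorname{rank}(L_S) = n - c
\iff
\text{the graph with weights } \tfrac{1}{2}\,(S + S^{\top}) \text{ has exactly } c \text{ connected components}
```

Enforcing this rank while updating $S$ therefore pushes the similarity graph toward exactly $c$ disconnected blocks, one per cluster.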

Citations: 0
Bioinformatics in Russia: history and present-day landscape.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae513
Muhammad A Nawaz, Igor E Pamirsky, Kirill S Golokhvast

Bioinformatics has become an interdisciplinary subject due to its universal role in molecular biology research. The current status of bioinformatics research in Russia is not known. Here, we review the history of bioinformatics in Russia, present the current landscape, and highlight future directions and challenges. Bioinformatics research in Russia is driven by four major industries: information technology, pharmaceuticals, biotechnology, and agriculture. Over the past three decades, despite a delayed start, the field has gained momentum, especially in protein and nucleic acid research. Dedicated and shared centers for genomics, proteomics, and bioinformatics are active in different regions of Russia. Present-day bioinformatics in Russia is characterized by research issues related to genetics, metagenomics, OMICs, medical informatics, computational biology, environmental informatics, and structural bioinformatics. Notable developments include software (tools, algorithms, and pipelines), the use of high-performance computing (e.g. by the Siberian Supercomputer Center), and large-scale sequencing projects (the sequencing of 100 000 human genomes). Government funding is increasing, policies are being changed, and a National Genomic Information Database is being established. Key areas for future focus are eukaryotic genome sequencing, a common platform where developers and researchers can share tools and data, and the use of biological modeling, machine learning, and biostatistics. Universities and research institutes have started to implement bioinformatics modules. A critical mass of bioinformaticians is essential to catch up with the global pace in the discipline.

Citations: 0
Mut-Map: Comprehensive Computational Pipeline for Structural Mapping and Analysis of Cancer-Associated Mutations.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae514
Ali F Alsulami

Understanding the functional impact of genetic mutations on protein structures is essential for advancing cancer research and developing targeted therapies. The main challenge lies in accurately mapping these mutations to protein structures and analysing their effects on protein function. To address this, Mut-Map (https://genemutation.org/) was developed as a comprehensive computational pipeline that integrates mutation data from the Catalogue Of Somatic Mutations In Cancer database with protein structural data from the Protein Data Bank and AlphaFold models. The pipeline begins by taking a UniProt ID and proceeds through mapping corresponding Protein Data Bank structures, renumbering residues, and assessing disorder percentages. It then overlays mutation data, categorizes mutations based on structural context, and visualizes them using advanced tools like MolStar. This approach allows for a detailed analysis of how mutations may disrupt protein function by affecting key regions such as DNA interfaces, ligand-binding sites, and dimer interactions. To validate the pipeline, a case study was conducted on the TP53 gene, a critical tumour suppressor often mutated in cancers. The analysis highlighted the most frequent mutations occurring at the DNA-binding interface, providing insights into their potential role in cancer progression. Mut-Map offers a powerful resource for elucidating the structural implications of cancer-associated mutations, paving the way for more targeted therapeutic strategies and advancing our understanding of protein structure-function relationships.
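
The renumbering and categorization steps can be pictured with a small sketch. It does not use the Mut-Map code or any real database: the residue map, the site sets, and the mutation labels are hypothetical toy values, and the function names are assumptions. It only shows the idea of translating a UniProt position into structure numbering and then tagging the mutation by the structural region it falls in.

```python
def parse_mutation(label):
    """Split a label such as 'G11D' into (wild type, UniProt position, mutant)."""
    return label[0], int(label[1:-1]), label[-1]


def categorize(mutations, uniprot_to_structure, site_sets):
    """Tag each mutation with the structural regions its residue belongs to.

    uniprot_to_structure: dict mapping UniProt positions to structure numbering
    site_sets: dict mapping a region name to a set of structure residue numbers
    """
    result = {}
    for label in mutations:
        _, position, _ = parse_mutation(label)
        residue = uniprot_to_structure.get(position)
        if residue is None:
            # Position not covered by the structure (e.g. unresolved region).
            result[label] = ["unresolved"]
            continue
        tags = sorted(name for name, residues in site_sets.items() if residue in residues)
        result[label] = tags or ["other"]
    return result


# Hypothetical toy inputs, not taken from any real structure.
toy_map = {10: 1, 11: 2, 12: 3}
toy_sites = {"dna_interface": {2}, "ligand_site": {3}}
toy_mutations = ["A10V", "G11D", "K12E", "P99L"]
print(categorize(toy_mutations, toy_map, toy_sites))
```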

Citations: 0
Advancing microbial diagnostics: a universal phylogeny guided computational algorithm to find unique sequences for precise microorganism detection.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae545
Gulshan Kumar Sharma, Rakesh Sharma, Kavita Joshi, Sameer Qureshi, Shubhita Mathur, Sharad Sinha, Samit Chatterjee, Vandana Nunia

Sequences derived from organisms sharing common evolutionary origins exhibit similarity, while unique sequences, absent in related organisms, act as good diagnostic marker candidates. However, identifying dissimilar regions among closely related organisms is challenging because it requires complex multiple sequence alignments, which make computation and parsing difficult. To address this, we have developed a biologically inspired universal algorithm, NAUniSeq, that finds unique sequences for microorganism diagnosis by traversing the phylogeny of life. Mapping through a phylogenetic tree keeps cross-contamination and false positives low. We downloaded complete taxonomy data from the Taxadb database and sequence data from the National Center for Biotechnology Information Reference Sequence Database (NCBI-RefSeq) and, with the help of NetworkX, created a phylogenetic tree. Sequences were assigned to the graph nodes, k-mers were created for target and non-target nodes, and the graph was searched using the depth-first search algorithm. In a memory-efficient alternative NoSQL approach, we created a collection of RefSeq sequences in a MongoDB database using the tax-id and path of the FASTA files, and queried the collection for the target and non-target sequences. In both approaches, we used an alignment-free, sliding-window, k-mer-based procedure that quickly compares k-mers of target and non-target sequences and returns unique sequences that are not present in the non-target. We validated our algorithm with the target nodes Mycobacterium tuberculosis, Neisseria gonorrhoeae, and Monkeypox and generated unique sequences. This universal algorithm is a powerful tool for generating diagnostic sequences, enabling the accurate identification of microbial strains with high phylogenetic precision.
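
The alignment-free sliding-window comparison can be sketched in a few lines. This is an illustrative simplification, not the NAUniSeq code: the function names are assumptions, the sequences are toy strings, reverse complements are ignored, and the phylogeny traversal that selects which non-target sequences to compare against is left out.

```python
def kmers(sequence, k):
    """Set of all length-k substrings found by sliding a window over the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def unique_kmers(target, non_targets, k):
    """Return (position, k-mer) pairs from the target that occur in no non-target
    sequence, in order of first appearance along the target."""
    background = set()
    for sequence in non_targets:
        background |= kmers(sequence, k)
    seen = set()
    unique = []
    for i in range(len(target) - k + 1):
        kmer = target[i:i + k]
        if kmer not in background and kmer not in seen:
            seen.add(kmer)
            unique.append((i, kmer))
    return unique


# Toy sequences (hypothetical), k = 3.
print(unique_kmers("ACGTAC", ["ACGTTT"], 3))
```

Set membership makes each k-mer lookup constant time on average, which is why this kind of comparison avoids the cost of a multiple sequence alignment; for genome-scale inputs the background set would need a more compact representation.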

Citations: 0
Optimized patient-specific immune checkpoint inhibitor therapies for cancer treatment based on tumor immune microenvironment modeling.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae547
Yao Yao, Youhua Frank Chen, Qingpeng Zhang

Enhancing patient response to immune checkpoint inhibitors (ICIs) is crucial in cancer immunotherapy. We aim to create a data-driven mathematical model of the tumor immune microenvironment (TIME) and utilize deep reinforcement learning (DRL) to optimize patient-specific ICI therapy combined with chemotherapy (ICC). Using patients' genomic and transcriptomic data, we develop a TIME dynamic evolutionary model based on ordinary differential equations (ODEs) to characterize interactions among chemotherapy, ICIs, immune cells, and tumor cells. A DRL agent is trained to determine the personalized optimal ICC therapy. Numerical experiments with real-world data demonstrate that the proposed TIME model can predict ICI therapy response. The DRL-derived personalized ICC therapy outperforms predefined fixed schedules. For tumors with extremely low CD8+ T cell infiltration ('extremely cold tumors'), the DRL agent recommends high-dosage chemotherapy alone. For tumors with higher CD8+ T cell infiltration ('cold' and 'hot tumors'), an appropriate chemotherapy dosage induces CD8+ T cell proliferation, enhancing ICI therapy outcomes. Specifically, for 'hot tumors', chemotherapy and ICI are administered simultaneously, while for 'cold tumors', a mid-dosage of chemotherapy makes the TIME 'hotter' before ICI administration. However, in several 'cold tumors' with rapid resistant tumor cell growth, ICC eventually fails. This study highlights the potential of using real-world clinical data and a DRL algorithm to develop personalized optimal ICC by understanding the complex biological dynamics of a patient's TIME. Our ODE-based TIME dynamic evolutionary model offers a theoretical framework for determining the best use of ICI, and the proposed DRL agent may guide personalized ICC schedules.
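
To make the idea of an ODE-based tumor-immune model concrete, here is a deliberately small two-variable sketch integrated with the explicit Euler method. It is not the paper's TIME model: the equations, every parameter value, and the function name `simulate` are hypothetical, and no reinforcement learning is involved. It only compares a few fixed treatments, which is the kind of simulated response a DRL agent would be trained against.

```python
def simulate(chemo=0.0, ici=0.0, days=60.0, dt=0.01):
    """Integrate a toy tumor (T) / effector immune cell (E) model with explicit Euler.

    dT/dt = r*T*(1 - T/K) - kill*E*T - chemo*T
    dE/dt = source + ici*boost*E*T/(half + T) - death*E
    All parameter values are hypothetical and chosen only for illustration.
    """
    r, K = 0.3, 1.0            # tumor growth rate and carrying capacity
    kill = 0.5                 # immune-mediated tumor killing rate
    source, death = 0.05, 0.1  # immune cell influx and natural decay
    boost, half = 0.4, 0.2     # ICI-dependent immune stimulation by tumor antigen
    T, E = 0.1, 0.1            # initial tumor burden and immune cell level
    for _ in range(round(days / dt)):
        dT = r * T * (1.0 - T / K) - kill * E * T - chemo * T
        dE = source + ici * boost * E * T / (half + T) - death * E
        T += dt * dT
        E += dt * dE
    return T, E


tumor_none, _ = simulate()
tumor_chemo, _ = simulate(chemo=0.2)
tumor_ici, _ = simulate(ici=1.0)
print("final tumor burden:", tumor_none, tumor_chemo, tumor_ici)
```

In this toy setting either treatment lowers the final tumor burden relative to no treatment; a patient-specific model would instead fit such parameters to genomic and transcriptomic data, as the abstract describes.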

Citations: 0
IPFMC: an iterative pathway fusion approach for enhanced multi-omics clustering in cancer research.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae541
Haoyang Zhang, Sha Liu, Bingxin Li, Xionghui Zhou

Using multi-omics data for clustering (cancer subtyping) is crucial for precision medicine research. Although numerous methods have been proposed, current approaches either perform unsatisfactorily or lack biological interpretability, which limits their practical application. Based on the biological hypothesis that patients with the same subtype may exhibit similar dysregulated pathways, we developed an Iterative Pathway Fusion approach for enhanced Multi-omics Clustering (IPFMC), a novel multi-omics clustering method involving two data fusion stages. In the first stage, omics data are partitioned at each layer using pathway information, with crucial pathways iteratively selected to represent samples. Ultimately, the representation information from multiple pathways is integrated. In the second stage, similarity network fusion is applied to integrate the representation information from multiple omics. Comparative experiments on nine cancer datasets from The Cancer Genome Atlas (TCGA), involving systematic comparisons with 10 representative methods, show that IPFMC outperforms these methods. Additionally, the biological pathways and genes identified by our approach hold biological significance, affirming not only its excellent clustering performance but also its biological interpretability.
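
The second stage relies on similarity network fusion, which can be sketched for two omics views as below. This is a deliberately simplified version under stated assumptions: it diffuses through the full row-normalized similarity matrices, whereas the standard method uses k-nearest-neighbor sparsified kernels, and it does not reproduce IPFMC's pathway selection stage. The function names and the example matrices are hypothetical.

```python
def row_normalize(matrix):
    """Scale each row so that it sums to 1."""
    return [[value / sum(row) for value in row] for row in matrix]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def fuse_two_views(w1, w2, iterations=2):
    """Cross-diffuse two patient similarity matrices and average the result.

    Simplification (assumption): the diffusion kernels s1 and s2 are the full
    normalized matrices, so many iterations would over-smooth the result.
    """
    s1, s2 = row_normalize(w1), row_normalize(w2)
    p1, p2 = s1, s2
    for _ in range(iterations):
        # Each view is updated using the current state of the other view.
        next_p1 = row_normalize(matmul(matmul(s1, p2), transpose(s1)))
        next_p2 = row_normalize(matmul(matmul(s2, p1), transpose(s2)))
        p1, p2 = next_p1, next_p2
    n = len(w1)
    average = [[(p1[i][j] + p2[i][j]) / 2.0 for j in range(n)] for i in range(n)]
    # Symmetrize so the fused matrix can feed spectral clustering.
    return [[(average[i][j] + average[j][i]) / 2.0 for j in range(n)]
            for i in range(n)]


# Two toy views of four patients; patients 0-1 and 2-3 form the two subtypes.
view_a = [[1.0, 0.9, 0.1, 0.1], [0.9, 1.0, 0.1, 0.1],
          [0.1, 0.1, 1.0, 0.9], [0.1, 0.1, 0.9, 1.0]]
view_b = [[1.0, 0.7, 0.2, 0.1], [0.7, 1.0, 0.1, 0.2],
          [0.2, 0.1, 1.0, 0.7], [0.1, 0.2, 0.7, 1.0]]
fused = fuse_two_views(view_a, view_b)
```

In the fused matrix, patients in the same toy subtype remain more similar to each other than to patients in the other subtype, which is the property a downstream clustering step depends on.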

Citations: 0
Multimodal contrastive learning for spatial gene expression prediction using histology images.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae551
Wenwen Min, Zhiceng Shi, Jun Zhang, Jun Wan, Changmiao Wang

In recent years, the advent of spatial transcriptomics (ST) technology has unlocked unprecedented opportunities for studying gene expression patterns within complex biological systems. Despite its transformative potential, the prohibitive cost of ST technology remains a significant barrier to its widespread adoption in large-scale studies. An alternative, more cost-effective strategy involves employing artificial intelligence to predict gene expression levels using readily accessible whole-slide images stained with Hematoxylin and Eosin (H&E). However, existing methods have yet to fully capitalize on the multimodal information provided by H&E images and ST data with spatial location. In this paper, we propose mclSTExp, a multimodal contrastive learning framework with Transformer and DenseNet-121 encoders for Spatial Transcriptomics Expression prediction. We conceptualize each spot as a "word", integrating its intrinsic features with spatial context through the self-attention mechanism of a Transformer encoder. This integration is further enriched by incorporating image features via contrastive learning, thereby enhancing the predictive capability of our model. We conducted an extensive evaluation of highly variable genes in two breast cancer datasets and a skin squamous cell carcinoma dataset, and the results demonstrate that mclSTExp exhibits superior performance in predicting spatial gene expression. Moreover, mclSTExp has shown promise in interpreting cancer-specific overexpressed genes, elucidating immune-related genes, and identifying specialized spatial domains annotated by pathologists. Our source code is available at https://github.com/shizhiceng/mclSTExp.
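
The idea of aligning image features with expression features through contrastive learning can be sketched with a symmetric InfoNCE-style loss, shown below in plain Python. The abstract does not state which loss mclSTExp uses, so the loss form, the `temperature` value, and the function names are assumptions; the embeddings are hand-written stand-ins for encoder outputs.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        largest = max(row)  # subtract the max for numerical stability
        log_sum = largest + math.log(sum(math.exp(x - largest) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def contrastive_loss(image_emb, expr_emb, temperature=0.1):
    """Symmetric contrastive loss over paired (image, expression) embeddings.

    Row i of image_emb and row i of expr_emb are assumed to come from the
    same spot; every other pairing in the batch acts as a negative.
    """
    images = [normalize(v) for v in image_emb]
    exprs = [normalize(v) for v in expr_emb]
    logits = [[sum(a * b for a, b in zip(img, ex)) / temperature
               for ex in exprs] for img in images]
    logits_t = [list(column) for column in zip(*logits)]
    # Average the image-to-expression and expression-to-image directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits_t))


image_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss(image_batch, [[0.9, 0.1], [0.1, 0.9]])
shuffled = contrastive_loss(image_batch, [[0.1, 0.9], [0.9, 0.1]])
```

The loss is small when each spot's image embedding points in the same direction as its own expression embedding (`aligned`) and large when the pairs are swapped (`shuffled`), which is the signal that pulls the two modalities into a shared space during training.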

Citations: 0
DEWNA: dynamic entropy weight network analysis and its application to the DNA-binding proteome in A549 cells with cisplatin-induced damage.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae564
Shisheng Wang, Wenjuan Zeng, Yin Yang, Jingqiu Cheng, Dan Liu, Hao Yang

Cisplatin is one of the most commonly used chemotherapy drugs for treating solid tumors. As a genotoxic agent, cisplatin binds to DNA and forms platinum-DNA adducts that cause DNA damage and activate a series of signaling pathways mediated by various DNA-binding proteins (DBPs), ultimately leading to cell death. Therefore, DBPs play crucial roles in the cellular response to cisplatin and in determining cell fate. However, systematic studies of DBPs responding to cisplatin damage and their temporal dynamics are still lacking. To address this, we developed a novel and user-friendly stand-alone software, DEWNA, designed for dynamic entropy weight network analysis to reveal the dynamic changes of DBPs and their functions. DEWNA utilizes the entropy weight method, multiscale embedded gene co-expression network analysis and generalized reporter score-based analysis to process time-course proteome expression data, helping scientists identify protein hubs and pathway entropy profiles during disease progression. We applied DEWNA to a dataset of DBPs from A549 cells responding to cisplatin-induced damage across 8 time points, with data generated by data-independent acquisition mass spectrometry (DIA-MS). The results demonstrate that DEWNA can effectively identify protein hubs and associated pathways that are significantly altered in response to cisplatin-induced DNA damage, and offer a comprehensive view of how different pathways interact and respond dynamically over time to cisplatin treatment. Notably, we observed the dynamic activation of distinct DNA repair pathways and cell death mechanisms during the drug treatment time course, providing new insights into the molecular mechanisms underlying the cellular response to DNA damage.
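
The entropy weight method named in the abstract can be illustrated with its textbook form: features whose values are spread unevenly across samples carry more information and receive larger weights. The sketch below follows that general recipe only; how DEWNA applies it to time-course proteome data is not described here, so the function name `entropy_weights`, the row and column layout, and the example values are assumptions.

```python
import math


def entropy_weights(matrix):
    """Return one weight per column for a matrix of positive values.

    Rows are taken to be samples (for example time points) and columns to be
    features (for example proteins); this layout is an assumption.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    divergences = []
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        total = sum(column)
        proportions = [value / total for value in column]
        # Normalized Shannon entropy in [0, 1]; 1 means a perfectly flat column.
        entropy = -sum(p * math.log(p) for p in proportions if p > 0) / math.log(n_rows)
        # Clamp tiny negative values caused by floating-point rounding.
        divergences.append(max(0.0, 1.0 - entropy))
    total_divergence = sum(divergences)
    if total_divergence == 0:
        # Every column is flat, so fall back to equal weights.
        return [1.0 / n_cols] * n_cols
    return [d / total_divergence for d in divergences]


# Column 0 never changes across the three samples; column 1 changes strongly.
weights = entropy_weights([[1.0, 1.0], [1.0, 2.0], [1.0, 9.0]])
```

Here the constant column receives essentially zero weight and the varying column receives essentially all of it, so a protein with a flat abundance profile would contribute little to a weighted score.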

Citations: 0