Wei Zhang, Yaxin Xu, Xiaoying Zheng, Juan Shen, Yuanyuan Li
Single-cell RNA sequencing (scRNA-seq) technology is one of the most cost-effective and efficacious methods for revealing cellular heterogeneity and diversity. Precise identification of cell types is essential for establishing a robust foundation for downstream analyses and is a prerequisite for understanding heterogeneous mechanisms. However, the accuracy of existing methods warrants improvement, and highly accurate methods often impose stringent equipment requirements. Moreover, most unsupervised learning-based approaches require the number of cell types to be supplied a priori, which limits their widespread application. In this paper, we propose a novel algorithmic framework named WLGG. First, to capture the underlying nonlinear information, we introduce a weighted distance penalty term utilizing the Gaussian kernel function, which maps data from a low-dimensional nonlinear space to a high-dimensional linear space. We subsequently impose a Lasso constraint on the regularized Gaussian graphical model to enhance its ability to capture linear data characteristics. Additionally, we utilize the Eigengap strategy to predict the number of cell types and obtain predicted labels via spectral clustering. The experimental results on 14 test datasets demonstrate the superior clustering accuracy of the WLGG algorithm over 16 alternative methods. Furthermore, downstream analysis, including marker gene identification, pseudotime inference, and functional enrichment analysis based on the similarity matrix and predicted labels from the WLGG algorithm, substantiates the reliability of WLGG and offers valuable insights into dynamic biological processes and regulatory mechanisms.
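The eigengap step mentioned in the abstract can be illustrated with a minimal sketch. This is not the WLGG implementation: the function names `gaussian_kernel` and `estimate_num_clusters`, the bandwidth `sigma`, the cap `max_k`, and the choice of the symmetric normalized Laplacian are all assumptions made for illustration. The sketch builds a Gaussian-kernel similarity matrix and picks the number of clusters at the largest gap between consecutive Laplacian eigenvalues.

```python
import numpy as np


def gaussian_kernel(x, sigma=1.0):
    """Pairwise Gaussian-kernel similarity between the rows of x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    sq = np.sum(x ** 2, axis=1)
    # Squared Euclidean distances, clipped at zero to absorb rounding error.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def estimate_num_clusters(similarity, max_k=10):
    """Eigengap heuristic on the symmetric normalized graph Laplacian (assumed variant)."""
    s = np.array(similarity, dtype=float)
    s = (s + s.T) / 2.0            # enforce symmetry
    np.fill_diagonal(s, 0.0)       # ignore self-similarity
    degree = s.sum(axis=1)
    inv_sqrt = np.zeros_like(degree)
    mask = degree > 0
    inv_sqrt[mask] = 1.0 / np.sqrt(degree[mask])
    laplacian = np.eye(len(s)) - inv_sqrt[:, None] * s * inv_sqrt[None, :]
    eigvals = np.sort(np.linalg.eigvalsh(laplacian))
    upper = min(max_k, len(eigvals) - 1)
    gaps = np.diff(eigvals[: upper + 1])
    # gaps[i] separates the (i+1)-th and (i+2)-th smallest eigenvalues,
    # so the largest gap at index i suggests i + 1 clusters.
    return int(np.argmax(gaps)) + 1


# Toy data: three tight, well-separated groups of four points each.
offsets = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]])
centers = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]])
points = np.vstack([c + offsets for c in centers])
k_hat = estimate_num_clusters(gaussian_kernel(points))
```

The estimated `k_hat` would then be passed to a spectral clustering routine to obtain labels; that step is omitted here.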
{"title":"Identifying cell types by lasso-constraint regularized Gaussian graphical model based on weighted distance penalty.","authors":"Wei Zhang, Yaxin Xu, Xiaoying Zheng, Juan Shen, Yuanyuan Li","doi":"10.1093/bib/bbae572","DOIUrl":"10.1093/bib/bbae572","url":null,"abstract":"<p><p>Single-cell RNA sequencing (scRNA-seq) technology is one of the most cost-effective and efficacious methods for revealing cellular heterogeneity and diversity. Precise identification of cell types is essential for establishing a robust foundation for downstream analyses and is a prerequisite for understanding heterogeneous mechanisms. However, the accuracy of existing methods warrants improvement, and highly accurate methods often impose stringent equipment requirements. Moreover, most unsupervised learning-based approaches are constrained by the need to input the number of cell types a prior, which limits their widespread application. In this paper, we propose a novel algorithm framework named WLGG. Initially, to capture the underlying nonlinear information, we introduce a weighted distance penalty term utilizing the Gaussian kernel function, which maps data from a low-dimensional nonlinear space to a high-dimensional linear space. We subsequently impose a Lasso constraint on the regularized Gaussian graphical model to enhance its ability to capture linear data characteristics. Additionally, we utilize the Eigengap strategy to predict the number of cell types and obtain predicted labels via spectral clustering. The experimental results on 14 test datasets demonstrate the superior clustering accuracy of the WLGG algorithm over 16 alternative methods. 
Furthermore, downstream analysis, including marker gene identification, pseudotime inference, and functional enrichment analysis based on the similarity matrix and predicted labels from the WLGG algorithm, substantiates the reliability of WLGG and offers valuable insights into biological dynamic biological processes and regulatory mechanisms.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"25 6","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11562834/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142614904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we propose DGCL, a dual-graph neural network (GNN)-based contrastive learning (CL) method integrated with mixed molecular fingerprints (MFPs) for molecular property prediction. The DGCL-MFP method contains two stages. In the first, pretraining stage, we use two different GNNs as encoders to construct the contrastive pairs, rather than generating augmented graphs as in earlier work. Specifically, DGCL aggregates and enhances features of the same molecule with the Graph Isomorphism Network and the Graph Attention Network; the two representations extracted from the same molecule serve as a positive pair, and all others are treated as negatives. In the downstream training stage, features extracted from the two pretrained graph networks and the carefully selected MFPs are concatenated to predict molecular properties. Our experiments show that DGCL enhances the performance of existing GNNs by matching or surpassing state-of-the-art self-supervised learning models on multiple benchmark datasets. Specifically, DGCL increases the average performance on classification tasks by 3.73% and improves the performance on the regression task Lipo by 0.126. Through ablation studies, we validate the impact of network fusion strategies and MFPs on model performance. In addition, DGCL's predictive performance is further enhanced by weighting different molecular features based on the Extended Connectivity Fingerprint. The code and datasets of DGCL will be made publicly available.
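To make the pairing scheme concrete, here is a minimal sketch of a contrastive objective over two sets of embeddings of the same molecules. The abstract does not state the exact loss, so the InfoNCE-style form, the function name `cross_encoder_contrastive_loss`, and the `temperature` value are assumptions; the arrays stand in for the outputs of the two encoders, which are not modeled here.

```python
import numpy as np


def cross_encoder_contrastive_loss(z_a, z_b, temperature=0.1):
    """InfoNCE-style loss (assumed form): row i of z_a and row i of z_b
    embed the same molecule and form the positive pair; every other row
    of z_b acts as a negative for row i of z_a."""
    z_a = np.asarray(z_a, dtype=float)
    z_b = np.asarray(z_b, dtype=float)
    # L2-normalize so the dot product is a cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal.
    return float(-np.mean(np.diag(log_prob)))


# Placeholder embeddings for four molecules from two hypothetical encoders.
view_a = np.eye(4)
aligned_loss = cross_encoder_contrastive_loss(view_a, view_a)
# Shifting the rows pairs every molecule with the wrong partner.
mismatched_loss = cross_encoder_contrastive_loss(view_a, np.roll(view_a, 1, axis=0))
```

When the two views agree on which rows belong together the loss is close to zero, and it grows when positives are mismatched, which is the signal that pulls the two encoders toward consistent representations.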
{"title":"DGCL: dual-graph neural networks contrastive learning for molecular property prediction.","authors":"Xiuyu Jiang, Liqin Tan, Qingsong Zou","doi":"10.1093/bib/bbae474","DOIUrl":"https://doi.org/10.1093/bib/bbae474","url":null,"abstract":"<p><p>In this paper, we propose DGCL, a dual-graph neural networks (GNNs)-based contrastive learning (CL) integrated with mixed molecular fingerprints (MFPs) for molecular property prediction. The DGCL-MFP method contains two stages. In the first pretraining stage, we utilize two different GNNs as encoders to construct CL, rather than using the method of generating enhanced graphs as before. Precisely, DGCL aggregates and enhances features of the same molecule by the Graph Isomorphism Network and the Graph Attention Network, with representations extracted from the same molecule serving as positive samples, and others marked as negative ones. In the downstream tasks training stage, features extracted from the two above pretrained graph networks and the meticulously selected MFPs are concated together to predict molecular properties. Our experiments show that DGCL enhances the performance of existing GNNs by achieving or surpassing the state-of-the-art self-supervised learning models on multiple benchmark datasets. Specifically, DGCL increases the average performance of classification tasks by 3.73$%$ and improves the performance of regression task Lipo by 0.126. Through ablation studies, we validate the impact of network fusion strategies and MFPs on model performance. In addition, DGCL's predictive performance is further enhanced by weighting different molecular features based on the Extended Connectivity Fingerprint. 
The code and datasets of DGCL will be made publicly available.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"25 6","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11428321/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142341941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Single-cell cross-modal joint clustering has been extensively utilized to investigate the tumor microenvironment. Although numerous approaches have been suggested, accurate clustering remains the main challenge. First, the gene expression matrix frequently contains numerous missing values due to measurement limitations. The majority of existing clustering methods treat it as a typical multi-modal dataset without further processing. The few methods that do perform recovery before clustering do not sufficiently engage with the underlying research, leading to suboptimal outcomes. Additionally, existing cross-modal information fusion strategies do not ensure consistency of representations across different modalities, potentially leading to the integration of conflicting information, which could degrade performance. To address these challenges, we propose the 'Recover then Aggregate' strategy and introduce the Unified Cross-Modal Deep Clustering model. Specifically, we have developed a data augmentation technique based on neighborhood similarity that iteratively imposes rank constraints on the Laplacian matrix, thus updating the similarity matrix and recovering dropout events. Concurrently, we integrate cross-modal features and employ contrastive learning to align modality-specific representations with consistent ones, enhancing the effective integration of diverse modal information. Comprehensive experiments on five real-world multi-modal datasets have demonstrated this method's superior effectiveness in single-cell clustering tasks.
{"title":"Recover then aggregate: unified cross-modal deep clustering with global structural information for single-cell data.","authors":"Ziyi Wang, Peng Luo, Mingming Xiao, Boyang Wang, Tianyu Liu, Xiangyu Sun","doi":"10.1093/bib/bbae485","DOIUrl":"10.1093/bib/bbae485","url":null,"abstract":"<p><p>Single-cell cross-modal joint clustering has been extensively utilized to investigate the tumor microenvironment. Although numerous approaches have been suggested, accurate clustering remains the main challenge. First, the gene expression matrix frequently contains numerous missing values due to measurement limitations. The majority of existing clustering methods treat it as a typical multi-modal dataset without further processing. Few methods conduct recovery before clustering and do not sufficiently engage with the underlying research, leading to suboptimal outcomes. Additionally, the existing cross-modal information fusion strategy does not ensure consistency of representations across different modes, potentially leading to the integration of conflicting information, which could degrade performance. To address these challenges, we propose the 'Recover then Aggregate' strategy and introduce the Unified Cross-Modal Deep Clustering model. Specifically, we have developed a data augmentation technique based on neighborhood similarity, iteratively imposing rank constraints on the Laplacian matrix, thus updating the similarity matrix and recovering dropout events. Concurrently, we integrate cross-modal features and employ contrastive learning to align modality-specific representations with consistent ones, enhancing the effective integration of diverse modal information. 
Comprehensive experiments on five real-world multi-modal datasets have demonstrated this method's superior effectiveness in single-cell clustering tasks.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"25 6","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11445907/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142361070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Muhammad A Nawaz, Igor E Pamirsky, Kirill S Golokhvast
Bioinformatics has become an interdisciplinary subject due to its universal role in molecular biology research. The current status of bioinformatics research in Russia is not known. Here, we review the history of bioinformatics in Russia, present the current landscape, and highlight future directions and challenges. Bioinformatics research in Russia is driven by four major industries: information technology, pharmaceuticals, biotechnology, and agriculture. Over the past three decades, despite a delayed start, the field has gained momentum, especially in protein and nucleic acid research. Dedicated and shared centers for genomics, proteomics, and bioinformatics are active in different regions of Russia. Present-day bioinformatics in Russia is characterized by research issues related to genetics, metagenomics, OMICs, medical informatics, computational biology, environmental informatics, and structural bioinformatics. Notable developments are in the fields of software (tools, algorithms, and pipelines), use of high computation power (e.g. by the Siberian Supercomputer Center), and large-scale sequencing projects (the sequencing of 100 000 human genomes). Government funding is increasing, policies are being changed, and a National Genomic Information Database is being established. An increased focus on eukaryotic genome sequencing, the development of a common place for developers and researchers to share tools and data, and the use of biological modeling, machine learning, and biostatistics are key areas for future focus. Universities and research institutes have started to implement bioinformatics modules. A critical mass of bioinformaticians is essential to catch up with the global pace in the discipline.
{"title":"Bioinformatics in Russia: history and present-day landscape.","authors":"Muhammad A Nawaz, Igor E Pamirsky, Kirill S Golokhvast","doi":"10.1093/bib/bbae513","DOIUrl":"10.1093/bib/bbae513","url":null,"abstract":"<p><p>Bioinformatics has become an interdisciplinary subject due to its universal role in molecular biology research. The current status of Russia's bioinformatics research in Russia is not known. Here, we review the history of bioinformatics in Russia, present the current landscape, and highlight future directions and challenges. Bioinformatics research in Russia is driven by four major industries: information technology, pharmaceuticals, biotechnology, and agriculture. Over the past three decades, despite a delayed start, the field has gained momentum, especially in protein and nucleic acid research. Dedicated and shared centers for genomics, proteomics, and bioinformatics are active in different regions of Russia. Present-day bioinformatics in Russia is characterized by research issues related to genetics, metagenomics, OMICs, medical informatics, computational biology, environmental informatics, and structural bioinformatics. Notable developments are in the fields of software (tools, algorithms, and pipelines), use of high computation power (e.g. by the Siberian Supercomputer Center), and large-scale sequencing projects (the sequencing of 100 000 human genomes). Government funding is increasing, policies are being changed, and a National Genomic Information Database is being established. An increased focus on eukaryotic genome sequencing, the development of a common place for developers and researchers to share tools and data, and the use of biological modeling, machine learning, and biostatistics are key areas for future focus. Universities and research institutes have started to implement bioinformatics modules. 
A critical mass of bioinformaticians is essential to catch up with the global pace in the discipline.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"25 6","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11473191/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142458366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Understanding the functional impact of genetic mutations on protein structures is essential for advancing cancer research and developing targeted therapies. The main challenge lies in accurately mapping these mutations to protein structures and analysing their effects on protein function. To address this, Mut-Map (https://genemutation.org/) is a comprehensive computational pipeline designed to integrate mutation data from the Catalogue Of Somatic Mutations In Cancer database with protein structural data from the Protein Data Bank and AlphaFold models. The pipeline begins by taking a UniProt ID and proceeds through mapping corresponding Protein Data Bank structures, renumbering residues, and assessing disorder percentages. It then overlays mutation data, categorizes mutations based on structural context, and visualizes them using advanced tools like MolStar. This approach allows for a detailed analysis of how mutations may disrupt protein function by affecting key regions such as DNA interfaces, ligand-binding sites, and dimer interactions. To validate the pipeline, a case study on the TP53 gene, a critical tumour suppressor often mutated in cancers, was conducted. The analysis highlighted the most frequent mutations occurring at the DNA-binding interface, providing insights into their potential role in cancer progression. Mut-Map offers a powerful resource for elucidating the structural implications of cancer-associated mutations, paving the way for more targeted therapeutic strategies and advancing our understanding of protein structure-function relationships.
{"title":"Mut-Map: Comprehensive Computational Pipeline for Structural Mapping and Analysis of Cancer-Associated Mutations.","authors":"Ali F Alsulami","doi":"10.1093/bib/bbae514","DOIUrl":"https://doi.org/10.1093/bib/bbae514","url":null,"abstract":"<p><p>Understanding the functional impact of genetic mutations on protein structures is essential for advancing cancer research and developing targeted therapies. The main challenge lies in accurately mapping these mutations to protein structures and analysing their effects on protein function. To address this, Mut-Map (https://genemutation.org/) is a comprehensive computational pipeline designed to integrate mutation data from the Catalogue Of Somatic Mutations In Cancer database with protein structural data from the Protein Data Bank and AlphaFold models. The pipeline begins by taking a UniProt ID and proceeds through mapping corresponding Protein Data Bank structures, renumbering residues, and assessing disorder percentages. It then overlays mutation data, categorizes mutations based on structural context, and visualizes them using advanced tools like MolStar. This approach allows for a detailed analysis of how mutations may disrupt protein function by affecting key regions such as DNA interfaces, ligand-binding sites, and dimer interactions. To validate the pipeline, a case study on the TP53 gene, a critical tumour suppressor often mutated in cancers, was conducted. The analysis highlighted the most frequent mutations occurring at the DNA-binding interface, providing insights into their potential role in cancer progression. 
Mut-Map offers a powerful resource for elucidating the structural implications of cancer-associated mutations, paving the way for more targeted therapeutic strategies and advancing our understanding of protein structure-function relationships.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"25 6","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11483132/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142458387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sequences derived from organisms sharing common evolutionary origins exhibit similarity, while unique sequences, absent in related organisms, act as good diagnostic marker candidates. However, identifying dissimilar regions among closely related organisms is challenging because it requires complex multiple sequence alignments, which make computation and parsing difficult. To address this, we have developed a biologically inspired universal algorithm, NAUniSeq, to find unique sequences for microorganism diagnosis by traveling through the phylogeny of life. Mapping through a phylogenetic tree keeps cross-contamination and false positives low. We downloaded complete taxonomy data from the Taxadb database and sequence data from the National Center for Biotechnology Information Reference Sequence Database (NCBI RefSeq) and, with the help of NetworkX, created a phylogenetic tree. Sequences were assigned to the graph nodes, k-mers were created for target and non-target nodes, and the graph was searched using the depth-first search algorithm. In a memory-efficient alternative NoSQL approach, we created a collection of RefSeq sequences in a MongoDB database using the tax-id and the path of the FASTA files, and queried the MongoDB collection for the target and non-target sequences. In both approaches, we used an alignment-free, sliding-window, k-mer-based procedure that quickly compares k-mers of target and non-target sequences and returns unique sequences that are not present in the non-target. We validated our algorithm with the target nodes Mycobacterium tuberculosis, Neisseria gonorrhoeae, and Monkeypox and generated unique sequences. This universal algorithm is a powerful tool for generating diagnostic sequences, enabling the accurate identification of microbial strains with high phylogenetic precision.
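The alignment-free comparison described above can be sketched in a few lines. This is a simplified illustration, not the NAUniSeq code: the function names `kmers` and `unique_kmers` are hypothetical, sequences are passed as in-memory strings rather than read from FASTA files or MongoDB, and reverse complements are not considered.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings produced by a sliding window."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def unique_kmers(target_seqs, non_target_seqs, k):
    """Return target k-mers absent from every non-target sequence,
    in order of first appearance (hypothetical helper)."""
    non_target = set()
    for seq in non_target_seqs:
        non_target |= kmers(seq, k)

    unique = []
    seen = set()
    for seq in target_seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer not in non_target and kmer not in seen:
                seen.add(kmer)
                unique.append(kmer)
    return unique


# Toy example: the target shares its first two 3-mers with the non-target.
candidates = unique_kmers(["ACGTAC"], ["ACGTTT"], k=3)
# candidates == ["GTA", "TAC"]
```

Set membership makes each lookup constant time on average, which is why this kind of comparison avoids the cost of a multiple sequence alignment. In a tree-guided setting, the non-target list would be drawn from the neighbouring taxa reached by the graph traversal.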
{"title":"Advancing microbial diagnostics: a universal phylogeny guided computational algorithm to find unique sequences for precise microorganism detection.","authors":"Gulshan Kumar Sharma, Rakesh Sharma, Kavita Joshi, Sameer Qureshi, Shubhita Mathur, Sharad Sinha, Samit Chatterjee, Vandana Nunia","doi":"10.1093/bib/bbae545","DOIUrl":"https://doi.org/10.1093/bib/bbae545","url":null,"abstract":"<p><p>Sequences derived from organisms sharing common evolutionary origins exhibit similarity, while unique sequences, absent in related organisms, act as good diagnostic marker candidates. However, the approach focused on identifying dissimilar regions among closely-related organisms poses challenges as it requires complex multiple sequence alignments, making computation and parsing difficult. To address this, we have developed a biologically inspired universal NAUniSeq algorithm to find the unique sequences for microorganism diagnosis by traveling through the phylogeny of life. Mapping through a phylogenetic tree ensures a low number of cross-contamination and false positives. We have downloaded complete taxonomy data from Taxadb database and sequence data from National Center for Biotechnology Information Reference Sequence Database (NCBI-Refseq) and, with the help of NetworkX, created a phylogenetic tree. Sequences were assigned over the graph nodes, k-mers were created for target and non-target nodes and search was performed over the graph using the depth first search algorithm. In a memory efficient alternative NoSQL approach, we created a collection of Refseq sequences in MongoDB database using tax-id and path of FASTA files. We queried the MongoDB collection for the target and non-target sequences. In both the approaches, we used an alignment free sliding window k-mer-based procedure that quickly compares k-mers of target and non-target sequences and returns unique sequences that are not present in the non-target. 
We have validated our algorithm with target nodes Mycobacterium tuberculosis, Neisseria gonorrhoeae, and Monkeypox and generated unique sequences. This universal algorithm is a powerful tool for generating diagnostic sequences, enabling the accurate identification of microbial strains with high phylogenetic precision.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"25 6","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11497845/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142495335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing patient response to immune checkpoint inhibitors (ICIs) is crucial in cancer immunotherapy. We aim to create a data-driven mathematical model of the tumor immune microenvironment (TIME) and utilize deep reinforcement learning (DRL) to optimize patient-specific ICI therapy combined with chemotherapy (ICC). Using patients' genomic and transcriptomic data, we develop an ordinary differential equation (ODE)-based TIME dynamic evolutionary model to characterize interactions among chemotherapy, ICIs, immune cells, and tumor cells. A DRL agent is trained to determine the personalized optimal ICC therapy. Numerical experiments with real-world data demonstrate that the proposed TIME model can predict ICI therapy response. The DRL-derived personalized ICC therapy outperforms predefined fixed schedules. For tumors with extremely low CD8+ T cell infiltration ('extremely cold tumors'), the DRL agent recommends high-dosage chemotherapy alone. For tumors with higher CD8+ T cell infiltration ('cold' and 'hot tumors'), an appropriate chemotherapy dosage induces CD8+ T cell proliferation, enhancing ICI therapy outcomes. Specifically, for 'hot tumors', chemotherapy and ICI are administered simultaneously, while for 'cold tumors', a mid-dosage of chemotherapy makes the TIME 'hotter' before ICI administration. However, in several 'cold tumors' with rapid resistant tumor cell growth, ICC eventually fails. This study highlights the potential of utilizing real-world clinical data and a DRL algorithm to develop personalized optimal ICC by understanding the complex biological dynamics of a patient's TIME. Our ODE-based TIME dynamic evolutionary model offers a theoretical framework for determining the best use of ICI, and the proposed DRL agent may guide personalized ICC schedules.
{"title":"Optimized patient-specific immune checkpoint inhibitor therapies for cancer treatment based on tumor immune microenvironment modeling.","authors":"Yao Yao, Youhua Frank Chen, Qingpeng Zhang","doi":"10.1093/bib/bbae547","DOIUrl":"https://doi.org/10.1093/bib/bbae547","url":null,"abstract":"<p><p>Enhancing patient response to immune checkpoint inhibitors (ICIs) is crucial in cancer immunotherapy. We aim to create a data-driven mathematical model of the tumor immune microenvironment (TIME) and utilize deep reinforcement learning (DRL) to optimize patient-specific ICI therapy combined with chemotherapy (ICC). Using patients' genomic and transcriptomic data, we develop an ordinary differential equations (ODEs)-based TIME dynamic evolutionary model to characterize interactions among chemotherapy, ICIs, immune cells, and tumor cells. A DRL agent is trained to determine the personalized optimal ICC therapy. Numerical experiments with real-world data demonstrate that the proposed TIME model can predict ICI therapy response. The DRL-derived personalized ICC therapy outperforms predefined fixed schedules. For tumors with extremely low CD8 + T cell infiltration ('extremely cold tumors'), the DRL agent recommends high-dosage chemotherapy alone. For tumors with higher CD8 + T cell infiltration ('cold' and 'hot tumors'), an appropriate chemotherapy dosage induces CD8 + T cell proliferation, enhancing ICI therapy outcomes. Specifically, for 'hot tumors', chemotherapy and ICI are administered simultaneously, while for 'cold tumors', a mid-dosage of chemotherapy makes the TIME 'hotter' before ICI administration. However, in several 'cold tumors' with rapid resistant tumor cell growth, ICC eventually fails. This study highlights the potential of utilizing real-world clinical data and DRL algorithm to develop personalized optimal ICC by understanding the complex biological dynamics of a patient's TIME. 
Our ODE-based TIME dynamic evolutionary model offers a theoretical framework for determining the best use of ICI, and the proposed DRL agent may guide personalized ICC schedules.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"25 6","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11503752/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142495341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using multi-omics data for clustering (cancer subtyping) is crucial for precision medicine research. Although numerous methods have been proposed, current approaches either do not perform satisfactorily or lack biological interpretability, limiting their practical application. Based on the biological hypothesis that patients with the same subtype may exhibit similar dysregulated pathways, we developed an Iterative Pathway Fusion approach for enhanced Multi-omics Clustering (IPFMC), a novel multi-omics clustering method involving two data fusion stages. In the first stage, omics data are partitioned at each layer using pathway information, with crucial pathways iteratively selected to represent samples. Ultimately, the representation information from multiple pathways is integrated. In the second stage, similarity network fusion is applied to integrate the representation information from multiple omics. Comparative experiments with nine cancer datasets from The Cancer Genome Atlas (TCGA), involving systematic comparisons with 10 representative methods, reveal that IPFMC outperforms these methods. Additionally, the biological pathways and genes identified by our approach hold biological significance, affirming not only its excellent clustering performance but also its biological interpretability.
{"title":"IPFMC: an iterative pathway fusion approach for enhanced multi-omics clustering in cancer research.","authors":"Haoyang Zhang, Sha Liu, Bingxin Li, Xionghui Zhou","doi":"10.1093/bib/bbae541","DOIUrl":"10.1093/bib/bbae541","url":null,"abstract":"<p><p>Using multi-omics data for clustering (cancer subtyping) is crucial for precision medicine research. Despite numerous methods having been proposed, current approaches either do not perform satisfactorily or lack biological interpretability, limiting the practical application of these methods. Based on the biological hypothesis that patients with the same subtype may exhibit similar dysregulated pathways, we developed an Iterative Pathway Fusion approach for enhanced Multi-omics Clustering (IPFMC), a novel multi-omics clustering method involving two data fusion stages. In the first stage, omics data are partitioned at each layer using pathway information, with crucial pathways iteratively selected to represent samples. Ultimately, the representation information from multiple pathways is integrated. In the second stage, similarity network fusion was applied to integrate the representation information from multiple omics. Comparative experiments with nine cancer datasets from The Cancer Genome Atlas (TCGA), involving systematic comparisons with 10 representative methods, reveal that IPFMC outperforms these methods. 
Additionally, the biological pathways and genes identified by our approach hold biological significance, affirming not only its excellent clustering performance but also its biological interpretability.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"25 6","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11514061/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142520982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wenwen Min, Zhiceng Shi, Jun Zhang, Jun Wan, Changmiao Wang
In recent years, the advent of spatial transcriptomics (ST) technology has unlocked unprecedented opportunities for delving into the complexities of gene expression patterns within intricate biological systems. Despite its transformative potential, the prohibitive cost of ST technology remains a significant barrier to its widespread adoption in large-scale studies. An alternative, more cost-effective strategy involves employing artificial intelligence to predict gene expression levels using readily accessible whole-slide images stained with Hematoxylin and Eosin (H&E). However, existing methods have yet to fully capitalize on the multimodal information provided by H&E images and ST data with spatial location. In this paper, we propose mclSTExp, a multimodal contrastive learning framework with Transformer and DenseNet-121 encoders for Spatial Transcriptomics Expression prediction. We conceptualize each spot as a "word", integrating its intrinsic features with spatial context through the self-attention mechanism of a Transformer encoder. This integration is further enriched by incorporating image features via contrastive learning, thereby enhancing the predictive capability of our model. We conducted an extensive evaluation of highly variable genes in two breast cancer datasets and a skin squamous cell carcinoma dataset, and the results demonstrate that mclSTExp exhibits superior performance in predicting spatial gene expression. Moreover, mclSTExp has shown promise in interpreting cancer-specific overexpressed genes, elucidating immune-related genes, and identifying specialized spatial domains annotated by pathologists. Our source code is available at https://github.com/shizhiceng/mclSTExp.
{"title":"Multimodal contrastive learning for spatial gene expression prediction using histology images.","authors":"Wenwen Min, Zhiceng Shi, Jun Zhang, Jun Wan, Changmiao Wang","doi":"10.1093/bib/bbae551","DOIUrl":"https://doi.org/10.1093/bib/bbae551","url":null,"abstract":"<p><p>In recent years, the advent of spatial transcriptomics (ST) technology has unlocked unprecedented opportunities for delving into the complexities of gene expression patterns within intricate biological systems. Despite its transformative potential, the prohibitive cost of ST technology remains a significant barrier to its widespread adoption in large-scale studies. An alternative, more cost-effective strategy involves employing artificial intelligence to predict gene expression levels using readily accessible whole-slide images stained with Hematoxylin and Eosin (H&E). However, existing methods have yet to fully capitalize on multimodal information provided by H&E images and ST data with spatial location. In this paper, we propose mclSTExp, a multimodal contrastive learning with Transformer and Densenet-121 encoder for Spatial Transcriptomics Expression prediction. We conceptualize each spot as a \"word\", integrating its intrinsic features with spatial context through the self-attention mechanism of a Transformer encoder. This integration is further enriched by incorporating image features via contrastive learning, thereby enhancing the predictive capability of our model. We conducted an extensive evaluation of highly variable genes in two breast cancer datasets and a skin squamous cell carcinoma dataset, and the results demonstrate that mclSTExp exhibits superior performance in predicting spatial gene expression. Moreover, mclSTExp has shown promise in interpreting cancer-specific overexpressed genes, elucidating immune-related genes, and identifying specialized spatial domains annotated by pathologists. 
Our source code is available at https://github.com/shizhiceng/mclSTExp.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"25 6","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142543789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
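The mclSTExp abstract above describes aligning image features with spot expression features via contrastive learning. As a rough illustration only, the following sketch computes a symmetric, CLIP-style contrastive loss between paired embeddings. The function names, the temperature value, and the choice of this particular loss are assumptions made for the example; the abstract does not specify the loss used by the authors, and their implementation is in the repository linked above.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def log_softmax(row):
    """Numerically stable log-softmax of a list of scores."""
    top = max(row)
    log_sum = top + math.log(sum(math.exp(x - top) for x in row))
    return [x - log_sum for x in row]


def symmetric_contrastive_loss(image_emb, spot_emb, temperature=0.1):
    """Hypothetical CLIP-style loss: the i-th image patch and the i-th spot
    form the positive pair; all other pairings in the batch are negatives."""
    images = [l2_normalize(v) for v in image_emb]
    spots = [l2_normalize(v) for v in spot_emb]
    n = len(images)
    # Cosine similarity between every image and every spot, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(img, spot)) / temperature for spot in spots]
        for img in images
    ]
    # Image-to-spot direction: each row should peak on its diagonal entry.
    loss_i2s = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    # Spot-to-image direction: the same, applied to the columns.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_s2i = -sum(log_softmax(columns[k])[k] for k in range(n)) / n
    return 0.5 * (loss_i2s + loss_s2i)


# Toy batch: correctly paired embeddings versus deliberately mismatched ones.
aligned = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                     [[1.0, 0.0], [0.0, 1.0]])
shuffled = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                      [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the correctly paired embeddings give a much smaller loss than the mismatched ones, which is the behaviour a contrastive objective is meant to reward.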
Shisheng Wang, Wenjuan Zeng, Yin Yang, Jingqiu Cheng, Dan Liu, Hao Yang
Cisplatin is one of the most commonly used chemotherapy drugs for treating solid tumors. As a genotoxic agent, cisplatin binds to DNA and forms platinum-DNA adducts that cause DNA damage and activate a series of signaling pathways mediated by various DNA-binding proteins (DBPs), ultimately leading to cell death. Therefore, DBPs play crucial roles in the cellular response to cisplatin and in determining cell fate. However, systematic studies of DBPs responding to cisplatin damage and their temporal dynamics are still lacking. To address this, we developed a novel and user-friendly stand-alone software, DEWNA, designed for dynamic entropy weight network analysis to reveal the dynamic changes of DBPs and their functions. DEWNA utilizes the entropy weight method, multiscale embedded gene co-expression network analysis and generalized reporter score-based analysis to process time-course proteome expression data, helping scientists identify protein hubs and pathway entropy profiles during disease progression. We applied DEWNA to a dataset of DBPs from A549 cells responding to cisplatin-induced damage across 8 time points, with data generated by data-independent acquisition mass spectrometry (DIA-MS). The results demonstrate that DEWNA can effectively identify protein hubs and associated pathways that are significantly altered in response to cisplatin-induced DNA damage, and offer a comprehensive view of how different pathways interact and respond dynamically over time to cisplatin treatment. Notably, we observed the dynamic activation of distinct DNA repair pathways and cell death mechanisms during the drug treatment time course, providing new insights into the molecular mechanisms underlying the cellular response to DNA damage.
{"title":"DEWNA: dynamic entropy weight network analysis and its application to the DNA-binding proteome in A549 cells with cisplatin-induced damage.","authors":"Shisheng Wang, Wenjuan Zeng, Yin Yang, Jingqiu Cheng, Dan Liu, Hao Yang","doi":"10.1093/bib/bbae564","DOIUrl":"10.1093/bib/bbae564","url":null,"abstract":"<p><p>Cisplatin is one of the most commonly used chemotherapy drugs for treating solid tumors. As a genotoxic agent, cisplatin binds to DNA and forms platinum-DNA adducts that cause DNA damage and activate a series of signaling pathways mediated by various DNA-binding proteins (DBPs), ultimately leading to cell death. Therefore, DBPs play crucial roles in the cellular response to cisplatin and in determining cell fate. However, systematic studies of DBPs responding to cisplatin damage and their temporal dynamics are still lacking. To address this, we developed a novel and user-friendly stand-alone software, DEWNA, designed for dynamic entropy weight network analysis to reveal the dynamic changes of DBPs and their functions. DEWNA utilizes the entropy weight method, multiscale embedded gene co-expression network analysis and generalized reporter score-based analysis to process time-course proteome expression data, helping scientists identify protein hubs and pathway entropy profiles during disease progression. We applied DEWNA to a dataset of DBPs from A549 cells responding to cisplatin-induced damage across 8 time points, with data generated by data-independent acquisition mass spectrometry (DIA-MS). The results demonstrate that DEWNA can effectively identify protein hubs and associated pathways that are significantly altered in response to cisplatin-induced DNA damage, and offer a comprehensive view of how different pathways interact and respond dynamically over time to cisplatin treatment. 
Notably, we observed the dynamic activation of distinct DNA repair pathways and cell death mechanisms during the drug treatment time course, providing new insights into the molecular mechanisms underlying the cellular response to DNA damage.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"25 6","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11530294/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142563963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
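The DEWNA abstract above names the entropy weight method as one of its components. The sketch below shows the textbook form of that method on a small non-negative matrix, assuming rows are observations (for example, time points) and columns are features (for example, proteins). The function name and the data layout are assumptions for illustration; the abstract does not say how DEWNA arranges or preprocesses its input.

```python
import math


def entropy_weights(matrix):
    """Textbook entropy weight method (illustrative, not DEWNA's own code).

    matrix: list of rows; each row holds non-negative values, and every
    column must have a positive sum. Requires at least two rows.
    Returns one weight per column; the weights sum to 1.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    scale = 1.0 / math.log(n_rows)  # scales each column's entropy into [0, 1]
    divergences = []
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        total = sum(column)
        entropy = 0.0
        for value in column:
            p = value / total  # share of this observation within the column
            if p > 0:
                entropy -= scale * p * math.log(p)
        # A column that is uniform across rows has entropy 1 and divergence 0.
        divergences.append(1.0 - entropy)
    norm = sum(divergences)
    return [d / norm for d in divergences]


# Column 0 is constant across rows, so it carries no discriminating information.
weights = entropy_weights([[1.0, 1.0], [1.0, 3.0], [1.0, 8.0]])
```

Columns whose values vary more across rows receive larger weights, while a constant column receives a weight of approximately zero.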