Briefings in Bioinformatics: Latest Articles

kMetaShot: a fast and reliable taxonomy classifier for metagenome-assembled genomes.
IF 6.8 | Zone 2 (Biology) | Q1 Biochemical Research Methods | Pub Date: 2024-11-22 | DOI: 10.1093/bib/bbae680
Giuseppe Defazio, Marco Antonio Tangaro, Graziano Pesole, Bruno Fosso

The advent of high-throughput sequencing (HTS) technologies unlocked the complexity of the microbial world through the development of metagenomics, which now provides an unprecedented and comprehensive overview of its taxonomic and functional contribution in a huge variety of macro- and micro-ecosystems. In particular, shotgun metagenomics allows the reconstruction of microbial genomes, through the assembly of reads into MAGs (metagenome-assembled genomes). In fact, MAGs represent an information-rich proxy for inferring the taxonomic composition and the functional contribution of microbiomes, even if the relevant analytical approaches are not trivial and still improvable. In this regard, tools like CAMITAX and GTDBtk have implemented complex approaches, relying on marker gene identification and sequence alignments, requiring a large processing time. With the aim of deploying an effective tool for fast and reliable MAG taxonomic classification, we present here kMetaShot, a taxonomy classifier based on k-mer/minimizer counting. We benchmarked kMetaShot against CAMITAX and GTDBtk by using both in silico and real mock communities and demonstrated how, while implementing a fast and concise algorithm, it outperforms the other tools in terms of classification accuracy. Additionally, kMetaShot is an easy-to-install and easy-to-use bioinformatic tool that is also suitable for researchers with few command-line skills. It is available and documented at https://github.com/gdefazio/kMetaShot.
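To make the phrase "k-mer/minimizer counting" concrete, here is a minimal sketch of the general technique. It is not kMetaShot's implementation: the function names, the lexicographic ordering (real tools usually order k-mers by a hash and handle reverse complements), and the "most shared minimizers wins" rule are all assumptions made for illustration.

```python
from collections import Counter


def minimizer_counts(seq: str, k: int, w: int) -> Counter:
    """Count how often each k-mer is chosen as the minimizer of a window.

    A window is w consecutive k-mers; its minimizer here is the
    lexicographically smallest k-mer (an illustrative choice of ordering).
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter()
    for start in range(len(kmers) - w + 1):
        counts[min(kmers[start:start + w])] += 1
    return counts


def best_match(query: Counter, references: dict) -> str:
    """Return the reference label that shares the most distinct minimizers
    with the query (a hypothetical, deliberately naive scoring rule)."""
    return max(
        references,
        key=lambda name: len(query.keys() & references[name].keys()),
    )


# Toy reference "genomes" keyed by hypothetical taxon labels.
references = {
    "taxon_A": minimizer_counts("ACGTACGTAC", k=3, w=2),
    "taxon_B": minimizer_counts("TTTTGGGGCC", k=3, w=2),
}
query = minimizer_counts("ACGTACGT", k=3, w=2)
print(best_match(query, references))
```

Because only one k-mer per window is kept, the index is much smaller than a full k-mer table, which is the usual reason minimizer schemes are faster than alignment-based classification.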

Briefings in Bioinformatics, vol. 26, no. 1. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11695915/pdf/
Citations: 0
Identifying cancer prognosis genes through causal learning.
IF 6.8 | Zone 2 (Biology) | Q1 Biochemical Research Methods | Pub Date: 2024-11-22 | DOI: 10.1093/bib/bbae721
Siwei Wu, Chaoyi Yin, Yuezhu Wang, Huiyan Sun

Accurate identification of causal genes for cancer prognosis is critical for estimating disease progression and guiding treatment interventions. In this study, we propose CPCG (Cancer Prognosis's Causal Gene), a two-stage framework identifying gene sets causally associated with patient prognosis across diverse cancer types using transcriptomic data. Initially, an ensemble approach models gene expression's impact on survival with parametric and semiparametric hazard models. Subsequently, an iterative conditional independence test combined with graph pruning is utilized to infer the causal skeleton, thereby pinpointing prognosis-related genes. Experiments on transcriptomic data from 18 cancer types sourced from The Cancer Genome Atlas Project demonstrate CPCG's effectiveness in predicting prognosis under four evaluation metrics. Validations on 24 additional datasets covering 12 cancer types from the Gene Expression Omnibus and the Chinese Glioma Genome Atlas Project further demonstrate CPCG's robustness and generalizability. CPCG identifies a concise but reliable set of genes, obviating the need for gene combination enumeration for survival time estimation. These genes are also shown to be closely linked to crucial biological processes in cancer. Moreover, CPCG constructs a stable causal skeleton and is insensitive to the order of data shuffling. Overall, CPCG is a powerful tool for extracting cancer prognostic biomarkers, offering interpretability, generalizability, and robustness. CPCG holds promise for facilitating targeted interventions in clinical treatment strategies.

Briefings in Bioinformatics, vol. 26, no. 1. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11729728/pdf/
Citations: 0
Complex hierarchical structures analysis in single-cell data with Poincaré deep manifold transformation.
IF 6.8 | Zone 2 (Biology) | Q1 Biochemical Research Methods | Pub Date: 2024-11-22 | DOI: 10.1093/bib/bbae687
Yongjie Xu, Zelin Zang, Bozhen Hu, Yue Yuan, Cheng Tan, Jun Xia, Stan Z Li

Single-cell RNA sequencing (scRNA-seq) offers remarkable insights into cellular development and differentiation by capturing the gene expression profiles of individual cells. The role of dimensionality reduction and visualization in the interpretation of scRNA-seq data has gained wide acceptance. However, current methods face several challenges, including incomplete structure-preserving strategies and high distortion in embeddings, which fail to effectively model complex cell trajectories with multiple branches. To address these issues, we propose the Poincaré deep manifold transformation (PoincaréDMT) method, which maps high-dimensional scRNA-seq data to a hyperbolic Poincaré disk. This approach preserves global structure from a graph Laplacian matrix while achieving local structure correction through a structure module combined with data augmentation. Additionally, PoincaréDMT alleviates batch effects by integrating a batch graph that accounts for batch labels into the low-dimensional embeddings during network training. Furthermore, PoincaréDMT introduces the Shapley additive explanations method based on the trained model to identify the important marker genes in specific clusters and cell differentiation processes. Therefore, PoincaréDMT provides a unified framework for multiple key tasks essential for scRNA-seq analysis, including trajectory inference, pseudotime inference, batch correction, and marker gene selection. We validate PoincaréDMT through extensive evaluations on both simulated and real scRNA-seq datasets, demonstrating its superior performance in preserving global and local data structures compared to existing methods.
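As background for the hyperbolic embedding mentioned above, the following sketch computes the standard distance on the Poincaré disk, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). It only illustrates the geometry; the function name is hypothetical and nothing here reproduces the PoincaréDMT network or its loss.

```python
import math


def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit disk."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit disk")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # Distances blow up near the boundary because the denominator shrinks.
    denom = (1.0 - sq_norm_u) * (1.0 - sq_norm_v)
    return math.acosh(1.0 + 2.0 * sq_diff / denom)


origin = (0.0, 0.0)
print(poincare_distance(origin, (0.5, 0.0)))  # equals ln(3)
print(poincare_distance(origin, (0.9, 0.0)))  # grows much faster than 0.9 / 0.5
```

The rapid growth of distance toward the rim is what gives tree-like, multi-branch trajectories room to spread out with low distortion in only two dimensions.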

Briefings in Bioinformatics, vol. 26, no. 1. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11757945/pdf/
Citations: 0
Identify potential drug candidates within a high-quality compound search space.
IF 6.8 | Zone 2 (Biology) | Q1 Biochemical Research Methods | Pub Date: 2024-11-22 | DOI: 10.1093/bib/bbaf024
Xiaoqing Ru, Shulin Zhao, Quan Zou, Lifeng Xu

The identification of potentially effective drug candidates is a fundamental step in new drug discovery, with profound implications for pharmaceutical research and the healthcare sector. While many computational methods have been developed for such predictions and have yielded promising results, two challenges persist: (i) the cold-start problem of new drugs, which increases the difficulty of prediction due to the lack of historical data or prior knowledge; and (ii) the vastness of the compound search space for potential drug candidates. In this study, we present a promising method that not only enhances the accuracy of identifying potential novel drug candidates but also refines the search space. Drawing inspiration from solutions to the cold-start problem in recommender systems, we apply 'learning to rank' techniques to the field of new drug discovery. Furthermore, we propose using three similarity metrics to condense the compound search space into compact yet high-quality spaces, allowing for more efficient screening of potential drug candidates. Experimental results from two widely used datasets demonstrate that our method outperforms other state-of-the-art approaches in the new drug cold-start scenario. Additionally, we have verified that it is feasible to identify potential drug candidates within these high-quality compound search spaces. To our knowledge, this study is the first to address the drug cold-start problem in such a confined space, potentially providing valuable insights and guidance for drug screening.
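The idea of condensing a search space with a similarity metric can be sketched as follows. The abstract does not name its three metrics, so Tanimoto similarity over fingerprint bit sets is an assumption chosen only because it is a common chemoinformatics measure; the function names, compound labels, and threshold are likewise hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def condense_search_space(query: set, library: dict, threshold: float) -> list:
    """Keep library compounds whose similarity to the query meets the
    threshold, sorted from most to least similar."""
    scored = [(name, tanimoto(query, bits)) for name, bits in library.items()]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Toy fingerprints: each compound is the set of "on" bit positions.
library = {"c1": {1, 2, 3, 4}, "c2": {1, 2, 9}, "c3": {7, 8}}
print(condense_search_space({1, 2, 3}, library, threshold=0.5))
```

A ranking model would then only need to score the retained compounds rather than the whole library.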

Briefings in Bioinformatics, vol. 26, no. 1. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11758506/pdf/
Citations: 0
Deciphering cell states and the cellular ecosystem to improve risk stratification in acute myeloid leukemia.
IF 6.8 | Zone 2 (Biology) | Q1 Biochemical Research Methods | Pub Date: 2024-11-22 | DOI: 10.1093/bib/bbaf028
Zheyang Zhang, Ronghan Tang, Ming Zhu, Zhijuan Zhu, Jiali Zhu, Hua Li, Mengsha Tong, Nainong Li, Jialiang Huang

Acute myeloid leukemia (AML) demonstrates significant cellular heterogeneity in both leukemic and immune cells, providing valuable insights into clinical outcomes. Here, we constructed an AML single-cell transcriptome atlas and proposed the sciNMF workflow to systematically dissect the underlying cellular heterogeneity. Notably, sciNMF identified 26 leukemic and immune cell states that are linked to clinical variables, mutations, and prognosis. By examining the co-existence patterns among these cell states, we highlighted a unique AML cellular ecosystem (ACE) that signifies an aberrant tumor milieu and poor survival, which is confirmed by public RNA-seq cohorts. We further developed the ACE signature (ACEsig), comprising 12 genes, which accurately predicts AML prognosis and outperforms existing signatures. When applied to cytogenetically normal AML or intensively treated patients, the ACEsig continues to demonstrate strong performance. Our results demonstrate that large-scale systematic characterization of cellular heterogeneity has the potential to enhance our understanding of AML heterogeneity and contribute to a more precise risk stratification strategy.

Briefings in Bioinformatics, vol. 26, no. 1. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11770069/pdf/
Citations: 0
Predicting transcriptional changes induced by molecules with MiTCP.
IF 6.8 | Zone 2 (Biology) | Q1 Biochemical Research Methods | Pub Date: 2024-11-22 | DOI: 10.1093/bib/bbaf006
Kaiyuan Yang, Jiabei Cheng, Shenghao Cao, Xiaoyong Pan, Hong-Bin Shen, Ye Yuan

Studying the changes in cellular transcriptional profiles induced by small molecules can significantly advance our understanding of cellular state alterations and response mechanisms under chemical perturbations, which plays a crucial role in drug discovery and screening processes. Considering that experimental measurements need substantial time and cost, we developed a deep learning-based method called Molecule-induced Transcriptional Change Predictor (MiTCP) to predict changes in transcriptional profiles (CTPs) of 978 landmark genes induced by molecules. MiTCP utilizes graph neural network-based approaches to simultaneously model molecular structure representation and gene co-expression relationships, and integrates them for CTP prediction. After training on the L1000 dataset, MiTCP achieves an average Pearson correlation coefficient (PCC) of 0.482 on the test set and an average PCC of 0.801 for predicting the top 50 differentially expressed genes, which outperforms other existing methods. Furthermore, we used MiTCP to predict CTPs of three cancer drugs, palbociclib, irinotecan and goserelin, and performed gene enrichment analysis on the top differentially expressed genes and found that the enriched pathways and Gene Ontology terms are highly relevant to the corresponding diseases, which reveals the potential of MiTCP in drug development.
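The two reported metrics, an average PCC over all landmark genes and a PCC restricted to the top differentially expressed genes, can be illustrated with the sketch below. How the paper ranks genes is not stated in the abstract; selecting the k genes with the largest absolute measured change is an assumption, and the function names and toy values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def top_k_pearson(measured, predicted, k):
    """PCC restricted to the k genes with the largest absolute measured change."""
    order = sorted(range(len(measured)), key=lambda i: abs(measured[i]), reverse=True)
    top = order[:k]
    return pearson([measured[i] for i in top], [predicted[i] for i in top])


# Toy expression changes for four genes (not real L1000 data).
measured = [3.0, -2.0, 0.1, 0.0]
predicted = [2.0, -1.0, 5.0, -5.0]
print(pearson(measured, predicted))
print(top_k_pearson(measured, predicted, k=2))
```

In this toy case the prediction is poor on the near-unchanged genes but correct on the strongly changed ones, so the top-k score is higher than the all-gene score, the same direction as the gap between 0.482 and 0.801 reported above.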

Briefings in Bioinformatics, vol. 26, no. 1. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11756340/pdf/
Citations: 0
DeepPFP: a multi-task-aware architecture for protein function prediction.
IF 6.8 | Zone 2 (Biology) | Q1 Biochemical Research Methods | Pub Date: 2024-11-22 | DOI: 10.1093/bib/bbae579
Han Wang, Zilin Ren, Jinghong Sun, Yongbing Chen, Xiaochen Bo, JiGuo Xue, Jingyang Gao, Ming Ni

Deriving protein function from protein sequences poses a significant challenge due to the intricate relationship between sequence and function. Deep learning has made remarkable strides in predicting sequence-function relationships. However, models tailored for specific tasks or protein types encounter difficulties when using transfer learning across domains. This is attributed to the fact that protein function relies heavily on structural characteristics rather than mere sequence information. Consequently, there is a pressing need for a model capable of capturing shared features among diverse sequence-function mapping tasks to address the generalization issue. In this study, we explore the potential of Model-Agnostic Meta-Learning combined with a protein language model called Evolutionary Scale Modeling to tackle this challenge. Our approach involves training the architecture on five out-of-domain deep mutational scanning (DMS) datasets and evaluating its performance across four key dimensions. Our findings demonstrate that the proposed architecture exhibits satisfactory performance in terms of generalization and employs an effective few-shot learning strategy. Specifically, compared to the best results, the Pearson's correlation coefficient (PCC) in the final stage increased by ~0.31%. Furthermore, we leverage the trained architecture to predict binding affinity scores of the DMS dataset of SARS-CoV-2 using transfer learning. Notably, training on a subset of the Ube4b dataset with 500 samples resulted in an improvement of 0.11 in the PCC. These results underscore the potential of our conceptual architecture as a promising methodology for multi-task protein function prediction.

Briefings in Bioinformatics, vol. 26, no. 1. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11794456/pdf/
Citations: 0
iDOMO: identification of drug combinations via multi-set operations for treating diseases.
IF 6.8 | Zone 2 (Biology) | Q1 Biochemical Research Methods | Pub Date: 2024-11-22 | DOI: 10.1093/bib/bbaf054
Xianxiao Zhou, Ling Wu, Minghui Wang, Guojun Wu, Bin Zhang

Combination therapy has become increasingly important for treating complex diseases, which often involve multiple pathways and targets. However, experimental screening of drug combinations is costly and time-consuming. The availability of large-scale transcriptomic datasets (e.g. CMap and LINCS) from in vitro drug treatment experiments makes it possible to computationally predict drug combinations with synergistic effects. Towards this end, we developed a computational approach, termed Identification of Drug Combinations via Multi-Set Operations (iDOMO), to predict drug synergy based on multi-set operations of drug and disease gene signatures. iDOMO quantifies the synergistic effect of a pair of drugs by taking into account the combination's beneficial and detrimental effects on treating a disease. We evaluated iDOMO in a DREAM Challenge dataset with matched pre- and post-treatment gene expression data and cell viability information. We further evaluated the performance of iDOMO by concordance index and Spearman correlation on predicting the Highest Single Agent (HSA) synergy scores for the four most common cancer types in two large-scale drug combination databases, showing that iDOMO significantly outperformed two existing popular drug combination approaches, the Therapeutic Score and the SynergySeq Orthogonality Score. Application of iDOMO to triple-negative breast cancer (TNBC) identified drug pairs with potential synergistic effects, with the combination of trifluridine and monobenzone being the most synergistic. Our in vitro experiments confirmed that the top predicted drug combination exerted a significant synergistic effect in inhibiting TNBC cell growth. In summary, iDOMO is an effective method for the in silico screening of synergistic drug combinations and will be a valuable tool for the development of novel therapeutics for complex diseases.
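The evaluation described above compares predicted synergy against Highest Single Agent (HSA) scores using a concordance index. The sketch below shows one common form of both quantities under stated assumptions: effects are expressed as fractional inhibition, HSA excess is the combination effect minus the better single drug, and prediction ties count as half-concordant. It does not reproduce iDOMO's multi-set scoring, and all names and values are hypothetical.

```python
from itertools import combinations


def hsa_synergy(effect_a: float, effect_b: float, effect_combo: float) -> float:
    """HSA excess: combination effect minus the better single-drug effect."""
    return effect_combo - max(effect_a, effect_b)


def concordance_index(observed, predicted) -> float:
    """Fraction of comparable pairs that the predictions order correctly."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(observed)), 2):
        if observed[i] == observed[j]:
            continue  # pairs tied on the observed score are not comparable
        comparable += 1
        obs_diff = observed[i] - observed[j]
        pred_diff = predicted[i] - predicted[j]
        if pred_diff == 0:
            concordant += 0.5  # tied prediction counts as half
        elif (obs_diff > 0) == (pred_diff > 0):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy drug pairs: (effect of A, effect of B, effect of A+B) as fractional inhibition.
pairs = [(0.3, 0.5, 0.8), (0.4, 0.4, 0.5), (0.6, 0.2, 0.55)]
observed = [hsa_synergy(a, b, ab) for a, b, ab in pairs]
predicted = [0.9, 0.4, 0.1]  # hypothetical model scores
print(concordance_index(observed, predicted))
```

A concordance index of 1.0 means the predicted ranking matches the observed ranking on every comparable pair, and 0.5 is what random scores would give on average.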

Citations: 0
CSGDN: contrastive signed graph diffusion network for predicting crop gene-phenotype associations.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-11-22 DOI: 10.1093/bib/bbaf062
Yiru Pan, Xingyu Ji, Jiaqi You, Lu Li, Zhenping Liu, Xianlong Zhang, Zeyu Zhang, Maojun Wang

Predicting positive and negative associations between genes and phenotypes helps to illustrate the underlying mechanisms of complex traits in organisms. The transcription and regulation activity of specific genes is adjusted according to cell type, developmental timepoint, and physiological state. Two problems arise in obtaining the positive/negative associations between genes and phenotypes: (1) high-throughput DNA/RNA sequencing and phenotyping are expensive and time-consuming due to the need to process large sample sizes; (2) experiments introduce both random and systematic errors, while calculations or predictions using software or models may produce noise. To address these two issues, we propose a Contrastive Signed Graph Diffusion Network, CSGDN, to learn robust node representations with fewer training samples and achieve higher link prediction accuracy. CSGDN uses a signed graph diffusion method to uncover the underlying regulatory associations between genes and phenotypes. Then, stochastic perturbation strategies are used to create two views for both the original and diffusive graphs. Lastly, a multiview contrastive learning loss is designed to unify the node representations learned from the two views to resist interference and reduce noise. We perform experiments to validate the performance of CSGDN on three crop datasets: Gossypium hirsutum, Brassica napus, and Triticum turgidum. The results show that the proposed model outperforms state-of-the-art methods by up to 9.28% AUC for the prediction of link sign in the G. hirsutum dataset. The source code of our model is available at https://github.com/Erican-Ji/CSGDN.

Citations: 0
TriTan: an efficient triple nonnegative matrix factorization method for integrative analysis of single-cell multiomics data.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-11-22 DOI: 10.1093/bib/bbae615
Xin Ma, Lijing Lin, Qian Zhao, Mudassar Iqbal

Single-cell multiomics has opened up tremendous opportunities for understanding the gene regulatory networks underlying cell states by simultaneously profiling transcriptomes, epigenomes, and proteomes of the same cell. However, existing computational methods for integrative analysis of these high-dimensional multiomics data are either computationally expensive or limited in interpretation. These limitations pose challenges in the implementation of these methods in large-scale studies and hinder a more in-depth understanding of the underlying regulatory mechanisms. Here, we propose TriTan (Triple inTegrative fast non-negative matrix factorization), an efficient joint factorization method for single-cell multiomics data. TriTan implements a highly efficient factorization algorithm, greatly improving its computational performance. The three-matrix factorization produced by TriTan helps in clustering cells, identifying signature features for each cell type, and uncovering feature associations across omics, which facilitates the identification of domains of regulatory chromatin and the prediction of cell-type-specific regulatory networks. We applied TriTan to single-cell multiomics data obtained from different technologies and benchmarked it against state-of-the-art methods, where it shows highly competitive performance. Furthermore, we showed a range of downstream analyses conducted using TriTan outputs, highlighting its capacity to facilitate interpretation in biological discovery.
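
To make the idea of a three-matrix factorization concrete, here is a minimal sketch of generic nonnegative matrix tri-factorization, X ≈ F S Gᵀ, fitted with standard multiplicative updates on a single toy matrix. It is not TriTan's algorithm (which factorizes several omics layers jointly), and all names (`tri_factorize`, `k_rows`, `k_cols`) and the toy data are assumptions made for illustration. Plain Python lists are used only to keep the example dependency-free.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def multiplicative_step(m, numer, denom, eps=1e-9):
    """Element-wise update m * numer / denom; entries stay nonnegative."""
    return [[v * n / (d + eps) for v, n, d in zip(rm, rn, rd)]
            for rm, rn, rd in zip(m, numer, denom)]


def squared_error(x, f, s, g):
    """Squared Frobenius norm of X - F S G^T."""
    approx = matmul(matmul(f, s), transpose(g))
    return sum((a - b) ** 2 for ra, rb in zip(x, approx) for a, b in zip(ra, rb))


def tri_factorize(x, k_rows, k_cols, n_iter=200, seed=0):
    """Fit X ~ F S G^T with F (n x k_rows), S (k_rows x k_cols), G (m x k_cols)."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rows)]

    n, m = len(x), len(x[0])
    f, s, g = rand(n, k_rows), rand(k_rows, k_cols), rand(m, k_cols)
    errors = [squared_error(x, f, s, g)]
    for _ in range(n_iter):
        gs_t = matmul(g, transpose(s))  # G S^T, shape m x k_rows
        f = multiplicative_step(f, matmul(x, gs_t),
                                matmul(f, matmul(transpose(gs_t), gs_t)))
        fs = matmul(f, s)  # F S, shape n x k_cols
        g = multiplicative_step(g, matmul(transpose(x), fs),
                                matmul(g, matmul(transpose(fs), fs)))
        ft, gt = transpose(f), transpose(g)
        s = multiplicative_step(s, matmul(matmul(ft, x), g),
                                matmul(matmul(matmul(ft, f), s), matmul(gt, g)))
        errors.append(squared_error(x, f, s, g))
    return f, s, g, errors


# Toy 6 x 4 "cells x features" matrix with two visible blocks (hypothetical data).
x = [[5, 4, 0, 0], [4, 5, 0, 1], [5, 5, 1, 0],
     [0, 0, 3, 4], [0, 1, 4, 3], [1, 0, 3, 3]]
f, s, g, errors = tri_factorize(x, k_rows=2, k_cols=2)
print(f"initial error {errors[0]:.2f}, final error {errors[-1]:.2f}")
```

In this generic setup, the rows of F can be read as soft cluster memberships for cells, the rows of G as memberships for features, and S as the strength of association between cell clusters and feature clusters, which loosely mirrors the three uses the abstract attributes to the factor matrices.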

Citations: 0