Briefings in Functional Genomics: Latest Articles

A survey of biclustering and clustering methods in clustering different types of single-cell RNA sequencing data.
IF 2.5 Zone 3 (Biology) Q3 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-01-15 DOI: 10.1093/bfgp/elaf010
Chaowang Lan, Xiaoqi Tang, Caihua Liu

Single-cell RNA sequencing (scRNA-seq) technology has garnered considerable attention as it enables the exploration of cellular heterogeneity from a single-cell perspective. Various unsupervised methods, such as biclustering and clustering methods, offer a theoretical foundation for understanding the structure and function of cells. However, accurately identifying cell subtypes within complex scRNA-seq data remains challenging. To evaluate the current development status; summarize the strengths, weaknesses, and improvement strategies of unsupervised methods; and provide guidelines for future research, we surveyed five biclustering and 21 clustering methods applied to different types of scRNA-seq datasets. We employed three external and two internal metrics to determine clustering performance on 10 publicly available real datasets. Dataset properties are quantified from six perspectives to discover the most suitable biclustering or clustering methods. The results of this survey indicate that biclustering methods are effective for identifying local consistency or for deeply mining partially annotated datasets. Conversely, clustering methods are more suitable for dealing with unknown datasets. This survey aids in identifying cellular heterogeneity by recommending appropriate methods based on different dataset characteristics.
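The abstract does not name the three external metrics it used. As a hedged illustration of what an external clustering metric measures, the sketch below computes the adjusted Rand index (ARI), a commonly used choice that compares predicted clusters with annotated cell types. The function name and the toy labels are assumptions made for this example, not taken from the survey.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index between two flat labelings of the same cells."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label lists must have the same length")
    # Contingency counts: how many cells share each (true, predicted) pair.
    pairs = Counter(zip(true_labels, pred_labels))
    rows = Counter(true_labels)
    cols = Counter(pred_labels)
    same_both = sum(comb(c, 2) for c in pairs.values())
    same_true = sum(comb(c, 2) for c in rows.values())
    same_pred = sum(comb(c, 2) for c in cols.values())
    total = comb(len(true_labels), 2)
    if total == 0:
        return 1.0  # fewer than two cells: nothing to disagree about
    expected = same_true * same_pred / total
    max_index = (same_true + same_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (same_both - expected) / (max_index - expected)


# Hypothetical annotation and clustering result for six cells.
annotated = ["T", "T", "B", "B", "NK", "NK"]
clusters = [2, 2, 0, 0, 1, 1]
print(adjusted_rand_index(annotated, clusters))
```

ARI depends only on which cells are grouped together, so relabeling the clusters does not change the score. Internal metrics, by contrast, are computed from the expression data alone and need no annotation.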

Citations: 0
Expression and role of deubiquitinating enzymes in thyroid carcinoma.
IF 2.5 Zone 3 (Biology) Q3 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-01-15 DOI: 10.1093/bfgp/elaf022
Meiling Huang, Changjiao Yan, Rui Ling, Ting Wang

Thyroid cancer is one of the most common endocrine diseases worldwide and shows phenotypic heterogeneity. Deubiquitinating enzymes (DUBs) modulate signaling induced by ubiquitin (Ub)-conjugating enzymes by removing Ub from substrates. Dysregulation of DUBs is associated with cancer progression, including that of thyroid carcinoma. In this review, we outline the main classification and structure of DUBs, the expression of DUBs in thyroid cancer, the association of DUBs with survival, and the possible mechanisms of DUBs in thyroid cancer progression. Finally, we summarize the development of USP-specific inhibitors and the strategies for designing and identifying selective inhibitors.

Citations: 0
A review of artificial intelligence-based brain age estimation and its applications for related diseases.
IF 2.5 Zone 3 (Biology) Q3 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-01-15 DOI: 10.1093/bfgp/elae042
Mohamed Azzam, Ziyang Xu, Ruobing Liu, Lie Li, Kah Meng Soh, Kishore B Challagundla, Shibiao Wan, Jieqiong Wang

The study of brain age has emerged over the past decade, aiming to estimate a person's age based on brain imaging scans. Ideally, predicted brain age should match chronological age in healthy individuals. However, brain structure and function change in the presence of brain-related diseases. Consequently, brain age also changes in affected individuals, making the brain age gap (BAG), defined as the difference between brain age and chronological age, a potential biomarker for brain health, early screening, and identifying age-related cognitive decline and disorders. With the recent successes of artificial intelligence in healthcare, it is essential to track the latest advancements and highlight promising directions. This review paper presents recent machine learning techniques used in brain age estimation (BAE) studies. Typically, BAE models involve developing a machine learning regression model to capture age-related variations in brain structure from imaging scans of healthy individuals and automatically predict brain age for new subjects. The process also involves estimating BAG as a measure of brain health. While we discuss recent clinical applications of BAE methods, we also review studies of biological age that can be integrated into BAE research. Finally, we point out the current limitations of BAE studies.
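The pipeline described above (fit a regression on healthy individuals, predict brain age for a new subject, then subtract chronological age) can be sketched in a few lines. This is a minimal illustration, assuming a single hypothetical imaging feature and ordinary least squares; the feature values, ages, and function names are invented for the example, and the methods covered by the review are considerably richer.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def brain_age_gap(feature, chronological_age, model):
    """BAG = predicted brain age minus chronological age."""
    slope, intercept = model
    predicted_brain_age = slope * feature + intercept
    return predicted_brain_age - chronological_age


# Hypothetical training set: one imaging-derived feature per healthy subject.
healthy_feature = [700.0, 650.0, 600.0, 550.0]
healthy_age = [30.0, 40.0, 50.0, 60.0]
model = fit_line(healthy_feature, healthy_age)

# New subject whose feature value looks older than their chronological age.
gap = brain_age_gap(feature=575.0, chronological_age=45.0, model=model)
print(f"brain age gap: {gap:+.1f} years")
```

In this toy data the new subject's predicted brain age is about 55, so the gap is roughly ten years above chronological age. A positive gap is the kind of signal the abstract proposes as a candidate biomarker.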

Citations: 0
EnsembleSE: identification of super-enhancers based on ensemble learning.
IF 2.5 Zone 3 (Biology) Q3 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-01-15 DOI: 10.1093/bfgp/elaf003
Wenying He, Jialu Xu, Yun Zuo, Yude Bai, Fei Guo

Super-enhancers (SEs) are typically located in the regulatory regions of genes, driving high-level gene expression. Identifying SEs is crucial for a deeper understanding of gene regulatory networks, disease mechanisms, and the development and physiological processes of organisms, thus exerting a profound impact on research and applications in the life sciences. Traditional experimental methods for identifying SEs are costly and time-consuming. Existing methods for predicting SEs based solely on sequence data use deep learning for feature representation and have achieved good results. However, they overlook biological features related to physicochemical properties, leading to low interpretability. Additionally, the complex model structure often requires extensive labeled data for training, which limits their further application to biological data. In this paper, we integrate the strengths of different models and propose an ensemble model based on an integration strategy to enhance the model's generalization ability. We design a multi-angle feature representation method that combines local structure and global information to extract high-dimensional abstract relationships and key low-dimensional biological features from sequences. This enhances the effectiveness and interpretability of the model's input features, providing technical support for discovering cell-specific and species-specific patterns of SEs. We evaluated the performance on both mouse and human datasets using five metrics, including area under the receiver operating characteristic curve, accuracy, and others. Compared to the latest models, EnsembleSE achieved an average improvement of 4.5% in F1 score and an average improvement of 8.05% in recall, demonstrating the robustness and adaptability of the model on a unified test set. Source code is available at https://github.com/2103374200/EnsembleSE-main.
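The abstract does not specify the integration strategy. As a hedged sketch of one common option, the code below combines the predicted probabilities of several base classifiers by soft voting (averaging) and then computes recall and F1, the two metrics the abstract reports. The probabilities, labels, threshold, and function names are hypothetical and are not taken from the EnsembleSE repository.

```python
def soft_vote(model_probs, threshold=0.5):
    """Average per-sample probabilities across models, then threshold."""
    n_models = len(model_probs)
    n_samples = len(model_probs[0])
    averaged = [
        sum(probs[i] for probs in model_probs) / n_models
        for i in range(n_samples)
    ]
    return [1 if p >= threshold else 0 for p in averaged]


def recall_and_f1(y_true, y_pred):
    """Recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    if precision + recall == 0:
        return recall, 0.0
    return recall, 2 * precision * recall / (precision + recall)


# Hypothetical SE probabilities from three base models for four sequences.
model_probs = [
    [0.9, 0.2, 0.6, 0.4],
    [0.8, 0.4, 0.5, 0.7],
    [0.7, 0.3, 0.7, 0.1],
]
y_true = [1, 0, 1, 1]
y_pred = soft_vote(model_probs)
recall, f1 = recall_and_f1(y_true, y_pred)
print(y_pred, round(recall, 3), round(f1, 3))
```

Averaging probabilities lets a confident model outweigh uncertain ones, which is one reason soft voting is often preferred over majority voting when base models are calibrated.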

Citations: 0
amplysis: an R package for microbial composition and diversity analysis using 16S rRNA amplicon data.
IF 2.5 Zone 3 (Biology) Q3 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-01-15 DOI: 10.1093/bfgp/elaf017
Ziming Su, Xinyu Zhang, Qiming Wang, Qianwei Tang, Dan Yang, Yaqing Liu

The downstream analysis of 16S rRNA sequencing data remains a significant challenge for researchers lacking extensive bioinformatics expertise, often requiring proficiency in diverse tools and methodologies. To address this, we present amplysis, an R package designed to streamline the analysis and visualization of 16S rRNA amplicon sequencing data through an intuitive, code-light workflow. amplysis integrates data importing, processing, statistical analysis, and visualization into a unified framework. Key features include data normalization, microbial composition profiling, alpha/beta diversity analysis, ordination methods (e.g. Principal Component Analysis), and publication-ready visualization tools. The package's utility was demonstrated through three case studies, one of which analyzed microbial community responses to hexachlorocyclohexane (HCH) degradation in groundwater environments. Using amplysis, we efficiently generated phylum/genus-level abundance plots, alpha-diversity indices, and Principal Coordinates Analysis ordination, revealing significant shifts in community structure and diversity under HCH stress. The other case studies utilized publicly available data from published studies by other researchers. These results underscore the package's ability to simplify complex analyses while ensuring reproducibility and high-quality output. By integrating modular, user-friendly functions, amplysis lowers the barrier to robust microbiome data exploration. The package is available on GitHub (https://github.com/min-perilla/amplysis), offering a valuable resource for researchers in microbial ecology and environmental genomics.
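To make the alpha/beta diversity terms concrete, the sketch below computes the Shannon index (a typical alpha-diversity measure) and Bray-Curtis dissimilarity (a typical beta-diversity measure) from raw taxon counts. It is written in Python for illustration only and does not use or reproduce the amplysis API; which indices the package actually reports is not stated in the abstract, and the counts are invented.

```python
from math import log


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two samples over the same taxa."""
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must list the same taxa in the same order")
    denominator = sum(sample_a) + sum(sample_b)
    if denominator == 0:
        raise ValueError("both samples are empty")
    numerator = sum(abs(a - b) for a, b in zip(sample_a, sample_b))
    return numerator / denominator


# Hypothetical genus-level counts for a control and a stressed sample.
control = [40, 30, 20, 10]
stressed = [85, 10, 5, 0]
print(round(shannon_index(control), 3), round(shannon_index(stressed), 3))
print(round(bray_curtis(control, stressed), 3))
```

A community dominated by one taxon yields a lower Shannon index than an even one, and a Bray-Curtis matrix over all sample pairs is a typical input to Principal Coordinates Analysis.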

Citations: 0
VARGG: a deep learning framework advancing precise spatial domain identification and cellular heterogeneity analysis in spatial transcriptomics.
IF 2.5 Zone 3 (Biology) Q3 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-01-15 DOI: 10.1093/bfgp/elaf018
Mengqiu Wang, Zhiwei Zhang, Lixin Lei, Kaitai Han, Zhenghui Wang, Ruoyan Dai, Zijun Wang, Chaojing Shi, Xudong Zhao, Qianjin Guo

Spatial transcriptomics has revolutionized our ability to measure gene expression while preserving spatial information, thus facilitating detailed analysis of tissue structure and function. Identifying spatial domains accurately is key to understanding tissue microenvironments and biological progression. To overcome the challenge of integrating gene expression data with spatial information, we introduce the VARGG deep learning framework. VARGG combines a pretrained Vision Transformer (ViT) with a graph neural network autoencoder, utilizing ViT's self-attention mechanism to capture global contextual information and enhance understanding of spatial relationships. This framework is further enhanced by multi-layer gated residual graph neural networks and Gaussian noise, which improve feature representation and model generalizability across different data sources. The robustness and scalability of VARGG have been verified on different platforms (10x Visium, Slide-seqV2, Stereo-seq, and MERFISH) and on datasets of different sizes (human glioblastoma, mouse embryo, breast cancer). Our results demonstrate that VARGG's ability to accurately delineate spatial domains can provide a deeper understanding of tissue structure and help identify key molecular markers and potential therapeutic targets, thereby improving our understanding of disease mechanisms and informing the development of personalized treatment strategies.

Citations: 0
Crosstalk between genomic variants and DNA methylation in FLT3 mutant acute myeloid leukemia.
IF 2.5 Zone 3 (Biology) Q3 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-01-15 DOI: 10.1093/bfgp/elae028
Bac Dao, Van Ngu Trinh, Huy V Nguyen, Hoa L Nguyen, Thuc Duy Le, Phuc Loi Luu

Acute myeloid leukemia (AML) is a type of blood cancer with diverse genetic variations and DNA methylation alterations. By studying the interaction of gene mutations, expression, and DNA methylation, we aimed to gain valuable insights into the processes that lead to blocked differentiation in AML. We analyzed TCGA-LAML data (173 samples) with RNA sequencing and DNA methylation arrays, comparing FLT3 mutant (48) and wild-type (125) cases. We conducted differential gene expression analysis using cBioPortal, identified DNA methylation differences with the ChAMP tool, and correlated them with gene expression changes. Gene set enrichment analysis (g:Profiler) revealed significant biological processes and pathways. ShinyGo and GeneCards were used to find potential transcription factors and their binding sites among significant genes. We found significant differentially expressed genes (DEGs) negatively correlated with their most significant methylation probes (Pearson correlation coefficient of -0.49, P-value < 0.001) between FLT3 mutant and wild-type groups. Moreover, our exploration of 450k CpG sites uncovered a global hypomethylated status in 168 DEGs. Notably, these methylation changes were enriched in the promoter regions of Homeobox superfamily genes, which are crucial in transcriptional regulatory pathways in blood cancer. Furthermore, in FLT3 mutant AML patient samples, we observed overexpression of WT1, a transcription factor known to bind homeobox family genes. This finding suggests a potential mechanism by which WT1 recruits TET2 to demethylate specific genomic regions. Integrating gene expression and DNA methylation analyses shed light on the impact of FLT3 mutations on cancer cell development and differentiation, supporting a two-hit model in AML. This research advances understanding of AML and fosters the development of targeted therapeutic strategies.
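The reported negative association between methylation and expression is a Pearson correlation. The sketch below shows how such a coefficient is computed for one gene across samples. The methylation beta values and expression values are made up for illustration and are unrelated to the TCGA-LAML data; the study itself relied on existing tools rather than code like this.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical promoter methylation (beta values) and expression (log2 scale)
# for one gene in six samples: lower methylation pairs with higher expression.
methylation = [0.82, 0.75, 0.60, 0.41, 0.30, 0.22]
expression = [2.1, 2.9, 3.0, 4.8, 5.5, 5.1]
print(round(pearson_r(methylation, expression), 2))
```

A coefficient near -1 indicates a strong inverse linear relationship; the -0.49 reported in the abstract corresponds to a moderate one.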

Citations: 0
A dynamic model of gene activation in response to hypoxia accounting for both HIF-1 and HIF-2.
IF 2.5 Zone 3 Biology Q3 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-01-15 DOI: 10.1093/bfgp/elaf021
Aleksandra Cabaj, Agata Charzyńska, Adrianna Moszyńska, Maciej Jaśkiewicz, Rafał Bartoszewski, Michał Dąbrowski

We developed an ordinary differential equation (ODE) model of hypoxia signaling that accounts for HIF-2α in addition to HIF-1α. The model has two parts: the first describes the production and degradation of the α subunits of HIF-1 and HIF-2 and their accumulation in response to hypoxia; the second describes how the α subunits cooperate with the β subunit in binding to cis-regulatory regions and activating HIF-target genes in response to hypoxia. In our previous work [1], using the first part of the model trained on time-series data from 0.9% hypoxia, we successfully predicted the response of the system to a further drop in oxygen to 0.3% hypoxia. This modeling result helped explain the mechanism by which control switches from HIF-1 to HIF-2 during the response of human primary endothelial cells to hypoxia. In another work [2], we experimentally demonstrated a linear proportionality between the counts of motifs assigned to HIF-1 in promoter open chromatin regions of genes and the effects of HIF-1 on the induction of these genes under hypoxia. We furthermore showed that such proportionality is predicted by the subset of the ODE model of Nguyen et al. (2013) [3] that is shared with the second part of our ODE model. In the current work, we provide the details of our full ODE model and show that it predicts that HIF-1β can be a limiting factor in the response to hypoxia.
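To make the "production and degradation" idea concrete, the following is a minimal sketch of a single α-subunit equation, dA/dt = k_prod - k_deg(O2) · A, integrated with forward Euler. It is not the authors' model: the saturating form of the degradation rate, the function names, and every parameter value are hypothetical and chosen only for illustration.

```python
def degradation_rate(oxygen_pct, k_max=1.0, k_half=2.0):
    """Hypothetical oxygen-dependent degradation rate of an alpha subunit.

    Assumption: degradation saturates with oxygen (Michaelis-Menten-like form).
    k_max and k_half are illustrative values, not fitted parameters.
    """
    return k_max * oxygen_pct / (k_half + oxygen_pct)


def simulate_alpha(oxygen_pct, k_prod=0.1, a0=0.0, t_end=200.0, dt=0.01):
    """Integrate dA/dt = k_prod - k_deg(O2) * A with forward Euler.

    Returns the alpha-subunit level at t_end (arbitrary units).
    """
    k_deg = degradation_rate(oxygen_pct)
    a = a0
    for _ in range(int(t_end / dt)):
        a += dt * (k_prod - k_deg * a)
    return a


if __name__ == "__main__":
    # Lower oxygen -> slower degradation -> higher accumulated level.
    for o2 in (21.0, 0.9, 0.3):
        print(f"O2 = {o2:>4}%  alpha level at t_end = {simulate_alpha(o2):.4f}")
```

In this toy setting the level settles at k_prod / k_deg(O2), so lowering oxygen raises the accumulated α-subunit level. The full model described in the abstract additionally couples two α subunits to a shared β subunit, which this sketch does not attempt.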

Citations: 0
Dual oncogenic roles of TPD52 and TPD52L2 in gastric cancer progression via PI3K/AKT activation and immunosuppressive microenvironment remodeling.
IF 2.5 Zone 3 Biology Q3 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-01-15 DOI: 10.1093/bfgp/elaf015
Hailong Li, Xiaqing Gao, Shuangming Guo, Shenfei Gao, Chunting Yang, Rong Su, Zhe Jing, Shuping Qiu, Ping Tang, Jing Han

Aim: TPD52 (tumor protein D52) and TPD52L2 (tumor protein D52-like 2), members of the TPD52 gene family, have been implicated in multiple malignancies. However, their roles in gastric cancer (GC) remain elusive. Herein, we integrated multiomics analyses and experimental validation to elucidate their prognostic and functional significance in GC.

Methods: Using The Cancer Genome Atlas (TCGA), Gene Expression Omnibus (GEO), and tissue microarray datasets, we analyzed TPD52/TPD52L2 expression patterns in patients with GC. Survival analysis, Cox regression, and nomogram construction were performed to assess prognostic value. Gene Ontology and Kyoto Encyclopedia of Genes and Genomes functional enrichment analyses, together with immune infiltration evaluation using CIBERSORTx (Cell-type Identification By Estimating Relative Subsets Of RNA Transcripts) and ESTIMATE (Estimation of STromal and Immune cells in MAlignant Tumour tissues using Expression data), were conducted to explore the molecular mechanisms involved. In vitro experiments (cell proliferation, migration, invasion, and apoptosis assays) were performed via lentivirus-mediated gene knockdown in the gastric cancer cell lines AGS and MKN45.

Results: TPD52 and TPD52L2 were significantly overexpressed in GC tissues compared with their normal counterparts. Elevated TPD52L2 expression was significantly associated with advanced Tumor, Node, Metastasis (TNM) stage, and multivariate Cox regression identified it as an independent predictor of reduced overall survival. Diagnostic receiver operating characteristic (ROC) curves yielded area-under-the-curve values of 0.813 (TPD52) and 0.807 (TPD52L2). Functional experiments suggested that TPD52/TPD52L2 knockdown inhibited proliferation and migration and induced G0/G1 arrest and apoptosis. Mechanistically, TPD52/TPD52L2 silencing suppressed PI3K/Akt serine/threonine kinase (AKT)/mammalian target of rapamycin (mTOR) signaling and epithelial-mesenchymal transition marker expression.

Conclusion: TPD52 and TPD52L2 are promising prognostic biomarkers in GC, with TPD52L2 exhibiting greater clinical relevance. Targeting these proteins may disrupt oncogenic signaling pathways and enhance immunotherapy efficacy, warranting further investigation in clinical trials.

Citations: 0
DeepMEns: an ensemble model for predicting sgRNA on-target activity based on multiple features.
IF 2.5 Zone 3 Biology Q3 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2025-01-15 DOI: 10.1093/bfgp/elae043
Shumei Ding, Jia Zheng, Cangzhi Jia

The CRISPR/Cas9 system developed from Streptococcus pyogenes (SpCas9) has high potential in gene editing. However, its successful application is hindered by the considerable variability in target efficiencies across different single guide RNAs (sgRNAs). Although several deep learning models have been created to predict sgRNA on-target activity, the intrinsic mechanisms of these models are difficult to explain, and there is still scope for improvement in prediction performance. To overcome these issues, we propose DeepMEns, an interpretable ensemble model based on deep learning, to predict sgRNA on-target activity. Using five different training and validation datasets, we constructed five sub-regressors, each comprising three parts. The first part uses one-hot encoding, wherein a 0-1 representation of the secondary structure is used as the input to a convolutional neural network (CNN) with a Transformer encoder. The second part uses the DNA shape feature matrix as the input to a CNN with a Transformer encoder. The third part uses positional encoding feature matrices as the input to a long short-term memory network with an attention mechanism. These three parts are concatenated through a flatten layer, and the final prediction is the average of the five sub-regressors. Extensive benchmarking experiments indicated that DeepMEns achieved the highest Spearman correlation coefficient on 6 of 10 independent test datasets compared with previous predictors, suggesting that DeepMEns can achieve state-of-the-art performance. Moreover, the ablation analysis indicated that the ensemble strategy may improve the performance of the prediction model.

Citations: 0