
BMC Bioinformatics: Latest Articles

AllergenAI: a deep learning model predicting allergenicity based on protein sequence.
IF 3.3 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-11-18 DOI: 10.1186/s12859-025-06302-1
Jiajia Liu, Surendra S Negi, Chengyuan Yang, Xiaobo Zhou, Catherine H Schein, Werner Braun, Pora Kim
BMC Bioinformatics 26(1), 279. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12625376/pdf/
Citations: 0
scMFF: a machine learning framework with multiple feature fusion strategies for cell type identification.
IF 3.3 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-11-18 DOI: 10.1186/s12859-025-06309-8
Nan Sun, Yu Wang, Xiang Shi, Dengcheng Yang, Rongling Wu, Stephen S-T Yau

Accurate cell type classification is critical for downstream analysis in single-cell RNA sequencing (scRNA-seq). Most existing methods rely on a single type of feature representation, such as statistical, information-theoretic, matrix factorization, or deep learning-based features. However, each captures different aspects of the data, and no single feature type can fully represent the complex differences between cell types. Moreover, naïvely concatenating multiple features may introduce redundancy or noise, reducing model performance. To address these challenges, we propose scMFF, a multiple-feature fusion framework that integrates four features and explores six fusion strategies in combination with various classifiers for single-cell type classification. Comprehensive evaluations on 42 disease-related datasets and an external COVID-19 dataset demonstrate that scMFF outperforms single-feature approaches in both performance and stability, providing a reliable and effective solution for scRNA-seq data analysis.

BMC Bioinformatics 26(1), 277. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12625116/pdf/
Citations: 0
Talk2Biomodels: AI agent-based open-source LLM initiative for kinetic biological models.
IF 3.3 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-11-18 DOI: 10.1186/s12859-025-06310-1
Lilija Wehling, Gurdeep Singh, Ahmad Wisnu Mulyadi, Rakesh Hadne Sreenath, Henning Hermjakob, Tung V N Nguyen, Thomas Rückle, Mohammed H Mosa, Henrik Cordes, Tommaso Andreani, Thomas Klabunde, Rahuman S Malik Sheriff, Douglas McCloskey

Background: Quantitative kinetic models of biological regulatory processes play an important role in understanding disease mechanisms. However, their simulation and analysis require specialized domain expertise.

Results: In this study, we present Talk2Biomodels (T2B), an open-source, user-friendly, large language model-based agentic AI platform designed to facilitate access to computational models of biological systems and promote FAIRification (adherence to the Findability, Accessibility, Interoperability, and Reusability principles) in systems biology. T2B allows users to interact with and analyse mathematical models of biological systems through conversations in natural language, thereby lowering the barrier to entry for model interpretation and hypothesis-driven exploration. The platform natively supports models encoded in the Systems Biology Markup Language, a widely adopted standard in the computational biology community. T2B is integrated with the BioModels database (https://www.ebi.ac.uk/biomodels/), enabling retrieval, simulation, and analysis of curated systems biology models. We illustrate the platform's capabilities through use cases in precision medicine, infectious disease epidemiology, and the study of emergent network-level properties in cellular systems, demonstrating how both computational experts and domain scientists without formal modelling training can derive actionable insights from complex biological models. Talk2Biomodels is available at https://github.com/VirtualPatientEngine/AIAgents4Pharma. Detailed documentation and use cases are available at https://virtualpatientengine.github.io/AIAgents4Pharma/talk2biomodels/intro/.

Conclusions: In summary, T2B lowers the barrier for non-experts to engage with and extract insights from computational models of biological systems, provides experts with a streamlined interface for analysing models, and contributes overall to the FAIRification of models.

BMC Bioinformatics 26(1), 276. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12625589/pdf/
Citations: 0
Machine learning for genomic prediction of growth traits in aquaculture: a case study of the Australasian snapper (Chrysophrys auratus).
IF 3.3 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-11-18 DOI: 10.1186/s12859-025-06287-x
Ze Chen, Julie Blommaert, Yi Mei, Linley Jesson, Maren Wellenreuther, Mengjie Zhang

Background: Chrysophrys auratus (family: Sparidae), commonly known as Australasian snapper, is a warm-water species being developed as a candidate for aquaculture in New Zealand. Genomic selection of elite snapper offers significant potential to accelerate genetic gains in aquaculture; however, the complexity of genetic architecture, coupled with challenges such as missing data and high dimensionality, poses significant hurdles. Machine learning techniques have emerged as powerful tools in genomic selection programmes due to their flexibility and ability to model complex, polygenic and non-linear relationships between genotypes and traits. This study aims to develop a comprehensive machine learning framework to evaluate imputation methods and genomic prediction models, and identify single-nucleotide polymorphisms associated with growth traits in snapper, ultimately contributing to the advancement of selective breeding programmes.

Results: We evaluated multiple approaches for each component of the machine learning framework. We developed and evaluated the Domain Knowledge-based K-nearest neighbour (DK-KNN) imputation method, achieving a notably high imputation accuracy of 98.33% in simulation testing, outperforming two alternative imputation methods. Among feature selection and classification combinations evaluated for growth prediction, Chi-squared feature selection paired with Distance-Weighted Discrimination (Chi2-DWD) achieved 60% prediction accuracy, comparable to genomic best linear unbiased prediction (60.3%) but without requiring the genomic relationship matrix. Notably, the two-stage approach using Domain Knowledge-based Pre-filtering (DK Pre-filtering) did not substantially change prediction accuracy, and it proved valuable in reducing the dimensionality of the feature space.

Conclusions: Integrating domain knowledge into machine learning frameworks effectively addresses missing values and high dimensionality in snapper genomic data. The evaluated framework shows that Chi2-DWD is a promising combination for genomic prediction tasks. The DK Pre-filtering workflow removes redundant features without affecting model performance. Selected features showed biological significance and were confirmed to be associated with growth traits based on biological analysis, providing valuable insights for selective breeding programmes.
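
The Chi-squared feature selection step mentioned in the Results can be illustrated with a small sketch. This is not the authors' code: it assumes genotypes coded as 0/1/2, a binary growth class per fish, and a plain Pearson chi-squared statistic on the genotype-by-class contingency table. The names chi2_stat, top_k_snps, snp_a and snp_b are hypothetical, and the toy values are made up for illustration.

```python
from collections import Counter

def chi2_stat(genotypes, labels):
    """Pearson chi-squared statistic of the genotype-by-class contingency table."""
    n = len(genotypes)
    row = Counter(genotypes)             # genotype marginal counts
    col = Counter(labels)                # class marginal counts
    obs = Counter(zip(genotypes, labels))
    stat = 0.0
    for g in row:
        for c in col:
            expected = row[g] * col[c] / n   # count expected under independence
            stat += (obs[(g, c)] - expected) ** 2 / expected
    return stat

def top_k_snps(matrix, labels, k):
    """Rank SNPs by chi-squared score and keep the k highest."""
    scores = {snp: chi2_stat(g, labels) for snp, g in matrix.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

# Toy data (assumption): six fish, two SNPs, binary growth class.
labels = [1, 1, 1, 0, 0, 0]
matrix = {
    "snp_a": [2, 2, 2, 0, 0, 0],   # genotype tracks the class exactly
    "snp_b": [0, 1, 0, 1, 0, 1],   # genotype only weakly related to the class
}
selected, scores = top_k_snps(matrix, labels, k=1)
print(selected, scores)
```

In this toy case snp_a scores 6.0 and snp_b about 0.67, so only snp_a is kept. The selected columns would then be passed to a classifier; the abstract pairs this step with Distance-Weighted Discrimination, which is not reproduced here.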

BMC Bioinformatics 26(1), 278. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12625465/pdf/
Citations: 0
SeqForge: a scalable platform for alignment-based searches, motif detection, and sequence curation across meta/genomic datasets.
IF 3.3 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-11-18 DOI: 10.1186/s12859-025-06297-9
Elijah R Bring Horvath, Jaclyn M Winter

Background: The rapid increase in publicly available microbial and metagenomic data has created a growing demand for tools that can efficiently perform custom large-scale comparative searches and functional annotation. While BLAST+ remains the standard for sequence similarity searches, population-level studies often require custom scripting and manual curation of results, which can present barriers for many researchers.

Results: We developed SeqForge, a scalable, modular command-line toolkit that streamlines alignment-based searches and motif mining across large genomic datasets. SeqForge automates BLAST+ database creation and querying, integrates amino acid motif discovery, enables sequence and contig extraction, and curates results into structured, easily parsed formats. The platform supports diverse input formats, parallelized execution for high-performance computing environments, and built-in visualization tools. Benchmarking demonstrates that SeqForge achieves near-linear runtime scaling for computationally intensive modules while maintaining modest memory usage.

Conclusions: SeqForge lowers the computational barrier for large-scale meta/genomic exploration, enabling researchers to perform population-scale BLAST searches, motif detection, and sequence curation without custom scripting. The toolkit is freely available and platform-independent, making it suitable for both personal workstations and high-performance computing environments.
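
To make the idea of amino acid motif detection concrete, here is a minimal sketch of a regex-based motif scan over FASTA text. It does not use or imitate the SeqForge command line, whose options are not given in the abstract. The functions parse_fasta and find_motif, the sequences, and the motif C..C (two cysteines separated by any two residues) are all assumptions chosen for illustration.

```python
import re

def parse_fasta(text):
    """Parse FASTA text into {record_id: sequence}; the id is the first header token."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records

def find_motif(records, pattern):
    """Return (record_id, 1-based start, matched text) for every motif occurrence."""
    # A lookahead with a capture group reports overlapping occurrences too.
    regex = re.compile(f"(?=({pattern}))")
    hits = []
    for name, seq in records.items():
        for match in regex.finditer(seq):
            hits.append((name, match.start() + 1, match.group(1)))
    return hits

# Made-up protein sequences (assumption), one of them wrapped over two lines.
fasta = ">seq1 hypothetical\nMKCAACGG\nCTTC\n>seq2\nMAAAA\n"
records = parse_fasta(fasta)
hits = find_motif(records, "C..C")
for hit in hits:
    print(*hit, sep="\t")
```

The lookahead matters here: the three occurrences in seq1 share their boundary cysteines, so a plain non-overlapping search would report only the first and the last. Tab-separated rows are one example of the structured, easily parsed output the abstract describes.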

BMC Bioinformatics 26(1), 280. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12625553/pdf/
Citations: 0
Graph convolution network based on meta-paths and mutual information for drug-target interaction prediction.
IF 3.3 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-11-07 DOI: 10.1186/s12859-025-06295-x
Shujuan Cao, Binying Cai, Zhejian Qiu, Tiantian Chang, Qiqige Wuyun, Fang-Xiang Wu

Background: Predicting drug-target interactions (DTIs) plays a pivotal role in accelerating drug repositioning by prioritizing candidate drugs and reducing experimental costs. Despite advancements in deep learning, several challenges still require further exploration, including sparsity and inadequate representation of feature relationships.

Results: We propose GCNMM, a novel graph convolutional network based on meta-paths and mutual information, to predict latent DTIs in drug-target heterogeneous networks. Our approach begins by constructing a fused DTI network based on meta-paths and a graph attention network. We compute multiple similarity networks by using Jaccard coefficients and integrate them into the fused drug and target similarity networks through entropy-based fusion. These networks are then jointly processed by a graph convolutional auto-encoder to generate low-dimensional feature representations. To preserve the topological structure of the original network in the embedding space and strengthen the relationship between the input and latent representations, we incorporate spatial topological consistency and mutual information maximization as dual optimization objectives.

Conclusions: The experimental results illustrate that GCNMM exhibits superior performance to existing baseline models in DTI prediction. Furthermore, case studies validate the practical effectiveness of GCNMM, highlighting its potential in DTI prediction and drug repositioning.
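
The Jaccard-coefficient similarity networks mentioned in the Results can be sketched as follows. This is an illustration under stated assumptions, not the GCNMM implementation: each drug is represented by a set of associated entities (here, hypothetical target identifiers), and the similarity of two drugs is |A ∩ B| / |A ∪ B|. The entropy-based fusion of several such matrices is not shown, because the abstract does not specify it.

```python
def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity_matrix(profiles):
    """Pairwise Jaccard similarities, with rows and columns in sorted name order."""
    names = sorted(profiles)
    matrix = [[jaccard(profiles[r], profiles[c]) for c in names] for r in names]
    return names, matrix

# Hypothetical drug -> set of known targets (placeholder identifiers).
drug_targets = {
    "drugA": {"T1", "T2", "T3"},
    "drugB": {"T2", "T3", "T4"},
    "drugC": {"T5"},
}
names, sim = similarity_matrix(drug_targets)
for name, row in zip(names, sim):
    print(name, row)
```

Here drugA and drugB share two of four distinct targets, giving a similarity of 0.5, while drugC shares none with either. The same function applies to targets described by their sets of drugs, diseases, or other annotations, which yields one similarity network per annotation type.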

BMC Bioinformatics 26(1), 275. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12595897/pdf/
Citations: 0
TransST: transfer learning embedded spatial factor modeling of spatial transcriptomics data.
IF 3.3 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-11-06 DOI: 10.1186/s12859-025-06099-z
Shuo Shuo Liu, Shikun Wang, Yuxuan Chen, Anil K Rustgi, Ming Yuan, Jianhua Hu

Background: Spatial transcriptomics has emerged as a powerful tool in biomedical research because of its ability to capture both the spatial context and the abundance of the complete RNA transcript profile in organs of interest. However, limitations of the technology, such as the relatively low resolution and comparatively insufficient sequencing depth, make it difficult to reliably extract real biological signals from these data. To alleviate this challenge, we propose a novel transfer learning framework, referred to as TransST, to adaptively leverage the cell-labeled information from external sources in inferring cell-level heterogeneity of target spatial transcriptomics data.

Results: Applications in several real studies as well as a number of simulation settings show that our approach significantly improves on existing techniques. For example, in the breast cancer study, TransST successfully identifies five biologically meaningful cell clusters, including the two subgroups of cancer in situ and invasive cancer; in addition, among all the studied methods, only TransST is able to separate the adipose tissues from the connective tissues.

Conclusions: In summary, the proposed method TransST is both effective and robust in identifying cell subclusters and detecting corresponding driving biomarkers in spatial transcriptomics data.

{"title":"TransST: transfer learning embedded spatial factor modeling of spatial transcriptomics data.","authors":"Shuo Shuo Liu, Shikun Wang, Yuxuan Chen, Anil K Rustgi, Ming Yuan, Jianhua Hu","doi":"10.1186/s12859-025-06099-z","DOIUrl":"10.1186/s12859-025-06099-z","url":null,"abstract":"<p><strong>Background: </strong>Spatial transcriptomics have emerged as a powerful tool in biomedical research because of its ability to capture both the spatial contexts and abundance of the complete RNA transcript profile in organs of interest. However, limitations of the technology such as the relatively low resolution and comparatively insufficient sequencing depth make it difficult to reliably extract real biological signals from these data. To alleviate this challenge, we propose a novel transfer learning framework, referred to as TransST, to adaptively leverage the cell-labeled information from external sources in inferring cell-level heterogeneity of a target spatial transcriptomics data.</p><p><strong>Results: </strong>Applications in several real studies as well as a number of simulation settings show that our approach significantly improves existing techniques. 
For example, in the breast cancer study, TransST successfully identifies five biologically meaningful cell clusters, including the two subgroups of cancer in situ and invasive cancer; in addition, only TransST is able to separate the adipose tissues from the connective issues among all the studied methods.</p><p><strong>Conclusions: </strong>In summary, the proposed method TransST is both effective and robust in identifying cell subclusters and detecting corresponding driving biomarkers in spatial transcriptomics data.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"274"},"PeriodicalIF":3.3,"publicationDate":"2025-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12593783/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145457374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A lightweight single-view contrastive learning hypergraph neural network for food-microbe-disease association prediction.
IF 3.3 Zone 3 Biology Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-11-04 DOI: 10.1186/s12859-025-06283-1
Jianqiang Hu, Mingyi Hu, Yangxiang Wu, Songyao Mu, Dahao Huang, Baolong Wang, Yuchen Gao, Shixin Gu, Jinlin Zhu

Background: Identifying potential associations among food, gut microbiota, and disease is fundamental for elucidating interaction mechanisms and advancing personalized healthy dietary strategies. While computational methods have been extensively applied to predict microbiota-disease associations, methods for predicting food-microbiota relationships remain limited, particularly regarding higher-order food-microbiota-disease interactions.

Results: In this work, we construct a food-microbe-disease (FMD) database encompassing 190 food items, 219 gut microbiota species, and 163 disease entities, resulting in 17,065 FMD associations. We then propose a lightweight single-view contrastive learning hypergraph neural network (LSCHNN) for FMD association prediction on the sparse FMD dataset. LSCHNN formulates ternary FMD interactions as a hypergraph, in which foods, microbes, and diseases are represented by nodes and FMD triplets are represented by hyperedges, and leverages the biological features of foods, microbes, and diseases as node attributes. Subsequently, a hypergraph neural network is designed to learn the embeddings of foods, microbes, and diseases from the hypergraph and predict potential ternary FMD associations. Additionally, we incorporate a single-view contrastive learning mechanism that enhances the model's ability to extract discriminative features and improves generalization on sparse data. Comprehensive comparison experiments demonstrate that LSCHNN outperforms other state-of-the-art methods in terms of both the precision of predicting ternary FMD associations and the discovery of additional potential FMD associations. Case studies on two microbes further confirm the effectiveness of LSCHNN in identifying potential FMD associations.

Conclusions: A novel computational model, LSCHNN, is proposed, marking the first integration of hypergraph neural networks with lightweight single-view contrastive learning for ternary FMD association prediction, providing a groundbreaking framework for precision nutrition and personalized dietary interventions.
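
As a rough illustration of the hypergraph formulation described in the Results, the sketch below builds a node-by-hyperedge incidence matrix from FMD triplets. It is not the authors' implementation: the function name build_incidence and the placeholder identifiers (food_A, microbe_X, disease_1, and so on) are assumptions made for this example and do not correspond to entries in the FMD database.

```python
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, str]          # (food, microbe, disease)
NodeKey = Tuple[str, str]               # (node type, node name)
KINDS = ("food", "microbe", "disease")


def build_incidence(triplets: List[Triplet]) -> Tuple[Dict[NodeKey, int], List[List[int]]]:
    """Return (node_index, H), where H[v][e] == 1 if node v lies on hyperedge e."""
    node_index: Dict[NodeKey, int] = {}
    # First pass: assign a row index to every distinct (type, name) node.
    for triplet in triplets:
        for kind, name in zip(KINDS, triplet):
            node_index.setdefault((kind, name), len(node_index))
    # Second pass: one column per triplet, i.e. one hyperedge joining three nodes.
    H = [[0] * len(triplets) for _ in node_index]
    for e, triplet in enumerate(triplets):
        for kind, name in zip(KINDS, triplet):
            H[node_index[(kind, name)]][e] = 1
    return node_index, H


# Hypothetical placeholder triplets, not real associations.
triplets = [
    ("food_A", "microbe_X", "disease_1"),
    ("food_B", "microbe_Y", "disease_1"),
    ("food_A", "microbe_Y", "disease_2"),
]
node_index, H = build_incidence(triplets)
```

Each column of H has exactly three nonzero entries, one per node type, which is what distinguishes a ternary hyperedge from a pairwise edge.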

Citations: 0
A shrinkage-based statistical method for testing group mean differences in quantitative bottom-up proteomics.
IF 3.3 Zone 3 Biology Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-10-31 DOI: 10.1186/s12859-025-06275-1
Namgil Lee, Hojin Yoo, Juhyoung Kim, Heejung Yang

Background: In bottom-up proteomics using data-independent acquisition mass spectrometry (DIA-MS), quantitative measurements are obtained following multiple steps of protein fragmentation and ionization, which introduces cumulative errors and impairs the effectiveness of classical statistical methods. This study proposes an alternative statistical approach for testing group mean differences at the peptide level in quantitative bottom-up proteomics.

Results: We present a novel probabilistic graphical model that accounts for the non-normality of empirical distributions and the correlations between fragment ion quantities. Based on the model, we propose a new statistical method that improves upon the classical feature-based approach by incorporating distribution-free shrinkage estimation of covariance matrices and bootstrap-based estimation of degrees-of-freedom. Simulated experiments demonstrate that the proposed method outperforms the four most widely used classical methods in terms of specificity, sensitivity, and accuracy, particularly when the data distribution closely resembles real MS data, and under conditions of small sample sizes. Numerical analysis of real quantitative tandem mass spectrometry data reveals that the proposed method effectively identifies candidate peptides exhibiting changes in mean quantity following treatment with the kinase inhibitor Staurosporine.

Conclusions: The proposed statistical method offers an effective alternative to classical approaches for differential analysis of peptides in quantitative bottom-up proteomics using DIA-MS. The R software package MDstatsDIAMS is available at https://github.com/namgillee/MDstatsDIAMS.
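
To make the idea of covariance shrinkage mentioned in the Results concrete, here is a minimal sketch that pulls a sample covariance matrix toward its own diagonal. It is an assumption-laden illustration, not the estimator from the paper or from MDstatsDIAMS: the shrinkage intensity lam is supplied by hand here, whereas a distribution-free estimator would choose it from the data, and the function names and toy numbers are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def sample_covariance(rows: Matrix) -> Matrix:
    """Unbiased sample covariance of the columns of `rows` (n samples x p features)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(p)]
        for i in range(p)
    ]


def shrink_to_diagonal(S: Matrix, lam: float) -> Matrix:
    """Return lam * diag(S) + (1 - lam) * S.

    Variances are kept as they are; covariances are scaled by (1 - lam).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    p = len(S)
    return [
        [S[i][j] if i == j else (1.0 - lam) * S[i][j] for j in range(p)]
        for i in range(p)
    ]


# Hypothetical toy data: 4 runs x 2 fragment-ion quantities (made-up numbers).
X = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2]]
S = sample_covariance(X)
S_shrunk = shrink_to_diagonal(S, lam=0.3)
```

With few samples, the off-diagonal entries of the sample covariance are the noisiest, so damping them while leaving the variances untouched trades a little bias for a more stable matrix.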

Citations: 0
ESAE-SDA: ensemble sparse autoencoder framework for epigenomics-informed snoRNA-disease associations prediction.
IF 3.3 Zone 3 Biology Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-10-31 DOI: 10.1186/s12859-025-06290-2
Xinqing Jiang, Xiaojun Chen, Lifeng Xu, Feng Zhang, Jiawei Chen, Wenqian Zhang

Small nucleolar RNAs (snoRNAs), a class of non-coding RNAs broadly distributed in eukaryotes, are emerging as pivotal regulators in the field of epigenomics. In addition to guiding 2'-O-methylation and pseudouridylation modifications at specific rRNA sites to maintain ribosomal stability and support protein synthesis, snoRNAs have been increasingly implicated in epigenetic regulation, influencing gene expression, chromatin architecture, and RNA modification patterns. Accurate identification of potential snoRNA-disease associations (SDAs) is therefore essential for understanding epigenomic dysregulation in complex diseases and facilitating early intervention and drug repurposing. Although artificial intelligence (AI) methods have advanced SDA prediction, they are still hindered by issues such as sample imbalance and high false-negative rates. To address these challenges, we propose ESAE-SDA, a novel model integrating sparse autoencoders with an ensemble learning framework. ESAE-SDA first constructs a comprehensive snoRNA-disease representation using multi-source similarity metrics. It then applies k-means clustering to select high-confidence negative samples and employs a deep sparse autoencoder with sparsity constraints to learn compact, discriminative embeddings. Finally, multiple GNN-based learners are independently trained on dynamically resampled data, and ensemble inference is performed via weighted fusion, substantially enhancing robustness and generalization. Experiments on a public SDA dataset demonstrate that ESAE-SDA consistently outperforms state-of-the-art methods. Notably, a case study on ophthalmic diseases highlights the model's ability to uncover epigenetically relevant snoRNAs with potential regulatory and therapeutic significance, underscoring its value in epigenomics-driven disease research and target discovery.
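
The final step in the abstract above, ensemble inference via weighted fusion, can be sketched as a normalized weighted average of per-learner scores. This is only an illustration under stated assumptions: the abstract does not say how the weights are chosen, so they are passed in by the caller here, and the function name weighted_fusion and the example scores are hypothetical.

```python
from typing import List, Sequence


def weighted_fusion(scores: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Fuse per-learner scores into one score per candidate pair.

    scores[k][i] is learner k's score for candidate i; weights[k] is that
    learner's non-negative weight. Weights are normalized to sum to 1.
    """
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per learner")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    n_candidates = len(scores[0])
    if any(len(s) != n_candidates for s in scores):
        raise ValueError("all learners must score the same candidates")
    return [
        sum(w * s[i] for w, s in zip(weights, scores)) / total
        for i in range(n_candidates)
    ]


# Hypothetical scores from two learners for two candidate snoRNA-disease pairs.
fused = weighted_fusion([[0.2, 0.8], [0.6, 0.4]], weights=[1.0, 3.0])
```

Normalizing by the weight total keeps the fused score on the same scale as the inputs, so a threshold tuned for a single learner remains interpretable for the ensemble.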

Citations: 0