
Latest articles from BMC Bioinformatics

Contrastive learning for cell division detection and tracking in live cell imaging data.
IF 3.3 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-12-27 DOI: 10.1186/s12859-025-06344-5
Daniel Zyss, Amritansh Sharma, Susana A Ribeiro, Claire E Repellin, Oliver Lai, Mary J C Ludlam, Thomas Walter, Amin Fehri
Citations: 0
AR-CDT NET: a deep deformable convolutional network for gut microbiome-based disease classification.
IF 3.3 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-12-26 DOI: 10.1186/s12859-025-06357-0
Jiaye Li, Zijian Sun, Shuo Chai, Hangming Li, Yijun Wang, Jingkui Tian

Advances in metagenomic sequencing have increasingly implicated gut microbiome dysbiosis in numerous complex diseases, yet its application for precise differential diagnosis remains a major challenge. Existing computational approaches often show limited predictive performance and insufficient robustness when applied to large-scale, imbalanced microbiome datasets, and they typically lack mechanisms to effectively capture microbial community-level or functional guild interactions. To address these limitations, we developed AR-CDT Net, a novel deep learning framework that integrates a Multi-Scale Deformable Convolution (MS-DConv) module with a Channel-wise Dynamic Tanh (CD-Tanh) activation function to achieve more accurate and robust classification of host disease states. Evaluated on a large-scale cohort comprising over 8000 samples spanning eight disease phenotypes, AR-CDT Net demonstrated highly competitive within-cohort performance, outperforming nine representative models across the majority of classification tasks. Importantly, in a stringent cross-dataset generalization test, the model was trained on the highly imbalanced primary multi-disease cohort and validated on relatively balanced independent external cohorts. It achieved a statistically significant AUC of 0.7921 on the highly heterogeneous external T2D cohort, confirming that AR-CDT captures transferable biological signals rather than dataset-specific artifacts. Furthermore, by combining dimensionality reduction with SHAP-based interpretation of our One-vs-Rest (OvR) classifiers, AR-CDT disentangles disease-specific pathogenic signatures from the shared dysbiotic background among clinically distinct yet microbially similar diseases.

Citations: 0
Statistical modelling of an outcome variable with integrated multi-omics.
IF 3.3 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-12-24 DOI: 10.1186/s12859-025-06349-0
He Li, Zander Gu, Said El Bouhaddani, Jeanine Houwing-Duistermaat

Background: In studies that aim to model the relationship between an outcome variable and multiple omics datasets, it is often desirable to reduce the dimensionality of these datasets or to represent one omics dataset in terms of another. Several approaches exist for this purpose, including univariate methods such as polygenic scores, and multivariate methods. Multivariate approaches offer advantages by producing lower-dimensional integrative scores, capturing joint structures across datasets, and filtering out dataset-specific noise. In this paper, we describe one univariate and two multivariate methods, and evaluate their performance through simulations involving two correlated multivariate normally distributed omics datasets, as well as a combination of one multivariate normal and one fixed categorical dataset.

Results: We assess method performance using the root mean squared error (RMSE) when modelling the outcome variable as a function of the reduced omics representations. Multivariate methods generally perform well, particularly when a slightly higher number of components is used for integration. They outperform the univariate method in scenarios involving two normally distributed omics datasets and perform comparably in settings with one normal and one categorical dataset. In real data applications, including two metabolomics datasets from TwinsUK and a metabolomics-genetic dataset from ORCADES, all methods show similar performance in modelling body mass index.

Conclusions: Multivariate methods provide a valuable framework for summarizing multi-omics datasets into low-dimensional components suitable for outcome modelling. Even in the presence of non-normal data, these methods offer a promising alternative to high-dimensional univariate approaches.
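
The evaluation described in the Results can be illustrated with a minimal sketch: reduce an omics matrix to a few components, regress the outcome on them, and score held-out predictions by RMSE. This is an assumption-laden illustration, not the authors' method: plain principal components (via SVD) stand in for the unnamed multivariate methods, and the function names `rmse` and `predict_from_components` are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted outcomes."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def predict_from_components(X_train, y_train, X_test, n_components):
    """Regress y on the leading principal components of X (stand-in reduction).

    Assumption: PCA is used here only as a generic dimension reduction;
    the paper's own multivariate methods are not reproduced.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train, dtype=float)

    # Centre both sets with the training means to avoid test-set leakage.
    mu = X_train.mean(axis=0)
    Xc = X_train - mu

    # Rows of Vt are the principal directions of the training data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components].T

    T_train = Xc @ W
    T_test = (X_test - mu) @ W

    # Ordinary least squares with an intercept on the component scores.
    A_train = np.column_stack([np.ones(len(T_train)), T_train])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    A_test = np.column_stack([np.ones(len(T_test)), T_test])
    return A_test @ coef
```

Calling `rmse(y_test, predict_from_components(X_train, y_train, X_test, k))` for several values of `k` gives a rough analogue of the comparison across numbers of components mentioned above.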

Citations: 0
LONMF: a non-negative matrix factorization model based on graph Laplacian and optimal transmission for paired single-cell multi-omics data integration.
IF 3.3 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-12-23 DOI: 10.1186/s12859-025-06301-2
Mengdi Nan, Qing Ren, Yuhan Fu, Xiang Chen, Guanpeng Qi, Liugen Wang, Jie Gao

The rapid development of single-cell sequencing technologies has provided robust technical support for efficiently resolving multiple levels of molecular information from a single-cell population. However, the data produced by these technologies are often noisy and differ in their characteristics, which makes single-cell multi-omics data difficult to integrate and analyze. There is a growing demand for methods that integrate single-cell multi-omics data; jointly analyzing such data is expected to improve the ability to reveal cellular heterogeneity and to provide new biological perspectives for a deeper understanding of cellular phenotypes. We propose LONMF, a non-negative matrix factorization algorithm combining graph Laplacian and optimal transmission to enhance clustering performance and interpretability. We apply LONMF to visualize and cluster multi-pair single-cell multi-omics data, including 10X-multi-group, CITE-seq, and TEA-multi-group seq, to facilitate marker characterization and gene ontology enrichment analysis and to provide rich biological insights for downstream analyses. Our comprehensive benchmarking demonstrates that LONMF performs comparably to the current state of the art in cell clustering and outperforms other methods in terms of biological interpretability.

Citations: 0
Research on drug-drug interaction prediction using capsule neural network based on self-attention mechanism.
IF 3.3 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-12-23 DOI: 10.1186/s12859-025-06308-9
Xingxin Chen, Zhuo Wang, Zhen Miao, Bin Nie

Background: Multi-drug combinations represent an effective strategy for treating complex diseases. However, due to the vast number of unknown interactions among drugs, accurately predicting drug-drug interactions (DDIs) is essential for preventing adverse drug reactions that may cause serious harm to patients. Therefore, DDI prediction plays a critical role in pharmacology.

Results: In this paper, we propose a novel DDI prediction model that integrates a self-attention mechanism with a capsule neural network, termed ACaps-DDI. The model effectively combines chemical information from internal drug substructures with biological information from external drug targets and drug-metabolizing enzymes to predict potential drug-drug interactions.

Conclusions: Experimental results on two benchmark datasets show that the ACaps-DDI model outperforms six other classification models across seven evaluation metrics, demonstrating its strong predictive performance and generalization ability. Ablation studies further confirm the effectiveness of individual components within the ACaps-DDI architecture. Finally, case studies involving three drugs (cannabidiol, torasemide, and cyclophosphamide) validate the model's ability to predict previously unknown drug interactions. In conclusion, the ACaps-DDI model exhibits high predictive accuracy for known drugs and demonstrates promising predictive capability for unseen drugs, highlighting its practical significance for clinical research on drug interactions.

Citations: 0
Phylograd: fast column-specific calculation of substitution model gradients.
IF 3.3 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-12-23 DOI: 10.1186/s12859-025-06353-4
Benjamin Lieser, Georgy Belousov, Johannes Söding

Background: Most popular tools for reconstructing phylogenetic trees from multiple sequence alignments use a model of molecular evolution in which a single substitution matrix or a small set of fixed matrices are shared between all columns. Models with column-specific rate matrices can in principle be fit by automatic differentiation methods, but in practice the heavy computational burden associated with computing the gradients of the many matrix exponentials has hindered exploration of such models.

Implementation: Here, we present a highly efficient approach for reverse-mode differentiation of the log likelihood computed with Felsenstein's algorithm under any time-reversible substitution model. PhyloGrad is implemented in Rust and has Python bindings to easily combine it with automatic differentiation tools.

Results: Depending on the tree size, PhyloGrad is 30-100 times faster than automatic differentiation in PyTorch and uses 10-100 times less memory. Even in the task of fitting one global model it is still at least 10 times faster than IQ-TREE3. PhyloGrad accelerates current model optimizations and enables the field to easily explore and implement novel site-specific models.
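
For orientation, the quantity being differentiated is the per-column log likelihood produced by Felsenstein's pruning algorithm. The following is a minimal sketch of that forward pass only, under stated assumptions: it uses the Jukes-Cantor (JC69) model as a simple time-reversible example, a hypothetical nested-list tree encoding, and hypothetical function names. It does not use the PhyloGrad API and does not compute gradients.

```python
import math

STATES = "ACGT"


def jc69_prob(i, j, t):
    """JC69 probability of state index i becoming state index j over branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partial_likelihoods(node, column):
    """Return the four partial likelihoods for the subtree rooted at `node`.

    Hypothetical encoding: a leaf is a taxon name (str); an internal node is a
    list of (child, branch_length) pairs. `column` maps taxon name -> nucleotide.
    """
    if isinstance(node, str):
        return [1.0 if s == column[node] else 0.0 for s in STATES]
    result = [1.0] * 4
    for child, t in node:
        child_l = partial_likelihoods(child, column)
        for i in range(4):
            # Sum over the child's possible states, weighted by transition probability.
            result[i] *= sum(jc69_prob(i, j, t) * child_l[j] for j in range(4))
    return result


def column_log_likelihood(tree, column):
    """Log likelihood of one alignment column with uniform root frequencies."""
    root = partial_likelihoods(tree, column)
    return math.log(sum(0.25 * value for value in root))


# Example: ((human:0.1, chimp:0.1):0.05, mouse:0.3)
tree = [([("human", 0.1), ("chimp", 0.1)], 0.05), ("mouse", 0.3)]
example_ll = column_log_likelihood(tree, {"human": "A", "chimp": "A", "mouse": "G"})
```

A column-specific model would replace `jc69_prob` with a different transition matrix for each column, which is where the cost of differentiating many matrix exponentials arises.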

Citations: 0
Hierarchical clustering-based coarse-to-fine classification framework for microbial protein function prediction.
IF 3.3 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-12-20 DOI: 10.1186/s12859-025-06326-7
Shengyang Chen, Xinyue Gao, Congmin Zhu, Honglei Liu, Yuqing Yang
Citations: 0
EasyKASP: a simple and fast tool for KASP primer design.
IF 3.3 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-12-19 DOI: 10.1186/s12859-025-06322-x
Jian Zhang, Jingjing Yang, Changlong Wen

Background: Kompetitive Allele-Specific PCR (KASP) is a fluorescence-based, high-throughput and cost-effective genotyping technology widely used for detecting single nucleotide polymorphisms (SNPs) and insertion-deletions (InDels) across various species. However, few software tools are available for automatically designing KASP primers, especially for InDel variations.

Results: To address the lack of free and user-friendly automated tools for KASP primer design, we analyzed the sequence characteristics of KASP primers and developed a user-friendly program named EasyKASP on the Excel VBA platform. EasyKASP designs KASP primers for both SNP and InDel variations, with an average processing time of only 0.03 s per primer pair. A total of 80 SNP loci and 6 InDel loci with variations of different lengths were selected to validate the KASP markers designed by EasyKASP, all of which were successfully amplified and genotyped using KASP technology.

Conclusions: EasyKASP is a simple and rapid tool for KASP primer design, demonstrating broad applicability in KASP genotyping studies.
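
To make the primer structure concrete, here is a minimal sketch of how the two allele-specific primers for a SNP are typically assembled in a KASP assay: a reporter tail, a shared stem taken from the sequence just upstream of the variant, and the allele base at the 3' end. Everything in it is an assumption for illustration: the function name is hypothetical, the tail sequences are the commonly quoted FAM and HEX tails and should be checked against the assay supplier's documentation, and no melting-temperature, GC-content, or common-primer selection is attempted. It is not EasyKASP's algorithm, which is implemented in Excel VBA.

```python
# Commonly quoted KASP reporter tails (assumption: verify with the assay supplier).
FAM_TAIL = "GAAGGTGACCAAGTTCATGCT"
HEX_TAIL = "GAAGGTCGGAGTCAACGGATT"


def allele_specific_primers(upstream, allele_1, allele_2, core_length=20):
    """Return (FAM primer, HEX primer) for a single-base variant.

    upstream    -- sequence 5'->3' ending immediately before the SNP position
    allele_1/2  -- the two alternative bases at the SNP
    core_length -- length of the locus-specific part, including the allele base
    """
    if core_length < 2:
        raise ValueError("core_length must be at least 2")
    if len(upstream) < core_length - 1:
        raise ValueError("upstream flank is shorter than the requested primer core")
    if len(allele_1) != 1 or len(allele_2) != 1 or allele_1.upper() == allele_2.upper():
        raise ValueError("alleles must be two different single bases")

    # Shared stem: the bases directly 5' of the SNP; the allele is the 3' terminal base.
    stem = upstream[-(core_length - 1):].upper()
    return (FAM_TAIL + stem + allele_1.upper(), HEX_TAIL + stem + allele_2.upper())
```

A real design additionally needs a common reverse primer and thermodynamic filtering; InDel variants require a different 3' end construction that this sketch does not cover.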

Citations: 0
A geometric graph-based deep learning model for drug-target affinity prediction.
IF 3.3 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-12-18 DOI: 10.1186/s12859-025-06347-2
Md Masud Rana, Farjana Tasnim Mukta, Duc D Nguyen
Citations: 0
CIA: unveiling cellular identities with cluster-independent annotation in single-cell RNA sequencing data for comprehensive cell type characterization and exploration.
IF 3.3 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-12-17 DOI: 10.1186/s12859-025-06320-z
Ivan Ferrari, Mattia Battistella, Francesca Vincenti, Andrea Gobbini, Federico Marini, Samuele Notarbartolo, Jole Costanza, Stefano Biffo, Renata Grifantini, Sergio Abrignani, Eugenia Galeota

Background: Single-cell RNA sequencing (scRNA-seq) has revolutionized our understanding of the transcriptional landscape of complex tissues, enabling the discovery of novel cell types and biological functions. However, the identification and classification of cells from scRNA-seq datasets remain significant challenges.

Results: To address this, we developed a new computational tool called CIA (Cluster Independent Annotation), which accurately identifies cell types across different datasets without requiring a fully annotated reference dataset or complex machine learning processes. Based on predefined cell type signatures, CIA provides a highly user-friendly and practical solution to cell-type and functional annotation of single cells. The CIA framework is implemented in both the Python and R programming languages, making it applicable to all main single-cell analysis frameworks, and it is available under the MIT license with its documentation at the following links: Python package: https://pypi.org/project/cia-python/ . Python tutorial: https://cia-python.readthedocs.io/en/latest/tutorial/Cluster_Independent_Annotation.html . R package and tutorial: https://github.com/ingmbioinfo/CIA_R .

Conclusions: Our results demonstrate that CIA's classification performance is comparable to that of other state-of-the-art approaches, while requiring significantly less computational running time. Overall, CIA simplifies the process of obtaining reproducible signature-based cell assignments that can be easily interpreted through graphical summaries, providing researchers with a powerful tool to explore the complex transcriptional landscape of single cells.
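To make the idea of cluster-independent, signature-based annotation concrete, the following is a minimal sketch under stated assumptions, not CIA's actual scoring formula or API. It assumes a dense cells-by-genes expression matrix, scores each cell by the mean expression of each signature's genes, rescales each signature across cells, and labels a cell only when its top signature leads the runner-up by a margin. The function names (`signature_scores`, `annotate`), the `min_margin` parameter, and the toy data are all hypothetical.

```python
import numpy as np


def signature_scores(expr, genes, signatures):
    """Per-cell mean expression of each signature's genes (illustrative only)."""
    index = {g: i for i, g in enumerate(genes)}
    names = list(signatures)
    scores = np.zeros((expr.shape[0], len(names)))
    for j, name in enumerate(names):
        # Ignore signature genes that are absent from the expression matrix.
        cols = [index[g] for g in signatures[name] if g in index]
        if cols:
            scores[:, j] = expr[:, cols].mean(axis=1)
    return names, scores


def annotate(expr, genes, signatures, min_margin=0.0):
    """Label each cell independently; no clustering step is involved."""
    names, scores = signature_scores(expr, genes, signatures)
    # Rescale each signature by its maximum across cells so that
    # signatures with different expression ranges are comparable.
    peak = scores.max(axis=0)
    scaled = scores / np.where(peak > 0, peak, 1.0)
    order = np.argsort(scaled, axis=1)
    best = order[:, -1]
    labels = [names[b] for b in best]
    if len(names) > 1:
        rows = np.arange(expr.shape[0])
        margin = scaled[rows, best] - scaled[rows, order[:, -2]]
        # Cells without a clear winner stay unassigned (hypothetical rule).
        labels = [lab if m > min_margin else "Unassigned"
                  for lab, m in zip(labels, margin)]
    return labels


# Toy data: 4 cells x 3 genes; the last cell expresses everything weakly.
genes = ["CD3E", "CD19", "LYZ"]
expr = np.array([[5.0, 0.0, 0.0],
                 [0.0, 4.0, 0.0],
                 [0.0, 0.0, 6.0],
                 [1.0, 1.0, 1.0]])
signatures = {"T cell": ["CD3E"], "B cell": ["CD19"], "Monocyte": ["LYZ"]}
labels = annotate(expr, genes, signatures, min_margin=0.1)
print(labels)
```

Because every cell is scored on its own, the assignment does not depend on any clustering resolution; the margin threshold is one simple way to avoid forcing a label onto ambiguous cells. For the published method and its real interface, refer to the package and tutorial links given in the Results paragraph.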

Citations: 0