Pub Date : 2025-12-27 DOI: 10.1186/s12859-025-06344-5
Daniel Zyss, Amritansh Sharma, Susana A Ribeiro, Claire E Repellin, Oliver Lai, Mary J C Ludlam, Thomas Walter, Amin Fehri
Title: Contrastive learning for cell division detection and tracking in live cell imaging data. Journal: BMC Bioinformatics, pages 30. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12859858/pdf/
Advances in metagenomic sequencing have increasingly implicated gut microbiome dysbiosis in numerous complex diseases, yet its application for precise differential diagnosis remains a major challenge. Existing computational approaches often show limited predictive performance and insufficient robustness when applied to large-scale, imbalanced microbiome datasets, and they typically lack mechanisms to effectively capture microbial community-level or functional guild interactions. To address these limitations, we developed AR-CDT Net, a novel deep learning framework that integrates a Multi-Scale Deformable Convolution (MS-DConv) module with a Channel-wise Dynamic Tanh (CD-Tanh) activation function to achieve more accurate and robust classification of host disease states. Evaluated on a large-scale cohort comprising over 8000 samples spanning eight disease phenotypes, AR-CDT Net demonstrated highly competitive within-cohort performance, outperforming nine representative models across the majority of classification tasks. Importantly, in a stringent cross-dataset generalization test, the model was trained on the highly imbalanced primary multi-disease cohort and validated on relatively balanced independent external cohorts. It achieved a statistically significant AUC of 0.7921 on the highly heterogeneous external T2D cohort, confirming that AR-CDT captures transferable biological signals rather than dataset-specific artifacts. Furthermore, by combining dimensionality reduction with SHAP-based interpretation of our One-vs-Rest (OvR) classifiers, AR-CDT disentangles disease-specific pathogenic signatures from the shared dysbiotic background among clinically distinct yet microbially similar diseases.
Title: AR-CDT NET: a deep deformable convolutional network for gut microbiome-based disease classification. Authors: Jiaye Li, Zijian Sun, Shuo Chai, Hangming Li, Yijun Wang, Jingkui Tian. Pub Date : 2025-12-26 DOI: 10.1186/s12859-025-06357-0. Journal: BMC Bioinformatics, pages 23. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12849458/pdf/
Pub Date : 2025-12-24 DOI: 10.1186/s12859-025-06349-0
He Li, Zander Gu, Said El Bouhaddani, Jeanine Houwing-Duistermaat
Background: In studies that aim to model the relationship between an outcome variable and multiple omics datasets, it is often desirable to reduce the dimensionality of these datasets or to represent one omics dataset in terms of another. Several approaches exist for this purpose, including univariate methods such as polygenic scores, and multivariate methods. Multivariate approaches offer advantages by producing lower-dimensional integrative scores, capturing joint structures across datasets, and filtering out dataset-specific noise. In this paper, we describe one univariate and two multivariate methods, and evaluate their performance through simulations involving two correlated multivariate normally distributed omics datasets, as well as a combination of one multivariate normal and one fixed categorical dataset.
Results: We assess method performance using the root mean squared error (RMSE) when modelling the outcome variable as a function of the reduced omics representations. Multivariate methods generally perform well, particularly when a slightly higher number of components is used for integration. They outperform the univariate method in scenarios involving two normally distributed omics datasets and perform comparably in settings with one normal and one categorical dataset. In real data applications, including two metabolomics datasets from TwinsUK and a metabolomics-genetic dataset from ORCADES, all methods show similar performance in modelling body mass index.
Conclusions: Multivariate methods provide a valuable framework for summarizing multi-omics datasets into low-dimensional components suitable for outcome modelling. Even in the presence of non-normal data, these methods offer a promising alternative to high-dimensional univariate approaches.
Title: Statistical modelling of an outcome variable with integrated multi-omics. Journal: BMC Bioinformatics, pages 26. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12859906/pdf/
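For reference, the evaluation metric named in the Results (RMSE) can be computed as in the minimal sketch below. The function name `rmse` and the toy outcome values are illustrative assumptions and are not taken from the paper.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical outcome values (for example, body mass index) and model predictions.
observed = [22.0, 27.5, 31.0, 24.5]
predicted = [23.0, 26.5, 30.0, 25.5]
print(rmse(observed, predicted))
```

A lower RMSE indicates that the reduced omics representation predicts the outcome more closely.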
The rapid development of single-cell sequencing technologies has provided robust technical support for efficiently resolving multiple levels of molecular information from a single cell population. However, the data produced by these technologies are often noisy and heterogeneous in their characteristics, which makes single-cell multi-omics data difficult to integrate and analyze. There is therefore a growing demand for methods that integrate single-cell multi-omics data; jointly analyzing these data is expected to improve the ability to reveal cellular heterogeneity and to provide new biological perspectives for a deeper understanding of cellular phenotypes. We propose LONMF, a non-negative matrix factorization algorithm that combines a graph Laplacian with optimal transmission to enhance clustering performance and interpretability. We apply LONMF to visualize and cluster multiple paired single-cell multi-omics datasets, including 10X-multi-group, CITE-seq, and TEA-multi-group seq, to facilitate marker characterization and gene ontology enrichment analysis and to provide biological insights for downstream analyses. Our comprehensive benchmarking demonstrates that LONMF performs comparably to current state-of-the-art methods in cell clustering and outperforms other methods in terms of biological interpretability.
Title: LONMF: a non-negative matrix factorization model based on graph Laplacian and optimal transmission for paired single-cell multi-omics data integration. Authors: Mengdi Nan, Qing Ren, Yuhan Fu, Xiang Chen, Guanpeng Qi, Liugen Wang, Jie Gao. Pub Date : 2025-12-23 DOI: 10.1186/s12859-025-06301-2. Journal: BMC Bioinformatics, volume 26(1), pages 294. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12729160/pdf/
Pub Date : 2025-12-23 DOI: 10.1186/s12859-025-06308-9
Xingxin Chen, Zhuo Wang, Zhen Miao, Bin Nie
Background: Multi-drug combinations represent an effective strategy for treating complex diseases. However, due to the vast number of unknown interactions among drugs, accurately predicting drug-drug interactions (DDIs) is essential for preventing adverse drug reactions that may cause serious harm to patients. Therefore, DDI prediction plays a critical role in pharmacology.
Results: In this paper, we propose a novel DDI prediction model that integrates a self-attention mechanism with a capsule neural network, termed ACaps-DDI. The model effectively combines chemical information from internal drug substructures with biological information from external drug targets and drug-metabolizing enzymes to predict potential drug-drug interactions.
Conclusions: Experimental results on two benchmark datasets show that the ACaps-DDI model outperforms six other classification models across seven evaluation metrics, demonstrating its strong predictive performance and generalization ability. Ablation studies further confirm the effectiveness of individual components within the ACaps-DDI architecture. Finally, case studies involving three drugs (cannabidiol, torasemide, and cyclophosphamide) validate the model's ability to predict previously unknown drug interactions. In conclusion, the ACaps-DDI model exhibits high predictive accuracy for known drugs and demonstrates promising predictive capability for unseen drugs, highlighting its practical significance for clinical research on drug interactions.
Title: Research on drug-drug interaction prediction using capsule neural network based on self-attention mechanism. Journal: BMC Bioinformatics, volume 26(1), pages 293. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12729404/pdf/
Pub Date : 2025-12-23 DOI: 10.1186/s12859-025-06353-4
Benjamin Lieser, Georgy Belousov, Johannes Söding
Background: Most popular tools for reconstructing phylogenetic trees from multiple sequence alignments use a model of molecular evolution in which a single substitution matrix or a small set of fixed matrices are shared between all columns. Models with column-specific rate matrices can in principle be fit by automatic differentiation methods, but in practice the heavy computational burden associated with computing the gradients of the many matrix exponentials has hindered exploration of such models.
Implementation: Here, we present a highly efficient approach for reverse-mode differentiation of the log likelihood computed with Felsenstein's algorithm under any time-reversible substitution model. PhyloGrad is implemented in Rust and has Python bindings to easily combine it with automatic differentiation tools.
Results: Depending on the tree size, PhyloGrad is 30-100 times faster than automatic differentiation in PyTorch and uses 10-100 times less memory. Even when fitting a single global model, it is still at least 10 times faster than IQ-TREE3. PhyloGrad accelerates current model optimizations and enables the field to easily explore and implement novel site-specific models.
Title: Phylograd: fast column-specific calculation of substitution model gradients. Journal: BMC Bioinformatics, pages 20.
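To make the quantity being differentiated concrete, the sketch below computes the log likelihood of one alignment column with Felsenstein's pruning algorithm. It is not PhyloGrad's implementation or API: it covers only the forward computation (no gradients), assumes the Jukes-Cantor model as the simplest time-reversible substitution model, and uses a hypothetical nested-tuple tree encoding and made-up taxon names.

```python
import math

STATES = "ACGT"

def jc69_transition(t):
    """Jukes-Cantor transition probabilities for branch length t
    (expected substitutions per site). Returns (p_same, p_different)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e

def partial_likelihoods(node, column):
    """Return a list L where L[i] is the probability of the leaves observed
    below `node`, given that `node` is in state STATES[i].

    A leaf is a taxon name (str); an internal node is a tuple of
    (child, branch_length) pairs. This encoding is an assumption of the sketch.
    """
    if isinstance(node, str):
        observed = column[node]
        return [1.0 if state == observed else 0.0 for state in STATES]
    result = [1.0] * len(STATES)
    for child, branch_length in node:
        p_same, p_diff = jc69_transition(branch_length)
        child_l = partial_likelihoods(child, column)
        for i in range(len(STATES)):
            # Sum over the child's possible states j, weighted by P(i -> j).
            result[i] *= sum(
                (p_same if i == j else p_diff) * child_l[j]
                for j in range(len(STATES))
            )
    return result

def column_log_likelihood(tree, column):
    """Log likelihood of one column, with uniform base frequencies at the root."""
    root = partial_likelihoods(tree, column)
    return math.log(sum(0.25 * value for value in root))

# Hypothetical three-taxon tree and one alignment column.
primates = (("human", 0.1), ("chimp", 0.1))
tree = ((primates, 0.2), ("mouse", 0.3))
print(column_log_likelihood(tree, {"human": "A", "chimp": "A", "mouse": "G"}))
```

With column-specific rate matrices, the transition probabilities would differ per column, which is where the many matrix exponentials mentioned in the Background come from.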
Pub Date : 2025-12-19 DOI: 10.1186/s12859-025-06322-x
Jian Zhang, Jingjing Yang, Changlong Wen
Background: Kompetitive Allele-Specific PCR (KASP) is a fluorescence-based, high-throughput and cost-effective genotyping technology widely used for detecting single nucleotide polymorphisms (SNPs) and insertion-deletions (InDels) across various species. However, few software tools are available for automatically designing KASP primers, especially for InDel variations.
Results: To address the lack of free and user-friendly automated tools for KASP primer design, we analyzed the sequence characteristics of KASP primers and developed a user-friendly program named EasyKASP on the Excel VBA platform. EasyKASP designs KASP primers for both SNP and InDel variations, with an average processing time of only 0.03 s per primer pair. A total of 80 SNP loci and 6 InDel loci with variations of different lengths were selected to validate the KASP markers designed by EasyKASP, all of which were successfully amplified and genotyped using KASP technology.
Conclusions: EasyKASP is a simple and rapid tool for KASP primer design, demonstrating broad applicability in KASP genotyping studies.
Title: EasyKASP: a simple and fast tool for KASP primer design. Journal: BMC Bioinformatics, volume 26(1), pages 292. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12717768/pdf/
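The general shape of a KASP assay for a SNP (two allele-specific primers that differ at their 3' base and carry different reporter tails, plus one common primer) can be sketched as follows. This is not EasyKASP's algorithm, which runs on Excel VBA and is not described in the abstract. The function name, the fixed primer length, and the absence of melting-temperature or GC checks are simplifying assumptions, and the FAM and HEX tail sequences are the commonly published ones, which should be confirmed against the assay supplier's documentation.

```python
# Commonly published KASP reporter tails (assumed here, verify before ordering).
FAM_TAIL = "GAAGGTGACCAAGTTCATGCT"
HEX_TAIL = "GAAGGTCGGAGTCAACGGATT"

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))

def design_snp_primers(upstream, allele_1, allele_2, downstream, primer_len=20):
    """Build a naive KASP primer set for a SNP.

    upstream / downstream: flanking sequences on the forward strand.
    allele_1 / allele_2: the two alternative bases at the SNP position.
    Each allele-specific primer ends (3') on its allele; the common primer is
    the reverse complement of the far end of the downstream flank.
    """
    if primer_len < 2:
        raise ValueError("primer_len must be at least 2")
    upstream, downstream = upstream.upper(), downstream.upper()
    if len(upstream) < primer_len - 1 or len(downstream) < primer_len:
        raise ValueError("flanking sequences are too short")
    core = upstream[-(primer_len - 1):]
    return {
        "allele_1_primer": FAM_TAIL + core + allele_1.upper(),
        "allele_2_primer": HEX_TAIL + core + allele_2.upper(),
        "common_primer": reverse_complement(downstream[-primer_len:]),
    }

# Hypothetical flanking sequences around an A/G SNP.
primers = design_snp_primers(
    upstream="TTGACCGATCGGATACCGTAGCTAGGCTA",
    allele_1="A",
    allele_2="G",
    downstream="GGCTTAACGGATCCGATTGCAAGCTTGACC",
)
for name, sequence in primers.items():
    print(name, sequence)
```

A practical design tool would additionally screen candidate primers for melting temperature, GC content, and secondary structure, and would handle InDel alleles of different lengths.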
Background: Single-cell RNA sequencing (scRNA-seq) has revolutionized our understanding of the transcriptional landscape of complex tissues, enabling the discovery of novel cell types and biological functions. However, the identification and classification of cells from scRNA-seq datasets remain significant challenges.
Results: To address this, we developed a new computational tool called CIA (Cluster Independent Annotation), which accurately identifies cell types across different datasets without requiring a fully annotated reference dataset or complex machine learning processes. Based on predefined cell type signatures, CIA provides a highly user-friendly and practical solution to cell-type and functional annotation of single cells. The CIA framework is implemented in both the Python and R programming languages, making it applicable to all main single-cell analysis frameworks, and it is available under the MIT license with its documentation at the following links: Python package: https://pypi.org/project/cia-python/ . Python tutorial: https://cia-python.readthedocs.io/en/latest/tutorial/Cluster_Independent_Annotation.html . R package and tutorial: https://github.com/ingmbioinfo/CIA_R .
Conclusions: Our results demonstrate that CIA's classification performance is comparable to that of other state-of-the-art approaches, while requiring significantly less computational running time. Overall, CIA simplifies the process of obtaining reproducible signature-based cell assignments that can be easily interpreted through graphical summaries, providing researchers with a powerful tool to explore the complex transcriptional landscape of single cells.
Title: CIA: unveiling cellular identities with cluster-independent annotation in single-cell RNA sequencing data for comprehensive cell type characterization and exploration. Authors: Ivan Ferrari, Mattia Battistella, Francesca Vincenti, Andrea Gobbini, Federico Marini, Samuele Notarbartolo, Jole Costanza, Stefano Biffo, Renata Grifantini, Sergio Abrignani, Eugenia Galeota. Pub Date : 2025-12-17 DOI: 10.1186/s12859-025-06320-z. Journal: BMC Bioinformatics, pages 38. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12875040/pdf/
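The idea of annotating each cell from predefined signatures, without clustering first, can be illustrated with the toy sketch below. It does not use the `cia-python` package and does not reproduce CIA's scoring formula: the mean-expression score, the function names, the `min_score` threshold, and the expression values are all assumptions made for illustration.

```python
def signature_scores(cell, signatures):
    """Score one cell against each signature as the mean expression of the
    signature's genes (genes missing from the cell count as 0)."""
    return {
        label: sum(cell.get(gene, 0.0) for gene in genes) / len(genes)
        for label, genes in signatures.items()
    }

def annotate(cell, signatures, min_score=0.0):
    """Assign the best-scoring label, or 'Unassigned' if no score exceeds min_score."""
    scores = signature_scores(cell, signatures)
    best = max(scores, key=scores.get)
    return best if scores[best] > min_score else "Unassigned"

# Hypothetical signatures built from well-known marker genes.
signatures = {
    "T cell": ["CD3D", "CD3E"],
    "B cell": ["MS4A1", "CD79A"],
}

# Hypothetical normalized expression values for three cells.
cells = {
    "cell_1": {"CD3D": 5.0, "CD3E": 3.0, "MS4A1": 0.0},
    "cell_2": {"MS4A1": 4.0, "CD79A": 6.0},
    "cell_3": {"ACTB": 9.0},
}

for name, expression in cells.items():
    print(name, annotate(expression, signatures))
```

Because every cell is scored on its own, the assignment does not depend on how the dataset was clustered, and each label can be traced back to the signature genes that produced it.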