Pub Date: 2025-12-23 DOI: 10.1186/s12859-025-06308-9
Xingxin Chen, Zhuo Wang, Zhen Miao, Bin Nie
Background: Multi-drug combinations represent an effective strategy for treating complex diseases. However, due to the vast number of unknown interactions among drugs, accurately predicting drug-drug interactions (DDIs) is essential for preventing adverse drug reactions that may cause serious harm to patients. Therefore, DDI prediction plays a critical role in pharmacology.
Results: In this paper, we propose a novel DDI prediction model that integrates a self-attention mechanism with a capsule neural network, termed ACaps-DDI. The model effectively combines chemical information from internal drug substructures with biological information from external drug targets and drug-metabolizing enzymes to predict potential drug-drug interactions.
Conclusions: Experimental results on two benchmark datasets show that the ACaps-DDI model outperforms six other classification models across seven evaluation metrics, demonstrating its strong predictive performance and generalization ability. Ablation studies further confirm the effectiveness of individual components within the ACaps-DDI architecture. Finally, case studies involving three drugs (cannabidiol, torasemide, and cyclophosphamide) validate the model's ability to predict previously unknown drug interactions. In conclusion, the ACaps-DDI model exhibits high predictive accuracy for known drugs and demonstrates promising predictive capability for unseen drugs, highlighting its practical significance for clinical research on drug interactions.
Title: Research on drug-drug interaction prediction using capsule neural network based on self-attention mechanism. Journal: BMC Bioinformatics, vol. 26(1), p. 293. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12729404/pdf/
Pub Date: 2025-12-23 DOI: 10.1186/s12859-025-06353-4
Benjamin Lieser, Georgy Belousov, Johannes Söding
Background: Most popular tools for reconstructing phylogenetic trees from multiple sequence alignments use a model of molecular evolution in which a single substitution matrix, or a small set of fixed matrices, is shared between all columns. Models with column-specific rate matrices can in principle be fit by automatic differentiation methods, but in practice the heavy computational burden of computing the gradients of the many matrix exponentials has hindered exploration of such models.
Implementation: Here, we present a highly efficient approach for reverse-mode differentiation of the log likelihood computed with Felsenstein's algorithm under any time-reversible substitution model. PhyloGrad is implemented in Rust and has Python bindings to easily combine it with automatic differentiation tools.
Results: Depending on the tree size, PhyloGrad is 30-100 times faster than automatic differentiation in PyTorch and uses 10-100 times less memory. Even in the task of fitting one global model, it is still at least 10 times faster than IQ-TREE3. PhyloGrad accelerates current model optimizations and enables the field to easily explore and implement novel site-specific models.
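For readers unfamiliar with Felsenstein's algorithm mentioned above, the following is a minimal sketch of the forward likelihood computation for a single alignment column. It is not PhyloGrad's API or implementation: the tree encoding, the function names (`jc69_prob`, `partial_likelihoods`, `site_log_likelihood`), and the choice of the Jukes-Cantor (JC69) model, which is time-reversible and has closed-form transition probabilities, are all assumptions made for illustration. It covers only the log likelihood, not its gradient.

```python
import math

BASES = "ACGT"

def jc69_prob(same, t):
    """JC69 transition probability over a branch of length t (illustrative model choice)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e

def partial_likelihoods(node):
    """Return the four partial likelihoods for the subtree rooted at node.

    Hypothetical encoding: a leaf is a single base character; an internal
    node is a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return [1.0 if b == node else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:
        child_l = partial_likelihoods(child)
        for i in range(4):
            # Sum over the child's possible states, weighted by transition probability.
            result[i] *= sum(jc69_prob(i == j, t) * child_l[j] for j in range(4))
    return result

def site_log_likelihood(root):
    """Log likelihood of one column under uniform base frequencies."""
    return math.log(sum(0.25 * l for l in partial_likelihoods(root)))

# Example tree ((A:0.1, A:0.1):0.05, C:0.2) for one alignment column.
tree = [([("A", 0.1), ("A", 0.1)], 0.05), ("C", 0.2)]
print(site_log_likelihood(tree))
```

A column-specific model would replace `jc69_prob` with transition probabilities derived from a separate rate matrix per column, which is where the matrix exponentials and their gradients come in.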
Title: Phylograd: fast column-specific calculation of substitution model gradients. Journal: BMC Bioinformatics, p. 20.
Pub Date: 2025-12-19 DOI: 10.1186/s12859-025-06322-x
Jian Zhang, Jingjing Yang, Changlong Wen
Background: Kompetitive Allele-Specific PCR (KASP) is a fluorescence-based, high-throughput and cost-effective genotyping technology widely used for detecting single nucleotide polymorphisms (SNPs) and insertion-deletions (InDels) across various species. However, few software tools are available for automatically designing KASP primers, especially for InDel variations.
Results: To address the lack of free and user-friendly automated tools for KASP primer design, we analyzed the sequence characteristics of KASP primers and developed a user-friendly program named EasyKASP on the Excel VBA platform. EasyKASP designs KASP primers for both SNP and InDel variations, with an average processing time of only 0.03 s per primer pair. A total of 80 SNP loci and 6 InDel loci with variations of different lengths were selected to validate the KASP markers designed by EasyKASP, all of which were successfully amplified and genotyped using KASP technology.
Conclusions: EasyKASP is a simple and rapid tool for KASP primer design, demonstrating broad applicability in KASP genotyping studies.
Title: EasyKASP: a simple and fast tool for KASP primer design. Journal: BMC Bioinformatics, vol. 26(1), p. 292. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12717768/pdf/
Background: Single-cell RNA sequencing (scRNA-seq) has revolutionized our understanding of the transcriptional landscape of complex tissues, enabling the discovery of novel cell types and biological functions. However, the identification and classification of cells from scRNA-seq datasets remain significant challenges.
Results: To address this, we developed a new computational tool called CIA (Cluster Independent Annotation), which accurately identifies cell types across different datasets without requiring a fully annotated reference dataset or complex machine learning processes. Based on predefined cell type signatures, CIA provides a highly user-friendly and practical solution to cell-type and functional annotation of single cells. The CIA framework is implemented in both the Python and R programming languages, making it applicable to all main single-cell analysis frameworks, and it is available under the MIT license with its documentation at the following links: Python package: https://pypi.org/project/cia-python/ . Python tutorial: https://cia-python.readthedocs.io/en/latest/tutorial/Cluster_Independent_Annotation.html . R package and tutorial: https://github.com/ingmbioinfo/CIA_R .
Conclusions: Our results demonstrate that CIA's classification performance is comparable to that of other state-of-the-art approaches, while requiring a significantly lower computational running time. Overall, CIA simplifies the process of obtaining reproducible signature-based cell assignments that can be easily interpreted through graphical summaries, providing researchers with a powerful tool to explore the complex transcriptional landscape of single cells.
Title: CIA: unveiling cellular identities with cluster-independent annotation in single-cell RNA sequencing data for comprehensive cell type characterization and exploration. Authors: Ivan Ferrari, Mattia Battistella, Francesca Vincenti, Andrea Gobbini, Federico Marini, Samuele Notarbartolo, Jole Costanza, Stefano Biffo, Renata Grifantini, Sergio Abrignani, Eugenia Galeota. Pub Date: 2025-12-17 DOI: 10.1186/s12859-025-06320-z. Journal: BMC Bioinformatics, p. 38. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12875040/pdf/
Pub Date: 2025-12-17 DOI: 10.1186/s12859-025-06350-7
Jack Freeman, Robert J Millikin, Leo Xu, Ishaan Sharma, Bethany Moore, Cannon Lock, Kevin Shine George, Aviral Bal, Chitrasen Mohanty, Ron Stewart
Title: SKiM-GPT: combining biomedical literature-based discovery with large language model hypothesis evaluation. Journal: BMC Bioinformatics, p. 16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12829140/pdf/
Pub Date: 2025-12-12 DOI: 10.1186/s12859-025-06332-9
Jacob Pfeil, Liqian Ma, Hin Ching Lo, Tolga Turan, R Tyler McLaughlin, Xu Shi, Severiano Villarruel, Stephen Wilson, Xi Zhao, Josue Samayoa, Kyle Halliwill
Title: Pairwise ratio transformation of gene expression data leads to improved checkpoint response prediction in lung cancer patients. Journal: BMC Bioinformatics, p. 15. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12809930/pdf/
Pub Date: 2025-12-09 DOI: 10.1186/s12859-025-06343-6
Jie Kang, Melanie K Hess, Ken G Dodds, Rudiger Brauning, John C McEwan, Barry J Foote, Judy F Foote, Agnieszka Konkolewska, Shannon M Clarke, Andrew S Hess
Title: SimGBS: a rapid method for simulating large-scale genotyping-by-sequencing data. Journal: BMC Bioinformatics, p. 12. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12801997/pdf/
Pub Date: 2025-12-09 DOI: 10.1186/s12859-025-06313-y
Jiahui Sun, Shengli Wu, Xiangjun Shen, Chris Nugent, Hu Lu
To improve the effectiveness and efficiency of biomedical information retrieval, we propose three ranking-based methods for selecting a subset of retrieval systems for data fusion: SFS (Sequential Forward Search), D&P (Diversity & Performance), and P&D (Performance & Diversity). These methods were applied in combination with the Reciprocal Rank Fusion (RRF) technique. Experiments were conducted on four medical datasets from TREC, using between 62 and 125 candidate retrieval systems and selecting up to 15 for fusion. The proposed subset selection methods significantly improved retrieval performance. Fusing the selected systems using RRF yielded improvements ranging from 10% to over 60% compared to the best individual retrieval system across the datasets. The methods also outperform state-of-the-art approaches by a large margin. In summary, our subset selection approach offers a practical and cost-efficient solution for biomedical information retrieval, achieving substantial performance gains while reducing computational overhead.
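As a rough illustration of the fusion step, Reciprocal Rank Fusion scores each document by summing 1 / (k + rank) over the ranked lists in which it appears. The sketch below assumes the commonly used constant k = 60, which the abstract does not state; the function name, the toy runs, and the document identifiers are hypothetical, and the subset selection methods themselves (SFS, D&P, P&D) are not shown.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids, each ordered best-first.

    k=60 is an assumed default, not a value taken from the paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Three hypothetical retrieval runs over the same query.
runs = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d4"],
    ["d2", "d3", "d1"],
]
fused = reciprocal_rank_fusion(runs)
print(fused)  # ['d2', 'd1', 'd3', 'd4']
```

In a subset selection setting, `runs` would hold only the outputs of the selected retrieval systems rather than all candidates.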
Title: Subset selection based fusion for biomedical information retrieval tasks. Journal: BMC Bioinformatics, p. 11. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12801601/pdf/