Pub Date: 2026-01-22; DOI: 10.1186/s12859-026-06379-2
Chiara Schiller, Matthias Lemmer, Sonja Reitter, Janina A Lehmann, Kai Fenzl, Johanna Schott
Background: Polysome profiling is a widespread technique to study mRNA translation. After cellular particles are separated by ultracentrifugation on a sucrose density gradient, a UV absorbance profile is recorded during elution. This profile mostly reflects RNA content and shows distinct peaks for ribosomal subunits, monosomes and polysomes carrying increasing numbers of ribosomes. It can be used to assess global translational activity or to reveal changes in ribosome biogenesis and translation elongation. It is also possible to measure the association of fluorescently tagged proteins with ribosomal subunits or polysomes. Alignment and quantification of polysome profiles usually rely on spreadsheet programs, custom R/Python scripts or commercial software.
Results: With QuAPPro, we present the first interactive web app for quantifying and aligning polysome profiles independently of the device or software used to generate them. QuAPPro is written in R, with a graphical user interface implemented in R Shiny. It supports interactive visualization and analysis of polysome profiles, including profile smoothing, baseline selection, alignment at a defined point on the x-axis, quantification of profile subsections and deconvolution to resolve individual peaks. Fluorescence profiles can be aligned and quantified in parallel. Quantification results can be summarized and visualized as bar plots, and every interactive plot can be exported directly in a publication-ready format.
Conclusions: This user-friendly tool not only speeds up the analysis of polysome profiles but also facilitates reproducibility and documentation of the process, without requiring programming skills or commercial software.
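The profile-processing steps listed for QuAPPro (smoothing, baseline selection, alignment at a reference x-position, quantification of a profile subsection) can be illustrated with a minimal Python sketch. This is not QuAPPro's code, which is written in R; the centred moving-average smoother, the constant baseline, the trapezoidal area and all function names are assumptions chosen for illustration, and the profile is synthetic.

```python
import numpy as np

def smooth(y, window=5):
    """Centred moving average over `window` points (window must be odd)."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    pad = window // 2
    # repeat the edge values so the output keeps the input length
    padded = np.pad(np.asarray(y, dtype=float), pad, mode="edge")
    return np.convolve(padded, np.ones(window) / window, mode="valid")

def subtract_baseline(y, baseline):
    """Subtract a constant baseline, e.g. a user-selected background level."""
    return np.asarray(y, dtype=float) - baseline

def align(x, ref_x):
    """Shift the x-axis so that a chosen reference point (e.g. the 80S peak) sits at 0."""
    return np.asarray(x, dtype=float) - ref_x

def area(x, y, x_start, x_end):
    """Trapezoidal area under the profile between x_start and x_end (inclusive)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    mask = (x >= x_start) & (x <= x_end)
    xs, ys = x[mask], y[mask]
    return float(np.sum((xs[1:] - xs[:-1]) * (ys[1:] + ys[:-1]) / 2.0))

# synthetic profile: flat background of 0.1 plus one Gaussian peak at x = 4
x = np.linspace(0.0, 10.0, 101)
y = 0.1 + np.exp(-((x - 4.0) ** 2) / 0.5)
profile = subtract_baseline(smooth(y, window=5), baseline=0.1)
x_aligned = align(x, x[np.argmax(profile)])        # peak now at x = 0
peak_area = area(x_aligned, profile, -2.0, 2.0)    # close to sqrt(0.5 * pi), about 1.25
```

Aligning on a shared reference point before integrating is what makes areas from different gradients comparable; ratios of such areas (for example polysome versus monosome regions) can then be computed from the same `area` helper.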
Title: QuAPPro: an R shiny app for quantification and alignment of polysome profiles.
Journal: BMC Bioinformatics, p. 22. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12849056/pdf/
Pub Date: 2026-01-21; DOI: 10.1186/s12859-026-06368-5
Gilles Sireta, Gwendal Cueff, Vincent Darbot, Marie Lefebvre, Simon Amiard, Aline V Probst, Christophe Tatout
Background: Traditional short-read RNA-Seq analysis pipelines predominantly focus on protein-coding genes, often overlooking other genomic features such as transposable elements (TEs) and non-coding RNA dynamics, and they usually do not investigate splicing events or transcript usage. To fully capture the complexity of the transcriptome, and transcriptomic regulation in particular, a comprehensive approach that integrates these diverse aspects is needed, providing a more complete and nuanced understanding of expression dynamics in the studied organism.
Results: To address these limitations, we present CRESCENT (Comprehensive RNA-seq Expression, Splicing, and Coding/non-coding Element Network Tool), a Snakemake workflow that performs a fully automated and comprehensive RNA-Seq analysis. CRESCENT integrates multiple tools at each step of the workflow and supports analysis of differential expression, differential alternative splicing and differential transcript usage, together with Gene Ontology-based functional enrichment for all three. The workflow uses multiple Snakemake wrappers to minimize the installations required of the user while integrating the latest versions of popular bioinformatic tools. It can run a complete analysis or only specific parts, as set in the user-provided configuration file. In validation, the differentially expressed protein-coding genes and TEs and the differential alternative splicing events identified by CRESCENT were consistent with previously published datasets, supporting the pipeline's reliability. Finally, benchmarking showed that CRESCENT can run on a personal computer or a remote server, including a high-performance computing cluster, and can process datasets ranging from small single-end sequencing of species with small genomes, such as Arabidopsis thaliana, to very large paired-end sequencing of polyploid species such as wheat.
Conclusion and availability: CRESCENT is a scalable solution for comprehensive transcriptomic profiling. It is freely available at https://github.com/gilless429/crescent.
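Differential transcript usage, one of the analyses CRESCENT covers, asks whether the relative proportions of a gene's isoforms change between conditions, even when the gene's total expression does not. The minimal Python sketch below shows only that core idea, assuming a plain dictionary of isoform counts per condition; the isoform names and counts are hypothetical. CRESCENT itself delegates this step to dedicated tools, which also model replicate variance and test significance, none of which is shown here.

```python
def isoform_proportions(counts):
    """Convert one gene's isoform counts into usage proportions (summing to 1)."""
    total = sum(counts.values())
    if total == 0:
        return {tx: 0.0 for tx in counts}
    return {tx: c / total for tx, c in counts.items()}

def usage_shift(cond_a, cond_b):
    """Per-isoform change in usage proportion (condition B minus condition A)."""
    pa = isoform_proportions(cond_a)
    pb = isoform_proportions(cond_b)
    return {tx: pb.get(tx, 0.0) - pa.get(tx, 0.0) for tx in set(pa) | set(pb)}

# hypothetical counts: total expression is unchanged (100 vs 100),
# but the dominant isoform switches from tx1 to tx2
control = {"tx1": 80, "tx2": 20}
treated = {"tx1": 30, "tx2": 70}
shift = usage_shift(control, treated)   # tx1 about -0.5, tx2 about +0.5
```

A gene-level differential expression test would see no change here, which is why transcript usage is analysed as a separate layer.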
Title: CRESCENT, a comprehensive RNA-Seq expression, splicing, and coding/non-coding element network tool.
Journal: BMC Bioinformatics.
Pub Date: 2026-01-21; DOI: 10.1186/s12859-025-06352-5
Sebastian Raubach, Miriam Schreiber, Ruth Hamilton, Gaynor McKenzie, Susan McCallum, Benjamin Kilian, Alan Humphries, Loi Huu Nguyen, Tin Huynh Quang, Akanksha Singh, Shivali Sharma, Sarah Trinder, Manuel Feser, Paul D Shaw
Background: Accurate acquisition of phenotypic data is critical for cataloguing and utilising genetic variation in cultivated crops, landraces, and their wild relatives. Recording phenotypic data as handwritten notes often introduces errors that can and should be avoided. Electronic data collection is crucial for preventing such errors and standardising data, and thus for ensuring high-quality, reliable data.
Implementation: This paper describes the development of GridScore NEXT, a new plant phenotyping application that significantly advances the state of the art for collecting field trial data in plant genetics, pre-breeding and crop improvement research. Building on its predecessor, GridScore, the development of GridScore NEXT was driven by real-life, in-field interactions with expert user groups across a number of crops. This iterative design methodology allowed new features to be developed and tested. Collaborators from the 'Biodiversity for Opportunities, Livelihoods and Development' (BOLD) project, focusing on crops including rice, grasspea and alfalfa, along with barley, potato, vegetable and blueberry teams, provided invaluable insights through training sessions, interviews and in-field use of the application.
Results: Key improvements in GridScore NEXT include enhanced data collection tools that support individual plant phenotyping within plots and new data types such as GPS coordinates and image traits. GridScore NEXT provides customisable, user-defined validation rules to help prevent errors and incorporates barcode scanning for accurate, efficient data capture. Compared with its predecessor, the application offers a larger toolbox of data visualizations, including heatmaps and statistical box plots, which help identify potential data issues and understand trial performance in the field. GridScore NEXT is cross-platform and can operate without an internet connection, making it ideal for field use in remote areas. Its adoption has led to standardisation of methods, significant error reduction and timely sharing of data, enabling quicker decision-making in pre-breeding and characterisation experiments. GridScore NEXT is available under an open-source (Apache 2.0) licence, is free to all without restrictions and offers self-hosting options for enhanced data security and privacy. It is broadly applicable not only to plant phenotyping experiments but to any experiment that requires the collection of accurate data.
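The abstract describes GridScore NEXT's user-defined validation rules only at a high level. The Python sketch below illustrates the general idea of checking each entered value against a per-trait rule before accepting it; the `TraitRule` fields (value kind, numeric range, allowed categories), the trait names and the `validate` function are hypothetical and do not reflect GridScore NEXT's actual data model.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TraitRule:
    """Hypothetical validation rule for one trait."""
    name: str
    kind: str                              # "int", "float" or "categorical"
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    categories: list = field(default_factory=list)

def validate(rule: TraitRule, raw: str) -> Optional[str]:
    """Return an error message, or None if the entered value passes the rule."""
    if rule.kind == "categorical":
        if raw not in rule.categories:
            return f"{rule.name}: '{raw}' is not one of {rule.categories}"
        return None
    try:
        value = int(raw) if rule.kind == "int" else float(raw)
    except ValueError:
        return f"{rule.name}: '{raw}' is not a valid {rule.kind}"
    if rule.minimum is not None and value < rule.minimum:
        return f"{rule.name}: {value} is below the minimum {rule.minimum}"
    if rule.maximum is not None and value > rule.maximum:
        return f"{rule.name}: {value} is above the maximum {rule.maximum}"
    return None

# hypothetical trait definitions
height = TraitRule("plant height (cm)", "float", minimum=0, maximum=250)
lodging = TraitRule("lodging", "categorical", categories=["none", "partial", "full"])
```

For example, `validate(height, "1200")` returns an "above the maximum" message, which is the kind of typo (an extra zero) that is easy to make on a handwritten sheet and easy to catch at entry time.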
Title: Beyond the clipboard: data collection with GridScore NEXT.
Journal: BMC Bioinformatics.
Pub Date: 2026-01-20; DOI: 10.1186/s12859-025-06346-3
Mehmet Ali Balikci, Cyrille Mesue Njume, Ali Cakmak
Biomarkers play a pivotal role in disease diagnosis and prognosis by offering molecular insights into biological states. The rapid growth of high-throughput omics technologies has enabled the generation of large-scale biomarker datasets, yet analyzing these complex, high-dimensional data remains a major challenge, particularly for researchers lacking advanced computational expertise. While numerous tools exist for omics data analysis, many fall short of providing an integrated, user-friendly environment tailored specifically to biomarker discovery and interpretation. To address this gap, we present BioMark, a web-based platform designed to streamline biomarker analysis across diverse omics types. BioMark integrates robust statistical methods with widely used machine learning algorithms to support key workflows, including statistical analysis, dimensionality reduction, classification and subsequent model explanation. The platform emphasizes accessibility, offering intuitive visualizations and automated reporting to facilitate interpretation and dissemination of results. Notably, BioMark also offers a feature-ranking strategy that consolidates outputs from multiple analytical methods, enhancing the robustness of biomarker identification. By lowering the barrier to advanced biomarker analytics, BioMark enables a broader range of researchers to uncover clinically relevant molecular signatures and accelerate translational research. BioMark is available online at https://bioinf.itu.edu.tr/biomark.
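The abstract does not specify how BioMark's feature-ranking strategy combines the outputs of different methods. One common, simple way to consolidate several rankings is mean-rank (Borda-style) aggregation, sketched below in Python under the assumption that every method assigns an importance score to every feature (higher means more important). The method names, gene names and scores are made up, and this should not be read as BioMark's actual procedure.

```python
def ranks(scores):
    """Map each feature to its rank under one method (1 = most important)."""
    ordered = sorted(scores, key=lambda f: scores[f], reverse=True)
    return {feature: i + 1 for i, feature in enumerate(ordered)}

def consensus_ranking(method_scores):
    """Order features by mean rank across methods (lower is better).

    Assumes every method scores every feature; ties in mean rank are
    broken alphabetically so the output is deterministic.
    """
    per_method = [ranks(s) for s in method_scores.values()]
    features = set().union(*per_method)
    mean_rank = {f: sum(r[f] for r in per_method) / len(per_method) for f in features}
    return sorted(features, key=lambda f: (mean_rank[f], f))

# made-up importance scores from three hypothetical methods
method_scores = {
    "t_test": {"GENE_A": 5.1, "GENE_B": 2.3, "GENE_C": 0.4},         # e.g. |t| statistic
    "random_forest": {"GENE_A": 0.30, "GENE_B": 0.50, "GENE_C": 0.20},
    "lasso": {"GENE_A": 0.9, "GENE_B": 0.7, "GENE_C": 0.0},          # |coefficient|
}
consensus = consensus_ranking(method_scores)
```

Working on ranks rather than raw scores sidesteps the fact that a t statistic, a random-forest importance and a regression coefficient live on incomparable scales; here GENE_A has mean rank 4/3, GENE_B 5/3 and GENE_C 3.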
Title: BioMark: biomarker analysis tool.
Journal: BMC Bioinformatics.
Pub Date: 2026-01-19; DOI: 10.1186/s12859-025-06330-x
Violeta de Anca Prado, Fábio Pértille, Pedro Sá, Marta Gòdia, Joëlle Rüegg, Josep C Jimenez-Chillaron, Carlos Guerrero-Bosagna
Title: Benchmarking of methods to analyse data derived from GBS-MeDIP.
Journal: BMC Bioinformatics, p. 17. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12829230/pdf/
Pub Date: 2026-01-12; DOI: 10.1186/s12859-025-06264-4
Laura Aviñó-Esteban, Heura Cardona-Blaya, Marco Musy, Antoni Matyjaszkiewicz, James Sharpe, Giovanni Dalmasso
Background: Although some aspects of limb development can be treated as a 2D problem, a true understanding of morphogenesis and patterning requires 3D analysis. Because gene expression data are largely static 3D image stacks, a major challenge is building an efficient pipeline for staging each dataset and then aligning and warping the data into a standard atlas for convenient visualisation.
Results: We present a novel bioinformatic pipeline tailored to 3D visualization and analysis of developing limb buds. The pipeline integrates key steps such as data acquisition, volume cleaning, surface extraction, staging, alignment and advanced visualization. Its modular design allows researchers to customize workflows while remaining compatible with tools such as Fiji and Vedo. The pipeline is available at https://github.com/LauAvinyo/limblab.
Conclusions: The pipeline advances 3D gene expression analysis in limb development by integrating flexible tools for staging, alignment, and visualization. It is user-friendly, scalable to other samples, and optimized for research needs. Future updates will enhance customization and expand applicability to other species and developmental biology fields.
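The abstract names alignment into a standard atlas as a key step but does not state the algorithm. As an illustration of one standard technique for the rigid part of such a step, the Kabsch algorithm finds the rotation and translation that best superimpose two sets of corresponding 3D points in the least-squares sense. This is an assumption for illustration only: Limblab's actual alignment (and any subsequent warping) may work differently, and real limb-bud surfaces usually lack known point-to-point correspondences, which methods such as iterative closest point estimate by repeatedly re-matching points and re-solving this step.

```python
import numpy as np

def kabsch(P, Q):
    """Rotation R and translation t minimising sum ||R @ p_i + t - q_i||^2.

    P, Q: (n, 3) arrays of corresponding source and target points.
    """
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)              # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # flip the last axis if needed so R is a proper rotation (det = +1)
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t

# synthetic check: rotate random points 30 degrees about z, shift them, recover the motion
rng = np.random.default_rng(0)
P = rng.normal(size=(50, 3))
theta = np.pi / 6
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([1.0, -2.0, 0.5])
Q = P @ R_true.T + t_true                          # each row is R_true @ p + t_true
R, t = kabsch(P, Q)
```

The reflection guard matters for near-symmetric shapes, where the unconstrained least-squares solution can be a mirror image rather than a rotation.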
Title: Limblab: pipeline for 3D analysis and visualisation of limb bud gene expression.
Journal: BMC Bioinformatics, vol. 27, issue 1, p. 6. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12794269/pdf/
Pub Date: 2026-01-09; DOI: 10.1186/s12859-025-06337-4
Naafey Aamer, Muhammad Nabeel Asim, Andreas Dengel
Title: Comic: explainable drug repurposing via contrastive masking for interpretable connections.
Journal: BMC Bioinformatics, p. 24. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12849513/pdf/
Pub Date: 2026-01-08; DOI: 10.1186/s12859-025-06361-4
David H Rogers, Cullen Roth, Cameron Tauxe, Jeannie T Lee, Christina R Steadman, Karissa Y Sanbonmatsu, Anna Lappala, Shawn R Starkenburg
Title: From 2D to 4D: a containerized workflow and browser to explore dynamic chromatin architecture.
Journal: BMC Bioinformatics, p. 37. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12870729/pdf/
Pub Date: 2026-01-07; DOI: 10.1186/s12859-025-06358-z
Erik D Huckvale, Hunter N B Moseley
Background: Associations between metabolites and biochemical pathways are highly useful for interpreting molecular datasets generated in biological and biomedical research. However, such pathway annotations are sparse in most molecular datasets, limiting their utility for pathway-level interpretation. To address this shortcoming, several past publications have presented machine learning models that predict the pathway associations of small biomolecules (metabolites and xenobiotics) using data from the Kyoto Encyclopedia of Genes and Genomes (KEGG). Other similar knowledgebases also exist, such as MetaCyc, which has more compound entries and pathway definitions than KEGG.
Results: As a logical next step, we trained and evaluated multilayer perceptron models on compound entries and pathway annotations obtained from MetaCyc. Models trained on this dataset achieved a mean Matthews correlation coefficient (MCC) of 0.845 (standard deviation 0.0101), compared with a mean MCC of 0.847 (standard deviation 0.0098) for the KEGG dataset. However, KEGG's 184 metabolic-only pathways (out of 502 pathways in total) have a mean MCC of 0.800 (standard deviation 0.021). Because MetaCyc pathways are metabolism-focused, the MetaCyc results represent an improvement of over 5.6% in metabolic pathway prediction performance.
Conclusions: The overall MetaCyc and KEGG performance results are practically the same, demonstrating that, in aggregate, the 4055 MetaCyc pathways can be predicted effectively at the current state-of-the-art performance level.
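For reference, the Matthews correlation coefficient of a binary classifier is MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), which ranges from −1 to 1 and stays informative when positives are rare, as they are for most individual pathways. The Python sketch below computes it from confusion-matrix counts and reproduces the relative-improvement arithmetic behind the 5.6% figure from the mean MCCs quoted above (0.845 for MetaCyc versus 0.800 for KEGG's metabolic-only pathways). The confusion-matrix counts in the example are invented.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # common convention: return 0 when any marginal total is zero
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def relative_improvement(new, old):
    """Relative change of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old

# invented counts for one pathway classifier (90 % accuracy on a rare positive class)
example = mcc(tp=90, tn=880, fp=10, fn=20)       # about 0.84
# mean MCCs quoted in the abstract
gain = relative_improvement(0.845, 0.800)        # 5.625, i.e. just over 5.6 %
```

The 5.6% is therefore a relative gain (0.045 / 0.800), not an absolute difference of 5.6 MCC points.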
Title: Predicting the pathway involvement of metabolites annotated in the MetaCyc knowledgebase.
Journal: BMC Bioinformatics, p. 36. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12870939/pdf/
Background: Drug combination therapy often outperforms monotherapy in cancer treatment, but the vast number of available drugs makes manual screening for synergistic combinations costly. Computational methods, especially deep learning, can reduce the search space by predicting likely synergistic drug combinations. Recent studies have improved drug synergy prediction by modeling associations among different biological entities, but drug-drug interactions have not been fully leveraged in this scenario, which motivated the work presented in this paper.
Methods: This paper proposes HGTSynergy, a deep learning method for predicting synergistic drug combinations. It uses a heterogeneous graph attention network, trained on a tailored task, to capture complex latent patterns in the drug-drug interaction network as prior knowledge. This knowledge is then transferred through a transfer learning framework to the downstream task of predicting drug synergy scores, improving predictive performance.
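The abstract does not give the layer equations, so the following is only a hypothetical sketch of how attention over a heterogeneous drug-drug interaction graph can work in general; it is not the HGTSynergy architecture. It assumes one 0/1 adjacency matrix per interaction type, applies GAT-style attention separately within each type, and then simply averages the per-type results (a stand-in for a learned semantic-level attention such as the one used in HAN). All names (`relation_attention_layer`, the toy graphs) are illustrative, and the pretraining and transfer steps are not shown.

```python
import numpy as np


def leaky_relu(x: np.ndarray, slope: float = 0.2) -> np.ndarray:
    return np.where(x > 0, x, slope * x)


def relation_attention_layer(X, adjs, Ws, attn_vecs):
    """Hypothetical relation-aware graph attention layer (GAT-style per type).

    X:         (n, d) drug feature matrix
    adjs:      list of R (n, n) 0/1 matrices, one per drug-drug interaction type
    Ws:        list of R (d, h) type-specific projection matrices
    attn_vecs: list of R (2h,) type-specific attention vectors
    Returns the (n, h) drug embeddings and the R (n, n) attention matrices.
    """
    n, h = X.shape[0], Ws[0].shape[1]
    agg = np.zeros((n, h))
    n_types = np.zeros((n, 1))
    alphas = []
    for A, W, a in zip(adjs, Ws, attn_vecs):
        Z = X @ W  # project drugs into a type-specific space
        # e[i, j] = LeakyReLU(a^T [z_i || z_j]), computed for all pairs at once
        e = leaky_relu((Z @ a[:h])[:, None] + (Z @ a[h:])[None, :])
        mask = A > 0
        has_nbr = mask.any(axis=1)
        e = np.where(mask, e, -np.inf)  # a drug only attends to its partners
        # Numerically stable softmax over each drug's neighbours of this type
        row_max = np.where(has_nbr, e.max(axis=1), 0.0)
        w = np.exp(e - row_max[:, None])
        denom = w.sum(axis=1, keepdims=True)
        alpha = np.divide(w, denom, out=np.zeros_like(w), where=denom > 0)
        agg += alpha @ Z
        n_types += has_nbr[:, None]
        alphas.append(alpha)
    # Mean over interaction types in which the drug has at least one neighbour
    agg = np.divide(agg, n_types, out=np.zeros_like(agg), where=n_types > 0)
    H = np.where(agg > 0, agg, np.expm1(np.minimum(agg, 0.0)))  # ELU
    return H, alphas


rng = np.random.default_rng(0)
n, d, h = 5, 4, 3
X = rng.normal(size=(n, d))
# Two hypothetical interaction types; drug 4 has no partners of either type
adjs = [np.array([[0, 1, 1, 0, 0],
                  [1, 0, 0, 1, 0],
                  [1, 0, 0, 0, 0],
                  [0, 1, 0, 0, 0],
                  [0, 0, 0, 0, 0]]),
        np.array([[0, 0, 0, 1, 0],
                  [0, 0, 1, 0, 0],
                  [0, 1, 0, 0, 0],
                  [1, 0, 0, 0, 0],
                  [0, 0, 0, 0, 0]])]
Ws = [rng.normal(size=(d, h)) for _ in adjs]
attn_vecs = [rng.normal(size=2 * h) for _ in adjs]
H, alphas = relation_attention_layer(X, adjs, Ws, attn_vecs)
print(H.shape)  # (5, 3)
```

Keeping attention within each interaction type lets the model weigh, for example, a pharmacokinetic partner differently from a pharmacodynamic one before the types are combined.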
Results: HGTSynergy is trained and evaluated with five-fold nested cross-validation. In the synergy regression task, it outperforms seven deep learning methods, achieving a mean squared error of 222.83, a root mean squared error of 14.91, and a Pearson correlation coefficient of 0.75. In the synergy classification task, it also outperforms the other methods, with an area under the receiver operating characteristic curve of 0.90, an area under the precision-recall curve of 0.63, an accuracy of 0.94, a precision of 0.72, and a Cohen's kappa of 0.52. An ablation study shows that both the heterogeneous graph attention network and the transfer learning framework improve prediction performance. Further analyses indicate that the method has strong generalization performance and interpretability, and a case study shows that its predictions are consistent with prior research.
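In nested cross-validation, an inner loop selects hyperparameters and the five outer folds supply the held-out test predictions. The abstract does not say how metrics are combined across outer folds. Assuming they are computed per fold and then averaged, the reported root mean squared error (14.91) being slightly below the square root of the reported mean squared error (sqrt(222.83) ≈ 14.93) is what one would expect, because the mean of per-fold RMSEs can never exceed the square root of the mean MSE. The sketch below computes the regression metrics and the classification metrics named above from raw predictions; the function names and the synthetic fold data are hypothetical, and AUROC/AUPRC are omitted for brevity.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """MSE, RMSE and Pearson r for one set of synergy-score predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = float(np.mean((y_true - y_pred) ** 2))
    return {"MSE": mse,
            "RMSE": float(np.sqrt(mse)),
            "Pearson": float(np.corrcoef(y_true, y_pred)[0, 1])}


def classification_metrics(y_true, y_pred):
    """Accuracy, precision and Cohen's kappa for binary synergy labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Chance agreement: P(pred=1)P(true=1) + P(pred=0)P(true=0)
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance < 1 else 0.0
    return {"accuracy": accuracy, "precision": precision, "kappa": kappa}


# Hypothetical outer-fold results on synthetic data (not the paper's data)
rng = np.random.default_rng(1)
fold_metrics = []
for _ in range(5):
    y = rng.normal(0.0, 20.0, size=200)
    y_hat = y + rng.normal(0.0, 15.0, size=200)
    fold_metrics.append(regression_metrics(y, y_hat))
mean_mse = float(np.mean([m["MSE"] for m in fold_metrics]))
mean_rmse = float(np.mean([m["RMSE"] for m in fold_metrics]))
# Mean of per-fold RMSEs never exceeds sqrt(mean MSE) (Jensen's inequality)
print(mean_rmse <= np.sqrt(mean_mse))  # True
```

Cohen's kappa is included because, with the class imbalance typical of synergy labels, accuracy alone (0.94 here) can look high even for a weak classifier; kappa discounts the agreement expected by chance.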
Conclusions: This study suggests that drug synergy prediction can be improved by comprehensively modeling diverse types of drug-drug interactions and using transfer learning to extract prior knowledge from them. HGTSynergy outperforms other state-of-the-art methods in discovering new anticancer synergistic drug combinations and promises to be a powerful tool for pre-screening such combinations.
HGTSynergy: a transfer learning method for predicting anticancer synergistic drug combinations based on a drug-drug interaction heterogeneous graph. Xiaowen Wang, Yanming Huang, Hongming Zhu, Dongsheng Mao, Xiaoli Zhu, Qin Liu. BMC Bioinformatics, article 35. Pub Date: 2026-01-06. DOI: 10.1186/s12859-025-06360-5. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12870253/pdf/